Ray Serve is a framework for deploying and serving machine learning and large language model (LLM) inference workloads. It is designed to scale, to support complex multi-model workflows, and to make efficient use of costly resources such as GPUs. Phi-3, released by Microsoft, is a capable small language model that is optimized to run on CPUs and in low-memory environments.
Learning how to use Ray Serve to deploy a large language model will benefit anyone who works with machine learning models and wants to deploy them in a production environment.
In this hands-on lab, you will use a development environment to implement a Ray Serve deployment, and you will run your deployment on a virtual machine.
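To give a sense of what you will build, here is a minimal sketch of a Ray Serve deployment that serves a Phi-3 model. It assumes the Hugging Face transformers library and the microsoft/Phi-3-mini-4k-instruct checkpoint; the deployment you implement in the lab may load and invoke the model differently.

```python
from ray import serve
from starlette.requests import Request
from transformers import pipeline


# A deployment is a Python class decorated with @serve.deployment; Ray Serve
# runs one or more replicas of it behind an HTTP endpoint.
@serve.deployment(num_replicas=1)
class PhiDeployment:
    def __init__(self):
        # Load the model once per replica; device=-1 keeps inference on the CPU.
        self.generator = pipeline(
            "text-generation",
            model="microsoft/Phi-3-mini-4k-instruct",
            device=-1,
        )

    async def __call__(self, request: Request) -> dict:
        # Expect a JSON body such as {"prompt": "..."}.
        body = await request.json()
        output = self.generator(body["prompt"], max_new_tokens=128)
        return {"response": output[0]["generated_text"]}


# Bind the deployment into an application and run it on the local Ray cluster.
app = PhiDeployment.bind()
serve.run(app)
```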
Learning objectives
Upon completion of this beginner-level lab, you will be able to:
- Implement a Ray Serve deployment that allows you to interact with a large language model
- Test your deployment on a virtual machine (a sample request is sketched after this list)
- Deploy your Ray Serve deployment to a Ray cluster
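Once a deployment like the sketch above is running, you can test it from the virtual machine by sending an HTTP request to Serve's endpoint, which listens on port 8000 by default. This example assumes the JSON request shape used in the sketch:

```python
import requests

# Serve exposes HTTP applications on port 8000 by default.
resp = requests.post(
    "http://localhost:8000/",
    json={"prompt": "Explain Ray Serve in one sentence."},
)
print(resp.json()["response"])
```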
Intended audience
- Anyone looking to learn about deploying machine learning models
- Cloud Architects
- Data Engineers
- DevOps Engineers
- Machine Learning Engineers
- Software Engineers
Prerequisites
Familiarity with the following will be beneficial but is not required:
- Large Language Models
- The Python programming language
Andrew is a Labs Developer with previous experience in the internet service provider, audio streaming, and cryptocurrency industries. He has also been a DevOps Engineer and enjoys working with CI/CD and Kubernetes.
He holds multiple AWS certifications, including Solutions Architect Associate and Professional.