
Streaming support for SageMaker Endpoints | S02 E04 | Build On Generative AI

Sometimes you just do not want to wait for the LLM to generate all of the text before you start reading it. Maybe you want to read it as it is being generated. Well, fear no more - you can now do just that on Amazon SageMaker, thanks to the new streaming feature.

Darko Mesaros
Amazon Employee
Published Sep 11, 2023

Before we start, if you are interested in Cost Optimization, make sure to check out our Build On Live show happening on the 28th of September 2023, at 8AM PST, LIVE 🟣 right here on https://twitch.tv/aws. More information (and a funny video) can be found HERE

Architecture diagram of how this works

In today's episode Darko is joined by Raghu, as they explore the wonderful world of Amazon SageMaker hosting endpoints. This time they are looking into a brand new feature: streaming. No, not that kind of streaming; rather, the kind where we get to stream the responses out of an LLM. Instead of waiting for the LLM to generate the whole response and only then returning it to the user, the response is streamed out as it is being generated. That makes the experience of working with an LLM hosted on SageMaker that much better.

To enable this, just make sure your serving.properties file contains something like this:

engine=MPI
option.model_id=tiiuae/falcon-7b-instruct
option.trust_remote_code=true
option.tensor_parallel_degree=1
option.max_rolling_batch_size=32
option.rolling_batch=auto
option.output_formatter=jsonlines
option.paged_attention=false
option.enable_streaming=true

The last option, option.enable_streaming=true, is the one that does the magic 🪄
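
On the client side, you can consume the stream with boto3's invoke_endpoint_with_response_stream API. Here is a minimal sketch; the endpoint name and prompt are placeholders you would swap for your own deployed endpoint. Note that payload part boundaries may not line up with the jsonlines output, so a production client would buffer bytes until it has complete lines:

import json
import boto3

# Hypothetical endpoint name - replace with your own deployed endpoint
ENDPOINT_NAME = "falcon-7b-instruct-streaming"

smr = boto3.client("sagemaker-runtime")

response = smr.invoke_endpoint_with_response_stream(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps({
        "inputs": "What is Amazon SageMaker?",
        "parameters": {"max_new_tokens": 256},
    }),
)

# The body is an event stream; each PayloadPart carries a chunk of the
# jsonlines output produced by option.output_formatter=jsonlines above.
for event in response["Body"]:
    chunk = event.get("PayloadPart", {}).get("Bytes")
    if chunk:
        print(chunk.decode("utf-8"), end="", flush=True)

This prints each token chunk as it arrives, instead of blocking until the full response is generated.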

Check out the recording here:

Reach out to the hosts and guests: