Streaming support for SageMaker Endpoints | S02 E04 | Build On Generative AI

Sometimes you just do not want to wait for the LLM to generate all the text and only then read it. Maybe you want to read it as it is being generated. Well, fear no more - you can now do that on Amazon SageMaker thanks to the new streaming feature.

AWS Admin
Amazon Employee
Published Sep 11, 2023
Last Modified Jun 25, 2024
Before we start, if you are interested in Cost Optimization, make sure to check out our Build On Live show that is happening on the 28th of September 2023, at 8AM PST, LIVE 🟣 right here on https://twitch.tv/aws. More information (and a funny video) can be found HERE
Architecture diagram of how this works
In today's episode Darko is joined by Raghu, as they explore the wonderful world of Amazon SageMaker hosting endpoints. This time, they look into a brand new feature - Streaming. No, not that kind of streaming; rather, it's the kind where we get to stream the responses out of an LLM. This means that instead of waiting for the LLM to generate the whole response and only then sending it back to the user, we stream the response as it is being generated. This definitely makes the experience of working with an LLM hosted on SageMaker that much better.
To enable this, just make sure the serving.properties file for your model includes the option option.enable_streaming=true. In the example configuration it is the last option in the file, and it is the one that does the magic 🪄
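Once streaming is enabled on the endpoint, the client has to read the response piece by piece instead of in one go. Here is a minimal sketch of how that could look in Python with boto3's invoke_endpoint_with_response_stream call. A few things here are assumptions and not taken from the episode: the request body shape ({"inputs": ...}), the endpoint name you pass in, and the idea that the container sends newline-delimited output. Check what your own container actually emits before relying on it.

```python
import json
from typing import Iterable, Iterator


def iter_stream_lines(events: Iterable[dict]) -> Iterator[str]:
    """Yield complete lines from a SageMaker response event stream.

    One PayloadPart can carry half a line or several lines at once,
    so raw bytes are buffered until a newline shows up.
    """
    buffer = b""
    for event in events:
        part = event.get("PayloadPart")
        if part is None:
            continue  # ignore events that carry no payload bytes
        buffer += part["Bytes"]
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield line.decode("utf-8")
    if buffer.strip():
        # Leftover data that was not terminated by a newline
        yield buffer.decode("utf-8")


def stream_endpoint(endpoint_name: str, prompt: str) -> Iterator[str]:
    """Invoke a streaming endpoint and yield its output line by line.

    ASSUMPTION: the {"inputs": ...} request body is illustrative only;
    the real schema depends on the model container you deployed.
    """
    import boto3  # imported lazily so the parser above needs no AWS setup

    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        Body=json.dumps({"inputs": prompt}),
        ContentType="application/json",
    )
    # response["Body"] is an event stream of {"PayloadPart": {"Bytes": ...}}
    yield from iter_stream_lines(response["Body"])
```

To try it, loop over stream_endpoint("my-endpoint", "Tell me a story") and print each line as it arrives, where "my-endpoint" is a placeholder for your own endpoint name. The parsing is kept separate from the AWS call so you can exercise it with hand-made events before pointing it at a real endpoint.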
Check out the recording here:

Links from today's episode

Reach out to the hosts and guests:

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.