Streaming support for SageMaker Endpoints | S02 E04 | Build On Generative AI
Sometimes you just do not want to wait for the LLM to generate all the text before you can read it. Maybe you want to read it as it is being generated. Well, fear no more - you can now do that on Amazon SageMaker thanks to the new streaming feature.
The serving.properties file contains something like this:
engine=MPI
option.model_id=tiiuae/falcon-7b-instruct
option.trust_remote_code=true
option.tensor_parallel_degree=1
option.max_rolling_batch_size=32
option.rolling_batch=auto
option.output_formatter=jsonlines
option.paged_attention=false
option.enable_streaming=true
option.enable_streaming=true is the one that does the magic 🪄

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
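To see what the streaming setting buys you on the client side, here is a minimal sketch of reading the stream with boto3's invoke_endpoint_with_response_stream. It assumes the endpoint emits newline-delimited JSON (as option.output_formatter=jsonlines suggests) and accepts a request body with "inputs" and "parameters" keys. The helper names, the endpoint name, and the request shape are illustrative assumptions, not something the post confirms.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_json_lines(events: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per complete line in a response stream.

    A payload part can end in the middle of a line, so bytes are buffered
    until a newline arrives.
    """
    buffer = b""
    for event in events:
        part = event.get("PayloadPart")
        if part is None:
            continue  # skip events that carry no payload bytes
        buffer += part["Bytes"]
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    if buffer.strip():
        yield json.loads(buffer)  # trailing line without a final newline


def stream_endpoint(endpoint_name: str, prompt: str) -> Iterator[Dict[str, Any]]:
    """Invoke a streaming endpoint and return an iterator of JSON objects.

    Hypothetical helper: the request body shape is an assumption about
    what the serving container expects.
    """
    import boto3  # imported lazily so the parser above works without AWS access

    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 128}}),
    )
    return iter_json_lines(response["Body"])
```

Looping over stream_endpoint("my-falcon-endpoint", "Tell me a joke") (a placeholder endpoint name) and printing each object shows the pieces arriving as they are generated. The exact keys in each object depend on the container version, so inspect the first few objects before relying on a field name.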