Make Programs, not Prompts: DSPy Pipelines with Llama 3 on Amazon SageMaker JumpStart
Learn how to create ML pipelines with DSPy powered by Meta's Llama 3 70B Instruct model running on Amazon SageMaker.
João Galego
Amazon Employee
Published Jul 4, 2024
Last Modified Jul 8, 2024
"If a LLM is like a database of millions of vector programs, then a prompt is like a search query in that database." ― François Chollet, How I think about LLM prompt engineering
In the past few months, I kept hearing so many great things about this new-kid-on-the-block framework called DSPy, and how it was going to revolutionize the way we interact with language models (LMs), that I decided to give it a go.
According to the (mini-)FAQs, DSPy is just a fanciful backronym for Declarative Self-improving Language Programs, pythonically, which is now my 3rd favorite backronym after ColBERT and SPECTRE. And yes, I keep a list...

In essence, DSPy replaces prompting with programming by treating LM pipelines as text-transformation graphs. And it does so declaratively (hence the D), replacing how we prompt the LM with what the LM is expected to do.
Take Question Answering (QA) as an example. Traditionally, we would use a prompt template like this one (written as a Python f-string):
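Something along these lines (a minimal sketch following the Llama 3 Instruct chat format; the system prompt and the question are just placeholders):

```python
question = "What is the capital of Portugal?"  # placeholder question

prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant that answers questions truthfully.<|eot_id|><|start_header_id|>user<|end_header_id|>

{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
```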
🤨 If you're wondering about the overabundance of tags like `<|start_header_id|>` or `<|eot_id|>`, just go over the Model Cards & Prompt Formats for Meta Llama 3.
In DSPy, we just say `question -> answer`. Easy-peasy! "What if I want to add a bit of context?" you may ask. No worries, just extend it to `context, question -> answer`.

As you may have guessed, I'm grossly oversimplifying things here. DSPy introduces a lot of new terminology, which makes things more difficult (for instance, the QA declarations in the previous paragraph are called signatures, as sketched below). For the purposes of this post, we don't have to go that deep.
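Just to make the idea concrete, here is roughly what those declarations look like in code (a minimal sketch; both the inline string form and the class-based form are ways of defining a DSPy signature):

```python
import dspy

# Inline signature: "question -> answer"
qa = dspy.Predict("question -> answer")

# Class-based equivalent of "context, question -> answer"
class ContextQA(dspy.Signature):
    """Answer the question using the provided context."""

    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField()
```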
💡 Don't know where to start? Check out the Using DSPy in 8 Steps chapter (if you're an experienced ML practitioner, they'll sound familiar).
🐛 “Every great s̶t̶o̶r̶y̶ demo seems to begin with a s̶n̶a̶k̶e̶ bug.” ― Nicolas Cage
The best way to learn about a new library or framework is to fix something wrong with it.
People often ask me how I decide to work on something, and the answer is quite simple: 1/ select an interesting project, 2/ search for keywords like `aws`, `bedrock` or `sagemaker`, and 3a/ if there's a funny-looking issue, try to fix it; 3b/ otherwise, just try one of the demos (PS: I used to work as a software tester and bugs just seem to find me wherever I go). Today, it's all about this one.

The crux of the problem is that the SageMaker and Bedrock providers work differently. Both have runtime clients (`SageMakerRuntime` and `BedrockRuntime`), but the way we engage with the models is a bit different:

- API: Bedrock has `InvokeModel`/`InvokeModelWithResponseStream` and the new `Converse` API, while SageMaker has `InvokeEndpoint`/`InvokeEndpointWithResponseStream` and the asynchronous `InvokeEndpointAsync`. Behind the scenes, Bedrock is powered by SageMaker endpoints, so the naming shouldn't come as a surprise.
- Requests: as we will see later, each accepts a different payload, both in terms of inference parameters (their names and how they're structured), and produces a different response (although, if we correctly map the parameters and fix the model version, the `generated_text` should be the same).
So, what exactly is the issue? Well, the root cause is that some of the AWS model classes make no distinction between SageMaker- and Bedrock-provided models. In the case of `AWSMeta`, SageMaker support is completely absent.

So today, I'd like to show you just that: we're going to deploy Meta's Llama 3 70B Instruct model (which is also available on Amazon Bedrock) via Amazon SageMaker and then connect it to DSPy.
🎉 Update (08-07-2024): The proposed fix has since been merged into the `main` branch.
Ready to see it in action?
We'll start by installing DSPy
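Something along these lines, run from a notebook (package names as of this writing; `sagemaker` gives us the JumpStart SDK):

```python
%pip install dspy-ai sagemaker
```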
and importing some libraries
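These are the imports used throughout the rest of the walkthrough:

```python
import json

import boto3
import dspy
from sagemaker.jumpstart.model import JumpStartModel
```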
Next, we'll use the SageMaker JumpStart SDK to initialize the Llama 3 70B Instruct model
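A sketch, assuming the JumpStart model ID for Llama 3 70B Instruct and leaving the instance type to the JumpStart default:

```python
model = JumpStartModel(model_id="meta-textgeneration-llama-3-70b-instruct")
```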
and deploy it
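Llama models on JumpStart require explicitly accepting the EULA at deployment time:

```python
predictor = model.deploy(accept_eula=True)
```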
By the way, the SDK includes some useful functions for listing models
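For example (the filter expression below assumes the `framework` key; other keys like `task` or `search_keywords` work too):

```python
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# List every Meta model available in SageMaker JumpStart
list_jumpstart_models(filter="framework == meta")
```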
and each model comes with its own set of examples
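Roughly like this, assuming the model ships with default example payloads:

```python
example_payloads = model.retrieve_all_examples()

for example in example_payloads:
    print(example.body)
```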
Output:
that we can use to test the endpoint
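Something like this:

```python
for example in example_payloads:
    response = predictor.predict(example.body)
    print(response)  # typically contains a 'generated_text' field
```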
Equivalently, you can use Boto3 directly to make the call
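A minimal sketch using the `sagemaker-runtime` client (the payload mirrors the example payloads above):

```python
sagemaker_runtime = boto3.client("sagemaker-runtime")

payload = {
    "inputs": prompt,  # the Llama 3 formatted prompt from earlier
    "parameters": {"max_new_tokens": 256, "temperature": 0.6, "top_p": 0.9},
}

response = sagemaker_runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```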
✋ Time out! Remember when I said that the payloads for SageMaker endpoints and Bedrock models don't have to be the same? Let's look at that first sample in a little more detail...
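A SageMaker payload for this endpoint looks roughly like this (prompt abridged):

```python
{
    "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n...<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "parameters": {
        "max_new_tokens": 256,
        "top_p": 0.9,
        "temperature": 0.6
    }
}
```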
Now, the equivalent for a Bedrock request
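The equivalent request body for Meta Llama models on Bedrock looks roughly like this:

```python
{
    "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n...<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "max_gen_len": 256,
    "temperature": 0.6,
    "top_p": 0.9
}
```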
Can you spot the differences? Here's the solution...
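In short: `inputs` becomes `prompt`, `max_new_tokens` becomes `max_gen_len`, and the nested `parameters` object is flattened to the top level. A small helper (hypothetical, just for illustration) captures the mapping:

```python
def sagemaker_to_bedrock(payload: dict) -> dict:
    """Convert a SageMaker-style Llama 3 payload into its Bedrock equivalent."""
    params = payload.get("parameters", {})
    return {
        "prompt": payload["inputs"],                       # 'inputs' -> 'prompt'
        "max_gen_len": params.get("max_new_tokens", 512),  # 'max_new_tokens' -> 'max_gen_len'
        "temperature": params.get("temperature", 0.5),
        "top_p": params.get("top_p", 0.9),                 # parameters are flattened
    }
```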
and let's check this actually works (yes, it's a recipe for mayo!) 🥣
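One way to sanity-check the mapping is to send the converted payload to the Bedrock version of the same model (the exact prompt wording is a stand-in, but it does ask for a mayonnaise recipe, hence the 🥣):

```python
bedrock_runtime = boto3.client("bedrock-runtime")

sagemaker_payload = {
    "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
              "How do I make mayonnaise at home?<|eot_id|>"
              "<|start_header_id|>assistant<|end_header_id|>\n\n",
    "parameters": {"max_new_tokens": 256, "temperature": 0.6, "top_p": 0.9},
}

response = bedrock_runtime.invoke_model(
    modelId="meta.llama3-70b-instruct-v1:0",
    body=json.dumps(sagemaker_to_bedrock(sagemaker_payload)),
)
print(json.loads(response["body"].read())["generation"])
```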
Output:
Break's over! Back to the main thread...
Once deployment finishes, we can initialize the `Sagemaker` provider and pass it along to the model.
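A sketch, assuming the `dspy.Sagemaker` provider and the `dspy.AWSMeta` wrapper (argument names may vary slightly across DSPy releases):

```python
# Initialize the SageMaker provider (AWS credentials/region are picked up from the environment)
provider = dspy.Sagemaker(region_name="us-east-1")

# Wrap the deployed endpoint as a DSPy language model and make it the default LM
llama3 = dspy.AWSMeta(aws_provider=provider, model=predictor.endpoint_name)
dspy.settings.configure(lm=llama3)
```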
For this demo, I'm creating an oh-my-god-you're-so-basic QA Chain of Thought (CoT) pipeline.
💡 This example comes from the original paper by Khattab et al. (2023). If you need a refresher, the Prompt Engineering Guide has a great introduction to CoT prompting.
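A minimal sketch of the module, in the style of the paper's intro example:

```python
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")


class CoT(dspy.Module):
    """A very basic Chain-of-Thought QA pipeline."""

    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought(BasicQA)

    def forward(self, question):
        return self.generate_answer(question=question)
```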
and let's run a query to see if it's working
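For instance (the question is just a placeholder):

```python
cot = CoT()
prediction = cot(question="What is the capital of Portugal?")

print(prediction.rationale)
print(prediction.answer)
```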
Output:
As you can see, the output includes both the final `answer` and the `rationale` behind it.

If you're interested, you can also check the actual model invocations by browsing the model's history.
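Something along these lines, assuming the `llama3` handle from before:

```python
llama3.inspect_history(n=1)  # prints the last prompt/completion pair sent to the endpoint
```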
Finally, don't forget to clean up everything when you're done 🧹
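Tearing down the model and the endpoint:

```python
predictor.delete_model()
predictor.delete_endpoint()
```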
Thanks for reading and see you next time! 👋
- (Khattab et al., 2023) DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
- (Opsahl-Ong et al., 2024) Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs
- (Zhang et al., 2022) Automatic Chain of Thought Prompting in Large Language Models
- How I think about LLM prompt engineering by François Chollet
- Deploy Llama 3 on Amazon SageMaker by Phil Schmid
- An Introduction To DSPy by Cobus Greyling
- stanfordnlp/DSPy - programming—not prompting—Foundation Models
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.