Make Programs, not Prompts: DSPy Pipelines with Llama 3 on Amazon SageMaker JumpStart

Learn how to create ML pipelines with DSPy powered by Meta's Llama 3 70B Instruct model running on Amazon SageMaker.

João Galego
Amazon Employee
Published Jul 4, 2024
Last Modified Jul 8, 2024
"If a LLM is like a database of millions of vector programs, then a prompt is like a search query in that database." ― François Chollet, How I think about LLM prompt engineering
In the past few months, I kept hearing so many great things about this new-kid-on-the-block framework called DSPy, and about how it was going to revolutionize the way we interact with language models (LMs), that I decided to give it a go.
According to the (mini-)FAQs, DSPy is just a fanciful backronym for Declarative Self-improving Language Programs, pythonically, which is now my 3rd favorite backronym after COLBERT and SPECTRE. And yes, I keep a list...
In essence, DSPy replaces prompting with programming by treating LM pipelines as text-transformation graphs. And it does so declaratively (hence the D), replacing how we prompt the LM with what the LM is expected to do.
Take Question Answering (QA) as an example. Traditionally, we would use a prompt template like this one (written as a Python f-string).
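For reference, here's a minimal sketch of what such a template can look like for Llama 3 Instruct. The helper name `llama3_qa_prompt` and the system message are my own illustrative choices, not the original template; the special tokens follow Meta's published Llama 3 chat format.

```python
def llama3_qa_prompt(question: str, context: str = "") -> str:
    """Build a Llama 3 Instruct prompt for question answering (illustrative template)."""
    # Hypothetical system message, not taken from the original post
    system = "You are a helpful assistant. Answer the question concisely."
    # Mention the context only when one is provided
    user = f"Context: {context}\n\nQuestion: {question}" if context else f"Question: {question}"
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # End with an open assistant header so the model writes the answer next
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```

Calling `llama3_qa_prompt("What is DSPy?")` returns one big string, and every change to the wording or the format means editing that string by hand.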
🤨 If you're wondering about the overabundance of special tokens like <|start_header_id|> or <|eot_id|>, just go over the Model Cards & Prompt Formats for Meta Llama 3.
In DSPy, we just say question -> answer. Easy-peasy! "What if I want to add a bit of context?" you may ask. No worries, just extend it to context, question -> answer.
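In code, those two signatures might look like this. It's a sketch assuming DSPy 2.4.x (`pip install dspy-ai`); `make_qa_predictors` is a hypothetical helper, and DSPy is imported inside it only so the snippet loads even before DSPy is installed.

```python
def make_qa_predictors():
    """Build DSPy predictors from the two inline signatures (requires `pip install dspy-ai`)."""
    import dspy  # deferred import: DSPy is only needed once the predictors are built

    qa = dspy.Predict("question -> answer")  # one input field, one output field
    rag_qa = dspy.Predict("context, question -> answer")  # same, plus a context input
    return qa, rag_qa
```

Note that no prompt text appears anywhere: DSPy builds the actual prompt from the field names when a predictor runs against a configured LM.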
As you may have guessed, I'm grossly oversimplifying things here. DSPy introduces a lot of new terminology, which makes the learning curve steeper (for instance, the QA declarations in the previous paragraph are called signatures). For the purposes of this post, we don't have to go that deep.
💡 Don't know where to start? Check out the Using DSPy in 8 Steps chapter (if you're an experienced ML practitioner, the steps will sound familiar).

Demo ✨

🐛 “Every great s̶t̶o̶r̶y̶ demo seems to begin with a s̶n̶a̶k̶e̶ bug.” ― Nicolas Cage
The best way to learn about a new library or framework is to fix something wrong with it.
People often ask me how I decide what to work on, and the answer is quite simple: 1/ select an interesting project, 2/ search for keywords like aws, bedrock or sagemaker, and 3a/ if there's a funny-looking issue, try to fix it; 3b/ otherwise, just try one of the demos (PS: I used to work as a software tester, and bugs just seem to find me wherever I go).
Today, it's all about this one. The crux of the problem is that SageMaker and Bedrock providers work differently. Both have runtime clients (SagemakerRuntime and BedrockRuntime), but the way we engage with the models is a bit different:
  • API: Bedrock has InvokeModel/InvokeModelWithResponseStream and the new Converse API, while SageMaker has InvokeEndpoint/InvokeEndpointWithResponseStream and the asynchronous InvokeEndpointAsync. Behind the scenes, Bedrock is powered by SageMaker endpoints, so the similar naming shouldn't come as a surprise.
  • Requests: as we will see later, each accepts a different payload (the inference parameters differ in both name and structure) and produces a different response, although, if we map the parameters correctly and pin the model version, the generated text should be the same.
So, what exactly is the issue? Well, the root cause is that some of DSPy's AWS model classes make no distinction between SageMaker- and Bedrock-provided models. In the case of AWSMeta, SageMaker support is completely absent.
So today, I'd like to show you how to close that gap. We're going to deploy Meta's Llama 3 70B Instruct model, which is also available on Amazon Bedrock, to Amazon SageMaker and then connect it to DSPy.
🎉 Update (08-07-2024): The proposed fix has since been merged into the mainline branch.
Ready to see it in action?
We'll start by installing DSPy and importing some libraries.
Next, we'll use the SageMaker JumpStart SDK to initialize the Llama 3 70B Instruct model and deploy it.
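A minimal sketch of both steps with the SageMaker Python SDK. The model ID `meta-textgeneration-llama-3-70b-instruct` is my assumption for Llama 3 70B Instruct (check it against `list_jumpstart_models()`), and `deploy_llama3` is a hypothetical helper. Keep in mind that a 70B model needs a large GPU instance, which is billed until you delete the endpoint.

```python
from typing import Any, Optional, Tuple

# Assumed JumpStart model ID for Llama 3 70B Instruct
LLAMA3_70B_INSTRUCT = "meta-textgeneration-llama-3-70b-instruct"


def deploy_llama3(
    model_id: str = LLAMA3_70B_INSTRUCT, instance_type: Optional[str] = None
) -> Tuple[Any, Any]:
    """Initialize a JumpStart model and deploy it to a real-time SageMaker endpoint."""
    from sagemaker.jumpstart.model import JumpStartModel  # requires `pip install sagemaker`

    # instance_type=None lets JumpStart pick the model's default instance type
    model = JumpStartModel(model_id=model_id, instance_type=instance_type)
    # Meta's models are gated, so the EULA has to be accepted explicitly
    predictor = model.deploy(accept_eula=True)
    return model, predictor
```

Something like `model, predictor = deploy_llama3()` then gives you both the model object and a predictor bound to the new endpoint.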
By the way, the SDK includes some useful functions for listing models.
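For instance, `list_jumpstart_models()` from `sagemaker.jumpstart.notebook_utils` returns the IDs of the available JumpStart models. The sketch below narrows that list down to Llama 3; the `meta-textgeneration-llama-3` prefix is an assumption based on JumpStart's naming, and both helper names are hypothetical.

```python
from typing import Iterable, List

# Assumed prefix shared by the Llama 3 text-generation models in JumpStart
LLAMA3_PREFIX = "meta-textgeneration-llama-3"


def llama3_model_ids(model_ids: Iterable[str]) -> List[str]:
    """Keep only the Llama 3 IDs from a list of JumpStart model IDs, sorted."""
    return sorted(m for m in model_ids if m.startswith(LLAMA3_PREFIX))


def list_llama3_models() -> List[str]:
    """Query JumpStart for every model ID and keep the Llama 3 ones (needs `pip install sagemaker`)."""
    from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

    return llama3_model_ids(list_jumpstart_models())
```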
Each model also comes with its own set of examples that we can use to test the endpoint.
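A sketch of that loop, assuming `model` and `predictor` come from the deployment step, that `retrieve_all_examples()` returns objects exposing the request body as `.body`, and that the endpoint answers with a `generated_text` field. The helper names are hypothetical, and `extract_text` accepts both a dict and a one-element list to stay defensive about the response shape.

```python
from typing import Any


def extract_text(response: Any) -> str:
    """Pull the completion out of a text-generation response (dict or one-element list)."""
    if isinstance(response, list):
        response = response[0]
    return response["generated_text"]


def run_examples(model, predictor) -> None:
    """Send each example payload bundled with the JumpStart model to the endpoint."""
    for payload in model.retrieve_all_examples():
        # Each example wraps a ready-made request body (prompt plus inference parameters)
        response = predictor.predict(payload.body)
        print("Request:", payload.body)
        print("Completion:", extract_text(response))
```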
Equivalently, you can use Boto3 directly to make the call.
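With Boto3, the request goes through the `sagemaker-runtime` client's `invoke_endpoint` call. Here's a sketch, where `invoke_endpoint` is a hypothetical wrapper and the client is passed in so you can reuse it (or swap in a stub for testing).

```python
import json
from typing import Any, Dict


def invoke_endpoint(client, endpoint_name: str, payload: Dict[str, Any]) -> Any:
    """Call a SageMaker real-time endpoint with a JSON payload and decode the JSON response.

    `client` is expected to be `boto3.client("sagemaker-runtime")`.
    """
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The body is a streaming object; read() returns the raw bytes
    return json.loads(response["Body"].read())
```

For example: `invoke_endpoint(boto3.client("sagemaker-runtime"), predictor.endpoint_name, payload)`.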
✋ Time out! Remember when I said that the payloads for SageMaker endpoints and Bedrock models don't have to be the same? Let's look at that first sample in a little more detail...
Now, let's look at the equivalent Bedrock request.
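Side by side, the two request bodies for the same prompt look roughly like this. The SageMaker shape follows the Hugging Face TGI-style convention (`inputs` plus a nested `parameters` object), while Bedrock's Meta Llama models expect a flat body with `prompt`, `max_gen_len`, `temperature` and `top_p`. The prompt and the parameter values are illustrative, and I'm assuming the mayonnaise question matches the first JumpStart example.

```python
# Illustrative Llama 3 prompt (assumed to resemble the first JumpStart example)
PROMPT = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "what is the recipe of mayonnaise?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

# SageMaker endpoint: prompt under "inputs", generation settings nested under "parameters"
sagemaker_payload = {
    "inputs": PROMPT,
    "parameters": {"max_new_tokens": 512, "temperature": 0.6, "top_p": 0.9},
}

# Amazon Bedrock (Meta Llama): flat body, prompt under "prompt", token budget is "max_gen_len"
bedrock_payload = {
    "prompt": PROMPT,
    "max_gen_len": 512,
    "temperature": 0.6,
    "top_p": 0.9,
}
```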
Can you spot the differences? Here's the solution...
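One way to express the mapping in code, as a sketch (`sagemaker_to_bedrock` and `PARAM_MAP` are hypothetical names): the prompt moves from `inputs` to `prompt`, the nested `parameters` are flattened, and `max_new_tokens` becomes `max_gen_len`.

```python
from typing import Any, Dict

# SageMaker (TGI-style) parameter name -> Bedrock (Meta Llama) field name
PARAM_MAP = {"max_new_tokens": "max_gen_len", "temperature": "temperature", "top_p": "top_p"}


def sagemaker_to_bedrock(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a SageMaker Llama request body into a Bedrock Llama request body.

    Parameters without a counterpart in PARAM_MAP (e.g. "details") are dropped in this sketch.
    """
    body = {"prompt": payload["inputs"]}
    for name, value in payload.get("parameters", {}).items():
        if name in PARAM_MAP:
            body[PARAM_MAP[name]] = value
    return body
```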
Let's check that this actually works (yes, it's a recipe for mayo!) 🥣
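A sketch of that check with Boto3, assuming the `bedrock-runtime` client, access to the `meta.llama3-70b-instruct-v1:0` model in your region, and a Bedrock-style request body. Note that the completion comes back under `generation` rather than `generated_text`; `invoke_bedrock` is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Bedrock model ID for Llama 3 70B Instruct
LLAMA3_70B_BEDROCK_ID = "meta.llama3-70b-instruct-v1:0"


def invoke_bedrock(client, body: Dict[str, Any], model_id: str = LLAMA3_70B_BEDROCK_ID) -> str:
    """Send a Llama request body to Bedrock's InvokeModel API and return the generated text.

    `client` is expected to be `boto3.client("bedrock-runtime")`.
    """
    response = client.invoke_model(
        modelId=model_id,
        body=json.dumps(body),
        contentType="application/json",
        accept="application/json",
    )
    # Bedrock returns a streaming body; Meta Llama puts the completion under "generation"
    return json.loads(response["body"].read())["generation"]
```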
Break's over! Back to the main thread...
Once deployment finishes, we can initialize the SageMaker provider and pass it along to the model.
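A minimal sketch, assuming a DSPy 2.4.x release that includes the fix mentioned earlier, where the AWS integration exposes `dspy.Sagemaker` as the provider and `dspy.AWSMeta` as the model class (newer DSPy releases steer you towards `dspy.LM` instead). The default region and the helper name are assumptions; `endpoint_name` would typically be `predictor.endpoint_name`.

```python
def configure_llama3(endpoint_name: str, region_name: str = "us-east-1"):
    """Wrap a SageMaker-hosted Llama 3 endpoint as a DSPy LM and make it the default LM."""
    import dspy  # assumes dspy-ai 2.4.x, which ships the AWS provider classes

    # The provider owns the AWS runtime client: SageMaker here, instead of Bedrock
    sagemaker = dspy.Sagemaker(region_name=region_name)
    # With a SageMaker provider, the model argument is the endpoint name, not a Bedrock model ID
    llama = dspy.AWSMeta(sagemaker, endpoint_name)
    # DSPy modules use this LM unless told otherwise
    dspy.settings.configure(lm=llama)
    return llama
```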
For this demo, I'm creating an oh-my-god-you're-so-basic QA Chain of Thought (CoT) pipeline.
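A sketch of that pipeline, based on the CoT example from the DSPy paper and README. The `CoT` class sits inside a hypothetical `build_cot` factory only so the snippet doesn't need DSPy at import time; in a notebook you'd normally define it at the top level. It assumes an LM has already been configured.

```python
def build_cot():
    """Build a minimal question-answering Chain-of-Thought program."""
    import dspy  # assumes dspy-ai is installed and an LM has been configured

    class CoT(dspy.Module):
        def __init__(self):
            super().__init__()
            # ChainOfThought asks the LM to reason step by step before producing `answer`
            self.prog = dspy.ChainOfThought("question -> answer")

        def forward(self, question: str):
            return self.prog(question=question)

    return CoT()
```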
💡 This example comes from the original paper by Khattab et al. (2023). If you need a refresher, the Prompt Engineering Guide has a great introduction to CoT prompting.
Let's think step by step... (Source: Zhang et al., 2022)
Now, let's run a query to see if it's working.
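A small sketch of the call, where `ask` is a hypothetical helper and `program` is the CoT module. DSPy 2.4.x exposes the intermediate reasoning as `rationale` (later releases use `reasoning`), so the helper checks both.

```python
from typing import Optional, Tuple


def ask(program, question: str) -> Tuple[str, Optional[str]]:
    """Run a DSPy QA program and return (answer, rationale)."""
    prediction = program(question=question)
    # Field name depends on the DSPy version: `rationale` in 2.4.x, `reasoning` later
    rationale = getattr(prediction, "rationale", None) or getattr(prediction, "reasoning", None)
    return prediction.answer, rationale
```

Usage would be along the lines of `answer, rationale = ask(cot, question)`, where `cot` is the CoT program.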
As you can see, the output includes both the final answer and the rationale behind it.
If you're interested, you can also check the actual model invocations by browsing the model's history.
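In DSPy 2.4.x, LM clients keep their calls in `history`, and `inspect_history(n=...)` pretty-prints the last `n` prompts together with their completions. A sketch, with `show_history` as a hypothetical wrapper:

```python
def show_history(lm, n: int = 1) -> int:
    """Print the last `n` prompts/completions sent through `lm` and return the total call count."""
    lm.inspect_history(n=n)  # shows the raw prompt DSPy built, followed by the completion
    return len(lm.history)
```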
Finally, don't forget to clean up everything when you're done 🧹
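With the SageMaker SDK, the predictor returned by `deploy()` can tear everything down: `delete_model()` removes the model and `delete_endpoint()` removes the endpoint (and, by default, its endpoint configuration). A sketch, with `cleanup` as a hypothetical helper:

```python
def cleanup(predictor) -> None:
    """Delete the SageMaker model and endpoint created by JumpStart so the instance stops billing."""
    predictor.delete_model()
    predictor.delete_endpoint()  # also deletes the endpoint configuration by default
```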
Thanks for reading and see you next time! 👋


Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.