Make Programs, not Prompts: DSPy Pipelines with Llama 3 on Amazon SageMaker JumpStart
Learn how to create ML pipelines with DSPy powered by Meta's Llama 3 70B Instruct model running on Amazon SageMaker.
João Galego
Amazon Employee
Published Jul 4, 2024
Last Modified Jul 8, 2024
"If a LLM is like a database of millions of vector programs, then a prompt is like a search query in that database." ― François Chollet, How I think about LLM prompt engineering
In the past few months, I kept hearing so many great things about this new-kid-on-the-block framework called DSPy, and how it was going to revolutionize the way we interact with language models (LMs), that I decided to give it a go.
According to the (mini-)FAQs, DSPy is just a fanciful backronym for Declarative Self-improving Language Programs, pythonically, which is now my 3rd favorite backronym after ColBERT and SPECTRE. And yes, I keep a list...

In essence, DSPy replaces prompting with programming by treating LM pipelines as text-transformation graphs. And it does so declaratively (hence the D), replacing how we prompt the LM with what the LM is expected to do.
Take Question Answering (QA) as an example. Traditionally, we would use a prompt template like this one (written as a Python f-string):
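Something along these lines (a minimal sketch following the Llama 3 Instruct chat format; the system prompt and the question are just placeholders):

```python
question = "What is the capital of Portugal?"  # placeholder question

prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant that answers questions truthfully.<|eot_id|><|start_header_id|>user<|end_header_id|>

{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
```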
🤨 If you're wondering about the overabundance of tags like `<|start_header_id|>` or `<|eot_id|>`, just go over the Model Cards & Prompt Formats for Meta Llama 3.
In DSPy, we just say `question -> answer`. Easy-peasy! "What if I want to add a bit of context?" you may ask. No worries, just extend it to `context, question -> answer`.

As you may have guessed, I'm grossly oversimplifying things here. DSPy introduces a lot of new terminology, which makes things more difficult (for instance, the QA declarations in the previous paragraph are called signatures, as sketched below). For the purposes of this post, we don't have to go that deep.
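Just to make the idea concrete, here is roughly what those declarations look like in code (a minimal sketch; both the inline string form and the class-based form are ways of defining a DSPy signature):

```python
import dspy

# Inline signature: "question -> answer"
qa = dspy.Predict("question -> answer")

# Class-based equivalent of "context, question -> answer"
class ContextQA(dspy.Signature):
    """Answer the question using the provided context."""

    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField()
```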
💡 Don't know where to start? Check out the Using DSPy in 8 Steps chapter (if you're an experienced ML practitioner, they'll sound familiar).
🐛 “Every great s̶t̶o̶r̶y̶ demo seems to begin with a s̶n̶a̶k̶e̶ bug.” ― Nicolas Cage
The best way to learn about a new library or framework is to fix something wrong with it.
People often ask me how I decide to work on something, and the answer is quite simple: 1/ select an interesting project, 2/ search for keywords like `aws`, `bedrock` or `sagemaker`, and 3a/ if there's a funny-looking issue, try to fix it; 3b/ otherwise, just try one of the demos (PS: I used to work as a software tester and bugs just seem to find me wherever I go). Today, it's all about this one.

The crux of the problem is that the SageMaker and Bedrock providers work differently. Both have runtime clients (`SageMakerRuntime` and `BedrockRuntime`), but the way we engage with the models is a bit different:

- API: Bedrock has `InvokeModel`/`InvokeModelWithResponseStream` and the new `Converse` API, while SageMaker has `InvokeEndpoint`/`InvokeEndpointWithResponseStream` and the asynchronous `InvokeEndpointAsync`. Behind the scenes, Bedrock is powered by SageMaker endpoints, so the naming shouldn't come as a surprise.
- Requests: as we will see later, each accepts a different payload, both in terms of inference parameters (their names and how they're structured), and produces a different response (although, if we correctly map the parameters and fix the model version, the `generated_text` should be the same).
So, what exactly is the issue? Well, the root cause is that some of the AWS model classes make no distinction between SageMaker- and Bedrock-provided models. In the case of `AWSMeta`, SageMaker support is completely absent.

So today, I'd like to show you just that: we're going to deploy Meta's Llama 3 70B Instruct model (which is also available on Amazon Bedrock) via Amazon SageMaker and then connect it to DSPy.
🎉 Update (08-07-2024): The proposed fix has since been merged into the `main` branch.
Ready to see it in action?
We'll start by installing DSPy
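Something along these lines, run from a notebook (package names as of this writing; `sagemaker` gives us the JumpStart SDK):

```python
%pip install dspy-ai sagemaker
```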
and importing some libraries
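These are the imports used throughout the rest of the walkthrough:

```python
import json

import boto3
import dspy
from sagemaker.jumpstart.model import JumpStartModel
```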
Next, we'll use the SageMaker JumpStart SDK to initialize the Llama 3 70B Instruct model
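A sketch, assuming the JumpStart model ID for Llama 3 70B Instruct and leaving the instance type to the JumpStart default:

```python
model = JumpStartModel(model_id="meta-textgeneration-llama-3-70b-instruct")
```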
and deploy it
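Llama models on JumpStart require explicitly accepting the EULA at deployment time:

```python
predictor = model.deploy(accept_eula=True)
```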
By the way, the SDK includes some useful functions for listing models
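For example (the filter expression below assumes the `framework` key; other keys like `task` or `search_keywords` work too):

```python
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# List every Meta model available in SageMaker JumpStart
list_jumpstart_models(filter="framework == meta")
```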
and each model comes with its own set of examples
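Roughly like this, assuming the model ships with default example payloads:

```python
example_payloads = model.retrieve_all_examples()

for example in example_payloads:
    print(example.body)
```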
Output:
that we can use to test the endpoint
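Something like this:

```python
for example in example_payloads:
    response = predictor.predict(example.body)
    print(response)  # typically contains a 'generated_text' field
```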
Equivalently, you can use Boto3 directly to make the call
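A minimal sketch using the `sagemaker-runtime` client (the payload mirrors the example payloads above):

```python
sagemaker_runtime = boto3.client("sagemaker-runtime")

payload = {
    "inputs": prompt,  # the Llama 3 formatted prompt from earlier
    "parameters": {"max_new_tokens": 256, "temperature": 0.6, "top_p": 0.9},
}

response = sagemaker_runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```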
✋ Time out! Remember when I said that the payloads for SageMaker endpoints and Bedrock models don't have to be the same? Let's look at that first sample in a little more detail...
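A SageMaker payload for this endpoint looks roughly like this (prompt abridged):

```python
{
    "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n...<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "parameters": {
        "max_new_tokens": 256,
        "top_p": 0.9,
        "temperature": 0.6
    }
}
```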
Now, the equivalent for a Bedrock request
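The equivalent request body for Meta Llama models on Bedrock looks roughly like this:

```python
{
    "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n...<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "max_gen_len": 256,
    "temperature": 0.6,
    "top_p": 0.9
}
```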
Can you spot the differences? Here's the solution...
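In short: `inputs` becomes `prompt`, `max_new_tokens` becomes `max_gen_len`, and the nested `parameters` object is flattened to the top level. A small helper (hypothetical, just for illustration) captures the mapping:

```python
def sagemaker_to_bedrock(payload: dict) -> dict:
    """Convert a SageMaker-style Llama 3 payload into its Bedrock equivalent."""
    params = payload.get("parameters", {})
    return {
        "prompt": payload["inputs"],                       # 'inputs' -> 'prompt'
        "max_gen_len": params.get("max_new_tokens", 512),  # 'max_new_tokens' -> 'max_gen_len'
        "temperature": params.get("temperature", 0.5),
        "top_p": params.get("top_p", 0.9),                 # parameters are flattened
    }
```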
and let's check this actually works (yes, it's a recipe for mayo!) 🥣
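One way to sanity-check the mapping is to send the converted payload to the Bedrock version of the same model (the exact prompt wording is a stand-in, but it does ask for a mayonnaise recipe, hence the 🥣):

```python
bedrock_runtime = boto3.client("bedrock-runtime")

sagemaker_payload = {
    "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
              "How do I make mayonnaise at home?<|eot_id|>"
              "<|start_header_id|>assistant<|end_header_id|>\n\n",
    "parameters": {"max_new_tokens": 256, "temperature": 0.6, "top_p": 0.9},
}

response = bedrock_runtime.invoke_model(
    modelId="meta.llama3-70b-instruct-v1:0",
    body=json.dumps(sagemaker_to_bedrock(sagemaker_payload)),
)
print(json.loads(response["body"].read())["generation"])
```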
Output:
Break's over! Back to the main thread...
Once deployment finishes, we can initialize the `Sagemaker` provider and pass it along to the model.
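A sketch, assuming the `dspy.Sagemaker` provider and the `dspy.AWSMeta` wrapper (argument names may vary slightly across DSPy releases):

```python
# Initialize the SageMaker provider (AWS credentials/region are picked up from the environment)
provider = dspy.Sagemaker(region_name="us-east-1")

# Wrap the deployed endpoint as a DSPy language model and make it the default LM
llama3 = dspy.AWSMeta(aws_provider=provider, model=predictor.endpoint_name)
dspy.settings.configure(lm=llama3)
```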
For this demo, I'm creating an oh-my-god-you're-so-basic QA Chain of Thought (CoT) pipeline.
💡 This example comes from the original paper by Khattab et al. (2023). If you need a refresher, the Prompt Engineering Guide has a great introduction to CoT prompting.
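A minimal sketch of the module, in the style of the paper's intro example:

```python
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")


class CoT(dspy.Module):
    """A very basic Chain-of-Thought QA pipeline."""

    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought(BasicQA)

    def forward(self, question):
        return self.generate_answer(question=question)
```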
and let's run a query to see if it's working
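For instance (the question is just a placeholder):

```python
cot = CoT()
prediction = cot(question="What is the capital of Portugal?")

print(prediction.rationale)
print(prediction.answer)
```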
Output:
As you can see, the output includes both the final `answer` and the `rationale` behind it.

If you're interested, you can also check the actual model invocations by browsing the model's history.
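Something along these lines, assuming the `llama3` handle from before:

```python
llama3.inspect_history(n=1)  # prints the last prompt/completion pair sent to the endpoint
```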
Finally, don't forget to clean up everything when you're done 🧹
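Tearing down the model and the endpoint:

```python
predictor.delete_model()
predictor.delete_endpoint()
```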
Thanks for reading and see you next time! 👋
- (Khattab et al., 2023) DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
- (Opsahl-Ong et al., 2024) Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs
- (Zhang et al., 2022) Automatic Chain of Thought Prompting in Large Language Models
- How I think about LLM prompt engineering by François Chollet
- Deploy Llama 3 on Amazon SageMaker by Phil Schmid
- An Introduction To DSPy by Cobus Greyling
- stanfordnlp/DSPy - programming—not prompting—Foundation Models
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.