Make Programs, not Prompts: DSPy Pipelines with Llama 3 on Amazon SageMaker JumpStart

Learn how to create ML pipelines with DSPy powered by Meta's Llama 3 70B Instruct model running on Amazon SageMaker.

João Galego
Amazon Employee
Published Jul 4, 2024
Last Modified Jul 8, 2024
"If a LLM is like a database of millions of vector programs, then a prompt is like a search query in that database." ― François Chollet, How I think about LLM prompt engineering
Over the past few months, I kept hearing so many great things about this new-kid-on-the-block framework called DSPy, and how it was going to revolutionize the way we interact with language models (LMs), that I decided to give it a go.
According to the (mini-)FAQs, DSPy is just a fanciful backronym for Declarative Self-improving Language Programs, pythonically, which is now my 3rd favorite backronym after COLBERT and SPECTRE. And yes, I keep a list...
In essence, DSPy replaces prompting with programming by treating LM pipelines as text-transformation graphs. And it does so declaratively (hence the D), replacing how we prompt the LM with what the LM is expected to do.
Take Question Answering (QA) as an example. Traditionally, we would use a prompt template like this one (written as a Python f-string):
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Given the fields `question`, produce the fields `answer`.

---

Follow the following format.

Question: ${question}
Answer: ${answer}

---

Question: {question}
Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>
🤨 If you're wondering about the overabundance of tags like <|start_header_id|> or <|eot_id|>, just go over the Model Cards & Prompt Formats for Meta Llama 3.
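To make the f-string framing concrete, here's a minimal sketch of how that template might be assembled in Python; the helper name build_prompt is just for illustration:
def build_prompt(question: str) -> str:
    # Llama 3 chat format: a user turn followed by an empty assistant header
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n"
        "Given the fields `question`, produce the fields `answer`.\n\n"
        "---\n\n"
        "Follow the following format.\n\n"
        "Question: ${question}\nAnswer: ${answer}\n\n"
        "---\n\n"
        f"Question: {question}\n"
        "Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>"
    )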
In DSPy, we just say question -> answer. Easy-peasy! "What if I want to add a bit of context?" you may ask. No worries, just extend it to context, question -> answer.
As you may have guessed, I'm grossly oversimplifying things here. DSPy introduces a lot of new terminology, which makes things more difficult (for instance, the QA declarations in the previous paragraph are called signatures). For the purposes of this post, we don't have to go that deep.
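To make that concrete, here's a minimal sketch of both signatures wrapped in DSPy's basic Predict module (the variable names are mine):
import dspy

# Plain QA: `question -> answer`
qa_basic = dspy.Predict("question -> answer")

# QA over retrieved context: `context, question -> answer`
qa_with_context = dspy.Predict("context, question -> answer")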
💡 Don't know where to start? Check out the Using DSPy in 8 Steps chapter (if you're an experienced ML practitioner, they'll sound familiar).

Demo ✨

🐛 “Every great s̶t̶o̶r̶y̶ demo seems to begin with a s̶n̶a̶k̶e̶ bug.” ― Nicolas Cage
The best way to learn about a new library or framework is to fix something wrong with it.
People often ask me how I decide what to work on and the answer is quite simple: 1/ select an interesting project, 2/ search for keywords like aws, bedrock or sagemaker, and 3a/ if there's a funny-looking issue, try to fix it; 3b/ otherwise, just try one of the demos (PS: I used to work as a software tester and bugs just seem to find me wherever I go).
Today, it's all about this one. The crux of the problem is that SageMaker and Bedrock providers work differently. Both have runtime clients (SagemakerRuntime and BedrockRuntime), but the way we engage with the models is a bit different:
  • API: Bedrock has InvokeModel/InvokeModelWithResponseStream and the new Converse API, while SageMaker has InvokeEndpoint/InvokeEndpointWithResponseStream and the asynchronous InvokeEndpointAsync. Behind the scenes, Bedrock is powered by SageMaker endpoints, so the naming shouldn't come as a surprise.
  • Requests: as we will see later, each accepts a different payload (the inference parameters differ both in name and in structure) and produces a different response (although, if we map the parameters correctly and fix the model version, the generated_text should be the same).
So, what exactly is the issue? Well, the root cause is that some of DSPy's AWS model classes make no distinction between SageMaker- and Bedrock-provided models. In the case of AWSMeta, SageMaker support is completely absent.
So today, I'd like to show you just that: we're going to deploy Meta's Llama 3 70B Instruct model (which is also available on Amazon Bedrock) via Amazon SageMaker and then connect it with DSPy.
🎉 Update (08-07-2024): The proposed fix has since been merged into the mainline branch.
Ready to see it in action?
We'll start by installing DSPy
# See https://github.com/stanfordnlp/dspy/pull/1241
pip install git+https://github.com/stanfordnlp/dspy
and importing some libraries
import json

import boto3
import dspy

# SageMaker JumpStart SDK
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models
from sagemaker.jumpstart.filters import And

# Get AWS region
region_name = boto3.session.Session().region_name
Next, we'll use the SageMaker JumpStart SDK to initialize the Llama 3 70B Instruct model
model = JumpStartModel(
    model_id="meta-textgeneration-llama-3-70b-instruct",
    model_version="2.0.2",
    instance_type="ml.g5.48xlarge",
    region=region_name
)
and deploy it
predictor = model.deploy(
    accept_eula=input("Accept EULA? [y/n]") == "y"
)
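Once the endpoint is up, a quick sanity check is to print its name; we'll reuse it later when wiring the model into DSPy:
# The endpoint name is auto-generated by SageMaker
print(predictor.endpoint_name)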
By the way, the SDK includes some useful functions for listing models
# List all text generation models provided by Meta
list_jumpstart_models(
    filter=And(
        "framework == meta",
        "task == textgeneration"
    )
)
and each model comes with its own set of examples
examples = [payload.body for payload in model.retrieve_all_examples()]
Output:
[
    {
        "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nwhat is the recipe of mayonnaise?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
        "parameters": {
            "details": true,
            "max_new_tokens": 256,
            "stop": "<|eot_id|>",
            "temperature": 0.6,
            "top_p": 0.9
        }
    },
    {
        "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nI am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nParis, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:\n\n1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.\n2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.\n3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.\n\nThese are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat is so great about #1?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
        "parameters": {
            "max_new_tokens": 256,
            "stop": "<|eot_id|>",
            "temperature": 0.6,
            "top_p": 0.9
        }
    },
    {
        "inputs": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nAlways answer with Haiku<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
        "parameters": {
            "max_new_tokens": 256,
            "stop": "<|eot_id|>",
            "temperature": 0.6,
            "top_p": 0.9
        }
    },
    {
        "inputs": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nAlways answer with emojis<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow to go from Beijing to NY?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
        "parameters": {
            "max_new_tokens": 256,
            "stop": "<|eot_id|>",
            "temperature": 0.6,
            "top_p": 0.9
        }
    }
]
that we can use to test the endpoint
predictor.predict(examples[0])
Equivalently, you can use Boto3 directly to make the call
# Initialize client
sm_runtime = boto3.client("sagemaker-runtime")

# Call model
response = sm_runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    Body=json.dumps(examples[0])
)

# and process response
print(json.loads(response["Body"].read())['generated_text'])
✋ Time out! Remember when I said that the payloads for SageMaker endpoints and Bedrock models don't have to be the same? Let's look at that first sample in a little more detail...
{
    "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nwhat is the recipe of mayonnaise?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "parameters": {
        "details": true,
        "max_new_tokens": 256,
        "stop": "<|eot_id|>",
        "temperature": 0.6,
        "top_p": 0.9
    }
}
Now, here's the equivalent Bedrock request
{
    "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nwhat is the recipe of mayonnaise?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "max_gen_len": 256,
    "temperature": 0.6,
    "top_p": 0.9
}
Can you spot the differences? Here's the solution...
from copy import deepcopy

def sm2bedrock_meta(payload):
    """
    Converts a SageMaker- to a Bedrock-compatible payload for Meta models. TODO: error handling!!!
    https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/
    https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html#model-parameters-meta-request-response
    """

    # Copy payload
    payload_b = deepcopy(payload)

    # Inference parameters are added at prompt level
    payload_b = payload_b | payload_b.pop('parameters')

    # Renamed in Amazon Bedrock 🏷️
    payload_b['prompt'] = payload_b.pop('inputs')
    payload_b['max_gen_len'] = payload_b.pop('max_new_tokens')

    # Not supported by Amazon Bedrock ❌
    payload_b.pop('details', None)
    payload_b.pop('stop', None)

    return payload_b

examples_b = list(map(sm2bedrock_meta, examples))
and let's check that this actually works (yes, it's a recipe for mayo!) 🥣
# Initialize client
bedrock_runtime = boto3.client("bedrock-runtime")

# Call model
response = bedrock_runtime.invoke_model(
    modelId="meta.llama3-70b-instruct-v1:0",
    body=json.dumps(examples_b[0])
)

# and process the response
print(json.load(response['body'])['generation'])
Output:
The classic condiment! Mayonnaise is a thick, creamy, and tangy sauce made from a combination of oil, egg yolks, acid (such as vinegar or lemon juice), and seasonings. Here's a simple recipe to make mayonnaise at home:

**Ingredients:**

* 2 egg yolks
* 1 tablespoon lemon juice or vinegar (white wine vinegar or apple cider vinegar work well)
* 1/2 teaspoon Dijon mustard (optional, but recommended for flavor)
* 1/2 cup (120 ml) neutral-tasting oil, such as canola, grapeseed, or sunflower oil
* Salt, to taste

**Instructions:**

1. **Start with room temperature ingredients**: This is crucial for emulsification to occur.
2. **In a medium-sized bowl**, whisk together the egg yolks, lemon juice or vinegar, and Dijon mustard (if using) until well combined.
3. **Slowly add the oil**: While continuously whisking the egg yolk mixture, slowly pour in the oil in a thin, steady stream. Start with a very slow drizzle and gradually increase the flow as the mixture thickens.
4. **Whisk constantly**: Keep whisking until the
Break's over! Back to the main thread...
Once deployment finishes, we can initialize the SageMaker provider
sagemaker = dspy.Sagemaker(
    region_name=region_name
)
and pass it along to the model
lm_sagemaker = dspy.AWSMeta(
    aws_provider=sagemaker,
    model=predictor.endpoint_name
)
dspy.configure(lm=lm_sagemaker)
For this demo, I'm creating an oh-my-god-you're-so-basic QA Chain of Thought (CoT) pipeline
💡 This example comes from the original paper by Khattab et al. (2023). If you need a refresher, the Prompt Engineering Guide has a great introduction to CoT prompting.
[Figure] Let's think step by step... (Source: Zhang et al., 2022)
qa = dspy.ChainOfThought('question -> answer')
and let's run a query to see if it's working
qa(question="Who was Albert Einstein?")
Output:
Prediction(
    rationale="Here is the completed response:\n\nQuestion: Who was Albert Einstein?\nReasoning: Let's think step by step in order to identify the famous physicist. We know that Albert Einstein was a renowned German-born physicist who is widely regarded as one of the most influential scientists of the 20th century.",
    answer='Albert Einstein was a renowned German-born physicist who is widely regarded as one of the most influential scientists of the 20th century.'
)
As you can see, the output includes both the final answer and the rationale behind it.
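Since the result is a regular DSPy Prediction object, both fields can be pulled out as attributes:
pred = qa(question="Who was Albert Einstein?")
print(pred.rationale)  # the step-by-step reasoning
print(pred.answer)     # the final answer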
If you're interested, you can also check the actual model invocations by browsing the model's history
lm_sagemaker.history
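The exact shape of each history entry depends on the DSPy version, so the simplest sketch is to print the most recent record as-is:
# Peek at the latest raw invocation record
if lm_sagemaker.history:
    print(lm_sagemaker.history[-1])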
Finally, don't forget to clean up everything when you're done 🧹
predictor.delete_endpoint()
Thanks for reading and see you next time! 👋

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
