
Make Programs, not Prompts: DSPy Pipelines with Llama 3 on Amazon SageMaker JumpStart
Learn how to create ML pipelines with DSPy powered by Meta's Llama 3 70B Instruct model running on Amazon SageMaker.
"If a LLM is like a database of millions of vector programs, then a prompt is like a search query in that database." ― François Chollet, How I think about LLM prompt engineering
DSPy is just a fanciful backronym for Declarative Self-improving Language Programs, pythonically, which is now my 3rd favorite backronym after ColBERT and SPECTRE. And yes, I keep a list...
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Given the fields `question`, produce the fields `answer`.
---
Follow the following format.
Question: ${question}
Answer: ${answer}
---
Question: {question}
Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>
🤨 If you're wondering about the overabundance of tags like <|start_header_id|> or <|eot_id|>, just go over the Model Cards & Prompt Formats for Meta Llama 3.
question -> answer. Easy-peasy! "What if I want to add a bit of context?" you may ask. No worries, just extend it to context, question -> answer, as in the sketch below.
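Here's a minimal sketch of what that looks like in code (the module choice and the toy context are mine, and it assumes an LM has already been configured via dspy.configure):

import dspy

# Minimal sketch: a signature with an extra `context` input field
# (assumes dspy.configure(lm=...) has already been called)
rag_qa = dspy.Predict('context, question -> answer')

prediction = rag_qa(
    context="DSPy separates the flow of your program from the prompts sent to the LM.",
    question="What does DSPy separate?"
)
print(prediction.answer)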
💡 Don't know where to start? Check out the Using DSPy in 8 Steps chapter (if you're an experienced ML practitioner, they'll sound familiar).
🐛 “Every great s̶t̶o̶r̶y̶ demo seems to begin with a s̶n̶a̶k̶e̶ bug.” ― Nicolas Cage
aws, bedrock or sagemaker, and 3a/ if there's a funny-looking issue, try to fix it; 3b/ otherwise, just try one of the demos (PS: I used to work as a software tester and bugs just seem to find me wherever I go).

Both services expose runtime clients (SagemakerRuntime and BedrockRuntime), but the way we engage with the models is a bit different:

- API: Bedrock has InvokeModel/InvokeModelWithResponseStream and the new Converse API, while SageMaker has InvokeEndpoint/InvokeEndpointWithResponseStream and the asynchronous InvokeEndpointAsync. Behind the scenes, Bedrock is powered by SageMaker endpoints, so the naming shouldn't come as a surprise.
- Requests: as we will see later, each accepts a different payload, both in terms of inference parameters, their names and how they're structured, and produces a different response (although, if we correctly map the parameters and fix the model version, the generated_text should be the same).
As for AWSMeta, SageMaker support is completely absent.

🎉 Update (08-07-2024): The proposed fix has since been merged into the main branch.
# See https://github.com/stanfordnlp/dspy/pull/1241
pip install git+https://github.com/stanfordnlp/dspy
import json
import boto3
import dspy
# SageMaker JumpStart SDK
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models
from sagemaker.jumpstart.filters import And
# Get AWS region
region_name = boto3.session.Session().region_name
model = JumpStartModel(
    model_id="meta-textgeneration-llama-3-70b-instruct",
    model_version="2.0.2",
    instance_type="ml.g5.48xlarge",
    region=region_name
)
predictor = model.deploy(
    accept_eula=input("Accept EULA? [y/n]") == "y"
)
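By the way, if the endpoint is already up from a previous session, there's no need to redeploy; as a rough sketch (the endpoint name below is a placeholder), you can attach a generic predictor to it:

from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Rough sketch: reuse an existing endpoint instead of redeploying
# (replace the placeholder with the actual endpoint name)
predictor = Predictor(
    endpoint_name="<existing-endpoint-name>",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer()
)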
# List all text generation models provided by Meta
list_jumpstart_models(
    filter=And(
        "framework == meta",
        "task == textgeneration"
    )
)
examples = [payload.body for payload in model.retrieve_all_examples()]
[
  {
    "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nwhat is the recipe of mayonnaise?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "parameters": {
      "details": true,
      "max_new_tokens": 256,
      "stop": "<|eot_id|>",
      "temperature": 0.6,
      "top_p": 0.9
    }
  },
  {
    "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nI am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nParis, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:\n\n1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.\n2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.\n3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.\n\nThese are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat is so great about #1?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "parameters": {
      "max_new_tokens": 256,
      "stop": "<|eot_id|>",
      "temperature": 0.6,
      "top_p": 0.9
    }
  },
  {
    "inputs": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nAlways answer with Haiku<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "parameters": {
      "max_new_tokens": 256,
      "stop": "<|eot_id|>",
      "temperature": 0.6,
      "top_p": 0.9
    }
  },
  {
    "inputs": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nAlways answer with emojis<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow to go from Beijing to NY?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "parameters": {
      "max_new_tokens": 256,
      "stop": "<|eot_id|>",
      "temperature": 0.6,
      "top_p": 0.9
    }
  }
]
predictor.predict(examples[0])
# Initialize client
sm_runtime = boto3.client("sagemaker-runtime")
# Call model
response = sm_runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="application/json",  # the endpoint expects a JSON payload
    Body=json.dumps(examples[0])
)
# and process the response
print(json.loads(response["Body"].read())["generated_text"])
{
  "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nwhat is the recipe of mayonnaise?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  "parameters": {
    "details": true,
    "max_new_tokens": 256,
    "stop": "<|eot_id|>",
    "temperature": 0.6,
    "top_p": 0.9
  }
}
{
  "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nwhat is the recipe of mayonnaise?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  "max_gen_len": 256,
  "temperature": 0.6,
  "top_p": 0.9
}
from copy import deepcopy

def sm2bedrock_meta(payload):
    """
    Converts a SageMaker- to a Bedrock-compatible payload for Meta models. TODO: error handling!!!
    https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/
    https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html#model-parameters-meta-request-response
    """
    # Copy payload
    payload_b = deepcopy(payload)
    # Inference parameters are added at prompt level
    payload_b = payload_b | payload_b.pop('parameters')
    # Renamed in Amazon Bedrock 🏷️
    payload_b['prompt'] = payload_b.pop('inputs')
    payload_b['max_gen_len'] = payload_b.pop('max_new_tokens')
    # Not supported by Amazon Bedrock ❌
    payload_b.pop('details', None)
    payload_b.pop('stop', None)
    return payload_b

examples_b = list(map(lambda example: sm2bedrock_meta(example), examples))
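As a quick sanity check, the first converted payload should match the Bedrock-style request shown above:

# Sanity check: inspect the converted payload
print(json.dumps(examples_b[0], indent=2))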
# Initialize client
bedrock_runtime = boto3.client("bedrock-runtime")
# Call model
response = bedrock_runtime.invoke_model(
    modelId="meta.llama3-70b-instruct-v1:0",
    body=json.dumps(examples_b[0])
)
# and process the response
print(json.load(response["body"])["generation"])
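The newer Converse API mentioned earlier drops the model-specific payload in favor of a uniform message format. Here's a rough sketch of the equivalent call (the message text comes from the first example and the parameters are mapped by hand):

# Rough sketch: the same request via the Converse API (uniform request/response shape)
response = bedrock_runtime.converse(
    modelId="meta.llama3-70b-instruct-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "what is the recipe of mayonnaise?"}]}
    ],
    inferenceConfig={"maxTokens": 256, "temperature": 0.6, "topP": 0.9}
)
print(response["output"]["message"]["content"][0]["text"])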
The classic condiment! Mayonnaise is a thick, creamy, and tangy sauce made from a combination of oil, egg yolks, acid (such as vinegar or lemon juice), and seasonings. Here's a simple recipe to make mayonnaise at home:
**Ingredients:**
* 2 egg yolks
* 1 tablespoon lemon juice or vinegar (white wine vinegar or apple cider vinegar work well)
* 1/2 teaspoon Dijon mustard (optional, but recommended for flavor)
* 1/2 cup (120 ml) neutral-tasting oil, such as canola, grapeseed, or sunflower oil
* Salt, to taste
**Instructions:**
1. **Start with room temperature ingredients**: This is crucial for emulsification to occur.
2. **In a medium-sized bowl**, whisk together the egg yolks, lemon juice or vinegar, and Dijon mustard (if using) until well combined.
3. **Slowly add the oil**: While continuously whisking the egg yolk mixture, slowly pour in the oil in a thin, steady stream. Start with a very slow drizzle and gradually increase the flow as the mixture thickens.
4. **Whisk constantly**: Keep whisking until the
First, we create the SageMaker provider:

sagemaker = dspy.Sagemaker(
    region_name=region_name
)
lm_sagemaker = dspy.AWSMeta(
    aws_provider=sagemaker,
    model=predictor.endpoint_name
)
dspy.configure(lm=lm_sagemaker)
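For completeness, the Bedrock-hosted version of the model should plug in the same way; a hedged sketch, assuming access to the Bedrock model ID used earlier:

# Sketch: Bedrock-backed equivalent (assumes the model is enabled in your account/region)
bedrock = dspy.Bedrock(region_name=region_name)
lm_bedrock = dspy.AWSMeta(
    aws_provider=bedrock,
    model="meta.llama3-70b-instruct-v1:0"
)
# dspy.configure(lm=lm_bedrock)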
💡 This example comes from the original paper by Khattab et al. (2023). If you need a refresher, the Prompt Engineering Guide has a great introduction to CoT prompting.
qa = dspy.ChainOfThought('question -> answer')
qa(question="Who was Albert Einstein?")
Prediction(
    rationale="Here is the completed response:\n\nQuestion: Who was Albert Einstein?\nReasoning: Let's think step by step in order to identify the famous physicist. We know that Albert Einstein was a renowned German-born physicist who is widely regarded as one of the most influential scientists of the 20th century.",
    answer='Albert Einstein was a renowned German-born physicist who is widely regarded as one of the most influential scientists of the 20th century.'
)
We get back the answer and the rationale behind it.
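Both are exposed as attributes on the returned Prediction object:

# The output fields are available as attributes on the Prediction
prediction = qa(question="Who was Albert Einstein?")
print(prediction.answer)
print(prediction.rationale)

DSPy also keeps a raw record of every call (prompts, completions, kwargs) on the LM itself: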
lm_sagemaker.history
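If you prefer a pretty-printed view, a sketch, assuming this version of dspy.AWSMeta inherits inspect_history from the base LM class:

# Sketch: pretty-print the most recent LM call(s)
lm_sagemaker.inspect_history(n=1)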
predictor.delete_endpoint()
- (Khattab et al., 2023) DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
- (Opsahl-Ong et al., 2024) Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs
- (Zhang et al., 2022) Automatic Chain of Thought Prompting in Large Language Models
- How I think about LLM prompt engineering by François Chollet
- Deploy Llama 3 on Amazon SageMaker by Phil Schmid
- An Introduction To DSPy by Cobus Greyling
- stanfordnlp/DSPy - programming—not prompting—Foundation Models
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.