# Automatic LLM prompt optimization with DSPy

## Automatically create optimized prompts and avoid manual prompt crafting

Randy D

Amazon Employee

Published May 30, 2024

Prompt engineering is the universal fine-tuning step when using generative AI. We always need to write an instruction to use a foundation model. In non-trivial cases, the prompt has a big impact on the ultimate output quality. And of course, if we switch from one model to another, we need to go through the manual prompt crafting process again.

Automatically optimizing prompts for a specific use case is an area of active research. One of the most promising techniques is DSPy, a library for "programming, not prompting, foundation models."

DSPy is a framework for algorithmically optimizing LM prompts and weights

DSPy lets us define a GenAI program or pipeline that captures how we're going to use the foundation model. For example, we can define a program that performs retrieval-augmented generation and uses few-shot prompting. Then DSPy can optimize how this program performs for a specific task. It has several optimizers, ranging from simple ones that optimize few-shot examples, to more complex ones that also optimize the prompt instructions and even model weights.

DSPy ships with several examples. It recently added support for foundation models provided in Amazon Bedrock, although that isn't obvious unless you dig through the documentation. I adapted the simplest working example to use the Mixtral 8x7B model via Bedrock.

First, we just define the foundation model we want to use.

    import dspy
    from dspy.datasets.gsm8k import GSM8K, gsm8k_metric

    # Connect to Amazon Bedrock and select the Mixtral 8x7B Instruct model.
    bedrock = dspy.Bedrock(region_name="us-west-2")
    lm = dspy.AWSMistral(bedrock, "mistral.mixtral-8x7b-instruct-v0:1")

    # Make this model the default for all DSPy modules.
    dspy.settings.configure(lm=lm)

Next, we define our dataset. I'll use one of the ones that DSPy supports out of the box.

    gsm8k = GSM8K()
    gsm8k_trainset, gsm8k_devset = gsm8k.train, gsm8k.dev
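
The `gsm8k_metric` imported earlier is what the optimizer uses to decide whether a prediction matches the gold answer. To make that idea concrete, here is a rough standalone sketch of a metric of that kind. It is not DSPy's implementation: `last_integer` and `exact_answer_metric` are hypothetical names, and comparing the last integer in each answer is an assumption made for illustration. The `(example, pred, trace=None)` argument order follows the convention DSPy uses for metrics.

```python
import re
from types import SimpleNamespace


def last_integer(text):
    """Return the last integer found in text, or None if there is none."""
    matches = re.findall(r"-?\d[\d,]*", str(text))
    if not matches:
        return None
    # Drop thousands separators such as "1,200" before converting.
    return int(matches[-1].replace(",", ""))


def exact_answer_metric(example, pred, trace=None):
    """Hypothetical metric: True when both answers end in the same integer."""
    gold = last_integer(example.answer)
    guess = last_integer(pred.answer)
    return gold is not None and gold == guess


# Stand-ins for a dataset example and two model predictions.
gold = SimpleNamespace(question="What is 6 * 7?", answer="42")
good = SimpleNamespace(answer="The answer is 42")
bad = SimpleNamespace(answer="41")

print(exact_answer_metric(gold, good), exact_answer_metric(gold, bad))
```

Any function with this shape can be passed as the `metric` argument, so the same pattern should carry over to tasks that are not covered by a built-in dataset.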

Now we write a DSPy module that declares the input and output fields for our task.

    class CoT(dspy.Module):
        def __init__(self):
            super().__init__()
            # Chain-of-thought predictor: takes a question, returns an answer.
            self.prog = dspy.ChainOfThought("question -> answer")

        def forward(self, question):
            return self.prog(question=question)

At this point we can run the optimizer. I ran just a handful of trials to save time. I used the MIPRO optimizer as I wanted to optimize the prompt instructions.

    from dspy.teleprompt import MIPRO

    # Keep the search small to save time.
    config = dict(num_candidates=4)
    teleprompter = MIPRO(metric=gsm8k_metric, **config)

    # Options passed to the evaluator during optimization.
    kwargs = dict(num_threads=1, display_progress=True, display_table=0)

    optimized_cot = teleprompter.compile(
        CoT(),
        trainset=gsm8k_trainset,
        num_trials=3,
        max_bootstrapped_demos=3,
        max_labeled_demos=5,
        eval_kwargs=kwargs,
        requires_permission_to_run=False,
    )
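
Conceptually, an instruction optimizer scores candidate instructions against a training set and keeps the one that does best. The toy sketch below shows only that score-and-select core. It does not reproduce how MIPRO proposes candidates or searches over them, and everything in it (`evaluate`, `pick_best_instruction`, `toy_program`, the two-example training set) is hypothetical and stands in for a real model call.

```python
def evaluate(program, instruction, trainset, metric):
    """Average metric score of `program` when run with one instruction."""
    scores = [metric(ex, program(instruction, ex["question"])) for ex in trainset]
    return sum(scores) / len(scores)


def pick_best_instruction(program, candidates, trainset, metric):
    """Score every candidate and return (best_instruction, best_score)."""
    scored = [(evaluate(program, c, trainset, metric), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def toy_program(instruction, question):
    """Stand-in for a foundation model: it only adds when told to reason."""
    a, b = (int(tok) for tok in question.split("+"))
    return a + b if "step by step" in instruction else a


trainset = [{"question": "2+3", "answer": 5}, {"question": "10+0", "answer": 10}]
metric = lambda ex, pred: float(pred == ex["answer"])
candidates = [
    "Given the fields `question`, produce the fields `answer`.",
    "Solve the `question` step by step, then give the `answer`.",
]

best, score = pick_best_instruction(toy_program, candidates, trainset, metric)
print(best, score)
```

In this toy setup the second instruction wins because the stand-in model only answers correctly when asked to reason. With a real model, every call to `evaluate` costs one model invocation per training example, which is why limiting the number of trials saves time.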

In this very simple example, the optimizer started with this base prompt instruction:

    Given the fields `question`, produce the fields `answer`.
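
That base instruction follows mechanically from the `"question -> answer"` signature string used in the module. The sketch below shows one way such a string could be turned into the same sentence. The helper names `parse_signature` and `default_instruction` are hypothetical, and this is an illustration of the mapping rather than DSPy's own code.

```python
def parse_signature(signature):
    """Split 'a, b -> c' into (input_fields, output_fields)."""
    inputs, outputs = signature.split("->")

    def split_fields(part):
        return [name.strip() for name in part.split(",") if name.strip()]

    return split_fields(inputs), split_fields(outputs)


def default_instruction(signature):
    """Build a generic instruction that names the input and output fields."""
    inputs, outputs = parse_signature(signature)

    def fmt(names):
        return ", ".join(f"`{name}`" for name in names)

    return f"Given the fields {fmt(inputs)}, produce the fields {fmt(outputs)}."


print(default_instruction("question -> answer"))
```

Because the starting instruction carries no task knowledge beyond the field names, the optimizer has plenty of room to improve on it.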

During the first trial, it evolved into this prompt instruction:

    Given the context provided in the `question` field,
    your task is to accurately solve and provide the `answer` field.
    This may involve mathematical reasoning, logical deductions, or
    multi-step problem-solving. The `question` will often involve
    various mathematical concepts and real-world scenarios, with a
    progressive increase in complexity. You are expected to handle
    diverse numerical values and various problem types, making your
    responses versatile and adaptable for a wide range of complex problems.
    This task is designed to develop your abilities in mathematical reasoning,
    problem-solving, and deductive logic, similar to those required in
    educational tools, AI tutors, or automated problem-solving systems.

The best trial ended up with this prompt instruction:

    Given the fields `question` and `reasoning`, generate the fields `answer`.
    The `question` will be a complex mathematical or logical problem,
    often involving real-world scenarios. The `reasoning` will provide
    a step-by-step solution to the problem. Your task is to reproduce the final
    numerical `answer` based on the provided `reasoning`. This task will help
    in developing educational tools, AI tutors, or automated problem-solving
    systems that can manage intricate mathematical and logical problems.

And we can see how the score changed over the course of the three trials.

The DSPy documentation and GitHub site are worth a look. I'd encourage you to give DSPy a try if you want to get the most out of your prompts and foundation models.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.