Best Prompt Engineering Practice For LLMs In Bedrock
Understand the science of prompt engineering, unlocking the secrets of getting the best results from Large Language Models.
Published Jan 24, 2024
AWS has recently released a groundbreaking news service, Bedrock. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models from leading AI companies through a single interface.
With Bedrock, You Can:
- Choose from a diverse range of LLMs: Bedrock boasts models like Jurassic-2 from AI21 Labs, offering multilingual generation, and Anthropic's Claude, adept at various conversational and text processing tasks.
- Fine-tune models for specific needs: Bedrock allows you to tailor LLMs to your specific domains and tasks, further amplifying their potential.
- Deploy applications seamlessly: Integrate powerful LLMs into your existing workflows or build entirely new applications, all through Bedrock's API.
Some of the use cases of these models:
- Text generation
- Virtual assistants
- Text and image search
- Text summarization
- Image generation
To derive the best solution from these models, understanding a system of interaction (also known as known as prompt engineering) is very important.
In this blog, we will focus on understanding prompt engineering and various prompting techniques for LLMs (Large Language Models), and generative AI. We'll explore a spectrum of techniques in prompt engineering, ranging from fundamental to advanced approaches. Once we've acquired a solid understanding of crafting prompts that yield precise results, our conversation will expand to discussing actionable tips customized for Anthropic models.
In today's world, large language models (LLMs) are being used as powerful tools. They can create text that looks like it's written by humans, can translate languages, write different types of creative content, and even answer your questions in a human-like way. But how do we make these LLMs work best for us? That's where prompt engineering comes in.
Let us understand the very concept of prompt engineering by considering a simple question:
What is 10 plus 10?
Using a large language model, we get the correct answer: 20. However, the same question can yield different answers when approached differently. By adding context, such as labeling the operation as an addition problem, the question "What is 10 plus 10?" becomes more precise. With this additional context, the correct response becomes evident: 10 plus 10 is an addition problem. Therefore, our objective is to add context and influence the output in favor of the desired answer. This is the essence of prompt engineering.
Here's a key takeaway:
AI models are trained on data curated by humans. Therefore, when prompting them, it is essential to consider the details, context, tone, or instructions that a human would need for the same task. Using Large Language Models (LLMs) is like talking to a friend. Since neither humans nor LLMs can read minds, clarity and specificity are vital when seeking specific outcomes!
In prompt engineering, especially in customer-facing scenarios like chat interfaces, a crucial best practice is to establish a persona consistently. Ensure that the tone you set aligns with the desired user experience for that chat interface.
To illustrate, let's add a persona: "
You're a high school physics teacher. Explain quantum entanglement in one sentence.
"The model responds, "
Quantum entanglement is a strange phenomenon where two or more quantum particles become "entangled" such that they instantly influence each other, even if they are separated by a large distance.
"Now, switch the persona: "Y
ou're an excited 3-year-old child on a sugar high. Explain quantum entanglement in one sentence.
"The response becomes playful, "
Wheeeeeeee, two tiny things that were once together now act like magical twins even when far apart!
" Despite the identical query—explaining quantum entanglement—the model's output varies dramatically in tone and vocabulary based on the given persona.A similar concept to persona prompting is role prompting, which involves using roles to shape the tone, demeanor, or complexity level of responses. Whether tailoring Claude's answers for a kindergartener or a PhD student, role prompting serves this purpose.
Interestingly, role prompting doesn't only personalize responses but also enhances accuracy. For example, presenting the same logic puzzle to Claude without any role specification yields good results. However, when prefaced by stating Claude is a master logic bot designed for complex problems, it surprisingly performs even better on the same puzzle. Placing Claude in a role suitable for the task improves the overall quality of results, even when not aiming for a specific tone or demeanor.
Let's explore another simple technique: one-shot prompting. The key is to show one example when training the model, biasing its output toward the desired format. On the other hand, when feeding the LLM multiple examples, we transition into this technique called few-shot prompting. This proves beneficial for classification tasks like categorizing text into distinct labels.
Now, moving on to more advanced prompting techniques. Chain-of-thought prompting allows breaking down complex problems into individual steps, revealing not just the answer but the model's thought process and the steps leading to the result. In chain-of-thought prompting, we stimulate reasoning in large language models.
Chain-of-thought prompting involves guiding a language model through a series of connected prompts to generate a coherent response. For instance:
Prompt 1: "Describe a beautiful sunset."
Model Response 1: "The sky is ablaze with orange and red hues as the sun slowly sinks below the horizon. Rays of golden sunlight dance across scattered white clouds, highlighting their graceful shapes and curves. A soft breeze carries the sounds of birds chirping their evening melody. The colorful display casts a warm glow over the landscape, enhancing the natural beauty of the surroundings. As the sun dips further down, bright pinks and purples emerge and blend with the fiery oranges and reds. The colorful spectacle makes you catch your breath in wonder. But soon the sky darkens, the landscape fades into the shadows of twilight, and nightfall arrives to end the stunning sunset show."
Model Response 1: "The sky is ablaze with orange and red hues as the sun slowly sinks below the horizon. Rays of golden sunlight dance across scattered white clouds, highlighting their graceful shapes and curves. A soft breeze carries the sounds of birds chirping their evening melody. The colorful display casts a warm glow over the landscape, enhancing the natural beauty of the surroundings. As the sun dips further down, bright pinks and purples emerge and blend with the fiery oranges and reds. The colorful spectacle makes you catch your breath in wonder. But soon the sky darkens, the landscape fades into the shadows of twilight, and nightfall arrives to end the stunning sunset show."
Prompt 2 (Building on the previous response): "What emotions does this evocative scene evoke?"
Model Response 2: "A sense of awe and wonder at the stunning beauty of nature
Model Response 2: "A sense of awe and wonder at the stunning beauty of nature
- Peace and tranquility from the calming tones and ambient sounds
- Joy and elation at being able to witness such a spectacular display
- Melancholy or bittersweetness as the day comes to an end"
This approach maintains a logical flow, allowing the model to build on its previous responses for a more connected and contextually rich output. Similarly, we can use this to solve complex logical problems like how teachers advise writing down each step in a math class to aid problem-solving and coaching large language models to adopt a step-by-step approach, known as chain-of-thought, which proves to be effective.
To develop effective prompts, start by creating a diverse set of test cases tailored to your specific goals. Whether you're working with five or 5,000 test cases depends on your objectives but prioritize diversity to comprehensively represent your distribution. Include common edge cases, as these challenging scenarios highlight potential prompt struggles.
A good test case consists of an expected input and a "golden answer." For tasks like math problem-solving, the golden answer might be a specific expected value. In freeform writing tasks, it could be a well-crafted example, allowing for some flexibility in output interpretation.
Once your test cases are ready, proceed to engineer your preliminary prompt using established frameworks. Test your prompt against the cases, identify strengths and weaknesses, refine, iterate, and repeat the process until satisfied. Whether deploying it for customers or sharing insights with colleagues, adhering to this prompt development philosophy ensures better outcomes.
Let's delve into some aspects specific to Claude, the model from Anthropic's family of models. A brief introduction to Anthropic reveals it as an AI research company dedicated to developing reliable AI systems using AWS for model training. They have instant models; and more powerful Claude 2.0 and 2.1 models. To access these models on Bedrock, you need to request model access.
We will see how using different techniques we can improve the performance of Claude. These are practices that may not be necessary or advisable with other large language models but are essential for optimizing performance with Claude.
- Formatting prompts for Claude is crucial. During Claude's training, a sequence of alternating human and assistant dialogue is employed. Utilize special tokens like "new line, new line" and colons to indicate user and Claude dialogue. To enhance accuracy, consider instructing Claude to think within XML tags before answering.
- Another useful technique is often tied to output formatting. While you may desire a specific response format for user presentation, Claude might provide unnecessary details or preambles. To address this, guide Claude's output by initiating the response with specific indicators, like an open curly brace in the case of JSON.
In wrapping up, we need to delve into prompt engineering and maximize the performance of models like Claude through hands-on experimentation. We need to keep refining our skills by exploring various prompt-generation techniques and implementing them.