
Optimizing Prompts for Text Classification in Amazon Bedrock: From Manual to Automatic Approaches
Strategies for optimizing text classification prompts in Amazon Bedrock, from manual approaches to fully automated solutions.
Business Use Cases for Text Classification

Text classification shows up across industries and workflows. Typical scenarios include:
- Document Classification: Organizations deal with vast amounts of documents that need to be categorized by type (contracts, invoices, reports), department (legal, finance, marketing), or priority level. Proper classification enables efficient storage, retrieval, and workflow automation.
- Customer Support Triage: Customer inquiries arriving via email, chat, or web forms need to be classified by issue type (technical, billing, account access) and urgency to route them to the appropriate support team and prioritize resolution.
- Content Moderation: Online platforms must classify user-generated content to identify and filter inappropriate material, detect policy violations, and ensure community guidelines are maintained.
- Compliance Monitoring: In regulated industries, communications need to be classified to identify potential compliance risks, such as disclosure of sensitive information or improper financial advice.
Traditional ML vs. Generative AI Approach

Traditional ML classifiers need labeled training data, feature engineering, and model training and hosting before they produce a single prediction. With foundation models on Amazon Bedrock, you describe the task in a prompt and can start classifying with few or no labeled examples, shifting the engineering effort from training models to crafting prompts.

The Challenge of High-Quality Classification

That shift does not make high-quality classification automatic:

- Many business classification tasks require domain-specific knowledge beyond general language understanding
- Real-world applications often involve many classification labels that aren't always mutually exclusive
- Edge cases and ambiguous inputs can lead to inconsistent results
- Business requirements may evolve, requiring an adaptive framework
Getting from a rough prompt to reliable classifications therefore calls for:

- A well-defined set of classification labels with clear criteria
- Ground truth data with examples representing different scenarios
- A robust evaluation framework to measure performance (a minimal sketch follows this list)
- An iterative approach to prompt optimization
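Of these, the evaluation framework is what makes iteration possible. Below is a minimal, illustrative sketch, assuming ground truth as (text, label) pairs and a caller-supplied classify_fn that sends a filled-in prompt to the model and returns a label; it is not a production evaluation framework.

```python
# Minimal evaluation sketch (illustrative only).
# Assumes: examples is a list of (text, expected_label) pairs, and
# classify_fn(prompt_template, text) calls the model and returns a label.
def evaluate_prompt(prompt_template, examples, classify_fn):
    """Return classification accuracy of a prompt over labeled examples."""
    correct = sum(
        classify_fn(prompt_template, text).strip().lower() == label.lower()
        for text, label in examples
    )
    return correct / len(examples)
```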
Approaches to Prompt Optimization

1. Hand-Crafted Prompts with Best Practices

The most direct approach is to write and refine the prompt yourself, following established prompt engineering guidelines. For example:
```
You are a document classification expert. Classify the following document into EXACTLY ONE of these categories:
- Invoice (contains billing details, amounts, payment terms)
- Contract (contains legal terms, signatures, binding language)
- Report (contains analysis, findings, recommendations)
- Email (contains correspondence between parties)
Respond ONLY with the category name.
Document: [DOCUMENT TEXT]
```
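As a concrete illustration, here is one way to run such a prompt through the Bedrock Converse API. The model ID is just an example, and temperature 0 keeps the output deterministic:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

CLASSIFICATION_PROMPT = """You are a document classification expert. Classify the following document into EXACTLY ONE of these categories:
- Invoice (contains billing details, amounts, payment terms)
- Contract (contains legal terms, signatures, binding language)
- Report (contains analysis, findings, recommendations)
- Email (contains correspondence between parties)
Respond ONLY with the category name.
Document: """

def classify_document(document_text: str) -> str:
    """Send the classification prompt plus the document to the model."""
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
        messages=[{"role": "user",
                   "content": [{"text": CLASSIFICATION_PROMPT + document_text}]}],
        inferenceConfig={"maxTokens": 20, "temperature": 0},
    )
    return response["output"]["message"]["content"][0]["text"].strip()
```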
When hand-crafting classification prompts, apply these best practices:

- Clear, specific instructions
- Explicit classification criteria
- Few-shot examples for complex cases (see the illustrative snippet after this list)
- Consistent output format requirements
- Task-specific context setting
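For instance, few-shot examples can be appended to the document prompt above. The snippets below are invented purely for illustration:

```
Here are examples of correct classifications:
Text: "Amount due: $4,200. Payment terms: net 30." -> Invoice
Text: "The parties agree to the terms set forth herein..." -> Contract
Text: "Q3 findings show a 12% increase; we recommend..." -> Report
```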
For more guidance, see:

- Prompt Engineering Guidelines in the Amazon Bedrock documentation
- Prompting best practices for Amazon Nova understanding models
- Amazon Bedrock Prompting GitHub repository
2. LLM-Assisted Prompt Creation

Rather than writing the prompt yourself, you can ask a capable model to draft it for you, an approach often called meta-prompting. For example:
```
You are an expert in creating effective prompts for text classification.
I need a prompt that will help an LLM classify customer support tickets into these categories:
[Technical Issue, Billing Question, Account Access, Feature Request]
Some challenging examples include:
[EXAMPLES OF DIFFICULT CASES]
Create an optimized prompt that:
1. Achieves high classification accuracy
2. Handles ambiguous cases consistently
3. Works well across different writing styles
4. Properly distinguishes between similar categories
Your optimized prompt should include instructions and examples.
```
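To put this to work, send the meta-prompt to a capable model and use its reply as a candidate classification prompt. A minimal sketch via the Bedrock Converse API, with the model ID as an example choice:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

META_PROMPT = "..."  # the meta-prompt text shown above

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": META_PROMPT}]}],
    # A bit of creative freedom helps when generating prompts,
    # unlike classification itself, which favors temperature 0
    inferenceConfig={"maxTokens": 2000, "temperature": 0.7},
)
candidate_prompt = response["output"]["message"]["content"][0]["text"]
```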
Useful resources for this approach:

- Anthropic Prompt Generator
- Anthropic Meta-prompt notebook
3. Amazon Bedrock Prompt Optimization

Amazon Bedrock includes a built-in prompt optimization feature that analyzes your prompt and rewrites it for a chosen target model.

Using Amazon Bedrock Prompt Optimization

In the console:

1. Write a prompt in an Amazon Bedrock playground or using Prompt Management
2. Select a model for optimization
3. Click the optimization icon (wand symbol)
4. Review the analysis and optimized prompt
5. Choose to use the optimized prompt or keep your original

The same capability is available programmatically through the API:
```python
import boto3

# Set values
TARGET_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"
PROMPT = "Classify this document as either an invoice, contract, or report: "

client = boto3.client("bedrock-agent-runtime")

response = client.optimize_prompt(
    input={"textPrompt": {"text": PROMPT}},
    targetModelId=TARGET_MODEL_ID,
)

# The result arrives as an event stream: analysis events first,
# then the optimized prompt itself
for event in response["optimizedPrompt"]:
    if "optimizedPromptEvent" in event:
        print("OPTIMIZED PROMPT:", event["optimizedPromptEvent"])
    else:
        print("ANALYSIS:", event["analyzePromptEvent"])
```
4. Algorithmic Prompt Optimization

For critical workloads you can go a step further and search for better prompts automatically. OPRO (Optimization by PROmpting) treats the LLM itself as the optimizer, iteratively proposing and scoring candidate prompts.

How OPRO works for prompt optimization

1. Meta-Prompt Construction: Create a prompt containing:
   - Previously tested prompts with their performance scores
   - Task description and evaluation criteria
   - Example inputs and outputs for the classification task
2. Solution Generation: The LLM generates multiple candidate prompts
3. Evaluation: Each prompt is tested on a validation set to measure performance
4. Feedback Loop: The best-performing prompts and their scores are incorporated into the meta-prompt for the next iteration
5. Iteration: Steps 2-4 are repeated, with each cycle potentially producing better prompts

An example meta-prompt for this task might look like the following:
```python
# Fragment from the optimization loop; ALLOWED_CATEGORIES, generate_examples,
# format_instructions, best_instructions, and best_scores are defined elsewhere
meta_prompt = f"""
You are an AI assistant tasked with generating instructions for a text classification task.
The goal is to classify text into one of these categories: {", ".join(ALLOWED_CATEGORIES)}.
Here are some example texts with their correct categories:
<examples>
{generate_examples(dataset, NUM_EXAMPLES)}
</examples>
Previous instructions and their scores (higher is better):
<previous_instructions>
{format_instructions(best_instructions, best_scores)}
</previous_instructions>
Please generate 3 new instruction sets that could potentially achieve higher accuracy.
Each instruction set should be enclosed inside <inst></inst> tags.
Leverage previous instructions and examples to enrich category descriptions and strive for better classification results.
"""
```
Two practical considerations apply to this approach:

- Cost: Each iteration makes many LLM calls for both prompt generation and evaluation; estimate these costs before committing to a long optimization run.
- Overfitting: At some point the iterations stop producing better-scoring prompts and the process should be halted. Hold out a final test set of previously unseen examples to verify that the winning prompt generalizes rather than overfitting the validation data.
Putting it all together:

- Start simple: Begin with a basic hand-crafted prompt based on best practices
- Leverage automation: Use Amazon Bedrock's prompt optimization to refine your initial prompt
- Evaluate thoroughly: Test against a diverse set of inputs to identify edge cases
- Iterate systematically: For critical applications, consider implementing an OPRO-like approach for continuous optimization
- Monitor performance: Classification needs evolve over time, so establish regular evaluation cycles
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.