
Migrating GenAI workloads to AWS

Migration guidance for GenAI workloads to AWS

Ninad Joshi
Amazon Employee
Published Mar 24, 2025
With 2023 being a year of proofs of concept (PoCs), 2024 has been the year of production for GenAI workloads. As production workloads grow, customers have been exploring how to migrate their existing GenAI workloads to the AWS GenAI stack. In this post we explore the possible reasons for migrating your GenAI workloads to AWS.
In this post we will address the following questions:
  • What are common reasons for migrating GenAI workloads to AWS?
  • What are the considerations while migrating your GenAI workload?
  • Why are evaluations so important?
  • How do you migrate?
  • What are possible next steps?

Common reasons for migrating GenAI workloads to AWS

  1. Model choice - Many customers want to use more than one model or model provider, both to keep their GenAI workloads agnostic to model providers and to keep up with the latest model capabilities. Amazon Bedrock gives customers the broadest selection of models and model providers to choose from, depending on the cost, relevance, and performance of the model; the sketch after this list shows how little code changes when you switch providers.
  2. Ease of getting started - Hosting data, applications, and GenAI services within the same cloud simplifies system maintainability and makes it easier to innovate.
  3. Safe, private, and secure - Security is the top priority at AWS, and customer data remains confidential.
  4. Cost optimization - Strong price-performance for GenAI services with flexible pricing models. Consumption of services (including models on Bedrock) draws from Enterprise Discount Program (EDP) commitments.
  5. Reliability - Hundreds of thousands of customers use AWS for GenAI/ML and trust AWS with their mission-critical applications.
  6. Customer obsession - AWS builder teams and APN Partners are committed to helping customers succeed with generative AI.
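
A minimal sketch of that provider-agnostic pattern, using the Bedrock Converse API from boto3: the same request shape works across providers, so switching models is a one-line change. The model IDs and prompt are illustrative; check model availability in your Region.

import boto3

# One runtime client serves every model provider on Bedrock
bedrock_runtime = boto3.client("bedrock-runtime")

def ask(model_id: str, question: str) -> str:
    # The Converse API uses the same message format for all providers
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": question}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]

# Same code path, two different providers
for model_id in ["anthropic.claude-3-haiku-20240307-v1:0", "amazon.nova-micro-v1:0"]:
    print(model_id, "->", ask(model_id, "What is RAG? Answer in one sentence."))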

Considerations while migrating your GenAI workload

  1. Don't forget to optimize your prompts - This is crucial: a prompt that works well for one model (the source model) may not work for another (the target model). Make sure you optimize your prompts with a prompt optimization tool; Amazon Bedrock offers this through Automatic Prompt Optimization (APO), shown in the first sketch after this list.
  2. Manage prompts well - As you scale your GenAI applications, the number and size of your prompts grow, and managing the prompt lifecycle becomes difficult, hurting the reusability and maintainability of your prompts. Amazon Bedrock Prompt Management is a great tool to manage, test, version, and optimize your prompts, as the second sketch after this list shows.
  3. Evaluations - It is highly important to choose the right metrics to evaluate your models against. Public benchmarks (GPQA, MMLU, etc.) can point you in the right direction, but they may not reflect real-world scenarios, especially the use case you are working on, so choose metrics that reflect your use cases. With Amazon Bedrock evaluations you can evaluate models quickly; LLM-as-a-Judge is the quickest way to assess qualitative metrics like helpfulness, coherence, and readability, whereas human evaluations can drive evaluation costs up.
  4. Trade-offs between latency, cost, and performance - No single model fits every use case. Identify what you want to optimize your GenAI workload for, and strike the right trade-off between all three to achieve your goals.
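
First, a minimal sketch of prompt optimization through the Bedrock API (the optimize_prompt call on the bedrock-agent-runtime client); the prompt text and target model ID are illustrative:

import boto3

client = boto3.client("bedrock-agent-runtime")

# Ask Bedrock to rewrite the prompt for the target model
response = client.optimize_prompt(
    input={"textPrompt": {"text": "Summarize the following support ticket: {{ticket}}"}},
    targetModelId="anthropic.claude-3-haiku-20240307-v1:0",
)

# The result arrives as an event stream; pull out the optimized text
for event in response["optimizedPrompt"]:
    if "optimizedPromptEvent" in event:
        print(event["optimizedPromptEvent"]["optimizedPrompt"]["textPrompt"]["text"])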
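
Second, a minimal sketch of Bedrock Prompt Management from boto3 (the bedrock-agent client); the prompt name, template, and model ID are illustrative:

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Create a managed prompt with one variant
response = bedrock_agent.create_prompt(
    name="ticket-summary-prompt",
    description="Summarizes a support ticket",
    variants=[
        {
            "name": "variantOne",
            "templateType": "TEXT",
            "modelId": "amazon.nova-micro-v1:0",
            "templateConfiguration": {
                "text": {
                    "text": "Summarize the following support ticket:\n{{ticket}}",
                    "inputVariables": [{"name": "ticket"}],
                }
            },
            "inferenceConfiguration": {
                "text": {"temperature": 0.2, "maxTokens": 512}
            },
        }
    ],
)
prompt_id = response["id"]

# Snapshot the current draft as an immutable, deployable version
version = bedrock_agent.create_prompt_version(promptIdentifier=prompt_id)
print(f"Created version {version['version']} of prompt {prompt_id}")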

Why evaluations are important

  1. Benchmarking of new or custom models - Large language models are advancing rapidly, with new models being developed frequently. Evaluation helps users benchmark progress in capabilities like reasoning, common sense, and factual knowledge.
  2. Understanding task specific strengths & weaknesses - Evaluation sheds light on what tasks large models are good at and where they still struggle. This helps guide research to improve weaknesses.
  3. Comparing models - Standardized evaluations allow for head-to-head comparison of different language model architectures and training approaches. This helps determine which approaches are most promising.
  4. Monitoring biases - Large models risk inheriting harmful societal biases from their training data. Evaluation helps detect biases so they can be addressed.
  5. Feedback for model customization - Evaluation results give application developers feedback on where models need improvement and guide the selection of better training data, architectures, and objectives.
  6. Evaluate real-world applicability using your own data - Evaluating models on your own data indicates how capable they are for real-world deployment and use; such evaluation is a proxy for practical use cases.
  7. User trust & safety - Rigorous testing is important for ensuring large language models behave reliably and safely before being integrated into applications used by millions of people.

Migrate GenAI workloads in 3 steps

[Image: Migrate GenAI workloads in 3 steps]

  • Step 1 - Evaluate the source model
  • Step 2 - Migrate the prompt (using Automatic Prompt Optimization in Bedrock Prompt Management)
[Image: Prompt Management / Migrate]
  • Step 3 - Evaluate the target model
Compare and choose - Use Bedrock evaluations to choose the right model, and leverage LLM-as-a-Judge to evaluate models in addition to programmatic approaches.
Get started with model evaluations using the notebook here
Observability and Evaluation Custom Solution for Amazon Bedrock Applications here
Sample LLM-as-a-Judge evaluation - full notebook here
from datetime import datetime
from typing import Any, Dict, List

# Available Generator Models
GENERATOR_MODELS = [
    "anthropic.claude-3-haiku-20240307-v1:0",
    "amazon.nova-micro-v1:0"
]

# Consistent Evaluator
EVALUATOR_MODEL = "anthropic.claude-3-haiku-20240307-v1:0"

def run_model_comparison(
    generator_models: List[str],
    evaluator_model: str
) -> List[Dict[str, Any]]:
    """Create one LLM-as-a-Judge evaluation job per generator model."""
    evaluation_jobs = []

    for generator_model in generator_models:
        # Unique, timestamped job name built from the provider prefixes
        job_name = f"llmaaj-{generator_model.split('.')[0]}-{evaluator_model.split('.')[0]}-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"

        try:
            # create_llm_judge_evaluation, bedrock_client, ROLE_ARN, input_data,
            # and output_path are defined earlier in the notebook
            response = create_llm_judge_evaluation(
                client=bedrock_client,
                job_name=job_name,
                role_arn=ROLE_ARN,
                input_s3_uri=input_data,
                output_s3_uri=f"{output_path}/{job_name}/",
                evaluator_model_id=evaluator_model,
                generator_model_id=generator_model,
                task_type="General"
            )

            job_info = {
                "job_name": job_name,
                "job_arn": response["jobArn"],
                "generator_model": generator_model,
                "evaluator_model": evaluator_model,
                "status": "CREATED"
            }
            evaluation_jobs.append(job_info)

            print(f"✓ Created job: {job_name}")
            print(f"  Generator: {generator_model}")
            print(f"  Evaluator: {evaluator_model}")
            print("-" * 80)

        except Exception as e:
            print(f"✗ Error with {generator_model}: {str(e)}")
            continue

    return evaluation_jobs

# Run model comparison
evaluation_jobs = run_model_comparison(GENERATOR_MODELS, EVALUATOR_MODEL)
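
Once the jobs are created, you can poll them until they finish. A minimal sketch, assuming bedrock_client is the same boto3 "bedrock" client used above:

import time

# Wait for each evaluation job to reach a terminal state
for job in evaluation_jobs:
    while True:
        status = bedrock_client.get_evaluation_job(
            jobIdentifier=job["job_arn"]
        )["status"]
        if status in ("Completed", "Failed", "Stopped"):
            print(f"{job['job_name']}: {status}")
            break
        time.sleep(60)  # evaluation jobs can take several minutes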

Deploy using Bedrock Flows

Deploy, maintain, and reuse your GenAI workflows using Bedrock Flows, which helps you manage the whole prompt lifecycle with ease and iterate over versions of your prompts; a minimal invocation sketch follows the figure.
[Image: Flows]
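
A minimal sketch of invoking a published flow from boto3; the flow ID, alias ID, and node names are placeholders for values from your own flow:

import boto3

runtime = boto3.client("bedrock-agent-runtime")

# Invoke the flow through its alias; input lands on the flow's input node
response = runtime.invoke_flow(
    flowIdentifier="FLOW_ID",
    flowAliasIdentifier="FLOW_ALIAS_ID",
    inputs=[
        {
            "nodeName": "FlowInputNode",
            "nodeOutputName": "document",
            "content": {"document": "Summarize our Q3 support tickets."},
        }
    ],
)

# The flow responds with an event stream; print the output node's payload
for event in response["responseStream"]:
    if "flowOutputEvent" in event:
        print(event["flowOutputEvent"]["content"]["document"])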

Next Steps

-> Engage with AWS specialists to get migration support
-> Partner up - Connect with AWS Partners offering GenAI migration accelerators here

Resources

Prompt Engineering using Bedrock here
Get started with model evaluations using the notebook here
Observability and Evaluation Custom Solution for Amazon Bedrock Applications here
Sample LLM-as-a-Judge evaluation - full notebook here

Feel free to reach out to us or share your feedback in the comments below.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
