
Migrating GenAI workloads to AWS

Migration guidance for GenAI workloads to AWS

Ninad Joshi
Amazon Employee
Published Mar 24, 2025
With 2023 being a year of proofs of concept (PoCs), 2024 has been the year of production for GenAI workloads. As production workloads grow, customers have been exploring how to migrate their existing GenAI workloads to the AWS GenAI stack. In this post we explore the possible reasons for migrating your GenAI workloads to AWS.
In this post we will address the following questions:
  • What are common reasons for migrating GenAI workloads to AWS?
  • What are the considerations while migrating your GenAI workload?
  • Why are evaluations so important?
  • How do you migrate?
  • What are possible next steps?

Common reasons for migrating GenAI workloads to AWS

  1. Model choice - Many customers want to use more than one model or model provider, both to keep their GenAI workloads agnostic to model providers and to keep up with the latest model capabilities. Amazon Bedrock gives customers the broadest selection of models and model providers to choose from, depending on the cost, relevance, and performance of the model; the sketch after this list shows how little code changes when you switch providers.
  2. Ease of getting started - Hosting data, applications, and GenAI services within the same cloud simplifies system maintainability and makes it easier to innovate.
  3. Safe, private, and secure - Security is the top priority at AWS, and customer data remains confidential.
  4. Cost optimization - Strong price-performance for GenAI services with flexible pricing models. Consumption of services (including models on Bedrock) draws from Enterprise Discount Program (EDP) commitments.
  5. Reliability - Hundreds of thousands of customers use AWS for GenAI/ML and trust AWS with their mission-critical applications.
  6. Customer obsession - AWS builder teams and APN Partners are committed to helping customers succeed with generative AI.
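
A minimal sketch of that provider-agnostic pattern, using the Bedrock Converse API from boto3: the same request shape works across providers, so switching models is a one-line change. The model IDs and prompt are illustrative; check model availability in your Region.

import boto3

# One runtime client serves every model provider on Bedrock
bedrock_runtime = boto3.client("bedrock-runtime")

def ask(model_id: str, question: str) -> str:
    # The Converse API uses the same message format for all providers
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": question}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]

# Same code path, two different providers
for model_id in ["anthropic.claude-3-haiku-20240307-v1:0", "amazon.nova-micro-v1:0"]:
    print(model_id, "->", ask(model_id, "What is RAG? Answer in one sentence."))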

Considerations while migrating your GenAI workload

  1. Don't forget to optimize your prompts - This is crucial: a prompt that works well for one model (the source model) may not work for another (the target model). Make sure you optimize your prompts with a prompt optimization tool; Amazon Bedrock offers this through Automatic Prompt Optimization (APO), shown in the first sketch after this list.
  2. Manage prompts well - As you scale your GenAI applications, the number and size of your prompts grow, and managing the prompt lifecycle becomes difficult, hurting the reusability and maintainability of your prompts. Amazon Bedrock Prompt Management is a great tool to manage, test, version, and optimize your prompts, as the second sketch after this list shows.
  3. Evaluations - It is highly important to choose the right metrics to evaluate your models against. Public benchmarks (GPQA, MMLU, etc.) can point you in the right direction, but they may not reflect real-world scenarios, especially the use case you are working on, so choose metrics that reflect your use cases. With Amazon Bedrock evaluations you can evaluate models quickly; LLM-as-a-Judge is the quickest way to assess qualitative metrics like helpfulness, coherence, and readability, whereas human evaluations can drive evaluation costs up.
  4. Trade-offs between latency, cost, and performance - No single model fits every use case. Identify what you want to optimize your GenAI workload for, and strike the right trade-off between all three to achieve your goals.
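
First, a minimal sketch of prompt optimization through the Bedrock API (the optimize_prompt call on the bedrock-agent-runtime client); the prompt text and target model ID are illustrative:

import boto3

client = boto3.client("bedrock-agent-runtime")

# Ask Bedrock to rewrite the prompt for the target model
response = client.optimize_prompt(
    input={"textPrompt": {"text": "Summarize the following support ticket: {{ticket}}"}},
    targetModelId="anthropic.claude-3-haiku-20240307-v1:0",
)

# The result arrives as an event stream; pull out the optimized text
for event in response["optimizedPrompt"]:
    if "optimizedPromptEvent" in event:
        print(event["optimizedPromptEvent"]["optimizedPrompt"]["textPrompt"]["text"])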
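
Second, a minimal sketch of Bedrock Prompt Management from boto3 (the bedrock-agent client); the prompt name, template, and model ID are illustrative:

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Create a managed prompt with one variant
response = bedrock_agent.create_prompt(
    name="ticket-summary-prompt",
    description="Summarizes a support ticket",
    variants=[
        {
            "name": "variantOne",
            "templateType": "TEXT",
            "modelId": "amazon.nova-micro-v1:0",
            "templateConfiguration": {
                "text": {
                    "text": "Summarize the following support ticket:\n{{ticket}}",
                    "inputVariables": [{"name": "ticket"}],
                }
            },
            "inferenceConfiguration": {
                "text": {"temperature": 0.2, "maxTokens": 512}
            },
        }
    ],
)
prompt_id = response["id"]

# Snapshot the current draft as an immutable, deployable version
version = bedrock_agent.create_prompt_version(promptIdentifier=prompt_id)
print(f"Created version {version['version']} of prompt {prompt_id}")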

Why evaluations are important

  1. Benchmarking of new or custom models - Large language models are advancing rapidly, with new models being developed frequently. Evaluation helps users benchmark progress in capabilities like reasoning, common sense, and factual knowledge.
  2. Understanding task specific strengths & weaknesses - Evaluation sheds light on what tasks large models are good at and where they still struggle. This helps guide research to improve weaknesses.
  3. Comparing models - Standardized evaluations allow for head-to-head comparison of different language model architectures and training approaches. This helps determine which approaches are most promising.
  4. Monitoring biases - Large models risk inheriting harmful societal biases from their training data. Evaluation helps detect biases so they can be addressed.
  5. Feedback for model customization - Evaluation results give application developers feedback on where models need improvement and guide the selection of better training data, architectures, and objectives.
  6. Evaluate real-world applicability using your own data - Evaluating models on your own data indicates how capable they are for real-world deployment and use; such evaluation is a proxy for practical use cases.
  7. User trust & safety - Rigorous testing is important for ensuring large language models behave reliably and safely before being integrated into applications used by millions of people.

Migrate GenAI workloads in 3 steps

[Image: Migrate GenAI workloads in 3 steps]

  • Step 1 - Evaluate the source model
  • Step 2 - Migrate the prompt (using Automatic Prompt Optimization in Bedrock Prompt Management)
[Image: Prompt Management / Migrate]
  • Step 3 - Evaluate the target model
Compare and choose - Use Bedrock evaluations to choose the right model, and leverage LLM-as-a-Judge to evaluate models in addition to programmatic approaches.
Get started with model evaluations using the notebook here
Observability and Evaluation Custom Solution for Amazon Bedrock Applications here
Sample LLM-as-a-Judge evaluation - full notebook here
from datetime import datetime
from typing import Any, Dict, List

# Available Generator Models
GENERATOR_MODELS = [
    "anthropic.claude-3-haiku-20240307-v1:0",
    "amazon.nova-micro-v1:0"
]

# Consistent Evaluator
EVALUATOR_MODEL = "anthropic.claude-3-haiku-20240307-v1:0"

def run_model_comparison(
    generator_models: List[str],
    evaluator_model: str
) -> List[Dict[str, Any]]:
    """Create one LLM-as-a-Judge evaluation job per generator model."""
    evaluation_jobs = []

    for generator_model in generator_models:
        # Unique, timestamped job name built from the provider prefixes
        job_name = f"llmaaj-{generator_model.split('.')[0]}-{evaluator_model.split('.')[0]}-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"

        try:
            # create_llm_judge_evaluation, bedrock_client, ROLE_ARN, input_data,
            # and output_path are defined earlier in the notebook
            response = create_llm_judge_evaluation(
                client=bedrock_client,
                job_name=job_name,
                role_arn=ROLE_ARN,
                input_s3_uri=input_data,
                output_s3_uri=f"{output_path}/{job_name}/",
                evaluator_model_id=evaluator_model,
                generator_model_id=generator_model,
                task_type="General"
            )

            job_info = {
                "job_name": job_name,
                "job_arn": response["jobArn"],
                "generator_model": generator_model,
                "evaluator_model": evaluator_model,
                "status": "CREATED"
            }
            evaluation_jobs.append(job_info)

            print(f"✓ Created job: {job_name}")
            print(f"  Generator: {generator_model}")
            print(f"  Evaluator: {evaluator_model}")
            print("-" * 80)

        except Exception as e:
            print(f"✗ Error with {generator_model}: {str(e)}")
            continue

    return evaluation_jobs

# Run model comparison
evaluation_jobs = run_model_comparison(GENERATOR_MODELS, EVALUATOR_MODEL)
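
Once the jobs are created, you can poll them until they finish. A minimal sketch, assuming bedrock_client is the same boto3 "bedrock" client used above:

import time

# Wait for each evaluation job to reach a terminal state
for job in evaluation_jobs:
    while True:
        status = bedrock_client.get_evaluation_job(
            jobIdentifier=job["job_arn"]
        )["status"]
        if status in ("Completed", "Failed", "Stopped"):
            print(f"{job['job_name']}: {status}")
            break
        time.sleep(60)  # evaluation jobs can take several minutes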

Deploy using Bedrock Flows

Deploy, maintain, and reuse your GenAI workflows using Bedrock Flows, which helps you manage the whole prompt lifecycle with ease and iterate over versions of your prompts; a minimal invocation sketch follows the figure.
[Image: Flows]
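
A minimal sketch of invoking a published flow from boto3; the flow ID, alias ID, and node names are placeholders for values from your own flow:

import boto3

runtime = boto3.client("bedrock-agent-runtime")

# Invoke the flow through its alias; input lands on the flow's input node
response = runtime.invoke_flow(
    flowIdentifier="FLOW_ID",
    flowAliasIdentifier="FLOW_ALIAS_ID",
    inputs=[
        {
            "nodeName": "FlowInputNode",
            "nodeOutputName": "document",
            "content": {"document": "Summarize our Q3 support tickets."},
        }
    ],
)

# The flow responds with an event stream; print the output node's payload
for event in response["responseStream"]:
    if "flowOutputEvent" in event:
        print(event["flowOutputEvent"]["content"]["document"])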

Next Steps

-> Engage with AWS specialists to get migration support
-> Partner up - Connect with AWS Partners offering GenAI migration accelerators here

Resources

Prompt Engineering using Bedrock here
Get started with model evaluations using the notebook here
Observability and Evaluation Custom Solution for Amazon Bedrock Applications here
Sample LLM-as-a-Judge evaluation - full notebook here

Feel free to reach out to us or share your feedback in the comments below.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
