
Introducing the AWS Guidance for Multi-Provider LLM Access
Managing diverse Large Language Models (LLMs) from providers like Amazon Bedrock, OpenAI, and Anthropic can be complex. This post introduces the AWS Guidance for Multi-Provider Generative AI Gateway, a deployable solution for AWS builders seeking to simplify this process. Discover how this gateway provides a unified, OpenAI-compatible API, centralizes governance, enhances security with features like Bedrock Guardrails integration, and helps control costs.
Todd Fortier
Amazon Employee
Published Apr 16, 2025
Last Modified Apr 17, 2025
There's no doubt generative artificial intelligence is experiencing explosive growth, with Large Language Models (LLMs) emerging from numerous providers like AWS (via Amazon Bedrock), OpenAI, Anthropic, Cohere, Google, Meta, and many others. This presents incredible opportunities for AWS builders, allowing them to select the most suitable model—the best "tool"—for specific tasks, optimizing for performance, cost, or specialized capabilities.
However, this diversity brings its own set of challenges. Integrating and managing applications that leverage multiple LLMs often leads to significant operational complexity. Developers grapple with disparate APIs, inconsistent security postures, fragmented governance frameworks, and convoluted cost allocation strategies. While having a wide array of specialized tools can be powerful, managing them all efficiently (knowing which one to use when, keeping them secure, tracking their usage) becomes a substantial undertaking without a centralized system.
To address this growing need for streamlined management, AWS now offers the Guidance for Multi-Provider Generative AI Gateway on AWS. This solution, available as part of the AWS Solutions Library, provides a deployable architecture that acts as a central, standardized gateway for accessing various LLM providers from within your AWS environment. This post is for AWS builders—developers, architects, and engineers—seeking a practical, secure, and efficient way to manage multi-provider LLM access and harness the full potential of generative AI without getting bogged down in integration complexities.
Leveraging multiple LLMs offers flexibility, but without a unifying layer, several significant challenges emerge:
- API Fragmentation: Each LLM provider typically exposes its own unique API, complete with distinct request/response schemas, authentication mechanisms, and error handling logic. Requiring developers to learn, implement, and maintain code for each specific API significantly increases development time, introduces potential inconsistencies, and creates friction when experimenting with or switching between models.
- Governance Gaps: Implementing consistent governance across a diverse set of LLM endpoints is difficult. How can an organization ensure that only authorized teams access specific (potentially costly) models? How can universal responsible AI policies, like content filtering or guardrails, be applied uniformly when interacting with different providers? Centralized auditing and policy enforcement become complex without a single control plane.
- Security Complexities: Managing credentials, particularly API keys for multiple external services, securely is a critical concern. Ensuring consistent network security postures, data protection standards, and secure authentication/authorization across various endpoints adds layers of complexity and potential risk.
- Cost Management Hurdles: Accurately tracking LLM consumption and attributing costs back to specific projects, teams, or applications becomes a major challenge when usage is fragmented across providers with potentially different pricing models (e.g., per token, per request, provisioned throughput). This lack of visibility hinders effective budget management and cost optimization efforts.
- Operational Overhead: Cumulatively, these factors result in significant operational overhead. Teams spend more time on integration plumbing, security management, and cost reconciliation, diverting resources from core application development and innovation.
The absence of a unified approach creates more than just technical debt; it acts as a bottleneck, hindering the rapid experimentation and adoption of new, potentially better-suited models for specific tasks. This directly impacts an organization's ability to innovate quickly and realize the full business value of generative AI in a fast-evolving field. Centralizing access through a gateway directly addresses this bottleneck by simplifying integration, standardizing governance, and streamlining operations.
To help builders overcome these challenges, the AWS Solutions Library recently launched the Guidance for Multi-Provider Generative AI Gateway on AWS. AWS Guidance provides prescriptive technical expertise, reference architectures, best practices, and deployable code samples to accelerate cloud solutions.
This specific Guidance delivers a ready-to-deploy solution that functions as a central proxy, streamlining and standardizing access to a multitude of LLM providers from within your AWS environment. Its core purpose is to directly address the API fragmentation, governance gaps, security complexities, and cost management hurdles associated with multi-LLM strategies.
Under the hood, the solution leverages the popular open-source LiteLLM proxy. LiteLLM is designed to provide a consistent, OpenAI-compatible API interface for interacting with over 100 different LLMs. By utilizing this widely adopted standard, the gateway significantly flattens the learning curve for development teams already familiar with the OpenAI API structure.
Deployment is managed via Terraform, enabling infrastructure-as-code practices for repeatable and predictable provisioning. Builders have the flexibility to deploy the gateway onto either Amazon Elastic Container Service (ECS) or Amazon Elastic Kubernetes Service (EKS), accommodating different preferences and existing investments in container orchestration platforms.
This Guidance offers more than just LiteLLM running on AWS. It provides LiteLLM deeply integrated with essential AWS services, creating a robust, scalable, and secure solution native to AWS. By packaging a well-architected implementation using familiar tools like Terraform and a popular open-source project like LiteLLM, AWS provides a practical accelerator. This approach lowers the adoption barrier while delivering the substantial benefits of tight AWS integration for security, scalability, and manageability, saving builders the significant effort of manually configuring LiteLLM and its dependencies on AWS. It effectively balances standardization with the flexibility builders need.
The core of the AWS Guidance for Multi-Provider Generative AI Gateway is the LiteLLM proxy application, containerized and running within either Amazon ECS (typically on AWS Fargate for serverless operation) or Amazon EKS pods. This proxy acts as the single, unified endpoint for all application requests destined for the various configured LLMs.
LiteLLM's primary function within this architecture is to provide a consistent API interface, adhering to the OpenAI specification, regardless of the target LLM provider. This abstraction layer shields application developers from the underlying complexities of provider-specific APIs, authentication methods, and response formats. A request formatted for the OpenAI API can be sent to the gateway, which then intelligently routes and translates it for the intended backend model, be it on Amazon Bedrock, OpenAI, Anthropic, or others configured in the system.
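Because the gateway speaks the OpenAI API, existing OpenAI client code can point at it with only a base-URL change. The short sketch below illustrates this, assuming a gateway endpoint, a LiteLLM virtual key, and a model alias named bedrock-claude-3-sonnet defined in config.yaml (all placeholders).

```python
from openai import OpenAI

# Point the standard OpenAI client at the gateway instead of api.openai.com.
# The base URL and virtual key below are placeholders for your deployment.
client = OpenAI(
    base_url="https://llm-gateway.example.com",   # ALB DNS name or custom domain
    api_key="sk-litellm-virtual-key",             # virtual key issued by the gateway
)

# "bedrock-claude-3-sonnet" must match a model_name alias in config.yaml;
# LiteLLM translates the call to the provider-specific API behind the scenes.
response = client.chat.completions.create(
    model="bedrock-claude-3-sonnet",
    messages=[{"role": "user", "content": "Summarize the benefits of an LLM gateway."}],
)
print(response.choices[0].message.content)
```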

A typical request traverses the following path through the architecture:
- Client Request: An application sends an API request (e.g., for chat completion) to the gateway's endpoint, exposed via Amazon Route 53.
- Security Inspection: AWS WAF inspects the incoming traffic, protecting the gateway against common web exploits.
- Load Balancing: The request is forwarded to an Application Load Balancer (ALB), which handles TLS/SSL termination using a certificate managed by AWS Certificate Manager (ACM). The ALB distributes traffic across the available gateway instances running in ECS or EKS for high availability and scalability.
- Container Execution: The ALB routes the request to a healthy ECS Task or EKS Pod. These containers run the LiteLLM application image, which was built during the Guidance deployment and stored in Amazon Elastic Container Registry (ECR).
- Gateway Processing: The LiteLLM instance processes the request based on its configuration (defined primarily in config.yaml). This involves identifying the target model, retrieving necessary credentials, applying configured policies (like rate limits or routing rules), and potentially checking the cache.
- Backend Interaction: LiteLLM interacts with various backend services as needed:
- AWS Secrets Manager: Securely retrieves API keys or other sensitive credentials required for accessing external LLM providers.
- Amazon ElastiCache (using Redis OSS): Checks for cached responses to potentially serve the request faster and cheaper. Also used for distributing configuration settings in a multi-instance setup.
- Amazon Relational Database Service (RDS) (using PostgreSQL): Accesses persistent storage for configurations like virtual API keys, user/team data, and potentially other metadata managed by LiteLLM.
- Amazon Bedrock: Invokes models hosted on Bedrock, potentially leveraging Bedrock-specific features like Guardrails or managed prompts.
- External LLM APIs: Makes calls over the internet (likely via a NAT Gateway if deployed in private subnets) to external providers like OpenAI, Anthropic, etc.
- Response Handling: LiteLLM receives the response from the chosen LLM provider.
- Client Response: LiteLLM formats the response according to the OpenAI standard and sends it back through the ALB to the originating client application.
- Logging: Application logs from LiteLLM and the supporting middleware are captured and stored in a designated Amazon S3 bucket for operational monitoring, troubleshooting, and analysis.
This architecture leverages managed AWS services extensively. Route 53 provides DNS resolution, WAF adds a security layer, ACM manages certificates, ALB handles traffic distribution and scaling, ECR stores container images, ECS/EKS provides container orchestration, Secrets Manager secures credentials, ElastiCache offers managed caching, RDS provides a managed database, and S3 stores logs. This design emphasizes security, scalability, and resilience by offloading the undifferentiated heavy lifting of managing this infrastructure to AWS, allowing builders to focus on configuring and utilizing the gateway's capabilities. The option to deploy on either ECS or EKS provides flexibility to align with an organization's existing container strategy.
Deploying the AWS Guidance for Multi-Provider Generative AI Gateway unlocks several key benefits for builders managing diverse LLM landscapes:
Benefit: Dramatically simplifies application development and maintenance by providing a single, consistent API endpoint based on the OpenAI specification. This allows developers to interact with over 100 LLMs, including those on Amazon Bedrock, OpenAI, Anthropic, Cohere, Google Vertex AI, Hugging Face models, and even locally run models via Ollama, without writing provider-specific code.
How: The LiteLLM core translates the standardized incoming request into the format required by the designated backend LLM provider. Switching between models or adding support for a new LLM often only requires updating the central config.yaml file, enabling rapid experimentation and adaptation without application code changes.
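For illustration, here is a hedged sketch of what two model_list entries might look like in config.yaml, one pointing at Amazon Bedrock and one at OpenAI; the aliases, model IDs, and the Secrets-Manager-backed environment variable are placeholders, and the exact schema should be checked against the Guidance's default configuration.

```yaml
model_list:
  # Alias applications call; LiteLLM maps it to a Bedrock-hosted model.
  - model_name: bedrock-claude-3-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
      aws_region_name: us-east-1                 # adjust to your Region
  # A second alias served by OpenAI; switching providers is a config change, not a code change.
  - model_name: gpt-4-turbo
    litellm_params:
      model: openai/gpt-4-turbo
      api_key: os.environ/OPENAI_API_KEY         # populated from Secrets Manager via Terraform
```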
Benefit: Enables the enforcement of consistent security policies, access controls, and responsible AI guardrails across all integrated LLM providers from a single point.
How: Bedrock Guardrails Integration: A standout feature is the integration with AWS Bedrock Guardrails. This allows organizations to define and apply content filters, deny sensitive topics, and enforce other responsible AI policies centrally. Significantly, these Bedrock-managed guardrails can be applied even to requests ultimately destined for non-Bedrock models accessed via the gateway. This provides a powerful mechanism for standardizing safety protocols across the entire LLM landscape used by the organization, leveraging AWS's managed capabilities.
Secrets Management: Sensitive credentials, such as API keys for external LLM providers, are securely stored and managed using AWS Secrets Manager, removing the need to hardcode them or manage them insecurely.
Authentication: The gateway supports API key-based authentication, managed via the LiteLLM Admin UI. For more robust enterprise integration, it also offers optional support for Okta OAuth 2.0 JWT token authentication.
Network Security: Deployment within a Virtual Private Cloud (VPC), combined with AWS WAF, ALB security features, and network security groups, provides multiple layers of network protection. There's also an option to configure the ALB as private for internal-only access.
Benefit: Provides enhanced visibility into LLM usage patterns and associated costs, facilitating better budget management, cost allocation, and optimization efforts.
How: Usage Tracking: LiteLLM inherently tracks LLM calls and token counts, allowing usage to be attributed based on API keys, which can be mapped to users, teams, or projects.
Budgets & Rate Limits: The LiteLLM configuration allows administrators to set spending budgets and enforce rate limits (requests per minute/day, tokens per minute/day) on a per-key or per-model basis, preventing runaway costs and ensuring fair resource allocation.
Logging: Centralized logging of requests, responses, and errors is directed to an Amazon S3 bucket, providing a data source for usage analysis, cost calculation, and troubleshooting. LiteLLM also supports configurable callbacks to send telemetry data to various observability platforms like Langfuse, Langsmith, OpenTelemetry, and others, although integrating these may require additional configuration beyond the default Guidance setup. This aligns with broader AWS patterns for implementing multi-tenant cost tracking via gateways.
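As one example of how budgets and limits can be attached to a virtual key, the LiteLLM proxy exposes a key-generation endpoint; the call below is only a sketch (the gateway URL, master key, and model alias are placeholders, and the field names should be verified against the LiteLLM version deployed by the Guidance).

```bash
# Create a virtual key scoped to one model alias, with a budget and rate limits (placeholders throughout).
curl -X POST "https://llm-gateway.example.com/key/generate" \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "models": ["bedrock-claude-3-sonnet"],
        "max_budget": 50,
        "budget_duration": "30d",
        "rpm_limit": 60,
        "tpm_limit": 100000,
        "metadata": {"team": "search-experiments"}
      }'
```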
Benefit: Improves application responsiveness and reliability through caching, load balancing, and intelligent routing.
How: Prompt Caching: Leverages Amazon ElastiCache for Redis to implement semantic caching. Responses to frequently asked or semantically similar prompts can be served directly from the cache, reducing latency, lowering API call costs to backend LLMs, and decreasing load on the models.
Load Balancing: The Application Load Balancer automatically distributes incoming traffic across multiple healthy gateway instances deployed in ECS or EKS, ensuring scalability and availability.
Routing & Fallbacks: LiteLLM's configuration allows defining routing strategies (e.g., simple shuffle, least-busy, latency-based) for distributing requests among multiple deployments of the same logical model. It supports defining fallback models. If a request to a primary LLM fails or times out, the gateway can automatically retry the request with a pre-configured backup model or provider, enhancing application resilience.
A/B Testing Configuration: The routing mechanism can be used for basic A/B testing. By defining multiple backend model deployments under the same model_name in the configuration and assigning different weight values, traffic can be split proportionally between them, allowing for comparison of different models or providers, as shown in the sketch below.
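The snippet below is a hedged sketch of those routing features: two weighted deployments share one alias (an approximately 80/20 A/B split) and a fallback alias covers failures. The model IDs, weights, and aliases are illustrative, and the exact keys should be confirmed against the LiteLLM routing documentation.

```yaml
model_list:
  # Two deployments share the alias "claude-sonnet"; traffic splits ~80/20 by weight.
  - model_name: claude-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
      weight: 4
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-sonnet-20240229
      api_key: os.environ/ANTHROPIC_API_KEY
      weight: 1
  # Cheaper backup alias used when "claude-sonnet" fails or times out.
  - model_name: claude-haiku
    litellm_params:
      model: bedrock/anthropic.claude-3-haiku-20240307-v1:0

router_settings:
  routing_strategy: simple-shuffle                 # weights apply under this strategy
  fallbacks:
    - claude-sonnet: ["claude-haiku"]
```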
Benefit: Accelerates the setup process and leverages familiar AWS management paradigms.
How: Terraform: The entire infrastructure is defined in Terraform scripts, enabling automated, repeatable deployments.
ECS/EKS Flexibility: Offers a choice between AWS's primary container orchestration services.
Admin UI: Includes a web-based user interface for administrators to manage virtual API keys, users, teams, view basic usage, and potentially test models directly.
While the gateway provides robust routing features like fallbacks and weighted load balancing via LiteLLM's configuration, it's important to note its scope. More sophisticated dynamic routing strategies discussed in AWS blogs, such as LLM-assisted routing (using a classifier LLM to determine the best downstream model) or semantic routing (using embeddings to match prompts to model capabilities), are generally not built-in features of this specific LiteLLM-based Guidance package out-of-the-box. Implementing these advanced patterns would likely require building custom logic on top of the gateway or integrating it with other AWS services like Amazon Bedrock Agents or custom AWS Lambda routing functions. The gateway provides a solid foundation, but complex, context-aware routing may necessitate further development.
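To make the distinction concrete, here is one way custom LLM-assisted routing could be layered on top of the gateway: a lightweight classifier call picks a model alias, then the real request is sent through the same gateway. Everything here (aliases, endpoint, prompt) is hypothetical and is not part of the Guidance itself.

```python
from openai import OpenAI

# Both the classifier and the worker models are reached through the gateway itself.
gateway = OpenAI(base_url="https://llm-gateway.example.com",
                 api_key="sk-litellm-virtual-key")  # placeholders

ROUTES = {"code": "gpt-4-turbo", "general": "bedrock-claude-3-sonnet"}  # hypothetical aliases

def route_and_answer(prompt: str) -> str:
    # Step 1: ask a small, low-cost model to classify the request category.
    label = gateway.chat.completions.create(
        model="claude-haiku",  # hypothetical classifier alias
        messages=[{"role": "user",
                   "content": f"Answer with one word, 'code' or 'general': {prompt}"}],
    ).choices[0].message.content.strip().lower()

    # Step 2: forward the original prompt to the model alias chosen for that category.
    answer = gateway.chat.completions.create(
        model=ROUTES.get(label, ROUTES["general"]),
        messages=[{"role": "user", "content": prompt}],
    )
    return answer.choices[0].message.content

print(route_and_answer("Write a Python function that reverses a list."))
```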
The table below summarizes key features and the components enabling them:
Feature Area | Specific Capability | How it's Enabled (AWS Service / LiteLLM Feature) |
---|---|---|
Unified API | Access 100+ LLMs via OpenAI format | LiteLLM Proxy Core Function |
Security | Bedrock Guardrails Integration | AWS Bedrock Integration |
Security | Secure API Key Storage | AWS Secrets Manager |
Security | Authentication | LiteLLM Keys / Optional Okta JWT |
Security | Network Protection | AWS WAF, ALB, VPC Security Groups |
Cost/Observability | Usage Tracking (Key/Team) | LiteLLM Core Function |
Cost/Observability | Budgets & Rate Limits | LiteLLM Configuration |
Cost/Observability | Centralized Logging | Amazon S3 / LiteLLM Callbacks |
Performance | Prompt Caching | Amazon ElastiCache (Redis) + LiteLLM |
Resilience | Load Balancing | Application Load Balancer (ALB) |
Resilience | Routing & Fallbacks | LiteLLM Routing Configuration |
Resilience | A/B Testing Setup | LiteLLM Weighted Deployments |
Deployment | Infrastructure as Code | HashiCorp Terraform |
Deployment | Container Orchestration | Amazon ECS / Amazon EKS |
Management | User/Key/Model Admin | LiteLLM Admin UI |
This section provides a high-level guide to deploying the AWS Guidance for Multi-Provider Generative AI Gateway.
Disclaimer: This guide is based on the implementation guide documentation available at the time of writing. Always refer to the official implementation guide and GitHub repository for the most up-to-date instructions, configuration details, and potential changes. Deploying this solution will incur costs for the AWS resources used (e.g., ECS/EKS, ALB, RDS, ElastiCache). As an AWS Guidance/Sample solution, support is typically provided on a best-effort basis by the community via the GitHub repository's Issues section.
Ensure the following tools and access are configured before proceeding:
- An active AWS account with IAM permissions sufficient to create the required resources.
- AWS Command Line Interface (CLI) installed and configured with credentials.
- Git installed.
- Terraform (check repository for recommended version) installed.
- kubectl installed if you plan to deploy to EKS.
- yq (a lightweight YAML processor) installed.
- Docker installed (useful for understanding container images or local testing).
- (Optional) A registered domain name for a custom endpoint.
- (Optional) An SSL/TLS certificate managed in AWS Certificate Manager (ACM) for the custom domain.
Clone the official Guidance repository from GitHub:
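The commands below are a sketch; copy the exact repository URL from the AWS Solutions Library page for this Guidance.

```bash
# Clone the Guidance repository (URL placeholder; use the link from the Solutions Library page)
git clone <guidance-repository-url>
cd <repository-directory>
```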
Copy the sample `.env` template file locally. Configuration is primarily managed through environment variables and the `.env` file located within the repository structure (the repository includes a sample `.env` file to start from).
- Environment Variables: Set the necessary environment variables for Terraform, such as `AWS_REGION`. You might also need to set variables for domain names, certificate ARNs, or API keys if you choose not to reference them directly from Secrets Manager within `config.yaml`. Ensure your AWS credentials are configured correctly (e.g., via `~/.aws/credentials`, environment variables, or an IAM role).
- VPC Configuration: Decide whether to deploy into a new VPC (default) or an existing one. To use an existing VPC, set the `EXISTING_VPC_ID` variable in your environment or Terraform configuration. Ensure the existing VPC meets the requirements: public and private subnets across at least two Availability Zones, and at least one NAT Gateway for outbound internet access from private subnets.
- Deployment Platform: Choose your container orchestrator by setting the `DEPLOYMENT_PLATFORM` variable to either `"ECS"` or `"EKS"`. Note the documented limitation: switching an active deployment from EKS to ECS requires destroying the stack first (`undeploy.sh`), though switching from ECS to EKS is supported.
- `config/config.yaml`: This file is central to configuring LiteLLM's behavior. By default, the Guidance deploys with Redis OSS caching enabled and with the most popular model providers enabled; the deployment automatically uses `config/default-config.yaml` and copies it to `config.yaml`. Make sure you enable model access on Amazon Bedrock. If you want to remove support for certain models, or add more models, create and edit your own `config/config.yaml` file. Key sections include:
  - model_list: Define each LLM you want the gateway to serve. Each entry needs a `model_name` (the alias used in API calls) and `litellm_params` specifying the actual provider model ID (e.g., `bedrock/anthropic.claude-3-sonnet-20240229-v1:0` or `openai/gpt-4-turbo`). Within `litellm_params`, configure `api_key` (use `os.environ/YOUR_ENV_VAR_NAME` to pull from environment variables, which Terraform can populate from Secrets Manager), `api_base` if needed, and other model-specific parameters such as temperature or max_tokens. You can also set `weight` here for load balancing/A/B testing.
  - router_settings: Configure the `routing_strategy` (e.g., `simple-shuffle`, `least-busy`, `latency-based-routing`) and define `fallbacks` to specify backup models.
  - litellm_settings: Configure global LiteLLM behaviors, such as enabling Redis caching (`cache: type: redis`).
  - general_settings: Set server-level configurations, like the `master_key` used for accessing administrative functions of the proxy.
The following table highlights some critical `config.yaml` parameters for initial setup:
Parameter Path | Description | Example Value / Notes |
---|---|---|
model_list.model_name | Alias for the model used in API calls | "my-claude-sonnet"
model_list.litellm_params.model | Actual provider model ID | "bedrock/anthropic.claude-3-sonnet-20240229-v1:0"
model_list.litellm_params.api_key | API key (use env var syntax, populated by Terraform from Secrets Manager) | "os.environ/ANTHROPIC_API_KEY"
model_list.litellm_params.api_base | Provider API base URL (if needed, e.g., for non-AWS/OpenAI providers) | "https://api.anthropic.com"
model_list.litellm_params.weight | Weight for load balancing/A/B testing across the same model_name | 1.0
router_settings.routing_strategy | How to route requests across deployments with the same model_name | "least-busy" (options: simple-shuffle, least-busy, latency-based-routing, usage-based-routing)
router_settings.fallbacks | Fallback models to use if the primary fails | [{"bedrock/claude-sonnet": ["bedrock/claude-haiku"]}]
litellm_settings.cache.type | Enable caching (requires the ElastiCache Redis cluster to be deployed) | "redis"
general_settings.master_key | Admin key for proxy management (use env var syntax for security) | "os.environ/LITELLM_MASTER_KEY"
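Assembled from the parameters above, a minimal `config.yaml` might look like the sketch below. All identifiers are placeholders, and the exact schema (particularly the cache block) should be verified against the Guidance's `default-config.yaml` and the LiteLLM documentation.

```yaml
model_list:
  - model_name: my-claude-sonnet                   # alias applications will call
    litellm_params:
      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0

router_settings:
  routing_strategy: least-busy                     # route across deployments sharing a model_name

litellm_settings:
  cache:
    type: redis                                    # backed by the deployed ElastiCache cluster

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY        # admin key pulled from an environment variable
```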
Use the standard Terraform workflow to provision the AWS resources:
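The workflow below assumes you run it from the Terraform directory of the cloned repository; the Guidance may also provide wrapper scripts, so check its README.

```bash
# Initialize providers and modules, review the planned changes, then provision the stack.
terraform init
terraform plan
terraform apply
```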
Enter yes when prompted to confirm the deployment. Provisioning the full stack (VPC, EKS/ECS cluster, ALB, RDS, ElastiCache, etc.) can take a significant amount of time.
- Check Outputs: Once terraform apply completes successfully, examine the Terraform outputs. These will typically include important endpoints, such as the Application Load Balancer's DNS name, which serves as the entry point for API calls and the Admin UI.
- Access Admin UI: Navigate to the ALB DNS name (or your custom domain if configured) in a web browser. The LiteLLM Admin UI should be accessible (the exact path might vary; check the Guidance documentation). Use the `master_key` defined during configuration if prompted for login.
- Test API Call: Use curl or an OpenAI-compatible client library (like Python's openai package) to send a test request to the gateway's chat completions endpoint (usually /chat/completions relative to the ALB DNS name). Ensure you target a model_name defined in your `config.yaml`. Example using curl:
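The call below is a sketch; replace the ALB DNS name, virtual key, and model alias with values from your own deployment and `config.yaml`.

```bash
# Send an OpenAI-format chat completion request through the gateway (placeholders throughout).
curl -X POST "https://<alb-dns-name>/chat/completions" \
  -H "Authorization: Bearer sk-litellm-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-claude-sonnet",
        "messages": [{"role": "user", "content": "Hello from the gateway!"}]
      }'
```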
- Check Logs: If you encounter issues, check the application logs stored in the configured S3 bucket or potentially CloudWatch Logs depending on the specific configuration.
- Admin UI: Use the LiteLLM Admin UI to manage virtual API keys for different users or applications, potentially adjust model settings dynamically (if supported by the UI version), and monitor basic usage statistics.
- Bastion Host: If you configured the gateway with a private load balancer (`PUBLIC_LOAD_BALANCER="false"`), you will need to set up a bastion host or use AWS Systems Manager Session Manager to access the Admin UI or API endpoint from within your VPC.
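If you go the Session Manager route, one approach is port forwarding through an SSM-managed instance inside the VPC; the instance ID, internal ALB DNS name, and ports below are placeholders.

```bash
# Forward local port 8443 to the internal ALB through an SSM-managed instance (placeholders).
aws ssm start-session \
  --target i-0123456789abcdef0 \
  --document-name AWS-StartPortForwardingSessionToRemoteHost \
  --parameters '{"host":["internal-alb-dns.example.internal"],"portNumber":["443"],"localPortNumber":["8443"]}'
# Then browse to https://localhost:8443 to reach the Admin UI.
```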
Andy Jassy's recent Letter to Shareholders embodies the thinking behind everything we do on the AWS Solutions Library team. In that same spirit of asking "why," why does this matter? Implementing the AWS Guidance for Multi-Provider Generative AI Gateway provides tangible advantages for teams building on AWS:
- Strategic Flexibility: It liberates builders from vendor lock-in, making it easy to select, experiment with, and switch between the best LLMs for specific tasks without costly re-engineering efforts. This agility is crucial in the rapidly evolving generative AI space.
- Operational Efficiency: By standardizing API interactions, centralizing configuration, and automating deployment with Terraform, the gateway significantly reduces the development and ongoing management overhead associated with multi-LLM environments.
- Enhanced Governance & Security: It provides a single point of control for enforcing access policies, applying responsible AI guardrails (leveraging Amazon Bedrock's capabilities even for external models), and securely managing credentials using native AWS services like Secrets Manager and IAM.
- Cost Optimization: Centralized tracking, caching via ElastiCache, and the ability to set budgets and rate limits provide the necessary tools for understanding, controlling, and optimizing LLM expenditures across the organization.
- Seamless AWS Integration: The solution is built using and deeply integrated with scalable, secure, and familiar AWS services (ALB, ECS/EKS, RDS, ElastiCache, Secrets Manager, WAF, S3), ensuring it operates efficiently within the AWS ecosystem.
Ultimately, this gateway empowers AWS builders. It removes common infrastructure roadblocks and operational friction points, allowing teams to focus their energy on leveraging the diverse power of generative AI to innovate and deliver value faster.
The proliferation of LLMs offers immense potential, but managing this diversity effectively is key to unlocking its value. The AWS Guidance for Multi-Provider Generative AI Gateway provides a practical, secure, and scalable solution for AWS builders facing this challenge. By acting as a central proxy built on open-source LiteLLM and integrated with core AWS services, it simplifies API interactions, centralizes governance and security, enhances observability, and improves cost control.
Builders can leverage this Guidance to gain strategic flexibility, boost operational efficiency, and confidently manage their multi-provider generative AI workloads within the robust AWS environment.
We encourage you to explore the code on GitHub for the complete source code, detailed configuration options, and the latest documentation. Consider deploying the solution in a development environment to evaluate how it can streamline your access to the ever-expanding world of generative AI models on AWS. Embracing centralized management is a crucial step towards building smarter, more agile, and more governable generative AI applications.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.