Tracking Amazon Bedrock usage and costs per tenant in a multi-tenant environment with application inference profiles
Learn how to track Amazon Bedrock usage and costs per tenant in a multi-tenant environment with application inference profiles
Ujwal Bukka
Amazon Employee
Published Apr 1, 2025
Organizations design multi-tenant architectures to efficiently manage resources and scale their applications. With the rise of generative AI, many are leveraging Amazon Bedrock to build multi-tenant GenAI solutions. Bedrock's capabilities enable businesses to deliver personalized experiences while optimizing resource utilization. However, a key challenge in multi-tenant environments is accurately tracking usage and allocating costs across tenants. This is where Amazon Bedrock's inference profiles provide a crucial solution, allowing organizations to manage GenAI resources effectively and derive precise tenant-level costs. This blog explores how to use application inference profiles in a multi-tenant architecture to track usage and enhance cost transparency.
Amazon Bedrock inference profiles are resources that define a model and one or more AWS Regions to which model invocation requests can be routed. These profiles come in two types: system-defined inference profiles and application inference profiles. System-defined inference profiles, pre-configured by Bedrock, facilitate cross-Region inference with a standardized resource management approach. Application inference profiles, on the other hand, are user-defined, allowing organizations to create custom configurations tailored to their specific needs. With application inference profiles, businesses can apply custom cost allocation tags, enabling detailed tracking of resource usage and expenses. This flexibility is especially valuable in multi-tenant environments, where accurate cost allocation is key to optimizing resource management. Currently, inference profiles can be created for on-demand and batch inference.
Many SaaS providers are building multi-tenant solutions using Amazon Bedrock to interact with foundation models. However, they often face challenges in gaining insight into tenant-specific inference usage and costs. To address this challenge, they can create an application inference profile per tenant, which helps with tracking tenant-specific inference usage and costs. This involves defining custom profiles using a single model ARN or by copying from existing system-defined profiles. Once set up, these profiles can be enriched with custom cost allocation tags that uniquely identify each tenant, enabling precise tenant-specific tracking of resource usage and associated costs.
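The per-tenant setup above can be sketched with boto3's CreateInferenceProfile API. This is a minimal illustration, not a complete implementation: the profile name pattern, the tenantId tag key, the helper function names, and the model-source ARN are all assumptions chosen for this example.

```python
def build_profile_request(tenant_id: str, model_source_arn: str) -> dict:
    """Build a CreateInferenceProfile request for one tenant.

    The profile copies its routing configuration from an existing
    system-defined inference profile (or a single model ARN) and is
    tagged with the tenant ID so costs can be allocated per tenant.
    """
    return {
        "inferenceProfileName": f"{tenant_id}-inference-profile",
        "description": f"Application inference profile for {tenant_id}",
        "modelSource": {"copyFrom": model_source_arn},
        # Cost allocation tag; 'tenantId' is an illustrative key choice.
        "tags": [{"key": "tenantId", "value": tenant_id}],
    }


def create_tenant_profile(tenant_id: str, model_source_arn: str) -> str:
    """Create the application inference profile and return its ARN."""
    import boto3  # imported here so the request builder runs without AWS deps

    bedrock = boto3.client("bedrock")
    response = bedrock.create_inference_profile(
        **build_profile_request(tenant_id, model_source_arn)
    )
    return response["inferenceProfileArn"]
```

Copying from a system-defined profile (rather than a bare model ARN) preserves its cross-Region routing while still giving each tenant an independently taggable resource.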
These profiles provide detailed tracking of usage and costs for model inference, supporting operations such as InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. For knowledge bases, usage is tracked when generating a response after querying or when processing non-textual data. For Amazon Bedrock agents, usage is tracked when agents interact with LLMs. Additionally, these profiles can track the usage and costs of model evaluation, prompt management, and prompt flows.

The approach outlined above represents a static configuration for Bedrock agents. However, if agents are shared across multiple tenants, you must dynamically assign the appropriate tenant inference profile at runtime. To achieve this, you can leverage Bedrock inline agents, enabling real-time configuration of inference profiles based on tenant-specific needs.
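To route an invocation through a tenant's profile, the profile's ARN (or ID) is passed where a model ID would normally go. The sketch below uses the Converse API; the function names and the maxTokens value are illustrative assumptions.

```python
def build_converse_request(profile_arn: str, prompt: str) -> dict:
    """Assemble a Converse request that routes through a tenant's
    application inference profile: the profile ARN is supplied as the
    modelId, so the invocation is attributed to that tenant."""
    return {
        "modelId": profile_arn,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 256},
    }


def invoke_for_tenant(profile_arn: str, prompt: str) -> str:
    """Invoke the model for one tenant and return the response text."""
    import boto3  # imported here so the request builder runs without AWS deps

    runtime = boto3.client("bedrock-runtime")
    response = runtime.converse(**build_converse_request(profile_arn, prompt))
    return response["output"]["message"]["content"][0]["text"]
```

Because the profile ARN stands in for the model ID, the application code stays identical across tenants; only the ARN looked up per request changes.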
Let's assume you have created application inference profiles for two tenants: tenant1 (Tenant1InferenceProfile) and tenant2 (Tenant2InferenceProfile). When an inference profile is created, it is assigned a unique inferenceProfileId. For example, the inferenceProfileId for Tenant1InferenceProfile could be '053a3ulue40l', while Tenant2InferenceProfile might have an inferenceProfileId of '44suuwn3n98t'.

To monitor the usage of Amazon Bedrock LLM models, you need to enable Amazon Bedrock model invocation logging. This ensures that all tenant model invocations made through tenant inference profiles are logged to Amazon CloudWatch, providing visibility into usage patterns.
By default, Amazon CloudWatch captures metrics such as InputTokenCount, OutputTokenCount, and other relevant data for each tenant inference profile. Using Amazon CloudWatch APIs, you can retrieve these metrics for each tenant inference profile ID, allowing you to track the number of input and output tokens consumed by each tenant.
You can access this usage data in near real-time through Amazon CloudWatch. Once collected, this data can be utilized in various ways, such as implementing throttling mechanisms based on the input/output tokens consumed by each tenant.
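A per-tenant token count could be pulled with the CloudWatch GetMetricStatistics API along these lines. This is a sketch under two assumptions: that Bedrock publishes these metrics in the AWS/Bedrock namespace, and that for invocations made through an application inference profile the ModelId dimension carries the profile's identifier; the function names and the hourly period are likewise illustrative.

```python
from datetime import datetime, timedelta, timezone


def build_token_count_query(profile_id: str, metric_name: str, hours: int = 24) -> dict:
    """Build a GetMetricStatistics request for one tenant profile.

    metric_name is e.g. 'InputTokenCount' or 'OutputTokenCount'.
    Assumes the AWS/Bedrock namespace and that the ModelId dimension
    holds the inference profile identifier for profile-routed calls.
    """
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Bedrock",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "ModelId", "Value": profile_id}],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 3600,  # hourly buckets
        "Statistics": ["Sum"],
    }


def tokens_consumed(profile_id: str, metric_name: str, hours: int = 24) -> float:
    """Sum a token metric for one tenant profile over the last N hours."""
    import boto3  # imported here so the query builder runs without AWS deps

    cloudwatch = boto3.client("cloudwatch")
    result = cloudwatch.get_metric_statistics(
        **build_token_count_query(profile_id, metric_name, hours)
    )
    return sum(point["Sum"] for point in result["Datapoints"])
```

A throttling mechanism of the kind mentioned above could compare this sum against a per-tenant quota before admitting the next request.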
In the AWS Billing and Cost Management service, under Cost and Usage Analysis, you can use Cost Explorer to track Amazon Bedrock usage and costs for tenants like tenant1 and tenant2. This service allows you to retrieve data at hourly, daily, or monthly intervals. You can use this data for non-real-time use cases, such as calculating cost per tenant and analyzing total usage per tenant on an hourly, daily, or monthly basis.
In the AWS Console, navigate to the Billing and Cost Management service. Under Cost and Usage Analysis, select Cost Explorer, then choose the Amazon Bedrock service. Apply the tenant tag (e.g., tenantId) to filter usage and costs for a specific tenant. To make this tenant tag available in Cost Explorer, activate it under Cost Allocation Tags in the Cost Organization section of the Billing and Cost Management service. You can also retrieve this data programmatically using the GetCostAndUsage API.

Additionally, in the AWS Billing and Cost Management service, you can export the AWS Cost and Usage Report (CUR) through Data Exports (under Cost and Usage Analysis) to an S3 bucket. By leveraging Amazon Athena, you can then analyze or aggregate the data by tenant tags (e.g., tenantId) to derive Amazon Bedrock usage and costs per tenant.
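The programmatic path via GetCostAndUsage might look like the following. The tenantId tag key, the function names, the daily granularity, and the 'Amazon Bedrock' SERVICE filter value are assumptions for this sketch; note that Cost Explorer data lags real time by several hours, so this suits reporting rather than live enforcement.

```python
def build_cost_query(tenant_id: str, start: str, end: str) -> dict:
    """Build a GetCostAndUsage request that isolates Amazon Bedrock
    spend for one tenant via its tenantId cost allocation tag.

    start/end are ISO dates, e.g. '2025-04-01' (end is exclusive).
    """
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "Filter": {
            "And": [
                # Service name as it appears in Cost Explorer (assumed).
                {"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock"]}},
                # Requires the tag to be activated as a cost allocation tag.
                {"Tags": {"Key": "tenantId", "Values": [tenant_id]}},
            ]
        },
        "GroupBy": [{"Type": "TAG", "Key": "tenantId"}],
    }


def tenant_bedrock_cost(tenant_id: str, start: str, end: str) -> dict:
    """Fetch daily Bedrock cost for one tenant from Cost Explorer."""
    import boto3  # imported here so the query builder runs without AWS deps

    ce = boto3.client("ce")  # Cost Explorer
    return ce.get_cost_and_usage(**build_cost_query(tenant_id, start, end))
```

The same tag-based grouping carries over to the CUR-plus-Athena route: aggregating the exported line items by the tenant tag column yields the equivalent per-tenant totals for longer-horizon analysis.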
In this blog, we explored the challenges SaaS providers face in tracking tenant-specific usage and costs while building multi-tenant architectures with Amazon Bedrock. We then demonstrated how to leverage Amazon Bedrock inference profiles to create an application inference profile for each tenant. Using these profiles, we showcased how you can monitor tenant-specific Amazon Bedrock usage and costs through Amazon CloudWatch and the AWS Billing and Cost Management service.
AWS allows you to create up to 1,000 inference profiles per region per account, with adjustable limits. For more details, refer to the Amazon Bedrock quotas and limits documentation.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.