Getting started with Bedrock application inference profiles
Learn how Amazon Bedrock's application inference profiles help track and allocate foundation model costs across teams using AWS cost allocation tags.
Published Nov 8, 2024
Last Modified Nov 25, 2024
Amazon Bedrock introduced a new feature called application inference profiles, enabling customers to track their generative AI costs and resource utilization. With this feature, customers can allocate costs for their on-demand foundation model usage across different departments, teams, and applications.
This feature addresses a critical requirement for enterprise cost management and attribution. Customers can now apply custom cost allocation tags to their Amazon Bedrock on-demand model invocations. This blog post provides you with steps and resources to get started.
Amazon Bedrock offers two kinds of inference profiles:
- Cross-region inference profiles – These are predefined by the Bedrock service (system defined) and route model requests across multiple regions. This improves resilience and throughput, and helps you manage traffic bursts effectively.
- Application inference profiles – These are created by users (user defined) to track costs and model usage. You can create an inference profile that routes model invocation requests to a single region (by using a foundation model identifier) or to multiple regions (by using a cross-region inference profile identifier).
Application inference profiles offer the following benefits:
- Track usage metrics – When you enable model invocation logging and record to CloudWatch Logs, you can track requests submitted with an application inference profile and view usage metrics.
- Use tags to monitor costs – You can attach tags to an application inference profile and track costs for on-demand model invocation requests.
- Cross-region inference – Increase your throughput by using a cross-region inference profile when creating the application inference profile, distributing invocations across regions.
Application inference profiles can be used in the following scenarios:
- Model inference – Supported by the InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream operations
- Model evaluation – Submit a model evaluation job that specifies an inference profile ARN
- Knowledge bases – Use an inference profile when generating a response after querying a knowledge base, or when parsing non-textual information in a data source using advanced parsing options
- Prompt management – Use an inference profile when constructing a prompt with specific foundation models
- Prompt flows – Use an inference profile in a prompt flow
Before using application inference profiles, you must:
- Have appropriate IAM permissions through either the AmazonBedrockFullAccess policy or custom policies
- Request access to the models and regions defined in the inference profiles
- Ensure proper configuration of the required API permissions for inference profile-related actions
Specifically, the role you are assuming needs permissions for the following actions in its IAM policy.
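A sketch of such a policy, written as a Python dictionary to stay consistent with the rest of the examples in this post; the exact action list is an assumption based on the operations used here, so trim it to your use case.

```python
import json

# Sketch of an IAM policy for the inference profile operations in this post.
# The action list is an assumption; adjust it to your needs.
inference_profile_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:CreateInferenceProfile",
                "bedrock:GetInferenceProfile",
                "bedrock:ListInferenceProfiles",
                "bedrock:DeleteInferenceProfile",
                "bedrock:TagResource",
                "bedrock:UntagResource",
                "bedrock:ListTagsForResource",
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": "*",
        }
    ],
}
print(json.dumps(inference_profile_policy, indent=2))
```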
You can restrict access to specific resources through the "Resource" element in the IAM policy.
Now that you have met the prerequisites, let's get started.
Ensure that you have the latest version of boto3 to make the API calls. You can upgrade it with the pip install --upgrade command.
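Here is a minimal setup sketch; the region below is an assumption, so change it to the one you work in.

```python
# Upgrade boto3 first if needed (run in your shell):
#   pip install --upgrade boto3
import boto3

region = "us-west-2"  # assumption: use the region you work in

# Control-plane client for managing inference profiles
bedrock = boto3.client("bedrock", region_name=region)
# Runtime client for invoking models
bedrock_runtime = boto3.client("bedrock-runtime", region_name=region)
```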
Now that we have the libraries imported and clients set up, let's check whether any inference profiles already exist in your account. You can do that via the list_inference_profiles API call. Ensure you pass APPLICATION as the profile type; by default, the call lists SYSTEM_DEFINED profiles, which are the cross-region inference profiles.
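A sketch of the call:

```python
# List application inference profiles in the account (if any exist yet)
response = bedrock.list_inference_profiles(typeEquals="APPLICATION")
for profile in response.get("inferenceProfileSummaries", []):
    print(profile["inferenceProfileName"], "->", profile["inferenceProfileArn"])
```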
You can set up an application inference profile with a single region or multiple regions to track usage and costs when invoking a model.
To create an application inference profile for one Region, specify a foundation model's ARN. Usage and costs for requests made to that Region with that model will be tracked. When creating the request, you will supply the following parameters (see the sketch after this list):
- inferenceProfileName - Name for the inference profile
- modelSource - For a single region, specify the ARN of the foundation model in the copyFrom attribute
- description - Description of the inference profile (optional)
- tags - Attach tags to the inference profile. You can track costs using AWS cost allocation tags; this could be your project ID, department ID, or however you want to attribute the cost
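A sketch of the create call; the profile name, description, and tag values are hypothetical.

```python
# Foundation model ARN for the region you want to track
model_arn = (
    "arn:aws:bedrock:us-west-2::foundation-model/"
    "anthropic.claude-3-5-sonnet-20241022-v2:0"
)

response = bedrock.create_inference_profile(
    inferenceProfileName="claims-dept-claude-sonnet",  # hypothetical name
    description="Tracks usage for the claims department",  # hypothetical
    modelSource={"copyFrom": model_arn},
    tags=[
        {"key": "projectID", "value": "claims-2024"},  # hypothetical tags
        {"key": "department", "value": "claims"},
    ],
)
profile_arn = response["inferenceProfileArn"]
print(profile_arn)
```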
You will pass the model ARN as part of the input. The model ARN is the ARN of a specific model, for example, arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0. You can get the model ARN from the Bedrock console or by executing the list_foundation_models API call.
In the example above, the tags are assigned to a specific project, so they can be used to filter the cost allocation report. You can assign tags to attribute costs to different departments or teams.
In the response of the API call, you will get the ARN of the inference profile.
Now that we have created the application inference profile, we can get its details with the get_inference_profile call, as below.
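For example, reusing the profile_arn captured above:

```python
# Retrieve details of the application inference profile created above
details = bedrock.get_inference_profile(inferenceProfileIdentifier=profile_arn)
print(details["inferenceProfileName"], details["status"], details["type"])
```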
To use an inference profile, specify its ARN in the modelId field.
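A minimal invocation sketch via the Converse API, assuming you have access to the underlying model:

```python
# Invoke the model through the application inference profile
result = bedrock_runtime.converse(
    modelId=profile_arn,
    messages=[{"role": "user", "content": [{"text": "Summarize what you can do."}]}],
)
print(result["output"]["message"]["content"][0]["text"])
```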
After the profile is created, you can associate additional tags with it. You can also remove tags that are already associated.
You can view the current tags attached to the resource with the list_tags_for_resource API call.
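For instance:

```python
# View tags currently attached to the inference profile
tags = bedrock.list_tags_for_resource(resourceARN=profile_arn)
print(tags["tags"])
```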
Add a new tag with the tag_resource API call.
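A sketch with a hypothetical cost-center tag:

```python
# Attach an additional tag (the key/value pair is hypothetical)
bedrock.tag_resource(
    resourceARN=profile_arn,
    tags=[{"key": "costCenter", "value": "cc-1234"}],
)
```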
You can view the updated list of tags again with the list_tags_for_resource API call.
If you want to remove a tag that is already associated with the resource, use the untag_resource API call.
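For example, removing the hypothetical tag added above:

```python
# Remove a tag by its key
bedrock.untag_resource(resourceARN=profile_arn, tagKeys=["costCenter"])
```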
To create an application inference profile across regions, specify a cross-region inference profile's ARN; the rest of the parameters remain the same as for a single-region application inference profile:
- inferenceProfileName - Name for the inference profile
- modelSource - For a multi-region application profile, specify the ARN of the cross-region (system-defined) inference profile in the copyFrom attribute
- description - Description of the inference profile (optional)
- tags - Attach tags to the inference profile. You can track costs using AWS cost allocation tags; this could be your project ID, department ID, or however you want to attribute the cost
The key difference between a single-region profile and a multi-region profile is the model source: here you pass the ARN of a cross-region inference profile, as in the sketch below.
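A sketch that looks up a system-defined cross-region profile and copies from it; the profile name and tag are hypothetical.

```python
# Pick a system-defined cross-region inference profile to copy from
sys_profiles = bedrock.list_inference_profiles(typeEquals="SYSTEM_DEFINED")
cris_arn = sys_profiles["inferenceProfileSummaries"][0]["inferenceProfileArn"]

response = bedrock.create_inference_profile(
    inferenceProfileName="claims-dept-multi-region",  # hypothetical name
    description="Multi-region profile for the claims department",  # hypothetical
    modelSource={"copyFrom": cris_arn},
    tags=[{"key": "department", "value": "claims"}],  # hypothetical tag
)
print(response["inferenceProfileArn"])
```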
To complete the cycle, you can remove inference profiles that are no longer required via the delete_inference_profile API. The code below removes all application inference profiles.
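A sketch of the cleanup loop; it deletes every application profile in the current region, so use it with care.

```python
# Delete all application inference profiles in this account and region
app_profiles = bedrock.list_inference_profiles(typeEquals="APPLICATION")
for profile in app_profiles["inferenceProfileSummaries"]:
    bedrock.delete_inference_profile(
        inferenceProfileIdentifier=profile["inferenceProfileArn"]
    )
    print("Deleted", profile["inferenceProfileName"])
```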
You can monitor requests and usage metrics at the application inference profile level. To do this, enable model invocation logging in the Bedrock service settings and record to CloudWatch Logs. Then, from the CloudWatch console, you can track requests submitted with an application inference profile and view usage metrics.
You can find an example notebook in this repo.
Set up a model invocation resource using inference profiles
IAM Policies and permissions for application inference profile
Create an application inference profile
Bedrock boto3 - API reference
Thank you for taking the time to read and engage with this article. Your support in the form of following me and sharing the article is highly valued and appreciated. The views expressed in this article are my own and do not necessarily represent the views of my employer. If you have any feedback or topics you want me to cover, please reach me on LinkedIn.