Getting started with Bedrock application inference profiles
Learn how Amazon Bedrock's application inference profiles help track and allocate foundation model costs across teams using AWS cost allocation tags.
Published Nov 8, 2024
Last Modified Nov 25, 2024
Amazon Bedrock introduced a new feature called application inference profiles, enabling customers to track their generative AI costs and resource utilization. With this feature, customers can allocate costs for their on-demand foundation model usage across different departments, teams, and applications.
This feature addresses a critical requirement for enterprise cost management and attribution. Customers can now apply custom cost allocation tags to their Amazon Bedrock on-demand model invocations. This blog post provides you with steps and resources to get started.
Amazon Bedrock offers two kinds of inference profiles:
- Cross-region inference profiles – These are predefined by the Bedrock service (system defined) and route model requests across multiple regions. This improves resilience and throughput, and helps you manage traffic bursts effectively.
- Application inference profiles – These are created by users (user defined) to track costs and model usage. You can create an inference profile that routes model invocation requests to a single region (by using a foundation model identifier) or to multiple regions (by using a cross-region inference profile identifier).
Application inference profiles offer the following benefits:
- Track usage metrics – When you enable model invocation logging and record to CloudWatch Logs, you can track requests submitted with an application inference profile and view usage metrics.
- Use tags to monitor costs – You can attach tags to an application inference profile and track costs for on-demand model invocation requests.
- Cross-region inference – Increase your throughput by using a cross-region inference profile when creating the application inference profile, distributing invocations across regions.
Application inference profiles can be used in the following scenarios:
- Model inference – Supported by the InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream operations
- Model evaluation – Submit a model evaluation job that specifies an inference profile ARN
- Knowledge bases – Use an inference profile when generating a response after querying a knowledge base, or when parsing non-textual information in a data source using advanced parsing options
- Prompt management – Use an inference profile when constructing a prompt with specific foundation models
- Prompt flows – Use an inference profile in a prompt flow
Before using application inference profiles, you must:
- Have appropriate IAM permissions through either the AmazonBedrockFullAccess policy or custom policies
- Request access to the models and regions defined in the inference profiles
- Ensure proper configuration of the required API permissions for inference profile-related actions
Specifically, the role you are assuming needs permissions for the following actions in its IAM policy.
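A sketch of such a policy, written as a Python dictionary to stay consistent with the rest of the examples in this post; the exact action list is an assumption based on the operations used here, so trim it to your use case.

```python
import json

# Sketch of an IAM policy for the inference profile operations in this post.
# The action list is an assumption; adjust it to your needs.
inference_profile_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:CreateInferenceProfile",
                "bedrock:GetInferenceProfile",
                "bedrock:ListInferenceProfiles",
                "bedrock:DeleteInferenceProfile",
                "bedrock:TagResource",
                "bedrock:UntagResource",
                "bedrock:ListTagsForResource",
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": "*",
        }
    ],
}
print(json.dumps(inference_profile_policy, indent=2))
```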
You can restrict access to specific resources through the "Resource" element in the IAM policy.
Now that you have met the prerequisites, let's get started.
Ensure that you have the latest version of boto3 to make the API calls. You can upgrade it with the pip install --upgrade command.
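Here is a minimal setup sketch; the region below is an assumption, so change it to the one you work in.

```python
# Upgrade boto3 first if needed (run in your shell):
#   pip install --upgrade boto3
import boto3

region = "us-west-2"  # assumption: use the region you work in

# Control-plane client for managing inference profiles
bedrock = boto3.client("bedrock", region_name=region)
# Runtime client for invoking models
bedrock_runtime = boto3.client("bedrock-runtime", region_name=region)
```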
Now that we have the libraries imported and clients set up, let's check whether any inference profiles already exist in your account. You can do that via the list_inference_profiles API call. Ensure you pass APPLICATION as the profile type; by default, the call lists SYSTEM_DEFINED profiles, which are the cross-region inference profiles.
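A sketch of the call:

```python
# List application inference profiles in the account (if any exist yet)
response = bedrock.list_inference_profiles(typeEquals="APPLICATION")
for profile in response.get("inferenceProfileSummaries", []):
    print(profile["inferenceProfileName"], "->", profile["inferenceProfileArn"])
```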
You can set up an application inference profile with a single region or multiple regions to track usage and costs when invoking a model.
To create an application inference profile for one Region, specify a foundation model's ARN. Usage and costs for requests made to that Region with that model will be tracked. When creating the request, you will supply the following parameters (see the sketch after this list):
- inferenceProfileName - Name for the inference profile
- modelSource - For a single region, specify the ARN of the foundation model in the copyFrom attribute
- description - Description of the inference profile (optional)
- tags - Attach tags to the inference profile. You can track costs using AWS cost allocation tags; this could be your project ID, department ID, or however you want to attribute the cost
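A sketch of the create call; the profile name, description, and tag values are hypothetical.

```python
# Foundation model ARN for the region you want to track
model_arn = (
    "arn:aws:bedrock:us-west-2::foundation-model/"
    "anthropic.claude-3-5-sonnet-20241022-v2:0"
)

response = bedrock.create_inference_profile(
    inferenceProfileName="claims-dept-claude-sonnet",  # hypothetical name
    description="Tracks usage for the claims department",  # hypothetical
    modelSource={"copyFrom": model_arn},
    tags=[
        {"key": "projectID", "value": "claims-2024"},  # hypothetical tags
        {"key": "department", "value": "claims"},
    ],
)
profile_arn = response["inferenceProfileArn"]
print(profile_arn)
```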
You will pass the model ARN as part of the input. The model ARN is the ARN of a specific model, for example, arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0. You can get the model ARN from the Bedrock console or by executing the list_foundation_models API call.
In the example above, the tags are assigned to a specific project, so they can be used to filter the cost allocation report. You can assign tags to attribute costs to different departments or teams.
In the response of the API call, you will get the ARN of the inference profile.
Now that we have created the application inference profile, we can get its details with the get_inference_profile call, as below.
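For example, reusing the profile_arn captured above:

```python
# Retrieve details of the application inference profile created above
details = bedrock.get_inference_profile(inferenceProfileIdentifier=profile_arn)
print(details["inferenceProfileName"], details["status"], details["type"])
```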
To use an inference profile, specify its ARN in the modelId field.
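A minimal invocation sketch via the Converse API, assuming you have access to the underlying model:

```python
# Invoke the model through the application inference profile
result = bedrock_runtime.converse(
    modelId=profile_arn,
    messages=[{"role": "user", "content": [{"text": "Summarize what you can do."}]}],
)
print(result["output"]["message"]["content"][0]["text"])
```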
After the profile is created, you can associate additional tags with it. You can also remove tags that are already associated.
You can view the current tags attached to the resource with the list_tags_for_resource API call.
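For instance:

```python
# View tags currently attached to the inference profile
tags = bedrock.list_tags_for_resource(resourceARN=profile_arn)
print(tags["tags"])
```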
Add a new tag with the tag_resource API call.
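A sketch with a hypothetical cost-center tag:

```python
# Attach an additional tag (the key/value pair is hypothetical)
bedrock.tag_resource(
    resourceARN=profile_arn,
    tags=[{"key": "costCenter", "value": "cc-1234"}],
)
```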
You can view the updated list of tags again with the list_tags_for_resource API call.
If you want to remove a tag that is already associated with the resource, use the untag_resource API call.
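For example, removing the hypothetical tag added above:

```python
# Remove a tag by its key
bedrock.untag_resource(resourceARN=profile_arn, tagKeys=["costCenter"])
```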
To create an application inference profile across regions, specify a cross-region inference profile's ARN; the rest of the parameters remain the same as for a single-region application inference profile:
- inferenceProfileName - Name for the inference profile
- modelSource - For a multi-region application profile, specify the ARN of the cross-region (system-defined) inference profile in the copyFrom attribute
- description - Description of the inference profile (optional)
- tags - Attach tags to the inference profile. You can track costs using AWS cost allocation tags; this could be your project ID, department ID, or however you want to attribute the cost
The key difference between a single-region profile and a multi-region profile is the model source: here you pass the ARN of a cross-region inference profile, as in the sketch below.
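A sketch that looks up a system-defined cross-region profile and copies from it; the profile name and tag are hypothetical.

```python
# Pick a system-defined cross-region inference profile to copy from
sys_profiles = bedrock.list_inference_profiles(typeEquals="SYSTEM_DEFINED")
cris_arn = sys_profiles["inferenceProfileSummaries"][0]["inferenceProfileArn"]

response = bedrock.create_inference_profile(
    inferenceProfileName="claims-dept-multi-region",  # hypothetical name
    description="Multi-region profile for the claims department",  # hypothetical
    modelSource={"copyFrom": cris_arn},
    tags=[{"key": "department", "value": "claims"}],  # hypothetical tag
)
print(response["inferenceProfileArn"])
```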
To complete the cycle, you can remove inference profiles that are no longer required via the delete_inference_profile API. The code below removes all application inference profiles.
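A sketch of the cleanup loop; it deletes every application profile in the current region, so use it with care.

```python
# Delete all application inference profiles in this account and region
app_profiles = bedrock.list_inference_profiles(typeEquals="APPLICATION")
for profile in app_profiles["inferenceProfileSummaries"]:
    bedrock.delete_inference_profile(
        inferenceProfileIdentifier=profile["inferenceProfileArn"]
    )
    print("Deleted", profile["inferenceProfileName"])
```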
You can monitor requests and usage metrics at the application inference profile level. To do this, enable model invocation logging in the Bedrock service settings and record to CloudWatch Logs. Then, from the CloudWatch console, you can track requests submitted with an application inference profile and view usage metrics.
You can find an example notebook in this repo.
Set up a model invocation resource using inference profiles
IAM Policies and permissions for application inference profile
Create an application inference profile
Bedrock boto3 - API reference
Thank you for taking the time to read and engage with this article. Your support in the form of following me and sharing the article is highly valued and appreciated. The views expressed in this article are my own and do not necessarily represent the views of my employer. If you have any feedback or topics you want me to cover, please reach me on LinkedIn.