Monitoring foundation models with Amazon CloudWatch
Use Amazon CloudWatch to understand your generative AI chatbot's performance and usage by monitoring Amazon Bedrock foundation model logs and metrics
Bennett Borofka
Amazon Employee
Published Aug 20, 2024
Last Modified Aug 22, 2024
As businesses deploy generative AI applications using Amazon Bedrock, it becomes crucial to monitor foundation model performance and user behavior to understand the health and adoption of the application. Amazon Bedrock provides built-in publishing of metrics and logs to Amazon CloudWatch. If you operate a chatbot that uses Amazon Bedrock's Converse API, Amazon CloudWatch provides an easy method for viewing data about your chatbot's usage in a consolidated dashboard of metrics and logs. In this post, I'll walk through how to get started using Amazon CloudWatch dashboards to gain live observability into all of the Amazon Bedrock foundation models used by a generative AI chatbot.
By default, there are nine runtime metrics published to Amazon CloudWatch that provide performance details about individual foundation models used by Amazon Bedrock. These metrics provide insights about how your generative AI chatbot is performing and being used:
| CloudWatch runtime metric | Monitoring insight |
|---|---|
| `Invocations` | Understand high and low chatbot usage over time; understand overall chatbot adoption. |
| `InvocationLatency`, `InvocationClientErrors`, `InvocationServerErrors`, `InvocationThrottles` | Identify issues with your chatbot that are affecting user experience. |
| `InputTokenCount`, `OutputTokenCount` | Validate the average or trending size of user input prompts over time; verify that response sizes are expected for the selected foundation model configuration. |
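These runtime metrics are published under the `AWS/Bedrock` namespace with a `ModelId` dimension, so you can pull them programmatically as well as view them in the console. Here's a minimal boto3 sketch that retrieves hourly invocation counts for the last day; the model ID is a placeholder, so swap in whichever foundation model your chatbot uses:

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

# Sum of Bedrock invocations per hour for one model over the last 24 hours.
# The ModelId dimension value below is a placeholder.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="Invocations",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-sonnet-20240229-v1:0"}],
    StartTime=datetime.now(timezone.utc) - timedelta(days=1),
    EndTime=datetime.now(timezone.utc),
    Period=3600,  # one datapoint per hour
    Statistics=["Sum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], int(point["Sum"]))
```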
Amazon Bedrock also supports model invocation logging, which is disabled by default. While logs can be sent to either Amazon S3 or Amazon CloudWatch Logs, this post will focus on Amazon CloudWatch Logs.
Foundation model logs provide detailed information about each invocation by your chatbot's users. The logs keep a record of all input and output text:
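A model invocation log entry is a JSON document. The abridged sample below is illustrative only (exact fields and values will differ in your account); it shows where the user's input text and the model's output text are recorded:

```json
{
  "schemaType": "ModelInvocationLog",
  "timestamp": "2024-08-20T15:04:05Z",
  "operation": "Converse",
  "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
  "input": {
    "inputContentType": "application/json",
    "inputBodyJson": {
      "messages": [
        { "role": "user", "content": [ { "text": "What is AWS?" } ] }
      ]
    },
    "inputTokenCount": 4
  },
  "output": {
    "outputContentType": "application/json",
    "outputBodyJson": {
      "output": {
        "message": {
          "role": "assistant",
          "content": [ { "text": "AWS (Amazon Web Services) is a cloud computing platform..." } ]
        }
      }
    },
    "outputTokenCount": 42
  }
}
```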
I'll use these logs to parse the initial prompts users enter when using an Amazon Bedrock chatbot. In the above example, the input prompt is `"What is AWS?"`.

To create a single dashboard showing consolidated foundation model metrics and logs, I'll use Amazon CloudWatch. Automatic Dashboards provide a starting-point dashboard of metrics that we'll modify to include logs.
Before enabling model invocation logging, you'll need to create an Amazon CloudWatch log group. In this example (figure 1), I create a log group named `/aws/bedrock` and set the retention setting to 1 month (30 days). Leave the log class as Standard. Click Create and a new, empty log group will be created.
Note: The retention setting is a balance between how much log history you want to retain vs. how much you're willing to pay for storage. For more information about CloudWatch Logs costs, visit the pricing page.
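If you prefer to script this step, the same log group can be created with boto3. A minimal sketch, assuming the `/aws/bedrock` name and 30-day retention from above:

```python
import boto3

logs = boto3.client("logs")

# Create the empty log group that Bedrock will publish invocation logs to.
logs.create_log_group(logGroupName="/aws/bedrock")

# Match the console example: retain log events for 30 days.
logs.put_retention_policy(logGroupName="/aws/bedrock", retentionInDays=30)
```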
In the Amazon Bedrock console, you'll need to turn on model invocation logging. In figure 2, I select CloudWatch Logs only, and fill in the `/aws/bedrock` Log group name I created in step 1. I also select Create and use a new role in IAM, naming it `bedrock-cloudwatch-logs`.

Click Save settings and Amazon Bedrock will begin publishing logs to a new log stream under the Amazon CloudWatch log group.
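You can also enable invocation logging programmatically. Below is a minimal boto3 sketch that points Bedrock at the log group from step 1; the role ARN is a placeholder for the IAM role created above, which must allow Amazon Bedrock to write to the log group:

```python
import boto3

bedrock = boto3.client("bedrock")

# Placeholder ARN for the bedrock-cloudwatch-logs role created in the console.
role_arn = "arn:aws:iam::123456789012:role/bedrock-cloudwatch-logs"

bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": "/aws/bedrock",
            "roleArn": role_arn,
        },
        # Record prompt and response text, which the dashboard query relies on.
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }
)
```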
To monitor input prompts from your chatbot's users invoking foundation models, I'll create and save a query in Amazon CloudWatch Logs Insights. This example focuses on the newer Converse API, which is the recommended API for foundation models that support messages. For a Converse API overview, review this post.
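For context, a Converse API call from a chatbot looks roughly like the following boto3 sketch; the model ID is a placeholder, and this is the kind of invocation that produces the log entries queried below:

```python
import boto3

# The Converse API lives in the bedrock-runtime service.
bedrock_runtime = boto3.client("bedrock-runtime")

# Placeholder model ID; use whichever foundation model your chatbot calls.
response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": "What is AWS?"}]}],
)

print(response["output"]["message"]["content"][0]["text"])
```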
The query below will gather log messages from the foundation model logs published to Amazon CloudWatch, with the following conditions:

- Select logs where the `Converse` or `ConverseStream` APIs are used.
- Ignore common error logs: `ThrottlingException` and `ValidationException`.
- Parse the first user input message in the log, and remove duplicates.
- Limit the result to the most recent ten logs.
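Here is a minimal sketch of such a query in Logs Insights syntax, assuming the log structure from the sample entry shown earlier; it matches exception names against the raw `@message`, which is one simple way to drop throttling and validation errors:

```
fields @timestamp, input.inputBodyJson.messages.0.content.0.text as firstUserMessage
| filter operation in ["Converse", "ConverseStream"]
| filter @message not like /ThrottlingException/ and @message not like /ValidationException/
| filter ispresent(input.inputBodyJson.messages.0.content.0.text)
| sort @timestamp desc
| dedup firstUserMessage
| limit 10
```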
This simple query will give insight into some of the initial prompts used by the chatbot's users, which I'll display on the Amazon CloudWatch dashboard.
Note: due to the structure of the log, a single log entry may contain multiple `messages` in an array. In this example, I use the `input.inputBodyJson.messages.0.content.0.text` field to display the first instance of text in a message array.
To create the query, navigate to the Amazon CloudWatch console and open Logs Insights. Select the `/aws/bedrock` log group created in step 1, then paste in the above query and click Run query. As shown in figure 3, you will see results of the query in the bottom table, so long as there are log messages matching the query within the specified time period. After you run your query successfully, click Save and give it a Query name of `ModelInput`.

In the Amazon CloudWatch console, navigate to the Dashboards page and click the Automatic dashboards tab. Bedrock will appear as an available dashboard since the service is publishing metrics to Amazon CloudWatch. Click Bedrock and we'll use the example dashboard as a starting point.
As you'll see in figure 4, the automatic dashboard has 6 metrics displayed on line widgets, showing multiple foundation model datapoints in each chart. Click Add to dashboard and create a new dashboard using this one as the starting point.
After you create your new Amazon CloudWatch dashboard, click the + in the top right to add a new widget. In this example, I'll create a new Logs table widget under the Logs tab as shown in figure 5.
To configure your Logs table widget, select the `ModelInput` query you created earlier in step 3. Click Create widget at the top, as shown in figure 6.

After you save your widget, you'll have a logs table displayed underneath the metrics in the live dashboard, showing a recent list of input prompts for your foundation models. The combined dashboard is shown in figure 7.
Some potential enhancements you can make in Amazon CloudWatch, not covered in this post, are:
- Add additional widgets to the top of the dashboard highlighting key metrics, such as a number widget showing current invocation count and invocation latency.
- Add a static threshold alarm, such as when high throttling or errors occur (see the sketch after this list).
- Add an anomaly detection alarm, such as when any metric goes above or below a range of normal values.
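As an example of the static threshold idea, here is a minimal boto3 sketch of an alarm that fires when a model is heavily throttled; the model ID, threshold, and alarm name are placeholder choices, not values from this post:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical alarm: fire when a single model records more than 10 throttled
# invocations in a 5-minute period. Model ID and threshold are placeholders.
cloudwatch.put_metric_alarm(
    AlarmName="bedrock-chatbot-throttling",
    Namespace="AWS/Bedrock",
    MetricName="InvocationThrottles",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-sonnet-20240229-v1:0"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=10,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",  # no invocations means no throttling
)
```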
In this post, I walk through creating an Amazon CloudWatch dashboard to view live metrics and logs for an Amazon Bedrock chatbot that uses the Converse API. Using Automatic Dashboards and Logs Insights, you can easily set up a view of your foundation models' usage to understand live chatbot performance. The monitoring capabilities in Amazon CloudWatch help businesses that deploy and iterate on new generative AI chatbot applications to quickly understand user adoption and diagnose issues.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.