
Claude's Token Efficient Tool Use on Amazon Bedrock
Learn to implement Claude 3.7's token efficient tool use on Amazon Bedrock. Reduce output tokens by up to 70%, improve latency, and optimize costs.
Jonathan Evans
Amazon Employee
Published Mar 18, 2025
Claude's tool use (function calling) capability enables AI applications to interact with external systems, extract structured data, and perform complex tasks. However, these interactions can consume a significant number of tokens, impacting both cost and latency.
Anthropic has introduced a new beta feature called token efficient tool use for Claude 3.7 Sonnet on Amazon Bedrock. This feature can reduce output token consumption by up to 70% (with an average reduction of 14%) when using tools or function calling with Claude.
In this article, I'll show you how to implement token efficient tool use with three different methods:
- Amazon Bedrock's `InvokeModel` API
- Amazon Bedrock's `Converse` API
- The AnthropicBedrock SDK
I'll also compare the performance of each method and provide best practices for implementing token efficient tool use in your applications.
Note: All code examples in this article are available in the companion Jupyter notebook in the anthropic-on-aws GitHub repository. This repository contains many other useful examples for working with Claude on AWS!
To follow along with this tutorial, you'll need:
- An AWS account with access to Amazon Bedrock
- Python 3.6+
- Basic understanding of Amazon Bedrock and Claude APIs
Let's start by installing the required dependencies:
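The original notebook's install cell isn't shown here; a minimal equivalent (assuming you only need the `boto3` and `anthropic` packages) is:

```shell
pip install --upgrade boto3 anthropic
```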
First, let's set up our AWS clients:
For our examples, we'll use a simple weather tool:
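The tool definition itself isn't reproduced above, so here is an illustrative version. The schema details (parameter names, descriptions) are assumptions; note that the Converse API wraps the same JSON Schema in a `toolSpec` envelope:

```python
# Anthropic-native tool schema, used by InvokeModel and the SDK.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name, e.g. 'Seattle, WA'",
            },
        },
        "required": ["location"],
    },
}

# The Converse API expects the same schema wrapped in a toolSpec.
weather_tool_converse = {
    "toolSpec": {
        "name": weather_tool["name"],
        "description": weather_tool["description"],
        "inputSchema": {"json": weather_tool["input_schema"]},
    }
}
```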
Let's implement token efficient tool use with Bedrock's InvokeModel API:
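A minimal sketch of the request follows. The model ID, prompt, and tool schema are illustrative assumptions; the part that matters is the `anthropic_beta` entry in the request body:

```python
import json

request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    # This beta flag is what turns on token efficient tool use:
    "anthropic_beta": ["token-efficient-tools-2025-02-19"],
    "max_tokens": 1024,
    "tools": [{
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }],
    "messages": [{"role": "user", "content": "What's the weather in Seattle?"}],
}

def invoke_with_efficient_tools(bedrock_runtime):
    # bedrock_runtime is a boto3 "bedrock-runtime" client
    response = bedrock_runtime.invoke_model(
        modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
        body=json.dumps(request_body),
    )
    return json.loads(response["body"].read())
```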
The key difference is the addition of the `"anthropic_beta": ["token-efficient-tools-2025-02-19"]` parameter in the request body.

The Converse API provides a unified interface for conversational interactions. Here's how to implement token efficient tool use with Converse:
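A hedged sketch of the Converse call follows; the model ID and tool schema are illustrative, and `bedrock_runtime` stands for a boto3 `bedrock-runtime` client:

```python
# Converse has no first-class anthropic_beta parameter, so the beta
# flag rides along in additionalModelRequestFields.
TOKEN_EFFICIENT_FIELDS = {
    "anthropic_beta": ["token-efficient-tools-2025-02-19"],
}

def converse_weather(bedrock_runtime, prompt):
    return bedrock_runtime.converse(
        modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        toolConfig={
            "tools": [{
                "toolSpec": {
                    "name": "get_weather",
                    "description": "Get the current weather for a location",
                    "inputSchema": {"json": {
                        "type": "object",
                        "properties": {"location": {"type": "string"}},
                        "required": ["location"],
                    }},
                }
            }]
        },
        additionalModelRequestFields=TOKEN_EFFICIENT_FIELDS,
    )
```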
With the Converse API, we add the beta flag in the `additionalModelRequestFields` parameter.

The AnthropicBedrock SDK provides a Python-native interface to Anthropic models on Bedrock:
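A sketch of the SDK call follows. The region and model ID are assumptions, and the import is deferred so the snippet loads even without the `anthropic` package installed:

```python
def messages_with_efficient_tools(prompt, tools):
    # Deferred import; install the SDK with `pip install anthropic`.
    from anthropic import AnthropicBedrock

    client = AnthropicBedrock(aws_region="us-west-2")
    # The beta.messages client accepts a `betas` list, which the SDK
    # sends as the anthropic-beta header.
    return client.beta.messages.create(
        model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
        max_tokens=1024,
        tools=tools,
        messages=[{"role": "user", "content": prompt}],
        betas=["token-efficient-tools-2025-02-19"],
    )
```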
Notice that for token efficient tool use, we use the `beta.messages` client instead of the standard `messages` client.

I ran several tests with different weather-related prompts to compare token usage between standard and token efficient tool use across all three methods. Here are the results:
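The savings figures below are computed from the `usage` block that each API returns. A small helper like the following (an illustrative sketch, not code from the original notebook) does the arithmetic:

```python
def percent_savings(standard_output_tokens, efficient_output_tokens):
    """Percentage of output tokens saved by the token efficient beta."""
    saved = standard_output_tokens - efficient_output_tokens
    return round(100.0 * saved / standard_output_tokens, 1)

# Example: 500 output tokens without the beta header, 430 with it.
print(percent_savings(500, 430))  # 14.0
```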


Our tests showed consistent token savings of around 14-15% across all three methods, which aligns with Anthropic's claim of an average 14% reduction. Some prompts saw savings as high as 18%, while others were closer to 12%.
Based on our experiments and the documentation, here are some best practices for using token efficient tool use:
- Always benchmark with your specific use case: The token savings can vary significantly depending on your prompts and tools. While the average reduction is around 14%, you may see anywhere from 5% to 70% savings.
- Consistency in caching: If you're using prompt caching along with token efficient tool use, make sure to use the beta header consistently for requests you'd like to cache. Selective use will cause prompt caching to fail.
- SDK version compatibility: Make sure you're using the latest version of the AnthropicBedrock SDK that supports the beta features.
- Not compatible with `disable_parallel_tool_use`: Token efficient tool use doesn't currently work with the `disable_parallel_tool_use` option.
- Response quality monitoring: Since this is a beta feature, evaluate the responses you get with token efficient tool use to confirm that the reduction in tokens doesn't degrade output quality.
- Latency benefits: Besides token savings, token efficient tool use often results in reduced latency, which can be a significant benefit for interactive applications.
- Use with caution in production: Since this is a beta feature, consider testing thoroughly before deploying to production systems.
Token efficient tool use is a valuable feature for reducing output token consumption and improving latency when using Claude's tool use capabilities. Our experiments showed:
- Token savings ranging from 0% to 29% across different prompts and methods
- An average token savings of about 18% when using the InvokeModel API
- An average token savings of about 13% when using the Converse API
- An average token savings of about 18% when using the AnthropicBedrock SDK
- Significant latency improvements in many cases, sometimes up to 80%
This feature is particularly valuable for applications that make heavy use of tool calls: the savings add up significantly over time, reducing costs and improving the user experience through lower latency.
To enable token efficient tool use, simply add the beta header `token-efficient-tools-2025-02-19` to your requests with Claude 3.7 Sonnet on Amazon Bedrock.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.