
Claude 3.5 Sonnet v2: Double Output Tokens on AWS Bedrock

Claude 3.5 Sonnet v2 doubles output limit to 8K tokens on Bedrock at the same price. See empirical evidence and learn how this upgrade benefits AI applications still using it.

Jonathan Evans
Amazon Employee
Published Mar 27, 2025
Last Modified Mar 28, 2025

Claude 3.5 Sonnet v2: Double the Output Capacity with 8k Token Limit

While Claude 3.5 Sonnet is now something of a legacy model following the release of Claude 3.7 Sonnet, many developers still use it in their applications. If you're among them, you should know that Anthropic's previous flagship mid-size model on Amazon Bedrock received a significant upgrade from v1 to v2: the output token limit doubled from 4k to 8k tokens at the same price. This post provides empirical evidence of that improvement and demonstrates how it benefits developers still building with Claude 3.5 Sonnet on Amazon Bedrock.

Why Output Token Limits Matter

Output token limits constrain how much content an AI model can generate in a single response. For applications requiring detailed explanations, code examples, or comprehensive analyses, having a higher output limit means:
  • Fewer API calls to get complete responses
  • More comprehensive answers in a single interaction
  • Reduced latency from fewer round trips
  • Better user experience with cohesive outputs

The Experiment

I designed a simple experiment to verify the difference between Claude 3.5 Sonnet v1 and v2 by:
  1. Creating an identical prompt that would push both models to their limits
  2. Using the AWS Bedrock `converse` API with identical settings
  3. Measuring the actual token usage and stop reasons
  4. Comparing the completeness of responses
You can find a notebook that walks through this experiment for your own testing in the Anthropic on AWS GitHub repo.

Setup

I used the following model IDs for testing:
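These are the standard Bedrock identifiers for the two versions, set up with a boto3 runtime client. A minimal sketch (the region below is just an example, and depending on your account you may need a cross-region inference profile such as the "us." prefix instead of the bare model ID):

```python
import boto3

# Standard Bedrock model IDs for the two Claude 3.5 Sonnet versions.
# Depending on your region/account you may need an inference profile
# (e.g. "us.anthropic....") instead of the bare model ID.
MODEL_IDS = {
    "Claude 3.5 Sonnet v1": "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "Claude 3.5 Sonnet v2": "anthropic.claude-3-5-sonnet-20241022-v2:0",
}

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
```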
And created a prompt asking for a verbose response:
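The exact wording in the notebook may differ, but any prompt that pushes the model well past 4k output tokens works. A sketch of the prompt and a small helper that calls the `converse` API and prints the measurements we care about (output tokens, input tokens, stop reason):

```python
# Illustrative prompt: anything that forces a very long, detailed answer.
PROMPT = (
    "Write an exhaustive tutorial on building a REST API in Python, "
    "including full code examples and unit tests for every endpoint."
)

def run_experiment(model_name, model_id, max_tokens):
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": PROMPT}]}],
        inferenceConfig={"maxTokens": max_tokens, "temperature": 0},
    )
    usage = response["usage"]
    print(
        f"{model_name}: output={usage['outputTokens']} "
        f"input={usage['inputTokens']} stopReason={response['stopReason']}"
    )

# v1 caps output at 4096 tokens; v2 accepts requests up to 8192.
run_experiment("Claude 3.5 Sonnet v1", MODEL_IDS["Claude 3.5 Sonnet v1"], 4096)
run_experiment("Claude 3.5 Sonnet v2", MODEL_IDS["Claude 3.5 Sonnet v2"], 8192)
```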

Results

The results were definitive:
Model                  Output Tokens   Input Tokens   Stop Reason
Claude 3.5 Sonnet v1   4096            128            max_tokens
Claude 3.5 Sonnet v2   6270            128            end_turn
Both models attempted to complete the task, but Claude 3.5 Sonnet v1 hit its 4,096-token output limit. The v2 model generated roughly 50% more content (6,270 tokens) and stopped naturally with an end_turn stop reason, well within its 8k limit.

Visual Comparison

Chart showing Claude 3.5 Sonnet v1 with 4096 output tokens and v2 with 8192 output tokens

Implications for Developers

This doubling of output capacity offers significant advantages:
  1. More comprehensive responses: Get complete answers to complex questions without breaking them into multiple requests
  2. More efficient token usage: Instead of spending tokens on repeated "please continue" prompts to finish a response (see the sketch after this list), you can get everything in one request
  3. Better for code generation: Code examples, especially with tests (like in our experiment), require significant space to be properly demonstrated
  4. Improved documentation tasks: Generate more detailed documentation, tutorials, or explanations in a single response
  5. Cost efficiency: Same pricing with twice the output capacity means better value
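To make the second point concrete, here is a rough sketch of the continuation loop that a 4k output cap often forces. The function and its retry logic are illustrative rather than part of the experiment; with v2's 8k limit, most long answers finish in a single call and the loop rarely runs more than once.

```python
# Illustrative sketch: the "please continue" loop a 4k output cap often forces.
def converse_until_done(bedrock, model_id, prompt, max_tokens=4096, max_rounds=4):
    messages = [{"role": "user", "content": [{"text": prompt}]}]
    parts = []
    for _ in range(max_rounds):
        response = bedrock.converse(
            modelId=model_id,
            messages=messages,
            inferenceConfig={"maxTokens": max_tokens},
        )
        assistant_msg = response["output"]["message"]
        parts.append(assistant_msg["content"][0]["text"])
        if response["stopReason"] != "max_tokens":
            break  # Finished naturally (end_turn).
        # Feed the partial answer back and ask the model to keep going.
        messages.append(assistant_msg)
        messages.append({"role": "user", "content": [{"text": "Please continue."}]})
    return "".join(parts)
```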

Conclusion

The upgrade from Claude 3.5 Sonnet v1 to v2 represents a significant improvement in output capacity at no additional cost. This means developers can build more sophisticated applications with fewer API calls and better user experiences.
For tasks requiring detailed, lengthy outputs (code generation, documentation, analysis), Claude 3.5 Sonnet v2 is now a much more capable solution on Amazon Bedrock.
Have you tried Claude 3.5 Sonnet v2? Share your experiences in the comments below!
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
