
On Large Context Transfers in Agentic Workflows

This blog discusses the challenges of transferring large documents in complex agentic workflows.

Chae
Amazon Employee
Published Jun 3, 2025
In the rapidly evolving landscape of generative AI, agentic frameworks have emerged as powerful tools for orchestrating complex workflows. These frameworks allow large language models (LLMs) to interact with external tools, functions, and even other agents, creating sophisticated systems capable of solving intricate problems. But as these agent architectures grow in complexity, we encounter significant challenges in efficiently transferring large amounts of text between them. This blog explores these challenges and presents a practical solution using AWS services.

The Challenge of Document Transfer in Agentic Workflows

Agentic workflows are changing how we interact with LLMs by enabling them to call tools or functions when needed. The ability of modern LLMs to understand when to use a tool and how to format the necessary parameters has opened up a range of possibilities for automation and augmentation of human tasks. These models can write function calls with appropriate parameters, making them excellent orchestrators for complex workflows.
As agent architectures become more sophisticated, we increasingly see patterns where one agent calls another agent or an LLM as a tool within its workflow. This nested structure often requires passing substantial context between components, such as documents, previous conversation history, or large chunks of generated text. This is where we encounter our challenge.
Protocols like Model Context Protocol (MCP) and Agent2Agent (A2A) require all communications to be formatted as JSON. While this works well for small parameters (like integers or short strings), it becomes problematic when dealing with large text documents. The issue isn't merely about transferring large strings (although that can become a bottleneck); the core problem is that the LLM or agent must write the entire text string as a tool parameter. If multiple tool or agent calls are required, this text may be transferred repeatedly, significantly increasing token usage and cost while also raising the probability of transcription errors.
Consider a scenario where an agent needs to summarize a 50-page document and then pass both the original document and the summary to another agent for analysis. The naive approach would require the first agent to include the entire document text in its call to the second agent, consuming tokens unnecessarily and risking errors in the transcription process.

A Document-Reference Solution

Rather than passing entire documents directly as parameters, a more efficient approach is to pass references to documents stored in a persistent location (for developers, this may call to mind the pass-by-reference versus pass-by-value distinction in programming languages). These references could be local file paths or URIs pointing to cloud storage locations like Amazon S3. This approach requires supporting file operations within the agent framework, such as write_file, read_file, and similar functions.
With this, we can dramatically reduce token usage and eliminate transcription errors when transferring large documents between agents or tools. Instead of passing the entire document content, we simply pass a reference string like s3://your-bucket/documents/large-report.txt, which the receiving tool/agent can use to retrieve the document when needed.
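To make the contrast concrete, here is roughly what the two tool-call payloads might look like, shown as Python dictionaries. The tool name (analyze_document) and parameter names are illustrative placeholders, not part of any specific framework or protocol:

# Hypothetical tool-call payloads; tool and parameter names are illustrative.

# Naive approach: the model must re-emit the entire document as a string argument.
naive_tool_call = {
    "tool": "analyze_document",
    "arguments": {
        "document_text": "<...the full 50-page document, transcribed token by token...>",
        "summary_text": "<...the full summary...>",
    },
}

# Reference-based approach: the model only writes two short URI strings.
reference_tool_call = {
    "tool": "analyze_document",
    "arguments": {
        "document_uri": "s3://your-bucket/documents/large-report.txt",
        "summary_uri": "s3://your-bucket/documents/large-report-summary.txt",
    },
}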

Practical Implementation with Amazon S3

Let's explore how this solution would work using S3 as our document storage layer. First, we need to implement file operation tools that our agents can use:
1. write_to_s3 - takes content and a destination path, then writes the content to S3
2. read_from_s3 - takes an S3 URI and returns the content

Sample Implementation
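Below is a minimal sketch of the two tools using boto3. The s3://bucket/key URI convention and the plain-text assumption are illustrative choices; a production implementation would also want error handling, content types, and size limits:

import boto3

s3 = boto3.client("s3")

def write_to_s3(content: str, s3_uri: str) -> str:
    """Write text content to an s3://bucket/key URI and return that URI."""
    bucket, key = s3_uri.removeprefix("s3://").split("/", 1)
    s3.put_object(Bucket=bucket, Key=key, Body=content.encode("utf-8"))
    return s3_uri

def read_from_s3(s3_uri: str) -> str:
    """Read text content back from an s3://bucket/key URI."""
    bucket, key = s3_uri.removeprefix("s3://").split("/", 1)
    response = s3.get_object(Bucket=bucket, Key=key)
    return response["Body"].read().decode("utf-8")

These two functions would then be registered as tools in your agent framework of choice, so the model can call them by name rather than carrying document text in its own output.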

A Novel-Writing Example

To illustrate the benefits of this approach, let's consider an agentic workflow designed to write a novel. This is an apt example because it involves generating and maintaining a large, growing document that needs to be referenced repeatedly.
In a naive implementation, each time the agent generates a new chapter, it would need to include all previous chapters in its prompt to maintain consistency in plot, characters, and setting. Whenever that context is handed off between agents or tools, it also has to be serialized into the JSON payload of each call. As the novel grows, this becomes increasingly inefficient and error-prone.
Using our reference-based approach with S3, the workflow would look something like this:
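A hedged sketch of that loop, building on the write_to_s3 and read_from_s3 tools above, might look like the following. The generate_chapter function stands in for whatever agent or LLM call your framework makes, and the bucket and key layout are placeholders:

def generate_chapter(outline: str, prior_chapter_uris: list[str]) -> str:
    # Placeholder for the agent/LLM call that drafts the next chapter.
    # The agent receives only short URIs and fetches earlier chapters via
    # read_from_s3 when it needs them, instead of having every previous
    # chapter copied into its prompt.
    raise NotImplementedError

bucket = "your-bucket"
outline = "A short outline of the plot, characters, and setting."
chapter_uris: list[str] = []

for chapter_number in range(1, 6):
    # The prompt carries only a handful of short URI strings, no matter
    # how long the earlier chapters have become.
    chapter_text = generate_chapter(outline, chapter_uris)

    uri = f"s3://{bucket}/novel/chapter-{chapter_number:02d}.txt"
    write_to_s3(chapter_text, uri)
    chapter_uris.append(uri)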
In the above example, instead of passing the full text of previous chapters directly in the prompt, we're passing S3 references. The agent can then retrieve these chapters only when needed, reducing token usage and eliminating the risk of transcription errors.
As the novel grows from 1 to 5 chapters, the token savings become increasingly significant. With a direct approach, by chapter 5, we might be passing 4 full chapters of context (easily tens of thousands of tokens). With our reference-based approach, we're only passing a few short URI strings, regardless of how large the novel becomes.
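As a rough back-of-the-envelope comparison (the chapter and URI lengths below are assumptions, not measurements), suppose each chapter is about 8,000 tokens and an S3 URI costs about 20 tokens:

chapter_tokens = 8_000  # assumed average chapter length in tokens
uri_tokens = 20         # assumed cost of one S3 URI in tokens

# Context tokens written into the prompt for each new chapter.
for n in range(1, 6):
    naive = (n - 1) * chapter_tokens   # every previous chapter, verbatim
    referenced = (n - 1) * uri_tokens  # one short URI per previous chapter
    print(f"chapter {n}: naive={naive:>6}, referenced={referenced:>4}")

Under these assumptions, by chapter 5 the naive approach writes roughly 32,000 tokens of prior context into the call, while the reference-based approach writes about 80.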

Benefits Beyond Token Efficiency

While token efficiency is the most obvious benefit here, there are a few others worth considering:
  1. Reduced Error Rates: Eliminating the need for the LLM to transcribe large documents reduces the chance of errors or hallucinations in the transferred content.
  2. Scalability: The cost of passing a reference stays constant regardless of document size, making this approach suitable for very large documents that might otherwise exceed context windows.
  3. Persistent Storage: Documents stored in S3 persist beyond the lifetime of a single agent run, enabling long-running or multi-session workflows (with the naive approach, an error partway through would force you to start over from scratch).
  4. Access Control: S3 provides robust access control mechanisms, allowing you to restrict which agents or systems can access specific documents.
  5. Version Control: S3 supports versioning, enabling you to maintain a history of document changes throughout a complex agent workflow.

Some Limitations

There are a few limitations that should be considered with this approach:
  1. Access Control (yes, the same item as above): The tool itself needs the appropriate IAM permissions to read from and write to S3; a minimal policy is sketched after this list.
  2. Cost Considerations: While you save on token costs, you incur S3 storage and request costs. However, these should be much lower than the equivalent token costs for large documents.
  3. Tool Implementation: This approach requires implementing and maintaining additional tools for file operations.
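To illustrate the first limitation, the role or credentials behind the file-operation tools need at least s3:GetObject and s3:PutObject on the prefix your agents use. A minimal policy document, with a placeholder bucket name and prefix, might look like this (expressed here as a Python dictionary):

# Minimal IAM permissions for the file-operation tools; the bucket name and
# prefix are placeholders for whatever your workflow actually uses.
s3_tool_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::your-bucket/documents/*",
        }
    ],
}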

In Closing

As agent frameworks continue to evolve and tackle increasingly complex tasks, efficient document transfer becomes a requirement for building scalable and cost-effective solutions. By leveraging a reference-based approach, we can overcome the limitations of direct document transfer in JSON parameters, enabling agents to work with documents of any size while minimizing token usage and error rates.
This pattern is particularly valuable for workflows involving iterative document generation or processing, such as our novel-writing example. By storing intermediate results in S3 and passing references rather than full content, we can build agent systems that scale efficiently with document size and complexity.
As you continue to design your agent workflows, consider implementing similar patterns for document transfer. The small upfront investment in building file operation tools will pay significant dividends in reduced token usage, improved reliability, and enhanced scalability as your agent workflows grow in complexity.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
