
AI Agents Delegating to Other Agents

Improve your AI agent systems through agent task delegation.

Ross Alas
Amazon Employee
Published Oct 1, 2024
Follow me on LinkedIn for more AWS & GenAI-related content: www.linkedin.com/in/ross-alas
In my previous discussions, I explored the concepts of building custom AI agents and designing tools for autonomous AI agents. Today, let's explore these ideas further by introducing a new concept: Agent Task Delegation.

Limitations of Single AI Agents

To understand why we need agent task delegation, it's important to first consider the limitations of a single AI agent.
One of the primary limitations of the large language models powering AI agents is the context window. The context window is the number of tokens (a rule of thumb is that 100 tokens is about 75 words) a large language model can process at a time. Anthropic Claude 3.5 Sonnet, for example, has a context window of 200k tokens. As a conversation progresses and the AI agent does things, your application needs to record the user and assistant messages in the Messages List (see my previous blog on how this works). The entire messages list is passed into the LLM on every call. As you can imagine, you can quickly hit context window limitations, especially for long-running and complex tasks.
There are different ways to deal with the growing messages list, such as summarization, retaining only the last k messages, a combination of both, using a vector store, and so on. There are pros and cons to each. For example, one of the challenges with a summarization or last-k-messages approach is that the agent tends to lose track of what it has done. Another challenge is that as the messages list and its token count grow, the LLM's output quality and retrieval performance tend to degrade. I'll talk more about these approaches in future blog posts, but for now let's focus on Agent Task Delegation as a way to overcome these challenges.
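To make the last-k idea concrete, here is a minimal sketch; the helper name and the cutoff of 20 messages are illustrative choices, not a recommendation from this post:

```python
def trim_messages(messages, k=20):
    """Naive last-k strategy: keep only the most recent k messages.

    In practice you'd also make sure the trimmed list still starts with a
    'user' turn, since most chat APIs require the conversation to begin there.
    """
    trimmed = messages[-k:]
    # Drop leading assistant/tool turns so the list starts with a user message.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```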

Understanding Agent Task Delegation

Figure 1. Agent task delegation patterns
The idea behind Agent Task Delegation is simple: give an agent a tool that can create other sub-agents, and provide each sub-agent with an objective to accomplish and the expected output. The tool, in my case the DelegateToAgent tool, is able to create these agent workers, run them independently of the calling agent, and then return their responses. The agent workers themselves can be equipped with the same DelegateToAgent tool, and they too can call other agents, resulting in nested delegation.
Since the sub-agents that get started have message lists independent of the calling agent's, this greatly multiplies the working capacity (context window) of the system as a whole. For example, if the main agent uses Anthropic Claude 3.5 Sonnet, which has a 200k context window, and creates three other agents, each of those agents has its own 200k context window.
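Here's a minimal sketch of what such a tool handler could look like on Amazon Bedrock. The function name, system prompt, and single converse() call are illustrative stand-ins; a real worker would run a full tool-use loop like the one in my earlier posts.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# Assumed Bedrock model ID for Claude 3.5 Sonnet; check the model catalog.
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

def delegate_to_agent(objective: str, expected_output: str) -> str:
    """Sketch of a DelegateToAgent-style tool handler.

    The sub-agent starts with its own, empty message list, so its context
    window is independent of the calling agent's.
    """
    # Fresh message list for the sub-agent -- this is what multiplies the
    # working capacity of the overall system.
    messages = [{
        "role": "user",
        "content": [{"text": f"Objective: {objective}\nExpected output: {expected_output}"}],
    }]
    response = bedrock.converse(
        modelId=MODEL_ID,
        system=[{"text": "You are a worker agent. Complete the objective and "
                         "return only the expected output."}],
        messages=messages,
    )
    # Only the sub-agent's final answer flows back to the caller,
    # not its intermediate messages.
    return response["output"]["message"]["content"][0]["text"]
```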
Fig 1. shows just a few examples of agent delegation patterns. Let's go through each one.
Increasing working capacity and parallelization
Fig 1a shows an agent being given a task to research 30 different companies. The first agent creates a plan and decides to delegate to three agents with 10 companies each. Each of these agents is also equipped with the DelegateToAgent tool. Perhaps the agent that receives 10 companies will delegate them to two other agents because its set covers two different industries. How these agents delegate is something you can control through prompt engineering; you could very well spin off 30 different agents, with one agent per company.
Agent task delegation enables you to parallelize your agent systems, leading to faster and potentially higher quality results.
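As a rough illustration, the delegated research agents could be run in parallel with a thread pool. The batching helper below is hypothetical and reuses the delegate_to_agent sketch from the previous section.

```python
from concurrent.futures import ThreadPoolExecutor

def research_in_parallel(batches):
    """Run one delegated agent per batch of companies, concurrently.

    `batches` is a list of lists of company names (e.g. three batches of ten).
    `delegate_to_agent` is the sketch defined earlier.
    """
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        futures = [
            pool.submit(
                delegate_to_agent,
                objective=f"Research generative AI adoption at: {', '.join(batch)}",
                expected_output="A short report per company",
            )
            for batch in batches
        ]
        # Collect each sub-agent's final report as it completes.
        return [f.result() for f in futures]
```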
Improving performance through specialization
Jack-of-all-trades agents seldom do well in every kind of task. By creating specialized agents, agents equipped with tools and prompts tuned for a particular task such as a SQL Agent, you can delegate requests to them and get higher quality results. You can see in Fig 1b that the first agent routes requests to a SQL Agent, then feeds those results into a Data Science Agent, and finally into a Copy Writing Agent.
Specialization becomes important as your toolset grows. If you end up with 30-50 different tools, trying to do it all in one agent won't perform as well, since the agent has to reason through more tools, and it will be more costly to run because each tool adds to the agent's input tokens.
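A hedged sketch of what such a routing table might look like; the specialist names, system prompts, tool names, and the run_agent helper are all hypothetical placeholders for your own agent loop:

```python
# Hypothetical registry of specialized agents: each entry pairs a tuned
# system prompt with the small tool set that agent actually needs.
SPECIALISTS = {
    "sql": {
        "system": "You write and run SQL against the analytics warehouse.",
        "tools": ["run_sql_query"],
    },
    "data_science": {
        "system": "You analyze tabular results and produce insights.",
        "tools": ["run_python", "plot_chart"],
    },
    "copywriting": {
        "system": "You turn analysis into polished marketing copy.",
        "tools": [],
    },
}

def delegate_to_specialist(kind: str, objective: str) -> str:
    """Route a sub-task to a specialist instead of one do-everything agent."""
    spec = SPECIALISTS[kind]
    # Each specialist sees only its own few tools, which keeps its reasoning
    # focused and its input token count low.
    return run_agent(system=spec["system"], tools=spec["tools"], objective=objective)
```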
Optimizing costs through Dynamic Model Selection
Some tasks are simpler, such as summarization, and others are more complex, such as in-depth data analysis. You can use lightweight yet capable models such as Anthropic Claude 3 Haiku for simple tasks, and more capable but more expensive models such as Claude 3.5 Sonnet for complex ones. In Fig 1c, you can see that the main agent delegates summarization to the simple-task agent and data analysis to the complex-task agent.
In implementation, you can include a Complexity parameter in the call and select the appropriate model ID for the task at hand.
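For example, the delegation tool could map that Complexity parameter to a model ID along these lines; the model IDs below are the Bedrock IDs I'd expect for Claude 3 Haiku and Claude 3.5 Sonnet, so check the model catalog for current values:

```python
# Assumed Amazon Bedrock model IDs; verify against the current model catalog.
MODELS_BY_COMPLEXITY = {
    "simple": "anthropic.claude-3-haiku-20240307-v1:0",
    "complex": "anthropic.claude-3-5-sonnet-20240620-v1:0",
}

def select_model(complexity: str) -> str:
    """Map the tool's Complexity parameter to a model ID, defaulting to the stronger model."""
    return MODELS_BY_COMPLEXITY.get(complexity, MODELS_BY_COMPLEXITY["complex"])
```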

Let's see this in action

I'll show you an example of parallelization in action. Let's research generative AI in three industries: Banking, Automotive, and Energy.
Figure 2. Starting the agent
After receiving this request, the agent will create a plan:
Figure 3. The plan of the agent
As you can see above, it will create three other agents to perform the work. Finally, once the work has been done, the final output is generated.
Figure 4. The final output

Considerations

While agent systems like these are very powerful, do consider the following as you are developing for production:
  • Increased latencies to fulfill a request as more agents come into play. You can provide visual indicators that it is working or provide intermediate outputs so that your application seems more responsive.
  • You can parallelize only up to a point: you may hit your AWS quota limits for Amazon Bedrock or the instances that you're using. One mitigation is to retry throttled calls with backoff, as in the sketch after this list.
  • Combine the patterns in Fig 1, such as pairing dynamic model selection with specialized agents, to optimize for both cost and performance.
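For the quota point above, a simple way to stay resilient is to retry throttled Bedrock calls with exponential backoff. A minimal sketch, where the helper name and retry policy are illustrative:

```python
import time
from botocore.exceptions import ClientError

def converse_with_retry(client, max_attempts=5, **kwargs):
    """Call bedrock-runtime converse(), backing off when requests are throttled."""
    for attempt in range(max_attempts):
        try:
            return client.converse(**kwargs)
        except ClientError as err:
            if err.response["Error"]["Code"] != "ThrottlingException":
                raise
            # Exponential backoff before retrying the throttled call.
            time.sleep(2 ** attempt)
    raise RuntimeError("Bedrock kept throttling after retries")
```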

Next Steps

Check out Amazon Bedrock Agents to build a fully managed agent on AWS.
Follow me on LinkedIn for more GenAI content!

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
