logo
Menu

Designing tools for autonomous AI Agents

Build more effective AI agents by combining general purpose tools with purpose-built specialized tools.

Ross Alas
Amazon Employee
Published Sep 24, 2024
In today's world of generative AI, you can now create agents to automatically perform work on a user's behalf, given just a natural language input. The benefits range from improving employee productivity and automating mundane tasks to executing complex workflows that typically require human input. Understanding how to make these agents more effective is crucial for real-world applications.
Agents are software powered by large language models (LLMs) that act as reasoning engines. They can understand a user's request in natural language, develop a step-by-step plan to fulfill the request, execute each step, and ultimately complete the task—all without requiring explicit programming of exact steps. Amazon Bedrock Agents enables you to create fully-managed agents equipped with a set of tools to fulfill user requests. These tools can range from calling API endpoints and querying databases to creating and running code, and much more—the possibilities are virtually endless.
In this blog, I will discuss how to create more effective agents by introducing several concepts:
  1. General-purpose tools: These can be applied to a wide range of tasks, leading to greater overall capability. However, they require more powerful reasoning to use correctly.
  2. Specialized tools: These are more specific to particular tasks, resulting in faster and more accurate responses but providing a narrower set of capabilities.
Throughout the blog, I will use the example of a SQL agent to illustrate these principles. However, the same concepts can be applied to other types of agents.
Basic agent loop
The basic agent is composed of a large language model (LLM), such as Anthropic Claude 3.5 Sonnet, and programming that enables a back-and-forth conversation between a user and the agent, or between the agent and the tools it has access to.
To showcase the basic agent, refer to Fig. 1 below. The process follows these steps:
  1. The initial user input begins the agent-tool use loop.
  2. The LLM is called and decides whether to:
    a) Call a tool, or
    b) End its turn and return control back to the application/user.
  3. If the LLM decides to call a tool, your application executes the tool and returns the response.
  4. The LLM is invoked again to process the tool's response.
This cycle continues until the agent determines it has fulfilled the user's request or requires further user input.
The basic agent loop
Figure 1. The basic agent loop
Building a SQL Agent with a general purpose SQL tool
A SQL agent is an agent that is specifically designed to answer natural language queries from users using a relational database such as Postgres. The agent will understand the user’s query and given it’s understanding of the database, will create a SQL query, execute it, interpret the results, and finally answer the user’s query.
An agent equipped with a InvokeSQLQuery tool.
Figure 2. An agent equipped with a InvokeSQLQuery tool.
Building on the basic agent, let’s equip the agent with a general-purpose tool that can execute a SQL query against a database called InvokeSQLQuery. The agent is able to create any SQL query given a user’s input then this tool will allow an agent to execute any arbitrary SQL statement such as “SELECT <column> FROM <table>” against a database.
Immense capability with general purpose tools
This SQL tool is highly versatile, allowing potential connections to any relational database that supports SQL statements, such as PostgreSQL, MySQL, or Oracle. Its broad applicability makes it an immensely useful general-purpose tool for a wide range of queries across various domains.
Other examples of general-purpose tools that can significantly enhance an agent's capabilities include:
  1. Web Search Tool: Enables the agent to search the internet for up-to-date information, expanding its knowledge base beyond pre-trained data.
  2. File System Tool: Grants the agent access to read from and write to files, enabling data persistence and manipulation of local resources.
  3. Mathematical Computation Tool: Equips the agent to perform complex calculations or statistical analyses.
These general-purpose tools significantly expand an agent's capabilities, allowing it to tackle a diverse range of tasks across different domains. However, it's important to note that while these tools offer great flexibility, they also require more sophisticated reasoning from the agent to use them effectively in various contexts.
Challenges with general purpose tools
While the capabilities that is added on the agent with a general purpose tool like, InvokeSQLQuery, is immense, it does require significant LLM reasoning capabilities in order to use the tool correctly across different use cases. With SQL, depending on the database version, there’s different SQL syntax and functions that are supported. Even with the same relational database management system (RDBMS) like Postgres, there are differences between versions of Postgres. So, quite frequently, the agent will create a query that results in an error and it would have to reason through the error and correct it’s SQL statement.
Similarly, this applies to other general purpose tools. Imagine you have an API endpoint and you provide the agent with the entire API specification to be able to make requests to the endpoint. You can fulfill a large variety of requests with the agent, but the accuracy and effectiveness of the agent is highly dependent on the LLM capabilities.
Building a SQL agent with a general purpose tools plus purpose-built specialized tools
The flexibility provided by general purpose tools are very useful for an agent but comes at a cost of sometimes inefficient executions as the agents tries to reason through the errors that it encounters. To make agents more effective, we can supplement this general purpose tool with purpose-built specialized tools that act as shortcuts for agents for very common or complex tasks.
An agent equipped both with a general-purpose tool and specialized tools for SQL
Figure 3. An agent equipped both with a general-purpose tool and specialized tools for SQL
Let’s make our basic SQL agent better by adding in four specialized tools:
  • GetDatabaseSchema. Returns all available schemas in the database
  • GetSchemaTables. Returns all available tables for a given schema
  • GetTableColumns. Returns all available columns given a schema and table
  • GetForeignKeyConstraints. Returns all foreign key relationships in the database.
You can provide instructions to the agent to utilize more specific tools first before trying to use general purpose tools such as “Make sure to use more specific SQL tools first before using InvokeSQLQuery”.
These specialized tools are then implemented in your application code and you provide the logic that will consistently and return the right information every single time.
Efficient, effective, and accurate execution using specialized Tools
By providing these specialized tools to the agents in addition to the general purpose tools, you can improve the efficiency of your agents as they no longer have to reason through what is the correct way to use the general purpose tool. They can simply execute the specialized tool that is already available for them leading to more effective execution and more accurate execution.
Narrow scope and more development work with specialized tools
Although specialized tools enable more robust executions of an agent, it does require more development work on part of the developer. The set of capabilities brought on by each specialized tool is a lot more narrower than a general purpose tool. On the other hand, it provides for a more efficient agent.
The capability-efficiency tradeoff of general and specialized tools.
Figure 4. The capability-efficiency tradeoff of general and specialized tools.
Conclusion
As you can see in Fig 4., there is the capability-efficiency tradeoff when designing tools for agents. If you want to give broad set of capabilities in exchange for efficiency, you can create general purpose tools. If you want to give agents fast and effective tools, you can provide it specialized tools. You can combine both to provide the best of both worlds by having specialized tools prioritized first in usage then falling back on using the general purpose tools when needed.
When you are creating your agent and the tools to go along with it, it’s important to take into consideration on what tools you present to your agent to ensure both fast and efficient execution and as well as capabilities.
Further Reading
To learn more agents, check out How Amazon Bedrock Agents works.
For understanding how you can implement tool use, check out Use a tool to complete an Amazon Bedrock model response.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.

Comments