Agentic AI on AWS: design patterns and considerations
In this article, we dive deeper into Agentic AI: its components and their mapping to the AWS service stack, common design patterns, and key considerations.
Published Dec 26, 2024
In the fast-evolving landscape of Artificial Intelligence (AI), Agentic AI workflows represent a significant leap forward: a paradigm shift in AI systems design that puts autonomy, adaptability, and decision-making at the core. Agentic AI moves beyond simple zero-shot prompting to create more sophisticated, autonomous, and capable systems. This is not just about making the AI smarter; it is about creating systems that can work on their own, make decisions based on data, and learn to optimize how they operate. In this article, we will discuss Agentic AI, its key components and their mapping to AWS services, common design patterns, and key considerations.
At its core, Agentic AI refers to artificial intelligence systems leveraging distributed, intelligent agents that not only act independently but also collaborate to solve highly complex, real-world problems. Unlike traditional AI models that simply respond to prompts or execute predefined tasks, Agentic AI goes further by incorporating a "chaining" capability: it handles complex, multi-step activities using components called agents that can perceive their environment, make decisions, plan and take actions, and even learn from experience, all in pursuit of objectives set by their human creators.
Though Large Language Models (LLMs) are a prominent component of Agentic AI, they are not agents by themselves: they are rigid, have no goals of their own, and are invoked to perform a particular job, whereas agents can interact with other AI components, learn from them, and adjust their behavior over time. Therefore, instead of adhering to a binary classification of whether a system is an "agent" or not, the concept of "agentic" allows us to consider a spectrum of agent-like qualities, encompassing a wide range of systems and approaches.
For example, when asked to create a website, an Agentic AI system would autonomously generate a series of goals:
- Develop the website structure and screen layouts
- Generate content for each page
- Write the necessary HTML, CSS, and back-end code
- Design visuals and incorporate graphics
- Test for responsiveness and debug any issues
Think of Agentic AI as a digital assistant on steroids. Instead of just answering your questions or performing simple tasks, it can take initiative, solve complex problems, and adapt its approach based on changing circumstances. It's like having a tireless, hyper-intelligent intern who not only follows your instructions but also anticipates your needs and comes up with creative solutions you might never have considered.
An Agentic AI framework is a mental model for designing a smart, autonomous system. It connects the six key components below, which work together to make the AI smarter, faster, and more efficient.
This is where the AI gathers data to understand and comprehend what’s happening around it, which is key to making smart decisions. Think of this module as the “eyes and ears” of the AI. It collects raw multi-modal data from a variety of sources such as cameras, microphones, or other devices, cleans and organizes it for ease of use, and extracts features and meaningful information from the data to form a complete picture.
AWS offers a comprehensive suite of services to build a robust perception module. These services seamlessly and securely collect, process, analyze, store and manage data at scale from diverse sources, including edge devices, on-premises environments, and the cloud. This empowers organizations to harness the power of multi-modal data and extract valuable insights from data and vectors, which is critical for the AI system to comprehend its environment and stay grounded in truth.
This is the core of the Agentic AI framework, where intelligence comes into play. Think of it as the brain of the AI system. It uses:
- Deep learning (DL) models to grasp data and understand what it is trying to achieve, whether that is improving efficiency or solving a problem.
- Reinforcement learning (RL) to improve through trial and error and decide the best way to meet those goals.
- Probabilistic reasoning to work through problems using logic and experience and to make decisions under uncertainty.
- Meta-learning to add an extra layer of resilience, allowing the AI to learn how to learn and become smarter over time.
AWS provides a breadth and depth of generative AI and machine learning services to support the cognitive module of Agentic AI. Some of the key services include Amazon Bedrock, Amazon SageMaker AI, AWS RoboMaker, AWS DeepRacer, AWS Deep Learning Containers (DLC). There are other AWS AI services that may apply based on the use case. These services enable the development of autonomous systems and cognitive module to achieve specific goals, while continuously improving and adapting over time.
Once the AI has a plan, it needs to take action. The AI can automate tasks, control devices such as a robot’s arm or a drone, monitor execution to track progress, and adjust if needed. Control algorithms direct its actions, robotics and actuation bring those directions to life, and feedback loops ensure it constantly improves. In brief, this module allows the AI to move and interact with the world. Think of it as the hands and legs of the AI.
AWS has a range of services to enable the action module. Services like Amazon Q Business, Amazon Bedrock Agents, AWS RoboMaker, Amazon Lex, Amazon Polly, AWS IoT Greengrass, AWS IoT Core, AWS Lambda facilitate integration with various tooling, environments and systems, chaining and execution of complex actions and workflow, as well as development of control algorithms, robotics, and actuation. Additional AWS services may also apply to this module based on the use case, environment, scope and constraints.
This module helps the AI self-improve and get better over time. Like humans, it can learn from its experiences using reinforcement learning, historical analysis, and continuous optimization techniques.
AWS provides a range of services to help build a continuous learning and feedback loop. Some of the key services include AWS RoboMaker, Amazon SageMaker AI, Amazon Augmented AI (A2I), Amazon Bedrock Knowledge Bases, AWS Glue, AWS Lambda, AWS Batch and AWS Step Functions.
The AI isn’t working alone. It needs to communicate and collaborate with humans and with other systems and technologies, such as Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems, to streamline the workflow. This module ensures smooth teamwork between the AI, humans, and other technologies.
AWS offers an array of purpose-built services for communication and collaboration across applications and humans. Some of the key services include Amazon Q Index for ISVs, Amazon Bedrock multi-agent collaboration, Plugins for Amazon Q Business, Amazon Lex, Amazon Polly, and Amazon A2I. In addition, there are other AWS services and offerings from ISVs and Partners on AWS Marketplace that may be a fit here depending on the use case and environment scope.
Protecting data and operations is critical, and this module keeps threats at bay. For businesses handling valuable data, this module is a must-have to protect against cyber threats and to deliver a responsible AI system.
AWS has the broadest and deepest set of services when it comes to the security and privacy of data, network, access, and resources. In addition, AI-specific security and privacy capabilities include Amazon Bedrock Guardrails, Amazon Bedrock model evaluation (including LLM-as-a-judge), Amazon Bedrock Knowledge Bases RAG evaluation, Amazon SageMaker Clarify, the fmeval library, Amazon SageMaker Ground Truth, and Model Monitor. AWS core security services coupled with AI-specific security services allow organizations to implement robust safety, security and privacy, fairness and explainability, controllability and transparency, and governance and compliance, all of which are key for a responsible AI system.
Together, these components create a powerful Agentic AI framework that adapts, learns, and delivers value for your business. In addition to AWS native services, you can also use open-source frameworks such as LangChain, LlamaIndex, or CrewAI with AWS services, or Partner solutions from AWS Marketplace such as Service Operations AI Agents and Content Supply Chain, based on your use case. In brief, the breadth and depth of AWS offerings not only allow organizations to deploy a highly intelligent, scalable, secure, and cost-effective Agentic AI system that meets their current business objectives, but also provide a solid foundation and the flexibility to continue to enhance and innovate their Agentic AI solution in the future.
The Agentic AI concept is still fairly new, and much of the supporting tooling and capability is still being developed. At the core of agentic workflows, the following four key design patterns enable LLMs to exhibit more autonomous and intelligent behavior:
Reflection is a technique where AI models self-evaluate and refine their own outputs. This pattern enables AI models to become more autonomous, creative, and reliable by mimicking human-like feedback and revision loops. It is particularly useful for LLMs, allowing them to catch mistakes, clarify ambiguities, and improve over multiple iterations. The process involves three key steps:
- Generation: The AI model generates an initial response to a given prompt or task.
- Self-Reflection: The model scrutinizes its output, identifying potential errors, inconsistencies, or areas for improvement.
- Iterative Refinement: Based on the self-assessment, the model iteratively refines its output, making adjustments to enhance its quality and accuracy.
Below diagram represents the process and flow of the data between steps in the reflection pattern.
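The generate, self-reflect, and refine steps above can be sketched as a simple loop. The `generate`, `critique`, and `refine` callables below are hypothetical stand-ins for LLM calls (e.g., via Amazon Bedrock); the loop structure, not the toy implementations, is the point.

```python
def reflection_loop(prompt, generate, critique, refine, max_iters=3):
    """Generate an answer, then repeatedly self-critique and refine it."""
    draft = generate(prompt)                      # Generation
    for _ in range(max_iters):
        feedback = critique(prompt, draft)        # Self-Reflection
        if feedback is None:                      # critic is satisfied
            break
        draft = refine(prompt, draft, feedback)   # Iterative Refinement
    return draft

# Toy stand-ins so the sketch runs; a real system would call an LLM here.
generate = lambda p: "draft v1"
critique = lambda p, d: "add detail" if d.endswith("v1") else None
refine   = lambda p, d, f: d.replace("v1", "v2") + " (revised)"

print(reflection_loop("Write a summary", generate, critique, refine))
```

A `max_iters` cap is important in practice: without it, a critic that is never satisfied would loop (and spend tokens) indefinitely.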
The Tool Use pattern enables LLMs to transcend their natural limitations by interacting with external functions to gather information, perform actions, or manipulate data. Through Tool Use, LLMs are no longer confined to producing text responses from their pre-trained knowledge; they can access external resources and functions to gather the latest information, process data, or update systems.
The below diagram depicts a conceptual Agentic AI tool-use pattern, where AI system uses multiple specialized tools to process user queries by accessing various information sources.
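A minimal sketch of the dispatch half of this pattern follows. The tool names and functions are illustrative assumptions; in a real system the JSON tool request would come from the model's tool-use response (as with Amazon Bedrock's tool-use / function-calling support), and the tools would hit real APIs.

```python
import json

# A registry of callable tools the model can invoke; these stand-ins
# would be replaced by real API or database calls.
TOOLS = {
    "get_time":  lambda city: f"12:00 in {city}",
    "get_price": lambda symbol: f"{symbol}: 101.5",
}

def run_tool_call(model_output: str) -> str:
    """Parse a model's JSON tool request and dispatch to the matching tool."""
    request = json.loads(model_output)     # e.g. {"tool": "...", "arg": "..."}
    tool = TOOLS[request["tool"]]
    return tool(request["arg"])

# In practice, the JSON below would be generated by the LLM.
print(run_tool_call('{"tool": "get_time", "arg": "Seattle"}'))  # → 12:00 in Seattle
```

The tool result is typically fed back to the model so it can compose a final, grounded answer for the user.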
The Planning pattern permits an LLM to break down large, complicated tasks into smaller, more manageable components. It equips an agent not only to react to requests but to strategically structure the steps needed to achieve a goal. Instead of tackling a problem haphazardly, the LLM creates a roadmap of sub-tasks and determines the most efficient path to completion. For example, when coding, the LLM would first outline the overall structure before implementing individual functions. This avoids confusion and ensures the AI keeps track of all steps and doesn’t lose sight of the broader task.
ReAct (Reasoning and Acting) and ReWOO (Reasoning WithOut Observation) further enhance this pattern by bringing decision-making and contextual reasoning into the planning process.
ReAct enables the LLM to dynamically alternate between reasoning and acting, allowing for more adaptive and flexible planning. By interleaving these two steps, the LLM can refine its approach iteratively, addressing unexpected challenges as they arise. ReWOO, by contrast, decouples reasoning from observations: the LLM drafts the complete plan up front, tools gather the required evidence, and a solver combines the results, which reduces token consumption and makes tool-augmented LLMs more efficient. If the results do not satisfy the goal, the plan can still be revised based on the newly acquired information or changing requirements, ensuring a robust and comprehensive problem-solving approach.
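The ReAct alternation can be sketched as a loop that interleaves a reasoning step with an acting step until the model produces a final answer. The `reason` and `act` functions here are toy stand-ins for an LLM call and a tool invocation, respectively; only the control flow is representative.

```python
def react_loop(question, reason, act, max_steps=5):
    """Alternate Thought -> Action -> Observation until the model answers."""
    observations = []
    thought = ""
    for _ in range(max_steps):
        thought, action = reason(question, observations)  # reasoning step
        if action is None:                                # model has a final answer
            return thought
        observations.append(act(action))                  # acting step -> observation
    return thought

# Toy stand-ins: after one lookup, the "model" answers.
def reason(q, obs):
    if not obs:
        return "I should look this up", ("search", q)
    return f"Answer based on {obs[-1]}", None

def act(action):
    name, arg = action
    return f"result of {name}({arg})"

print(react_loop("capital of France?", reason, act))
```

Contrast this with ReWOO, where all the reasoning would happen in one planning call before any tool is executed, so no observation is fed back into the reasoning loop.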
The Planning pattern, ReAct, and ReWOO enable the LLM to handle complex tasks or multi-phase projects in a structured yet adaptive manner, resulting in efficient, goal-oriented execution that produces high-quality, consistent results.
The below diagram depicts the flow and components of this pattern: the planning component devises the overall plan (including high-level goals and strategies), a task-generation step creates specific tasks that are small and manageable, and a single-task agent then executes each task using predefined methods such as ReAct or ReWOO and returns the results for evaluation and any re-planning or adjustment, if required. The iteration continues until satisfactory results are achieved.
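That plan → execute → evaluate → re-plan loop can be sketched as follows. The `plan`, `execute`, and `evaluate` callables are hypothetical stand-ins (a real `execute` might be a ReAct-style single-task agent); the loop-and-feedback structure mirrors the flow just described.

```python
def planning_workflow(goal, plan, execute, evaluate, max_rounds=3):
    """Plan -> generate tasks -> execute each -> evaluate -> re-plan if needed."""
    tasks = plan(goal)                          # planner devises small, manageable tasks
    results = []
    for _ in range(max_rounds):
        results = [execute(t) for t in tasks]   # single-task agent runs each task
        ok, revised = evaluate(goal, results)   # check results; maybe produce a new plan
        if ok:
            return results
        tasks = revised                         # re-plan and iterate
    return results

# Toy stand-ins so the sketch runs end to end.
plan     = lambda g: [f"step 1 of {g}", f"step 2 of {g}"]
execute  = lambda t: f"done: {t}"
evaluate = lambda g, rs: (all(r.startswith("done") for r in rs), rs)

print(planning_workflow("build site", plan, execute, evaluate))
```

Bounding the number of re-planning rounds keeps a stubborn evaluator from cycling forever, which matters for both cost and latency.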
The Multi-Agent Collaboration (MAC) pattern builds upon the concept of delegation. This pattern involves assigning different agents (instances of an LLM with specific roles or functions) to handle various subtasks. These agents can work independently on their assignments while also communicating and collaborating to achieve a unified outcome. MAC excels at complex tasks or multifaceted use cases that require diverse expertise, parallel processing, or brainstorming and multiple viewpoints, such as research. Open-source packages such as LangGraph by LangChain, CrewAI, and ChatDev enable multiple agents to think, reason, and collaborate to solve such use cases.
There are several types of multi-agent patterns:
- Collaborative Agents: Multiple agents work together on different parts of a task, sharing progress and building toward a unified result. Each agent may specialize in a different domain.
- Supervised Agents: A central supervisor agent manages other agents, coordinating their activities and verifying results to ensure quality.
- Supervisor (tool-calling): This is a special case of the supervisor architecture in which individual agents are represented as tools. The supervisor agent uses a tool-calling LLM to decide which agent tool to call, as well as the arguments to pass to it.
- Hierarchical Teams: A structured system where higher-level agents oversee lower-level agents, with decision-making cascaded through levels to accomplish complex tasks.
- Custom multi-agent workflow: Each agent communicates with only a subset of agents. Parts of the flow are deterministic, and only some agents can decide which other agents to call next.
The below diagram is an example of the supervised-agents multi-agent pattern, where a supervisor agent handles orchestration, governance, and routing for two mortgage agents and a knowledge base.
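A minimal sketch of the supervisor's routing role follows. The agent names and the keyword-based routing rule are assumptions for illustration; in a real system (e.g., Amazon Bedrock multi-agent collaboration) an LLM would make the routing decision, and each specialist would itself be an agent.

```python
# Hypothetical specialist agents, mirroring the two-mortgage-agent example;
# each would normally be an LLM-backed agent rather than a lambda.
AGENTS = {
    "new_loan": lambda q: f"new-loan agent handling: {q}",
    "existing": lambda q: f"existing-loan agent handling: {q}",
}

def supervisor(query: str) -> str:
    """Route the query to the right specialist agent and return its answer."""
    # Stand-in routing rule; a real supervisor would ask an LLM to classify.
    route = "new_loan" if "apply" in query.lower() else "existing"
    return AGENTS[route](query)

print(supervisor("I want to apply for a mortgage"))
```

Centralizing routing in one supervisor also gives you a single place to attach governance checks and logging before any specialist agent runs.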
By incorporating these patterns, developers can create workflows that are more flexible, capable, and able to handle a wider range of tasks with greater autonomy.
Below are some of the key considerations for successfully designing a highly robust, scalable, and responsible Agentic AI system:
- The output of an Agentic AI system depends heavily on the quality of its input data. Your data is also key to building a unique and differentiated offering. Therefore, choose a platform that allows you to establish a solid data foundation to harness the power of your data and that aligns with your organization’s data security, scaling, and cost objectives.
- Consider applying modularity, scalability, interoperability, and adaptability in your design practice from the get-go, as these are critical to building a highly dynamic, reusable, extensible, and evolvable Agentic AI system.
- Have an organization-level, unified approach to the "black box" problem, which poses accountability and trust challenges, especially in high-stakes applications. For example: Who is responsible when an Agentic AI makes a mistake? What are the associated risks, such as financial or legal exposure? What is the mitigation and communication strategy around them?
- When implementing Agentic AI, prioritize ethical considerations to ensure neutrality, fairness, and transparency and to avert biased results. The complex nature of AI models can make their decision-making processes difficult to understand or interpret. Using diverse, representative, high-quality datasets, along with mechanisms for frequent audits, clear reporting of AI decision-making processes, model monitoring and drift detection, and explainability, can help maintain trust and transparency with users.
- The system becomes more susceptible to attack as it grows more autonomous and handles increasingly sensitive information. Hence, data privacy and security are paramount. Consider implementing robust safeguards and strong security practices such as frequent software updates and security audits, advanced encryption methods, fail-safes, and constant monitoring to maintain the security and integrity of the system.
- Agentic AI systems may interact with numerous other technologies and platforms, which can complicate integration and lead to complexity. Consider establishing a thorough integration strategy and a routine testing and maintenance schedule, along with modular design, proper documentation, and ongoing training for your team, to help mitigate some of the challenges of managing this complexity.
- AI systems require significant compute power. Consider using cloud platforms such as AWS, optimizing AI models and workflows for efficiency, and applying edge computing where applicable to reduce resource intensity and cost.
- Leverage prebuilt components and purpose-built services such as Amazon Bedrock to abstract complexity and quickly deploy your agentic workflow, as opposed to building an in-house point solution.
- Tool integration for real-time data and external tool interaction: instead of starting from scratch, consider using tools and modules offered by your platform or a third-party provider for real-time data and external system integration.
- When designing the perception module, incorporate strategies for efficient memory management so long-running tasks execute without performance degradation.
- Consider designing an AI workflow that is highly adaptive and dynamic and that delivers personalized experiences. Furthermore, build context-aware interactions for real-world applications. For instance, a healthcare AI assistant capable of triaging patients based on the severity of their conditions, ensuring those with critical needs receive immediate attention, would distinguish such a system from others in the field.
- Since the system will run independently without requiring human interaction, consider implementing end-to-end monitoring and logging of agent communications, LLM inputs and outputs, what the agents are doing, and what each agent receives and returns from the workflow, along with human controls to shut down or override operations based on alerts generated from the logs and monitoring system.
- Consider putting a process in place to profile and evaluate system characteristics such as performance, cost, accuracy, chattiness, and functionality. For example: Does this need to be an agent, or could it be a direct LLM call instead? Is there an opportunity to reuse the agent in other Agentic AI systems? These questions can help refine the design further. Account for multiple rounds of testing, review, and changes.
- As you work with Agentic AI, allow yourself enough time to learn, explore, experiment and integrate new advancements. Think about cost, agent communication or over-communication, performance and latency. Consider choosing the underlying platform that is highly flexible, scalable, secure and innovating on your behalf.
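The monitoring and logging consideration above can be sketched as a thin wrapper around any agent callable, so every input and output is recorded for audit and alerting. The `audited` helper and the echo agent are illustrative assumptions; in practice the wrapped function would invoke an LLM or a Bedrock agent, and the logs would feed a monitoring system such as Amazon CloudWatch.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-audit")

def audited(agent_name, agent_fn):
    """Wrap an agent so every input and output is logged for later review."""
    def wrapper(payload):
        log.info("%s received: %s", agent_name, payload)
        result = agent_fn(payload)
        log.info("%s returned: %s", agent_name, result)
        return result
    return wrapper

# Hypothetical agent; a real one would call an LLM or a Bedrock agent.
echo_agent = audited("echo", lambda p: p.upper())
print(echo_agent("check loan status"))  # both sides of the call are logged
```

Because the wrapper is applied uniformly, the same audit trail covers every agent in a multi-agent workflow, giving the human override mechanism the signal it needs.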
By applying these considerations, you can efficiently implement Agentic AI and unleash its full potential.
Agentic AI offers an incredible opportunity for organizations to transform their workflows and thereby unlock significant value by driving efficiency, enhancing decision-making, and accelerating innovation. AWS's leadership in the generative AI space and its comprehensive set of services across the Agentic AI workflow empower organizations to quickly deploy solutions and start reaping the benefits. All of the design considerations discussed earlier help set organizations on the path to delivering high-performing, secure, cost-effective, and responsible Agentic AI solutions to their users; in particular, ethical deliberation, strong system security, and complexity management are key to a successful and feasible agentic architecture.