Agentic AI Operations Hub with Remote MCP Server on Amazon EKS and Amazon Bedrock

Authors: Elamaran Shanmugam, Godwin Sahayaraj Vincent, Ramesh Kumar Venkatraman

Published May 19, 2025
Last Modified May 22, 2025

The Power of Agentic AI in Operations

In today's data-driven world, managing database operations at scale has become increasingly complex. Organizations struggle with performance optimization, security monitoring, and maintenance across diverse database landscapes. Enter the Agentic AI Operations Hub - a revolutionary platform that combines the power of Model Context Protocol (MCP), Amazon Bedrock, and intelligent agents on Amazon EKS to transform how we manage database operations. Our Agentic AI Operations Hub takes the foundational capabilities of MCP and elevates them through integration with Amazon Bedrock and deployment on Amazon EKS. This creates an intelligent system that not only monitors your databases but understands them deeply enough to suggest optimizations, detect anomalies, and implement fixes automatically.

Understanding the MCP Ecosystem and its configuration on Amazon EKS

The Model Context Protocol (MCP) forms the backbone of our solution, serving as a standardized framework that bridges the gap between applications and Large Language Models (LLMs). Unlike traditional database management tools, MCP enables seamless context sharing and data source access across your entire infrastructure. This standardization ensures that all components in your system can communicate effectively, share insights, and learn from each other's operations.
MCP servers act as intelligent middleware, providing direct database connectivity while managing real-time performance monitoring and tool execution. While developers often use local MCP servers during development, the real power lies in remote MCP servers utilizing Server-Sent Events (SSE) transport. These remote servers enable organizations to centralize their monitoring, maintain consistent analysis tools across teams, and implement standardized performance metrics throughout their infrastructure.
To facilitate seamless connectivity within our Kubernetes environment, we leverage mcp-remote, an npm package that serves as a protocol adapter for remote MCP servers. Since both our MCP servers and the Agentic Application with MCP Client are running on the same EKS cluster, we can utilize Kubernetes' internal DNS for service discovery. This approach enhances security and efficiency by keeping traffic within the cluster. A typical configuration in mcp.json might look like this:
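A minimal sketch, assuming the common mcpServers layout, port 8000, an /sse endpoint path, and a mysql-analyzer-service counterpart to the PostgreSQL service described in the bullets that follow:

```json
{
  "mcpServers": {
    "postgresql-analyzer": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://postgresql-analyzer-service.default.svc.cluster.local:8000/sse",
        "--transport",
        "sse"
      ]
    },
    "mysql-analyzer": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://mysql-analyzer-service.default.svc.cluster.local:8000/sse",
        "--transport",
        "sse"
      ]
    }
  }
}
```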
This configuration offers several key advantages:
  • Internal Service Communication: By using Kubernetes service URLs (e.g., postgresql-analyzer-service.default.svc.cluster.local), we ensure that communication between the MCP client and servers occurs entirely within the cluster. This internal routing enhances both security and performance by avoiding external network hops.
  • Service Discovery: Kubernetes' built-in DNS automatically resolves these service names to the appropriate pod IP addresses, allowing for dynamic scaling and failover without reconfiguration.
  • Namespace Isolation: The .default.svc.cluster.local suffix indicates that these services are in the default namespace. This approach allows for clear separation of services across different namespaces if needed for multi-tenancy or environment segregation.
  • Transport Specification: The --transport sse argument specifies the use of Server-Sent Events (SSE) for real-time, unidirectional communication between the client and server.
  • Simplified Security: By keeping traffic internal to the cluster, we reduce exposure to external threats and simplify our security model. There's no need for external load balancers or ingress controllers for these MCP server connections.
  • Scalability: This setup allows for easy horizontal scaling of MCP servers. As new server pods are created or removed, Kubernetes automatically updates the service endpoints.
  • Flexibility: The configuration supports multiple database types (PostgreSQL and MySQL in this example) with distinct MCP servers for each, allowing for specialized analysis and optimization strategies per database technology.
This configuration enables seamless integration with remote MCP servers while maintaining high security standards and leveraging Kubernetes' native networking capabilities. It exemplifies a cloud-native approach to building a robust, scalable, and secure Agentic AI Operations Hub.

Solution Architecture and Workflows:

Below is the architecture for building an Agentic AI Operations Hub with Remote MCP Server on Amazon EKS and Amazon Bedrock:
Agentic AI Operations Hub

The system's architecture is built on Amazon EKS, with MCP servers handling direct database interactions and Amazon Bedrock providing advanced AI capabilities. Keycloak ensures secure authentication and role-based access control, creating a robust and secure operational environment.
Below is the MCP Bedrock client architecture and workflow of the solution:
MCP Server Client Architecture
BedrockClient is the fundamental gateway to Amazon Bedrock's API, handling all interactions with various AI models. It manages connections through boto3, formats requests appropriately, and implements essential features like response caching and error handling. This component ensures smooth communication with different model families while maintaining performance through rate limiting and regional configurations.
MCPServerConnection serves as the dedicated handler for Model Context Protocol (MCP) server interactions. It establishes and maintains Server-Sent Events connections, manages tool discovery, and processes tool execution results. The component operates asynchronously, ensuring efficient communication while handling connection lifecycles and providing robust error management for reliable server interactions.
MCP-BedrockOrchestrator acts as the central coordinator between Bedrock and MCP servers, effectively managing multi-server connections and routing tool calls. It processes user queries through Bedrock and ensures proper formatting of responses, while maintaining conversation flow and tool execution. This component is crucial for seamless integration between different services and provides an interactive command-line interface for user interaction.
ChatMemory functions as the conversation management system, maintaining a comprehensive history of interactions between users and the assistant. It stores messages, tool requests, and results while providing essential context for future interactions. The component offers configurable memory size and conversation summarization capabilities, ensuring consistent and contextually relevant responses throughout the conversation.
The API layer, built with FastAPI, provides the web interface that makes the entire system accessible through browsers. It handles client sessions, manages form submissions, and renders HTML templates for the user interface. This component operates asynchronously to handle multiple requests efficiently while providing clear error reporting and model selection interfaces, making the system user-friendly and accessible.
Below is the user query workflow:
User Flow
When a user submits a query through either the CLI or web UI, it triggers a sophisticated yet efficient workflow. The MCP-Bedrock Orchestrator processes the query and maintains chat memory for context. Amazon Bedrock then processes these queries using available tools, with the orchestrator routing tool calls to appropriate MCP Server Connections. This seamless integration ensures quick and accurate responses while maintaining operational efficiency.
Real-World Implementation: Database Performance Optimization
Consider a scenario where a critical PostgreSQL database is experiencing performance issues. Our system automatically analyzes query patterns like:
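For illustration, here is a hedged example built on a hypothetical schema: a reporting query that aggregates recent orders and, without an index on orders.created_at, forces a sequential scan over a large table.

```sql
-- Hypothetical example: a 30-day revenue report that scans the full
-- orders table when orders.created_at is unindexed and order_items is large.
SELECT o.order_id,
       c.customer_name,
       SUM(oi.quantity * oi.unit_price) AS order_total
FROM orders o
JOIN customers c   ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id  = o.order_id
WHERE o.created_at >= NOW() - INTERVAL '30 days'
GROUP BY o.order_id, c.customer_name
ORDER BY order_total DESC
LIMIT 50;
```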
The system doesn't just identify problematic queries; it suggests optimizations such as index recommendations, table partitioning strategies, and query rewrites. More importantly, it can implement these solutions automatically while monitoring their impact, creating a truly self-healing database environment.
Solution Walkthrough
Pre-requisites:
  • Access to an AWS Account from your terminal
  • An Amazon Route 53 domain for your Agentic AI Operations Hub frontend.
  • The following tools:
    • Podman for building and pushing container images to Amazon Elastic Container Registry (Amazon ECR)
    • Helm 3.9+
    • kubectl
    • AWS Command Line Interface (AWS CLI)
  • An Amazon EKS cluster with Day-2 operational tools such as an ingress controller, the AWS Load Balancer Controller, and Keycloak. You can create your EKS cluster using the AWS console, eksctl, EKS CDK Blueprints, EKS Terraform Blueprints, GitOps, or any other mechanism your organization uses. Building an EKS cluster is outside the scope of this blog.
  • An Amazon Aurora PostgreSQL database. To successfully operate this solution, please create tables and load data into your PostgreSQL database using the instructions in our migration samples repository. Please make sure the database is created in the same VPC as your Amazon EKS cluster, and configure the database security group to allow inbound traffic from the EKS cluster so the Agentic AI Operations Hub can operate against it. Also create an AWS Secrets Manager secret named dev/db/credentials with the key values shown below so the MCP Server can access the database.
PostgreSQL Secret
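As a sketch of what the screenshot conveys, the secret value at dev/db/credentials is a JSON document of connection attributes; the key names here are assumptions and must match what the MCP Server expects. The MySQL secret at dev/mysql/credentials (next prerequisite) follows the same shape.

```json
{
  "host": "<aurora-postgresql-writer-endpoint>",
  "port": "5432",
  "username": "<db-user>",
  "password": "<db-password>",
  "dbname": "<database-name>"
}
```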
  • An Amazon Aurora MySQL database. To successfully operate this solution, please create tables and load data into your MySQL database using the instructions in our migration samples repository. Please make sure the database is created in the same VPC as your Amazon EKS cluster, and configure the database security group to allow inbound traffic from the EKS cluster. Also create an AWS Secrets Manager secret named dev/mysql/credentials, as shown below, so the MCP Server can access the database.
MySQL Secret
Step 1: Let's start with the Keycloak configuration, which protects the UI with authentication and authorization for your Agentic AI application:
Create and configure your realm and client application as shown below in the Keycloak instance running in your EKS cluster. Make sure to copy the client secret from the Credentials tab.
KeyCloak Client Setup

Create a realm role named admin and a new realm user named aiuser. Attach the admin realm role to aiuser and to the fastapi-client created in the previous step:
KeyCloak User and Role Setup

Step 2: Clone the MCP client repository and create a container image for your Agentic AI LLM application.
Clone the MCP client repository:
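A placeholder sketch; substitute the actual repository URL linked from this post:

```bash
# Hypothetical placeholder -- use the MCP client repository URL linked above
git clone <mcp-client-repository-url>
cd <mcp-client-repository>
```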
Update the Keycloak configuration in api2.py as shown below:
KeyCloak config in Agentic AI App
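A minimal sketch of those settings; the variable names are assumptions, so use the names api2.py actually defines:

```python
# Keycloak settings referenced by the FastAPI app (api2.py).
# Values below are placeholders -- substitute your own domain, realm, and secret.
KEYCLOAK_URL = "https://keycloak.<YOUR_DOMAIN.COM>"  # Keycloak base URL in your cluster
KEYCLOAK_REALM = "<your-realm>"                      # realm created in Step 1
KEYCLOAK_CLIENT_ID = "fastapi-client"                # client created in Step 1
KEYCLOAK_CLIENT_SECRET = "<client-secret>"           # copied from the Credentials tab
```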

Now let's build the container image for the Agentic AI app:
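A sketch using Podman and Amazon ECR; the repository name agentic-ai-app and the region are assumptions, so adjust them to your environment:

```bash
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
AWS_REGION=us-west-2                 # assumption: use your own region
ECR_REPO=agentic-ai-app              # assumption: pick your repository name
ECR_URI=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com

# Create the repository (first time only) and authenticate Podman to ECR
aws ecr create-repository --repository-name $ECR_REPO --region $AWS_REGION
aws ecr get-login-password --region $AWS_REGION | \
  podman login --username AWS --password-stdin $ECR_URI

# Build, tag, and push the image from the cloned repository root
podman build -t $ECR_REPO:latest .
podman tag $ECR_REPO:latest $ECR_URI/$ECR_REPO:latest
podman push $ECR_URI/$ECR_REPO:latest
```

Steps 3 and 4 repeat this build-and-push pattern for the PostgreSQL and MySQL MCP server images.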
Step 3: Clone the PostgreSQL MCP Server repository and build a container image for your PostgreSQL MCP server, following the same pattern as Step 2.
Step 4: Clone the MySQL MCP Server repository and build a container image for your MySQL MCP server, again following the same pattern.
Step 5: Log in to your AWS console and create the required IAM policy and role:
  • Create an IAM role with policies such as AmazonBedrockFullAccess, AmazonRDSFullAccess, and SecretsManagerReadWrite.
  • The following should be added to the trust relationship of the above role:
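For EKS Pod Identity, the standard trust relationship allows the pods.eks.amazonaws.com service principal to assume the role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "pods.eks.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession"
      ]
    }
  ]
}
```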
  • Next, navigate to the EKS console and create two Pod Identity associations on your EKS cluster, based on the IAM role created above, as shown below:
Pod Identity for Accessing Bedrock and Secrets Manager
Step 6: Deploy the PostgreSQL MCP Server:
Note: Make sure you update your container image URI and URL correctly in deployment.yaml before deploying.
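Assuming the repository ships a deployment.yaml (and a Service manifest, if it is separate), the deployment is a single kubectl apply; Steps 7 and 8 follow the same pattern:

```bash
# From the PostgreSQL MCP Server repository root
kubectl apply -f deployment.yaml
```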
Step 7: Deploy the MySQL MCP Server:
Note: Make sure you update your container image URI and URL correctly in deployment.yaml before deploying.
Step 8: Deploy your Agentic AI LLM application, which contains the UI and MCP client:
Note: Make sure you update your container image URI and URL correctly in deployment.yaml before deploying.
Validate the deployment of your MCP Server and LLM App in your EKS cluster as shown below:
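For example, with standard kubectl commands (the default namespace matches the cluster-local service URLs used in mcp.json):

```bash
kubectl get pods -n default   # MCP server and LLM app pods should be Running
kubectl get svc -n default    # services backing the cluster-local MCP URLs
```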

Agentic AI Operations Hub in Action:

Let's now navigate to https://aiagent.<YOUR_DOMAIN.COM>, which will take you to the Keycloak screen to authenticate as the aiuser, as shown below:
KeyCloak Identity Federation
Upon signing in, you will be taken to the home page of the Operations Hub, as shown below:
Agentic AI Operations Hub
In the above screen, click Start a New Session, select mysql from the list of configured MCP servers, and click Connect:
Select MCP Server
In the chat session, type in the prompt I'm new to this database and need to understand its structure before making any optimizations, as shown below:
Create a Chat Session and Run a prompt

Now, in the screen below, type in the secret name dev/mysql/credentials for our MySQL database. This gives the MCP server the credential context it needs for live tool interactions, allowing the Agentic AI and Bedrock LLM integration to respond to the user:
Secret Name of our MySQL DB
Here is the output from the Agentic AI app on the database structure of the MySQL database, after interacting with the MySQL MCP server and Amazon Bedrock for LLM integration:
Output showing Analyzed DB Structure
You can run similar queries on both the PostgreSQL and MySQL databases from your Agentic AI app, which interacts with the MCP servers and Amazon Bedrock accordingly to provide the right response to the user. As an exercise, we recommend trying the prompt Our application is running slowly. Can you help me identify and analyze slow database queries? next.

Security and Future Scope

Security remains paramount in our implementation, with all communications encrypted and authentication handled by Keycloak. Role-based access control ensures that different operational tasks are appropriately restricted, while comprehensive audit logging maintains transparency. Looking ahead, we're actively expanding the platform's capabilities to support additional database types, Kubernetes cluster operations, network infrastructure management, and CI/CD pipeline optimization. This evolution will create an even more comprehensive operational intelligence system.

Conclusion

The Agentic AI Operations Hub represents more than just another database management tool; it's a paradigm shift in how we approach operational challenges. By combining MCP's standardization capabilities with Amazon Bedrock's AI prowess and intelligent agents, we're creating a future where database management is proactive, intelligent, and efficient. Whether you're managing a single database or a complex multi-database environment, the Agentic AI Operations Hub provides the intelligence and automation needed to optimize performance, ensure security, and maintain operational excellence. Ready to transform your database operations? Visit our GitHub repository [link] to get started with your own implementation.

About the Authors

Elamaran (Ela) Shanmugam is a senior container specialist solutions architect at Amazon Web Services with over 15 years of experience in enterprise systems and infrastructure. Ela specializes in container technologies, app modernization, observability, and machine learning, helping AWS customers and partners design scalable and secure container workloads. Based in Tampa, Florida, Ela contributes to open source projects, speaks at events, mentors, and creates technical content. You can find Ela on Twitter @IamElaShan and on GitHub @elamaran11.
Godwin Sahayaraj Vincent is an Enterprise Solutions Architect at AWS who is passionate about Machine Learning and providing guidance to customers to design, deploy and manage their AWS workloads and architectures. In his spare time, he loves to play cricket with his friends and tennis with his three kids.
Ramesh Kumar Venkatraman is a Senior Solutions Architect at AWS who is passionate about Generative AI, Containers and Databases. He works with AWS customers to design, deploy and manage their AWS workloads and architectures. In his spare time, he loves to play with his two kids and follows cricket.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
