
How to Deploy the Milvus Vector Database on Amazon EKS
Learn to deploy open-source Milvus on Amazon EKS with AWS services and test it using Python—perfect for scalable vector search in AI apps.
Published Jun 6, 2025
Last Modified Jun 7, 2025
The rise of Generative AI (GenAI), particularly large language models (LLMs), has significantly boosted interest in vector databases, establishing them as an essential component within the GenAI ecosystem. As a result, vector databases are being adopted across a growing number of use cases.
An IDC Report predicts that by 2025, over 80% of business data will be unstructured, existing in formats such as text, images, audio, and videos. Understanding, processing, storing, and querying this vast amount of unstructured data at scale presents a significant challenge. The common practice in GenAI and deep learning is to transform unstructured data into vector embeddings, store, and index them in a vector database like Milvus or Zilliz Cloud (the fully managed Milvus) for vector similarity or semantic similarity searches.
But what exactly are vector embeddings? Simply put, they are arrays of floating-point numbers that represent data in a high-dimensional space. The distance between two vectors indicates their relevance: the closer they are, the more relevant they are to each other, and vice versa. This means that similar vectors correspond to similar original data, which differs from traditional keyword or exact searches.

Figure 1: How to perform a vector similarity search
The ability to store, index, and search vector embeddings is the core functionality of vector databases. Currently, mainstream vector databases fall into two categories. The first category extends existing database products with vector capabilities, such as Amazon OpenSearch Service with the KNN plugin and Amazon RDS for PostgreSQL with the pgvector extension. The second category comprises specialized vector database products, including well-known examples like Milvus, Zilliz Cloud (the fully managed Milvus), Pinecone, Weaviate, Qdrant, and Chroma.
Embedding techniques and vector databases have broad applications across various AI-driven use cases, including image similarity search, video deduplication and analysis, natural language processing, recommendation systems, targeted advertising, personalized search, intelligent customer service, and fraud detection.
Milvus is one of the most popular open-source options among the numerous vector databases. This post introduces Milvus and explores the practice of deploying Milvus on Amazon EKS.
Milvus is a highly flexible, reliable, and blazing-fast cloud-native, open-source vector database. It powers vector similarity search and AI applications and strives to make vector databases accessible to every organization. Milvus can store, index, and manage a billion+ vector embeddings generated by deep neural networks and other machine learning (ML) models.
Milvus was released under the open-source Apache License 2.0 in October 2019. It is currently a graduate project under LF AI & Data Foundation. At the time of writing this blog, Milvus had reached more than 50 million Docker pull downloads and was used by many customers, such as NVIDIA, AT&T, IBM, eBay, Shopee, and Walmart.
As a cloud-native vector database, Milvus boasts the following key features:
- High performance and millisecond search on billion-scale vector datasets.
- Multi-language support and toolchain.
- Horizontal scalability and high reliability even in the event of a disruption.
- Hybrid search, achieved by pairing scalar filtering with vector similarity search.
Milvus follows the principle of separating data flow and control flow. The system breaks down into four layers, as shown in the diagram:

Figure 2: Milvus Architecture
- Access layer: The access layer is composed of a group of stateless proxies and serves as the system’s front layer and endpoint to users.
- Coordinator service: The coordinator service assigns tasks to the worker nodes.
- Worker nodes: The worker nodes are dumb executors that follow instructions from the coordinator service and execute user-triggered DML/DDL commands.
- Storage: Storage is responsible for data persistence. It comprises a meta storage, log broker, and object storage.
Milvus supports three running modes: Milvus Lite, Standalone, and Distributed.
- Milvus Lite is a Python library that can be imported into local applications. As a lightweight version of Milvus, it is ideal for quick prototyping in Jupyter Notebooks or running on smart devices with limited resources.
- Milvus Standalone is a single-machine server deployment. If you have a production workload but prefer not to use Kubernetes, running Milvus Standalone on a single machine with sufficient memory is a good option.
- Milvus Distributed can be deployed on Kubernetes clusters. It supports larger datasets, higher availability, and scalability, and is more suitable for production environments.
Milvus is designed from the start to support Kubernetes, and can be easily deployed on AWS. We can use Amazon Elastic Kubernetes Service (Amazon EKS) as the managed Kubernetes, Amazon S3 as the Object Storage, Amazon Managed Streaming for Apache Kafka (Amazon MSK) as the Message storage, and Amazon Elastic Load Balancing (Amazon ELB) as the Load Balancer to build a reliable, elastic Milvus database cluster.
Next, we’ll provide step-by-step guidance on deploying a Milvus cluster using EKS and other services.
We’ll use the AWS CLI to create an EKS cluster and deploy a Milvus database. The following prerequisites are required:
- A PC/Mac or Amazon EC2 instance with AWS CLI installed and configured with appropriate permissions. The AWS CLI tools are installed by default if you use Amazon Linux 2 or Amazon Linux 2023.
- EKS tools installed, including Helm, kubectl, and eksctl.
- An Amazon S3 bucket.
- An Amazon MSK instance.
- The latest stable version of Milvus (v2.5.12) depends on Kafka’s `autoCreateTopics` feature. So when creating MSK, we need to use a custom configuration and change the `auto.create.topics.enable` property from the default `false` to `true`. In addition, to increase the message throughput of MSK, it is recommended to increase the values of `message.max.bytes` and `replica.fetch.max.bytes`. See Custom MSK configurations for details; a configuration sketch follows this list.
- Milvus does not support MSK’s IAM role-based authentication. So, when creating MSK, enable the SASL/SCRAM authentication option in the security configuration, and configure the username and password in AWS Secrets Manager. See Sign-in credentials authentication with AWS Secrets Manager for details.

Figure 3: Security settings: enable SASL/SCRAM authentication
- We need to enable access to the MSK security group from the EKS cluster’s security group or IP address range.
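For reference, a minimal sketch of such a custom MSK configuration. The property names are standard Kafka broker settings mentioned above; the 10 MB values are illustrative, not an MSK recommendation:

```
auto.create.topics.enable=true
message.max.bytes=10485760
replica.fetch.max.bytes=10485760
```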
There are many ways to create an EKS cluster, such as via the console, CloudFormation, eksctl, etc. This post will show how to create an EKS cluster using eksctl.
`eksctl` is a simple command-line tool for creating and managing Kubernetes clusters on Amazon EKS. It provides the fastest and easiest way to create a new cluster with nodes for Amazon EKS. See eksctl’s website for more information.
- First, create an `eks_cluster.yaml` file with the following code snippet. Replace `cluster-name` with your cluster name, replace `region-code` with the AWS region where you want to create the cluster, and replace `private-subnet-idx` with your private subnets. Note: This configuration file creates an EKS cluster in an existing VPC by specifying private subnets. If you want to create a new VPC, remove the VPC and subnets configuration, and eksctl will automatically create a new one.
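A sketch of `eks_cluster.yaml`, assuming the eksctl ClusterConfig schema and the resources described in the list below; the Kubernetes version, availability-zone keys, and subnet IDs are placeholders to adjust:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cluster-name
  region: region-code
  version: "1.27"   # pick a currently supported EKS version

# Existing VPC: remove this block to let eksctl create a new VPC instead
vpc:
  subnets:
    private:
      # keys are availability zones; values are your private subnet IDs
      region-codea: { id: private-subnet-id1 }
      region-codeb: { id: private-subnet-id2 }
      region-codec: { id: private-subnet-id3 }

iam:
  withOIDC: true
  serviceAccounts:
    # used later by the AWS Load Balancer Controller
    - metadata:
        name: aws-load-balancer-controller
        namespace: kube-system
      wellKnownPolicies:
        awsLoadBalancerController: true
    # grants Milvus pods access to S3; scope down to one bucket in production
    - metadata:
        name: milvus-s3-access-sa
        namespace: milvus
      attachPolicyARNs:
        - "arn:aws:iam::aws:policy/AmazonS3FullAccess"

managedNodeGroups:
  - name: milvus-node-group
    instanceType: m6i.2xlarge
    desiredCapacity: 3
    privateNetworking: true

addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy
  - name: aws-ebs-csi-driver
    wellKnownPolicies:
      ebsCSIController: true
```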
- Then, run the `eksctl` command to create the EKS cluster.
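With the configuration file in place, the command is:

```bash
eksctl create cluster -f eks_cluster.yaml
```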
This command will create the following resources:
- An EKS cluster with the specified version.
- A managed node group with three m6i.2xlarge EC2 instances.
- An IAM OIDC identity provider and a ServiceAccount called `aws-load-balancer-controller`, which we will use later when installing the AWS Load Balancer Controller.
- A namespace `milvus` and a ServiceAccount `milvus-s3-access-sa` within this namespace. This namespace will be used later when configuring S3 as the object storage for Milvus. Note: For simplicity, the `milvus-s3-access-sa` here is granted full S3 access permissions. In production deployments, it’s recommended to follow the principle of least privilege and only grant access to the specific S3 bucket used for Milvus.
- Multiple add-ons, where `vpc-cni`, `coredns`, and `kube-proxy` are core add-ons required by EKS. `aws-ebs-csi-driver` is the AWS EBS CSI driver that allows EKS clusters to manage the lifecycle of Amazon EBS volumes.
Wait for the cluster creation to complete. During the cluster creation process, the `kubeconfig` file will be automatically created or updated. You can also manually update it by running the first command below. Make sure to replace `region-code` with the AWS region where your cluster is being created, and replace `cluster-name` with the name of your cluster. Once the cluster is created, you can view the nodes by running the second command:
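Assuming the standard AWS CLI and kubectl tooling:

```bash
# Update the local kubeconfig for the new cluster
aws eks update-kubeconfig --region <region-code> --name <cluster-name>

# List the worker nodes once the cluster is ready
kubectl get nodes -A -o wide
```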
- Create an `ebs-sc` StorageClass configured with GP3 as the storage type, and set it as the default StorageClass. Milvus uses etcd as its Meta Storage and needs this StorageClass to create and manage PVCs.
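A minimal sketch of such a manifest, assuming the `aws-ebs-csi-driver` add-on installed earlier; save it as, say, `ebs_sc.yaml` and apply it with `kubectl apply -f ebs_sc.yaml`:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
  annotations:
    # make this the default StorageClass
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
```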
Then, set the original `gp2` StorageClass to non-default:
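The standard annotation patch does this:

```bash
kubectl patch storageclass gp2 -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'
```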
- Install the AWS Load Balancer Controller. We will use this controller later for the Milvus Service and the Attu Ingress, so let’s install it beforehand.
- First, add the `eks-charts` repo and update it.
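Using the chart repository documented by AWS:

```bash
helm repo add eks https://aws.github.io/eks-charts
helm repo update
```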
- Next, install the AWS Load Balancer Controller. Replace `cluster-name` with your cluster name. The ServiceAccount named `aws-load-balancer-controller` was already created when we created the EKS cluster in previous steps.
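A sketch of the install command, reusing the existing ServiceAccount; the flags are the chart’s documented options:

```bash
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=<cluster-name> \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller
```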
- Verify that the controller was installed successfully.
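```bash
kubectl get deployment -n kube-system aws-load-balancer-controller
```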
- The output should look like:
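An illustrative example, assuming the chart’s default of two replicas (the AGE value will differ):

```
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
aws-load-balancer-controller   2/2     2            2           27s
```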
Milvus supports multiple deployment methods, such as Operator and Helm. Operator is simpler, but Helm is more direct and flexible. We’ll use Helm to deploy Milvus in this example.
When deploying Milvus with Helm, you can customize the configuration via the `values.yaml` file. Click values.yaml to view all the options. By default, Milvus creates in-cluster MinIO and Pulsar as the Object Storage and Message Storage, respectively. We will make some configuration changes to make it more suitable for production.
- First, add the Milvus Helm repo and update it.
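Assuming the chart’s current home under the zilliztech GitHub organization:

```bash
helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm repo update
```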
- Create a `milvus_cluster.yaml` file with the following code snippet. This code snippet customizes Milvus’s configuration, such as configuring Amazon S3 as the object storage and Amazon MSK as the message queue. We’ll provide detailed explanations and configuration guidance later.
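A sketch of `milvus_cluster.yaml`, based on the Milvus Helm chart’s documented values; verify the key names against your chart version, and fill in the `< >` placeholders as explained in the six sections below:

```yaml
cluster:
  enabled: true

# Section 1: Amazon S3 as object storage (via the milvus-s3-access-sa IRSA role)
serviceAccount:
  create: false
  name: milvus-s3-access-sa
minio:
  enabled: false
externalS3:
  enabled: true
  host: "s3.<region-code>.amazonaws.com"
  port: "443"
  useSSL: true
  bucketName: "<bucket-name>"
  rootPath: "<root-path>"
  useIAM: true
  cloudProvider: "aws"
  iamEndpoint: ""

# Section 2: Amazon MSK as message storage (SASL/SCRAM)
pulsar:
  enabled: false
externalKafka:
  enabled: true
  brokerList: "<broker-list>"
  securityProtocol: SASL_SSL
  sasl:
    mechanisms: SCRAM-SHA-512
    username: "<username>"
    password: "<password>"

# Section 3: expose Milvus through an internal NLB
service:
  type: LoadBalancer
  port: 19530
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-name: milvus-service
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

# Section 4: Attu admin GUI behind an internet-facing ALB
attu:
  enabled: true
  name: attu
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
    hosts: [""]   # no custom domain; the ALB DNS name is used

# Section 5: HA for the coordinators (activeStandby required) and the proxy
rootCoordinator:
  replicas: 2
  activeStandby:
    enabled: true
indexCoordinator:
  replicas: 2
  activeStandby:
    enabled: true
queryCoordinator:
  replicas: 2
  activeStandby:
    enabled: true
dataCoordinator:
  replicas: 2
  activeStandby:
    enabled: true
proxy:
  replicas: 2

# Section 6: resource allocation (illustrative; use the Milvus sizing tool)
queryNode:
  replicas: 1
  resources:
    limits: { cpu: "2", memory: 8Gi }
dataNode:
  replicas: 1
  resources:
    limits: { cpu: "1", memory: 4Gi }
indexNode:
  replicas: 1
  resources:
    limits: { cpu: "4", memory: 8Gi }
```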
The code contains six sections. Follow the instructions below to change the corresponding configurations.
Section 1: Configure S3 as Object Storage. The serviceAccount grants Milvus access to S3 (in this case, it is `milvus-s3-access-sa`, which was created when we created the EKS cluster). Make sure to replace `<region-code>` with the AWS region where your cluster is located. Replace `<bucket-name>` with the name of your S3 bucket and `<root-path>` with the prefix for the S3 bucket (this field can be left empty).

Section 2: Configure MSK as Message Storage. Replace `<broker-list>` with the endpoint addresses corresponding to the SASL/SCRAM authentication type of MSK. Replace `<username>` and `<password>` with the MSK account username and password. You can get the `<broker-list>` from the MSK client information, as shown in the image below.
Figure 4: Configure MSK as the Message Storage of Milvus
Section 3: Expose the Milvus service and enable access from outside the cluster. The Milvus endpoint uses a ClusterIP-type Service by default, which is only accessible within the EKS cluster. If needed, you can change it to the LoadBalancer type to allow access from outside the EKS cluster. The LoadBalancer-type Service uses Amazon NLB as the load balancer. According to security best practices, `aws-load-balancer-scheme` is configured as internal mode by default here, which means only intranet access to Milvus is allowed. Click to view the NLB configuration instructions.

Section 4: Install and configure Attu, an open-source Milvus administration tool. It has an intuitive GUI that allows you to easily interact with Milvus. We enable Attu, configure the ingress using AWS ALB, and set it to the `internet-facing` type so that Attu can be accessed over the Internet. Click this document for the guide to ALB configuration.

Section 5: Enable HA deployment of Milvus core components. Milvus contains multiple independent and decoupled components. For example, the coordinator service acts as the control layer, handling coordination for the Root, Query, Data, and Index components. The Proxy in the access layer serves as the database access endpoint. These components default to only one pod replica each. Deploying multiple replicas of these service components is especially necessary to improve Milvus availability.

Note: The multi-replica deployment of the Root, Query, Data, and Index coordinator components requires the `activeStandby` option to be enabled.

Section 6: Adjust resource allocation for Milvus components to meet your workloads’ requirements. The Milvus website also provides a sizing tool to generate configuration suggestions based on data volume, vector dimensions, index types, etc. It can also generate a Helm configuration file with just one click. The following configuration is the tool’s suggestion for 1 million 1024-dimensional vectors and the HNSW index type.
- Use Helm to create Milvus (deployed in namespace `milvus`). Note: You can replace `<demo>` with a custom name.
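Assuming the release name `demo` and the values file created above:

```bash
helm install demo milvus/milvus -n milvus -f milvus_cluster.yaml
```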
- Run the following command to check the deployment status.
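```bash
kubectl get deployment -n milvus
```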
The following output shows that Milvus components are all AVAILABLE, and coordination components have multiple replicas enabled.
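An illustrative example for a release named `demo` (exact names, counts, and ages depend on your configuration):

```
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
demo-milvus-attu         1/1     1            1           5m27s
demo-milvus-datacoord    2/2     2            2           5m27s
demo-milvus-datanode     1/1     1            1           5m27s
demo-milvus-indexcoord   2/2     2            2           5m27s
demo-milvus-indexnode    1/1     1            1           5m27s
demo-milvus-proxy        2/2     2            2           5m27s
demo-milvus-querycoord   2/2     2            2           5m27s
demo-milvus-querynode    1/1     1            1           5m27s
demo-milvus-rootcoord    2/2     2            2           5m27s
```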
So far, we have successfully deployed the Milvus vector database. Now, we can access Milvus through endpoints. Milvus exposes endpoints via Kubernetes services. Attu exposes endpoints via Kubernetes Ingress.
Run the following command to get service endpoints:
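```bash
kubectl get svc -n milvus
```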
You can view several services. Milvus supports two ports, port `19530` and port `9091`:
- Port `19530` is for gRPC and RESTful API. It is the default port when you connect to a Milvus server with different Milvus SDKs or HTTP clients.
- Port `9091` is a management port for metrics collection, pprof profiling, and health probes within Kubernetes.

The `demo-milvus` service provides a database access endpoint, which is used to establish a connection from clients. It uses NLB as the service load balancer. You can get the service endpoint from the `EXTERNAL-IP` column.
As described before, we have installed Attu to manage Milvus. Run the following command to get the endpoint:
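```bash
kubectl get ingress -n milvus
```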
You can see an Ingress called `demo-milvus-attu`, where the `ADDRESS` column is the access URL.
Open the Ingress address in a browser and see the following page. Click Connect to log in.

Figure 5: Log in to your Attu account
After logging in, you can manage Milvus databases through Attu.

Figure 6: The Attu interface
We will use the Milvus example code to test if the Milvus database is working properly. First, download the `hello_milvus.py` example code using the following command:
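One way, assuming the example still lives at this path in the pymilvus repository:

```bash
wget https://raw.githubusercontent.com/milvus-io/pymilvus/master/examples/hello_milvus.py
```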
Modify the host in the example code to the Milvus service endpoint.
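The connection line looks roughly like this; the NLB DNS name below is a placeholder for the `EXTERNAL-IP` value obtained earlier:

```python
from pymilvus import connections

# Replace the placeholder with the EXTERNAL-IP (NLB DNS name) of the demo-milvus service
connections.connect("default",
                    host="milvus-service-xxxxxxxx.elb.<region-code>.amazonaws.com",
                    port="19530")
```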
Run the code:
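```bash
python3 hello_milvus.py
```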
If the system returns the following result, then it indicates that Milvus is running normally.
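Abridged, illustrative output (the exact text depends on the pymilvus version):

```
=== start connecting to Milvus     ===
Does collection hello_milvus exist in Milvus: False
=== Create collection `hello_milvus` ===
=== Start inserting entities       ===
Number of entities in Milvus: 3000
=== Start Creating index IVF_FLAT  ===
=== Start loading                  ===
...
=== Drop collection `hello_milvus` ===
```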
This post introduces Milvus, one of the most popular open-source vector databases, and provides a guide on deploying Milvus on AWS using managed services such as Amazon EKS, S3, MSK, and ELB to achieve greater elasticity and reliability.
As a core component of various GenAI systems, particularly Retrieval Augmented Generation (RAG), Milvus supports and integrates with a variety of mainstream GenAI models and frameworks, including Amazon SageMaker, PyTorch, Hugging Face, LlamaIndex, and LangChain. Start your GenAI innovation journey with Milvus today!