Using MongoDB Atlas as a Vector Store for Bedrock

Learn how to build an AWS Bedrock Knowledge Base using MongoDB Atlas as a vector store for semantic search over S3 documents.

Published Apr 12, 2025

Introduction

Understanding Knowledge Bases in AWS Bedrock

AWS Bedrock enables the development of AI-driven applications by providing foundation models and integration options that enhance knowledge retrieval. One of its key features is the ability to create Knowledge Bases, which support Retrieval-Augmented Generation (RAG). RAG combines large language models (LLMs) with a document retrieval system, allowing models to generate responses based on relevant content retrieved from a knowledge base.
A Knowledge Base in AWS Bedrock allows businesses to query unstructured documents while enabling efficient information retrieval. By leveraging semantic search through vector embeddings, Bedrock makes it possible to find the most relevant content dynamically. This is particularly useful for applications requiring contextualized responses based on proprietary or domain-specific information. Additionally, Knowledge Bases created in AWS Bedrock can be integrated with other Bedrock components, such as Bedrock Agents and Bedrock Flow.

Project Overview: Knowledge Base for a Vegan Bakery

To illustrate the capabilities of AWS Bedrock, this tutorial walks through the creation of a Knowledge Base designed for a demo company, a vegan bakery. The goal is to build a didactic example that demonstrates how AWS Bedrock retrieves relevant answers from textual data stored in Amazon S3.

Key Components:

  • Amazon S3 as a Document Store: Text-based documents containing recipes, ingredient substitutions, and common customer questions will be stored in an S3 bucket.
  • MongoDB Atlas as a Vector Database: Embedded representations of document contents will be stored in MongoDB Atlas, enabling efficient similarity searches.
  • AWS Bedrock for Knowledge Retrieval: AWS Bedrock will power semantic search and generate AI-driven responses based on stored bakery-related information.

High-Level Architecture

The following diagram illustrates the end-to-end architecture of the Knowledge Base solution using AWS Bedrock and MongoDB Atlas:
Architecture highlighting the main components involved in this tutorial.
This architecture highlights the main components involved in the document ingestion, embedding, and retrieval processes. Users upload documents to an S3 bucket, which are then parsed, chunked, and embedded using a Bedrock model. The resulting vector representations are stored in MongoDB Atlas, enabling efficient semantic search during inference through AWS Bedrock.
By following this tutorial, users will gain hands-on experience with integrating AWS Bedrock and MongoDB Atlas to build a functional Knowledge Base. This project serves as an educational example, demonstrating how AI-powered retrieval systems can enhance customer interactions in a specialized domain, such as a vegan bakery.

Step-by-Step Guide

The following steps will walk you through setting up a Knowledge Base in Bedrock using Amazon S3 as your document repository and MongoDB Atlas as the vector database.

Step 1. Choosing or Creating an S3 Bucket

Select or create an Amazon S3 bucket to serve as the document repository for your Knowledge Base. This bucket will store the text files that the Bedrock agent will query.
In a collaborative environment with multiple teams, you can structure the S3 storage based on your operational needs. For example, you might create a dedicated bucket for each team—such as sales, technical, or customer support. Alternatively, you can use a single bucket and organize documents by team using separate folders.
This organizational approach will help maintain clarity and ease of access, especially as the volume of documents grows over time.
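If you prefer the command line, the bucket can also be created with the AWS CLI. The bucket name below is the one used later in this tutorial; replace it with a globally unique name of your own, and use the same region as your Bedrock resources:

```shell
# Create the bucket in the region used throughout this tutorial
aws s3 mb s3://sample-s3-bedrock-knowledge-bases --region us-east-1
```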

Step 2. Uploading and Organizing Documents

For this tutorial, we will organize all documents into two subfolders (recipes and company-info) under the S3 bucket. The folder structure will resemble the following:
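Assuming the bucket name used later in this tutorial, the layout looks like this (the individual file names are illustrative placeholders):

```text
sample-s3-bedrock-knowledge-bases/
├── recipes/
│   └── vegan-chocolate-chip-cookies.txt
└── company-info/
    ├── about.txt
    └── faq.txt
```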
This separation simulates how businesses manage document access based on team responsibilities. For example, only employees responsible for product formulation can upload files to the recipes folder, while only administrative team members can update the company-info folder.

Uploading Documents to Amazon S3

You can upload documents to Amazon S3 using the AWS Management Console or the AWS CLI, which is more efficient for handling a large number of files.

Using the AWS Management Console

  1. Navigate to the bucket.
  2. Select the appropriate folder (e.g., recipes or company-info).
  3. Click Upload.
  4. Drag and drop your files or select them manually to upload.

Using the AWS CLI

For bulk uploads, the AWS CLI offers a more streamlined approach. Use the following commands for the recipes and company-info folders, respectively:
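Assuming local folders named recipes and company-info and the bucket used in this tutorial, the uploads look like:

```shell
# Recursively upload each local folder to its matching S3 prefix
aws s3 cp ./recipes s3://sample-s3-bedrock-knowledge-bases/recipes/ --recursive
aws s3 cp ./company-info s3://sample-s3-bedrock-knowledge-bases/company-info/ --recursive
```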
For more details, refer to the official AWS documentation: AWS CLI S3 Reference

Example Documents

Below are example documents to demonstrate the type of files that can be stored and retrieved in this Knowledge Base setup.
Example Document 1: Recipe File (To be placed in the recipes/ folder)
Example Document 2: Business Information File (To be placed in the company-info/ folder)
Example Document 3: Frequently Asked Questions (To be placed in the company-info/ folder)

Step 3. Setting Up MongoDB Atlas

Before integrating MongoDB Atlas with AWS Bedrock, ensure you have an active MongoDB Atlas account and a configured cluster.
Step 3.1. Log in or Sign Up
Access your MongoDB Atlas account at https://cloud.mongodb.com. Sign up if you don't already have an account.
Step 3.2. Create a Project
Click Create Project and define the following:
  • Name: Bedrock
  • Add Members and Set Permissions: Optionally invite collaborators and configure access levels.
Step 3.3. Create a Cluster
Follow the on-screen instructions to create a new cluster. Choose a region close to your AWS services for reduced latency.
Cluster configuration:
  • Cluster Type: Choose based on your requirements. This tutorial uses the Free Tier.
  • Name: knowledgebase
  • Provider: AWS
  • Region: us-east-1
Performance Consideration:
While an M10 or higher cluster is recommended for production, both M10 and Free Tier clusters were tested for this tutorial:
  • M10 Cluster: Delivered fast, consistent responses without issues.
  • Free Tier Cluster: Functional for proof of concept. Some latency and occasional network errors were observed; in one case, simply retrying the request resolved the issue without any configuration changes.
Recommendation: Use the Free Tier for development or learning purposes. Opt for M10 or higher for production deployments.
Step 3.4. Create a Database User
Create a user with the necessary credentials:
  • Username: <define-user-name>
  • Password: <use-a-strong-password>
Step 3.5. Configure Network Access
For this tutorial, allow access from all IPs (0.0.0.0/0) to simplify connectivity. For production, configure VPC peering or restricted IP access to enhance security.

Step 4. Creating the Database and Collection in MongoDB Atlas

Step 4.1. Access Your Cluster Collections
From the MongoDB Atlas dashboard, navigate to your cluster. Click Browse Collections to view and manage your database collections.
Step 4.2. Create a New Database and Collection
Click Add My Own Data. In the dialog that appears, provide the following:
  • Database Name: bedrock
  • Collection Name: knowledge
This will create the initial database and collection structure required for storing the vector embeddings.
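As an alternative to the UI, the same structure can be created from the mongosh shell after connecting with your cluster's connection string (the credentials placeholders below are your own):

```javascript
// Connect first: mongosh "mongodb+srv://<username>:<password>@<cluster-host>/"
use("bedrock");
db.createCollection("knowledge");
```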

Step 5. Defining the Atlas Vector Search Index

To enable vector search functionality, you need to define a vector search index in MongoDB Atlas.
Step 5.1. Navigate to the Atlas Search Tab
Access your cluster in the MongoDB Atlas dashboard and open the Atlas Search tab. Click the Create Search Index button to begin.
Step 5.2. Configure the Index Settings
  • Search Type: Select Vector Search.
  • Database and Collection: Choose the database and collection created in Step 4 (bedrock.knowledge).
  • Configuration Method: Select JSON Editor.
Proceed to the next step.
Step 5.3. Define the Index Schema
In the JSON Editor, replace the default configuration with the following definition:
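A minimal vector search index definition consistent with the field names used later in this tutorial (embedding as the vector field, 1024 dimensions for Titan Text Embeddings V2) looks like this; the similarity function shown is an assumption, as cosine is a common default but dotProduct and euclidean are also supported:

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1024,
      "similarity": "cosine"
    }
  ]
}
```

Name the index vector_index so it matches the mapping configured in Step 7.4. If you later use metadata filtering, additional filter fields may need to be added to this definition.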
Click Next, review the configuration, and click Create Search Index to finalize the setup.
Note on numDimensions:
Adjust the numDimensions value to match the embedding model you intend to use in AWS Bedrock. This tutorial uses Titan Text Embedding V2, which supports 1024 dimensions. The supported dimension values were obtained directly from the AWS Bedrock console. Below are common configurations:
  • Titan Text Embedding V2: 1024, 512, or 256
  • Titan Embeddings G1 - Text V1.2: 1536
  • Embed English V3: 1024
  • Embed Multilingual V3: 1024
For more details, refer to the AWS documentation: Titan Embedding Models

Step 6. Creating a Secret in AWS Secrets Manager

To securely store MongoDB credentials for use with AWS Bedrock, create a secret in AWS Secrets Manager.
Step 6.1. Access AWS Secrets Manager
In the AWS Management Console, navigate to Secrets Manager and select Secrets from the sidebar.
Step 6.2. Choose Secret Type
Select Other type of secret. Then, define the key-value pairs as follows:
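The secret body should hold the database user created in Step 3.4, for example:

```json
{
  "username": "<your-database-username>",
  "password": "<your-database-password>"
}
```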
Note: The Key field is case-sensitive. Make sure to use lowercase for the keys: username and password.
Step 6.3. Configure the Secret Details
Provide the following configuration:
  • Secret Name: dev/mongodb/knowledgebase
  • Rotation: Select Do not enable automatic rotation
This secret will later be referenced in the Bedrock Knowledge Base configuration.

Step 7. Creating a Knowledge Base in AWS Bedrock

With MongoDB Atlas configured, the next step is to create a Knowledge Base in AWS Bedrock.
Step 7.1. Start the Knowledge Base Creation Process
In the AWS Bedrock Console, navigate to Builder tools → Knowledge bases and click Create knowledge base. Choose Knowledge Base with vector store as the setup option.
Step 7.2. Provide Basic Configuration Details
  • Name: bakery-knowledge-base
  • Description: Centralized knowledge base for bakery documents.
  • IAM Permissions: Choose Create a new service role
  • Data Source: Select Amazon S3
Step 7.3. Configure the Data Source
  • Data Source Name: bakery-data-source
  • S3 URI: s3://sample-s3-bedrock-knowledge-bases/
Note: Using the entire bucket as a single data source allows centralized updates across all team folders. Alternatively, you may create separate data sources per folder or team-specific bucket to allow independent updates. For simplicity, this tutorial uses one data source for the full bucket.
  • Parsing Strategy: Choose Amazon Bedrock default parser
    • Suitable for plain text documents such as .txt
    • Other options include Amazon Bedrock Data Automation and Foundation models, which support visually rich documents but incur additional costs. See Bedrock Pricing for details.
  • Chunking Strategy: Select how the documents should be split before embedding. Available options:
    • Default chunking
    • Fixed-size chunking
    • Hierarchical chunking
    • Semantic chunking
    • No chunking
For this tutorial, use the Default chunking strategy.
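To build intuition for what a chunking strategy does, here is a minimal, illustrative sketch of fixed-size chunking with overlap. It is not Bedrock's internal implementation; real chunkers typically count tokens rather than characters and respect sentence boundaries:

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with a sliding overlap.

    Illustrative only: operates on characters, while production chunkers
    usually work on tokens.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The overlap ensures that a sentence falling on a chunk boundary still appears intact in at least one chunk, which improves retrieval quality at the cost of slightly more storage.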
Step 7.4. Configure Storage and Processing Details
  • Embeddings Model: Choose Titan Text Embeddings V2, ensuring alignment with the numDimensions value set in Step 5.3. Click Additional configurations under the selected model to confirm the number of vector dimensions supported or required. Some models allow editing this value; others have it fixed.
  • Vector Database: Choose Use an existing vector store, then select MongoDB Atlas.
Provide the MongoDB Atlas connection details:
  • Hostname: <clusterName>.<shardIdentifier>.mongodb.net (e.g., knowledgebase.xxxx.mongodb.net)
    • You can find this value by clicking Connect in your MongoDB Atlas cluster and copying the connection string.
  • Database Name: bedrock
  • Collection Name: knowledge
  • Credentials Secret ARN: Provide the ARN of the secret created in Step 6
Metadata Field Mapping:
  • Vector Search Index name: vector_index
  • Vector Embedding Field path: embedding
  • Text Field Path: text_chunk
  • Metadata Field Path: metadata
Note: Indexing duration may vary. For the sample dataset, expect approximately 4 minutes per Knowledge Base.

Step 8. Syncing and Testing the Knowledge Base

Once the Knowledge Base is created, the next step is to synchronize and test its functionality.
Step 8.1. Access the Knowledge Base
In the AWS Bedrock Console, go to Builder tools → Knowledge bases and open your Knowledge Base.
Step 8.2. Synchronize the Knowledge Base
  • Select the associated data source.
  • Click Sync to begin the synchronization process.
The sync operation typically completes in seconds, depending on the number and size of the documents. Once the status indicates success, the data is indexed and ready for retrieval.
Step 8.3. Test the Knowledge Base
Scroll to the Test Knowledge Base section within the same page. Select a supported LLM (e.g., Amazon Nova Lite) and run sample queries such as:
  • "When was Kind Bites founded?"
  • "What ingredients are used in your vegan chocolate chip cookies?"
The model will return answers based on the indexed content. This provides a convenient way to validate whether the synchronization and document parsing were successful.
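The same test can be run programmatically with the bedrock-agent-runtime client in boto3. This is a sketch under stated assumptions: the knowledge base ID and model ARN below are placeholders you must replace with your own values, and the call requires valid AWS credentials:

```python
import boto3

# Placeholders: substitute your real knowledge base ID and model ARN
KB_ID = "<your-knowledge-base-id>"
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-lite-v1:0"

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
response = client.retrieve_and_generate(
    input={"text": "When was Kind Bites founded?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": KB_ID,
            "modelArn": MODEL_ARN,
        },
    },
)
print(response["output"]["text"])
```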

Step 9. (Optional) Verifying Indexed Documents in MongoDB Atlas

For users who wish to inspect the underlying data structure or confirm the vector storage, MongoDB Atlas offers visibility into the indexed documents.
Step 9.1. Open the Collection
From the MongoDB Atlas dashboard, navigate to your cluster and click Browse Collections. Locate the bedrock.knowledge collection.
Step 9.2. Review Document Structure
Each document stored by the Knowledge Base typically contains:
  • embedding: The vector representation of a text chunk (e.g., 1024-dimensional array)
  • text_chunk: The portion of content that was embedded
  • metadata: Information such as the S3 file path
Example:
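An illustrative document is shown below; the field values are placeholders, and the exact metadata keys Bedrock writes may differ from this sketch:

```json
{
  "_id": { "$oid": "…" },
  "embedding": [0.0132, -0.0478, 0.0256, "… 1021 more values …"],
  "text_chunk": "Kind Bites is a vegan bakery …",
  "metadata": { "source": "s3://sample-s3-bedrock-knowledge-bases/company-info/about.txt" }
}
```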
This step is optional but recommended for troubleshooting and understanding the internal indexing mechanism of the system.

Conclusion

This tutorial provided a step-by-step guide to building a Knowledge Base using AWS Bedrock, Amazon S3, and MongoDB Atlas as a vector store. By using MongoDB Atlas, you gain flexible indexing, semantic search, and scalable storage — ideal for RAG-based AI applications.
For further guidance, refer to the documentation or support teams of AWS and MongoDB.
