Retrieval Augmented Generation with Mongo Atlas - Startups.aws Build Template Review

Build a RAG chat-bot powered by Amazon Bedrock and Mongo Atlas

Giuseppe Battista
Amazon Employee
Published Jul 8, 2024
Back in 2011, MongoDB was the first NoSQL database I ever heard of, and it immediately blew my mind because of its simple interface, flexibility, and scalability.
Put yourself in my younger self's shoes: all I had seen at that time was relational SQL databases, which were both expensive and complex to maintain. They were usually the main bottleneck of all my architectures. Not to mention the fact that they weren't exactly fit for what I was building. But hey, when all you have is a hammer...
Enter MongoDB. Finally I didn't have to mold my data into inflexible schemas; instead, I had a database that worked with my data. Alas, the only managed database service around back then was Amazon RDS, and despite my instant crush on MongoDB, it took me a while to convince my stakeholders that a document database could be a viable alternative for the right use cases. Mainly because nobody in my team wanted to maintain servers running shiny new tech.
"I'm limited by the technology of my time" meme
That's how I felt back then
Fast-forward 13 years, and we all now know that you should choose the right tool for the job (man, I hope my then-manager is reading this). Guess what? MongoDB has its own managed offering too: Atlas! Oh brave new world. MongoDB has also evolved into an all-purpose database, with features such as the document model, geospatial queries, time series, and vector search. It also provides a free tier, so it's great if you're looking to test Mongo out or build a small prototype.
🤫 AWS Activate Members get some extra credits here...

Architecture Overview

Let's have a look at what a RAG-powered chat bot architecture could look like, featuring Mongo Atlas of course. This code sample was built by my friends at Mongo. You can have a look at the full repository here and you can deploy it to your AWS account from startups.aws/build. Let's dive in!
an architectural diagram depicting the services used in this sample architecture.
Mongo Atlas Generative AI Sample App
For our user interface, we'll make use of Streamlit. This tool makes it easy to build web applications with just Python, eliminating the need to work with complex front-end frameworks. You'll especially like it if you're a data scientist or a data engineer who cannot be bothered dealing with HTML, CSS, and JavaScript. Streamlit allows us to create interactive dashboards and visualizations quickly, which is perfect for our RAG-powered chat bot. The downside of using Streamlit? You'll have to run it on some form of compute: in our case, ECS Fargate. If you know me, you'll know I usually prefer hosting my web apps on Amazon Simple Storage Service (S3) and Amazon CloudFront (aka the serverless way), BUT I can definitely understand the appeal of using Streamlit, especially if you're not familiar with front-end development.
Side note: there's a port of Streamlit to WebAssembly called StLite, which means you can run Streamlit applications directly in your browser without needing to run any compute. I haven't managed to test it yet. Have you tested StLite? Do you want me to run that experiment and report back? Let me know in the comment section.
As I was saying, our front-end is a Streamlit application hosted on ECS Fargate, behind an internet-facing Application Load Balancer. The front-end application makes use of Amazon Bedrock to interact with two different models:
  • Amazon Titan Embeddings G1 - Text is used to calculate embeddings of the user's prompt and of the documents we want to ingest into our knowledge base. Don't know what I mean by this? Make sure you check out the definition of embedding in ABC of Generative AI.
  • Amazon Titan Text G1 is used to generate text. This is the main LLM in our generative application. Depending on your region, Bedrock has a wide selection of LLMs to choose from.
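To make the two model calls a bit more concrete, here is a minimal sketch of how they might look through the Bedrock runtime API. It is not the template's actual code. The helper names `embed_text` and `generate_text` are made up for illustration, and the choice of `amazon.titan-text-express-v1` as the Titan Text G1 variant is an assumption. The client is passed in so the helpers stay easy to test.

```python
import json

# Model IDs: the embeddings ID matches Titan Embeddings G1 - Text.
# The text model ID is an assumption (Titan Text G1 - Express).
EMBEDDING_MODEL_ID = "amazon.titan-embed-text-v1"
TEXT_MODEL_ID = "amazon.titan-text-express-v1"


def embed_text(bedrock_client, text):
    """Return the embedding vector Titan Embeddings produces for `text`."""
    response = bedrock_client.invoke_model(
        modelId=EMBEDDING_MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputText": text}),
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]


def generate_text(bedrock_client, prompt, max_tokens=512, temperature=0.2):
    """Return the text Titan Text generates for `prompt`."""
    response = bedrock_client.invoke_model(
        modelId=TEXT_MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": max_tokens,
                "temperature": temperature,
            },
        }),
    )
    payload = json.loads(response["body"].read())
    return payload["results"][0]["outputText"]
```

In a real deployment you would create the client with `boto3.client("bedrock-runtime")` in a region where both models are enabled, then call `embed_text(client, "some text")`.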
We store vectors in Mongo Atlas, making it our knowledge base, where the "non-parametric memory" of our RAG system lives. Notice how Mongo Atlas sits in its own VPC, and our Streamlit application hosted on Fargate connects to it through AWS PrivateLink. This is because your Atlas infrastructure runs in a different AWS account, managed for you by Mongo. In this case, AWS PrivateLink provides private connectivity between the Streamlit application's VPC and Mongo Atlas's VPC without sending your traffic through the public internet. This strengthens your security posture, reduces Data Transfer Out (DTO) costs, and improves overall network performance.
Cherry on top, we make use of AWS Secrets Manager to store our Mongo Atlas API keys.
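As a rough illustration of reading those keys at runtime, the sketch below fetches a secret and parses it. It assumes the secret is stored as a JSON string. The function name `load_atlas_credentials` and the key names `public_key` and `private_key` are hypothetical, so check the template for how its secret is really laid out.

```python
import json


def load_atlas_credentials(secrets_client, secret_id,
                           required_keys=("public_key", "private_key")):
    """Fetch a JSON secret and check it has the keys we expect.

    The key names are hypothetical placeholders, not the template's.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    missing = [key for key in required_keys if key not in secret]
    if missing:
        raise KeyError(f"secret {secret_id!r} is missing keys: {missing}")
    return secret
```

With real AWS credentials, `secrets_client` would be `boto3.client("secretsmanager")`, and the task role on Fargate would need permission to read that one secret.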

Ingestion Flow

This template's README shows you exactly how to ingest documents into Atlas: you'll create an EC2 instance in the same VPC as the Streamlit application, trigger the creation of embeddings via Amazon Bedrock, and execute a sample search to verify that the ingestion was successful.
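The README is the source of truth for the actual steps. Purely to illustrate the general idea behind ingestion, here is a hedged sketch: split a document into overlapping chunks, embed each chunk, and store text plus vector in a collection. The word-based chunking, the field names (`text`, `embedding`, `source`), and the function names are all assumptions of this example. `embed` stands for any function that turns a string into a vector, and `collection` for a PyMongo-style collection.

```python
def chunk_text(text, chunk_size=200, overlap=20):
    """Split `text` into chunks of up to `chunk_size` words that overlap by `overlap` words."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reached the end of the text
    return chunks


def ingest_document(collection, embed, text, source, chunk_size=200, overlap=20):
    """Embed every chunk of `text` and insert the results. Returns the number of chunks stored."""
    documents = [
        {"text": chunk, "embedding": embed(chunk), "source": source}
        for chunk in chunk_text(text, chunk_size, overlap)
    ]
    if documents:  # insert_many rejects an empty list
        collection.insert_many(documents)
    return len(documents)
```

The overlap is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk. Tune both numbers to your documents.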

Retrieval and Generation Flow

Once you fire up your Streamlit app on Fargate, you'll be presented with the following user interface.
user interface as streamlit app
Chatbot UI
The retrieval and generation flow works as follows:
  1. user enters a prompt via the UI
  2. the prompt is sent to Bedrock to calculate its vectorial representation (the embedding!)
  3. the embedding is used to search similar documents in Atlas
  4. the user prompt and retrieved relevant documents are sent to Bedrock as input for the chosen LLM
  5. the generated response is sent back to the user
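The five steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not the template's code: the index name `vector_index`, the `embedding` and `text` field names, the prompt wording, and all function names are made up for the example. `embed` and `generate` stand for whatever functions call Bedrock, and `collection` for a PyMongo-style collection with an Atlas Vector Search index on it.

```python
def build_vector_search_pipeline(query_vector, index_name="vector_index",
                                 path="embedding", limit=3, num_candidates=50):
    """Build an aggregation pipeline that runs Atlas Vector Search."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,          # assumed index name
                "path": path,                 # field that holds the vectors
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Return only the chunk text and its similarity score.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


def build_prompt(question, documents):
    """Combine the retrieved chunks and the user's question into one prompt."""
    context = "\n\n".join(doc["text"] for doc in documents)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, embed, collection, generate):
    """Run steps 2 to 5 of the flow for a prompt entered in the UI (step 1)."""
    query_vector = embed(question)                      # step 2: embed the prompt
    pipeline = build_vector_search_pipeline(query_vector)
    documents = list(collection.aggregate(pipeline))    # step 3: search Atlas
    prompt = build_prompt(question, documents)          # step 4: prompt + documents
    return generate(prompt)                             # step 5: response for the user
```

Passing `embed` and `generate` in as plain functions keeps the flow independent of any one model, so swapping the LLM only means swapping one argument.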

Conclusion & Cleanup

You're now the happy owner of a Mongo Atlas powered RAG chat-bot! Congrats!
Once you're done chatting with your shiny new bot, make sure you follow the cleanup section in the template repository, so you don't incur charges when you're not using it.

Authors

Giuseppe Battista is a Senior Solutions Architect at Amazon Web Services. He leads solutions architecture for Early Stage Startups in the UK and Ireland. He hosts the Twitch show "Let's Build a Startup" on twitch.tv/aws and he's head of the Unicorn's Den accelerator. Follow Giuseppe on LinkedIn.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
