Vector Databases for generative AI applications

How to overcome LLM limitations using Vector databases and RAG

Abhishek Gupta
Amazon Employee
Published Apr 24, 2024
Last Modified May 30, 2024
I first shared this blog from my session (at GIDS 2024). If you attended it, thank you for coming and I hope you found it useful! If not, well, you have the resources and links anyway – I have written out the talk, so that you can follow along with the slides if you need more context.
The talk is now available on YouTube as well.
If you have something specific in mind, feel free to ask in the Comments below and I would be happy to answer and/or update this blog post! 🙌

Key info

Summarised version of the talk

I had 30-mins – so, I kept it short and sweet!

Setting the context

Foundation models (FMs) are the heart of generative AI. These models are pre-trained on vast amounts of data. Large language models (LLMs) are a class of FMs, for instance the Claude family from Anthropic, Llama from Meta, etc.
You generally access these using dedicated platforms, for example Amazon Bedrock, which is a fully managed service with a wide range of models accessible via APIs. These models are pretty powerful, and they can be used standalone to build generative AI apps.
So, why do we need vector databases?
To better understand this, let's take a step back and talk about the limitations of LLMs. I will highlight a few common ones.

LLM Limitations

  • Knowledge cut-off: The knowledge of these models is often limited to the data that was current at the time it was pre-trained or fine-tuned.
  • Hallucination: Sometimes, these models provide an incorrect response, quite “confidently”.
LLM limitations

Another one is the lack of access to external data sources.
Think about it - you can set up an AWS account and start using models on Amazon Bedrock. But if you want to build generative AI applications that are specific to your business needs, you need domain or company-specific private data (for example, a customer service chatbot that can access customer details, order info, etc.).
Now it's possible to train or fine-tune these models with your data – but it's not trivial or cost-effective. There are techniques to work around these constraints – RAG (discussed later) being one of them, and vector databases play a key role.

Dive into Vector Databases

Before we get into it, let's understand...
What is a Vector?
In simple terms - vectors are numerical representations of text.
  1. There is input text (also called prompt)
  2. You pass it through something called an embedding model - think of it as a stateless function
  3. You get an output which is an array of floating point numbers
What’s important to understand is that Vectors capture semantic meaning. So they can be used for relevancy or context based search, rather than simple text search.
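To make "capturing semantic meaning" a bit more concrete, here is a minimal sketch of how closeness between vectors is usually measured, using cosine similarity. The 4-dimensional vectors are made up by hand purely for illustration (a real embedding model returns hundreds or thousands of dimensions), so treat the numbers as an assumption, not actual model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-written, hypothetical "embeddings" - NOT produced by a real model.
embeddings = {
    "How do I reset my password?": [0.9, 0.1, 0.0, 0.2],
    "I forgot my login credentials": [0.8, 0.2, 0.1, 0.3],
    "What is the weather in Paris?": [0.0, 0.9, 0.4, 0.1],
}

query = "How do I reset my password?"
for text, vector in embeddings.items():
    score = cosine_similarity(embeddings[query], vector)
    print(f"{score:.2f}  {text}")
```

The two account-related sentences share almost no words, yet their (made-up) vectors point in nearly the same direction, which is the property a real embedding model is trained to give you.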
Vector embeddings
I tend to categorise vector databases into two types:
  • Vector data type support within existing databases, such as PostgreSQL, Redis, OpenSearch, MongoDB, Cassandra, etc.
  • Specialised vector databases, like Pinecone, Weaviate, Milvus, Qdrant, ChromaDB, etc.
This field is also moving very fast and I’m sure we will see a lot more in the near future!
Now you can run these specialised vector stores on AWS, via their dedicated cloud offerings. But I want to quickly give you a glimpse of the choices in terms of the first category that I referred to.
Several of them are supported as native AWS databases.
Here is a simplified view of where vector databases sit in generative AI solutions:
  • You take your domain-specific data and split it into chunks
  • Pass the chunks through an embedding model, which gives you vectors (embeddings)
  • Store these embeddings in a vector database
  • Then there are applications that execute semantic search queries and combine the results in various ways (RAG being one of them)
I will come back to this later.
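The flow above (chunk, embed, store, search) can be sketched end to end in a few lines. Everything here is a stand-in I made up for illustration: `chunk_by_sentence` is a naive chunking strategy, `toy_embed` is a hashed bag-of-words function that only captures word overlap (a real embedding model captures meaning), and `InMemoryVectorStore` is a hypothetical class standing in for an actual vector database.

```python
import hashlib
import math

DIMENSIONS = 256  # assumption for the toy model; real models define their own size


def chunk_by_sentence(text):
    """Naive chunking: one chunk per sentence."""
    return [s.strip() for s in text.split(".") if s.strip()]


def toy_embed(text):
    """Stand-in for an embedding model: hashed bag-of-words, unit length."""
    vector = [0.0] * DIMENSIONS
    for word in text.lower().split():
        word = word.strip(".,!?")
        if not word:
            continue
        index = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIMENSIONS
        vector[index] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (vector, chunk) pairs

    def add(self, chunk):
        self.items.append((toy_embed(chunk), chunk))

    def search(self, query, top_n=2):
        """Return the top_n (score, chunk) pairs by dot product.

        Vectors are unit length, so the dot product equals cosine similarity.
        """
        query_vector = toy_embed(query)
        scored = [
            (sum(q * v for q, v in zip(query_vector, vector)), chunk)
            for vector, chunk in self.items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_n]


document = (
    "Orders ship within two business days. "
    "Refunds are processed within five business days. "
    "Support is available on weekdays."
)

store = InMemoryVectorStore()
for chunk in chunk_by_sentence(document):
    store.add(chunk)

for score, chunk in store.search("When are refunds processed?"):
    print(f"{score:.2f}  {chunk}")
```

A real vector database does the same job at scale, typically with approximate nearest-neighbour indexes instead of the brute-force scan used in `search`.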

Demo 1 (of 3) - Semantic Search with OpenSearch and LangChain

Find the details here - https://github.com/abhirockzz/langchain-opensearch-rag
Semantic Search

RAG – Retrieval Augmented Generation

We covered the limitations of LLMs – knowledge cut-off, hallucination, no access to internal data, etc. Of course, there are multiple ways to overcome these.
  • Prompt-engineering techniques: zero-shot, few-shot, etc. Sure, this is cost-effective, but how would this apply to domain-specific data?
  • Fine-tuning: Take an existing LLM and train it using a specific dataset. But what about the infra and costs involved? Do you want to become a model development company or focus on your core business?
These are just a few examples.
The RAG technique takes a middle ground.
There are three key parts to a RAG workflow:
Part 1: Data ingestion is where you take your source data (pdf, text, images, etc.), break it down into chunks, pass it through an embedding model and store it in the vector database.
Part 2: This involves the end-user application (e.g. a chatbot). The user sends a query – this input is converted to a vector embedding using the same (embedding) model that was used for the source data. We then execute a semantic or similarity search to get the top-N closest results.
That’s not all.
Part 3: These results, also referred to as “context”, are then combined with the user input and a specialised prompt. Finally, this is sent to an LLM – note that this is not the embedding model, this is a large language model. The added context in the prompt helps the model provide a more accurate and relevant response to the user’s query.
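The prompt augmentation step is mostly string assembly, so here is a minimal sketch of it. The template wording and the function names (`build_rag_prompt`, `answer`) are my own assumptions for illustration; the retriever and the LLM are passed in as plain functions, so you can plug in whichever vector store and model you actually use.

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context does not contain the answer, say you don't know.

Context:
{context}

Question: {question}
Answer:"""


def build_rag_prompt(question, context_chunks):
    """Combine the retrieved chunks and the user's question into one prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


def answer(question, retrieve, generate, top_n=3):
    """Run retrieval, prompt augmentation and generation.

    retrieve(question, top_n) -> list of text chunks (semantic search)
    generate(prompt) -> str (a call to an LLM, not the embedding model)
    Both are supplied by the caller.
    """
    chunks = retrieve(question, top_n)
    prompt = build_rag_prompt(question, chunks)
    return generate(prompt)


# Stubs so the sketch runs without a vector database or a model.
def fake_retrieve(question, top_n):
    return ["Refunds are processed within five business days"][:top_n]


def fake_generate(prompt):
    return f"(an LLM would answer this {len(prompt)}-character prompt)"


print(build_rag_prompt("When are refunds processed?", fake_retrieve("", 3)))
print(answer("When are refunds processed?", fake_retrieve, fake_generate))
```

Keeping `retrieve` and `generate` as parameters mirrors the split described above: retrieval uses the embedding model and the vector database, while generation uses the LLM.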

Demo 2 (of 3) - RAG with OpenSearch and LangChain

Find the details here - https://github.com/abhirockzz/langchain-opensearch-rag

Fully-managed RAG experience - Knowledge Bases for Amazon Bedrock

Another approach is to have a managed solution take care of the heavy lifting. For example, if you use Amazon Bedrock, then Knowledge Bases can make RAG easier and more manageable. It supports the entire RAG workflow, from ingestion to retrieval and prompt augmentation.
And it supports multiple vector stores to store vector embedding data.
Amazon Bedrock - Vector Databases

Demo 3 (of 3) - Fully-managed RAG with Knowledge Bases for Amazon Bedrock

Find the details here - https://github.com/abhirockzz/langchain-opensearch-rag
The demo uses the AWS console to configure and test a Knowledge Base.
Knowledge Base config
Now how do we build RAG applications using this?
For application integration, this is exposed via two APIs:
  • RetrieveAndGenerate: Call the API, get the response - that's it. Everything (query embedding, semantic search, prompt engineering, LLM orchestration) is handled!
  • Retrieve: For custom RAG workflows, where you simply extract the top-N responses (like semantic search) and integrate the rest as per your choice.
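Here is a rough sketch of what calling these two APIs can look like from Python with boto3's `bedrock-agent-runtime` client. The request and response shapes reflect my understanding of that client at the time of writing, so please check them against the current API reference. The knowledge base ID and model ARN are placeholders, and the helper function names are my own.

```python
def build_retrieve_and_generate_request(question, knowledge_base_id, model_arn):
    """Arguments for RetrieveAndGenerate (fully managed RAG)."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def build_retrieve_request(question, knowledge_base_id, top_n=3):
    """Arguments for Retrieve (semantic search only, top-N chunks)."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": question},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_n}
        },
    }


def ask_knowledge_base(question, knowledge_base_id, model_arn):
    """Call RetrieveAndGenerate and return the generated answer text."""
    import boto3  # lazy import: the builders above work without AWS set up

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(
        **build_retrieve_and_generate_request(question, knowledge_base_id, model_arn)
    )
    return response["output"]["text"]


def retrieve_chunks(question, knowledge_base_id, top_n=3):
    """Call Retrieve and return the text of the matching chunks."""
    import boto3

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve(
        **build_retrieve_request(question, knowledge_base_id, top_n)
    )
    return [result["content"]["text"] for result in response["retrievalResults"]]


# Placeholder ID - replace with your own Knowledge Base ID before calling AWS.
print(build_retrieve_request("When are refunds processed?", "KB_ID_PLACEHOLDER"))
```

With `retrieve_chunks` you get the raw context back and stay in control of the prompt and the model call; with `ask_knowledge_base` the service does all of that for you.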

Where do I learn more?

Wrap up

And, that's it. Like I said, I had 30-mins and I kept it short and sweet! This area is evolving very quickly. This includes vector databases, LLMs (there is a new one every week - feels like the JavaScript frameworks era!), and frameworks (like LangChain, etc.). It's hard to keep up, but remember, the fundamentals are the same. The key is to grasp them - hopefully this helps with some of it.
Happy Building!

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.