
Build Retrieval Augmented Generation (RAG) based Generative AI (GenAI) application with Amazon Bedrock

Quick GenAI RAG-based application prototype using Large Language Models (LLM) with Amazon Bedrock

Published May 19, 2024

ABSTRACT

The main objective of this article is to share a quick and easy way to prototype a Large Language Model (LLM) application with Retrieval Augmented Generation (RAG) using Amazon Bedrock. We use information from a few HTML documents with general descriptions of the Transformer model architecture and the attention mechanism. These documents are embedded and form our knowledge base on the subject. This approach can easily be adapted to use different models with advanced RAG to evaluate the LLM responses for the required tasks.

Python Libraries

The following libraries are required for the LLM RAG application prototype.
# AWS Bedrock runtime client, Chunking, Embedding, Retrieval
import boto3
import json
import uuid
import hnswlib
from typing import List, Dict
from unstructured.partition.html import partition_html
from unstructured.chunking.title import chunk_by_title

# LangChain with ConversationBufferMemory
from langchain_aws import ChatBedrock
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain

Create AWS client for making inference requests

  • Create a boto3 client to connect programmatically and make inference requests to the Foundation Models (FM) hosted in Amazon Bedrock (e.g. Cohere Command R).
bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')

Explore Cohere Models with Embeddings

  • Load and chunk the HTML documents with the unstructured library (a sketch of the assumed vector-store setup follows the code below).
def load_and_chunk(self) -> None:
    """
    Loads the text from the sources and chunks the HTML content.
    """
    for raw_document in self.raw_documents:
        elements = partition_html(url=raw_document["url"])
        chunks = chunk_by_title(elements)
        for chunk in chunks:
            self.docs.append(
                {
                    "title": raw_document["title"],
                    "text": str(chunk),
                    "url": raw_document["url"],
                }
            )
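The methods in this section assume a vector-store class that holds the source list, the chunked documents, and their embeddings. Below is a minimal sketch of that setup; the class name Vectorstore, the retrieve_top_k default, and the URLs are assumptions and placeholders, not the article's actual sources.

# Hypothetical vector-store setup used by load_and_chunk(), embed(), index() and retrieve()
class Vectorstore:
    def __init__(self, raw_documents: List[Dict[str, str]]):
        self.raw_documents = raw_documents  # [{"title": ..., "url": ...}, ...]
        self.docs = []                      # chunked documents with title/text/url
        self.docs_embs = []                 # embeddings for each chunk
        self.retrieve_top_k = 3             # number of chunks returned per query (assumed default)
        self.load_and_chunk()
        self.embed()
        self.index()

# Placeholder sources (replace with the actual Transformer / attention HTML pages)
raw_documents = [
    {"title": "Transformer model architecture", "url": "https://example.com/transformer.html"},
    {"title": "Attention mechanism", "url": "https://example.com/attention.html"},
]
vectorstore = Vectorstore(raw_documents)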
  • Embed the document chunks in batches using the Cohere Embed (English) model hosted in Amazon Bedrock.
def embed(self) -> None:
    """
    Embeds the document chunks using the Cohere API.
    """
    modelId = 'cohere.embed-english-v3'
    contentType = 'application/json'
    accept = '*/*'
    num_embed_called = 1

    batch_size = 90
    self.docs_len = len(self.docs)
    for i in range(0, self.docs_len, batch_size):
        batch = self.docs[i : min(i + batch_size, self.docs_len)]
        texts = [item["text"] for item in batch]

        print("Docs_len :{}, Embed counter : {}".format(len(self.docs), num_embed_called))
        cohere_body = json.dumps({
            "texts": texts,
            "input_type": "search_document"  # search_query | classification | clustering
        })
        response = bedrock_runtime.invoke_model(body=cohere_body, modelId=modelId,
                                                accept=accept, contentType=contentType)
        embed_response_body = json.loads(response.get('body').read())
        docs_embs_batch = embed_response_body.get('embeddings')
        num_embed_called += 1

        self.docs_embs.extend(docs_embs_batch)
  • Use the hnswlib package to index the document chunk embeddings. This ensures efficient similarity search during retrieval. For simplicity, we use hnswlib as the vector library for our knowledge database.
def index(self) -> None:
    """
    Indexes the document chunks for efficient retrieval.
    """
    self.idx = hnswlib.Index(space="ip", dim=1024)
    self.idx.init_index(max_elements=self.docs_len, ef_construction=512, M=64)
    self.idx.add_items(self.docs_embs, list(range(len(self.docs_embs))))
  • The chatbot decides whether it needs to consult external information from the knowledge database before responding. If so, it determines an optimal set of search queries to use for document retrieval.
# Generate search queries (if any) from the user query
modelId = "cohere.command-r-v1:0"
cohere_body = json.dumps({
    "temperature": 0.0,
    "p": 0.99,
    "k": 250,
    "max_tokens": 1000,

    "preamble": "You are an AI assistant with expertise in Transformers and attention models. \
You should say you do not know if you do not know and answer only if \
you are very confident. Answer in number bulleted form.",
    # "chat_history" is not used for "search_queries_only" with empty []
    "message": message,
    "search_queries_only": True,
})
response = bedrock_runtime.invoke_model(body=cohere_body, modelId=modelId,
                                        accept=accept, contentType=contentType)
search_response_body = json.loads(response.get('body').read())
  • The document search is performed with the knn_query() method from the hnswlib library. Given a user query, it returns the document chunks that are most similar to that query. The number of document chunks to return is set by the retrieve_top_k parameter. If there are matched documents, the retrieved document chunks are passed as documents in a new query message sent to the FM (Cohere Command R+); a sketch of the retrieve() method follows the code block below.
# Use Cohere Command R+ for continuous chat after query search
modelId = "cohere.command-r-plus-v1:0"
# If there are search queries, retrieve document chunks and respond
if search_response_body["search_queries"]:
    print("Retrieving information...\n", end="")

    # Retrieve document chunks for each query
    documents = []
    for query in search_response_body["search_queries"]:
        documents.extend(self.vectorstore.retrieve(query["text"]))

    # Use document chunks to respond
    cohere_body = json.dumps({
        "temperature": 0.0,
        "p": 0.99,
        "k": 250,
        "max_tokens": 1000,

        "preamble": "You are an AI assistant with expertise in Transformers and attention models. \
You should say you do not know if you do not know and answer only if \
you are very confident. Answer in number bulleted form.",

        "chat_history": self.prev_docs_chat_history_response,
        "message": message,
        "documents": documents,
    })
    response = bedrock_runtime.invoke_model(body=cohere_body, modelId=modelId,
                                            accept=accept, contentType=contentType)
    docs_response_body = json.loads(response.get('body').read())
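The vectorstore.retrieve() call above is not shown elsewhere in the article. A minimal sketch, assuming the query is embedded with the same Cohere Embed model (with input_type "search_query") and that retrieve_top_k is an integer attribute of the vector store:

def retrieve(self, query: str) -> List[Dict[str, str]]:
    """
    Embeds the user query and returns the most similar document chunks from the hnswlib index.
    """
    # Embed the query (assumption: same Cohere Embed model, query-side input type)
    query_body = json.dumps({
        "texts": [query],
        "input_type": "search_query"
    })
    response = bedrock_runtime.invoke_model(body=query_body, modelId='cohere.embed-english-v3',
                                            accept='*/*', contentType='application/json')
    query_emb = json.loads(response.get('body').read()).get('embeddings')

    # knn_query returns (labels, distances); take the ids of the retrieve_top_k nearest chunks
    doc_ids = self.idx.knn_query(query_emb, k=self.retrieve_top_k)[0][0]
    return [self.docs[doc_id] for doc_id in doc_ids]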
  • Display the external information from RAG, with citations and the retrieved document chunks (up to retrieve_top_k per search query); printing the generated answer itself is sketched after the code below.
# Display citations and source documents
if 'citations' in docs_response_body.keys():
    citations = docs_response_body.get('citations')

    if citations:
        cit_cnt = 1
        print("\n\nCITATIONS:")
        for citation in citations:
            print("[{}]:\n{}".format(cit_cnt, citation))
            print()
            cit_cnt += 1

        # Print retrieved documents
        print("\nDOCUMENTS:")
        doc_cnt = 1
        for document in documents:
            print("[{}]:\n{}".format(doc_cnt, document))
            print()
            doc_cnt += 1
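The snippet above prints citations and documents but not the generated answer itself. Assuming the reply text is returned in the response body's 'text' field (alongside 'citations' and 'chat_history'), it can be displayed with:

# Display the generated answer (assumption: the reply text is in the 'text' field)
print("\nANSWER:\n{}".format(docs_response_body.get('text')))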
  • The chat history is updated for the next user query.
# print("\nConversationID : {} with prev chat history :\n{}".format(self.conversation_id, self.prev_docs_chat_history_response))
docs_chat_history_response = docs_response_body.get('chat_history')

# The chat_history returned by each invoke_model call is saved and passed as "chat_history" in the next query
self.prev_docs_chat_history_response = docs_chat_history_response

Alternative LLM model to handle general questions

  • Use an alternative Foundation Model (FM) hosted in Amazon Bedrock (e.g. Claude 3 Sonnet) to handle general questions whenever the Cohere models do not know the answers. LangChain's LLMChain and ConversationBufferMemory are used to establish the conversation and store the chat history; a sketch of the memory setup and a usage example follow the code block below.
def run_GenModel(self, message):
    modelId = 'anthropic.claude-3-sonnet-20240229-v1:0'
    anthropic_version = 'bedrock-2023-05-31'

    # ChatBedrock with Claude 3 Sonnet as the alternative LLM for general questions
    claude3_sonnet_llm = ChatBedrock(
        client=self.bedrock_runtime,
        model_id=modelId,
        region_name="us-east-1",
        model_kwargs={"temperature": 0.01,
                      "top_p": 0.999,
                      "top_k": 250,

                      "anthropic_version": anthropic_version,
                      "max_tokens": 1000,
                      },
    )

    prompt_template = """You are an AI assistant with expertise in stock market trends and share prices analysis.
You should say you do not know if you do not know and answer only if you are very confident.
Answer in number bulleted form.

Previous conversation:
{chat_history}

Human: {input}
AI assistant:\n"""

    prompt = PromptTemplate.from_template(prompt_template)

    # Use LLMChain for the conversation and store it in buffer memory
    LLMChain_conversation = LLMChain(
        llm=claude3_sonnet_llm,
        prompt=prompt,
        memory=self.memory_chat_history,
        verbose=True
    )

    gen_llm_response = LLMChain_conversation.predict(input=message)
    return gen_llm_response
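A minimal sketch of how the buffer memory referenced above (self.memory_chat_history) can be created and how run_GenModel() might be called; the memory_key must match the {chat_history} placeholder in the prompt template, and the example question is hypothetical:

# Assumed setup: buffer memory whose memory_key matches {chat_history} in the prompt template
self.memory_chat_history = ConversationBufferMemory(memory_key="chat_history")

# Hypothetical usage: route a general (non-knowledge-base) question to the alternative model
answer = self.run_GenModel("What factors generally drive share price movements?")
print(answer)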

LLM Chatbot with RAG User interface

Streamlit is used to build the user interface for the LLM chatbot with RAG; a hedged sketch of how the chat page might be wired up is shown below the screenshot.
[Screenshot: Amazon Bedrock RAG Chatbot user interface]
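A minimal Streamlit sketch, assuming a Chatbot class that wraps the snippets above and exposes a respond() method (both names are assumptions, not the article's actual code):

import streamlit as st

st.title("Amazon Bedrock RAG Chatbot")

# Keep the chatbot (vector store, chat history) across Streamlit reruns
if "chatbot" not in st.session_state:
    st.session_state.chatbot = Chatbot()  # assumed wrapper around the snippets above
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Handle a new user query
if user_query := st.chat_input("Ask about Transformers and attention..."):
    st.session_state.messages.append({"role": "user", "content": user_query})
    with st.chat_message("user"):
        st.markdown(user_query)

    answer = st.session_state.chatbot.respond(user_query)  # assumed method name
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)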
