Build a Retrieval Augmented Generation (RAG) based Generative AI (GenAI) application with Amazon Bedrock
A quick GenAI RAG-based application prototype using Large Language Models (LLMs) with Amazon Bedrock
- Create a boto3 client to connect programmatically and make inference requests to Foundation Models (FMs) hosted in Amazon Bedrock, e.g. Cohere Command R (see the client sketch after this list).
- Load and chunk HTML documents with the `unstructured` library (see the chunking sketch below).
- Embed the document chunks in batches using the Cohere Embed (English) model hosted in Amazon Bedrock (see the embedding sketch below).
- Use the `hnswlib` package to index the document chunk embeddings, which ensures efficient similarity search during retrieval. For simplicity, we use `hnswlib` as the vector library for our knowledge database (see the indexing sketch below).
- The chatbot decides whether it needs to consult external information from the knowledge database before responding. If so, it determines an optimal set of search queries to use for document retrieval (see the query-generation sketch below).
- The document search is performed by the `knn_query()` method from the `hnswlib` library. Given a user query message, it returns the document chunks that are most similar to the query. We can define the number of document chunks to return using the `retrieve_top_k` parameter. If there are matched documents, the retrieved chunks are passed as documents in a new query message sent to the FM, Cohere Command R+ (see the retrieval sketch below).
- Display the external information from RAG with citations, along with the retrieved document chunks, based on the `retrieve_top_k` parameter.
- The chat history is updated for the next user query.
- Use an alternative Foundation Model (FM) hosted in Amazon Bedrock, e.g. Claude 3 Sonnet, to handle general questions whenever the Cohere models do not know the answer. LangChain's `LLMChain` and `ConversationBufferMemory` are used to establish the conversation and store the chat history (see the fallback sketch below).
- Use Streamlit to build the user interface for the LLM chatbot with RAG (see the UI sketch below).
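
The sketches below are minimal illustrations of each step, not the exact application code: model IDs and request schemas follow the Bedrock documentation for the Cohere and Anthropic providers, and helper names such as `embed_texts` and `answer_with_rag` are our own. First, the boto3 client and a basic inference request to Cohere Command R:

```python
import json

import boto3

# Bedrock runtime client for model inference (assumes AWS credentials and a
# region where the models are enabled are already configured)
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Basic inference request against Cohere Command R
body = json.dumps({"message": "What is Amazon Bedrock?", "max_tokens": 512})
response = bedrock_runtime.invoke_model(modelId="cohere.command-r-v1:0", body=body)
print(json.loads(response["body"].read())["text"])
```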
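
Loading and chunking the HTML documents with `unstructured`; the file path, `chunk_by_title` strategy, and 512-character limit are example choices:

```python
from unstructured.partition.html import partition_html
from unstructured.chunking.title import chunk_by_title

# Partition an HTML file into elements, then group them into chunks
elements = partition_html(filename="docs/example.html")  # placeholder path
chunks = chunk_by_title(elements, max_characters=512)
texts = [str(chunk) for chunk in chunks]
```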
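
Batch embedding with Cohere Embed (English) v3; the batch size of 96 matches Cohere's per-request text limit, and `input_type` distinguishes corpus documents from search queries:

```python
def embed_texts(items, input_type="search_document", batch_size=96):
    """Embed a list of strings in batches with Cohere Embed (English) on Bedrock."""
    embeddings = []
    for i in range(0, len(items), batch_size):
        body = json.dumps({
            "texts": items[i:i + batch_size],
            "input_type": input_type,  # "search_query" when embedding user queries
        })
        resp = bedrock_runtime.invoke_model(modelId="cohere.embed-english-v3", body=body)
        embeddings.extend(json.loads(resp["body"].read())["embeddings"])
    return embeddings

doc_embeddings = embed_texts(texts)
```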
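
Building the `hnswlib` index over the chunk embeddings; cosine distance matches how Cohere embeddings are usually compared, and `ef_construction`/`M` are typical starting values:

```python
import hnswlib
import numpy as np

embeddings = np.asarray(doc_embeddings, dtype=np.float32)

# Approximate nearest-neighbor index over the document chunk embeddings
index = hnswlib.Index(space="cosine", dim=embeddings.shape[1])
index.init_index(max_elements=len(embeddings), ef_construction=200, M=16)
index.add_items(embeddings, np.arange(len(embeddings)))
```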
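
For the decision step, Cohere's chat schema exposes a `search_queries_only` flag that asks the model to return search queries instead of an answer; the `search_queries` response field below follows Cohere's native API and should be verified against the current Bedrock response shape:

```python
user_message = "How do I rebuild the hnswlib index?"  # example user turn

# Ask Command R only for retrieval queries, not a final answer
body = json.dumps({"message": user_message, "search_queries_only": True})
resp = bedrock_runtime.invoke_model(modelId="cohere.command-r-v1:0", body=body)
result = json.loads(resp["body"].read())

# An empty list means the model chose to answer without consulting the knowledge database
search_queries = [q["text"] for q in result.get("search_queries", [])]
```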
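
Retrieval with `knn_query()` followed by a grounded call to Command R+; the `documents` list follows Cohere's grounded-generation schema, and the `title`/`snippet` keys are conventional field names we chose:

```python
retrieve_top_k = 3  # number of document chunks to return

# Embed the first search query and fetch the nearest chunks from the index
query_emb = np.asarray(embed_texts(search_queries[:1], input_type="search_query"),
                       dtype=np.float32)
labels, distances = index.knn_query(query_emb, k=retrieve_top_k)
retrieved_chunks = [texts[i] for i in labels[0]]

# Send the retrieved chunks as grounding documents to Command R+
body = json.dumps({
    "message": user_message,
    "documents": [{"title": f"chunk-{i}", "snippet": c}
                  for i, c in enumerate(retrieved_chunks)],
    "max_tokens": 512,
})
resp = bedrock_runtime.invoke_model(modelId="cohere.command-r-plus-v1:0", body=body)
result = json.loads(resp["body"].read())
print(result["text"])
print(result.get("citations"))  # citation spans referencing the passed documents
```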
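
The fallback chain for general questions, assuming the `langchain` and `langchain_community` packages (newer releases move `BedrockChat` to `langchain_aws.ChatBedrock`); the prompt template is our own:

```python
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import BedrockChat

# Claude 3 Sonnet via Bedrock as the general-purpose fallback model
llm = BedrockChat(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    client=bedrock_runtime,
    model_kwargs={"temperature": 0.0},
)

prompt = PromptTemplate(
    input_variables=["chat_history", "question"],
    template="Conversation so far:\n{chat_history}\n\nQuestion: {question}",
)

# ConversationBufferMemory keeps the running chat history between turns
memory = ConversationBufferMemory(memory_key="chat_history")
fallback_chain = LLMChain(llm=llm, prompt=prompt, memory=memory)

answer = fallback_chain.predict(question="Who founded Amazon?")
```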
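
Finally, a minimal Streamlit chat UI; `answer_with_rag` is a hypothetical wrapper around the pipeline above, and the history kept in `st.session_state` is what gets updated for the next user query:

```python
import streamlit as st

st.title("RAG Chatbot on Amazon Bedrock")

# Chat history survives Streamlit reruns via session state
if "messages" not in st.session_state:
    st.session_state.messages = []

for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if user_query := st.chat_input("Ask a question"):
    st.session_state.messages.append({"role": "user", "content": user_query})
    with st.chat_message("user"):
        st.markdown(user_query)

    reply = answer_with_rag(user_query)  # hypothetical: runs the RAG pipeline above
    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.markdown(reply)
```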