
Using Amazon Bedrock to compare Amazon Nova Pro and Anthropic Claude 3.5 Sonnet in a Retrieval Augmented Generation (RAG) based Generative AI (GenAI) application
GenAI Chatbot with RAG and Rerank using different Foundation Models (FMs) on Amazon Bedrock endpoints.
Published Feb 16, 2025
Last Modified Feb 17, 2025
The main objective of this article is to share a quick and easy way to build a Retrieval Augmented Generation (RAG) based Generative AI (GenAI) application using Amazon Bedrock. We use information from a few HTML/PDF documents with general descriptions in the areas of Environmental, Social and Governance (ESG). These documents are embedded and form our knowledge base on the subject. This approach can easily be adapted to use different high-performing Foundation Models (FMs) with advanced RAG and to evaluate the LLM responses on the required tasks.
The high-level task pipeline for the GenAI-powered application:
- Create a Knowledge Base from the documents to be used as contexts for FM queries.
- Select a secure, reliable, accurate, efficient and cost-effective FM.
- Based on the user queries and document embeddings (Cohere Embed English), retrieve similar document chunks from the vector store with the `FAISS` engine.
- For improved relevancy and accuracy, rerank the retrieved similar document chunks.
- Augment user query with the reranked document chunks.
- Rewrite the user query and perform prompt construction for the selected FM (Amazon Nova Pro / Claude 3.5 Sonnet).
- Use Amazon Bedrock Guardrails to filter harmful content and topics in both the user inputs and FM responses.
- Stream the FM responses for responsiveness.
- Format the FM responses based on the use case.
- Collect user feedback on the responses for potential model improvements.
- Save the Queries and Responses for model evaluations, model fine-tuning and/or continued pre-training.
The following libraries are required for the GenAI RAG application prototype.
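A minimal sketch of the imports and boto3 clients is shown below; the AWS Region and the client variable names are assumptions for illustration.

```python
import json
import uuid
from datetime import datetime, timezone

import boto3
import streamlit as st

# Bedrock Agent runtime: Knowledge Base retrieval and document reranking
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-west-2")
# Bedrock runtime: FM inference (Converse API) with Guardrails applied
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")
# DynamoDB: persist user feedback, queries and responses
dynamodb = boto3.resource("dynamodb", region_name="us-west-2")
```

With these clients in place, the prototype performs the following steps: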
- Create a boto3 agent runtime client to connect programmatically and retrieve from the Amazon Bedrock Knowledge Base (OpenSearch Serverless).
- Create a boto3 runtime client to connect programmatically and make inference requests to Large Language Models (LLMs) hosted on Amazon Bedrock (e.g. Amazon Nova Pro).
- Retrieve embedded chunks (Cohere Embed English) from the OpenSearch Serverless (OSS) vector store with semantic search.
- Rerank the document chunks with the Cohere Rerank 3.5 model, as shown in the retrieval and rerank sketch after this list.
- Stop harmful content in model inputs and outputs using Amazon Bedrock Guardrails.
- Generate the LLM response in streams.
- Perform multi-turn conversations with a `sessionId`.
- Obtain the LLM response in streams after knowledge retrieval, reranking and applying Amazon Bedrock Guardrails, as shown in the streaming sketch after this list.
- Display external information from RAG with citations and the retrieved document chunks based on the `numberOfResults` parameter.
- Collect user feedback on the LLM responses to user queries.
- Save this user feedback to a JSON-formatted output file and a DynamoDB table, as shown in the feedback sketch after this list.
- This collected user data can be used for model evaluation with Amazon Bedrock Evaluations.
- This data can also serve as a source for Reinforcement Learning from Human Feedback (RLHF), which can be used in fine-tuning the Foundation Models.
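As a concrete illustration of the retrieval and rerank steps, here is a minimal sketch using the Bedrock Agent runtime client defined earlier. The function name, the Knowledge Base ID and the rerank model ARN are placeholders you would supply; Rerank API availability also varies by Region.

```python
def retrieve_and_rerank(query: str, kb_id: str, rerank_model_arn: str,
                        top_k: int = 10, top_n: int = 3) -> list[str]:
    """Retrieve top_k candidate chunks from the Knowledge Base,
    then keep the top_n after reranking with Cohere Rerank 3.5."""
    retrieval = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    chunks = [r["content"]["text"] for r in retrieval["retrievalResults"]]

    reranked = bedrock_agent_runtime.rerank(
        queries=[{"type": "TEXT", "textQuery": {"text": query}}],
        sources=[
            {"type": "INLINE",
             "inlineDocumentSource": {"type": "TEXT",
                                      "textDocument": {"text": chunk}}}
            for chunk in chunks
        ],
        rerankingConfiguration={
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "modelConfiguration": {"modelArn": rerank_model_arn},
                "numberOfResults": top_n,
            },
        },
    )
    # The rerank response references the input chunks by index,
    # ordered by relevance score.
    return [chunks[r["index"]] for r in reranked["results"]]
```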
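The augmented query can then be sent to the selected FM with the Converse API, streaming the response through an Amazon Bedrock Guardrail. This is a sketch: the prompt template, function name, guardrail identifier and version are assumptions.

```python
def stream_answer(query: str, context_chunks: list[str], model_id: str,
                  guardrail_id: str, guardrail_version: str):
    """Augment the user query with the reranked chunks and stream the
    FM response, with Guardrails applied to input and output."""
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n\n".join(context_chunks)
        + f"\n\nQuestion: {query}"
    )
    response = bedrock_runtime.converse_stream(
        # e.g. "amazon.nova-pro-v1:0"; some Regions require an inference
        # profile ID such as "us.amazon.nova-pro-v1:0"
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        guardrailConfig={
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
        },
    )
    # Yield text deltas as they arrive for a responsive UI
    for event in response["stream"]:
        if "contentBlockDelta" in event:
            yield event["contentBlockDelta"]["delta"]["text"]
```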
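Finally, a minimal sketch of the feedback persistence step; the DynamoDB table name, output file name and item schema are assumptions.

```python
feedback_table = dynamodb.Table("rag-chatbot-feedback")  # assumed table name

def save_feedback(query: str, response: str, model_id: str, rating: str):
    """Save a query/response pair and its thumbs rating to DynamoDB
    and append it to a local JSON Lines file."""
    item = {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "query": query,
        "response": response,
        "rating": rating,  # "up" or "down"
    }
    feedback_table.put_item(Item=item)
    with open("feedback.jsonl", "a") as f:
        f.write(json.dumps(item) + "\n")
```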

In the model evaluation, we can prepare a simple evaluation table (with 👍, 👎) to measure the relevancy and accuracy of the application responses to the user queries. This table could also serve as a source for Reinforcement Learning from Human Feedback (RLHF) for fine-tuning the Foundation Model. Moreover, while building the solution, we can use the Amazon Bedrock Chat playground with different models to compare the prompt outputs with the application responses to the same user queries.
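For illustration, such an evaluation table might look like the following (the queries and ratings are hypothetical):

| User query | Amazon Nova Pro | Claude 3.5 Sonnet |
| --- | --- | --- |
| What does ESG stand for? | 👍 | 👍 |
| Summarize the governance criteria covered in the documents. | 👍 | 👎 |
| Which social factors are described in the knowledge base? | 👎 | 👍 |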
Using Streamlit to build the user interface for the LLM chatbot with RAG.
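A minimal Streamlit sketch tying the pieces together, reusing the helper functions above. The configuration values (KB_ID, RERANK_MODEL_ARN, GUARDRAIL_ID, GUARDRAIL_VERSION) are assumed placeholders, and the model IDs shown are examples.

```python
# Assumed configuration values for this sketch
KB_ID = "YOUR_KNOWLEDGE_BASE_ID"
RERANK_MODEL_ARN = "arn:aws:bedrock:us-west-2::foundation-model/cohere.rerank-v3-5:0"
GUARDRAIL_ID = "YOUR_GUARDRAIL_ID"
GUARDRAIL_VERSION = "1"

def record_vote():
    """Save the 👍/👎 selection for the most recent answer (1 = up, 0 = down)."""
    turn = st.session_state.get("last_turn")
    vote = st.session_state.get("vote")
    if turn and vote is not None:
        save_feedback(turn["query"], turn["answer"], turn["model_id"],
                      "up" if vote == 1 else "down")

st.title("ESG RAG Chatbot on Amazon Bedrock")
model_id = st.sidebar.selectbox(
    "Foundation Model",
    ["amazon.nova-pro-v1:0", "anthropic.claude-3-5-sonnet-20240620-v1:0"],
)

if query := st.chat_input("Ask a question about ESG"):
    with st.chat_message("user"):
        st.write(query)
    chunks = retrieve_and_rerank(query, KB_ID, RERANK_MODEL_ARN)
    with st.chat_message("assistant"):
        answer = st.write_stream(
            stream_answer(query, chunks, model_id,
                          GUARDRAIL_ID, GUARDRAIL_VERSION)
        )
    # Remember the latest turn so feedback survives Streamlit reruns
    st.session_state["last_turn"] = {"query": query, "answer": answer,
                                     "model_id": model_id}

if "last_turn" in st.session_state:
    # Thumbs feedback widget (Streamlit >= 1.36)
    st.feedback("thumbs", key="vote", on_change=record_vote)
```

Running `streamlit run app.py` starts the chatbot; switching the sidebar model between Amazon Nova Pro and Claude 3.5 Sonnet lets you compare responses to the same queries.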