
Using Amazon Bedrock to compare Amazon Nova Pro and Anthropic Claude 3.5 Sonnet in a Retrieval Augmented Generation (RAG) based Generative AI (GenAI) application
A GenAI chatbot with RAG and rerank using different Foundation Models (FMs) on Amazon Bedrock endpoints.
- Create a Knowledge Base from the documents to be used as context for FM queries.
- Select the most secure, reliable, accurate, efficient, and cost-effective FM.
- Based on the user queries and document embeddings (Cohere Embed English), retrieve similar document chunks from the vector store with the FAISS engine (a minimal embedding sketch follows this list).
- For improved relevancy and accuracy, rerank the retrieved document chunks.
- Augment the user query with the reranked document chunks.
- Rewrite the user query and perform prompt construction for the selected FM (Amazon Nova Pro / Claude 3.5 Sonnet).
- Use Amazon Bedrock Guardrails to filter harmful content and topics in both the user inputs and the FM responses.
- Stream the FM responses for responsiveness.
- Format the FM responses according to the use case.
- Collect user feedback on the responses for potential model improvements.
- Save the queries and responses for model evaluations, model fine-tuning, and/or continued pre-training.
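For illustration, the sketch below shows one way to embed a query with Cohere Embed English on Amazon Bedrock. The model ID and request fields follow the Cohere Embed schema on Bedrock; the query text is a placeholder.

# A minimal sketch of query embedding with Cohere Embed English on Bedrock.
# Use input_type='search_document' when embedding document chunks for ingestion.
import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-west-2')
body = json.dumps({
    "texts": ["What is Amazon Bedrock?"],   # placeholder query text
    "input_type": "search_query"
})
response = bedrock_runtime.invoke_model(
    modelId="cohere.embed-english-v3",
    contentType="application/json",
    accept="application/json",
    body=body
)
embedding = json.loads(response["body"].read())["embeddings"][0]
print(f"Embedding dimension: {len(embedding)}")   # 1024 for Cohere Embed English v3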
# AWS Bedrock runtime and agent runtime clients, Embedding, Retrieval, Rerank
import boto3
import json
import uuid
import hnswlib
import datetime
import time
from typing import List, Dict
from unstructured.partition.html import partition_html
from unstructured.chunking.title import chunk_by_title
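The unstructured imports above support document preparation before ingestion into the Knowledge Base. A minimal sketch, assuming an illustrative source URL and chunk size:

# A minimal chunking sketch with unstructured; the URL and max_characters
# values are illustrative, not part of the original pipeline code.
elements = partition_html(url="https://aws.amazon.com/bedrock/faqs/")
chunks = chunk_by_title(elements, max_characters=1000)
for chunk in chunks[:3]:
    print(str(chunk)[:120])  # preview the first few chunks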
- Create a boto3 agent runtime client to connect programmatically for retrieval from the Amazon Bedrock Knowledge Base (OpenSearch Serverless).
- Create a boto3 runtime client to connect programmatically for making inference requests to Large Language Models (LLMs) hosted on Amazon Bedrock (e.g., Amazon Nova Pro).
# Creates bedrock_runtime for direct model inference (InvokeModel / Converse APIs)
self.bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-west-2')
# Creates bedrock_agent_runtime for Knowledge Base retrieval and generation APIs
self.bedrock_agent_runtime_us_west = boto3.client(service_name='bedrock-agent-runtime', region_name='us-west-2')
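The bedrock_runtime client can also be used for direct streaming inference outside the Knowledge Base flow. A minimal sketch with the Converse API; the model IDs and prompt are illustrative:

# A minimal streaming inference sketch with the Converse API;
# swap the modelId to compare Amazon Nova Pro against Claude 3.5 Sonnet.
response = self.bedrock_runtime.converse_stream(
    modelId="us.amazon.nova-pro-v1:0",  # or "anthropic.claude-3-5-sonnet-20241022-v2:0"
    messages=[{"role": "user", "content": [{"text": "What is RAG?"}]}],
    inferenceConfig={"maxTokens": 1000, "temperature": 0.1, "topP": 0.999}
)
for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="", flush=True)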
- Retrieve embedded chunks (Cohere Embed English) from the OpenSearch Serverless (OSS) vector store with semantic search.
- Rerank the retrieved document chunks with the Cohere Rerank 3.5 model.
- Block harmful content in model inputs and outputs using Amazon Bedrock Guardrails.
- Generate the LLM response in streams.
if sessionId:
    return self.bedrock_agent_runtime_us_west.retrieve_and_generate_stream(
        input={
            'text': message
        },
        retrieveAndGenerateConfiguration={
            'knowledgeBaseConfiguration': {
                'generationConfiguration': {
                    'guardrailConfiguration': {
                        'guardrailId': guardrailId,
                        'guardrailVersion': '1'
                    },
                    'inferenceConfig': {
                        'textInferenceConfig': {
                            'maxTokens': 1000,
                            'temperature': 0.1,
                            'topP': 0.999
                        }
                    },
                    'performanceConfig': {
                        'latency': 'standard'
                    },
                    'promptTemplate': {
                        'textPromptTemplate': kb_orchestration_prompt
                    }
                },
                'knowledgeBaseId': kbId,
                'modelArn': model_package_arn,
                'orchestrationConfiguration': {
                    'inferenceConfig': {
                        'textInferenceConfig': {
                            'maxTokens': 1000,
                            'temperature': 0.1,
                            'topP': 0.999
                        }
                    },
                    'performanceConfig': {
                        'latency': 'standard'
                    },
                    'promptTemplate': {
                        'textPromptTemplate': kb_orchestration_prompt
                    }
                },
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': numberOfResults,
                        'overrideSearchType': 'SEMANTIC',
                        'rerankingConfiguration': {
                            'bedrockRerankingConfiguration': {
                                'modelConfiguration': {
                                    'modelArn': rerank_model_package_arn
                                },
                                'numberOfRerankedResults': numberOfResults
                            },
                            'type': 'BEDROCK_RERANKING_MODEL'
                        }
                    }
                }
            },
            'type': 'KNOWLEDGE_BASE'
        },
        # Value to maintain multi-turn interactions and contexts
        sessionId=sessionId
    )
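A sketch of consuming the returned event stream: 'output' events carry generated text and 'citation' events carry the grounding references. The citations / cited_documents attributes are assumed helpers that feed the reference display further below, and the call's return value is assumed to be held in retrieve_gen_response as in the next snippet; verify the event field names against your boto3 version.

# A minimal sketch of consuming the retrieve_and_generate_stream response.
for event in retrieve_gen_response['stream']:
    if 'output' in event:
        # Stream the generated answer text as it arrives
        print(event['output']['text'], end="", flush=True)
    elif 'citation' in event:
        # Accumulate retrieved references for the citation display below
        for ref in event['citation'].get('retrievedReferences', []):
            self.citations.append(ref['content']['text'])
            self.cited_documents.append(ref['location']['s3Location']['uri'])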
- Perform multi-turn conversations with the sessionId.
- Obtain the LLM response in streams after knowledge retrieval, reranking, and applying Amazon Bedrock Guardrails.
if self.prev_sessionId:
    print(f"Multi-turn conversation with prev_sessionId : {self.prev_sessionId}")
    retrieve_gen_response = self.KB_Retrieve_and_Generate_Rerank_Stream(message, kbId, modelId,
                                                                        sessionId=self.prev_sessionId)
else:
    print(f"A new conversation should not have any prev_sessionId : {self.prev_sessionId}")
    retrieve_gen_response = self.KB_Retrieve_and_Generate_Rerank_Stream(message, kbId, modelId)
# A response blocked by Guardrails may not carry a sessionId; reuse the previous one
if 'sessionId' not in retrieve_gen_response:
    print("There was no sessionId after being blocked by the guardrail")
    retrieve_gen_response['sessionId'] = self.prev_sessionId
self.prev_sessionId = retrieve_gen_response['sessionId']
- Display external information from RAG with citations and the retrieved document chunks, based on the numberOfResults parameter.
# Store only the non-duplicated text with the information for the first occurrence
doc_refs_text = []
doc_refs = []
for i in range(len(llm_chain.cited_documents)):
    if llm_chain.citations[i] not in doc_refs_text:
        doc_refs_text.append(llm_chain.citations[i])
        doc_refs.append({'text': llm_chain.citations[i],
                         'x-amz-bedrock-kb-source-uri': llm_chain.cited_documents[i]})

print("\nREFERENCES :")
document_cnt = 1
for document in doc_refs:
    print(f"[{document_cnt}] {document}")
    # Extract the filename from the URI path
    ref_str = document['x-amz-bedrock-kb-source-uri']
    ref_filename = ref_str.split("/")[-1]
    # mention() renders a clickable citation link (e.g., from streamlit_extras.mention)
    mention(label=f"[{document_cnt}] {ref_filename} :: 👉 \"{document['text']}\"",
            url=document['x-amz-bedrock-kb-source-uri'])
    document_cnt += 1
- Collect user feedback on the LLM responses to user queries.
- Save this user feedback to a JSON formatted output file and a DynamoDB table.
- The collected data can be used for model evaluations with Amazon Bedrock evaluations.
- It can also serve as a source for Reinforcement Learning from Human Feedback (RLHF), which can be used to fine-tune the Foundation Models.
def _submit_feedback(user_response, emoji=None):
    st.toast(f"Feedback submitted: {user_response}")
    print(f"Feedback submitted: {user_response}, {emoji}")
    write_user_feedback(answer_fbk_score=user_response['score'])
    # Set a new feedback key for the next session
    st.session_state.feedback_key += 1
    # Save the question and answer with the user feedback
    question = st.session_state.questions[-1]['question']
    answer = st.session_state.answers[-1]['answer']
    thumbUp = True
    if user_response['score'] == '👎':
        thumbUp = False
    llm_chain = st.session_state["llm_chain"]
    # Write to a JSON formatted output file
    llm_chain.write_qa_to_json(modelId=modelId,
                               question=question,
                               answer=answer,
                               thumbUp=thumbUp)
    # Write to a DynamoDB table
    llm_chain.write_qa_to_dynamodb(modelId=modelId,
                                   question=question,
                                   answer=answer,
                                   thumbUp=thumbUp)
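The write_qa_to_dynamodb helper is not shown in full; below is a minimal sketch of how it might be implemented with boto3. The table name and attribute schema are illustrative assumptions.

# A minimal sketch of persisting Q&A pairs with feedback to DynamoDB;
# the table name and attribute names here are illustrative assumptions.
def write_qa_to_dynamodb(self, modelId, question, answer, thumbUp):
    dynamodb = boto3.resource('dynamodb', region_name='us-west-2')
    table = dynamodb.Table('genai-qa-feedback')   # hypothetical table name
    table.put_item(Item={
        'id': str(uuid.uuid4()),                  # partition key (assumed schema)
        'timestamp': datetime.datetime.utcnow().isoformat(),
        'modelId': modelId,
        'question': question,
        'answer': answer,
        'thumbUp': thumbUp
    })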
- Use Streamlit to build the user interface for the LLM chatbot with RAG; a minimal wiring sketch follows.
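This sketch assumes the streamlit-feedback component for the thumbs widget (consistent with the _submit_feedback callback signature above) and a hypothetical llm_chain.chat() wrapper around the streaming call.

# A minimal sketch of the Streamlit chat loop; the streamlit_feedback
# component and session-state keys are assumptions consistent with
# the _submit_feedback callback above.
import streamlit as st
from streamlit_feedback import streamlit_feedback

st.title("GenAI Chatbot with RAG and Rerank on Amazon Bedrock")
if "feedback_key" not in st.session_state:
    st.session_state.feedback_key = 0

if prompt := st.chat_input("Ask a question about your documents"):
    st.chat_message("user").write(prompt)
    llm_chain = st.session_state["llm_chain"]
    with st.chat_message("assistant"):
        answer = llm_chain.chat(prompt)   # hypothetical wrapper around the streaming call
        st.write(answer)
    # Thumbs up/down feedback, routed to _submit_feedback above
    streamlit_feedback(feedback_type="thumbs",
                       on_submit=_submit_feedback,
                       key=f"feedback_{st.session_state.feedback_key}")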