
Using Amazon Bedrock to compare Amazon Nova Pro and Anthropic Claude 3.5 Sonnet in a Retrieval Augmented Generation (RAG) based Generative AI (GenAI) application
A GenAI chatbot with RAG and rerank using different Foundation Models (FMs) on Amazon Bedrock endpoints.
- Create a Knowledge Base from the documents to be used as context for FM queries.
- Select the most secure, reliable, accurate, efficient, and cost-effective FM.
- Based on the user queries and document embeddings (Cohere Embed English), retrieve similar document chunks from the vector store with the FAISS engine (a minimal embedding sketch follows this list).
- For improved relevancy and accuracy, rerank the retrieved document chunks.
- Augment the user query with the reranked document chunks.
- Rewrite the user query and perform prompt construction for the selected FM (Amazon Nova Pro / Claude 3.5 Sonnet).
- Use Amazon Bedrock Guardrails to filter harmful content and topics in both the user inputs and the FM responses.
- Stream the FM responses for responsiveness.
- Format the FM responses according to the use case.
- Collect user feedback on the responses for potential model improvements.
- Save the queries and responses for model evaluations, model fine-tuning, and/or continued pre-training.
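For illustration, the sketch below shows one way to embed a query with Cohere Embed English on Amazon Bedrock. The model ID and request fields follow the Cohere Embed schema on Bedrock; the query text is a placeholder.

# A minimal sketch of query embedding with Cohere Embed English on Bedrock.
# Use input_type='search_document' when embedding document chunks for ingestion.
import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-west-2')
body = json.dumps({
    "texts": ["What is Amazon Bedrock?"],   # placeholder query text
    "input_type": "search_query"
})
response = bedrock_runtime.invoke_model(
    modelId="cohere.embed-english-v3",
    contentType="application/json",
    accept="application/json",
    body=body
)
embedding = json.loads(response["body"].read())["embeddings"][0]
print(f"Embedding dimension: {len(embedding)}")   # 1024 for Cohere Embed English v3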
# AWS Bedrock runtime and agent runtime clients, Embedding, Retrieval, Rerank
import boto3
import json
import uuid
import hnswlib
import datetime
import time
from typing import List, Dict
from unstructured.partition.html import partition_html
from unstructured.chunking.title import chunk_by_title
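The unstructured imports above support document preparation before ingestion into the Knowledge Base. A minimal sketch, assuming an illustrative source URL and chunk size:

# A minimal chunking sketch with unstructured; the URL and max_characters
# values are illustrative, not part of the original pipeline code.
elements = partition_html(url="https://aws.amazon.com/bedrock/faqs/")
chunks = chunk_by_title(elements, max_characters=1000)
for chunk in chunks[:3]:
    print(str(chunk)[:120])  # preview the first few chunks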
- Create a boto3 agent runtime client to connect programmatically for retrieval from the Amazon Bedrock Knowledge Base (OpenSearch Serverless).
- Create a boto3 runtime client to connect programmatically for making inference requests to Large Language Models (LLMs) hosted on Amazon Bedrock (e.g., Amazon Nova Pro).
# Creates bedrock_runtime for direct model inference (InvokeModel / Converse APIs)
self.bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-west-2')
# Creates bedrock_agent_runtime for Knowledge Base retrieval and generation APIs
self.bedrock_agent_runtime_us_west = boto3.client(service_name='bedrock-agent-runtime', region_name='us-west-2')
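The bedrock_runtime client can also be used for direct streaming inference outside the Knowledge Base flow. A minimal sketch with the Converse API; the model IDs and prompt are illustrative:

# A minimal streaming inference sketch with the Converse API;
# swap the modelId to compare Amazon Nova Pro against Claude 3.5 Sonnet.
response = self.bedrock_runtime.converse_stream(
    modelId="us.amazon.nova-pro-v1:0",  # or "anthropic.claude-3-5-sonnet-20241022-v2:0"
    messages=[{"role": "user", "content": [{"text": "What is RAG?"}]}],
    inferenceConfig={"maxTokens": 1000, "temperature": 0.1, "topP": 0.999}
)
for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="", flush=True)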
- Retrieve embedded chunks (Cohere Embed English) from the OpenSearch Serverless (OSS) vector store with semantic search.
- Rerank the retrieved document chunks with the Cohere Rerank 3.5 model.
- Block harmful content in model inputs and outputs using Amazon Bedrock Guardrails.
- Generate the LLM response in streams.
if sessionId:
    return self.bedrock_agent_runtime_us_west.retrieve_and_generate_stream(
        input={
            'text': message
        },
        retrieveAndGenerateConfiguration={
            'knowledgeBaseConfiguration': {
                'generationConfiguration': {
                    'guardrailConfiguration': {
                        'guardrailId': guardrailId,
                        'guardrailVersion': '1'
                    },
                    'inferenceConfig': {
                        'textInferenceConfig': {
                            'maxTokens': 1000,
                            'temperature': 0.1,
                            'topP': 0.999
                        }
                    },
                    'performanceConfig': {
                        'latency': 'standard'
                    },
                    'promptTemplate': {
                        'textPromptTemplate': kb_orchestration_prompt
                    }
                },
                'knowledgeBaseId': kbId,
                'modelArn': model_package_arn,
                'orchestrationConfiguration': {
                    'inferenceConfig': {
                        'textInferenceConfig': {
                            'maxTokens': 1000,
                            'temperature': 0.1,
                            'topP': 0.999
                        }
                    },
                    'performanceConfig': {
                        'latency': 'standard'
                    },
                    'promptTemplate': {
                        'textPromptTemplate': kb_orchestration_prompt
                    }
                },
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': numberOfResults,
                        'overrideSearchType': 'SEMANTIC',
                        'rerankingConfiguration': {
                            'bedrockRerankingConfiguration': {
                                'modelConfiguration': {
                                    'modelArn': rerank_model_package_arn
                                },
                                'numberOfRerankedResults': numberOfResults
                            },
                            'type': 'BEDROCK_RERANKING_MODEL'
                        }
                    }
                }
            },
            'type': 'KNOWLEDGE_BASE'
        },
        # Value to maintain multi-turn interactions and contexts
        sessionId=sessionId
    )
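A sketch of consuming the returned event stream: 'output' events carry generated text and 'citation' events carry the grounding references. The citations / cited_documents attributes are assumed helpers that feed the reference display further below, and the call's return value is assumed to be held in retrieve_gen_response as in the next snippet; verify the event field names against your boto3 version.

# A minimal sketch of consuming the retrieve_and_generate_stream response.
for event in retrieve_gen_response['stream']:
    if 'output' in event:
        # Stream the generated answer text as it arrives
        print(event['output']['text'], end="", flush=True)
    elif 'citation' in event:
        # Accumulate retrieved references for the citation display below
        for ref in event['citation'].get('retrievedReferences', []):
            self.citations.append(ref['content']['text'])
            self.cited_documents.append(ref['location']['s3Location']['uri'])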
- Perform multi-turn conversations with the sessionId.
- Obtain the LLM response in streams after knowledge retrieval, reranking, and applying Amazon Bedrock Guardrails.
if self.prev_sessionId:
    print(f"Multi-turn conversation with prev_sessionId : {self.prev_sessionId}")
    retrieve_gen_response = self.KB_Retrieve_and_Generate_Rerank_Stream(message, kbId, modelId,
                                                                        sessionId=self.prev_sessionId)
else:
    print(f"A new conversation should not have any prev_sessionId : {self.prev_sessionId}")
    retrieve_gen_response = self.KB_Retrieve_and_Generate_Rerank_Stream(message, kbId, modelId)
# A response blocked by Guardrails may not carry a sessionId; reuse the previous one
if 'sessionId' not in retrieve_gen_response:
    print("There was no sessionId after being blocked by the guardrail")
    retrieve_gen_response['sessionId'] = self.prev_sessionId
self.prev_sessionId = retrieve_gen_response['sessionId']
- Display external information from RAG with citations and the retrieved document chunks, based on the numberOfResults parameter.
# Store only the non-duplicated text with the information for the first occurrence
doc_refs_text = []
doc_refs = []
for i in range(len(llm_chain.cited_documents)):
    if llm_chain.citations[i] not in doc_refs_text:
        doc_refs_text.append(llm_chain.citations[i])
        doc_refs.append({'text': llm_chain.citations[i],
                         'x-amz-bedrock-kb-source-uri': llm_chain.cited_documents[i]})

print("\nREFERENCES :")
document_cnt = 1
for document in doc_refs:
    print(f"[{document_cnt}] {document}")
    # Extract the filename from the URI path
    ref_str = document['x-amz-bedrock-kb-source-uri']
    ref_filename = ref_str.split("/")[-1]
    # mention() renders a clickable citation link (e.g., from streamlit_extras.mention)
    mention(label=f"[{document_cnt}] {ref_filename} :: 👉 \"{document['text']}\"",
            url=document['x-amz-bedrock-kb-source-uri'])
    document_cnt += 1
- Collect user feedback on the LLM responses to user queries.
- Save this user feedback to a JSON formatted output file and a DynamoDB table.
- The collected data can be used for model evaluations with Amazon Bedrock evaluations.
- It can also serve as a source for Reinforcement Learning from Human Feedback (RLHF), which can be used to fine-tune the Foundation Models.
def _submit_feedback(user_response, emoji=None):
    st.toast(f"Feedback submitted: {user_response}")
    print(f"Feedback submitted: {user_response}, {emoji}")
    write_user_feedback(answer_fbk_score=user_response['score'])
    # Set a new feedback key for the next session
    st.session_state.feedback_key += 1
    # Save the question and answer with the user feedback
    question = st.session_state.questions[-1]['question']
    answer = st.session_state.answers[-1]['answer']
    thumbUp = True
    if user_response['score'] == '👎':
        thumbUp = False
    llm_chain = st.session_state["llm_chain"]
    # Write to a JSON formatted output file
    llm_chain.write_qa_to_json(modelId=modelId,
                               question=question,
                               answer=answer,
                               thumbUp=thumbUp)
    # Write to a DynamoDB table
    llm_chain.write_qa_to_dynamodb(modelId=modelId,
                                   question=question,
                                   answer=answer,
                                   thumbUp=thumbUp)
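The write_qa_to_dynamodb helper is not shown in full; below is a minimal sketch of how it might be implemented with boto3. The table name and attribute schema are illustrative assumptions.

# A minimal sketch of persisting Q&A pairs with feedback to DynamoDB;
# the table name and attribute names here are illustrative assumptions.
def write_qa_to_dynamodb(self, modelId, question, answer, thumbUp):
    dynamodb = boto3.resource('dynamodb', region_name='us-west-2')
    table = dynamodb.Table('genai-qa-feedback')   # hypothetical table name
    table.put_item(Item={
        'id': str(uuid.uuid4()),                  # partition key (assumed schema)
        'timestamp': datetime.datetime.utcnow().isoformat(),
        'modelId': modelId,
        'question': question,
        'answer': answer,
        'thumbUp': thumbUp
    })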
- Use Streamlit to build the user interface for the LLM chatbot with RAG; a minimal wiring sketch follows.
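This sketch assumes the streamlit-feedback component for the thumbs widget (consistent with the _submit_feedback callback signature above) and a hypothetical llm_chain.chat() wrapper around the streaming call.

# A minimal sketch of the Streamlit chat loop; the streamlit_feedback
# component and session-state keys are assumptions consistent with
# the _submit_feedback callback above.
import streamlit as st
from streamlit_feedback import streamlit_feedback

st.title("GenAI Chatbot with RAG and Rerank on Amazon Bedrock")
if "feedback_key" not in st.session_state:
    st.session_state.feedback_key = 0

if prompt := st.chat_input("Ask a question about your documents"):
    st.chat_message("user").write(prompt)
    llm_chain = st.session_state["llm_chain"]
    with st.chat_message("assistant"):
        answer = llm_chain.chat(prompt)   # hypothetical wrapper around the streaming call
        st.write(answer)
    # Thumbs up/down feedback, routed to _submit_feedback above
    streamlit_feedback(feedback_type="thumbs",
                       on_submit=_submit_feedback,
                       key=f"feedback_{st.session_state.feedback_key}")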