I built an AI landmark recommender for visiting new cities

I’ve created a city explorer app that recommends landmarks to visit using Knowledge Bases for Amazon Bedrock, and in this post I detail how I built it.

Veliswa Boya
Amazon Employee
Published Apr 16, 2024
In my line of work, I’m lucky enough to travel to various parts of the world. As you may already know, travel for business is unlike travel for leisure. When traveling for business, there usually isn’t a lot of time to explore and do some sightseeing.
To fit in some sightseeing, I usually wake up very early, before any of my work engagements for the day start, or fit in an hour or so at the end of the day.
With so little time for sightseeing, it’s important that I make the most of it. I don’t always get this right, and my sightseeing plans usually rely on information gathered from various sources: travel Instagram accounts, TripAdvisor, friends and colleagues, and more.
Well, friends and colleagues, fret no more! I’ve created a city explorer app that recommends landmarks to visit using Knowledge Bases for Amazon Bedrock, and in this post I detail how I created it.

What are knowledge bases, and why do they matter?

Retrieval-Augmented Generation (RAG) helps you optimize the output of a Large Language Model (LLM) by having it reference an authoritative knowledge base outside of its training data sources before generating a response. Knowledge bases therefore give the LLM more context, resulting in output that’s more relevant, accurate, and customized.
I built my city explorer app using Knowledge Bases for Amazon Bedrock because:
  1. I was looking for a single and accurate data source for all my city explorer app needs.
  2. I wanted the ability to quickly and cost-effectively amend my data source based on my travel itinerary.
RAG lets me test and improve my app faster because I can control and change the LLM's data source in line with my changing requirements.
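To make the retrieve-then-generate idea concrete, here is a toy sketch. It is not how Knowledge Bases for Amazon Bedrock works internally: it ranks passages by simple word overlap instead of vector similarity, and the passages, function names, and prompt wording are all made up for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[str], top_k: int = 1) -> list[str]:
    """Rank passages by how many word tokens they share with the question."""
    question_tokens = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda passage: len(question_tokens & tokenize(passage)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Place the retrieved passages ahead of the question for the LLM."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Hypothetical stand-in for the documents held in a knowledge base.
PASSAGES = [
    "The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
    "Table Mountain is a flat-topped mountain overlooking Cape Town.",
]

question = "Which city has the Eiffel Tower?"
prompt = build_prompt(question, retrieve(question, PASSAGES))
print(prompt)
```

Changing what the model can draw on only means editing `PASSAGES`; nothing is retrained, which is the property that made RAG attractive for this app.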

How I built my app using Knowledge Bases

So now I’ll dive into the three (yes, only 3! :) ) steps I followed to build my app:
  • The data source
  • Creating the knowledge base
  • Invoking the knowledge base
All code and documentation for this walkthrough is available on GitHub.

Step 1: The data source

I opted for Amazon Simple Storage Service (Amazon S3) as my single data source where I'll be aggregating data from various sources (in this case, various Wikipedia pages of cities from around the world). I created an Amazon S3 bucket in the same region (us-east-1) as the knowledge base that I’m creating, then uploaded the Wikipedia pages in PDF format.
Supported document file formats are .txt, .md, .html, .doc/.docx, .csv, .xls/.xlsx, and .pdf. Each file cannot exceed the 50 MB quota.
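If you prefer to script the upload instead of using the console, a minimal sketch could look like the following. It assumes a boto3 S3 client is passed in; the helper names, and the reading of 50 MB as 50 × 1024 × 1024 bytes, are my own assumptions. The format list and size limit are the ones quoted in this post, so check the current quotas before relying on them.

```python
from pathlib import Path

# Formats and per-file size limit as listed in this post; quotas may change.
SUPPORTED_EXTENSIONS = {
    ".txt", ".md", ".html", ".doc", ".docx", ".csv", ".xls", ".xlsx", ".pdf",
}
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def is_supported(file_name: str, size_bytes: int) -> bool:
    """Return True if the file's extension and size fit the limits above."""
    extension = Path(file_name).suffix.lower()
    return extension in SUPPORTED_EXTENSIONS and size_bytes <= MAX_FILE_SIZE_BYTES


def upload_documents(s3_client, bucket: str, folder: str) -> list[str]:
    """Upload every supported file in `folder` and return the uploaded keys.

    `s3_client` is expected to be a boto3 S3 client, for example
    boto3.client("s3", region_name="us-east-1").
    """
    uploaded = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and is_supported(path.name, path.stat().st_size):
            # upload_file(Filename, Bucket, Key) stores the file under its own name
            s3_client.upload_file(str(path), bucket, path.name)
            uploaded.append(path.name)
    return uploaded
```

Passing the client in, instead of creating it inside the function, keeps the validation logic easy to try out without AWS credentials.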

S3 bucket with Wikipedia pages


Step 2: Creating the knowledge base

I then created a knowledge base in Amazon Bedrock to ingest the data I uploaded in the previous step. I created the knowledge base in the same region as the data source (us-east-1).
Create the knowledge base
As a data source, I linked the Amazon S3 bucket that I created in Step 1.
Set up data source for the knowledge base

I was happy to find the Vector Embeddings and RAG Demystified: Leveraging Amazon Bedrock, Aurora, and LangChain - Part 1 post at the start of my journey in learning about RAG. The post explained the concept of vector embeddings and their importance for RAG in a way that helped my understanding a lot. A vector embedding is a numerical representation of content in a form that machines can process and understand. It is produced by taking a piece of content, like a word, sentence, or image, and mapping it into a multi-dimensional vector space.
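As a small illustration of why that vector space is useful, the sketch below compares made-up three-dimensional vectors with cosine similarity, a common way to measure how close two embeddings are. The numbers are invented for the example and do not come from any real embeddings model, which would typically produce far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy embeddings, for illustration only.
EMBEDDINGS = {
    "Eiffel Tower": [0.9, 0.1, 0.2],
    "Arc de Triomphe": [0.8, 0.2, 0.3],
    "Table Mountain": [0.1, 0.9, 0.4],
}


def most_similar(name: str, embeddings: dict[str, list[float]]) -> str:
    """Return the other entry whose vector is closest to `name`'s vector."""
    others = (key for key in embeddings if key != name)
    return max(
        others,
        key=lambda key: cosine_similarity(embeddings[name], embeddings[key]),
    )


print(most_similar("Eiffel Tower", EMBEDDINGS))
```

With these toy vectors the two Paris landmarks sit close together, so a query that lands near one is likely to retrieve the other as well.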
Before the city explorer data in the data source (in this case, the S3 bucket created in Step 1) can be used in the knowledge base, it has to be converted into embeddings, and for this, I (and you :) ) need an embeddings model. I won't go into detail here regarding embedding techniques, but popular ones include Word2Vec for words, Doc2Vec for documents, and image embeddings for images. Choosing the right embeddings model depends on the use case, and the MTEB Leaderboard from Hugging Face is a great guide when looking for supporting data that will help in choosing the best one.
So next I selected the embeddings model and configured the vector store, choosing Amazon OpenSearch Serverless as my vector store to save text embeddings, then created my knowledge base.
Be sure to first request access to the model before assigning it as an embeddings model. You need access to ALL models you'll be referencing in the knowledge base.
Embeddings model and vector store
Sync :) Syncing the data source to the knowledge base is a very important next step! And while we're here, it’s worth mentioning that this is one of the features of knowledge bases that I especially love. During my early stages of experimenting with knowledge bases, I changed the data in my data source often, and I liked that I didn’t need to retrain any models whenever I changed the data source. All I needed to do was re-sync, and I was good to go. This aligned with my second reason for choosing knowledge bases, outlined at the beginning of this post.
Sync
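I ran the sync from the console, but it can also be scripted. The sketch below is a hedged example and not the code from this walkthrough: it assumes a boto3 `bedrock-agent` client is passed in, and it uses that client's `start_ingestion_job` and `get_ingestion_job` calls as I understand them. The function name, polling interval, and IDs are placeholders.

```python
import time


def sync_data_source(agent_client, knowledge_base_id: str, data_source_id: str,
                     poll_seconds: float = 5.0) -> str:
    """Start an ingestion job and poll until it reaches a terminal status.

    `agent_client` is expected to be boto3.client("bedrock-agent").
    """
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]

    # Assumption: these are the statuses after which the job no longer changes.
    while job["status"] not in ("COMPLETE", "FAILED", "STOPPED"):
        time.sleep(poll_seconds)
        job = agent_client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]

    return job["status"]
```

A call such as `sync_data_source(boto3.client("bedrock-agent"), "KB_ID", "DATA_SOURCE_ID")` would then replace the console button, with both IDs swapped for your own.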
Once the sync was complete, I could test the knowledge base in the console. Claude 3 Sonnet was my LLM of choice to generate the responses.
Testing the knowledge base

Step 3: Invoking the knowledge base

Soon it was time for me to jump into the code to complete building my city explorer app. Here is the AWS Lambda function that serves as my API and invokes the knowledge base's RetrieveAndGenerate API.
import os
import boto3
import random
import string

boto3_session = boto3.session.Session()
region = boto3_session.region_name

# create a boto3 Bedrock agent runtime client
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')

# get knowledge base id from environment variable
kb_id = os.environ.get("KNOWLEDGE_BASE_ID")
print(kb_id)

# declare model id for calling RetrieveAndGenerate API
model_id = "anthropic.claude-instant-v1"
model_arn = f'arn:aws:bedrock:{region}::foundation-model/{model_id}'

def retrieveAndGenerate(input, kbId, model_arn, sessionId):
    # build the request shared by new and continued conversations
    request = {
        'input': {
            'text': input
        },
        'retrieveAndGenerateConfiguration': {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kbId,
                'modelArn': model_arn
            }
        }
    }
    # pass sessionId only when continuing an existing conversation
    if sessionId != "":
        request['sessionId'] = sessionId
    return bedrock_agent_runtime_client.retrieve_and_generate(**request)
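The function above still needs a Lambda entry point that reads the question and session ID from the incoming event. The handler is not shown in this post, so the following is only a sketch of how one might look. It assumes the event carries `question` and `sessionId` keys (as in the sample requests further down) and that the RetrieveAndGenerate response exposes the answer under `output.text` alongside a `sessionId`. `make_handler` is a hypothetical helper name.

```python
from typing import Any, Callable


def make_handler(retrieve_fn: Callable[..., dict], kb_id: str, model_arn: str):
    """Build a Lambda-style handler around a retrieveAndGenerate-like function."""

    def lambda_handler(event: dict, context: Any) -> dict:
        question = event["question"]
        # An empty session ID means "start a new conversation".
        session_id = event.get("sessionId", "")
        response = retrieve_fn(question, kb_id, model_arn, session_id)
        return {
            "statusCode": 200,
            "body": {
                "question": question,
                "answer": response["output"]["text"],
                "sessionId": response["sessionId"],
            },
        }

    return lambda_handler
```

In the Lambda module this could be wired up as `lambda_handler = make_handler(retrieveAndGenerate, kb_id, model_arn)`.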
I tested my city explorer app using Streamlit, which I invoked using the following command:
python -m streamlit run cityexplorer.py
City explorer app
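Here are two requests and the responses the app returned. The second request passes the sessionId from the first response to continue the same conversation: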
{"question": "which city has an eiffel tower", "sessionId": ""}
{'statusCode': 200, 'body': {'question': 'which city has an eiffel tower', 'answer': 'Paris, France has an Eiffel Tower.', 'sessionId': '7954e8a2-9510-48b5-9d39-037d1d563eee'}}

{"question": "what else is interesting in Paris", "sessionId": "7954e8a2-9510-48b5-9d39-037d1d563eee"}
{'statusCode': 200, 'body': {'question': 'what else is interesting in Paris', 'answer': 'Some other interesting things to do in Paris besides the Eiffel Tower include visiting the Louvre museum, Notre Dame Cathedral, Arc de Triomphe, Champs-Elysées, and Sainte-Chapelle.', 'sessionId': '7954e8a2-9510-48b5-9d39-037d1d563eee'}}
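The exchange above shows the pattern for follow-up questions: send an empty sessionId first, then reuse the one that comes back. A minimal client-side sketch of that bookkeeping is below. The `Conversation` class and the `invoke` callable are hypothetical; `invoke` stands for whatever sends the request to the Lambda function and returns a response shaped like the ones above.

```python
from typing import Callable


class Conversation:
    """Remember the session ID between questions so follow-ups keep context."""

    def __init__(self, invoke: Callable[[dict], dict]):
        self._invoke = invoke
        self.session_id = ""  # empty string starts a new session

    def ask(self, question: str) -> str:
        result = self._invoke(
            {"question": question, "sessionId": self.session_id}
        )
        body = result["body"]
        # Store the returned session ID for the next question.
        self.session_id = body["sessionId"]
        return body["answer"]
```

Creating a fresh `Conversation` is then all it takes to start over without the earlier context.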
I had fun building this app, and I learned a ton. To learn more and get started with building your own smart app with Knowledge Bases for Amazon Bedrock, find the code in the Bedrock samples GitHub repo. I can’t wait to hear what you build. Let me know in the comments!
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
