I built an AI landmark recommender for visiting new cities

In my line of work, I’m lucky enough to travel to various parts of the world. As you may already know, travel for business is unlike travel for leisure. When traveling for business, there usually isn’t a lot of time to explore and do some sightseeing.
To fit in some sightseeing, I usually wake up very early, before any of my work engagements for the day start, or fit in an hour or so at the end of the day.

With so little time for sightseeing, it’s important that I make the most of it. I don’t always get this right, and my sightseeing plans usually draw on information gathered from various sources: travel Instagram accounts, TripAdvisor, friends and colleagues, and more.

Well, friends and colleagues, fret no more! I’ve created a city explorer app that recommends landmarks to visit using Knowledge Bases for Amazon Bedrock, and in this post I detail how I created it.

What are knowledge bases, and why do they matter?

Retrieval-Augmented Generation (RAG) helps you optimize the output of a Large Language Model (LLM) by having it reference an authoritative knowledge base outside of its training data before it generates a response. Knowledge bases therefore give the LLM more context, resulting in output that’s more relevant, accurate, and customized.
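To make the retrieve-then-generate idea concrete, here is a toy sketch in plain Python. It is not how Amazon Bedrock retrieves content (a knowledge base uses vector embeddings, covered later in this post); the word-overlap scoring, the `retrieve` and `build_prompt` helpers, and the sample snippets are all assumptions made up for illustration.

```python
def retrieve(question, documents, top_k=1):
    """Toy retriever: rank documents by how many words they share with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents):
    """Augment the question with retrieved context before it is sent to an LLM."""
    context = "\n".join(retrieve(question, documents))
    return f"Use only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical snippets standing in for the documents in a knowledge base.
docs = [
    "Cape Town landmarks include Table Mountain and Robben Island",
    "Paris landmarks include the Eiffel Tower and the Louvre",
]
prompt = build_prompt("Which landmarks should I visit in Paris", docs)
print(prompt)
```

The LLM never sees the whole document collection, only the question plus the context that was retrieved for it.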

I built my city explorer app using Knowledge Bases for Amazon Bedrock because:

I was looking for a single, accurate data source for all my city explorer app needs.
I wanted the ability to quickly and cost-effectively amend my data source based on my travel itinerary.

RAG lets me test and improve my app faster because I can control and change the LLM's data source in line with my changing requirements.

How I built my app using Knowledge Bases

So now I’ll dive into the three (yes only 3! :) ) steps I followed to build my app:

The data source
Creating the knowledge base
Invoking the knowledge base

All code and documentation for this walkthrough is available on GitHub.

Step 1: The data source

I opted for Amazon Simple Storage Service (Amazon S3) as my single data source where I'll be aggregating data from various sources (in this case, various Wikipedia pages of cities from around the world). I created an Amazon S3 bucket in the same region (us-east-1) as the knowledge base that I’m creating, then uploaded the Wikipedia pages in PDF format.

Supported document file formats are .txt, .md, .html, .doc/.docx, .csv, .xls/.xlsx, and .pdf. Each file must not exceed the quota of 50 MB.
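If you prefer to script the upload, a minimal sketch with boto3 might look like the following. The helper names, the folder and bucket arguments, and the reading of 50 MB as 50 × 1024 × 1024 bytes are assumptions for illustration; the format list and size limit are the ones quoted above, so check the current Amazon Bedrock quotas before relying on them.

```python
from pathlib import Path

# Formats and size limit as quoted in this post (assumed to still be current).
SUPPORTED_EXTENSIONS = {
    ".txt", ".md", ".html", ".doc", ".docx", ".csv", ".xls", ".xlsx", ".pdf",
}
MAX_FILE_BYTES = 50 * 1024 * 1024  # 50 MB, assuming binary megabytes


def is_supported_document(filename, size_bytes):
    """Return True if the file has a supported extension and is within the size limit."""
    extension = Path(filename).suffix.lower()
    return extension in SUPPORTED_EXTENSIONS and size_bytes <= MAX_FILE_BYTES


def upload_documents(folder, bucket, s3_client=None):
    """Upload every supported file in `folder` to `bucket`; return the uploaded names."""
    if s3_client is None:
        import boto3  # imported lazily so the validation helper works without boto3

        s3_client = boto3.client("s3", region_name="us-east-1")
    uploaded = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and is_supported_document(path.name, path.stat().st_size):
            s3_client.upload_file(str(path), bucket, path.name)
            uploaded.append(path.name)
    return uploaded
```

Calling `upload_documents("wikipedia_pdfs", "my-city-explorer-bucket")` (both names hypothetical) would skip anything that is too large or in an unsupported format.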

S3 bucket with Wikipedia pages

Step 2: Creating the knowledge base

I then created a knowledge base in Amazon Bedrock to ingest the data I uploaded in the previous step. I created the knowledge base in the same region as the data source (us-east-1).

As a data source, I linked the Amazon S3 bucket that I created in Step 1.

Set up data source for the knowledge base

I was happy to find the Vector Embeddings and RAG Demystified: Leveraging Amazon Bedrock, Aurora, and LangChain - Part 1 post at the start of my journey in learning about RAG. The post explained the concept of vector embeddings and their importance for RAG in a way that helped my understanding a lot. A vector embedding is a numerical representation of content in a form that machines can process and understand: a piece of content, like a word, sentence, or image, is mapped into a multi-dimensional vector space.
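Here is a small sketch of why that mapping is useful: once content is a vector, "similar meaning" becomes "small angle between vectors". The three-dimensional vectors below are made up for illustration (a real embeddings model outputs hundreds or thousands of dimensions), and cosine similarity is just one common way to compare them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings; a real model would produce these from the text itself.
embeddings = {
    "Eiffel Tower": [0.9, 0.1, 0.2],
    "Louvre": [0.8, 0.2, 0.3],
    "airport parking": [0.1, 0.9, 0.1],
}
query = [0.85, 0.15, 0.25]  # pretend embedding of "famous sights in Paris"

# Rank the stored items from most to least similar to the query.
ranked = sorted(
    embeddings,
    key=lambda name: cosine_similarity(query, embeddings[name]),
    reverse=True,
)
print(ranked)
```

The two landmarks rank well ahead of the unrelated entry, and that ranking is what a vector store uses to decide which chunks of your documents are passed to the LLM.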

Before the city explorer data in the data source (in this case, the S3 bucket created in Step 1) can be used in the knowledge base, it has to be converted into embeddings, and for this, I (and you :) ) need an embeddings model. I won't go into detail here regarding embedding techniques, but popular ones include Word2Vec for words, Doc2Vec for documents, and image embeddings for images. Choosing the right embeddings model depends on the use case, and the MTEB Leaderboard from Hugging Face is a great guide when looking for supporting data that will help in choosing the best one.

So next I selected the embeddings model, and configured the vector store, choosing Amazon OpenSearch Serverless as my vector store to save text embeddings, then created my knowledge base.

Be sure to first request access to the model before assigning it as an embeddings model. You need access to ALL models you'll be referencing in the knowledge base.

Sync :) Syncing the data source to the knowledge base is a very important next step! And while we're here, it’s worth mentioning that this is one of the features of knowledge bases that I especially love. During my early stages of experimenting with knowledge bases I changed the data in my data source often, and I liked the fact that I didn’t need to retrain any models whenever I did. All I needed to do was re-sync and I was good to go. This aligned with my second reason for choosing knowledge bases, outlined at the beginning of this post.

Once the sync was complete, I could test the knowledge base in the console. Claude 3 Sonnet was my LLM of choice to generate the responses.

Each time you add, modify, or remove files from the S3 bucket for a data source, you must sync the data source so that it's re-indexed to the knowledge base.
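The sync can also be triggered from code instead of the console. The sketch below assumes a boto3 `bedrock-agent` client and uses its `start_ingestion_job` and `get_ingestion_job` calls; the `sync_data_source` helper, the polling interval, and the placeholder IDs are my own illustration, so treat it as a starting point and not a definitive implementation.

```python
import time


def sync_data_source(client, knowledge_base_id, data_source_id, poll_seconds=5):
    """Start an ingestion job and poll until it finishes; return the final status."""
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]
    # Keep polling while the job is still starting or in progress.
    while job["status"] not in ("COMPLETE", "FAILED", "STOPPED"):
        time.sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job["status"]


# Example usage (IDs are placeholders for your own knowledge base and data source):
# import boto3
# client = boto3.client("bedrock-agent", region_name="us-east-1")
# print(sync_data_source(client, "YOUR_KB_ID", "YOUR_DATA_SOURCE_ID"))
```

Passing the client in as an argument keeps the helper easy to test with a stand-in object before pointing it at a real knowledge base.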

Step 3: Invoking the knowledge base

Soon it was time for me to jump into the code to finish building my city explorer app. I wrote an AWS Lambda function, exposed as an API, that invokes the knowledge base's RetrieveAndGenerate API.
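For readers who want a starting point, a Lambda function along those lines could look like the following minimal sketch. It is not the exact code from the repo: the environment variable names, the `ask_knowledge_base` helper, the optional `client` argument, and the assumed event shape (an API Gateway proxy event with a JSON body containing `question`) are all illustrative. The call itself is `retrieve_and_generate` on the boto3 `bedrock-agent-runtime` client.

```python
import json
import os

# Placeholders: set these as Lambda environment variables for your own setup.
KNOWLEDGE_BASE_ID = os.environ.get("KNOWLEDGE_BASE_ID", "YOUR_KB_ID")
MODEL_ARN = os.environ.get(
    "MODEL_ARN",
    "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
)


def ask_knowledge_base(client, question):
    """Call RetrieveAndGenerate; return the answer text and any cited S3 sources."""
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KNOWLEDGE_BASE_ID,
                "modelArn": MODEL_ARN,
            },
        },
    )
    sources = [
        ref["location"]["s3Location"]["uri"]
        for citation in response.get("citations", [])
        for ref in citation.get("retrievedReferences", [])
        if "s3Location" in ref.get("location", {})
    ]
    return {"answer": response["output"]["text"], "sources": sources}


def lambda_handler(event, context, client=None):
    """Entry point; assumes a JSON body such as {"question": "..."}."""
    if client is None:
        import boto3  # bundled with the Lambda Python runtime

        client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
    question = json.loads(event.get("body") or "{}").get("question", "")
    if not question:
        return {"statusCode": 400, "body": json.dumps({"error": "question is required"})}
    return {"statusCode": 200, "body": json.dumps(ask_knowledge_base(client, question))}
```

Returning the cited S3 URIs alongside the answer makes it easy to show which Wikipedia page a recommendation came from.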

I tested my city explorer app using Streamlit, which I launched with the following command:

python -m streamlit run cityexplorer.py

Here’s the JSON response returned by the LLM:

To learn more, and to get started with building your own smart app:

Amazon Bedrock code examples - Our constantly growing list of examples across models and programming languages.
The Generative AI Space here on community.aws has a curated list of articles all around Amazon Bedrock and Generative AI.
Knowledge Bases for Amazon Bedrock now supports Amazon Aurora PostgreSQL and Cohere embedding models
Vector Embeddings and RAG Demystified: Leveraging Amazon Bedrock, Aurora, and LangChain - Part 1

I had fun building this app, and I learned a ton. Find the code in the Bedrock samples GitHub repo, learn more, and build your own smart app with Knowledge Bases for Amazon Bedrock. I can’t wait to hear what you build. Let me know in the comments!

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
