Is A Chat Bot The Answer To Search?
In this post, we'll discuss enabling semantic search / hybrid search as an improvement to keyword- and filter-based search.
Tanner McRae
Amazon Employee
Published Jun 24, 2024
Photo by Marten Newhall on Unsplash
One of the most common conversations I have with companies is around adding a chat interface to their existing search experience. Is this the silver bullet for search? I would argue no in many cases.
In this blog post, we’ll look at the root problem we’re trying to solve and discuss some potential solutions to enhance your search experience.
With the proliferation of Generative AI, it’s tempting to start with the technology and try to fit it to a problem, which is backward. For a search experience, the ultimate goal is to make relevant information easy to find.
Let’s use this example:
Example: [Company Name] sells various widgets. Their homepage allows users to search for widgets, read reviews, and get answers about them. Currently, the search is keyword and filter-based. Users type keywords and select filters to find widgets.
In UX design, counting user actions to complete an objective helps identify improvements. Suppose a user searches for widgets by typing keywords and adding filters for color, shape, size, and pickup location. This requires five actions to complete the objective (which is a lot).
The main problem we’re trying to solve becomes: “How do we reduce the number of steps users take to find relevant search results?”
If a chat bot is multi-turn and asks the user 5 questions, we didn’t actually reduce friction. We just changed the UI to be a chat interface with the same outcome.
Users think about what they’re trying to find in natural language and then have to retrofit that thought into a clickable user interface. So why not open up the search bar to support natural language queries directly? We can keep the search results page the same, and users can get relevant results in as little as one natural language query. This would solve our problem: 5 actions down to 1 in the best-case scenario.
We need to do two things: (1) enable semantic search and (2) auto-populate search filters from the user’s natural language query. This approach can be distilled down to a “hybrid search” problem.
The solution can be split up into 3 features. Each feature enhances the user experience individually, but they work best when combined.
In the next couple sections, we’ll talk about these three features.
- Translate natural language to search filters.
- Enable semantic search instead of keyword search.
- Enhance data using a large language model (LLM).
Translating natural language to search filters is the simplest of the three features. Instead of the user filling in filters and keywords, we use an LLM to convert their natural language query to an Amazon OpenSearch query.
The existing search API previously took a JSON request as input and converted it into an OpenSearch query. In this approach, we’re using an LLM to perform that translation for us. Depending on your needs, you could use an existing LLM API provider like Amazon Bedrock, run your own LLM through Amazon SageMaker, or run an LLM on your own container orchestration platform (ECS or EKS). Price out the solution with your estimated scale early to make that determination.
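As a rough illustration of the translation step, here is a minimal sketch. It makes a design choice the post does not prescribe: rather than letting the LLM write raw query DSL, the LLM only extracts structured fields and our own code builds the OpenSearch `bool` query, which keeps the model's output easy to validate. The field names (`color`, `shape`, `size`, `pickup_location`, `description`), the prompt wording, and the `call_llm` callable are all assumptions for illustration; `fake_llm` stands in for a real model call so the example runs offline.

```python
import json
from typing import Callable

# Hypothetical filter fields for the widget catalog. The post only names
# color, shape, size, and pickup location as filters.
FILTER_FIELDS = ("color", "shape", "size", "pickup_location")

PROMPT_TEMPLATE = (
    "Extract search filters from the user's query. "
    "Respond with JSON only, using the keys: keywords, {fields}. "
    "Use null for anything the user did not mention.\n"
    "Query: {query}"
)


def build_prompt(user_query: str) -> str:
    """Fill the prompt template with the allowed fields and the user's query."""
    return PROMPT_TEMPLATE.format(fields=", ".join(FILTER_FIELDS), query=user_query)


def to_opensearch_query(extracted: dict) -> dict:
    """Turn the extracted fields into an OpenSearch bool query."""
    bool_query: dict = {"filter": []}
    if extracted.get("keywords"):
        # Assumes product text lives in a "description" field.
        bool_query["must"] = [{"match": {"description": extracted["keywords"]}}]
    for field in FILTER_FIELDS:
        value = extracted.get(field)
        if value is not None:
            # Only whitelisted fields ever reach the query.
            bool_query["filter"].append({"term": {field: value}})
    return {"query": {"bool": bool_query}}


def translate(user_query: str, call_llm: Callable[[str], str]) -> dict:
    """Ask the LLM for structured fields, then build the query ourselves."""
    raw = call_llm(build_prompt(user_query))
    return to_opensearch_query(json.loads(raw))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned extraction."""
    return json.dumps({
        "keywords": "widgets",
        "color": "blue",
        "shape": "round",
        "size": "small",
        "pickup_location": "Seattle",
    })


query = translate("small blue round widgets I can pick up in Seattle", fake_llm)
print(json.dumps(query, indent=2))
```

In a real system, `call_llm` would wrap whichever provider you choose, and you would want to handle malformed JSON from the model before building the query.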
Semantic search is a search technique that focuses on understanding the meaning and context of the query to provide more relevant results. This approach was popularized by the paper Efficient Estimation of Word Representations in Vector Space, published by researchers at Google in 2013, but the concepts date back much further.
The idea behind semantic search is to represent snippets of text using dense vectors in a high-dimensional space. We often call these vectors embeddings. To do this, we train machine learning models to consume text and generate these dense vectors. To visualize this concept, it’s helpful to model these vectors in 2 dimensions.
In the diagram above, you can see the search query embedding is close to 2 embeddings. That means they are semantically similar to each other. In practice, 2 dimensions are not enough to capture the information needed to make that determination. That’s why these models often output embeddings in a 512 or even higher-dimensional space.
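To make "close" concrete, here is a small sketch that ranks documents by cosine similarity, one common way to compare embeddings. The 2-dimensional vectors and document names are made up for illustration; real embeddings come from a model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 2D "embeddings", invented for this example.
documents = {
    "rust-resistant hammer": [0.9, 0.2],
    "claw hammer with grip": [0.8, 0.35],
    "garden hose": [0.1, 0.95],
}
query_embedding = [0.85, 0.25]

# Sort document names from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query_embedding, documents[name]),
    reverse=True,
)
for name in ranked:
    score = cosine_similarity(query_embedding, documents[name])
    print(f"{name}: {score:.3f}")
```

The two hammer vectors point in nearly the same direction as the query, so they rank ahead of the garden hose.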
To enable semantic search, we need to create embeddings for our data. In the widgets example, let’s assume we have descriptions of the products. We will run a job to update each of the documents in our search index with an embedding of the description. We can then convert the user’s natural language query to an embedding and search for similar embeddings in our index to get relevant results.
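The two halves of that flow can be sketched as follows: attach an embedding to each document at indexing time, then build a k-NN search body at query time. This is a sketch under assumptions: the field name `description_embedding` and the `embed` callable are hypothetical, and the query shape follows the OpenSearch k-NN plugin's `knn` query (including its optional `filter` clause, whose support depends on your engine and OpenSearch version). Nothing here talks to a cluster; it only builds the request bodies.

```python
from typing import Callable, Optional, Sequence

Embedder = Callable[[str], Sequence[float]]

# Hypothetical name for the vector field in the index mapping.
VECTOR_FIELD = "description_embedding"


def with_embedding(doc: dict, embed: Embedder) -> dict:
    """Return a copy of the document with its description embedded."""
    updated = dict(doc)
    updated[VECTOR_FIELD] = list(embed(doc["description"]))
    return updated


def build_knn_query(query_vector: Sequence[float], k: int = 10,
                    filters: Optional[dict] = None) -> dict:
    """Build a k-NN search body, optionally restricted by exact-match filters."""
    knn_clause: dict = {"vector": list(query_vector), "k": k}
    if filters:
        knn_clause["filter"] = {
            "bool": {"filter": [{"term": {f: v}} for f, v in filters.items()]}
        }
    return {"size": k, "query": {"knn": {VECTOR_FIELD: knn_clause}}}


def semantic_search_body(user_query: str, embed: Embedder, k: int = 10,
                         filters: Optional[dict] = None) -> dict:
    """Embed the user's query and wrap it in a k-NN search body."""
    return build_knn_query(embed(user_query), k=k, filters=filters)


def toy_embed(text: str) -> list:
    """Stand-in for a real embedding model; NOT semantically meaningful."""
    return [len(text) / 100.0, text.count(" ") / 10.0, 0.5]


doc = with_embedding({"name": "HammerWidget", "description": "A blue hand tool"}, toy_embed)
body = semantic_search_body("rust resistant hammers", toy_embed, k=5, filters={"color": "blue"})
print(body)
```

Combining the vector clause with filters in one request is what makes this a hybrid search: the filters narrow the candidates and the embedding ranks them by meaning.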
In some cases, you might not have descriptions or existing descriptions don’t cover all the relevant information about your documents. In this case, we can use an LLM to enhance these documents. Then we embed and ingest those enhanced snippets into our search index.
Let’s say you have this JSON document:
The description is filled with buzzwords and doesn’t capture what the widget is and what it does. You can derive more information from the product features, name, and categories. Knowing this, we can prompt an LLM to generate a better description that might be more semantically meaningful to a user’s query.
Using a prompt, we convert the JSON to a description that looks something like this:
“HammerWidget is a blue hand tool from the Tools & Hardware category. Priced at $19.99, it features an ergonomic design, high-grade materials, anti-slip handle, versatile applications, and rust-resistant coating. Dimensions: 5x3x2 inches, Weight: 1.2 lbs, Rating: 4.7.”
Key Point: What you embed matters. You should try to align what you embed with how users will query for your data. In this case, users will likely search for this widget by typing in “Can you show me some hammers that are rust resistant”.
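Here is a minimal sketch of that enrichment step. The product document below is hypothetical: its values are taken from the generated description quoted above, but the field names and the buzzword-style `description` text are assumptions made up for this example. The prompt wording and the `call_llm` callable are also assumptions; `fake_llm` stands in for a real model so the code runs offline.

```python
import json
from typing import Callable

# Hypothetical product document. Field names and the "description" text are
# invented for illustration; values mirror the generated description above.
product = {
    "name": "HammerWidget",
    "category": "Tools & Hardware",
    "color": "blue",
    "price": 19.99,
    "features": [
        "ergonomic design",
        "high-grade materials",
        "anti-slip handle",
        "versatile applications",
        "rust-resistant coating",
    ],
    "dimensions": "5x3x2 inches",
    "weight": "1.2 lbs",
    "rating": 4.7,
    "description": "A next-generation, best-in-class widget.",  # placeholder
}


def build_enrichment_prompt(doc: dict) -> str:
    """Build a prompt from the structured fields, skipping the old description."""
    fields = {k: v for k, v in doc.items() if k != "description"}
    return (
        "Write a short, factual product description using only the fields "
        "below. Say what the product is and what it does, in words a shopper "
        "would use when searching.\n" + json.dumps(fields, indent=2)
    )


def enrich(doc: dict, call_llm: Callable[[str], str]) -> dict:
    """Return a copy of the document with an LLM-written description added."""
    enriched = dict(doc)
    enriched["enriched_description"] = call_llm(build_enrichment_prompt(doc)).strip()
    return enriched


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return "HammerWidget is a blue hand tool with a rust-resistant coating.\n"


enriched = enrich(product, fake_llm)
print(enriched["enriched_description"])
```

The enriched text, not the original description, is what you would pass to the embedding model, so that a query about rust-resistant hammers lands near this document.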
Here is an example of what the new architecture might look like.
In this diagram, you can see that instead of just embedding the description, we’re calling an LLM and prompting it to create a more concise summary. We’re then using that enhanced summary as the input into our embedding model and storing the embeddings in our search index.
These three solutions build off each other. You can build each one separately, but ultimately all three should be combined to create a great customer experience.
Ultimately, if you decide you still want a chat interface, the steps above can be integrated into a chat solution. Improving your search is the first step. If you return bad search results to a chat bot, the chat bot will also be bad.
Query Translation Costs: Depending on your search traffic, LLM costs can add up. For lower-traffic use cases, something like Claude 3 Haiku through Amazon Bedrock can be a cost-effective and simple solution. Most projects start with LLMs vended through these API providers. If you have a lot of traffic, serving an open source model through a SageMaker endpoint can sometimes be more cost-effective. It’s important to price this early.
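A back-of-the-envelope calculation is often enough to start that pricing exercise. The sketch below compares per-token API pricing with an always-on endpoint. Every number in it is a placeholder, not a real price: substitute current pricing for your model and instance type, and your own traffic and token counts.

```python
def cost_per_query(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Cost of one LLM call under per-1K-token pricing."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


def monthly_api_cost(queries_per_day, per_query_cost, days=30):
    """Total monthly spend when paying per call."""
    return queries_per_day * days * per_query_cost


def break_even_queries_per_day(endpoint_hourly_price, per_query_cost, days=30):
    """Daily traffic at which an always-on endpoint costs the same as the API."""
    endpoint_monthly = endpoint_hourly_price * 24 * days
    return endpoint_monthly / (per_query_cost * days)


# All values below are PLACEHOLDERS for illustration only.
per_query = cost_per_query(
    input_tokens=300,         # prompt plus user query
    output_tokens=100,        # generated filters
    price_in_per_1k=0.001,    # placeholder price per 1K input tokens
    price_out_per_1k=0.002,   # placeholder price per 1K output tokens
)
api_monthly = monthly_api_cost(queries_per_day=100_000, per_query_cost=per_query)
threshold = break_even_queries_per_day(endpoint_hourly_price=2.0, per_query_cost=per_query)

print(f"API cost per month: ${api_monthly:,.2f}")
print(f"Break-even traffic: {threshold:,.0f} queries/day")
```

This ignores factors such as endpoint throughput limits, idle capacity, and engineering time, so treat the break-even point as a rough signal rather than a decision.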
Embedding Storage Cost: Storage costs for dense vectors can add up. It’s possible to “quantize” your embeddings in order to make them smaller. You can read more about that process here.
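To give a feel for the savings, here is a sketch of one simple scheme, symmetric scalar quantization from 32-bit floats to 8-bit integers. This is an illustrative technique chosen for the example and not necessarily the process described in the linked article or the one your search engine uses. The 512-dimensional vector is synthetic.

```python
import math
from array import array


def quantize_int8(embedding):
    """Map floats to int8 codes in [-127, 127] using a single scale factor."""
    scale = max(abs(x) for x in embedding) / 127 or 1.0  # avoid a zero scale
    codes = array("b", (round(x / scale) for x in embedding))
    return codes, scale


def dequantize(codes, scale):
    """Approximate the original floats from the int8 codes."""
    return [c * scale for c in codes]


# Synthetic 512-dimensional "embedding" for illustration.
embedding = [math.sin(i) for i in range(512)]

codes, scale = quantize_int8(embedding)
restored = dequantize(codes, scale)

float_bytes = len(embedding) * array("f").itemsize  # 32-bit floats
int8_bytes = len(codes) * codes.itemsize            # 8-bit integers
max_error = max(abs(a - b) for a, b in zip(embedding, restored))

print(f"float32: {float_bytes} bytes, int8: {int8_bytes} bytes")
print(f"max reconstruction error: {max_error:.5f}")
```

Each value shrinks from four bytes to one at the price of a small rounding error, so it is worth measuring search relevance before and after quantizing.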
LLM Cost: Depending on the size of your search index, LLM costs for data enrichment can also add up. For small and medium-sized clusters, it’ll likely be cheaper to use an LLM through Amazon Bedrock. For large and x-large clusters, it might be worthwhile to spin up Llama 3 or Mixtral in Amazon SageMaker to do the summarizations. We encourage you to price it out early to make that decision.
In this post, we discussed three features that build off each other to provide a better search experience for users. Allowing users to search in natural language and making your data more discoverable enables users to find the information they’re looking for with fewer intermediate steps. Once you’ve enabled semantic search, adding a chat experience on top of it becomes relatively simple.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.