The Quest for Context: Unlocking Scattered Insights with AI


How to unify enterprise content into one intelligent search platform

Joe Standerfer
Amazon Employee
Published Dec 22, 2023
Authors: Oshry Ben-Harush, Joe Standerfer, Dennis Liang

Human: Summarize this paper for me
Assistant:
Organizations today deal with many disconnected document and knowledge repositories like Microsoft Office, Google Docs, and Amazon WorkDocs, causing productivity challenges as employees must constantly switch between these siloed systems. This context switching introduces mental blocks that disrupt workflows and results in major inefficiencies. A proposed solution is constructing a centralized knowledge index to ingest content from disparate sources into a unified schema, enabling enterprise search and retrieval without duplicating content across systems. On top of this foundation, applying natural language processing techniques facilitates contextualized unified search to directly answer user questions by retrieving and reformatting the most relevant content. This approach delivers an improved user experience while unlocking the full value of previously siloed content. Examples are provided of leveraging AWS services like Amazon Kendra, Amazon Bedrock, Amazon Q, OpenSearch and LangChain to implement such a system.

Organizations today deal with an overwhelming number of document and knowledge repositories - Microsoft Office, Google Docs, Quip, Amazon WorkDocs, customer relationship management platforms like Salesforce, and internal wiki pages. While each platform serves a purpose, their existence in silos creates major productivity challenges.
Employees must toggle between these disparate systems constantly throughout their workday, losing valuable time and focus each time they switch contexts. Rather than having information readily available in one place, they must recall which system houses the documents, data, or conversations they need and navigate to it. This context switching introduces mental blocks that disrupt workflows.
On top of that, each repository has its own built-in search functionality, with varying degrees of sophistication. As a result, employees must conduct searches in each system independently, then manually compile, analyze, and contextualize the information. The time and effort wasted simply finding and making sense of organizational knowledge adds up, resulting in declines in productivity ([1], [2], [3]):
  1. Increased Content Creation and Management Challenges: Modern businesses, where each employee is a content creator, often have uncoordinated CMS, leading to issues like duplication of content, inconsistent messaging, access risks, and version control problems. These issues result in productivity loss, the risk of inaccurate content, and low user satisfaction.
  2. Duplication and Findability Issues: Organizations often create copies of content in each CMS, causing problems in updating or removing content. Users struggle to find content stored across multiple systems, having to adjust their search behavior and spending extra time in each system. This fragmentation prevents capturing important relationships between different pieces of content.
  3. Impact on U.S. Businesses: Lost productivity due to factors like data silos costs U.S. businesses about $1.8 trillion annually. Data silos hinder team collaboration and communication, leading to inefficiencies. A study found that the average team wastes over 20 hours per month due to poor collaboration and communication, and employees waste an average of 5.3 hours each week waiting for or recreating information.
  4. Revenue Loss Due to Inefficiencies: According to IDC Market Research, companies lose 20-30% of their revenue each year because of inefficiencies caused by issues like data silos. Gartner reports that outdated or inaccurate data can cost small to mid-sized businesses over $15 million per year.
Centralizing access to these disparate sources through an overarching, workplace-specific search engine could help regain some of these losses. Rather than hunting through 4+ systems, employees could simply search once in a unified interface. Even better, enabling natural language, contextual searches with intelligent results aggregation can minimize friction and enable self-service access to information.
In short, while point solutions aid specific tasks, fragmented information spread across them significantly hinders productivity. Developing a smart knowledge base to connect these insights can save time and supercharge efficiency.
In this post, we discuss the components required for a centralized conceptual search and different approaches to implementing it.

Let’s first make sure that we are all familiar with some of the concepts and terminology we discuss throughout this post:
Large language models (LLMs) are Machine Learning (ML) Neural Networks (NN) trained on massive amounts of text data to generate human-like text. They can summarize, translate, answer questions, and more. Examples include Amazon Bedrock, Anthropic Claude, and OpenAI ChatGPT.
Embeddings are vector representations of words or concepts learned by large language models like Claude during pre-training. As LLMs are trained on huge amounts of text data, the embeddings encode semantic and syntactic information about the concepts. This allows the embeddings to capture meaningful relationships between concepts that can be useful for downstream tasks like classification.
Vector databases are specialized database systems optimized for storing and querying large collections of vectors or embeddings. They allow efficient similarity searches across high-dimensional vector spaces, enabling applications like visual search, recommendations, fraud detection, and more. By organizing vectors spatially and using advanced indexing techniques, they can quickly identify vectors similar to a query without having to scan all data. This makes them ideal for AI/ML workloads needing scalable, real-time vector lookups and comparisons. Some popular vector database providers are Pinecone, Chroma and OpenSearch.
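To make similarity search concrete, here is a minimal sketch using toy three-dimensional vectors and brute-force cosine similarity. The document names and vectors are invented for illustration; real embeddings have hundreds of dimensions, and real vector databases use approximate indexes rather than scanning every vector:

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors divided by the product of their norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions
documents = {
    "vacation policy": [0.9, 0.1, 0.0],
    "oncall runbook":  [0.1, 0.8, 0.3],
    "expense report":  [0.2, 0.2, 0.9],
}

def top_k(query_vec, docs, k=1):
    # Brute-force nearest-neighbour search; vector databases use advanced
    # indexing (e.g. HNSW) to avoid comparing against every stored vector
    ranked = sorted(docs, key=lambda name: cosine_similarity(query_vec, docs[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical embedding of a query like "how much PTO do I get?"
query = [0.85, 0.15, 0.05]
print(top_k(query, documents))  # ['vacation policy']
```

The key property is that semantically similar text maps to nearby vectors, so the nearest stored vector points to the most relevant document.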
Embeddings in three dimensions. Given a set of embeddings stored in the vector database and a new prompt, the vector database can retrieve similar documents that can serve as the context.
Retrieval Augmented Generation (RAG) combines large language models with a retrieval system to improve text generation. The model retrieves relevant passages from a database and uses them to condition the language model to generate more accurate, factual, and relevant text. This allows the model to incorporate external and up-to-date knowledge into its outputs rather than relying solely on the knowledge encoded in its parameters.
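The RAG pattern can be sketched end to end in a few lines. This is a toy illustration: the lexical retriever and the `retrieve`/`build_prompt` helpers are invented for the example (a real system would use embeddings and a vector database, as described above), and the final prompt would be sent to an LLM rather than printed:

```python
def retrieve(query, corpus, k=2):
    # Toy lexical retriever: rank passages by word overlap with the query.
    # A production system would use embeddings and a vector database instead.
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, passages):
    # Condition the model on the retrieved passages, not just its parameters
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n"
            f"Question: {query}\n"
            f"Answer 'don't know' if the context is insufficient.")

corpus = [
    "The 2024 roadmap focuses on the recommendation engine.",
    "Expense reports are due on the last Friday of each month.",
    "New hires receive a laptop within one week.",
]

question = "summarize the 2024 roadmap"
prompt = build_prompt(question, retrieve(question, corpus))
# `prompt` would now be sent to an LLM (e.g. via the Bedrock API)
print(prompt)
```

Because the retrieved passages are injected into the prompt, the model's answer is grounded in current organizational content instead of whatever it memorized during training.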
Retrieval Augmented Generation

To construct an integrated knowledge base, ingest content from disparate sources into a centralized index. The index provides a unified schema to connect heterogeneous data stores, abstracting the complexity of underlying content management systems. Rather than duplicating content, the index maintains referential links and metadata to enable enterprise search and retrieval. This decoupled architecture offers flexibility to incorporate new document sources without migrating data. The consolidated index layer delivers a federated view of information across systems while avoiding disruption to existing repositories. Through search and discovery facets applied at the index, knowledge can be uncovered from siloed content without moving data from native stores. By funneling queries via the centralized index, the complexity of cross-repository information retrieval is shielded from end users.
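To make the idea of a referential index concrete, here is a minimal in-memory sketch. All class and field names are illustrative, not a real API; a production system would delegate this to Amazon Kendra or a vector database. The point is that the index stores links and metadata, never copies of the content:

```python
from dataclasses import dataclass, field

@dataclass
class IndexEntry:
    # The index stores a reference to the document, not the document itself
    title: str
    source_system: str        # e.g. "quip", "sharepoint", "salesforce"
    uri: str                  # referential link back to the native store
    keywords: set = field(default_factory=set)

class UnifiedIndex:
    def __init__(self):
        self.entries = []

    def ingest(self, entry):
        # New sources plug in without migrating any data
        self.entries.append(entry)

    def search(self, term):
        # One query fans out across every connected repository's entries
        return [e.uri for e in self.entries if term in e.keywords]

idx = UnifiedIndex()
idx.ingest(IndexEntry("Q3 plan", "quip", "quip://doc/123", {"roadmap", "q3"}))
idx.ingest(IndexEntry("Roadmap deck", "sharepoint",
                      "https://sp.example.com/deck", {"roadmap"}))
print(idx.search("roadmap"))  # both URIs, regardless of source system
```

A single search returns references into every connected system, which is exactly the federated view the index layer is meant to provide.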
This centralized knowledge repository can be constructed leveraging various data stores:
Amazon Kendra integration connectors facilitate ingestion from numerous SaaS platforms like Microsoft 365, Atlassian, and Slack. Kendra then constructs a consolidated index spanning these siloed data sources.
Alternatively, vector databases such as Pinecone, Chroma, and OpenSearch enable encoding of semantic content chunks as dense vectors within a high-dimensional space. Relevant vectors can then be retrieved via similarity search in response to textual queries or prompts.

Being able to search across these disparate sources is useful, but still requires manual browsing and reading to find answers. What if we could unlock the full value of this content?
By storing the content in a unified, storage-efficient repository, full text search and semantic analysis become possible. Now we can apply the latest natural language processing techniques to directly answer user questions. Retrieve the most relevant content, reformat it into clear and consistent responses, and provide links back to the original sources.
The result is a dramatically improved user experience. No more bouncing between systems or skimming through documents. Users get quick, authoritative answers in a consumable format. Content is freed from silos, reused across the organization, and delivers exponentially more value.
With the recent developments in this field, implementing contextualized unified search is very straightforward for most requirements.
Amazon Q is a new generative AI-powered assistant from AWS that can have conversations, solve problems, generate content, and take actions using a company's data, code, and systems. It provides fast, tailored answers and advice to employees based on their role and permissions. Amazon Q aims to help businesses streamline tasks, speed decision-making, spark creativity and innovation by tapping into enterprise knowledge. Read the blog post here.
Amazon Bedrock is a fully managed service that provides access to a range of high-performing foundation models from leading AI companies through a single API. It enables developers to easily build, evaluate, customize, and deploy generative AI applications while providing security, privacy, and responsible AI safeguards.
Bedrock knowledge bases enable contextualized search through retrieval-augmented generation. Data is indexed into vector embeddings, allowing natural language queries to surface relevant content. Retrieved results provide context for foundation models to generate accurate, attributed responses, reducing hallucination. This tight integration between search and generation, powered by Bedrock, streamlines building contextual AI applications. Read the blog post here.
For custom control over models, vector stores, databases, document loaders, text splitters, output parsers, agents, and more, there is LangChain: an open-source software framework launched in October 2022 that aims to simplify the process of building applications with large language models. It provides integrations with cloud platforms, databases, web APIs, and over 50 file formats to enable capabilities like chatbots, document summarization, code analysis, and synthetic data generation.

Let us take a moment to understand how LangChain operates in practice. Imagine you want a chatbot that can answer questions in the context of your team's work. As most of your work is internal, public language models would offer little insight. You therefore choose Retrieval Augmented Generation to contextualize the LLM's responses and prevent hallucinations.
Setting up a question-answer LangChain that utilizes RAG can be quickly accomplished with Amazon Kendra, a Python interpreter, and the following steps:
1. Build a Kendra index and upload internal documents. AWS offers various connectors to ingest content from directories, databases, websites, etc.
2. Connect to the Kendra index through a LangChain Retriever object using the kendra_index_id, AWS region, and appropriate policy permissions:
# Connect to Kendra index via LangChain retriever
from langchain.retrievers import AmazonKendraRetriever

kendra_index_id = "XXXXXXXXXXXXXXXXXXXX"
region = "us-west-2"

retriever = AmazonKendraRetriever(
    index_id=kendra_index_id,
    region_name=region,
    top_k=3,
)
3. Connect to an LLM deployment using the Bedrock API. We'll use Anthropic's Claude 2 model:
# Create Bedrock LLM API connection
from langchain.llms.bedrock import Bedrock

model_id = "anthropic.claude-v2"
configs = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens_to_sample": 1000,
    "temperature": 1,
    "top_k": 250,
    "top_p": 0.999,
}

llm = Bedrock(
    region_name=region,
    model_kwargs=configs,
    model_id=model_id,
)
4. Define a prompt template to structure the document context and question inputs. This template encourages the model to respond with "don't know" if the required information is missing, rather than hallucinating a response. Extra emphasis is placed on the "don't know" instruction by placing it at the end of the prompt.
# Structure LangChain prompt inputs
from langchain.prompts import PromptTemplate

template = """Human: Here are a few documents in <documents> tags:
<documents>
{context}
</documents>
Based on the above documents, provide a detailed answer to {question}. Answer "don't know"
if not present in the documents.

Assistant:"""

prompt_template = PromptTemplate(
    template=template, input_variables=["context", "question"]
)
5. Wrap the Kendra retriever, LLM, and prompt template together in a question-and-answer LangChain. This will be the parent object that orchestrates and combines the Kendra and LLM responses.
# Combine LLM and Kendra retriever using LangChain
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt_template},
    return_source_documents=True,
)
6. Now it's ready for questions!
# Ask question
query = "Summarize <team_names> roadmap for 2024"
response = qa_chain({"query": query})

# Print results
sources = "\n- ".join([d.metadata["source"] for d in response["source_documents"]])
print("Answer: ", response["result"])
print("Sources: \n- ", sources)

# Response
# Answer: Based on the document excerpts provided, In
# 2024, <team_name> plans to focus on enhancing
# the core functionality of their flagship product, as
# well as expanding into some new areas.

# On the roadmap is a major update to the algorithms
# that power the product's recommendation engine.
# The improved algorithms will enable more
# personalized and relevant recommendations
# for users. The team also plans to introduce
# new machine learning capabilities that
# will allow the product to continuously improve
# its recommendations over time based on user behavior.
# ...
# Sources:
# - doc1
# - doc2
# - doc3

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.