
Knowledge Graph And Generative AI applications (GraphRAG) with Amazon Neptune and LlamaIndex (Part 2) - Knowledge Graph Retrieval
How to use LlamaIndex and Amazon Bedrock to translate a natural language question into templated graph queries.
LlamaIndex provides the `TextToCypherRetriever` class of the `PropertyGraphIndex`, which takes the schema of the graph and the question, generates an openCypher query, and then executes that query. In this post, we'll instead answer natural language questions with templated graph queries, using the `CypherTemplateRetriever`.

Note: To try this out for yourself as you go through this post, you can download a notebook from our Amazon Neptune Generative AI Samples repository on GitHub.
To get started, we need to install a few packages:

- The core package for LlamaIndex
- Packages for Amazon Bedrock, which we'll be using as our large language model (LLM)
- Packages for Amazon Neptune, which will serve as our data store

```shell
pip install llama-index llama-index-graph-stores-neptune llama-index-llms-bedrock llama-index-embeddings-bedrock
```
Next, we define the models to use: Anthropic Claude 3 Sonnet as our LLM and Amazon Titan Embeddings v1 as our embedding model. Note: the `PropertyGraphIndex` we create later requires an embedding model, so even though we create one here, it is not used by the template-based retrieval in this post.

```python
from llama_index.embeddings.bedrock import BedrockEmbedding
from llama_index.llms.bedrock import Bedrock

embed_model = BedrockEmbedding(model="amazon.titan-embed-text-v1")
llm = Bedrock(model="anthropic.claude-3-sonnet-20240229-v1:0")
```
We then configure LlamaIndex's global `Settings` object, which sets the defaults for all modules in the application. In this example, we set the LLM and embedding model to the values we defined above.

```python
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model
```
Next, we create the graph store that will back our `PropertyGraphIndex`. For Amazon Neptune Database, we create a `PropertyGraphStore` using the `NeptuneDatabasePropertyGraphStore`, specifying the cluster endpoint.

```python
from llama_index.graph_stores.neptune import NeptuneDatabasePropertyGraphStore

graph_store = NeptuneDatabasePropertyGraphStore(host='your-neptune-endpoint')
```
Alternatively, for Amazon Neptune Analytics, we create the `PropertyGraphStore` using the `NeptuneAnalyticsPropertyGraphStore`, specifying the graph identifier.

```python
from llama_index.graph_stores.neptune import NeptuneAnalyticsPropertyGraphStore

graph_store = NeptuneAnalyticsPropertyGraphStore(graph_identifier="g-<INSERT GRAPH ID>")
```
With the graph store in place, we create the `PropertyGraphIndex`, a feature of LlamaIndex. To read more about its features, check out this blog post; it is a great read. We use the `from_existing` method since we already have data loaded into the graph.

```python
from llama_index.core import PropertyGraphIndex

index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store
)
```
With our index created, we can now set up the `CypherTemplateRetriever`. The `CypherTemplateRetriever` is the core component powering the knowledge graph retrieval capability in this system, and it is an area where LlamaIndex does significant heavy lifting for us.

Here's how the retriever works:

- When given a natural language question, the retriever combines the question with predefined template parameters.
- It then provides this combined input to the large language model (LLM), which extracts the relevant parameters from the question.
- Once the parameters are extracted, the retriever incorporates them into a parameterized openCypher query template.
- Finally, the retriever executes this query against the graph store and returns the results.
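The four steps above can be sketched in plain Python with a mocked LLM and graph store. Everything here (the function names, the extraction heuristic, and the in-memory store) is an illustrative stand-in, not LlamaIndex's actual implementation:

```python
import re

def mock_extract_params(question: str) -> dict:
    """Steps 1-2: stand in for the LLM call that reads the question (plus the
    template parameter descriptions) and returns the extracted values."""
    candidates = re.findall(r"[A-Z][a-z]+", question)
    return {"names": [c for c in candidates if c not in {"Who", "Friends"}]}

class MockGraphStore:
    """Stand-in graph store holding a tiny friends graph in memory."""

    FRIENDS = {"Dave": ["Denise", "Taylor"]}

    def structured_query(self, query: str, param_map: dict) -> list:
        # Step 4: a real store would run the parameterized openCypher query;
        # here we just look up friends for each extracted name.
        return [
            {"p.first_name": friend}
            for name in param_map.get("names", [])
            for friend in self.FRIENDS.get(name, [])
        ]

def run_template_retriever(question: str, cypher_template: str, store) -> list:
    params = mock_extract_params(question)  # steps 1-2: question -> parameters
    # Step 3: parameters are bound to the query template at execution time
    return store.structured_query(cypher_template, param_map=params)
```

Calling `run_template_retriever("Who are Dave's Friends?", "...", MockGraphStore())` walks the same pipeline the real retriever does: extract parameters, bind them to the template, execute, return rows.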
To use the `CypherTemplateRetriever`, a few additional pieces need to be configured:

- The `TemplateParams` class: a Pydantic `BaseModel` that defines the expected parameters, along with a description for each. The LLM uses these descriptions to understand what values it needs to extract from the question.
- The parameterized openCypher query: the template query that will be executed, with the extracted parameters inserted as necessary.

In the example below, the extracted names are passed as the `$names` parameter to the Cypher query during execution.
```python
from llama_index.core.indices.property_graph import CypherTemplateRetriever
from pydantic.v1 import BaseModel, Field

class TemplateParams(BaseModel):
    """Template params for a cypher query."""

    names: list[str] = Field(
        description="A list of person names to use for lookup in a knowledge graph."
    )

# Friends Query
cypher_query = """
MATCH (p1:person)-[:friends]->(p)
WHERE p1.first_name IN $names
RETURN p.first_name"""

retriever = CypherTemplateRetriever(index.property_graph_store, TemplateParams, cypher_query)
```
To run a query, we call the `retrieve` method on our retriever object, passing in the natural language question we want to have answered.

```python
nodes = retriever.retrieve("Who are Dave's Friends?")
for node in nodes:
    print(node.text)
```
As we've seen, the results returned from our graph queries not only include the requested data values, but also additional metadata about the graph structure.
We can apply the same pattern to path-finding questions, such as whether two people are connected through friendships:

```python
cypher_query = """
MATCH p=(src:person {first_name:$from_person})-[:friends*1..]-(dst:person {first_name:$to_person})
RETURN p LIMIT 1"""

class TemplateParams(BaseModel):
    """Template params for a cypher query."""

    from_person: str = Field(
        description="The person name to start the lookup from in a knowledge graph."
    )
    to_person: str = Field(
        description="The person name to end the lookup at in a knowledge graph."
    )

retriever = CypherTemplateRetriever(index.property_graph_store, TemplateParams, cypher_query)
nodes = retriever.retrieve("Are Dave and Denise connected?")
for node in nodes:
    print(node.text)
```
Finally, we can combine multiple parameters in one template. This query finds the highest-rated restaurants in a person's city that serve one of the requested cuisines:

```python
cypher_query = """
MATCH (p:person {first_name: $from_person})-[:lives]->(c:city)<-[:within]-(r:restaurant)-[:serves]->(cuisine:cuisine)
WHERE cuisine.name IN $cuisines
MATCH (r)<-[a:about]-(rev:review)
WITH r, max(rev.rating) AS max_rating
MATCH (r)<-[a:about]-(rev:review)
WHERE rev.rating = max_rating
RETURN r.name, rev.rating
ORDER BY r.name"""

class TemplateParams(BaseModel):
    """Template params for a cypher query."""

    from_person: str = Field(
        description="The person name to start the lookup from in a knowledge graph."
    )
    cuisines: list[str] = Field(
        description="Cuisine names for restaurant types to look up in a knowledge graph."
    )

retriever = CypherTemplateRetriever(index.property_graph_store, TemplateParams, cypher_query)
nodes = retriever.retrieve("What restaurants near Dave with a diner or bar cuisine are the highest rated?")
for node in nodes:
    print(node.text)
```
Because the LLM only supplies parameter values, never the query itself, templated queries also help guard against several risks that arise when arbitrary queries can run against the graph:

- Noisy neighbor problems: high-demand queries from some users impacting performance for others.
- Resource over-utilization: individual users monopolizing shared computing resources.
- Data manipulation: queries that modify or corrupt the underlying knowledge graph data.
- Inadvertent data exposure: access to sensitive or confidential information.
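One concrete mitigation for the last two risks is to validate extracted parameter values before they reach the graph store. The `validate_names` helper below is a hypothetical sketch, not part of the LlamaIndex API; with Pydantic, the same check could live in a validator on `TemplateParams`:

```python
import re

# Hypothetical allow-list: person names made of letters, spaces, hyphens,
# or apostrophes, at most 50 characters. Adjust to your data.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z' -]{0,49}$")

def validate_names(names: list[str]) -> list[str]:
    """Reject extracted values that don't look like person names, reducing
    the blast radius of a prompt-injected or malformed parameter."""
    for name in names:
        if not NAME_PATTERN.match(name):
            raise ValueError(f"suspicious name value: {name!r}")
    return names
```

Even though parameterized queries already prevent classic query injection, this kind of check keeps obviously bogus LLM extractions from ever touching the database.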
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.