Hacking GraphRAG with Amazon Bedrock πŸŒ„

Learn how to run GraphRAG pipelines backed by Amazon Bedrock using LiteLLM proxy.

JoΓ£o Galego
Amazon Employee
Published Jul 9, 2024

From RAG to GraphRAG

"Everything is a graph" -- Matt Rickard
Later this week, I'll be traveling to the UK to attend the Oxford ML School hosted by AI for Global Goals (just DM me if you want to learn more about it).
Before going on that well-deserved "rest" though (if you've been following along, it won't come as a surprise that I have some strange notions of what PTO and fun actually look like), I'd like to do something cool... and crazy.
Today's mission (if you choose to accept it) is to hack GraphRAG using Amazon Bedrock.
Yes, you read that right... but first things first - what exactly is GraphRAG?
In a nutshell, GraphRAG is a pipeline of data transformations that turns raw text into hierarchical, graph-like structures.
Source: Edge et al. (2024)
As you may have guessed, the motivation is that this structure is better for Retrieval-Augmented Generation (RAG) workflows than plain semantic search: graphs carry a lot of additional structure (entities, relationships, communities and the like) that makes the retrieval step more effective and efficient.
In this post, I won't delve into the details of how we come by this structure (for that you have the original paper and a blog), but this structure will become apparent once we get into the visualization part.
πŸ’‘ Tip: If you want to deep dive into the GraphRAG knowledge model, I recommend reading the Dataflow section of the GraphRAG docs.
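To make the knowledge model a bit more concrete, here's a toy sketch in plain Python (the sample data and field names are invented for illustration; they are not GraphRAG's actual schema) of the kind of structure the pipeline extracts: entities, relationships between them, and communities that group related entities:

```python
# Toy illustration of the GraphRAG knowledge model (invented sample data):
# entities, relationships between them, and communities that group them.
entities = [
    {"id": 0, "name": "EBENEZER SCROOGE", "type": "person"},
    {"id": 1, "name": "BOB CRATCHIT", "type": "person"},
    {"id": 2, "name": "LONDON", "type": "geo"},
]
relationships = [
    {"source": 0, "target": 1, "description": "Scrooge employs Bob Cratchit"},
    {"source": 0, "target": 2, "description": "Scrooge lives in London"},
]
# Communities cluster densely connected entities; a report summarizes each one.
communities = [{"id": 0, "members": [0, 1, 2], "title": "Scrooge and his circle"}]

# A retriever can now follow edges instead of relying on embeddings alone.
neighbors = [r["target"] for r in relationships if r["source"] == 0]
print(neighbors)  # [1, 2]
```

This is why retrieval gets cheaper and sharper: once the graph exists, "what is connected to Scrooge?" is an edge lookup, not a similarity search.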

Ready, set... hack! πŸ‘¨πŸ»β€πŸ’»

πŸ‘¨β€πŸ’» All code and documentation for this section is available on GitHub.
So, how are we going to hack this thing? Well, the good news is that the only thing we really need is an OpenAI-compatible proxy that can serve as a go-between, translating OpenAI-style API calls into Amazon Bedrock ones.
Fortunately, there are (at least) two projects out there that do just that; in this post, we'll use LiteLLM proxy.
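To see what such a go-between does, here's a toy sketch (plain Python, deliberately not LiteLLM's actual implementation) of the core idea: accept a request with an OpenAI-style model name and rewrite it to target the Bedrock model behind it:

```python
# Toy illustration of an OpenAI-compatible proxy's routing step
# (NOT LiteLLM's real code): map the OpenAI-style model name in a
# request to the Bedrock model ID that should actually serve it.
MODEL_LIST = {
    "anthropic-claude-3-sonnet": "bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
    "amazon-titan-embed": "bedrock/amazon.titan-embed-text-v2:0",
}

def route(openai_request: dict) -> dict:
    """Rewrite an OpenAI-style request to point at the backing Bedrock model."""
    backend_model = MODEL_LIST[openai_request["model"]]
    return {**openai_request, "model": backend_model}

request = {"model": "anthropic-claude-3-sonnet",
           "messages": [{"role": "user", "content": "Who wrote A Christmas Carol?"}]}
routed = route(request)
print(routed["model"])  # bedrock/anthropic.claude-3-sonnet-20240229-v1:0
```

LiteLLM does this (plus auth, retries, parameter translation, and the actual Bedrock calls) based on the `model_list` we'll put in its config below.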
Let's start by creating a Conda environment with everything we need:

```bash
conda env create -f environment.yml
conda activate graphrag
```

environment.yml:

```yaml
# For more information on how to manage conda environments from an environment.yml file, refer to
# https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file
# https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually

name: graphrag
channels:
  - defaults
dependencies:
  - python=3.11
  - pip
  - pip:
      - boto3
      - graphrag
      - litellm[proxy]
```
Next, we need to create the proxy configuration (config.yaml):

```yaml
# Adapted from https://litellm.vercel.app/docs/proxy/configs

model_list:
  # Chat
  - model_name: anthropic-claude-3-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
      aws_region_name: us-east-1
      temperature: 0.0
  # Embeddings
  - model_name: amazon-titan-embed
    litellm_params:
      model: bedrock/amazon.titan-embed-text-v2:0
      aws_region_name: us-east-1

litellm_settings:
  drop_params: True

general_settings:
  master_key: correct-horse-battery-staple
  # Did you get that reference? ☝️
```
In this case, we're using Anthropic's Claude 3 Sonnet as our chat model and Amazon Titan Text Embeddings V2 as our embeddings model.
Let's fire up our proxy πŸ”₯

```bash
litellm -c config.yaml
```
Now, we need to create a GraphRAG workspace. I'll be following the Get Started example from the GraphRAG docs, but feel free to explore on your own.
First, let's get our own copy of A Christmas Carol πŸŽ„ by Charles Dickens from Project Gutenberg:

```bash
mkdir -p ./ragtest/input
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt > ./ragtest/input/book.txt
```
Once the download finishes, you can go ahead and initialize the workspace:

```bash
python -m graphrag.index --init --root ./ragtest
```
Now is where things get a little bit tricky. The command above will create two files:
  • .env which contains the environment variables required to run the GraphRAG pipeline, and
  • settings.yaml which contains the settings for the pipeline
In the .env file, we will place the GraphRAG API key (this is the key we defined in the LLM proxy config, viz. βœ”οΈπŸŽπŸ”‹πŸ“Ž) as well as the base URL (πŸ‘‰ proxy) and our model choices:
```
OPENAI_BASE_URL=http://0.0.0.0:4000
GRAPHRAG_API_KEY=correct-horse-battery-staple
GRAPHRAG_CHAT_MODEL=anthropic-claude-3-sonnet
GRAPHRAG_EMBEDDINGS_MODEL=amazon-titan-embed
```
which are then injected into the pipeline settings (side note: I have removed a lot of stuff to make it more readable)
```yaml
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat
  model: ${GRAPHRAG_CHAT_MODEL}
  model_supports_json: true
  api_base: ${OPENAI_BASE_URL}

parallelization:
  stagger: 0.3

async_mode: threaded

embeddings:
  async_mode: threaded
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: ${GRAPHRAG_EMBEDDINGS_MODEL}
    api_base: ${OPENAI_BASE_URL}

chunks:
  size: 300
  overlap: 100
  group_by_columns: [id]

input:
  type: file
  file_type: text
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file
  base_dir: "cache"

storage:
  type: file
  base_dir: "output/${timestamp}/artifacts"

reporting:
  type: file
  base_dir: "output/${timestamp}/reports"

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization, person, geo, event]
  max_gleanings: 0

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
```
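GraphRAG resolves the `${VAR}` references in settings.yaml from the .env file. Here's a minimal sketch of that substitution step (for illustration only; GraphRAG handles this internally, and this is not its actual code):

```python
import re

# Minimal sketch of how .env values are injected into ${VAR}
# placeholders in settings.yaml (illustrative, not GraphRAG's code).
DOTENV = """\
OPENAI_BASE_URL=http://0.0.0.0:4000
GRAPHRAG_API_KEY=correct-horse-battery-staple
GRAPHRAG_CHAT_MODEL=anthropic-claude-3-sonnet
GRAPHRAG_EMBEDDINGS_MODEL=amazon-titan-embed
"""

# Parse KEY=VALUE lines into a dict (splitting only on the first '=').
env = dict(line.split("=", 1) for line in DOTENV.splitlines() if line)

def substitute(text: str) -> str:
    """Replace ${VAR} references with values from the parsed .env file."""
    return re.sub(r"\$\{(\w+)\}", lambda m: env.get(m.group(1), m.group(0)), text)

print(substitute("model: ${GRAPHRAG_CHAT_MODEL}"))  # model: anthropic-claude-3-sonnet
```

Note that unknown variables (like the pipeline-provided `${timestamp}`) are left untouched here, since they are filled in at run time rather than from .env.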
Finally, let's run the indexing pipeline:

```bash
python -m graphrag.index --root ./ragtest
```
Once it is done, you can start asking questions:

```bash
python -m graphrag.query \
  --root ./ragtest \
  --method global \
  "What are the top themes in this story?"
```
Output:
SUCCESS: Global Search Response: "A Christmas Carol" by Charles Dickens explores several profound themes
through the transformative journey of Ebenezer Scrooge. Here are the key themes that emerge from the reports:

### Redemption and Transformation

The central theme revolves around Scrooge's transformation from a miserly, selfish individual to a kind and generous person,
driven by his encounters with supernatural entities and reflections on his past [Data: Reports (47, 21, 15, 24)].

The story emphasizes the redemptive power of the Christmas spirit, as Scrooge undergoes a profound change, embracing generosity,
kindness, and compassion towards others, particularly the less fortunate [Data: Reports (49, 48, 45)].

### The Role of Supernatural Guidance

The Ghost serves as a moral guide and catalyst for Scrooge's transformation, challenging his perspectives on wealth, poverty,
and missed opportunities in life [Data: Reports (49, 48)].

The role of supernatural entities, such as ghosts and spirits, in guiding and influencing characters' transformations and
self-discovery is a recurring theme [Data: Reports (47, 24, 15, 37)].

### The Importance of Family and Human Connections

The importance of family, human connections, and embracing the Christmas spirit is a recurring theme, contrasted with Scrooge's
initial isolation and indifference [Data: Reports (47, 19, 30, 25)].

The Cratchit family, despite their poverty, is portrayed as a close-knit and loving unit, finding joy in their togetherness and
the spirit of the Christmas season [Data: Reports (45)].

### The Contrast between Wealth and Poverty

The story explores the stark contrast between the wealthy, miserly Scrooge and the impoverished but joyful Cratchit family,
highlighting the societal divide and the importance of empathy and compassion [Data: Reports (45, 49)].

The contrast between poverty and wealth, and the treatment of the less fortunate, is a prominent theme, with Scrooge initially
displaying a lack of empathy towards the poor [Data: Reports (18, 12)].

### The Impact of Choices and Consequences

The impact of one's choices and actions on others, and the potential for redemption, is a significant theme explored through
Scrooge's journey and the consequences faced by other characters [Data: Reports (47, 38, 24)].

### The Fleeting Nature of the Present

The Ghost of Christmas Present's life is limited, ending at midnight, symbolizing the fleeting nature of the present moment
and the importance of cherishing and making the most of the present time [Data: Reports (48)].
Finally, we can visualize the actual graph with a tool like Gephi (make sure snapshots.graphml is set to true in settings.yaml before indexing).
πŸ’‘ Tip: I recommend watching this tutorial series if you're not familiar with Gephi (like I was).
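Before reaching for Gephi, you can peek at a GraphML snapshot with nothing but the Python standard library. A minimal sketch, using a hand-written toy graph (real GraphRAG output carries many more node and edge attributes):

```python
import xml.etree.ElementTree as ET

# Hand-made toy GraphML snippet (not actual GraphRAG output) showing the
# kind of file Gephi opens when snapshots.graphml is enabled.
GRAPHML = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="SCROOGE"/>
    <node id="BOB CRATCHIT"/>
    <node id="TINY TIM"/>
    <edge source="SCROOGE" target="BOB CRATCHIT"/>
    <edge source="BOB CRATCHIT" target="TINY TIM"/>
  </graph>
</graphml>
"""

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}
root = ET.fromstring(GRAPHML)
nodes = [n.get("id") for n in root.findall(".//g:node", NS)]
edges = root.findall(".//g:edge", NS)
print(len(nodes), len(edges))  # 3 2
```

A quick node/edge count like this is a handy sanity check that the snapshot actually captured your graph before you start styling it.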
Thanks for reading, see you next time! πŸ‘‹

References

Articles

  • Edge, D. et al. (2024). "From Local to Global: A Graph RAG Approach to Query-Focused Summarization." arXiv:2404.16130.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
