Hacking GraphRAG with Amazon Bedrock 🌄
Learn how to run GraphRAG pipelines backed by Amazon Bedrock using LiteLLM proxy.
João Galego
Amazon Employee
Published Jul 9, 2024
"Everything is a graph" -- Matt Rickard
Later this week, I'll be traveling to the UK to attend the Oxford ML School hosted by AI for Global Goals (just DM me if you want to learn more about it).
Before going on that well-deserved "rest" though (if you've been following along, it won't come as a surprise that I have some strange notions of what PTO and fun actually look like), I'd like to do something cool... and crazy.
Today's mission (if you choose to accept it) is to hack GraphRAG using Amazon Bedrock.
Yes, you read that right... but first things first - what exactly is GraphRAG?
In a nutshell, GraphRAG is a pipeline of data transformations that turns raw text into hierarchical, graph-like structures.
As you may have guessed, the motivation is that this works better for Retrieval Augmented Generation (RAG) workflows than plain semantic search because... well, graphs carry a lot of additional structure (Entities, Relationships, Communities and the like) that makes the retrieval step more effective and efficient.
In this post, I won't delve into the details of how we come by this structure (for that you have the original paper and a blog), but this structure will become apparent once we get into the visualization part.
💡 Tip: If you want to deep dive into the GraphRAG knowledge model, I recommend reading the Dataflow section of the GraphRAG docs.
👨‍💻 All code and documentation for this section is available on GitHub.
So, how are we going to hack this thing? Well, the good news is that the only thing we really need is an OpenAI-compatible proxy that can serve as a go-between.
Fortunately, there are (at least) two projects out there that do just that:
- 🚅💥 LiteLLM Proxy (preferred)
- bedrock-access-gateway (from aws-samples)
Let's start by creating a Conda environment (`environment.yml`) with everything we need.
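Something like this should do the trick (a minimal sketch; the environment name, the Python pin and the decision to pull everything from pip are my assumptions):

```yaml
name: graphrag-bedrock
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - pip:
      - graphrag          # the GraphRAG indexing/query engine
      - litellm[proxy]    # LiteLLM with the proxy server extras
```

Create and activate it as usual:

```bash
conda env create -f environment.yml
conda activate graphrag-bedrock
```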
Next, we need to create the proxy configuration (`config.yaml`). In this case, we're using Anthropic's Claude 3 Sonnet as our chat model and Amazon Titan Text Embeddings V2 as our embeddings model.
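Here's a sketch of what that might look like (the model aliases and the master key are my own choices; LiteLLM picks up AWS credentials from the usual environment/profile chain):

```yaml
model_list:
  # Chat model: Anthropic Claude 3 Sonnet on Amazon Bedrock
  - model_name: claude-3-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
  # Embeddings model: Amazon Titan Text Embeddings V2 on Amazon Bedrock
  - model_name: titan-embed-v2
    litellm_params:
      model: bedrock/amazon.titan-embed-text-v2:0

general_settings:
  # Clients must present this key to use the proxy (xkcd-approved 🔑)
  master_key: sk-correct-horse-battery-staple
```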
Let's fire up our proxy 🔥
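By default, LiteLLM serves an OpenAI-compatible API on port 4000:

```bash
litellm --config config.yaml
```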
Now, we need to create a GraphRAG workspace. I'll be following the Get Started example from the GraphRAG docs, but feel free to explore on your own.
First, let's get our own copy of A Christmas Carol 🎄 by Charles Dickens from Project Gutenberg
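A quick curl, straight from the Get Started guide, does the job:

```bash
# Create the workspace input folder and download the book
mkdir -p ./ragtest/input
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./ragtest/input/book.txt
```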
Once the download finishes, you can go ahead and initialize the workspace
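At the time of writing, both initialization and indexing go through the `graphrag.index` module:

```bash
python -m graphrag.index --init --root ./ragtest
```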
Now is where things get a little bit tricky. The command above will create two files: `.env`, which contains the environment variables required to run the GraphRAG pipeline, and `settings.yaml`, which contains the settings for the pipeline itself.
In the `.env` file, we will place the GraphRAG API key (this is the key we defined in the LLM proxy config, viz. ✔️🐎🔋📎) as well as the base URL (👉 proxy) and our model choices, which are then injected into the pipeline settings (side note: I have removed a lot of stuff from `settings.yaml` to make it more readable).
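Here's roughly how the two fit together (the variable names beyond `GRAPHRAG_API_KEY` and the model aliases are my own; the aliases must match the ones in the proxy's `config.yaml`):

```bash
# .env
GRAPHRAG_API_KEY=sk-correct-horse-battery-staple  # the proxy's master key
GRAPHRAG_API_BASE=http://localhost:4000           # 👉 the LiteLLM proxy
GRAPHRAG_CHAT_MODEL=claude-3-sonnet
GRAPHRAG_EMBEDDING_MODEL=titan-embed-v2
```

```yaml
# settings.yaml (abridged)
llm:
  type: openai_chat            # OpenAI-compatible, i.e. the proxy
  api_key: ${GRAPHRAG_API_KEY}
  api_base: ${GRAPHRAG_API_BASE}
  model: ${GRAPHRAG_CHAT_MODEL}

embeddings:
  llm:
    type: openai_embedding
    api_key: ${GRAPHRAG_API_KEY}
    api_base: ${GRAPHRAG_API_BASE}
    model: ${GRAPHRAG_EMBEDDING_MODEL}
```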
Finally, let's run the indexing pipeline
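Same module as before, minus the `--init` flag:

```bash
python -m graphrag.index --root ./ragtest
```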
Once it is done, you can start asking questions...
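For example, here's a global search (the question itself comes from the Get Started guide):

```bash
python -m graphrag.query \
    --root ./ragtest \
    --method global \
    "What are the top themes in this story?"
```

The answer is printed straight to the console.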
To wrap up, we can visualize the actual graph with a tool like Gephi (make sure `snapshots.graphml` is set to `true` in `settings.yaml`; see the snippet below).

💡 Tip: I recommend watching this tutorial series if you're not familiar with Gephi (like I was).
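For reference, that's just one extra entry in the pipeline settings:

```yaml
snapshots:
  graphml: true  # dump .graphml snapshots alongside the run's output
```

Re-run the indexing pipeline, grab the `.graphml` files from the output folder and import them into Gephi.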
Thanks for reading, see you next time! 👋
- (Edge et al., 2024) From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.