Hacking GraphRAG with Amazon Bedrock 🌄

Hacking GraphRAG with Amazon Bedrock 🌄

Learn how to run GraphRAG pipelines backed by Amazon Bedrock using LiteLLM proxy.

João Galego
Amazon Employee
Published Jul 9, 2024

From RAG to GraphRAG

"Everything is a graph" -- Matt Rickard
Later this week, I'll be traveling to the UK to attend the Oxford ML School hosted by AI for Global Goals (just DM me if you want to learn more about it).
Before going on that well-deserved "rest" though (if you've been following along, it won't come as a surprise that I have these strange notions of what PTO and fun actually looks like), I'd like to do something cool... and crazy.
Today's mission (if you choose to accept it) is to hack GraphRAG using Amazon Bedrock.
Yes, you read that right... but first things first - what exactly is GraphRAG?
In a nutshell, GraphRAG is a suite of data transformations in the shape of a pipeline that turns raw text into hierarchical, graph-like structures.
Source: Edge et al. (2024)
As you may have guessed, the motivation is that this is better for Retrieval Augmented Generation (RAG) workflows than plain semantic search because... well, graphs have a lot of additional structure (Entities, Relationships, Communities and the like) that make the retrieval part more effective and efficient.
In this post, I won't delve into the details of how we come by this structure (for that you have the original paper and a blog), but this structure will become apparent once we get into the visualization part.
💡 Tip: If you want to deep dive into the GraphRAG knowledge model, I recommend reading the Dataflow section of the GraphRAG docs.

Ready, set... hack! 👨🏻‍💻

👨‍💻 All code and documentation for this section is available on GitHub.
So, how are we going to hack this thing? Well, the good news is that the only thing we really need is an OpenAI-compatible proxy that can serve as a go-between.
Fortunately, there are (at least) two projects out there that do just that:
Let's start by creating a Conda environment with everything we need
environment.yml
Next, we need to create the proxy configuration (config.yaml)
In this case, we're using Anthropic's Claude 3 Sonnet as our chat model and Amazon Titan Text Embeddings V2 as our embeddings model.
Let's fire up our proxy 🔥
Now, we need to create a GraphRAG workspace. I'll be following the Get Started example from the GraphRAG docs, but feel free to explore at your own volition.
First, let's get our own copy of A Christmas Carol 🎄 by Charles Dickens from Project Gutenberg
Once the download finishes, you can go ahead and initialize the workspace
Now is where things get a little bit tricky. The command above will create two files:
  • .env which contains the environment variables required to run the GraphRAG pipeline, and
  • settings.yaml which contains the settings for the pipeline
In the .env file, we will place the GraphRAG API key (this is the key we defined in the LLM proxy config viz. ✔️🐎🔋📎) as well as the base URL (👉 proxy) and our model choices
which are then injected into the pipeline settings (side note: I have removed a lot of stuff to make it more readable)
Finally, let's run the indexing pipeline
Once it is done, you can start asking questions...
Output:
Finally, we can visualize the actual graph with a tool like Gephi (make sure snapshots.graphml is set to true).
💡 Tip: I recommend watching this tutorial series if you're not familiar with Gephi (like I was).
Thanks for reading, see you next time! 👋

References

Articles

Blogs


 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.

1 Comment