Building a knowledge graph with generative AI
Extracting relationships from unstructured data
Randy D
Amazon Employee
Published Mar 26, 2024
Have you ever thought about the amount of valuable information that's lost in meeting notes? Whenever I meet with a customer, I jot down notes about not only the technical topic of the day, but also the context of the meeting. Who else was mentioned in the meeting that might be interested in the conversation? What are some of the organizational blockers that are getting in the way of progress?
I've tried a few electronic note-taking systems over the years, but fundamentally they just give me a better way to transfer my hand-scribbled notes into a more permanent archive. What I really need is a way to see the structure in my notes: what's the big-picture view, and which important connections have I missed or forgotten about?
A knowledge graph is a good way to model this sort of loosely structured information. In the AWS world, I can store the graph in Amazon Neptune, and use Neptune's ML algorithms (based on graph neural networks) to ask interesting questions like "what relationship probably exists between these two people, given the other data I know?"
It turns out that you can use a large language model (LLM) to convert raw, unstructured text into a knowledge graph. LLMs can pick out the concepts (graph nodes) in the text, and how they are related (the graph edges). I've written a few examples of how to do this and published them here:
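Separately from those published examples, the extraction step can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the prompt wording, the `build_prompt` and `parse_triples` helpers, the JSON reply format, and the sample reply are all assumptions made for the sake of the example, and the model call itself is left out so the sketch runs without any LLM service.

```python
import json

# Hypothetical prompt; the real wording would need tuning for the chosen model.
PROMPT_TEMPLATE = (
    "Extract the entities and relationships from the text below.\n"
    "Respond with only a JSON list of objects that have the keys "
    '"source", "relation", and "target".\n\n'
    "Text:\n{text}"
)


def build_prompt(text):
    """Wrap raw note text in the extraction instructions."""
    return PROMPT_TEMPLATE.format(text=text)


def parse_triples(llm_response):
    """Turn the model's JSON reply into a set of nodes and a list of edges."""
    nodes, edges = set(), []
    for item in json.loads(llm_response):
        source, relation, target = item["source"], item["relation"], item["target"]
        nodes.update([source, target])              # concepts become graph nodes
        edges.append((source, relation, target))    # relationships become graph edges
    return nodes, edges


note = "Met with Alice from ExampleCorp. She mentioned Bob is blocked on a security review."
prompt = build_prompt(note)

# Made-up model reply, hard-coded so the sketch runs without calling an LLM.
sample_response = json.dumps([
    {"source": "Alice", "relation": "WORKS_AT", "target": "ExampleCorp"},
    {"source": "Bob", "relation": "BLOCKED_BY", "target": "security review"},
])

nodes, edges = parse_triples(sample_response)
```

In practice the reply would come from whichever model you send `prompt` to, and real replies are not always valid JSON, so a production version would need validation and retries before loading the nodes and edges into a graph store.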
I haven't done a full survey of the existing techniques, so there may be easier ways to do this. But think of the possibilities. If you are sitting on thousands of meeting notes written by you and your coworkers, you can uncover potentially interesting relationships that are otherwise very hard to find. In the example notebook, I used a data set of NDA disclosure forms filed with the SEC. These are boilerplate documents that contain only a few useful pieces of real data: who signed the NDA, with what counterparty, and the terms of the agreement. By building a knowledge graph from publicly available NDA filings, you might uncover interesting trends, like the same person signing NDAs with multiple companies working in the same space over a period of years.
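To make the "same person, multiple counterparties" idea concrete, here is a small sketch over an in-memory edge list. The edge format, the `SIGNED_NDA_WITH` relation name, the `repeat_signers` helper, and the placeholder people and companies are all assumptions for illustration; none of it is real filing data, and a real graph would be queried in the graph database rather than in a Python list.

```python
from collections import defaultdict

# Placeholder edges of the form (person, relation, company); not real filings.
edges = [
    ("Person A", "SIGNED_NDA_WITH", "Company X"),
    ("Person A", "SIGNED_NDA_WITH", "Company Y"),
    ("Person B", "SIGNED_NDA_WITH", "Company X"),
]


def repeat_signers(edges, min_counterparties=2):
    """Return the people who signed NDAs with at least `min_counterparties` companies."""
    counterparties = defaultdict(set)
    for person, relation, company in edges:
        if relation == "SIGNED_NDA_WITH":
            counterparties[person].add(company)
    return {
        person: sorted(companies)
        for person, companies in counterparties.items()
        if len(companies) >= min_counterparties
    }


result = repeat_signers(edges)
```

Filtering further by industry or by signing date would require those attributes to be extracted onto the nodes or edges as well.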
You can of course use an LLM to query the graph, either directly or via a graph-RAG solution.
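As a rough idea of what the graph-RAG route involves, the sketch below pulls the edges that mention an entity named in the question and pastes them into a prompt as context. The `retrieve_facts` and `build_rag_prompt` helpers, the substring matching, and the sample edges are assumptions for illustration only; a real solution would typically use a proper graph query and entity linking, and would then send the prompt to a model.

```python
def retrieve_facts(question, edges):
    """Return the edges whose source or target is named in the question."""
    q = question.lower()
    return [e for e in edges if e[0].lower() in q or e[2].lower() in q]


def build_rag_prompt(question, edges):
    """Build a prompt that grounds the model in the retrieved graph facts."""
    facts = "\n".join(f"{s} {r} {t}" for s, r, t in retrieve_facts(question, edges))
    return (
        "Answer using only these facts from the graph:\n"
        f"{facts}\n\n"
        f"Question: {question}"
    )


# Made-up edges standing in for a knowledge graph.
edges = [
    ("Alice", "WORKS_AT", "ExampleCorp"),
    ("Bob", "BLOCKED_BY", "security review"),
    ("Carol", "WORKS_AT", "OtherCorp"),
]

question = "Where does Alice work?"
facts = retrieve_facts(question, edges)
rag_prompt = build_rag_prompt(question, edges)
```

Only the edge about Alice ends up in the prompt here, which is the point of the retrieval step: the model sees a small, relevant slice of the graph instead of the whole thing.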
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.