Software Vulnerability Analysis using SBOMs, Amazon Neptune, and Nodestream

Software vulnerabilities are a significant concern for companies and individuals. In this post, we'll discuss some of the recent work I have been doing to provide a unified graph model for ingesting and analyzing Software Bill of Materials (SBOM).

Dave Bechberger
Amazon Employee
Published Jul 10, 2024

Software vulnerabilities are a significant concern for companies and individuals alike. In recent years, we've witnessed critical security vulnerabilities in widely used libraries, such as Log4j in 2021 and the XZ utility in March 2024. While the root causes of these vulnerabilities differed – Log4j was an oversight, while XZ contained an intentional backdoor – the consequences for users were equally severe.
These security issues are being noticed and taken seriously by enterprises and business executives; see this white paper for how Intuit is approaching the problem.
Once these vulnerabilities became known, organizations and individuals spent countless hours scouring numerous applications, identifying and patching systems running vulnerable versions of the software. During this process, a common question arose: "Isn't there a better way to track and manage this information?"
In this post, we'll discuss some of the recent work I have been doing to provide a unified graph model for ingesting and analyzing Software Bill of Materials (SBOM).
Specifically, we will look at how to use a plugin for Nodestream that can ingest SBOM data from:
  • Local SPDX or CycloneDX files
  • Amazon Inspector
  • GitHub
Then we'll show how to combine this plugin with Amazon Neptune to gain insights into software vulnerabilities within application stacks. But first, let's explain what an SBOM is and why using a graph is beneficial for analysis.

What is a Software Bill of Materials (SBOM) and why use Graphs

A Software Bill of Materials (SBOM) is an invaluable tool for software development and management, enabling organizations to enhance transparency, security, and reliability of their applications. It serves as a comprehensive "ingredient list" that outlines the components, libraries, and dependencies used in a software system.
SBOMs offer several key benefits:
1. They empower software creators to track and manage dependencies within their applications effectively.
2. They provide security personnel with the necessary information to examine and assess potential vulnerabilities within the software environment.
3. They equip legal personnel with the data required to ensure compliance with licensing requirements.
However, the true power of an SBOM lies in its ability to represent the intricate relationships and dependencies between the various components of a software system. This is where graphs come into play, offering an excellent way to model these interconnected relationships.
In a graph representation, nodes represent individual components, while edges depict the dependencies and relationships between them. This structure can handle complex hierarchies and recursive relationships with ease, making it an ideal choice for analyzing software systems.
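To make the node-and-edge idea concrete, here is a minimal Python sketch that stores dependencies as an adjacency map and walks them to collect every transitive dependency of a component. The component names and the `transitive_dependencies` helper are hypothetical, chosen only for illustration; a graph database such as Neptune would do this traversal for you.

```python
from collections import deque

# Hypothetical dependency edges: component -> components it depends on.
DEPENDS_ON = {
    "my-app": ["web-framework", "logging-lib"],
    "web-framework": ["http-client", "logging-lib"],
    "http-client": ["tls-lib"],
    "logging-lib": [],
    "tls-lib": [],
}


def transitive_dependencies(graph, root):
    """Return every component reachable from root via dependency edges."""
    seen = {root}  # tracking visited nodes keeps cyclic graphs from looping
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dependency in graph.get(current, []):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen - {root}


print(sorted(transitive_dependencies(DEPENDS_ON, "my-app")))
```

The breadth-first walk visits each component once, so nested and recursive dependency chains are handled without special cases.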
By leveraging graph data structures and algorithms, organizations can gain valuable insights and perform various analyses, including:
  • Dependency Graphs: These graphical representations illustrate how different components within the software depend on and relate to one another, making complex relationships more comprehensible.
  • Vulnerability Graphs: By mapping vulnerabilities to the corresponding components, vulnerability graphs enable organizations to assess associated risks and prioritize the mitigation of known issues.
  • Supply Chain Graphs: SBOMs trace the flow of components and dependencies up the software supply chain. Graph visualizations can illustrate the origin and propagation of open-source components from lower-level suppliers to the final product, aiding in the identification of vulnerabilities or licensing issues throughout the supply chain.
By harnessing the power of graphs, organizations can gain a deeper understanding of their software systems, enabling them to make informed decisions, mitigate risks, and ensure compliance more effectively.
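As a rough illustration of the vulnerability-graph idea, the sketch below reverses the dependency edges and walks upward from a vulnerable component to find everything that depends on it, directly or indirectly. All names here are hypothetical, and the vulnerability ID is a placeholder rather than a real CVE.

```python
from collections import defaultdict

# Hypothetical (dependent, dependency) edges.
DEPENDS_ON = [
    ("billing-service", "web-framework"),
    ("web-framework", "logging-lib"),
    ("report-tool", "logging-lib"),
    ("report-tool", "pdf-lib"),
]

# Placeholder vulnerability ID mapped to the component it affects.
AFFECTS = {"VULN-EXAMPLE-1": "logging-lib"}


def impacted_components(depends_on, vulnerable):
    """Return every component that directly or transitively depends on `vulnerable`."""
    dependents = defaultdict(set)
    for dependent, dependency in depends_on:
        dependents[dependency].add(dependent)  # reverse each edge

    impacted, stack = set(), [vulnerable]
    while stack:
        node = stack.pop()
        for parent in dependents.get(node, ()):
            if parent not in impacted:
                impacted.add(parent)
                stack.append(parent)
    impacted.discard(vulnerable)  # only relevant if the graph has a cycle
    return impacted


for vulnerability, component in AFFECTS.items():
    print(vulnerability, sorted(impacted_components(DEPENDS_ON, component)))
```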

How to use Graphs for SBOM analysis

Analyzing SBOMs can be a complex task, especially when dealing with different data formats. The two most widely used formats, CycloneDX and SPDX, structure their data differently, which can make it challenging to consolidate and analyze the data effectively. However, using graphs can provide a powerful solution to this problem, offering a structured and intuitive way to represent and visualize the relationships between different components within an SBOM.
To simplify the process of loading and analyzing SBOMs in a graph database, I recently worked on an SBOM plugin for Nodestream, a Python framework designed for performing graph database ETL (Extract, Transform, Load) operations. This plugin extends the capabilities of Nodestream, providing a straightforward way to load SBOMs from various sources, such as local files, GitHub repositories, or Amazon Inspector, into an opinionated graph data model.
The opinionated graph data model used in the SBOM plugin is designed to capture the essential elements and relationships found in SBOMs, making it easier to navigate and analyze the data. By leveraging the power of graph databases, you can explore the interconnections between different components, identify potential vulnerabilities or licensing issues, and gain valuable insights into the overall structure and composition of your software supply chain.
With the SBOM plugin for Nodestream, you can streamline the process of ingesting and analyzing SBOMs, regardless of their original format. The plugin handles the necessary transformations and mappings, allowing you to focus on the analysis and decision-making aspects of your software supply chain management.

Loading SBOM Data into our Graph

To get started loading your SBOM files into Amazon Neptune, you first need to install the Nodestream plugins for Neptune and SBOM.
pip install -q pyyaml nodestream-plugin-neptune nodestream_plugin_sbom
With those plugins installed, all we need to do is set our configuration in the nodestream.yaml file as shown below. In this example, we are going to load the SBOM files for Nodestream, the Nodestream Neptune Plugin, and the Nodestream SBOM plugin into our database, directly from GitHub.
With our configuration set up, we can run the import using the following command:
nodestream run sbom_github --target my-neptune
After we run the data load, we get a graph similar to the image below.

What does our graph look like?

Let’s take a look at the types of data that we are storing in our graph. The plugin uses the opinionated graph data model shown below to represent SBOM data files.
This model contains the following elements:
Node Types
  • Document - This represents the SBOM document as well as the metadata associated with that SBOM.
  • Component - This represents a specific component of a software system.
  • Reference - This represents a reference to any external system which the system wanted to include as a reference. This can range from package managers, URLs to external websites, etc.
  • Vulnerability - This represents a specific known vulnerability for a component.
  • License - The license for the component or package.
Edge Types
  • DESCRIBES/DEPENDS_ON/DEPENDENCY_OF/DESCRIBED_BY/CONTAINS - This represents the type of relationship between a Document and a Component in the system.
  • REFERS_TO - This represents a reference between a Component and a Reference.
  • AFFECTS - This represents that a particular Component is affected by the connected Vulnerability.
The properties associated with each element will vary depending on the input format used, and the optional information contained in each file.
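To give a feel for how an SBOM file maps onto this kind of model, here is a small Python sketch that turns a hand-written CycloneDX-style JSON document into Document and Component nodes plus DESCRIBES and DEPENDS_ON edges. This is not the plugin's implementation: the `to_graph` function, the dictionary shapes, and the choice of which edge connects which nodes are assumptions made for illustration, and the sample SBOM contents are invented.

```python
import json

# A tiny hand-written CycloneDX-style document (contents are made up).
CYCLONEDX_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "serialNumber": "urn:uuid:00000000-0000-0000-0000-000000000000",
  "metadata": {
    "component": {"bom-ref": "app", "name": "example-app", "version": "1.0.0"}
  },
  "components": [
    {"bom-ref": "lib-a", "name": "lib-a", "version": "2.1.0"},
    {"bom-ref": "lib-b", "name": "lib-b", "version": "0.4.2"}
  ],
  "dependencies": [
    {"ref": "app", "dependsOn": ["lib-a"]},
    {"ref": "lib-a", "dependsOn": ["lib-b"]}
  ]
}
"""


def to_graph(bom):
    """Map a parsed CycloneDX document to (nodes, edges) lists."""
    nodes, edges = [], []

    document_id = bom["serialNumber"]
    nodes.append({"label": "Document", "id": document_id, "format": bom["bomFormat"]})

    root = bom["metadata"]["component"]
    for component in [root, *bom.get("components", [])]:
        nodes.append({
            "label": "Component",
            "id": component["bom-ref"],
            "name": component["name"],
            "version": component.get("version"),  # optional in the format
        })

    # Assumption: the Document DESCRIBES the root component it was generated for.
    edges.append((document_id, "DESCRIBES", root["bom-ref"]))

    for entry in bom.get("dependencies", []):
        for target in entry.get("dependsOn", []):
            edges.append((entry["ref"], "DEPENDS_ON", target))

    return nodes, edges


nodes, edges = to_graph(json.loads(CYCLONEDX_JSON))
for edge in edges:
    print(edge)
```

An SPDX document would need its own mapping function, but it could emit the same node and edge shapes, which is the point of a unified model.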

Analyzing SBOMs

Now that we have our data loaded into our graph, the next step is to start to extract insights into what is actually important in our SBOM data.
One common use case is to investigate shared dependencies across projects. Shared dependencies allow development and security teams to better understand the security posture of the organization through identification of shared risks. Let's start by taking a look at the most shared dependencies between these projects using the query below.
Running this query will show us that there are quite a few dependencies that are shared across all three projects. To do this analysis, we used a graph algorithm known as Degree Centrality which counts the number of edges connected to a node. This measure of how connected the node is can in turn indicate the node's importance and level of influence in the network.
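Degree centrality is simple enough to sketch in plain Python. The example below counts the edges touching each node in a small, hypothetical edge list and then ranks the dependency nodes by that count. The project and library names are invented, and in practice the graph database computes this for you.

```python
from collections import Counter

# Hypothetical (project, dependency) edges.
EDGES = [
    ("project-a", "lib-yaml"),
    ("project-a", "lib-http"),
    ("project-b", "lib-yaml"),
    ("project-b", "lib-http"),
    ("project-b", "lib-json"),
    ("project-c", "lib-yaml"),
]


def degree_centrality(edges):
    """Count the number of edges connected to each node."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


degree = degree_centrality(EDGES)
projects = {source for source, _ in EDGES}

# Rank dependencies only, highest degree first, ties broken by name.
most_shared = sorted(
    ((node, count) for node, count in degree.items() if node not in projects),
    key=lambda item: (-item[1], item[0]),
)
print(most_shared)
```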
Running the query below shows us that there are 31 Components that are shared across all the projects.
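The "shared across all projects" question boils down to a set intersection. A minimal sketch, assuming each project's components have already been collected into a set (the names below are hypothetical):

```python
# Hypothetical mapping of project -> set of component names.
PROJECT_COMPONENTS = {
    "project-a": {"lib-yaml", "lib-http", "lib-cli"},
    "project-b": {"lib-yaml", "lib-http", "lib-aws"},
    "project-c": {"lib-yaml", "lib-http", "lib-json"},
}


def shared_by_all(project_components):
    """Return the components that appear in every project."""
    component_sets = list(project_components.values())
    if not component_sets:
        return set()
    return set.intersection(*component_sets)


shared = shared_by_all(PROJECT_COMPONENTS)
print(len(shared), sorted(shared))
```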
Because this is a closely connected group of projects, it is no surprise that they share many components. Since one of the strengths of graphs is the ability to visualize the connectedness between data, let's take a look at how they are connected.
Another common use case is to investigate licensing across multiple projects. This sort of investigation benefits from the connectedness of the graph, which lets us find how component licenses relate to each other. Let's take a look at what other licenses are associated with the lgpl-2.1-or-later licensed components.
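Outside of a graph query, the same idea can be sketched in Python as a co-occurrence count: for every component that carries a given license, tally the other licenses attached to it. The component names and license assignments below are invented for illustration.

```python
from collections import Counter

# Hypothetical mapping of component -> set of license identifiers.
COMPONENT_LICENSES = {
    "comp-a": {"lgpl-2.1-or-later", "mit"},
    "comp-b": {"lgpl-2.1-or-later", "apache-2.0", "mit"},
    "comp-c": {"unlicense"},
    "comp-d": {"mit"},
}


def co_occurring_licenses(component_licenses, license_id):
    """Count other licenses found on components that carry `license_id`."""
    counts = Counter()
    for licenses in component_licenses.values():
        if license_id in licenses:
            counts.update(licenses - {license_id})
    return counts


print(co_occurring_licenses(COMPONENT_LICENSES, "lgpl-2.1-or-later").most_common())
```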
As we see, there are quite a few other licenses used in these projects. We can leverage the visual nature of graph results to gain some insight into how components are connected. In this case, let's see how components with the lgpl-2.1-or-later license are connected to components with the unlicense license.
We see that there exists one path in our graph between these two licenses.
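Finding a path like this is a classic breadth-first search. The sketch below treats dependency edges as undirected, starts from every component with the first license, and stops at the first component carrying the second license. The data and the `path_between_licenses` helper are hypothetical; a graph query language expresses the same search far more compactly.

```python
from collections import deque

# Hypothetical component -> license mapping and dependency edges.
LICENSES = {
    "comp-a": "lgpl-2.1-or-later",
    "comp-b": "mit",
    "comp-c": "unlicense",
    "comp-d": "mit",
}
DEPENDS_ON = [("comp-a", "comp-b"), ("comp-b", "comp-c"), ("comp-d", "comp-b")]


def path_between_licenses(edges, licenses, start_license, end_license):
    """Return a shortest path of components linking the two licenses, or None."""
    neighbors = {}
    for source, target in edges:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)  # ignore edge direction

    starts = sorted(c for c, lic in licenses.items() if lic == start_license)
    previous = {component: None for component in starts}
    queue = deque(starts)
    while queue:
        current = queue.popleft()
        if licenses.get(current) == end_license:
            path = []
            while current is not None:  # walk back to the starting component
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(neighbors.get(current, ())):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


print(path_between_licenses(DEPENDS_ON, LICENSES, "lgpl-2.1-or-later", "unlicense"))
```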

Next Steps

As we have seen, using graphs to perform analysis of SBOM data can be a powerful tool in your toolbox to gain insights into the connections between software projects. What I have shown here is only the beginning of the types of analysis you can perform with this data. For a more detailed walkthrough of using graphs for SBOM analysis, I recommend taking a look at the following notebooks:

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.