Visual Analysis with GenAI - Part 2: Graphs, Charts & Tables
In Part 2 of the Visual Analysis with GenAI series, you'll learn how to analyze graphs, charts and tables from dashboards or documents
Arya Subramanyam
Amazon Employee
Published Aug 19, 2024
Authors: Arya Subramanyam, Ross Alas
In this four-part series of Visual Analysis with GenAI, Ross and I will be taking you through the different techniques including prompt engineering techniques on how you can gain deeper understanding of documents as part of your intelligent document processing pipeline. In Visual Analysis with GenAI - Part 1: Sentiment & Emotion, we focused on analyzing sentiment and emotional tone.
In the ever-evolving landscape of artificial intelligence, Large Language Models (LLMs) have emerged as transformative tools capable of processing and interpreting a wide array of media types, including text, images, audio, and video. These models, such as Anthropic's Claude 3.5 Sonnet, are at the forefront of AI technology, offering unprecedented capabilities in understanding and generating content across multiple modalities, including the analysis of graphs, charts, and tables.
Traditionally, data analysis has been constrained by the limitations of text-only processing, often overlooking the rich insights embedded in visual and multimedia content. However, the advent of multimodal LLMs has revolutionized this process, enabling a more holistic approach to data interpretation. By integrating text and visual analysis, these models provide a comprehensive understanding of complex documents, unlocking insights that were previously inaccessible.
One of the most significant advantages of using LLMs in this context is their ability to facilitate a conversational approach to data analysis. This approach allows users to interact with the data in a more intuitive and natural manner, posing questions and receiving detailed, context-aware responses. This interaction mimics human-like understanding, making it easier to extract meaningful insights from complex datasets.
By leveraging Amazon Bedrock, a fully managed service for deploying AI models, we can harness the power of LLMs to process documents in formats such as PDF. This process involves converting documents into images, preserving the integrity of visual elements such as graphs, charts, and tables, and using the Converse API to analyze and interpret the data. The result is a more nuanced understanding of the content, capturing the intricate patterns and relationships that text alone cannot convey.
Throughout this post, we will explore how this innovative approach can transform data analysis, particularly in scenarios where visual elements play a crucial role. By embracing the capabilities of LLMs, we can enhance our ability to understand and interpret data, driving more informed decision-making and unlocking new possibilities in document analysis.
Our approach begins by converting PDF documents into images using the
pdf2image
Python library. These images are then processed using Amazon Bedrock, which hosts the multi-modal LLM capable of analyzing both text and visual data. This method allows us to preserve the integrity of visual elements and extract detailed insights that are often overlooked by text-only analysis.Figure 1. Architecture overview of the solution. It takes a PDF document, splits it into pages in the form of images, then the images are used as part of the prompt sent to Bedrock, and finally the LLM hosted on Amazon Bedrock responds with the analysis based on the dashboard images.
To implement this solution, ensure you have the following:
- An AWS account with an AWS Identity and Access Management (IAM) user with permissions to invoke Amazon Bedrock
- The AWS Command Line Interface (AWS CLI) installed and configured for use
- Python 3.11 or later with Amazon SDK for Python (Boto3) installed and pdf2image
- (Optional) Use virtualenv or conda to create a virtual Python environment
- (Optional) Use Jupyter Notebooks
Ensure that you have the AWS CLI installed and is configured for use, and as well as Python 3.11 or later. Then, install boto3 and pypdf using PIP:
Import boto3 and pdf2image and create the Bedrock client. Feel free to experiment with different models from the Anthropic Claude 3 and Claude 3.5 families, adjust the temperature, and max_tokens parameters
This is the sample PDF that I’ve used for this blog. It’s in the form of a analytics dashboard outlining some sample sales data. The Sales Dashboard which includes multiple graphs, tables and charts on sales data for various products across regions and industries You can use any of your own PDFs in this example.
Figure 2. The test document containing a Sales Dashboard which includes multiple graphs, tables and charts on sales data.
After you have your .PDF that you want to use, convert it to an array of image bytes in JPEG format.
Once the prompt and images have been prepared, build the Converse API call to Amazon Bedrock. Using the list of image bytes, you will need to build the image image content block for each of the images and incorporate the text prompt. This will be used as part of the message to the Converse API. Note that at the time of writing, there is a maximum limit of 20 images per call, a maximum of 3.75 MB per image, and 8000px by 8000px maximum resolution.
Using the message built above, you can now call the Converse API.
Now, we will be leveraging Large Language Models (LLMs) to analyze and interpret graphs contained within the document.
Understanding the data landscape is the first step towards effective analysis. By summarizing the data and identifying key metrics and dimensions, we lay the groundwork for deeper insights. This step is essential for contextualizing the data and setting the stage for more detailed exploration.
Prompt 1:
Model Response 1:
Prompt 2:
Model Response 2:
Interpreting visuals helps translate complex data into understandable insights. By examining specific graphs, we can identify patterns and trends that inform decision-making. This interpretation is crucial for recognizing significant data points and drawing meaningful conclusions.
Prompt 1:
Model Response 1:
Prompt 2:
Model Response 2:
Analyzing trends over time offers a dynamic perspective on changes and developments within the data. By comparing data across different periods, we can uncover significant shifts and understand the factors influencing these trends. This analysis is key to adapting strategies based on evolving data insights.
Prompt 1:
Model Response 1:
Prompt 2:
Model Response 2:
Prompt 3:
Model Response 3:
Exploring relationships between variables helps us understand the underlying connections within the data. Identifying correlations and potential causations is crucial for developing a nuanced understanding of the data and making informed decisions that address key findings.
Prompt 1:
Model Response 1:
Prompt 2:
Model Response 2:
The goal of dashboard analysis is to derive strategic insights that inform decision-making. By identifying areas with the most potential, we can prioritize efforts and tailor strategies to maximize impact. These insights are essential for driving progress and achieving objectives.
Prompt 1:
Model Response 1:
Prompt 2:
Model Response 2:
Integrating Large Language Models (LLMs) into your data analysis workflow enables a deeper understanding of visual data, transforming it into actionable insights. By leveraging multi-modal models like Anthropic's Claude 3.5 Sonnet through Amazon Bedrock, you can effectively analyze complex documents, capturing the nuanced patterns and relationships that traditional methods might miss. This approach not only enhances the depth of analysis but also facilitates a more intuitive, conversational interaction with data, empowering users to make more informed decisions. As AI technology continues to evolve, embracing these tools will be essential for unlocking the full potential of your data.
To learn more using Generative AI in your applications, check out the following resources:
- Amazon Bedrock is the easiest way to build GenAI-powered applications
- Learn more about Anthropic Claude 3.5 Sonnet
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.