Llama3.2-Chat with your Entity Relationship (ERD) and Architecture diagrams

Interact with your diagrams using Llama 3.2 multimodal vision. Gain insights, answer queries, and explore them through natural conversation.

Nitin Eusebius
Amazon Employee
Published Oct 1, 2024
Llama 3.2 from Meta was recently made available in Amazon Bedrock. It represents Meta's latest advancement in LLMs and comes in various sizes, from small and medium-sized multimodal models to larger versions. The 11B and 90B parameter models are capable of sophisticated reasoning tasks, including multimodal support for high-resolution images. At the other end, lightweight text-only 1B and 3B parameter models are suitable for edge devices. Llama 3.2 is the first Llama model to support vision tasks, featuring a new model architecture that integrates image encoder representations into the language model.
In this demo, we will use Llama 3.2's multimodal vision capabilities to chat with Entity Relationship Diagrams (ERDs) and architecture diagrams. We will see how it reasons about them and extracts the information we need. For this demo, I used a notebook running on Amazon SageMaker Studio.
Note: This is demo code for illustrative purposes only. Not intended for production use.
First, we will import our required libraries.
```python
import boto3
from PIL import Image
import matplotlib.pyplot as plt
from botocore.exceptions import ClientError
```
Now we will initialize our model ID and the Amazon Bedrock runtime client.
```python
MODEL_ID = "us.meta.llama3-2-90b-instruct-v1:0"

bedrock_runtime = boto3.client("bedrock-runtime")
```
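The client above picks up the Region from your default AWS configuration. If you prefer to pin it explicitly, you can pass `region_name` when creating the client; `us-west-2` in the sketch below is only an illustrative choice, so use a Region where you have access to the model.

```python
# Optional: pin the Region explicitly instead of relying on the default AWS config.
# "us-west-2" is only an illustrative choice.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")
```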
Now we will create two functions: one to display the sample image in our notebook, and a main function that calls the Llama 3.2 model on Amazon Bedrock with our prompt. We will use the Converse API.
```python
def showImage(IMAGE_NAME):
    img = Image.open(IMAGE_NAME)
    plt.figure(figsize=(15,7))  # Increase figure size (in inches)
    plt.imshow(img)
    plt.axis('off')
    plt.show()


def query_image(query):

    user_message = query

    messages = [
        {
            "role": "user",
            "content": [
                {"image": {"format": "png", "source": {"bytes": image}}},
                {"text": user_message},
            ],
        }
    ]

    try:
        # Send the message to the model, using a basic inference configuration.
        streaming_response = bedrock_runtime.converse_stream(
            modelId=MODEL_ID,
            messages=messages,
            inferenceConfig={"maxTokens": 2000, "temperature": 0.5, "topP": 0.9},
        )

        # Extract and print the streamed response text in real-time.
        for chunk in streaming_response["stream"]:
            if "contentBlockDelta" in chunk:
                text = chunk["contentBlockDelta"]["delta"]["text"]
                print(text, end="")

    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{MODEL_ID}'. Reason: {e}")
        exit(1)
```
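If you would rather get the full response back as a string (handy later if you want to post-process the model's output), a minimal non-streaming sketch using the same Converse API could look like this. `query_image_once` is an illustrative name, not part of the original notebook, and it reads the same global `image` bytes loaded below.

```python
def query_image_once(query):
    """Illustrative non-streaming variant: returns the full response text."""
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[
            {
                "role": "user",
                "content": [
                    {"image": {"format": "png", "source": {"bytes": image}}},
                    {"text": query},
                ],
            }
        ],
        inferenceConfig={"maxTokens": 2000, "temperature": 0.5, "topP": 0.9},
    )
    # The Converse API returns the generated text under output -> message -> content.
    return response["output"]["message"]["content"][0]["text"]
```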
Now let's load our first image, a sample database Entity Relationship Diagram (ERD).
```python
IMAGE_NAME = "sample_erd.png"

with open(IMAGE_NAME, "rb") as f:
    image = f.read()

showImage(IMAGE_NAME)
```
Now let's query the image by passing a prompt asking about the tables.
```python
query_image("In this ERD how many database tables you see?")
```
We will get the following response
There are five database tables in the ERD. The tables are: courses, departments, professors, enrollments, and students.
Now let's pass more prompts. This one asks Llama 3.2 to identify the tables needed to find students enrolled in the Computer Science course.
```python
query_image("If i need to find all students enrolled in Computer Science course, which tables should I see. Only provide me table details?")
```
Response:

To find all students enrolled in the Computer Science course, you will need to access the following tables:

1. **enrollments**: This table contains information about which students are enrolled in which courses.
2. **courses**: This table contains information about the courses, including the course name.
3. **students**: This table contains information about the students, including their student ID and name.

By joining these tables, you can find the students enrolled in the Computer Science course.
Now let's ask it to identify the tables needed to find students taking the Generative AI course in the Machine Learning department, listing only the table details.
```python
query_image("I need to find all students learning Generative AI course in Machine Learning department, which tables should I see. Only provide me table details?")
```
Response:

To find all students learning Generative AI course in Machine Learning department, you need to access the following tables:

* **courses**: To find the course_id of Generative AI course.
* **departments**: To find the department_id of Machine Learning department.
* **students**: To find the student_id of students learning Generative AI course in Machine Learning department.
* **enrollments**: To find the student_id and course_id of students enrolled in Generative AI course.

These tables are connected through the following relationships:

* **courses** -> **departments**: A course belongs to a department.
* **students** -> **enrollments**: A student is enrolled in multiple courses.
* **enrollments** -> **courses**: An enrollment is for a specific course.

By joining these tables, you can find the student_id of students learning Generative AI course in Machine Learning department.
Now let's load our architecture image for analysis.
```python
IMAGE_NAME = "camera_as_service.png"

showImage(IMAGE_NAME)

with open(IMAGE_NAME, "rb") as f:
    image = f.read()
```
Let's ask what the architecture shows.
```python
query_image("what does this architecture shows?")
```
Response:
This architecture shows how you can use Internet of Things (IoT)-enabled cameras to generate live video feed and machine learning inference that can be consumed by an end user in near real-time.
Now let's ask it to list the AWS services.
```python
query_image("List the names of all AWS services in this image")
```
Response:

The image shows a flowchart of how Amazon Web Services (AWS) can be used to generate live video feeds and machine learning inference that can be consumed by an end user in near real-time. The AWS services mentioned in the image are:

1. AWS Cloud
2. Amazon Kinesis Video Streams
3. Amazon S3
4. Amazon Rekognition
5. Amazon API Gateway
6. Amazon DynamoDB
7. Amazon CloudWatch
8. Amazon Cognito

These services work together to provide a scalable and secure solution for processing and analyzing video data in real-time.
Now let's ask it to extract the title, description, and steps from the image into a JSON format suitable for storing in a database and using in a search application.
```python
query_image("Extract title, description and steps in a json format. Please consider formulating the json file so that the extracted information can be stored in a database for further use in a search application. Return the json file as the output")
```
Response:
**JSON File for Camera as a Service Architecture**

```json
{
  "title": "Camera as a Service",
  "description": "This architecture shows how you can use Internet of Things (IoT)-enabled cameras to generate live video feed and machine learning inference that can be consumed by an end user in near real-time.",
  "steps": [
    {
      "step": 1,
      "description": "Generate video feed using Amazon Kinesis Video Streams Producer libraries."
    },
    {
      "step": 2,
      "description": "Ingest live video feed to Amazon Kinesis Video Streams."
    },
    {
      "step": 3,
      "description": "Live feed is converted into images through an on-demand or automated feature and sent to Amazon Simple Storage Service (Amazon S3)."
    },
    {
      "step": 4,
      "description": "An Amazon S3 write event cues an AWS Lambda function, and the image is sent to Amazon Rekognition to generate inference."
    },
    {
      "step": 5,
      "description": "The inference and metadata are stored in Amazon DynamoDB."
    },
    {
      "step": 6,
      "description": "User APIs fetch the inference."
    },
    {
      "step": 7,
      "description": "The user app consumes live feed from Amazon Kinesis Video Streams, fetches the inference from Amazon DynamoDB, and exposes the live feed using a REST API."
    },
    {
      "step": 8,
      "description": "Amazon API Gateway exposes the API for video feed and inference."
    },
    {
      "step": 9,
      "description": "The end user consumes two APIs exposed by Amazon API Gateway. The first API provides video feed using HTTP Live streaming (HLS), MPEG/DASH, or GetMedia streaming. The second video feed provides the machine learning inference."
    },
    {
      "step": 10,
      "description": "The admin app is used for governance, managing administrative APIs, user APIs, camera onboarding, metrics, insights, and so on."
    },
    {
      "step": 11,
      "description": "Use Amazon CloudWatch to store logs and metrics generated by complete stack (applications, infrastructure, network, and services). Use Amazon Cognito to secure API feed generated by Amazon API Gateway."
    }
  ]
}
```

This JSON file can be stored in a database and used in a search application to retrieve information about the Camera as a Service architecture. The `title` and `description` fields provide a brief overview of the architecture, while the `steps` field contains a detailed list of the steps involved in the architecture. Each step is represented by a JSON object with a `step` field containing the step number and a `description` field containing a brief description of the step.
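Since the model wraps the JSON in a fenced code block with surrounding prose, a little post-processing is needed before storing it. The sketch below is illustrative only: it assumes you captured the response as a string (for example, with the non-streaming helper sketched earlier), and the DynamoDB table name is hypothetical.

```python
import json

def extract_json(response_text):
    # The model wraps its JSON in prose and a fenced code block, so grab the
    # outermost curly-brace span and parse it.
    start = response_text.find("{")
    end = response_text.rfind("}")
    return json.loads(response_text[start:end + 1])

# Illustrative usage (names are assumptions, not from the original notebook):
# doc = extract_json(response_text)
# boto3.resource("dynamodb").Table("architecture_docs").put_item(Item=doc)
```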
You can find the end-to-end demo in the video below as well.
Happy Building!

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
