Llama 3.2: Chat with your Entity Relationship (ERD) and Architecture diagrams


Interact with your diagrams using Llama 3.2 multimodal vision. Gain insights, answer queries, and explore your diagrams through natural conversation.

Nitin Eusebius
Amazon Employee
Published Oct 1, 2024
Llama 3.2 from Meta was recently made available in Amazon Bedrock. It represents Meta's latest advancement in large language models (LLMs) and comes in various sizes, from lightweight text-only models to larger multimodal versions. The 11B and 90B parameter models are capable of sophisticated reasoning tasks, including multimodal support for high-resolution images. At the other end, lightweight text-only 1B and 3B parameter models are suitable for edge devices. Llama 3.2 is the first Llama model to support vision tasks, featuring a new model architecture that integrates image encoder representations into the language model.
In this demo we will use Llama 3.2's multimodal vision capabilities to chat with Entity Relationship Diagrams (ERDs) and architecture diagrams. We will see how it reasons and extracts the required information. For this demo I used a notebook that I ran on Amazon SageMaker Studio.
Note: This is demo code for illustrative purposes only. Not intended for production use.
First, we will import our required libraries.
Now we will initialize our model ID and the Amazon Bedrock runtime client.
Now we will create two functions: one to display the sample image in our notebook, and the main function, which calls the Llama 3.2 model on Amazon Bedrock with our prompt. We will use the Converse API.
Now let's load our first image, a sample database Entity Relationship Diagram (ERD).
Now let's query the image by passing a prompt asking about the tables.
We will get the following response
There are five database tables in the ERD. The tables are: courses, departments, professors, enrollments, and students.
Now let's pass more prompts. This one asks Llama 3.2 to identify the tables containing Computer Science student enrollment data.
Response:
To find all students enrolled in the Computer Science course, you will need to access the following tables:
1. **enrollments**: This table contains information about which students are enrolled in which courses.
2. **courses**: This table contains information about the courses, including the course name.
3. **students**: This table contains information about the students, including their student ID and name.
By joining these tables, you can find the students enrolled in the Computer Science course.
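The join the model describes can be sketched against a toy in-memory database (the table and column names here are assumptions based on a typical ERD like this one, and the sample rows are illustrative only):

```python
import sqlite3

# Build an in-memory toy database mirroring the ERD's three tables
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE courses (course_id INTEGER PRIMARY KEY, course_name TEXT);
CREATE TABLE enrollments (student_id INTEGER, course_id INTEGER);

INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO courses VALUES (10, 'Computer Science'), (11, 'History');
INSERT INTO enrollments VALUES (1, 10), (2, 11);
""")

# Join the three tables, as the model suggests, to find students
# enrolled in the Computer Science course
rows = cur.execute("""
SELECT s.name
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
JOIN courses c ON e.course_id = c.course_id
WHERE c.course_name = 'Computer Science'
""").fetchall()

print(rows)  # [('Ada',)]
```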
Now let's ask it to identify the tables containing Generative AI course enrollment data for students in the Machine Learning department, listing only the table names.
Response:
To find all students learning Generative AI course in Machine Learning department, you need to access the following tables:
* **courses**: To find the course_id of Generative AI course.
* **departments**: To find the department_id of Machine Learning department.
* **students**: To find the student_id of students learning Generative AI course in Machine Learning department.
* **enrollments**: To find the student_id and course_id of students enrolled in Generative AI course.
These tables are connected through the following relationships:
* **courses** -> **departments**: A course belongs to a department.
* **students** -> **enrollments**: A student is enrolled in multiple courses.
* **enrollments** -> **courses**: An enrollment is for a specific course.
By joining these tables, you can find the student_id of students learning Generative AI course in Machine Learning department.
Now let's load our architecture image for analysis.
Let's ask what the architecture shows.
Response:
This architecture shows how you can use Internet of Things (IoT)-enabled cameras to generate live video feed and machine learning inference that can be consumed by an end user in near real-time.
Now let's ask it to list the AWS services.
Response:
The image shows a flowchart of how Amazon Web Services (AWS) can be used to generate live video feeds and machine learning inference that can be consumed by an end user in near real-time. The AWS services mentioned in the image are:
1. AWS Cloud
2. Amazon Kinesis Video Streams
3. Amazon S3
4. Amazon Rekognition
5. Amazon API Gateway
6. Amazon DynamoDB
7. Amazon CloudWatch
8. Amazon Cognito
These services work together to provide a scalable and secure solution for processing and analyzing video data in real-time.
Now let's ask it to extract the title, description, and steps from an image into a JSON format suitable for database storage and use in a search application.
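A prompt for this can insist on machine-readable output. The sketch below shows the idea with a placeholder reply, since real model output may need cleanup before parsing (the prompt wording and the sample values are illustrative assumptions, not actual model output):

```python
import json

# Ask for strictly machine-readable output (wording is an assumption)
prompt = (
    "Extract the title, description, and steps from this diagram. "
    "Return ONLY valid JSON with the keys: title, description, steps."
)

# Placeholder reply standing in for the model's answer (illustrative only)
reply = (
    '{"title": "Live video analytics", '
    '"description": "IoT cameras feed ML inference to end users", '
    '"steps": ["Ingest video", "Run inference", "Serve results"]}'
)

# Parse into a dict ready for database storage or a search index
record = json.loads(reply)
print(record["title"])  # Live video analytics
```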
Response:
You can find the end-to-end demo in the video below as well.
Happy Building!
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
