Building an Intelligent Photo Album with Amazon Bedrock
Published Mar 17, 2024
Last Modified Mar 18, 2024
If you’re into your Machine Learning/Artificial Intelligence/Generative AI but have been living under a rock, it may be news to you that Anthropic have now released the latest version of their Claude Large Language Model (LLM), which they have predictably named Claude 3. But what’s not so predictable is that not only is this the third generation of Claude, there are also three different variations of Claude.
In this post, we’re going to build a basic, serverless image-cataloguing application using Claude 3 and explore how it enhances the functionality of our AI photo album. Powered by Amazon Bedrock, the application leverages the poetic prowess of Claude 3 Haiku to provide insightful summaries of uploaded images.
In today’s digital era, our memories are often immortalized through the lens of a camera, capturing moments that evoke a myriad of emotions. As our photo collections grow, organizing and making sense of these images can become a daunting task. However, with the emergence of artificial intelligence (AI) and cloud computing, we now have the opportunity to create intelligent solutions that automate and enrich the photo management process.
“The next generation of Claude: A new standard for intelligence” - https://www.anthropic.com/news/claude-3-family
When Claude 3 was announced, Anthropic came out with not just one new model, but three:
- Haiku (Available on Amazon Bedrock)
- Sonnet (Available on Amazon Bedrock)
- Opus (Coming soon to your favourite AI provider starting with B… and ending in rock)
Now whilst I write this in celebration of the Claude 3 family, I’m not going to go into all the differing benchmark charts, as I’m sure you’ve already seen them. However, the links and resources in this post should provide the benchmarks you seek if that’s something you’re interested in.
What really got my attention was the overall impact that the whole Claude 3 family has on the LLM landscape. Bold strokes of improvement across all three variants, including:
- Near-instant results
- Strong vision capabilities
- Fewer refusals
- Improved accuracy
- Long context and near-perfect recall
- Responsible design
- Easier to use
What got my attention the most was the claim of “Strong vision capabilities”. How strong? That’s something I wanted to test out.
I originally started writing this little project about 2 weeks before the release of Claude 3 Haiku, using Claude 3 Sonnet. The appeal was that Claude 3 Sonnet was offering unmatched vision capabilities.
“The Claude 3 models have sophisticated vision capabilities on par with other leading models. They can process a wide range of visual formats, including photos, charts, graphs and technical diagrams. We’re particularly excited to provide this new modality to our enterprise customers, some of whom have up to 50% of their knowledge bases encoded in various formats such as PDFs, flowcharts, or presentation slides.”
(OK I caved in and put a benchmark chart in - worth it though!)
“Claude 3 Haiku: our fastest model yet” - https://www.anthropic.com/news/claude-3-haiku
The moment Haiku was released, the colleagues at the other end of the work meeting I was attending were greeted with a physical whoop of joy. I was already excited by Claude 3 Sonnet, so having the next model in the generation arrive was like Christmas had come early.
I also knew this meant that I could change my large language model from Sonnet to Haiku seamlessly.
The benefits it provided to my proof of concept were:
- The fastest model in the Claude 3 family
- The most cost-effective model in the Claude 3 family.
This means a lot, especially in a proof of concept like this: it gives us the speed of execution to develop fast, fail fast and iterate on solutions a lot quicker, and it means we’re able to control the costs of the application.
“Speed is essential for our enterprise users who need to quickly analyze large datasets and generate timely output for tasks like customer support. Claude 3 Haiku is three times faster than its peers for the vast majority of workloads, processing 21K tokens (~30 pages) per second for prompts under 32K tokens [1]. It also generates swift output, enabling responsive, engaging chat experiences and the execution of many small tasks in tandem.”
This speed also means the ability to analyze documents at a much faster rate than ever, with quicker recall of information.
In our use case, we want to describe our images using natural language, creating a catalogue in which each image has a summary and a category provided by the model.
This enables powerful use cases such as making art more accessible for the blind, or reducing the operational overhead of managing catalogues of stock images in a creative studio.
By the end of this post, I hope to have demonstrated:
- Seamlessly integrating Claude 3 Haiku into a photo album app
- Calling Amazon Bedrock from a serverless application
- The basic architecture needed to achieve this
- Some lessons learned with prompt engineering
The basic architecture that we’re going to build looks something like this:
I liken AWS services to Lego bricks: there are many of them, and when put together in varying combinations they can make many unique builds. In this build, we’ll be using:
Amazon S3
- To receive images and send them to be processed using S3 Events
- To store and host processed images and a static hosted HTML page

API Gateway
- To accept a GET request from a static hosted HTML page

AWS Lambda
- To receive the S3 event and process the image
- To accept a GET request and provide the user with the image data

Amazon Bedrock
- To leverage Claude 3 and provide an AI-generated summary and category

Amazon DynamoDB
- To store the information generated by Claude 3 / Amazon Bedrock
I’ve utilised the AWS Serverless Application Model (SAM) framework to help build and deploy this as well to keep things as easy as possible.
“SAM and Bedrock? Surely not?!?” - Remember, Amazon Bedrock is a serverless service and can be called by an API or an SDK. The power is yours to call Bedrock any way you want to.
And last but not least, our languages of choice: Python for our AWS Lambda functions, with a small sprinkling of old-school JavaScript to assist the frontend HTML.
If you’re keen to get this setup yourself feel free to browse the repo and get started yourself - https://github.com/alanblockley/bedrock-haiku-image-catalog
Otherwise, for more explanation, keep reading.
When writing this little experiment I started with a very typical MVP-style approach: get something working. We’ve all been there, right? But in the process I also fell victim to skimping on my prompt for Claude 3. My original prompt was something like:
“Provide me a summary of this image and give it a short category”.
In my defence, it worked. But like any good builder/coder/developer, I read the documentation… actually, no I didn’t… have you read IKEA instructions? I’ve been burnt for life! Instead, I sought a peer review from a very good friend of mine. He swiftly explained that this was a bad prompt. But I didn’t get it. Yes, I’ve done ML/AI/PartyRock/Gen AI stuff, but I’d never really had to write a prompt in anger until now.
The explanation of why this is a bad prompt can be complex. To make it easier to understand, let’s think of it in human terms; after all, this is an emulation of human intelligence.
In your prompt you need to allow for:
- Time for the model to think about what it’s doing. When we’re thinking something through, we generally run it over three or four times, and sometimes we’ll jot down some notes as well.
- Structured output. Claude likes XML tags, so have Claude structure its response. Ask it to put its thinking into XML tags such as <SCRATCHPAD> or <NOTES>, closing them respectively, and then put the output of value, your output, into another tag such as <OUTPUT></OUTPUT>.
- Detail. The more detail the better. Assume you’re explaining this task for the first time to a trainee: if the trainee runs the task and gets it wrong, you didn’t explain it well enough, and Claude definitely won’t get it right either. Give the model a purpose or a role. It’s no good expecting a toaster to dispense ice, so make sure your model has a contextual purpose to work to. This will help make your answers more accurate.
Whilst all of these actions will help you find the accuracy you need in your AI tasks, they’ll also make your results more consistent. With this advice, my prompt ended up taking 34 lines and told a story to the model, not just a quick one-line action that could be interpreted vaguely. Have a play with it and see what you think.
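The real 34-line prompt lives in the repo; as a condensed, hypothetical illustration of the structure (the tag names and wording here are my own, not a copy of the actual prompt), it follows a pattern like this:

```text
You are an image cataloguing assistant for a photo album application.
You will be shown a single photograph. Your job is to describe it and
assign it a category.

First, think through what you can see in the image inside
<SCRATCHPAD></SCRATCHPAD> tags. Consider the subjects, the setting,
the lighting and any activity taking place.

Then, inside <OUTPUT></OUTPUT> tags, provide a JSON object with two
keys: "summary" (one or two sentences describing the image) and
"category" (a single short word or phrase, e.g. "Landscape").

Return nothing outside of these tags.
```

Note how it sets a role, gives the model room to think, and pins the valuable output inside a predictable tag that our code can find.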
And don’t forget a little bit of logic in your code to help extract the data from the XML tags. I’ve included a Python example in my repo.
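The repo has the real version; as a rough sketch (the `<OUTPUT>` tag name and the JSON shape inside it are assumptions matching the kind of prompt described above, so adjust them to match yours), extracting the tag contents can be as simple as a regular expression:

```python
import json
import re

def extract_output(model_text: str) -> dict:
    """Pull the JSON payload out of the <OUTPUT> tags in Claude's reply.

    Assumes the prompt asked Claude to wrap its answer in <OUTPUT></OUTPUT>
    tags containing a JSON object.
    """
    match = re.search(r"<OUTPUT>(.*?)</OUTPUT>", model_text, re.DOTALL)
    if match is None:
        raise ValueError("No <OUTPUT> tags found in model response")
    return json.loads(match.group(1))

# A made-up reply in the shape we asked Claude to produce
reply = """<SCRATCHPAD>A dog on a beach at sunset...</SCRATCHPAD>
<OUTPUT>{"summary": "A dog running along a beach at sunset.", "category": "Animals"}</OUTPUT>"""

data = extract_output(reply)
```

Keeping the scratchpad outside the `<OUTPUT>` tags means the model’s “thinking” never leaks into your stored summary.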
In our architecture, we’ve detailed two functions. One that processes images and one that retrieves the stored data.
For the sake of simplicity, there’s a third function, not detailed in the diagram, which takes the image and copies it to the image store, renaming it to a random, unique UUID. This helps with indexing and also removes any bias from the AI generation of image data: the model has no concept of the image’s name. You could also add MIME checking, AV scanning and other checks here, and utilise this bucket to receive images from anywhere, such as AWS Transfer Family or a web frontend via presigned URLs.
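A minimal sketch of that renaming step (the bucket name and environment variable here are hypothetical stand-ins; the real function is in the repo):

```python
import os
import uuid

# Hypothetical bucket name -- in the real app this would come from the
# SAM template via an environment variable.
DEST_BUCKET = os.environ.get("IMAGE_BUCKET", "my-image-store")

def make_object_key(original_key: str) -> str:
    """Swap the uploaded file name for a random UUID, keeping the extension,
    so the model never sees a (potentially biasing) human-written name."""
    _, ext = os.path.splitext(original_key)
    return f"{uuid.uuid4()}{ext.lower()}"

def handler(event, context):
    import boto3  # provided by the Lambda runtime
    s3 = boto3.client("s3")
    for record in event["Records"]:
        src_bucket = record["s3"]["bucket"]["name"]
        src_key = record["s3"]["object"]["key"]
        # Copy into the processing bucket under the anonymised key
        s3.copy_object(
            Bucket=DEST_BUCKET,
            Key=make_object_key(src_key),
            CopySource={"Bucket": src_bucket, "Key": src_key},
        )
```

This is also the natural place to bolt on the MIME checks or AV scanning mentioned above, before the copy happens.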
I’ve included the functions in a public repository for you to play with - https://github.com/alanblockley/bedrock-haiku-image-catalog
For the sake of making this an interesting read, I’ve only included a copy/paste of the function that will interact with Amazon Bedrock. This function performs the following:
- Decode Image Name: Extracts the name of the uploaded image from the S3 event and removes any obfuscation caused by URL formatting of the name.
- Retrieve Image Type: Determines the MIME type of the uploaded image using the S3 object key.
- Retrieve Image Data: Fetches the image data from the designated S3 bucket.
- Encode Image Data: Converts the image data to a base64-encoded string for processing by the Claude 3 Haiku model.
- Generate Summary: Utilizes the Claude 3 Haiku model to produce a summary of the image content.
- Extract Summary: Parses the response from the Claude 3 Haiku model to extract the generated summary.
- Store Summary in DynamoDB: Stores the extracted summary along with relevant image details (e.g., object ID) in a DynamoDB table for future retrieval.
This Lambda function also handles the streaming output that Bedrock provides and formats it so that it can be inserted into a DynamoDB table. Also note that I asked for structured data from the model in my prompt, so when Claude presents me with this data it’s creating a JSON string within the XML tags, which I then use to continue processing.
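To give a feel for the Bedrock call itself, here’s a condensed sketch of the middle steps (encode, generate). It uses the non-streaming `invoke_model` call for brevity, and the model ID is the Claude 3 Haiku identifier at the time of writing — check availability in your region:

```python
import base64
import json

MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # verify for your region

def build_request_body(prompt: str, image_bytes: bytes, media_type: str) -> str:
    """Build the Anthropic Messages API payload Bedrock expects for Claude 3,
    pairing the base64-encoded image with the text prompt."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": media_type,
                            "data": base64.b64encode(image_bytes).decode("utf-8")}},
                {"type": "text", "text": prompt},
            ],
        }],
    })

def summarise_image(image_bytes: bytes, media_type: str, prompt: str) -> str:
    """Send the image to Claude 3 Haiku via Bedrock and return the raw text reply."""
    import boto3  # provided by the Lambda runtime
    bedrock = boto3.client("bedrock-runtime")
    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=build_request_body(prompt, image_bytes, media_type),
    )
    # The reply's text lives in the first content block of the response body
    return json.loads(response["body"].read())["content"][0]["text"]
```

The text returned here is what then gets parsed for the XML tags and written to DynamoDB.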
NOTE: To call a Bedrock-friendly version of boto3 we’re making use of a Lambda layer. This will need to be built in your account before deploying this SAM template; you’ll then need to include the ARN of the layer as a parameter when you deploy. To build the boto3 Lambda layer:
Open CloudShell from the AWS Console and type the following commands.
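The exact commands aren’t reproduced here, but a typical sequence for packaging a recent boto3 as a layer looks like this (the layer name and runtime version are examples — match them to your setup):

```shell
# Install a current boto3 into the directory structure Lambda layers expect
mkdir -p boto3-layer/python
pip install boto3 -t boto3-layer/python

# Zip it up and publish it as a Lambda layer
cd boto3-layer
zip -r ../boto3-layer.zip python
cd ..
aws lambda publish-layer-version \
    --layer-name boto3-latest \
    --zip-file fileb://boto3-layer.zip \
    --compatible-runtimes python3.12
```

Note the layer ARN returned by the last command — that’s the value to pass as the parameter when deploying the SAM template.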
Source:- https://www.linkedin.com/posts/mikegchambers_serverless-python-activity-7154258975926964224-IL4G
To use the information we’ve generated in a useful way, I created a very simple frontend. This just lists any data in the DynamoDB table and shows a thumbnail for each image.
I’ve kept this as basic as possible. The hero of the story here, after all, is Claude… not my horrific JavaScript skills.
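For reference, the retrieval side can be as small as a scan of the table. This is a sketch, not the repo’s function — the table name, environment variable and attribute names are assumptions:

```python
import json
import os

def make_response(items) -> dict:
    """Shape a list of DynamoDB items into an API Gateway proxy response."""
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*",  # the static page lives on another origin
        },
        "body": json.dumps(items, default=str),
    }

def handler(event, context):
    import boto3  # provided by the Lambda runtime
    table = boto3.resource("dynamodb").Table(
        os.environ.get("TABLE_NAME", "image-catalogue")
    )
    return make_response(table.scan()["Items"])
```

A full table scan is fine for a proof of concept; a larger catalogue would want pagination or a query instead.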
As you can see, Claude has done pretty well at creating a summary and categorising these images. This could then lead to further features being added such as search engines in large collections or services like Amazon Polly to explain what the image is.
Imagine how good this would be at Pictionary!
The glue that holds this all together is a SAM template defining the resources in a declarative language that we’re all familiar with.
I’ve kept this template as simple as possible, including a SimpleTable for the DynamoDB table and separating resources between the frontend and backend.
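As an illustration of the shape (the resource names and properties below are a hand-written approximation, not a copy of the repo’s template), the table and a function wired to it can be as short as:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  # SimpleTable gives us a pay-per-request DynamoDB table with one key
  ImageTable:
    Type: AWS::Serverless::SimpleTable
    Properties:
      PrimaryKey:
        Name: id
        Type: String

  ProcessImageFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: python3.12
      Environment:
        Variables:
          TABLE_NAME: !Ref ImageTable
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref ImageTable
```

SAM’s policy templates, like `DynamoDBCrudPolicy` here, save hand-writing IAM statements for the common cases.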
Some of the most important lessons I learned from this experience were not about Amazon Bedrock. Whilst I utilised Bedrock to call our new friend, Sir Claude the Third of Haiku, I spent more time working on my prompt than I did calling Bedrock. Many iterations of the prompt happened, along with varying changes between system prompts and multi-modal prompts (the ability to use both images and text in a prompt).
As we conclude our exploration of Haiku and its role in our photo album application, we’re reminded of the boundless possibilities that AI and cloud computing offer in reshaping our digital landscape. I see this being used in many use cases, but I can’t escape its ability to make life easier for those who may be vision impaired:
- A guided tour of an art museum
- Assistive technology for daily life, such as shopping for groceries
- Advanced screen reading
Some of these use cases can even lead to the betterment of social inclusion for people with disabilities and vision impairment.
There are many commercial use cases for Claude 3’s vision capabilities including but not limited to marketing, hospitality, healthcare/wellness and many more but I’ll leave these to your imagination, as the opportunities are endless.
Curious to explore the world of Claude 3 Haiku and its applications beyond photo management? Dive deeper into AWS services and AI technologies, and discover the endless possibilities of AI models in enriching our digital experiences. Share your thoughts and insights in the comments on my LinkedIn or below, and let’s embark on an AI journey together!