Visual Analysis with Visual Few-Shot Prompting

In this post I will outline a method for providing a Large Language Model with "visual" examples using the "few-shot" prompting approach to get more accurate results.

Randall Potter
Amazon Employee
Published Dec 16, 2024

Introduction

Any form of programmatic vision analysis can be very difficult at times, particularly when using traditional OCR approaches.
With the availability of vision-capable Large Language Models (LLMs), such as Anthropic's latest Claude models, we can take a different approach to solving complex document and visual analysis problems.
There is a time-tested method for providing examples to large language models called "few-shot prompting." Previously, we were limited to text-only examples with this approach. Using an SDK such as Amazon Web Services' (AWS) boto3 library, we can take advantage of the Messages API in Amazon Bedrock.
The idea behind the Messages API is that you can construct a conversation between a user and an LLM. This is, of course, most often seen in "chat-style" interactions. However, it can also be used to "prime" or "pre-fill" context for the LLM. In this exercise we will use it to provide "visual context" to the LLM, enhancing its ability to understand the images we want to analyze.

Let's Dive Deeper

Our basic message structure is going to be like this:
  1. Create a System Prompt
  2. Create Visual Examples to Provide to the LLM
  3. Construct a Messages API historical conversation so that the visual examples are used strictly as examples, and so that the LLM doesn't leak its analysis of them into its response.
  4. Provide the System Prompt to the LLM as the user's final turn in the conversation.
Each turn in the conversation will be constructed like so:
  1. Conversation
    1. User:
      1. The following are examples...
    2. Assistant:
      1. I understand these are only examples...
    3. User:
      1. Here are the examples.
        1. Example A
        2. Example B
        3. Example C
    4. Assistant:
      1. I understand these are examples and are only to be used as examples
and so on...
A quick example of a messages API payload would be:
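A minimal sketch of such a payload, assuming the Anthropic Messages API request format on Amazon Bedrock; "PLACEHOLDER" marks content you supply yourself:

```python
# Illustrative Messages API request body; "PLACEHOLDER" marks your content.
import json

payload = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2000,
    "system": "PLACEHOLDER",  # your system prompt
    "messages": [
        # Primed historical turns alternate between user and assistant.
        {"role": "user",
         "content": [{"type": "text", "text": "PLACEHOLDER"}]},
        {"role": "assistant",
         "content": [{"type": "text", "text": "PLACEHOLDER"}]},
        {"role": "user",
         "content": [{"type": "text", "text": "PLACEHOLDER"}]},
    ],
}

body = json.dumps(payload)  # what you would pass to invoke_model
```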
As you can see, we could replace "PLACEHOLDER" with whatever we need to give the LLM a more structured historical context.

Example Complete Conversation

Note that you will have to replace the placeholders denoted in brackets, "[ ]".
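The conversation outlined above can be sketched as follows. This is a hedged illustration, not the post's exact code: the helper names, the example prompt wording, and the bracketed "[ ]" placeholders are assumptions, while the model ID and request shape follow the standard Anthropic Messages API on Amazon Bedrock.

```python
# Sketch of the complete visual few-shot conversation for Amazon Bedrock.
# Helper names and prompt wording are illustrative assumptions.
import base64
import json


def image_block(image_bytes: bytes, media_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in a base64 image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("utf-8"),
        },
    }


def build_conversation(example_images, target_image, task_prompt):
    """Assemble the primed example turns plus the final task turn."""
    return [
        {"role": "user", "content": [
            {"type": "text", "text": "The following are annotated examples..."}]},
        {"role": "assistant", "content": [
            {"type": "text", "text": "I understand these are only examples..."}]},
        # One user turn carrying all annotated example images.
        {"role": "user",
         "content": [{"type": "text", "text": "Here are the examples."}]
                    + [image_block(img) for img in example_images]},
        {"role": "assistant", "content": [
            {"type": "text",
             "text": "I understand these are examples and are only to be "
                     "used as examples."}]},
        # Final user turn: the image to analyze plus the task prompt.
        {"role": "user", "content": [
            image_block(target_image),
            {"type": "text", "text": task_prompt},
        ]},
    ]


def analyze(example_images, target_image, task_prompt, system_prompt):
    """Send the conversation to Claude 3.5 Sonnet v2 via Amazon Bedrock."""
    import boto3  # assumes AWS credentials and Bedrock model access

    bedrock = boto3.client("bedrock-runtime")
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 2000,
            "system": system_prompt,
            "messages": build_conversation(
                example_images, target_image, task_prompt),
        }),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```

You would read your "[Example A/B/C]" annotated images and the "[Image to Analyze]" from disk as bytes and pass them to `analyze` along with your prompts.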

Annotated Visual Few-Shot Example

[Image] Annotated Visual Few-Shot Sample: annotated tree leaves of various colors.

Example of an Image to Analyze

[Image] Example Image to Analyze: tree leaves of various colors.

Example Output


Conclusion

In this post we've described a way to provide a vision-capable large language model with few-shot examples in the form of annotated visuals. I hope you've found this helpful!
Please review the section below for more examples.
Thank you!

Model, Prompt, and Inference Parameter Samples

Now that we've outlined the process, here are sample inference snippets for you to experiment with!

Model

Anthropic Claude 3.5 Sonnet v2 on Amazon Bedrock

System Prompt

User Prompt:
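As a hypothetical illustration, a system and user prompt pair for the annotated-leaf task might look like the following. The wording is an assumption to adapt to your own task, not a fixed recipe:

```python
# Hypothetical system and user prompts for the annotated-leaf example.
# The wording below is illustrative, not prescribed.
system_prompt = (
    "You are a visual analysis assistant. You will be shown annotated example "
    "images, followed by an image to analyze. Use the examples only as a "
    "reference for the annotation style; never describe them in your answer."
)

user_prompt = (
    "Analyze the attached image of tree leaves. For each leaf, report its "
    "color, following the conventions shown in the annotated examples."
)
```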

Inference Parameters

Adjust as needed...
What             Value
Temperature      0.2
Top P            0.999
Top K            150
Max Tokens       2000
Stop Sequences   response_format
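Placed in an Anthropic Messages API request body for Bedrock, these parameters might look like this sketch; the field names are the standard Anthropic parameter names, and the bracketed prompts are placeholders:

```python
# Sketch: the inference parameters above in a Bedrock request body.
import json

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2000,
    "temperature": 0.2,
    "top_p": 0.999,
    "top_k": 150,
    "stop_sequences": ["response_format"],
    "system": "[SYSTEM_PROMPT]",
    "messages": [
        {"role": "user",
         "content": [{"type": "text", "text": "[USER_PROMPT]"}]},
    ],
})
```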
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
