Reduce hallucinations using feedback - Amazon Bedrock multi-modal capabilities

Like Large Language Models (LLMs), Large Multi-modal Models (LMMs) are prone to hallucination. Processing feedback on these models' outputs can improve accuracy.

Gopinath Srinivasan
Amazon Employee
Published Jul 15, 2024
The multi-modal generative AI capabilities of Amazon Bedrock provide an easy on-ramp into the world of image analysis and object recognition. However, hallucinations are common when using generative AI. This blog describes a technique that uses generative AI to gather feedback and reduce hallucination.

Step 1: Set Up Your Environment

First, ensure you have Python installed on your system. For this project, you'll also need to install boto3, the Amazon Web Services (AWS) SDK for Python. This library allows you to create, configure, and manage AWS services such as Amazon Bedrock.
- I will use Anthropic Claude 3 Sonnet for this exercise. You can find details of supported models and Regions here: https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html.
- Set up model access for your account and region following the link here: https://catalog.workshops.aws/building-with-amazon-bedrock/en-US/prerequisites/bedrock-setup.
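Installing the SDK is a single command (assuming a standard Python 3 environment with pip):

$ pip install boto3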
The following snippet of code imports required libraries and sets up basic logging:
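A minimal sketch (the exact snippet lives in the repository linked in the summary):

import base64
import json
import logging

import boto3

# Basic logging so the before/after object lists show up on the console.
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
logger = logging.getLogger(__name__)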

Step 2: Define a function for running a multi-modal prompt

I will define a method for invoking the Amazon Bedrock API. This method will be used both for identifying objects in the image and for validating whether those objects are truly present.
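A sketch of that method, using the Bedrock Runtime invoke_model API with the Anthropic Messages request format (the name run_multi_modal_prompt is illustrative):

def run_multi_modal_prompt(bedrock_runtime, model_id, messages, max_tokens):
    # Build the request body in the Anthropic Messages format and
    # return the parsed JSON response body.
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": messages,
    })
    response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
    return json.loads(response["body"].read())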

Step 3: Define the AI model, prompt, and image identification function

For this blog, I will use the Anthropic Claude 3 Sonnet model. Depending on the size and complexity of the image in context, you may need to adjust the max_tokens parameter. The prompt states what is expected of the model and the schema format of the output. Since object detection is often used for downstream processing, specifying a schema makes output validation and parsing simple.
Using the above parameters, define the method to identify the objects in the image. Please note that the input image needs to be base64 encoded and passed as part of the API payload.
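A sketch under those constraints, building on run_multi_modal_prompt from Step 2 (the prompt wording and MAX_TOKENS value here are illustrative):

MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"
MAX_TOKENS = 2000

# Asking for a fixed JSON schema keeps downstream validation and parsing simple.
IDENTIFY_PROMPT = (
    "List every object you can clearly see in this image. "
    'Respond only with JSON in the form {"objects": ["object-1", "object-2"]}.'
)

def get_objects_from_model(bedrock_runtime, image_path):
    # The input image must be base64 encoded and passed in the API payload.
    with open(image_path, "rb") as image_file:
        encoded_image = base64.b64encode(image_file.read()).decode("utf-8")
    messages = [{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/jpeg",
                        "data": encoded_image}},
            {"type": "text", "text": IDENTIFY_PROMPT},
        ],
    }]
    response = run_multi_modal_prompt(bedrock_runtime, MODEL_ID, messages, MAX_TOKENS)
    # The model's text reply is expected to be the JSON document requested above.
    return json.loads(response["content"][0]["text"])["objects"]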

Step 4: Define the prompt and function to validate objects

The validation prompt categorizes each object as present, not-present, or unsure; each identified object is appended to the prompt below.
This function verifies whether the objects identified and returned by the get_objects_from_model function are accurate.
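A sketch of the validation step (the prompt wording is illustrative, and the alternate-key handling anticipates the variation described in Step 5):

VALIDATE_PROMPT = (
    "Is the following object present in this image? Respond only with JSON "
    'in the form {"answer": "..."}, where answer is one of "present", '
    '"not-present", or "unsure". Object: '
)

def validate_objects(bedrock_runtime, image_path, objects):
    # Ask the model about each identified object, one at a time, and keep
    # only those it confirms as present.
    with open(image_path, "rb") as image_file:
        encoded_image = base64.b64encode(image_file.read()).decode("utf-8")
    confirmed = []
    for obj in objects:
        messages = [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/jpeg",
                            "data": encoded_image}},
                {"type": "text", "text": VALIDATE_PROMPT + obj},
            ],
        }]
        response = run_multi_modal_prompt(bedrock_runtime, MODEL_ID, messages, MAX_TOKENS)
        result = json.loads(response["content"][0]["text"])
        # The verdict occasionally arrives under a different key, so try a
        # few alternate patterns before giving up.
        answer = result.get("answer") or result.get("result") or result.get("status")
        if answer == "present":
            confirmed.append(obj)
    return confirmed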

Step 5: Putting it all together and running

I exercise image identification and feedback evaluation in the main() method. While I tried to "enforce" outputs complying with specific schemas using detailed prompts, the model's output occasionally does not comply with the specified schema. To account for variations in the model response, two techniques are showcased here and sketched in code after the list below.
1. Retry on exception: Occasionally, the object schema returned by the model in the get_objects_from_model function varies, causing parsing and data extraction to fail. Simply re-running the model invocation usually produces a better outcome. To avoid cost overruns, retries are capped at a maximum number (3 in this blog).
2. Alternate pattern matching: In the validate_objects function, the model occasionally returns values mapped to different keys. To account for this variation, alternate matching patterns are tried during evaluation.
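A sketch of main() combining both techniques (the exception types caught and the retry cap are illustrative):

MAX_RETRIES = 3  # cap retries to avoid cost overruns

def main():
    bedrock_runtime = boto3.client("bedrock-runtime")
    image_path = "sample-image.jpg"

    objects = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            objects = get_objects_from_model(bedrock_runtime, image_path)
            break
        except (json.JSONDecodeError, KeyError, TypeError) as err:
            # The returned schema occasionally varies and parsing fails;
            # simply re-invoking the model usually succeeds.
            logger.warning("Attempt %d failed (%s); retrying", attempt, err)
    if objects is None:
        raise RuntimeError("Model did not return a parseable object list")

    logger.info("Objects before feedback: %s", objects)
    confirmed = validate_objects(bedrock_runtime, image_path, objects)
    logger.info("Objects after feedback: %s", confirmed)

if __name__ == "__main__":
    main()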
Run the code from command line:
$ python multimodal-image-analysis-with-bedrock.py
Sample output for the image (sample-image.jpg) included in the git repository:
In the sample output, "Books" is initially identified as an object in the given image. During feedback processing with the validate_objects function, Amazon Bedrock could not confirm "Books" in the image, so it is removed from the list. The lists of objects before and after validation are logged to the console. Please note that if you run this code, your output might differ from the run above.

Summary

This blog presented a technique for identifying objects in images and then seeking feedback from the same model to reduce hallucinations. The schema validations and fail-safes implemented in main() could alternatively be moved into the get_objects_from_model() and validate_objects() functions. Invoking models repeatedly for validation increases accuracy but also increases the overall cost of the solution; using different Bedrock models to balance accuracy and cost can help offset this increase. Full code for this article can be found at https://github.com/gopinaath/multimodal-image-analysis-bedrock.

Disclaimer

  • The content above is meant to indicate the art of the possible. There are opportunities to refactor and reduce duplicate code, optimize runtime efficiency, and improve the prompts.
  • If you'd like to dive deeper, see the hands-on workshop: Building with Amazon Bedrock

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
