Reduce hallucinations using feedback - Amazon Bedrock multi-modal capabilities
As in Large Language Models (LLMs), Large Multi-modal models (LMMs) also cause hallucination challenges. Processing the outputs' feedback from these models can increase the accuracy.
Gopinath Srinivasan
Amazon Employee
Published Jul 15, 2024
Multi-modal generative AI capabilities of Amazon Bedrock provide alternate, easy on-ramp into the world of image analysis and object recognition. However, hallucinations are common while using generative AI. This blog provides a technique to use generative AI to gather feedback and reduce hallucination.
(Disclaimer: The content below is to indicate the art of the possible. There exist opportunities to refactor and reduce duplicate code, optimize runtime efficiency, and prompt engineering. )
First, ensure you have Python installed on your system. For this project, you'll also need to install boto3, which is the Amazon Web Services (AWS) SDK for Python. This library allows you to create, configure, and manage AWS services, such as the Amazon Bedrock models.
- I will use Anthropic Claude v3 Sonnet for our exercise. You can find details of supported models and regions here: https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html.
- Set up model access for your account and region following the link here: https://catalog.workshops.aws/building-with-amazon-bedrock/en-US/prerequisites/bedrock-setup.
The following snippet of code imports required libraries and sets up basic logging:
I will define a method for invoking Amazon Bedrock API. This method will be used for both identifying objects in the image and validating if the objects are truly present in the image.
For this blog, I will use
Anthropic Claude-3-sonnet
model. Depending on the size and complexity of the image in context, you may need to adjust max_tokens
parameter. The prompt asks what is expected of the model and the schema format. Since object detection is often used for downstream processing, giving a schema will make output validation and parsing simple.Using the above parameters, define the method to identify the objects in image. Please note that the input image will needs to be
base64
encoded and passed as part of the API payload. The validation prompt categorizes the objects as present, not-present and unsure. The prompt below is appended with each identified object.
This function verifies if the objects identified and returned in
get_objects_from_model
function is accurate.I exercise image identification and feedback evaluation in our
main()
method. While I tried to "enforce" the models to provide outputs complying with specific schemas using detailed prompts, the model outputs occasionally don't comply with the specified schema. To account for variations in model response, two techniques are showcased in here.1. Retry when there's exception: Occasionally, the object schema returned by the model in
get_objects_from_model
function varies, resulting in failure of parsing and data-extraction. Simply re-running the model invocation provides better outcomes. To avoid cost-overrun, I can limit the maximum retries to a specific number (3 in this blog)2. Alternate pattern matching: In
validate_objects
function, the model occasionally returned values mapped to different keys. To account for this variation, alternate matching patterns are given during evaluation.Run the code from command line:
$ python multimodal-image-analysis-with-bedrock.py
Sample output for image(
sample-image.jpg
) given in the git repository:In the above output, "Books" is initially identified as an object in the given image. During feedback processing using
validate_objects
function, Amazon Bedrock could not identify "Books" in the image, hence it will be removed from the list. The list of objects before and after are logged in console. Please note that if you run this code, your output might differ from outputs of run above. This blog provides you with a technique to do object identification from images and seeking feedback from the same model to reduce hallucinations. The schema validations and fail-safe implemented in
main()
can be potentially included get_objects_from_model()
and validate_objects()
functions separately. Invoking models repeatedly for validation increases the accuracy and also increases cost of overall solution. Use of different Bedrock models to balance accuracy and cost can potentially help offset/reduce this increased cost. Full code for this article can be found at https://github.com/gopinaath/multimodal-image-analysis-bedrock. - The content above is to indicate the art of the possible. There exist opportunities to refactor and reduce duplicate code, runtime optimizations, and prompt engineering.
- If you'd like to dive deeper into this, look here for a hands-on workshop: Building with Amazon Bedrock
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.