Reduce hallucinations using feedback - Amazon Bedrock multi-modal capabilities
As in Large Language Models (LLMs), Large Multi-modal models (LMMs) also cause hallucination challenges. Processing the outputs' feedback from these models can increase the accuracy.
Anthropic Claude-3-sonnet
model. Depending on the size and complexity of the image in context, you may need to adjust max_tokens
parameter. The prompt asks what is expected of the model and the schema format. Since object detection is often used for downstream processing, giving a schema will make output validation and parsing simple.base64
encoded and passed as part of the API payload. get_objects_from_model
function is accurate.main()
method. While I tried to "enforce" the models to provide outputs complying with specific schemas using detailed prompts, the model outputs occasionally don't comply with the specified schema. To account for variations in model response, two techniques are showcased in here.get_objects_from_model
function varies, resulting in failure of parsing and data-extraction. Simply re-running the model invocation provides better outcomes. To avoid cost-overrun, I can limit the maximum retries to a specific number (3 in this blog)validate_objects
function, the model occasionally returned values mapped to different keys. To account for this variation, alternate matching patterns are given during evaluation.$ python multimodal-image-analysis-with-bedrock.py
sample-image.jpg
) given in the git repository:validate_objects
function, Amazon Bedrock could not identify "Books" in the image, hence it will be removed from the list. The list of objects before and after are logged in console. Please note that if you run this code, your output might differ from outputs of run above. main()
can be potentially included get_objects_from_model()
and validate_objects()
functions separately. Invoking models repeatedly for validation increases the accuracy and also increases cost of overall solution. Use of different Bedrock models to balance accuracy and cost can potentially help offset/reduce this increased cost. Full code for this article can be found at https://github.com/gopinaath/multimodal-image-analysis-bedrock. - The content above is to indicate the art of the possible. There exist opportunities to refactor and reduce duplicate code, runtime optimizations, and prompt engineering.
- If you'd like to dive deeper into this, look here for a hands-on workshop: Building with Amazon Bedrock
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.