Reduce hallucinations using feedback - Amazon Bedrock multi-modal capabilities
As in Large Language Models (LLMs), Large Multi-modal models (LMMs) also cause hallucination challenges. Processing the outputs' feedback from these models can increase the accuracy.
1
pip install boto3
1
2
3
4
5
6
7
8
import json
import logging
import base64
import boto3
from botocore.exceptions import ClientError
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
1
2
3
4
5
6
7
8
9
10
11
12
def run_multi_modal_prompt(bedrock_runtime, messages, max_tokens):
body = json.dumps(
{
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": max_tokens,
"messages": messages
}
)
response = bedrock_runtime.invoke_model(
body=body, modelId=model_id, )
response_body = json.loads(response.get('body').read())
return response_body
Anthropic Claude-3-sonnet
model. Depending on the size and complexity of the image in context, you may need to adjust max_tokens
parameter. The prompt asks what is expected of the model and the schema format. Since object detection is often used for downstream processing, giving a schema will make output validation and parsing simple.1
2
3
4
5
6
7
8
9
10
model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'
max_tokens = 2000
prompt_text_identify_objects = """
Accurately identify each object and its location in this image.
If there are multiple objects of the same type, identify each separately.
After identifying, double-check if each of the identified object is present the given picture.
Ignore architectural features like floor, wall, etc. Give response in JSON format using double quotes.
Sample JSON format: {"objects": {"Object-1 Name": "Object-1 location", "Object-2 name": "Object-2 location"}}.
This is only a sample JSON document with two placeholder items. You will likely have more objects in the image.
"""
base64
encoded and passed as part of the API payload. 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
def get_objects_from_model(input_image, prompt_text):
try:
bedrock_runtime = boto3.client(service_name='bedrock-runtime')
with open(input_image, "rb") as image_file:
content_image = base64.b64encode(image_file.read()).decode('utf8')
message = {
"role": "user",
"content": [
{"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": content_image}},
{"type": "text", "text": prompt_text}
]
}
messages = [message]
response = run_multi_modal_prompt(bedrock_runtime, messages, max_tokens)
logger.debug(response)
content = response["content"]
first_item = content[0]
text = first_item["text"]
# Extract the JSON string from the text
start = text.index("{")
end = text.rindex("}") + 1
json_string = text[start:end]
logger.debug(json_string)
text_json = json.loads(json_string)
objects = text_json["objects"]
return objects
except Exception as e:
logger.error("Exception occurred: %s", e)
return None
1
prompt_text_validate_objects = "Answer if the given object can be identified in the image. If the identified object is present in the image, respond with 'Yes'. If the identified objects are not present in the image, respond with 'No'. If you are unsure, respond with 'Unsure'. Give response in JSON format. Use double quotes for constructing json objects. Don't add newline characters. Does the image contain "
get_objects_from_model
function is accurate.1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
def validate_objects(input_image, prompt_text):
try:
bedrock_runtime = boto3.client(service_name='bedrock-runtime')
with open(input_image, "rb") as image_file:
content_image = base64.b64encode(image_file.read()).decode('utf8')
message = {
"role": "user",
"content": [
{"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": content_image}},
{"type": "text", "text": prompt_text}
]
}
messages = [message]
response = run_multi_modal_prompt(bedrock_runtime, messages, max_tokens)
logger.debug(response)
content = response["content"]
logger.debug(content)
return content
except Exception as e:
logger.error("Exception occurred: %s", e)
return None
main()
method. While I tried to "enforce" the models to provide outputs complying with specific schemas using detailed prompts, the model outputs occasionally don't comply with the specified schema. To account for variations in model response, two techniques are showcased in here.get_objects_from_model
function varies, resulting in failure of parsing and data-extraction. Simply re-running the model invocation provides better outcomes. To avoid cost-overrun, I can limit the maximum retries to a specific number (3 in this blog)validate_objects
function, the model occasionally returned values mapped to different keys. To account for this variation, alternate matching patterns are given during evaluation.1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
def main():
input_image_1 = "./sample-image.jpg"
# sometimes the JSON output from the model can be inconsistent. Retry 3 times before giving up
retry_count = 0
while retry_count < 3:
try:
objects = get_objects_from_model(input_image_1, prompt_text_identify_objects)
pretty_json_objects_image_1 = json.dumps(objects, indent=4)
logger.info(pretty_json_objects_image_1)
break
except ValueError as e:
logger.error(f"Error invoking model: {e}")
retry_count += 1
if retry_count == 3:
logger.error("Maximum retries reached. Exiting.")
return
keys = list(objects.keys())
logger.info(keys)
key_list_to_be_removed = []
for key in objects:
validation_result = validate_objects(input_image_1, prompt_text_validate_objects + str(key) + " ? ")
first_item = validation_result[0]
text_field = first_item["text"]
validation_result_inner_dict = json.loads(text_field)
# response is sometimes called "response" and sometimes called "answer"
response_value = validation_result_inner_dict.get("response", validation_result_inner_dict.get("answer", None))
if(not (response_value == "Yes")):
key_list_to_be_removed.append(key)
logger.info("Objects to be removed :" + str(key_list_to_be_removed))
for key in key_list_to_be_removed:
objects.pop(key)
logger.info(json.dumps(objects, indent=4))
if __name__ == "__main__":
main()
$ python multimodal-image-analysis-with-bedrock.py
sample-image.jpg
) given in the git repository:1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
INFO:__main__:{
"Ceiling light fixtures": "On the ceiling, multiple irregular-shaped pendant light fixtures",
"Curtains": "Hanging on windows near the outdoor scenery",
"Sofas": "Two gray sofas in the center of the living room",
"Coffee table": "A rectangular wooden coffee table between the sofas",
"Potted plant": "A potted plant near the corner of the room",
"Floor lamp": "A standing floor lamp next to one of the sofas",
"Area rug": "A large dark-colored area rug underneath the coffee table",
"Wall art": "A framed artwork hanging on the wall",
"Books": "Books placed on the coffee table"
}
INFO:__main__:['Ceiling light fixtures', 'Curtains', 'Sofas', 'Coffee table', 'Potted plant', 'Floor lamp', 'Area rug', 'Wall art', 'Books']
INFO:__main__:Objects to be removed :['Books']
INFO:__main__:{
"Ceiling light fixtures": "On the ceiling, multiple irregular-shaped pendant light fixtures",
"Curtains": "Hanging on windows near the outdoor scenery",
"Sofas": "Two gray sofas in the center of the living room",
"Coffee table": "A rectangular wooden coffee table between the sofas",
"Potted plant": "A potted plant near the corner of the room",
"Floor lamp": "A standing floor lamp next to one of the sofas",
"Area rug": "A large dark-colored area rug underneath the coffee table",
"Wall art": "A framed artwork hanging on the wall"
}
validate_objects
function, Amazon Bedrock could not identify "Books" in the image, hence it will be removed from the list. The list of objects before and after are logged in console. Please note that if you run this code, your output might differ from outputs of run above. main()
can be potentially included get_objects_from_model()
and validate_objects()
functions separately. Invoking models repeatedly for validation increases the accuracy and also increases cost of overall solution. Use of different Bedrock models to balance accuracy and cost can potentially help offset/reduce this increased cost. Full code for this article can be found at https://github.com/gopinaath/multimodal-image-analysis-bedrock. - The content above is to indicate the art of the possible. There exist opportunities to refactor and reduce duplicate code, runtime optimizations, and prompt engineering.
- If you'd like to dive deeper into this, look here for a hands-on workshop: Building with Amazon Bedrock
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.