Streamlining Large Language Model Interactions with Amazon Bedrock Converse API

Amazon Bedrock's Converse API provides a consistent interface for invoking various large language models, eliminating the need for complex helper functions. Code examples showcase its simplicity compared with the previous approach of writing a unique integration per model. A demo highlights using Claude 3 via the Converse API for multimodal image description, showing how easily a single, unified API can harness the capabilities of large models.

Haowen Huang
Amazon Employee
Published Jun 11, 2024

Overview

Amazon Bedrock has introduced the Converse API, a game-changing tool that streamlines interactions with its AI models by providing a consistent interface. This API lets developers invoke different Amazon Bedrock models without accounting for model-specific parameters or implementations. Developers can write code once and seamlessly use it with the various models available on Amazon Bedrock, in any AWS Region where the service is offered.
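As a quick illustration of that "write once" idea, here is a minimal sketch of my own (not an official sample) that reuses the exact same converse() call across two model IDs that appear later in this post; it assumes both models are enabled in your account and supported by the Converse API in your Region:

import boto3

# Create the Bedrock runtime client once.
bedrock_client = boto3.client(service_name="bedrock-runtime")

# Two model IDs used elsewhere in this post; any Converse-supported model works here.
model_ids = [
    "anthropic.claude-3-sonnet-20240229-v1:0",
    "mistral.mistral-7b-instruct-v0:2",
]

# One message format for every model.
messages = [{"role": "user", "content": [{"text": "Explain how chickens swim to an 8 year old."}]}]

for model_id in model_ids:
    # The identical request shape is sent to each model.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages,
        inferenceConfig={"maxTokens": 512, "temperature": 0.5},
    )
    print(model_id, "->", response["output"]["message"]["content"][0]["text"])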
The models and model features supported by the Converse API are detailed at the following link, which is worth checking periodically as more models become available through the Converse API:
To help developers quickly understand the new Converse API, I'll start with a code example from before the Converse API was released. Then I'll share the sample Converse API code from the official website, so you can see how significantly the Converse API streamlines AI model interactions. Finally, I'll highlight the vision capability of Claude 3, demonstrating how easily you can leverage the Converse API to unleash the potential of large language models on Amazon Bedrock.

World Before Bedrock Converse API

In the past, developers had to write complex helper functions to unify the input and output formats across different AI models. For example, during an Amazon Bedrock development workshop in Hong Kong, a 116-line helper function was required just to call various foundation models in a consistent way. The code is shown below.
The code below is a Python function that invokes language models from various providers (Anthropic, Mistral, AI21, Amazon, Cohere, and Meta) on Amazon Bedrock. The `invoke_model` function takes input parameters such as the prompt, model name, temperature, top-k, top-p, and stop sequences, and returns the output text generated by the specified language model. It uses the `boto3` library to interact with the Amazon Bedrock runtime and sends the appropriate input data based on each provider's API requirements. The code also includes a main section that sets up the Amazon Bedrock Runtime client, specifies a model and prompt, and calls the `invoke_model` function to generate the output text, which is then printed.
import json
import boto3

def invoke_model(client, prompt, model,
                 accept='application/json', content_type='application/json',
                 max_tokens=512, temperature=1.0, top_p=1.0, top_k=200, stop_sequences=[],
                 count_penalty=0, presence_penalty=0, frequency_penalty=0, return_likelihoods='NONE'):
    # default response
    output = ''
    # identify the model provider
    provider = model.split('.')[0]
    # InvokeModel
    if provider == 'anthropic':
        input = {
            'prompt': prompt,
            'max_tokens_to_sample': max_tokens,
            'temperature': temperature,
            'top_k': top_k,
            'top_p': top_p,
            'stop_sequences': stop_sequences
        }
        body = json.dumps(input)
        response = client.invoke_model(body=body, modelId=model, accept=accept, contentType=content_type)
        response_body = json.loads(response.get('body').read())
        output = response_body['completion']
    elif provider == 'mistral':
        input = {
            'prompt': prompt,
            'max_tokens': max_tokens,
            'temperature': temperature,
            'top_k': top_k,
            'top_p': top_p,
            'stop': stop_sequences
        }
        body = json.dumps(input)
        response = client.invoke_model(body=body, modelId=model, accept=accept, contentType=content_type)
        response_body = json.loads(response.get('body').read())
        results = response_body['outputs']
        for result in results:
            output = output + result['text']
    elif provider == 'ai21':
        input = {
            'prompt': prompt,
            'maxTokens': max_tokens,
            'temperature': temperature,
            'topP': top_p,
            'stopSequences': stop_sequences,
            'countPenalty': {'scale': count_penalty},
            'presencePenalty': {'scale': presence_penalty},
            'frequencyPenalty': {'scale': frequency_penalty}
        }
        body = json.dumps(input)
        response = client.invoke_model(body=body, modelId=model, accept=accept, contentType=content_type)
        response_body = json.loads(response.get('body').read())
        completions = response_body['completions']
        for part in completions:
            output = output + part['data']['text']
    elif provider == 'amazon':
        input = {
            'inputText': prompt,
            'textGenerationConfig': {
                'maxTokenCount': max_tokens,
                'stopSequences': stop_sequences,
                'temperature': temperature,
                'topP': top_p
            }
        }
        body = json.dumps(input)
        response = client.invoke_model(body=body, modelId=model, accept=accept, contentType=content_type)
        response_body = json.loads(response.get('body').read())
        results = response_body['results']
        for result in results:
            output = output + result['outputText']
    elif provider == 'cohere':
        input = {
            'prompt': prompt,
            'max_tokens': max_tokens,
            'temperature': temperature,
            'k': top_k,
            'p': top_p,
            'stop_sequences': stop_sequences,
            'return_likelihoods': return_likelihoods
        }
        body = json.dumps(input)
        response = client.invoke_model(body=body, modelId=model, accept=accept, contentType=content_type)
        response_body = json.loads(response.get('body').read())
        results = response_body['generations']
        for result in results:
            output = output + result['text']
    elif provider == 'meta':
        input = {
            'prompt': prompt,
            'max_gen_len': max_tokens,
            'temperature': temperature,
            'top_p': top_p
        }
        body = json.dumps(input)
        response = client.invoke_model(body=body, modelId=model, accept=accept, contentType=content_type)
        response_body = json.loads(response.get('body').read())
        output = response_body['generation']
    # return
    return output

# main function
bedrock = boto3.client(
    service_name='bedrock-runtime'
)
model = 'mistral.mistral-7b-instruct-v0:2'
prompt = """

Human: Explain how chicken swim to an 8 year old using 2 paragraphs.

Assistant:
"""

output = invoke_model(client=bedrock, prompt=prompt, model=model)
print(output)
The code above only covers the handful of models shown; as the number of foundation models that need to be unified grows, the amount of helper code keeps growing with it.
You can refer to the following document link for the details of this 116-line code sample that invokes different language models from a single function:

World With Bedrock Converse API

The following code snippet from the AWS official website demonstrates how simple it is to call the Converse API operation with a model on Amazon Bedrock.
def generate_conversation(bedrock_client,
                          model_id,
                          system_text,
                          input_text
                          ):
    ……
    # Send the message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages,
        system=system_prompts,
        inferenceConfig=inference_config,
        additionalModelRequestFields=additional_model_fields
    )
    ……
For the full code, you can refer to the following link:
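If you only want a feel for what the elided parts contain, here is a minimal, filled-in sketch based on the request shapes documented for the Converse API; the prompt values and inference settings are illustrative assumptions, not the official sample:

import boto3

def generate_conversation(bedrock_client, model_id, system_text, input_text):
    # System prompt and user message expressed as Converse API content blocks.
    system_prompts = [{"text": system_text}]
    messages = [{"role": "user", "content": [{"text": input_text}]}]

    # Base inference parameters shared across models.
    inference_config = {"temperature": 0.5, "topP": 0.9, "maxTokens": 512}

    # Model-specific parameters (for example, Anthropic's top_k) go here.
    additional_model_fields = {"top_k": 200}

    # Send the message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages,
        system=system_prompts,
        inferenceConfig=inference_config,
        additionalModelRequestFields=additional_model_fields
    )
    return response["output"]["message"]["content"][0]["text"]

bedrock_client = boto3.client(service_name="bedrock-runtime")
print(generate_conversation(
    bedrock_client,
    "anthropic.claude-3-sonnet-20240229-v1:0",
    "You are a helpful assistant.",
    "Explain how chickens swim to an 8 year old using 2 paragraphs."))

Note how the model-agnostic settings live in inferenceConfig, while provider-specific extras are passed through additionalModelRequestFields.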
To demonstrate how to fully utilize the Converse API, here’s an example that sends both text and an image to the Claude 3 Sonnet model using the converse() method. The code reads in an image file, creates the message payload with the text prompt and image bytes, and then prints out the model's description of the scene.
To test it with different images, simply update the input file path. This showcases how the Converse API supports multimodal applications.
The two images are pictures I took from my window in the beautiful city of Hong Kong while writing this blog post. They show the street view of Causeway Bay, as displayed below:
Causeway Bay Street View, Hong Kong (Image 1)
Causeway Bay Street View, Hong Kong (Image 2)
For the code, I wrote a generate_conversation_with_image() function and modified some code in the main() function, based on the AWS official website code mentioned previously. The details are as follows:
import logging
import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)

def generate_conversation_with_image(bedrock_client,
                                     model_id,
                                     input_text,
                                     input_image
                                     ):
    """
    Sends a message with text and an image to a model.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        input_text (str): The input message.
        input_image (str): The path to the input image file.

    Returns:
        response (JSON): The conversation that the model generated.

    """

    logger.info("Generating message with model %s", model_id)

    # Read the image file as raw bytes.
    with open(input_image, "rb") as f:
        image = f.read()

    # The declared format must match the image file type ('png', 'jpeg', 'gif' or 'webp').
    image_format = "jpeg" if input_image.lower().endswith((".jpg", ".jpeg")) else "png"

    # Message to send: one text block and one image block.
    message = {
        "role": "user",
        "content": [
            {
                "text": input_text
            },
            {
                "image": {
                    "format": image_format,
                    "source": {
                        "bytes": image
                    }
                }
            }
        ]
    }

    messages = [message]

    # Send the message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages
    )

    return response

def main():
    """
    Entrypoint for Anthropic Claude 3 Sonnet example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
    input_text = "What's in this image?"
    input_image = "IMG_1_Haowen.jpg"

    try:
        bedrock_client = boto3.client(service_name="bedrock-runtime")

        response = generate_conversation_with_image(
            bedrock_client, model_id, input_text, input_image)

        output_message = response['output']['message']

        print(f"Role: {output_message['role']}")

        for content in output_message['content']:
            print(f"Text: {content['text']}")

        token_usage = response['usage']
        print(f"Input tokens: {token_usage['inputTokens']}")
        print(f"Output tokens: {token_usage['outputTokens']}")
        print(f"Total tokens: {token_usage['totalTokens']}")
        print(f"Stop reason: {response['stopReason']}")

    except ClientError as err:
        message = err.response['Error']['Message']
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(
            f"Finished generating text with model {model_id}.")

if __name__ == "__main__":
    main()
For image #1, I obtained the following result from the model:
Multimodal Image Description by Claude 3 for Image #1
For your convenience, I have copied the model’s output here:
Role: assistant
Text: This image shows a dense urban cityscape with numerous high-rise residential and office buildings in Hong Kong. In the foreground, there are sports facilities like a running track, soccer/football fields, and tennis/basketball courts surrounded by the towering skyscrapers of the city. The sports venues provide open green spaces amidst the densely packed urban environment. The scene captures the juxtaposition of modern city living and recreational amenities in a major metropolitan area like Hong Kong.
Input tokens: 1580
Output tokens: 103
Total tokens: 1683
Stop reason: end_turn
Finished generating text with model anthropic.claude-3-sonnet-20240229-v1:0.
For image #2, I simply changed the "input_image" path in the code to point to the new image. When I passed image #2 to the Claude 3 Sonnet model, I obtained the following result:
Multimodal Image Description by Claude 3 for Image #2
For your convenience, I have copied the model’s output here:
Role: assistant
Text: This image shows an aerial view of a dense urban city skyline, likely in a major metropolitan area. The cityscape is dominated by tall skyscrapers and high-rise apartment or office buildings of varying architectural styles, indicating a highly developed and populous city center.

In the foreground, a major highway or expressway can be seen cutting through the city, with multiple lanes of traffic visible, though the traffic appears relatively light in this particular view. There are also some pockets of greenery interspersed among the buildings, such as parks or green spaces.

One notable feature is a large billboard or advertisement for the luxury brand Chanel prominently displayed on the side of a building, suggesting this is a commercial and shopping district.

Overall, the image captures the concentrated urban density, modern infrastructure, and mixture of residential, commercial, and transportation elements characteristic of a major cosmopolitan city.
Input tokens: 1580
Output tokens: 188
Total tokens: 1768
Stop reason: end_turn
Finished generating text with model anthropic.claude-3-sonnet-20240229-v1:0.

Summary

Amazon Bedrock's new Converse API simplifies interactions with large language models by providing a consistent interface, eliminating the need for model-specific implementations. Previously, developers had to write helper functions running to well over a hundred lines of code to unify input and output formats across models. The Converse API allows various models to be invoked seamlessly with the same code in the AWS Regions where they are available, significantly reducing code complexity. The code examples demonstrate the Converse API's simplicity compared with the previous approach of a unique integration per model provider. The demo shows Claude 3 being used via the Converse API for multimodal image description, illustrating how effortlessly it can harness large language models' capabilities. Overall, the Converse API streamlines working with the different large models on Amazon Bedrock, reducing development effort through a consistent interface.
Note: The cover image for this blog post was generated using the SDXL 1.0 model on Amazon Bedrock. The prompt given was as follows:
a developer sitting in the cafe, comic, graphic illustration, comic art, graphic novel art, vibrant, highly detailed, colored, 2d minimalistic
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
