Process Amazon Bedrock's Response Stream with JavaScript


Learn how to use Claude 3 through Amazon Bedrock's InvokeModelWithResponseStream API and process the response with the AWS SDK for JavaScript.

Dennis Traub
Amazon Employee
Published Mar 18, 2024
Last Modified Mar 20, 2024
Here's how you can process Amazon Bedrock's response stream with the AWS SDK for JavaScript v3, using Anthropic Claude 3 Haiku as an example.
No fluff, just code. Let's go!

The Step-by-step Guide

Step 1: Install the required packages

npm install @aws-sdk/client-bedrock-runtime
npm install @aws-sdk/credential-provider-node

Step 2: Import the packages at the top of your script

import { defaultProvider } from "@aws-sdk/credential-provider-node";
import {
  BedrockRuntimeClient,
  InvokeModelWithResponseStreamCommand,
} from "@aws-sdk/client-bedrock-runtime";

Step 3: Create an instance of the Amazon Bedrock Runtime client

const client = new BedrockRuntimeClient({
  credentialDefaultProvider: defaultProvider,
  region: "us-east-1",
});

Step 4: Prepare the payload with your prompt

const prompt = "Tell me a story!";

const payload = {
  anthropic_version: "bedrock-2023-05-31",
  max_tokens: 1000,
  messages: [{ role: "user", content: [{ type: "text", text: prompt }] }],
};

Step 5: Invoke Claude with the payload and wait for the API to respond

const modelId = "anthropic.claude-3-haiku-20240307-v1:0";

const command = new InvokeModelWithResponseStreamCommand({
  contentType: "application/json",
  body: JSON.stringify(payload),
  modelId,
});

const apiResponse = await client.send(command);
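If the request fails before any bytes arrive (for example, when model access hasn't been enabled in the Bedrock console, or the IAM role lacks the `bedrock:InvokeModelWithResponseStream` permission), `client.send` rejects with an error. Here's a minimal sketch of a wrapper that surfaces a friendlier message in that case. Note that `safeInvoke` is a hypothetical helper for illustration, not part of the SDK:

```javascript
// Hypothetical wrapper: forwards the command, but turns an
// AccessDeniedException into a more actionable error message.
async function safeInvoke(client, command) {
  try {
    return await client.send(command);
  } catch (err) {
    if (err.name === "AccessDeniedException") {
      throw new Error(
        "Access denied: check that model access is enabled in the Bedrock " +
          "console and that your role allows bedrock:InvokeModelWithResponseStream."
      );
    }
    throw err;
  }
}

// Usage: const apiResponse = await safeInvoke(client, command);
```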

Step 6: Decode and process the chunks of the response stream

The stream contains different chunk types, allowing us to extract the message along with some additional information:
  • The "message_start" chunk contains the role the model has attached to the message.
  • The "content_block_delta" chunk contains the actual message parts.
  • The "message_stop" chunk contains some metrics, like token count.
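For illustration, a decoded "content_block_delta" event looks roughly like this (the field names follow Anthropic's Messages streaming format; the text value here is made up):

```json
{
  "type": "content_block_delta",
  "index": 0,
  "delta": { "type": "text_delta", "text": "Once upon a time" }
}
```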
The individual chunks can be processed inside a for await ... of loop:
let completeMessage = "";

for await (const item of apiResponse.body) {
  // Decode each chunk
  const chunk = JSON.parse(new TextDecoder().decode(item.chunk.bytes));

  // Get its type
  const chunkType = chunk.type;

  // Process the chunk depending on its type
  if (chunkType === "message_start") {
    // The "message_start" chunk contains the message's role
    console.log(`The message's role: ${chunk.message.role}`);
  } else if (chunkType === "content_block_delta") {
    // The "content_block_delta" chunks contain the actual response text

    // Print each individual chunk in real-time
    process.stdout.write(chunk.delta.text);

    // ... and add it to the complete message
    completeMessage += chunk.delta.text;
  } else if (chunkType === "message_stop") {
    // The "message_stop" chunk contains some metrics
    const metrics = chunk["amazon-bedrock-invocationMetrics"];
    console.log(`\nNumber of input tokens: ${metrics.inputTokenCount}`);
    console.log(`Number of output tokens: ${metrics.outputTokenCount}`);
    console.log(`Invocation latency: ${metrics.invocationLatency}`);
    console.log(`First byte latency: ${metrics.firstByteLatency}`);
  }
}

The complete script:

import { defaultProvider } from "@aws-sdk/credential-provider-node";
import {
  BedrockRuntimeClient,
  InvokeModelWithResponseStreamCommand,
} from "@aws-sdk/client-bedrock-runtime";

// Create a new Bedrock Runtime client instance.
const client = new BedrockRuntimeClient({
  credentialDefaultProvider: defaultProvider,
  region: "us-east-1",
});

// Prepare the payload for the model.
const prompt = "Tell me a story!";
const payload = {
  anthropic_version: "bedrock-2023-05-31",
  max_tokens: 1000,
  messages: [{ role: "user", content: [{ type: "text", text: prompt }] }],
};

// Invoke Claude with the payload and wait for the API to respond.
const modelId = "anthropic.claude-3-haiku-20240307-v1:0";
const command = new InvokeModelWithResponseStreamCommand({
  contentType: "application/json",
  body: JSON.stringify(payload),
  modelId,
});
const apiResponse = await client.send(command);

// Process and print the stream in real-time.
let completeMessage = "";

for await (const item of apiResponse.body) {
  // Decode each chunk
  const chunk = JSON.parse(new TextDecoder().decode(item.chunk.bytes));

  // Get its type
  const chunkType = chunk.type;

  // Process the chunk depending on its type
  if (chunkType === "message_start") {
    // The "message_start" chunk contains the message's role
    console.log(`The message's role: ${chunk.message.role}`);
  } else if (chunkType === "content_block_delta") {
    // The "content_block_delta" chunks contain the actual response text

    // Print each individual chunk in real-time
    process.stdout.write(chunk.delta.text);

    // ... and add it to the complete message
    completeMessage += chunk.delta.text;
  } else if (chunkType === "message_stop") {
    // The "message_stop" chunk contains some metrics
    const metrics = chunk["amazon-bedrock-invocationMetrics"];
    console.log(`\nNumber of input tokens: ${metrics.inputTokenCount}`);
    console.log(`Number of output tokens: ${metrics.outputTokenCount}`);
    console.log(`Invocation latency: ${metrics.invocationLatency}`);
    console.log(`First byte latency: ${metrics.firstByteLatency}`);
  }
}

// Print the complete message.
console.log("\nComplete response:");
console.log(completeMessage);
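One note on running it: the script uses top-level `await` and `import` statements, so Node.js (v18 or later) must treat it as an ES module. Assuming the file is saved as `bedrock-stream.mjs` (the filename is just an example):

```shell
# Either use the .mjs extension, or set "type": "module" in package.json
node bedrock-stream.mjs
```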
And that's it!
Learned something new? Like this post or let me know in the comments.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
