Handling streaming and structured data using Amazon Bedrock

At Telescope, we work with financial and brokerage applications to power their rich and intelligent stock discovery experiences. We leverage Anthropic Claude models on Amazon Bedrock for our generative AI capabilities. Learn about our experience using Amazon Bedrock, including working with structured data, designing effective prompts and handling streaming inputs and outputs.

AWS Admin
Amazon Employee
Published Dec 7, 2023
Last Modified Jun 21, 2024

Introduction

At Telescope, we work with financial and brokerage applications to power their rich and intelligent stock discovery experiences using AI. To provide these experiences, we needed a large language model (LLM) that is predictable (essential given the financial industry we operate in), delivers fast user experiences, and can understand and process structured financial markets data.
In this blog post, you will learn about our experience adopting Claude on Amazon Bedrock, how to design effective prompts when handling structured data, and how to handle streaming inputs and outputs to provide fast user experiences. For those adopting Claude models for their generative AI capabilities, it's crucial to understand that creating tailored prompts, rather than reusing ones from other models, is key to fully harnessing the model's capabilities and generating consistent responses.

Initial impressions

The moment we gained access to Amazon Bedrock from our AWS account team, we eagerly adopted the Claude model. It has been a game-changer, particularly in terms of speed and context window size (at the time of writing, 200,000 tokens, which equates to about 150,000 words).
We evaluated a number of models for our use case of providing investment suggestions. The first criterion was predictable results on complex language tasks, which is especially important given the financial industry we operate in. In addition, we work with vast amounts of structured financial markets data, so we needed a model that can process and understand structured data. Furthermore, our product is used by consumer applications where a fast user experience is paramount, making streaming capabilities critical. Given the depth of our requests, some can take over 30 seconds to fully complete; having results returned in chunks as soon as they have been computed matters significantly for the user experiences we wanted.

Prompting basics

As a large language model (LLM), Claude is fantastic at processing complex language-based tasks quickly. However, one surprise we encountered was how it processes structured data. To optimize responses, XML tags can be used to identify structured data. At first, moving back to XML felt odd, considering JSON is the standard for modern applications. However, using XML in your prompt context is vital in helping the model understand the structure of your data.
To best illustrate this, let’s start with a simple prompt and tailor it towards a consistent, structured format.
Prompt:
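A prompt along these lines (illustrative; the company-list task and the Human/Assistant turn format stand in for our real prompts):

```
Human: Give me a list of 3 big tech companies along with their CEOs.

Assistant:
```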
Response:
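An illustrative response of the kind Claude tends to produce for a prompt like this:

```
 Sure! Here is a list of 3 big tech companies along with their CEOs:

1. Apple - Tim Cook
2. Microsoft - Satya Nadella
3. Amazon - Andy Jassy
```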
Great! We’ve got our list of 3 companies. However, there are several issues if we want to automate parsing and use these results to power a user interface.
Firstly, Claude likes to clarify what its task is. Secondly, there's a subtler issue that's hard to notice: the response started with a single space character because our prompt didn't have one after the Assistant section. We can fix both issues in one go by providing a preamble directly in our prompt:
Prompt:
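For example, ending the prompt with a preamble in the Assistant turn (illustrative):

```
Human: Give me a list of 3 big tech companies along with their CEOs.
Use a numbered list, and put each CEO on its own line prefixed with "CEO:".

Assistant: Here is a numbered list of 3 big tech companies and their CEOs:
```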
Response:
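With the preamble in place, the continuation comes back in the requested shape (illustrative):

```
1. Apple
CEO: Tim Cook

2. Microsoft
CEO: Satya Nadella

3. Amazon
CEO: Andy Jassy
```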
As you can see, our response is now structured with a numbered list and a CEO: prefix. If we wanted to parse this into a structured format, we could probably do some string parsing using regular expressions. However, this may be flaky given that responses aren't always consistent, and further tweaks to the prompt might inadvertently yield more variations in response structure, breaking our automation. We're going to need a solid data structure, not just loosely formatted text.

Prompting for structured data as XML

Instead, we can ask for a response in XML format. For example:
Prompt:
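A prompt along these lines (illustrative):

```
Human: Give me a list of 3 big tech companies along with their CEOs.
Respond in XML format.

Assistant: Here is the list in XML format:
```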
Response:
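One possible response (illustrative; the tag names are whatever Claude chose on that run):

```
<companies>
  <company>
    <name>Apple</name>
    <ceo>Tim Cook</ceo>
  </company>
  <company>
    <name>Microsoft</name>
    <ceo>Satya Nadella</ceo>
  </company>
  <company>
    <name>Amazon</name>
    <ceo>Andy Jassy</ceo>
  </company>
</companies>
```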
Now, we’re starting to get something more useful to automate! However, something that's not obvious here is that every time you run this, you'll get different names for some of the XML tags (assuming your temperature is high enough, since a higher temperature leads to more creative samples and more variation in phrasing). For example, I re-ran the same prompt and got tags named <response>, <companies> or <Companies>, so that's still not reliable enough for automation. The final step is to describe the XML structure you want.
Prompt:
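For example, spelling out the exact structure in the prompt (illustrative):

```
Human: Give me a list of 3 big tech companies along with their CEOs.
Respond in XML using exactly this structure:
<list>
  <comp>
    <name></name>
    <ceo></ceo>
  </comp>
</list>

Assistant:
```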
Response:
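The response now follows the requested structure on every run (illustrative):

```
<list>
  <comp>
    <name>Apple</name>
    <ceo>Tim Cook</ceo>
  </comp>
  <comp>
    <name>Microsoft</name>
    <ceo>Satya Nadella</ceo>
  </comp>
  <comp>
    <name>Amazon</name>
    <ceo>Andy Jassy</ceo>
  </comp>
</list>
```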
Perfect! Even with a high temperature, you can run this as many times as you want, and you'll always get a consistent XML structure you can parse using standard libraries in the language of your choice.

Tips and tricks

There's an obvious downside to using XML: it's verbose. Not only will you consume more tokens (increasing costs), but responses also take longer. However, for the high reliability requirements of parsing structured responses, this is a worthy trade-off. While I'm no expert on the internal workings of LLMs, it seems logical that the verbose nature of XML plays to Claude's strengths.
If you ask for a JSON-formatted response, you'll notice that Claude will often (but not always) surround the response with markdown-style triple backticks, and you'll occasionally get invalid JSON responses too.
To reduce costs, it's worth pointing out that you can use abbreviated XML tag names. In the example above, I asked for each company to be wrapped in <comp> tags. It's not huge, but it means 2 or 3 fewer tokens used per company for the same result. With this in place, each company in this XML is only about 2 or 3 tokens bigger than its JSON equivalent.
Another benefit of using XML is that it's much easier to parse a partially streamed response. Say a user asks for a list of 100 companies: even with Claude’s speedy responses, you'd still be looking at a loading spinner for over 60 seconds.
Instead, by leveraging Bedrock's InvokeModelWithResponseStream, you can do a simple string search for completed </comp> tags as you build up the streamed response, then parse just that <comp> block into a data structure and send it on to the client. This is admittedly a crude option, so you may also want to look into your runtime ecosystem to see if there are any partial XML processing libraries available.

Example: Parsing streamed XML in real time using Go

At Telescope, we've chosen Go as our language for building our API products. This means we have access to Go's excellent standard library which
supports real-time decoding of streamed XML. Here's a complete example of invoking a streaming chat completion with Amazon Bedrock and parsing the
stream using Go's built-in xml package.
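A minimal sketch of the parsing side follows. Since a real invocation needs AWS credentials, the chunked stream is simulated with hard-coded fragments; in production, each chunk would come from the event stream returned by bedrockruntime's InvokeModelWithResponseStream. The Company type and the extractCompanies helper are illustrative names, not part of any SDK.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Company mirrors the <comp> blocks we asked Claude to emit.
type Company struct {
	Name string `xml:"name"`
	CEO  string `xml:"ceo"`
}

// extractCompanies scans the accumulated buffer for completed
// <comp>...</comp> blocks, unmarshals each one, and returns the parsed
// companies plus the remaining (still incomplete) tail of the buffer.
func extractCompanies(buf string) ([]Company, string) {
	var out []Company
	for {
		end := strings.Index(buf, "</comp>")
		if end == -1 {
			return out, buf // no complete block yet; keep accumulating
		}
		start := strings.Index(buf[:end], "<comp>")
		if start == -1 {
			// Malformed fragment: skip past the close tag and continue.
			buf = buf[end+len("</comp>"):]
			continue
		}
		block := buf[start : end+len("</comp>")]
		var c Company
		if err := xml.Unmarshal([]byte(block), &c); err == nil {
			out = append(out, c)
		}
		buf = buf[end+len("</comp>"):]
	}
}

func main() {
	// In production these chunks arrive one at a time from Bedrock's
	// InvokeModelWithResponseStream event stream; simulated here.
	chunks := []string{
		"<list><comp><name>Ap", "ple</name><ceo>Tim Cook</ceo></comp><comp>",
		"<name>Microsoft</name><ceo>Satya Nadella</ceo></co", "mp></list>",
	}
	var buf string
	for _, chunk := range chunks {
		buf += chunk
		companies, rest := extractCompanies(buf)
		buf = rest
		for _, c := range companies {
			// Send each company to the client as soon as it's complete.
			fmt.Printf("%s: %s\n", c.Name, c.CEO)
		}
	}
}
```

Note that each company is emitted the moment its closing tag arrives, rather than waiting for the full response, which is exactly what keeps the UI responsive on long lists.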

Conclusion

Our journey with Claude on Amazon Bedrock revealed substantial benefits in designing prompts that maximize the model's effectiveness, in particular using XML for structured data to achieve greater accuracy. Since fast user experiences are essential, we also leveraged Bedrock's streaming capabilities and showed how you can parse partially streamed structured data and present it to the user as soon as it has been processed.
To learn more about Telescope, visit our website.
To get started with generative AI on AWS, please see resources:

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
