Find Amazon Bedrock models for immediate on-demand invocation

Ever found yourself picking an AI model on Bedrock, only to realize you can't invoke it? You're not alone. In this post, I'm going to show you a few lines of code that filter exactly the models you're looking for.

Dennis Traub
Amazon Employee
Published Mar 1, 2024
Last Modified Mar 20, 2024
One of the great features of Amazon Bedrock is the growing number of foundation models accessible via a single API. You've got the flexibility to use them straight out of the box or fine-tune them to meet your unique needs.
However, not all of the models can be used in every context. Some generate text, others generate images or embeddings. Some are available for on-demand generation, while others are specialized for provisioned throughput. And some can be fine-tuned, while others allow continued pre-training.
With this wide array of available models, I recently ran into an error while experimenting with a new one that, as it turned out, I couldn't even invoke.
This took me some time to troubleshoot, but I eventually found the issue. Here's a quick write-up, just in case it ever happens to you as well.

Exploring the available models

Let's start by having a look at all available models by calling Bedrock's ListFoundationModels API:
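A minimal sketch of such a script, assuming Python with boto3 and valid AWS credentials. The helper name `list_model_ids` and the region are illustrative choices, not something prescribed by Bedrock:

```python
def list_model_ids(client):
    """Return the ids of all foundation models visible to this client."""
    response = client.list_foundation_models()
    return [summary["modelId"] for summary in response["modelSummaries"]]


def main():
    # Imported here so the helper above can be reused without AWS access.
    import boto3

    # Assumption: us-east-1; use any region where Bedrock is enabled for you.
    bedrock = boto3.client("bedrock", region_name="us-east-1")
    model_ids = list_model_ids(bedrock)
    for model_id in model_ids:
        print(model_id)
    print(f"{len(model_ids)} models found")
```

Calling `main()` prints one model id per line, followed by the total. The count depends on your region and on when you run it.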
At the time of writing, this script returned 44 models. And if you look at the individual model ids, you will find some interesting details. Let's have a look at these three models by Anthropic, indicated by their ids:
If you know what to look for, the ids are easy to read: The first one is Claude v2, the second and third are Claude v2.1, with one difference though: The latter has a large context window of 200k tokens.
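To make that reading explicit, here is a small sketch that splits an id into its parts. It assumes ids follow the informal pattern `<provider>.<model>[:<revision>[:<context window>]]`, which is my interpretation and not an official grammar. Of the three example ids, only `anthropic.claude-v2:1:200k` is quoted in this post; the other two are inferred from the description above. `split_model_id` and `ModelIdParts` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ModelIdParts(NamedTuple):
    provider: str
    model: str
    revision: Optional[str]
    context_window: Optional[str]


def split_model_id(model_id: str) -> ModelIdParts:
    """Split an id such as 'anthropic.claude-v2:1:200k' into readable parts.

    Assumption: '<provider>.<model>[:<revision>[:<context window>]]'.
    """
    provider, _, rest = model_id.partition(".")
    model, *suffixes = rest.split(":")
    revision = suffixes[0] if len(suffixes) > 0 else None
    context_window = suffixes[1] if len(suffixes) > 1 else None
    return ModelIdParts(provider, model, revision, context_window)


# Claude v2, Claude v2.1, and Claude v2.1 with a 200k context window.
for model_id in (
    "anthropic.claude-v2",
    "anthropic.claude-v2:1",
    "anthropic.claude-v2:1:200k",
):
    print(split_model_id(model_id))
```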

Invoking the models

Now let's see if there are any differences by invoking them, one after the other, using a very simple prompt:
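A sketch of what such a script could look like, assuming boto3's `bedrock-runtime` client and Claude's text-completions request format. The prompt text, the `invoke` helper, and the token limit are placeholders of mine, and the first two ids are inferred from the descriptions above:

```python
import json

# Hypothetical prompt, used for illustration only.
PROMPT = "Hello, Claude!"

MODEL_IDS = [
    "anthropic.claude-v2",
    "anthropic.claude-v2:1",
    "anthropic.claude-v2:1:200k",
]


def invoke(client, model_id, prompt):
    """Send one prompt via InvokeModel and return the generated text."""
    body = json.dumps({
        # Claude's text-completions format expects Human/Assistant turns.
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": 200,
    })
    response = client.invoke_model(modelId=model_id, body=body)
    return json.loads(response["body"].read())["completion"]


def main():
    import boto3

    # Assumption: us-east-1; use a region where you have model access.
    runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
    for model_id in MODEL_IDS:
        # No error handling on purpose: a failing id surfaces as an exception.
        print(f"{model_id}: {invoke(runtime, model_id, PROMPT)}")
```

Calling `main()` invokes the three models in order and stops at the first one that raises.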
When running the script, you'll see that the first two work as expected. In fact, given this very simple prompt, both versions return the exact same result.
The third one, however, throws an error due to an apparently invalid model id:
Can you see that very long validation pattern in the last line above? I spent quite some time, so you don't have to, and I finally found the culprit: The last part of the id, the one defining the context window size (:200k), is not allowed.

Troubleshooting the issue

Now, why does it accept some ids but not others... 🤔
Let's see if we can find anything useful by getting some more information about that specific model:
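One way to do this, sketched here under the assumption that we use boto3's `get_foundation_model` call on the `bedrock` client. The helper name `get_model_details` is mine:

```python
import json


def get_model_details(client, model_id):
    """Return the 'modelDetails' section of a GetFoundationModel response."""
    response = client.get_foundation_model(modelIdentifier=model_id)
    return response["modelDetails"]


def main():
    import boto3

    # Assumption: us-east-1; use any region where Bedrock is enabled for you.
    bedrock = boto3.client("bedrock", region_name="us-east-1")
    details = get_model_details(bedrock, "anthropic.claude-v2:1:200k")
    # default=str guards against any value json cannot serialize directly.
    print(json.dumps(details, indent=4, default=str))
```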
This returns a lot of information about the model, including a small but critical detail for our use case: the inferenceTypesSupported field.
Turns out the only supported inference type for the model is PROVISIONED, i.e. it is only available with provisioned throughput and can't be used for on-demand invocations.
If we run the same script with the other two models, we'd get "inferenceTypesSupported": ["ON_DEMAND"] instead, indicating that they do accept on-demand invocations, as we already saw in our example above.
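That check is easy to turn into a small guard before invoking a model. The sketch below works on the model details dictionary; the two sample dictionaries are hand-written stand-ins trimmed to the relevant field, not real API output, and `supports_on_demand` is a hypothetical helper name:

```python
def supports_on_demand(model_details):
    """True if the details list ON_DEMAND as a supported inference type."""
    return "ON_DEMAND" in model_details.get("inferenceTypesSupported", [])


# Hand-written stand-ins, reduced to the one field that matters here.
provisioned_only = {
    "modelId": "anthropic.claude-v2:1:200k",
    "inferenceTypesSupported": ["PROVISIONED"],
}
on_demand = {
    "modelId": "anthropic.claude-v2:1",
    "inferenceTypesSupported": ["ON_DEMAND"],
}

print(supports_on_demand(provisioned_only))  # False
print(supports_on_demand(on_demand))  # True
```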

Filtering models for on-demand inference

Now, to avoid choosing models that don't fit the use case and potentially running into an error down the line, how can we get a list of only those models that support on-demand inference?
This is actually quite simple: we add a filter to the ListFoundationModels call.
Let's modify our original script, adding byInferenceType="ON_DEMAND" to the call, and see what happens:
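A sketch of the modified script, again assuming boto3. The only functional change compared to an unfiltered listing is the `byInferenceType` argument; the helper name is illustrative:

```python
def list_on_demand_model_ids(client):
    """Return the ids of all models that support on-demand inference."""
    response = client.list_foundation_models(byInferenceType="ON_DEMAND")
    return [summary["modelId"] for summary in response["modelSummaries"]]


def main():
    import boto3

    # Assumption: us-east-1; use any region where Bedrock is enabled for you.
    bedrock = boto3.client("bedrock", region_name="us-east-1")
    model_ids = list_on_demand_model_ids(bedrock)
    for model_id in model_ids:
        print(model_id)
    print(f"{len(model_ids)} models support on-demand inference")
```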
When running this script, we get a refined list (27 models at the time I'm writing this) that excludes models without on-demand support, such as anthropic.claude-v2:1:200k.
So, to get a list of models tailored to your specific use case, use the respective request parameters to filter the response of the ListFoundationModels API.
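The same call accepts further filters (`byProvider`, `byOutputModality`, `byCustomizationType`), which can be combined. Below is a sketch of a small wrapper that only passes the filters you actually set. The wrapper and its argument names are hypothetical; the `by...` keywords are the request parameters of ListFoundationModels as exposed by boto3:

```python
def build_filters(provider=None, output_modality=None,
                  customization_type=None, inference_type=None):
    """Map friendly argument names to ListFoundationModels parameters."""
    candidates = {
        "byProvider": provider,                      # e.g. "Anthropic" (spelling assumed)
        "byOutputModality": output_modality,         # "TEXT", "IMAGE", "EMBEDDING"
        "byCustomizationType": customization_type,   # e.g. "FINE_TUNING"
        "byInferenceType": inference_type,           # "ON_DEMAND", "PROVISIONED"
    }
    # Unset filters are dropped so they are not sent to the API at all.
    return {name: value for name, value in candidates.items() if value is not None}


def list_models(client, **filters):
    """Return the ids of all models matching the given filters."""
    response = client.list_foundation_models(**build_filters(**filters))
    return [summary["modelId"] for summary in response["modelSummaries"]]


def main():
    import boto3

    # Assumption: us-east-1; use any region where Bedrock is enabled for you.
    bedrock = boto3.client("bedrock", region_name="us-east-1")
    # Text-generating models that can be invoked on demand.
    for model_id in list_models(bedrock, output_modality="TEXT",
                                inference_type="ON_DEMAND"):
        print(model_id)
```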


Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.