Estimating AWS workload cost with GenAI
Reasoning with incomplete information
Randy D
Amazon Employee
Published Nov 11, 2024
Estimating the cost of running a workload on AWS is one of the most common things I do as a Solutions Architect. Once I help a customer scope out how to design a workload, "how much will it cost" is the natural next question.
Estimating cost often requires judgment and prior experience. Often the inputs are not completely defined up front; we may not know for sure what the actual production volume will be, or exactly which instance type to pick for a database. Frequently we need to evaluate several different options for building a workload, each of which has a different cost structure. And finally, we need to translate the workload inputs into the pricing levers that AWS services use. For example, if a website has 1,000 visitors per day, we need to translate that into the number of calls to the Lambda function that implements the site's serving logic.
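As a toy illustration of that translation, here's what turning daily traffic into a Lambda pricing lever might look like. All the figures besides the 1,000 visitors are assumed for the example:

```python
# Toy translation of site traffic into a Lambda pricing lever.
# Only visitors_per_day comes from the example above; the rest are assumptions.
visitors_per_day = 1_000
requests_per_visit = 5      # assumed page/API calls per visit
days_per_month = 30

invocations_per_month = visitors_per_day * requests_per_visit * days_per_month
print(invocations_per_month)  # 150000
```

The same shape of calculation applies to other levers: storage grows with retained data per visitor, data transfer with response sizes, and so on.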
There are a number of online calculators, including the AWS Pricing Calculator, but I prefer to build a spreadsheet. With a spreadsheet, or a similar tool like a Jupyter notebook, I can easily run and compare different scenarios. I can clearly identify the key inputs and assumptions, and show how they affect all the AWS services used. I have a standard spreadsheet template with one tab for each service, plus a summary tab and a chart showing cost grouped by functional area.
To make creating the spreadsheet easier and more consistent, I built a GenAI workflow that uses an LLM to do much of the work. The input to the workflow is a description of the workload architecture, with as much detail as we have, including any information about the projected usage. The workflow then proceeds as shown in the flowchart below, with most steps representing one call to an LLM.
The first step is to identify the specific AWS services used. To do this, I make a call to the Claude 3 Sonnet model in Amazon Bedrock, using this prompt:
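A minimal sketch of what that call might look like with the Bedrock Converse API in boto3. The prompt wording and the line-per-service output convention here are my own illustrative assumptions, not the actual prompt used:

```python
def identify_services(description: str, client=None) -> list[str]:
    """Ask the model which AWS services a workload uses (illustrative prompt)."""
    if client is None:
        import boto3  # deferred so the function can be exercised with a stub client
        client = boto3.client("bedrock-runtime")
    prompt = (
        "Given the following workload description, list the AWS services it "
        "uses, one service per line, with no other text.\n\n" + description
    )
    response = client.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    text = response["output"]["message"]["content"][0]["text"]
    return [line.strip() for line in text.splitlines() if line.strip()]
```

The rest of the workflow then loops over the returned service names, handling one service at a time.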
The next step is finding the public pricing page for each AWS service. We do this with a simple call to a web search tool. Once we identify the URL, we just programmatically grab the text from the pricing page.
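A minimal sketch of that grab using only the standard library; the web search step is omitted here, and we assume the pricing-page URL is already known:

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class _TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping script/style contents."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def page_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)

def fetch_pricing_text(url: str) -> str:
    with urlopen(url) as resp:  # network call
        return page_text(resp.read().decode("utf-8", "replace"))
```

In practice a heavier-weight scraper may be needed for pages that render pricing tables with JavaScript; this sketch only handles static HTML.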
The next steps are performed in a conversation loop with Claude 3 Sonnet, where we keep extending the conversation by adding new messages using the Converse API. That preserves the context of the conversation flow about a specific service.
First, we ask the LLM to identify the inputs needed to estimate the cost of a given AWS service, using the information from its public pricing page.
Next, we add a new message to the conversation, asking Claude to pull out any information from the workload description that we can use to fill in those pricing inputs.
Now we ask Claude to see if it can estimate any inputs for which we don't have direct data, by using other information in the workload description. This proved to be a key step in having the model do the translation from the workload description to specific pricing levers.
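Sketched in code, the conversation loop might look like the following. The prompts are my paraphrases of the steps above, not the actual wording, and the growing `messages` list is what preserves context across turns in the Converse API:

```python
def ask(client, model_id, messages, prompt):
    """Append a user turn, call the Converse API, record the reply, return its text."""
    messages.append({"role": "user", "content": [{"text": prompt}]})
    response = client.converse(modelId=model_id, messages=messages)
    reply = response["output"]["message"]
    messages.append(reply)  # keep the assistant turn so later questions have context
    return reply["content"][0]["text"]

def estimate_service(client, model_id, service, pricing_text, workload):
    messages = []  # one growing conversation per service
    inputs = ask(client, model_id, messages,
                 f"Here is the pricing page for {service}:\n{pricing_text}\n"
                 "What inputs do we need to estimate its cost?")
    known = ask(client, model_id, messages,
                f"Here is the workload description:\n{workload}\n"
                "Fill in any of those pricing inputs from it.")
    estimated = ask(client, model_id, messages,
                    "For inputs with no direct data, estimate a reasonable value "
                    "from other information in the description, and flag it as estimated.")
    return inputs, known, estimated
```

Because each service gets its own conversation, a confused answer about one service can't contaminate the estimates for the others.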
The final step in this conversation has the LLM start creating the output spreadsheet. We don't really want to ask the LLM to do math; Excel can do that for us. Rather, we want it to list very discrete pieces of information, like the unit cost and quantity. We also want to indicate the level of confidence we have in the estimate, so we know where we need to dig deeper to get more precision in our estimate.
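One way to represent those discrete pieces in code; this data model is my own sketch, not the article's, and the EFS figures in the usage note below are illustrative. Each line item carries a unit cost, a quantity, and a confidence label, and the total is left to an Excel formula rather than computed by the LLM:

```python
from dataclasses import dataclass

@dataclass
class CostLineItem:
    service: str
    description: str
    unit_cost: float   # price per unit, taken from the pricing page
    quantity: float    # estimated monthly usage
    unit: str          # e.g. "GB-month", "requests"
    confidence: str    # "high", "medium", or "low"

def to_row(item: CostLineItem, excel_row: int) -> list:
    """One spreadsheet row; the total is an Excel formula, not a precomputed number."""
    return [item.service, item.description, item.unit_cost, item.quantity,
            item.unit, f"=C{excel_row}*D{excel_row}", item.confidence]
```

For example, `CostLineItem("Amazon EFS", "Standard storage", 0.30, 50, "GB-month", "medium")` would become a row whose total Excel computes as unit cost times quantity.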
At this point we have a list of cost line items. We are ready to do an initial review; we ask Claude to score the output along four dimensions.
At this point we have data that we can store in a spreadsheet using a Python library. It is not a finished product, but ideally it has done about 80% of the work. Then we can look at the areas where the estimate is low confidence, or just seems off somehow, and dig a little deeper.
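The article doesn't name the library; as one common choice, openpyxl can write the rows, formulas included, into a workbook. The column layout here is an assumption for illustration:

```python
from openpyxl import Workbook

HEADERS = ["Service", "Description", "Unit cost", "Quantity", "Unit", "Total", "Confidence"]

def write_estimate(rows, path):
    """Write line-item rows matching HEADERS; 'Total' cells may be Excel formulas."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Estimate"
    ws.append(HEADERS)
    for row in rows:
        ws.append(row)  # openpyxl stores strings starting with "=" as formulas
    wb.save(path)
```

A fuller version would follow the template described earlier: one tab per service, a summary tab, and a chart grouped by functional area.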
Here's an example of the line items for using EFS in one example workload.
And here's the example score:
You'll note that I used very granular steps in the workflow. Rather than asking Claude to estimate the cost all at once, I tackle one service at a time. For each service, I go step by step, identifying the pricing inputs, reviewing for missing data, and so on.
Telling Claude to specifically try to estimate missing data is also a key step. There's often a lot of ambiguity in a workload description, and we want to make some logical assumptions to make progress.
Finally, the output clearly identifies where we don't have high confidence in the estimate. That's important for the sake of transparency: we want the ultimate consumer of this spreadsheet to be aware that it's just an estimate, and some areas need more refinement.
For the examples I tried, compared to estimates I had built manually, I found that this approach got me into the right ballpark. It didn't match what I did exactly; it sometimes made different assumptions, and more helpfully, it identified some things that I overlooked the first time.
More generally, I think this is an interesting example of a manually-intensive workflow that GenAI can help automate, even in the face of considerable ambiguity and complexity.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.