Amazon Bedrock powered agents that can browse the internet

... with 5 lines of code!

Shreyas Subramanian
Amazon Employee
Published Jan 23, 2025

Getting Started

Browser-use is a great tool for letting your AI agents connect to the web. Connecting browser-use with Amazon Bedrock lets you use the models available through Bedrock for agent use cases that need access to the web. The browser-use project lives on GitHub at https://github.com/browser-use/browser-use. To get this working with Amazon Bedrock:
First, install browser-use and its dependencies.
In a terminal, run:
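pip install browser-use
playwright install chromium

(These are the standard browser-use install steps at the time of writing; browser-use drives Chromium through Playwright, which is why the second command downloads the browser binaries. Pin versions as appropriate for your environment.)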
Next, add the basic imports and setup that are required.
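A minimal version, assuming the langchain_aws Bedrock integration (browser-use accepts a LangChain chat model as its llm) and the us-east-1 region, looks something like this:

from browser_use import Agent
from langchain_aws import ChatBedrockConverse

# Claude 3.5 Sonnet v2 through the US cross-region inference profile
# (the exact profile ID and region may differ in your account)
llm = ChatBedrockConverse(
    model="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    region_name="us-east-1",
)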
Notice that here we are using the cross-region inference profile for Claude 3.5 Sonnet v2.
Next, define your task and run the agent. Here, we will try to find cheap flights from IAD to SEA on Google Flights. The instruction is defined in the task variable.
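Continuing from the setup above (so Agent and llm are already defined), the whole thing comes down to a handful of lines. In a notebook you can simply await agent.run(); in a script you need an asyncio entry point, as sketched here:

import asyncio

task = """Visit flights.google.com and find cheap flights from IAD to SEA from Feb 7th to 26th."""

async def main():
    # Hand the task and the Bedrock-backed model to the browser-use agent
    agent = Agent(task=task, llm=llm)
    await agent.run()

asyncio.run(main())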
If you are not running this in a notebook, the agent will open your default browser and you can watch it navigate the web.
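If you would rather not have a visible browser window, browser-use has exposed Browser/BrowserConfig objects for this; the exact names and import paths have moved around between releases, so treat the following as a sketch rather than a guaranteed API:

from browser_use import Agent, Browser, BrowserConfig

# Run Chromium without a visible window; check your installed browser-use
# version for the currently supported configuration options.
browser = Browser(config=BrowserConfig(headless=True))
agent = Agent(task=task, llm=llm, browser=browser)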

High-Level Observations

  1. The agent interacts with a headless Chrome browser to perform the flight search
  2. It uses a three-part logging structure for each step:
  • Evaluation of the current state
  • Memory/context retention
  • Next goal setting
  3. The agent uses indexed elements (13, 14, etc.) to interact with the webpage
  4. Each action is numbered (e.g., Action 1/5), showing planned sequences
  5. The controller provides feedback about UI changes ("Something new appeared")
  6. The agent adapts its strategy based on UI feedback
  7. Each step is clearly demarcated with "📍 Step X"

LLM calls

The agent uses Claude for several critical functions:
  1. State Evaluation: After each action, it evaluates the current state of the webpage
  2. Memory Management: Maintains context about what's been completed and what's left
  3. Goal Setting: Determines the next logical step based on current state
  4. UI Adaptation: Recognizes and handles dynamic UI elements like autocomplete dropdowns
  5. Result Formatting: Structures the final output in a human-readable format
So every step needs four LLM calls: an evaluation of the observation [Eval], what to store in memory [Memory], setting the goal for the next step [Next goal], and the action. Here, the action and its input are combined in one output [Action].
As an example, take a look at this step:
INFO [agent] 👍 Eval: Success - IAD is entered and we're on the flight search page
INFO [agent] 🧠 Memory: IAD is set as departure, need to enter SEA and dates Feb 7-26
INFO [agent] 🎯 Next goal: Enter Seattle (SEA) as the destination
INFO [agent] 🛠️ Action 1/1: {"input_text":{"index":14,"text":"SEA"}}
The controller then takes over and performs the actual action. Simplifying the action space to indexed elements and a standard format is the reason this works so well!
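To make that concrete, here is a toy illustration (not browser-use's actual implementation) of what such a dispatcher could look like: the model only ever emits a small JSON vocabulary plus an element index, and the controller resolves the index against a numbered map of interactive elements on the page.

from typing import Any, Dict

# Hypothetical map from the indexes the model sees (13, 14, ...) to live page
# handles, e.g. Playwright locators. browser-use builds something like this
# from the DOM on every step.
element_map: Dict[int, Any] = {}

def dispatch(action: Dict[str, Dict[str, Any]]) -> str:
    """Execute one model action such as {"input_text": {"index": 14, "text": "SEA"}}."""
    name, params = next(iter(action.items()))
    element = element_map[params["index"]]
    if name == "input_text":
        element.fill(params["text"])   # type text into the indexed element
        return f"Typed {params['text']!r} into element {params['index']}"
    if name == "click_element":
        element.click()                # click the indexed element
        return f"Clicked element {params['index']}"
    raise ValueError(f"Unsupported action: {name}")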
Let's take a deeper look into the logs of a simple task:

task = """Visit flights.google.com and find cheap flights from IAD to SEA from Feb 7th to 26th."""

Looks simple enough? Let's see what the agent does:
  • Initial Task Setup:
The agent begins with clear task understanding and uses a simple action to navigate. The controller confirms successful navigation.
  • First Form Fill Attempt:
The agent attempts to fill all fields at once but encounters an interruption after typing "IAD". The controller detects new UI elements (likely the airport autocomplete dropdown).
  • Handling Airport Selection:
Agent recognizes the autocomplete popup and adjusts its strategy to handle it.
  • Destination Input:
Agent confirms IAD is set and moves to inputting SEA, again encountering an autocomplete popup.
  • Handling SEA Selection:
Agent recognizes the need to select SEA from the dropdown and performs the selection.
  • Date Input and Search:
Agent proceeds to input dates after confirming airports are set.
  • Final Search and Results:
Agent successfully executes search, observes prices in calendar view, and provides a formatted summary with total cost.

Conclusion

This detailed log analysis reveals how Claude on Bedrock, driven through the browser-use tool, can act as an autonomous web agent that successfully navigates the complexities of a modern website. The agent:
  1. Handles dynamic UI elements like autocomplete dropdowns and popups
  2. Maintains clear understanding of progress through the task
  3. Adapts when initial strategies don't work (throttling errors are not shown here, but the agent recovered from those as well)
  4. Uses indexed elements to interact with the page (which is the main reason this even works!)
  5. Successfully finds and reports flight prices. Validator checks, which are additional LLM calls, help in progressing toward and concluding the task
In the next blog, we will dive deeper and observe how the agent navigates the web to complete slightly more complicated tasks and analyze failure points.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
