
Learn Web Scraping with AWS Bedrock Agents
A beginner-friendly guide to setting up and deploying AWS Bedrock Agents for web scraping with Lambda, Streamlit, and Anthropic Claude.
- (PART 1) AWS Bedrock Agents Introduction
- (PART 2) Hands-on project to implement a simple Bedrock agent that scrapes a URL provided by the user
1. Understand Tasks
- Takes complex requests from users
- Breaks them down into smaller, manageable steps
- Figures out what needs to be done in what order
- Can call APIs to get things done
- Access your company’s data when needed
- Execute multiple steps automatically
1. Instructions
- Like a manual that tells the agent what it can do
- Sets boundaries for the agent’s actions
- Defines its specific purpose
2. Action Groups
- The specific things an agent can do
- Usually connected to Lambda functions
- Example: searching a database, creating a ticket, or sending an email
3. Knowledge Bases
- Reference information the agent can use
- Company documents, FAQs, policies
- Helps the agent give accurate responses
- API Schema (like webscrape-schema.json which will be used in the project below)
- Lambda function for business logic
- Action group configuration
- Parameters and response definitions
- RAG to retrieve accurate product information
- Agent capabilities to orchestrate the complete customer interaction, including checking inventory, processing returns, or updating customer records
1. I used the console route to create one action group that scrapes a web URL of my choice
2. I used CloudFormation to create the Streamlit interface on EC2, then modified the original interface elements with the Vim editor (via EC2 Instance Connect)
- Use Anthropic Claude 3.5 Sonnet as the core model for the agent.
- Add an action group that links the agent to a Lambda function, enabling web scraping.
2. Lambda Function Deployment
- Write a Python-based function to fetch and parse webpage content.
- Package third-party dependencies like `BeautifulSoup` into a Lambda layer (`urllib.request` ships with Python's standard library, so it needs no packaging).
- Deploy the function and integrate it into the Bedrock agent.
3. Streamlit App on EC2
- Deploy a Streamlit app using a CloudFormation template.
- The app enables users to send queries, view scraped data, and explore the agent’s capabilities interactively.
Create and deploy a Lambda function to handle web scraping. The function will:
- Accept a URL input.
- Scrape the webpage using `urllib.request` and `BeautifulSoup`.
- Return the cleaned data in JSON format.
Add a Lambda layer for libraries not natively supported, like `BeautifulSoup`.
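To make the steps above concrete, here is a minimal sketch of such a function. It is not the exact code from the project. It fetches the page with `urllib.request`, and to stay runnable without a layer it strips markup with the standard library's `html.parser` as a stand-in for `BeautifulSoup`. The `inputURL` property name, the helper names, and the extra `fetch` parameter (added only so the handler can be tried without network access) are my assumptions. The event and response shapes follow the Bedrock agent Lambda format for API-schema action groups as I understand it, so double-check them against the AWS docs.

```python
import json
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the page's visible text as one whitespace-normalized string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def fetch_html(url, timeout=10):
    """Download a page; the User-Agent value is an arbitrary placeholder."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def get_input_url(event):
    """Read the hypothetical `inputURL` property from the agent's request body."""
    properties = event["requestBody"]["content"]["application/json"]["properties"]
    return next(p["value"] for p in properties if p["name"] == "inputURL")


def lambda_handler(event, context, fetch=fetch_html):
    """Scrape the requested URL and wrap the result in the agent response envelope."""
    url = get_input_url(event)
    body = {"url": url, "text": extract_text(fetch(url))}
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": 200,
            # The agent expects the body as a JSON-formatted string.
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }
```

Lambda itself only passes `event` and `context`, so `fetch` falls back to the real downloader in production. For large pages you would probably want to truncate `text` before returning it, since Lambda responses to agents have a size limit.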
Define the agent’s behavior using an OpenAPI schema in JSON format and link it to the Lambda function through an action group.
- Use the Bedrock console to test the web scraping functionality
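For a feel of what such a schema can look like, here is a rough sketch built as a Python dict and dumped to JSON. It is an illustration only, not the actual `webscrape-schema.json` from the project: the `/search` path, the `scrapeUrl` operation ID, and the `inputURL`, `url`, and `text` field names are all placeholders I made up. The descriptions matter in practice, because the agent reads them to decide when to call the action.

```python
import json

# Hypothetical OpenAPI 3.0 schema for a single "scrape this URL" action.
webscrape_schema = {
    "openapi": "3.0.0",
    "info": {
        "title": "Webscrape API",
        "version": "1.0.0",
        "description": "Scrapes the text content of a URL supplied by the user.",
    },
    "paths": {
        "/search": {
            "post": {
                "operationId": "scrapeUrl",
                "description": "Fetch a web page and return its cleaned text.",
                "requestBody": {
                    "required": True,
                    "content": {
                        "application/json": {
                            "schema": {
                                "type": "object",
                                "properties": {
                                    "inputURL": {
                                        "type": "string",
                                        "description": "The URL to scrape.",
                                    }
                                },
                                "required": ["inputURL"],
                            }
                        }
                    },
                },
                "responses": {
                    "200": {
                        "description": "Scraped content.",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object",
                                    "properties": {
                                        "url": {"type": "string"},
                                        "text": {"type": "string"},
                                    },
                                }
                            }
                        },
                    }
                },
            }
        }
    },
}

# Serialize to the JSON text you would paste or upload for the action group.
schema_json = json.dumps(webscrape_schema, indent=2)
```

Printing `schema_json` gives text you could adapt in the console's schema editor; whatever property names you choose here are the ones your Lambda function has to read from the incoming event.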

- Deploy a Streamlit app for a user-friendly interface.
- Update the app with agent credentials to enable query handling.
Test the agent’s functionality by providing URLs and viewing the scraped content in the Streamlit interface.
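If you are curious what a query to the agent can look like in code, here is a small sketch using the `invoke_agent` call of boto3's `bedrock-agent-runtime` client. This is my own illustration and not necessarily how the CloudFormation-deployed Streamlit app talks to the agent. The helper names are hypothetical, and the agent ID and alias ID are values you would copy from your own agent.

```python
import uuid


def collect_completion(response):
    """Join the text chunks streamed back in the `completion` event stream."""
    parts = []
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:  # other event types (for example traces) carry no text
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)


def ask_agent(client, agent_id, agent_alias_id, prompt, session_id=None):
    """Send one prompt to the agent and return its full text reply."""
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        # Reuse the same session ID across calls to keep conversation context.
        sessionId=session_id or str(uuid.uuid4()),
        inputText=prompt,
    )
    return collect_completion(response)
```

To try it against a real agent, create the client with `boto3.client("bedrock-agent-runtime")` and call `ask_agent(client, "<your agent ID>", "<your alias ID>", "Scrape https://example.com")`. Passing the client in as an argument keeps the helpers easy to test with a stub.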

- Reminder to clean up, clean up, clean up all your AWS resources! 🧹
- It cost me between 0.25 cents and 1 dollar to use Bedrock for this project (so don’t forget to clean up after your session) 💲
- I used Q Developer in both VS Code and the AWS Console to understand the code and to tweak or debug it as needed 👩🏽‍💻
- You can use both my README and the Build-on-AWS project README for a more detailed step-by-step approach (links in the references section below)
- Build-on-AWS GitHub profile (https://github.com/build-on-aws/bedrock-agents-webscraper?tab=readme-ov-file#step-1-aws-lambda-function-configuration)
- ‘Ross and his jpeg’ website which was my muse to test out Bedrock Agent’s web scraping capabilities!
- Build-on-AWS bedrock webscraper project (https://github.com/build-on-aws/bedrock-agents-webscraper?tab=readme-ov-file#step-1-aws-lambda-function-configuration)
- Q Developer, for helping me debug and understand all the code artifacts!
- My implementation of this project: https://github.com/lulu3202/bedrock_web_crawler_agent
- My YT video on this content: https://www.youtube.com/watch?v=tKEu-K2YTTc
🌟🌟🌟 The expert in anything was once a beginner — Helen Hayes🌟🌟🌟