
Adding LLM capabilities to Crawl4AI in 30 Minutes
Learn how AI tools can speed up your development process
```python
from crawl4ai import WebCrawler

# Create an instance of WebCrawler
crawler = WebCrawler()

# Warm up the crawler (load necessary models)
crawler.warmup()

# Run the crawler on a URL
result = crawler.run(url="https://aws.amazon.com/blogs/aws/category/artificial-intelligence/generative-ai/")

# Print the extracted content
print(result.markdown)
```
The `extraction_strategy.py` file contained the `perform_completion_with_backoff` function, which seemed to be where the LLM was being invoked.
```python
result = crawler.run(
    url=url,
    word_count_threshold=1,
    extraction_strategy=LLMExtractionStrategy(
        provider="anthropic/claude-3.5-sonnet",
        api_token=os.getenv('ANTHROPIC_API_KEY'),
        schema=OpenAIModelFee.schema(),
        extraction_type="schema",
        instruction="""From the crawled content, extract all mentioned model names along with their fees for input and output tokens.
        Do not miss any models in the entire content. One extracted model JSON format should look like this:
        {"model_name": "claude-3.5-sonnet", "input_fee": "US$15.00 / 1M tokens", "output_fee": "US$3.00 / 1M tokens"}.""",
    ),
    bypass_cache=True,
)
```
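The `OpenAIModelFee` class referenced in `schema=OpenAIModelFee.schema()` is a Pydantic model describing the fields the LLM should fill in. A minimal sketch is shown below; the exact field descriptions are my assumption, not necessarily what the repo uses:

```python
from pydantic import BaseModel, Field

class OpenAIModelFee(BaseModel):
    # Each field becomes a required property in the JSON schema
    # passed to the LLM for structured extraction.
    model_name: str = Field(..., description="Name of the model.")
    input_fee: str = Field(..., description="Fee for input tokens.")
    output_fee: str = Field(..., description="Fee for output tokens.")
```

Calling `OpenAIModelFee.schema()` turns this into a JSON schema that the extraction strategy hands to the model along with the instruction.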
I need to update the repo to allow for Amazon Bedrock to be a provider. The idea is that we will have bedrock as a provider and a model_id. Then we will have the code use that when making a request with a new "perform_completion_with_backoff" function for bedrock, which would handle the API call. ...
Good start. The thing with Bedrock is that there are always new models, so let's not constrain it by "hard coding" the model, e.g. bedrock/anthropic.claude-v2:1.
Also, boto3 will pick up the credentials from the environment automatically, so let's not use os.getenv("AWS_ACCESS_KEY_ID").
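To make the requirements above concrete, here is a rough sketch of what a Bedrock-flavored `perform_completion_with_backoff` could look like. The retry parameters, request-body shape, and function names are my assumptions, not the code Amazon Q generated; the key points are that boto3 resolves credentials from the environment and the `model_id` is passed through rather than hard-coded:

```python
import json
import time

def perform_completion_with_backoff(invoke, max_retries=3, base_delay=1.0):
    """Call `invoke`, retrying with exponential backoff on failure."""
    for attempt in range(max_retries):
        try:
            return invoke()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)

def bedrock_completion(model_id, prompt):
    """Invoke an Anthropic model on Bedrock for a single prompt."""
    import boto3
    # boto3 finds AWS credentials via its default chain (env vars,
    # shared config, instance roles) -- no os.getenv needed.
    client = boto3.client("bedrock-runtime")
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    })
    response = perform_completion_with_backoff(
        lambda: client.invoke_model(modelId=model_id, body=body)
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```

Because the model identifier is just a string argument, any current or future Bedrock model ID works without code changes.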
"content": "cannot access local variable 'content' where it is not associated with a value"
```python
if self.provider.startswith("bedrock"):
    content = response
else:
    content = response.choices[0].message.content
```
- Rapid Integration in Unfamiliar Territory: In just 30 minutes, I added significant new functionality to a codebase I had never worked with before. This dramatic speed of integration opens up new possibilities for quick prototyping and feature additions, even when facing unfamiliar code.
- AI as a Coding Partner, Not a Replacement: Amazon Q /dev served as an intelligent assistant, handling initial implementation details. However, my developer expertise was crucial for understanding where to integrate the code and how to debug issues. This synergy between AI assistance and human oversight represents a new paradigm in coding efficiency.
- Accelerated Learning and Adaptation: By working with AI-generated code in a new codebase, I gained insights into both Crawl4AI's structure and leveraging Amazon Bedrock's API. This process not only solved an immediate problem but also served as a powerful learning tool, demonstrating how AI can help developers quickly adapt to new technologies and codebases.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.