How to develop a tag-based web searcher using AI

Preview

Technology used

Streamlit
LangChain
Bedrock (anthropic.claude-instant-v1)
OpenSearch

Purpose of development

As the number of websites and web pages keeps growing, it becomes harder for individuals to manage the pages they have saved. With a large collection, finding and reusing a page is slow because the main content of each page has to be checked one by one.

This service takes a web page URL entered by the user and uses AI to extract the key information and suitable tags for that page. Specifically, it summarizes the overall content of the page and extracts representative keywords as tags. Because each page is stored with a brief description and its tags, users can search and manage their pages far more efficiently. In particular, selecting a tag immediately returns the related pages. The service is an AI-based tagging solution for managing saved web pages, and it should be useful to individuals or companies that store a large number of them.

Brief description of the feature

Users register by entering the URL of the webpage they want to bookmark.
When a page is registered and refreshed, the AI analyzes the contents of the page and automatically generates appropriate tags.
Users can click on the generated tag to search for other pages with the same tag.

These tag-based searches allow you to quickly browse pages with similar content.
In addition, page-specific tags allow you to grasp the content and characteristics of the saved page at a glance.
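
The tag lookup described above can be pictured as an inverted index from tags to pages. The following is a minimal in-memory sketch, not the service's actual storage (which is OpenSearch); the `Bookmark` and `TagIndex` names are hypothetical and used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """One saved page: its URL, an AI-written summary, and its tags."""
    url: str
    summary: str
    tags: list[str] = field(default_factory=list)


class TagIndex:
    """Maps each tag to the bookmarks that carry it (an inverted index)."""

    def __init__(self) -> None:
        self._by_tag = defaultdict(list)

    def add(self, bookmark: Bookmark) -> None:
        # Normalize tags so "AWS" and " aws " land in the same bucket.
        for tag in bookmark.tags:
            self._by_tag[tag.strip().lower()].append(bookmark)

    def search(self, tag: str) -> list[Bookmark]:
        # Clicking a tag in the UI corresponds to one lookup here.
        return list(self._by_tag.get(tag.strip().lower(), []))
```

Normalizing tags on both insert and lookup is a design choice in this sketch; it keeps a click on "AWS" from missing pages tagged "aws".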

Step1. Configure AWS credentials

Create a new IAM user.
Attach the AmazonBedrockFullAccess and AmazonOpenSearchServiceFullAccess policies to the IAM user you created.
In a command prompt, run aws configure and enter the credentials of the user created above.
For the region, enter the region in which the OpenSearch domain was created.
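
To double-check which region `aws configure` saved, a small sketch like the one below can read it back. It assumes the AWS CLI's default config location (`~/.aws/config`) and its INI layout; the function names are hypothetical.

```python
import configparser
from pathlib import Path
from typing import Optional


def region_from_config(config_text: str, profile: str = "default") -> Optional[str]:
    """Return the region set for a profile in AWS CLI config text, if any."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # The AWS CLI names non-default sections "profile <name>".
    section = "default" if profile == "default" else f"profile {profile}"
    if not parser.has_section(section):
        return None
    return parser.get(section, "region", fallback=None)


def configured_region(path: Optional[Path] = None) -> Optional[str]:
    """Read the default region from the file that `aws configure` writes."""
    path = path or Path.home() / ".aws" / "config"
    if not path.exists():
        return None
    return region_from_config(path.read_text())
```

If `configured_region()` does not match the region of your OpenSearch domain, run `aws configure` again before continuing.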

Step2. Configure OpenSearch

Open OpenSearch Dashboards, go to Management > Security > Roles, and search for all_access.
In all_access, open Mapped users and click Manage mapping.
Map the ARN of the IAM user created in the previous step.

In OpenSearch Dashboards, go to Management and click Dev Tools.
Create the index as follows.
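
As a minimal sketch of what such an index definition might look like, the snippet below builds one in Python and renders it as a Dev Tools request. The index name `bookmarks` and the fields `url`, `summary`, and `tags` are assumptions made for illustration, not the definition used by the original service.

```python
import json

INDEX_NAME = "bookmarks"  # hypothetical index name


def index_body() -> dict:
    """Build a hypothetical index definition for tagged bookmarks."""
    return {
        "settings": {"index": {"number_of_shards": 1, "number_of_replicas": 1}},
        "mappings": {
            "properties": {
                # keyword = stored as-is, matched exactly
                "url": {"type": "keyword"},
                # text = analyzed for full-text search over the summary
                "summary": {"type": "text"},
                # a keyword field can hold an array of tags without a special type
                "tags": {"type": "keyword"},
            }
        },
    }


def dev_tools_request() -> str:
    """Render the same definition as a request for the Dev Tools console."""
    return f"PUT /{INDEX_NAME}\n{json.dumps(index_body(), indent=2)}"
```

Printing `dev_tools_request()` gives text that can be pasted into Dev Tools. Mapping `tags` as `keyword` is what makes an exact-match search on a clicked tag possible.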

In OpenSearch, insert the following example documents.
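
A sketch of how example documents could be prepared for the `_bulk` API is shown below. The documents are placeholders on `example.com`, and the index name and field names (`bookmarks`, `url`, `summary`, `tags`) are assumptions, not the post's original data.

```python
import json

# Placeholder documents; replace with real pages, summaries, and tags.
EXAMPLE_DOCS = [
    {
        "url": "https://example.com/opensearch-intro",
        "summary": "Placeholder summary of an introductory search article.",
        "tags": ["search", "opensearch"],
    },
    {
        "url": "https://example.com/streamlit-basics",
        "summary": "Placeholder summary of a UI tutorial.",
        "tags": ["streamlit", "python"],
    },
]


def bulk_payload(index_name: str, docs: list) -> str:
    """Build a newline-delimited _bulk body: one action line, then one document line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The result of `bulk_payload("bookmarks", EXAMPLE_DOCS)` can be pasted under a `POST /_bulk` line in Dev Tools.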

Step3. Code settings

Open an IDE such as PyCharm.
Install the required libraries as follows.

Enter the code below.
Change the region and host in the code to match your environment.
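
The core of such an application can be sketched as follows. This is a simplified illustration rather than the original code: it calls Bedrock directly through `boto3` instead of going through LangChain, it leaves out the Streamlit UI, and the prompt wording, the `SUMMARY:`/`TAGS:` reply format, the index name `bookmarks`, and the `REGION` and `HOST` values are all assumptions. It expects the `boto3` and `opensearch-py` packages to be installed.

```python
import json

MODEL_ID = "anthropic.claude-instant-v1"  # model named in this post
REGION = "us-east-1"                      # assumption: replace with your region
HOST = "your-domain-endpoint"             # placeholder: OpenSearch host, without https://
INDEX_NAME = "bookmarks"                  # hypothetical index name


def build_prompt(page_text: str, max_tags: int = 5) -> str:
    """Ask the model for a summary line and a comma-separated tag line."""
    return (
        "\n\nHuman: Summarize the web page below in two sentences, then list "
        f"up to {max_tags} topic tags.\n"
        "Answer in exactly this form:\nSUMMARY: <text>\nTAGS: <tag1>, <tag2>\n\n"
        f"<page>\n{page_text}\n</page>\n\nAssistant:"
    )


def parse_completion(completion: str) -> tuple[str, list[str]]:
    """Extract the summary and the lowercase, de-duplicated tags from the reply."""
    summary, tags = "", []
    for line in completion.splitlines():
        line = line.strip()
        if line.upper().startswith("SUMMARY:"):
            summary = line[len("SUMMARY:"):].strip()
        elif line.upper().startswith("TAGS:"):
            for raw in line[len("TAGS:"):].split(","):
                tag = raw.strip().lower()
                if tag and tag not in tags:
                    tags.append(tag)
    return summary, tags


def tag_query(tag: str, size: int = 20) -> dict:
    """Exact-match query, assuming `tags` is mapped as a keyword field."""
    return {"size": size, "query": {"term": {"tags": tag}}}


def analyze_page(page_text: str) -> tuple[str, list[str]]:
    """Call Bedrock; needs boto3 and the credentials configured in Step1."""
    import boto3  # imported lazily so the pure helpers work without it

    client = boto3.client("bedrock-runtime", region_name=REGION)
    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"prompt": build_prompt(page_text), "max_tokens_to_sample": 300}),
    )
    return parse_completion(json.loads(response["body"].read())["completion"])


def search_by_tag(tag: str) -> list[dict]:
    """Query OpenSearch; needs opensearch-py, boto3, and the mapping from Step2."""
    import boto3
    from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection

    auth = AWSV4SignerAuth(boto3.Session().get_credentials(), REGION, "es")
    client = OpenSearch(
        hosts=[{"host": HOST, "port": 443}],
        http_auth=auth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
    )
    hits = client.search(index=INDEX_NAME, body=tag_query(tag))["hits"]["hits"]
    return [hit["_source"] for hit in hits]
```

Asking the model for a fixed reply format keeps the parsing simple; lines that do not match either prefix are ignored, so an unexpected reply yields an empty summary and no tags rather than an error.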

Once the code above is in place, run it with the command below.

streamlit run <your python file>.py

If everything is set up correctly, the page below is displayed.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
