Build a Full Stack Large Language Model Powered Chatbot: Extract Insights from Unstructured Documents
Use AWS and open-source tools to build your own car-savvy AI assistant.
Diagram: Converting From PDF to Embeddings in Vector Database
Step 1 - Deploy GPT-J 6B FP16 Embedding Model With Amazon SageMaker JumpStart
Step 2 - Deploy the Flan T5 XXL LLM With Amazon SageMaker JumpStart
Step 3 - Check the Status of the Deployed Model Endpoints
Step 4 - Create the Amazon OpenSearch Cluster
Step 5 - Build the Document Ingestion and Embedding Workflow
Startup Script and File Structure
Build and Publish the Docker Image
Build Infrastructure for Event-Driven PDF Embeddings Workflow
Step 6 - Deploy Real-Time Q&A API With LLM Contextual Support
Diagrammatic Overview of Real-Time Q&A Support From the T5-Flan-XXL LLM
Build and Publish the Docker Image for the API
Build the CloudFormation Stack for Hosting the API Endpoint
Step 7 - Create and Deploy the Website with the Integrated Chatbot
| About | |
| --- | --- |
| ✅ AWS experience | 200 - Intermediate |
| ⏱ Time to complete | 60 minutes |
| 💰 Cost to complete | ~$9 per hour. To take advantage of the AWS Free Tier for services like [Amazon OpenSearch](https://aws.amazon.com/opensearch-service/pricing/), use a t2.small.search or t3.small.search instance. You can further reduce costs by using a t5-flan-small or t5-flan-xl LLM instead of the t5-flan-xxl LLM for experimentation. |
| 🧩 Prerequisites | 1. An active AWS account. If you don't have one, you can sign up on the AWS website. 2. The AWS Command Line Interface (CLI) installed on your local machine and configured with the necessary credentials and default region (run `aws configure`). 3. Docker Engine downloaded and installed, following the installation instructions for your operating system. |
| 💻 Code Sample | Code sample used in tutorial on GitHub |
- The very first step in providing in-context learning is to ingest the PDF document, split it into text chunks, generate vector representations of these chunks (called "embeddings"), and finally store these embeddings in a vector database.
- Vector databases enable us to perform a "similarity search" against the text embeddings that are stored in it.
- Amazon SageMaker JumpStart provides one-click, executable solution templates for setting up the infrastructure for pre-trained, open-source models. We will use Amazon SageMaker JumpStart to deploy the embedding model and the Large Language Model.
- Amazon OpenSearch is a search and analytics engine that can search for nearest neighbors of points in a vector space, making it suitable as a vector database.
The CloudFormation template for the Amazon OpenSearch cluster is `Infrastructure/opensearch-vectordb.yaml`. Execute the `aws cloudformation create-stack` command as follows to create the cluster. Before executing the command, replace `<username>` and `<password>` with your own values.
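A minimal sketch of the command is shown below. The stack name and parameter keys here are illustrative assumptions; check `Infrastructure/opensearch-vectordb.yaml` for the exact parameter names the template expects.

```bash
# Sketch only: stack name and parameter keys are assumptions -- confirm them
# against the parameters declared in Infrastructure/opensearch-vectordb.yaml.
aws cloudformation create-stack \
  --stack-name opensearch-vectordb \
  --template-body file://Infrastructure/opensearch-vectordb.yaml \
  --parameters \
      ParameterKey=OpenSearchUsername,ParameterValue=<username> \
      ParameterKey=OpenSearchPassword,ParameterValue=<password>
```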
With the vector database in place, we can build the document ingestion and embedding workflow, which will:

- Chunk the text from the PDF document.
- Convert the text chunks into embeddings (vector representations).
- Store the embeddings in Amazon OpenSearch.
The ingestion logic lives in the `create-embeddings-save-in-vectordb/startup_script.py` file. This Python script performs several tasks related to document processing, text embedding, and insertion into an Amazon OpenSearch cluster. The script downloads the PDF document from the Amazon S3 bucket and splits it into smaller text chunks. For each chunk, the text content is sent to the GPT-J 6B FP16 embedding model endpoint deployed on Amazon SageMaker (retrieved from the `TEXT_EMBEDDING_MODEL_ENDPOINT_NAME` environment variable) to generate text embeddings. The generated embeddings, along with other metadata, are then inserted into the Amazon OpenSearch index. The script reads its configuration parameters and credentials from environment variables, making it adaptable to different environments, and it is intended to be run within a Docker container for consistent execution.
With `startup_script.py` in place, we proceed to build the Dockerfile in the `create-embeddings-save-in-vectordb` folder and push the image to Amazon Elastic Container Registry (Amazon ECR). Amazon ECR is a fully managed container registry offering high-performance hosting, so we can reliably deploy application images and artifacts anywhere. We will use the AWS CLI and Docker CLI to build and push the Docker image. Replace `<AWS Account Number>` with the correct AWS Account Number in all the commands below; a consolidated sketch of the commands follows this list.

- Retrieve an authentication token and authenticate the Docker client to the registry using the AWS CLI.
- Build the Docker image.
- After the build completes, tag the image so we can push it to the repository.
- Push the image to the newly created Amazon ECR repository.
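The consolidated commands are sketched below. The repository name `save-embedding-vectordb` mirrors the image name used later in the stack parameters, and `us-east-1` is an assumed example region; substitute your own values.

```bash
# Create the ECR repository (one-time step)
aws ecr create-repository --repository-name save-embedding-vectordb

# 1. Authenticate the Docker client to Amazon ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <AWS Account Number>.dkr.ecr.us-east-1.amazonaws.com

# 2. Build the image from within the create-embeddings-save-in-vectordb folder
docker build -t save-embedding-vectordb .

# 3. Tag the image for the ECR repository
docker tag save-embedding-vectordb:latest <AWS Account Number>.dkr.ecr.us-east-1.amazonaws.com/save-embedding-vectordb:latest

# 4. Push the image to the repository
docker push <AWS Account Number>.dkr.ecr.us-east-1.amazonaws.com/save-embedding-vectordb:latest
```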
The CloudFormation template for this event-driven workflow is `Infrastructure/fargate-embeddings-vectordb-save.yaml`. We will need to override the parameters to match the AWS environment. Here are the key parameters to update in the `aws cloudformation create-stack` command (a sketch of the full command follows this list):

- BucketName: This parameter represents the Amazon S3 bucket where we will drop the PDF documents.
- VpcId and SubnetId: These parameters specify where the Fargate task will run.
- ImageName: This is the name of the Docker Image in your Amazon Elastic Container Registry (ECR) for save-embedding-vectordb.
- TextEmbeddingModelEndpointName: Use this parameter to provide the name of the Embedding Model deployed on Amazon SageMaker in Step 1.
- VectorDatabaseEndpoint: Specify the Amazon OpenSearch domain endpoint URL.
- VectorDatabaseUsername and VectorDatabasePassword: These parameters are for the credentials needed to access the Amazon OpenSearch Cluster created in Step 4.
- VectorDatabaseIndex: Set the name of the index in Amazon OpenSearch where the PDF Document embeddings will be stored.
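A sketch of the full command, assuming the stack name `fargate-embeddings-vectordb-save` (the parameter keys come from the list above; all values are placeholders to replace with your own):

```bash
aws cloudformation create-stack \
  --stack-name fargate-embeddings-vectordb-save \
  --template-body file://Infrastructure/fargate-embeddings-vectordb-save.yaml \
  --parameters \
      ParameterKey=BucketName,ParameterValue=<your-bucket-name> \
      ParameterKey=VpcId,ParameterValue=<vpc-id> \
      ParameterKey=SubnetId,ParameterValue=<subnet-id> \
      ParameterKey=ImageName,ParameterValue=save-embedding-vectordb \
      ParameterKey=TextEmbeddingModelEndpointName,ParameterValue=<embedding-endpoint-name> \
      ParameterKey=VectorDatabaseEndpoint,ParameterValue=<opensearch-domain-endpoint> \
      ParameterKey=VectorDatabaseUsername,ParameterValue=<username> \
      ParameterKey=VectorDatabasePassword,ParameterValue=<password> \
      ParameterKey=VectorDatabaseIndex,ParameterValue=carmanual \
  --capabilities CAPABILITY_IAM   # include if the template creates IAM roles
```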
When a PDF document is dropped into the Amazon S3 bucket, the Fargate task runs the `startup_script.py` file, which is responsible for generating embeddings in Amazon OpenSearch under a new index named `carmanual`. Once the task completes, Amazon OpenSearch will contain the `carmanual` index populated with embeddings.
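To trigger the event-driven workflow, upload a PDF to the bucket; a minimal sketch (the file name and bucket name are placeholders):

```bash
# Dropping a PDF into the bucket kicks off the Fargate embedding task
aws s3 cp ./car-manual.pdf s3://<your-bucket-name>/car-manual.pdf
```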
The code for the real-time Q&A API lives in the `RAG-langchain-questionanswer-t5-llm` folder of our GitHub repository, with the core logic located in the `app.py` file. This Flask-based application defines a `/qa` route for question-answering.
When a user submits a question, the API uses the `TEXT_EMBEDDING_MODEL_ENDPOINT_NAME` environment variable, pointing to the Amazon SageMaker embedding endpoint, to transform the question into numerical vector representations known as embeddings. These embeddings capture the semantic meaning of the text. The API then performs a similarity search against the `carmanual` index in Amazon OpenSearch, using the embeddings derived from the user's query. Following this step, the API calls the T5 Flan LLM endpoint, indicated by the environment variable `T5FLAN_XXL_ENDPOINT_NAME` and also deployed on Amazon SageMaker, passing the text chunks retrieved from Amazon OpenSearch as context. This context allows the LLM to produce meaningful responses to user queries. The API code uses LangChain to orchestrate all of these interactions.
With `app.py` in place, we proceed to build the Dockerfile in the `RAG-langchain-questionanswer-t5-llm` folder and push the image to Amazon ECR. We will use the AWS CLI and Docker CLI to build and push the Docker image. Replace `<AWS Account Number>` with the correct AWS Account Number in all the commands below; a consolidated sketch of the commands follows this list.

- Retrieve an authentication token and authenticate the Docker client to the registry using the AWS CLI.
- Build the Docker image.
- After the build completes, tag the image so we can push it to the repository.
- Push the image to the newly created Amazon ECR repository.
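The consolidated commands are sketched below, assuming the repository is named `qa-container` (matching the image name referenced in the stack parameters that follow) and `us-east-1` as an example region:

```bash
# Create the ECR repository (one-time step)
aws ecr create-repository --repository-name qa-container

# 1. Authenticate the Docker client to Amazon ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <AWS Account Number>.dkr.ecr.us-east-1.amazonaws.com

# 2. Build the image from within the RAG-langchain-questionanswer-t5-llm folder
docker build -t qa-container .

# 3. Tag the image for the ECR repository
docker tag qa-container:latest <AWS Account Number>.dkr.ecr.us-east-1.amazonaws.com/qa-container:latest

# 4. Push the image to the repository
docker push <AWS Account Number>.dkr.ecr.us-east-1.amazonaws.com/qa-container:latest
```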
The CloudFormation template for the API is `Infrastructure/fargate-api-rag-llm-langchain.yaml`. We will need to override the parameters to match the AWS environment. Here are the key parameters to update in the `aws cloudformation create-stack` command (a sketch of the full command follows this list):

- DemoVPC: This parameter specifies the Virtual Private Cloud (VPC) where your service will run.
- PublicSubnetIds: This parameter requires a list of public subnet IDs where your load balancer and tasks will be placed.
- Imagename: Provide the name of the Docker Image in your Amazon Elastic Container Registry (ECR) for qa-container.
- TextEmbeddingModelEndpointName: Specify the endpoint name of the Embeddings model deployed on Amazon SageMaker in Step 1.
- T5FlanXXLEndpointName: Set the endpoint name of the T5-FLAN endpoint deployed on Amazon SageMaker in Step 2.
- VectorDatabaseEndpoint: Specify the Amazon OpenSearch domain endpoint URL.
- VectorDatabaseUsername and VectorDatabasePassword: These parameters are for the credentials needed to access the OpenSearch Cluster created in Step 4.
- VectorDatabaseIndex: Set the name of the index in Amazon OpenSearch where your service data will be stored. The name of the index that we have used in this example is carmanual.
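A sketch of the full command, using the stack name `ecs-questionanswer-llm` referenced later in this step (all values are placeholders; the quoted token with the escaped comma keeps the subnet list inside a single parameter value):

```bash
aws cloudformation create-stack \
  --stack-name ecs-questionanswer-llm \
  --template-body file://Infrastructure/fargate-api-rag-llm-langchain.yaml \
  --parameters \
      ParameterKey=DemoVPC,ParameterValue=<vpc-id> \
      'ParameterKey=PublicSubnetIds,ParameterValue=<subnet-id-1>\,<subnet-id-2>' \
      ParameterKey=Imagename,ParameterValue=qa-container \
      ParameterKey=TextEmbeddingModelEndpointName,ParameterValue=<embedding-endpoint-name> \
      ParameterKey=T5FlanXXLEndpointName,ParameterValue=<t5-flan-xxl-endpoint-name> \
      ParameterKey=VectorDatabaseEndpoint,ParameterValue=<opensearch-domain-endpoint> \
      ParameterKey=VectorDatabaseUsername,ParameterValue=<username> \
      ParameterKey=VectorDatabasePassword,ParameterValue=<password> \
      ParameterKey=VectorDatabaseIndex,ParameterValue=carmanual \
  --capabilities CAPABILITY_IAM   # include if the template creates IAM roles
```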
Once the stack has been created, navigate to the CloudFormation Outputs tab for the `ecs-questionanswer-llm` stack. In this tab, we will find essential information, including the API endpoint (the DNS name of the API's Application Load Balancer).
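With the endpoint in hand, we can send the API a test question. A minimal sketch, assuming the `/qa` route accepts a JSON body with a `question` field (check `app.py` for the exact request schema):

```bash
# Hypothetical test call; take the ALB DNS name from the Outputs tab
curl -X POST "http://<DNS Name of API ALB>/qa" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I turn on the fog lamps?"}'
```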
The code for the front-end website with the integrated chatbot lives in the `homegrown_website_and_bot` folder. We will use the AWS CLI and Docker CLI to build and push the Docker image to Amazon ECR for the front-end website. Replace `<AWS Account Number>` with the correct AWS Account Number in all the commands below; a consolidated sketch of the commands follows this list.

- Retrieve an authentication token and authenticate the Docker client to the registry using the AWS CLI.
- Build the Docker image.
- After the build completes, tag the image so we can push it to the repository.
- Push the image to the newly created Amazon ECR repository.
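The consolidated commands are sketched below. The repository name `homegrown-website-bot` is an assumption for illustration (mirror whatever name you give the website image), with `us-east-1` as an example region:

```bash
# Create the ECR repository (one-time step; repository name is an assumption)
aws ecr create-repository --repository-name homegrown-website-bot

# 1. Authenticate the Docker client to Amazon ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <AWS Account Number>.dkr.ecr.us-east-1.amazonaws.com

# 2. Build the image from within the homegrown_website_and_bot folder
docker build -t homegrown-website-bot .

# 3. Tag the image for the ECR repository
docker tag homegrown-website-bot:latest <AWS Account Number>.dkr.ecr.us-east-1.amazonaws.com/homegrown-website-bot:latest

# 4. Push the image to the repository
docker push <AWS Account Number>.dkr.ecr.us-east-1.amazonaws.com/homegrown-website-bot:latest
```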
The CloudFormation template for the website is the `Infrastructure/fargate-website-chatbot.yaml` file. We will need to override the parameters to match the AWS environment. Here are the key parameters to update in the `aws cloudformation create-stack` command (a sketch of the full command follows this list):

- DemoVPC: This parameter specifies the Virtual Private Cloud (VPC) where your website will be deployed.
- PublicSubnetIds: This parameter requires a list of public subnet IDs where your load balancer and tasks for the website will be placed.
- Imagename: Provide the name of the Docker Image in your Amazon Elastic Container Registry (ECR) for the website.
- QUESTURL: Specify the endpoint URL of the API deployed in Step 6. It is of the format `http://<DNS Name of API ALB>/qa`.
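A sketch of the full command, using the stack name `ecs-website-chatbot` referenced later in this step (values are placeholders; the image name follows the assumption above):

```bash
aws cloudformation create-stack \
  --stack-name ecs-website-chatbot \
  --template-body file://Infrastructure/fargate-website-chatbot.yaml \
  --parameters \
      ParameterKey=DemoVPC,ParameterValue=<vpc-id> \
      'ParameterKey=PublicSubnetIds,ParameterValue=<subnet-id-1>\,<subnet-id-2>' \
      ParameterKey=Imagename,ParameterValue=homegrown-website-bot \
      'ParameterKey=QUESTURL,ParameterValue=http://<DNS Name of API ALB>/qa' \
  --capabilities CAPABILITY_IAM   # include if the template creates IAM roles
```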
Once the stack has been created, navigate to the CloudFormation Outputs tab for the `ecs-website-chatbot` stack. In this tab we will find the DNS name of the Application Load Balancer (ALB) associated with the front end; open that DNS name in a browser to chat with the bot.

To keep learning, here are some concepts and services to explore further:

- Amazon SageMaker: As you progress with SageMaker, familiarize yourself with additional algorithms it offers.
- Amazon OpenSearch: Familiarize yourself with the k-NN algorithm and other distance metrics.
- Langchain: LangChain is a framework designed to simplify the creation of applications using LLMs.
- Embeddings: An embedding is a numerical representation of a piece of information, for example, text, documents, images, audio, etc.
- Amazon SageMaker JumpStart: SageMaker JumpStart provides pre-trained, open-source models for a wide range of problem types to help you get started with machine learning.
To avoid ongoing charges, clean up the resources you created:

- Log in to the AWS CLI. Make sure the AWS CLI is properly configured with the required permissions to perform these actions.
- Delete the PDF file from the Amazon S3 bucket by executing the following command. Replace `your-bucket-name` with the actual name of your Amazon S3 bucket and adjust the path to your PDF file as needed.
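A sketch of the deletion command (the object key is a placeholder):

```bash
aws s3 rm s3://your-bucket-name/car-manual.pdf
```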
- Delete the CloudFormation stacks by executing the following commands. Replace the stack names with the actual names of your CloudFormation stacks.
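A sketch assuming the stack names used in this tutorial; the first two names are assumptions if you chose different ones for the OpenSearch and ingestion stacks:

```bash
aws cloudformation delete-stack --stack-name opensearch-vectordb
aws cloudformation delete-stack --stack-name fargate-embeddings-vectordb-save
aws cloudformation delete-stack --stack-name ecs-questionanswer-llm
aws cloudformation delete-stack --stack-name ecs-website-chatbot
```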
- Delete the SageMaker endpoints by executing the following commands. Replace `endpoint-name-1` and `endpoint-name-2` with the names of your SageMaker endpoints.
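For example, to remove the embedding model and LLM endpoints deployed in Steps 1 and 2:

```bash
aws sagemaker delete-endpoint --endpoint-name endpoint-name-1
aws sagemaker delete-endpoint --endpoint-name endpoint-name-2
```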
In this tutorial, we accomplished the following:

- Deployed the GPT-J 6B FP16 embedding model.
- Deployed the Flan T5 XXL LLM.
- Created an Amazon OpenSearch cluster.
- Built a document ingestion and embedding workflow.
- Deployed a real-time Q&A API with LLM support.
- Created and deployed a website with an integrated chatbot.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.