Step by step: converting the RAG project to bash and running it from the CLI of an EC2 instance

Engineer Enrique Aguilar Martinez

AWS Community Builders

This article would not have been possible without the help of Sree Deekshitha Yerra.

This time I'll show you how to deploy a RAG (Retrieval-Augmented Generation) project on an EC2 instance using programmatic access through the CLI. This is a simpler way to work when you are developing this type of project.

1. Installation of Dependencies:

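# Update the package index and install pip
sudo apt-get update
sudo apt-get install -y python3-pip

# Install PyTorch (published on PyPI as "torch")
pip3 install torch

# Or, on a GPU instance, install a build for a specific CUDA version (here CUDA 11.8) instead
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install Transformers
pip3 install transformers

# Install Datasets (to load the Wikipedia dataset)
pip3 install datasets

# Install the libraries imported by ranker.py
pip3 install sentence-transformers scikit-learn numpy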
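Before writing any code, you can confirm that the packages are importable and whether PyTorch can see a GPU. The following is a minimal sketch; the script name (for example, check_env.py) and the module list are assumptions based on the imports used later in this article.

```python
import importlib.util

# Module names as imported by the scripts in this article (assumed list; adjust as needed)
REQUIRED_MODULES = ["torch", "transformers", "datasets", "sklearn", "numpy", "sentence_transformers"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules that cannot be found in the current Python environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        import torch

        # True only if this PyTorch build has CUDA support and a GPU is visible
        print("All modules found. CUDA available:", torch.cuda.is_available())
```

Run it with python3 check_env.py; if CUDA shows as unavailable on a GPU instance, the CPU-only build or a mismatched CUDA build is a likely cause.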

(Optional) Combine the files into a single module:

If you prefer a single Python module, you can combine the code from the three files created in step 2 into one file called rag_project.py.

Test the files:

You can test each file, or the combined module, by running it from the CLI. For example, to test retriever.py:

python3 retriever.py

Remember to replace the placeholder implementations with your actual retrieval, answer-generation, and ranking logic.
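If you go the single-module route, one way to structure rag_project.py is a function that chains the three stages in one process, which also avoids reloading the models for every retrieved document. The sketch below is an assumption about how to wire it, not the author's code: run_rag and the toy_* functions are hypothetical, and in a real module you would pass in retrieve, generate_answer, and rank_responses wrapped to match these signatures.

```python
from typing import Callable, List


def run_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, str], str],
    rank: Callable[[List[str], str], List[str]],
) -> str:
    """Chain retrieval, generation, and ranking in one process; return the best answer."""
    contexts = retrieve(question)
    # One candidate answer per retrieved context, mirroring the loop in run_rag.sh
    answers = [generate(context, question) for context in contexts]
    return rank(answers, question)[0]


# Hypothetical toy stand-ins so the sketch runs without downloading any models
DOCS = ["New Delhi is the capital of India.", "Paris is the capital of France."]


def toy_retrieve(question: str) -> List[str]:
    return DOCS


def toy_generate(context: str, question: str) -> str:
    # Echo the context back as the "answer"
    return context


def toy_rank(answers: List[str], question: str) -> List[str]:
    # Prefer answers that share more words with the question
    words = set(question.lower().rstrip("?").split())
    return sorted(answers, key=lambda a: -len(words & set(a.lower().rstrip(".").split())))


if __name__ == "__main__":
    print(run_rag("What is the capital of India?", toy_retrieve, toy_generate, toy_rank))
```

Passing the stages in as functions keeps the orchestration testable with stand-ins before you plug in the real models.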

Create the first Python file (retriever.py):

Use a text editor such as nano or vim to create the first file, retriever.py (for example, nano retriever.py). It holds the basic retrieval function shown in part a) of step 2 below.

2. Creating Python files:

Create a separate Python file for each of the three parts of the RAG project:

a) retriever.py

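import sys

from datasets import load_dataset


def retrieve(query, dataset, top_k=5):
    # Placeholder retrieval function: ignores the query and returns the first top_k articles
    return dataset.select(range(top_k))


if __name__ == "__main__":
    # Take the question from the command line, falling back to the original example
    query = sys.argv[1] if len(sys.argv) > 1 else "What is the capital of India?"
    dataset = load_dataset('wikipedia', '20220301.en')['train']
    retrieved_docs = retrieve(query, dataset)
    # Print one document per line (newlines removed, truncated to keep lines short)
    # so that run_rag.sh can read the documents line by line
    for doc in retrieved_docs:
        print(doc['text'][:2000].replace('\n', ' '))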
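The placeholder retrieve() ignores the query and simply returns the first top_k articles. As a hedged illustration of a query-aware replacement, here is a simple term-overlap scorer over a list of strings. retrieve_by_overlap is a hypothetical helper, not part of the datasets library, and for the full Wikipedia dump a real index (BM25 or vector search) would be needed, since scanning every article in Python would be very slow.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on runs of non-alphanumeric characters
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_by_overlap(query, documents, top_k=5):
    """Return the top_k documents containing the most occurrences of query terms."""
    query_terms = set(tokenize(query))
    scored = []
    for index, doc in enumerate(documents):
        counts = Counter(tokenize(doc))
        score = sum(counts[term] for term in query_terms)
        scored.append((score, index))
    # Highest score first; ties keep the original document order
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [documents[index] for _, index in scored[:top_k]]


if __name__ == "__main__":
    docs = [
        "Paris is the capital of France.",
        "New Delhi is the capital of India.",
        "Berlin is the capital of Germany.",
    ]
    print(retrieve_by_overlap("What is the capital of India?", docs, top_k=1))
```

For example, you could score only the text of a small slice of the dataset (dataset.select(range(1000))) to try the idea before building a proper index.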

b) generator.py

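import argparse

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)


def generate_answer(context, question):
    # t5-base was trained on SQuAD-style inputs of the form "question: ... context: ..."
    input_text = f"question: {question} context: {context}"
    # Truncate to the 512-token input length t5-base was trained with
    input_ids = tokenizer.encode(input_text, return_tensors="pt", truncation=True, max_length=512)
    generated_ids = model.generate(input_ids, max_new_tokens=32)
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    return generated_text


if __name__ == "__main__":
    # Accept the --context and --question flags passed by run_rag.sh
    parser = argparse.ArgumentParser()
    parser.add_argument("--context", default="New Delhi is the capital of India.")
    parser.add_argument("--question", default="What is the capital of India?")
    args = parser.parse_args()
    generated_text = generate_answer(args.context, args.question)
    print(generated_text)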
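Because t5-base was trained with a 512-token input length, most of a Wikipedia article is cut off before the model sees it. A common workaround is to split a long context into overlapping windows and generate one answer per window. The sketch below assumes whitespace-separated words as a rough stand-in for tokens; chunk_words is a hypothetical helper, and the default of 300 words is illustrative, not a guarantee of staying under the limit.

```python
def chunk_words(text, max_words=300, overlap=50):
    """Split text into overlapping windows of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window reaches the end of the text
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    print(chunk_words(sample, max_words=4, overlap=1))
```

You would then call generate_answer(chunk, question) for each chunk and let ranker.py choose among the resulting answers; the overlap reduces the chance that the sentence containing the answer is split across two windows.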

c) ranker.py

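import argparse

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model_name = "sentence-transformers/all-mpnet-base-v2"  # Sentence embedding model
# SentenceTransformer loads the model together with its pooling layer and exposes encode()
model = SentenceTransformer(model_name)


def rank_responses(responses, query_embedding):
    # Embed all candidate responses in one batch
    response_embeddings = model.encode(responses)
    similarities = cosine_similarity([query_embedding], response_embeddings)
    # Indices sorted from most to least similar to the query
    ranked_indices = np.argsort(similarities[0])[::-1]
    return [responses[i] for i in ranked_indices]


if __name__ == "__main__":
    # Accept the --responses and --query flags passed by run_rag.sh
    parser = argparse.ArgumentParser()
    parser.add_argument("--responses", help="Newline-separated candidate responses")
    parser.add_argument("--query", default="What is the capital of India?")
    args = parser.parse_args()
    if args.responses:
        responses = [line for line in args.responses.splitlines() if line.strip()]
    else:
        responses = ["New Delhi is the capital of India", "Paris is the capital of France.", "The capital of Germany is Berlin."]
    query_embedding = model.encode(args.query)
    ranked_responses = rank_responses(responses, query_embedding)
    print(ranked_responses[0])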
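rank_responses relies on cosine similarity: each response embedding is compared with the query embedding, and np.argsort(...)[::-1] orders the responses from most to least similar. To make that step concrete without downloading a model, here is a pure-Python sketch in which made-up 3-dimensional vectors stand in for real embeddings (all-mpnet-base-v2 produces 768-dimensional ones). The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity dot(u, v) / (|u| * |v|); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_cosine(candidates, query_vec):
    """Return candidate labels sorted by descending cosine similarity to query_vec."""
    scored = [(cosine(vec, query_vec), label) for label, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored]


if __name__ == "__main__":
    # Made-up vectors purely for illustration, not real model output
    query = [1.0, 0.0, 1.0]
    candidates = {
        "New Delhi": [0.9, 0.1, 0.8],
        "Paris": [0.1, 0.9, 0.2],
        "Berlin": [0.0, 1.0, 0.0],
    }
    print(rank_by_cosine(candidates, query))
```

Cosine similarity ignores vector length and compares only direction, which is why it is the usual choice for comparing sentence embeddings.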

3. Creating the run_rag.sh bash script:

#!/bin/bash
set -euo pipefail

# The question is the first argument
question="${1:?Usage: ./run_rag.sh <question>}"

# Run retrieval (one document per line)
python3 retriever.py "$question" > retrieved_docs.txt

# Run the generator once per retrieved document (read line by line, not word by word)
: > generated_answers.txt
while IFS= read -r response; do
    python3 generator.py --context "$response" --question "$question" >> generated_answers.txt
done < retrieved_docs.txt

# Read the generated answers
generated_answers=$(cat generated_answers.txt)

# Run the ranker
python3 ranker.py --responses "$generated_answers" --query "$question"

# Clean up temporary files
rm retrieved_docs.txt generated_answers.txt

4. RAG project execution:

1. Upload the Python files and the bash script to your EC2 instance.

2. Assign execute permissions to the bash script:

chmod +x run_rag.sh

3. Run the script with the question as an argument:

./run_rag.sh "What is the capital of Mexico?"

5. Output:

The script will print the response with the highest ranking.

Additional considerations:

Downloading the Wikipedia dataset may take time. You can download it beforehand and upload it to your EC2 instance.
You can modify the bash script to perform other operations, such as saving the output to a file or sending a notification.
You may need to install dependencies for the sentence embedding model (sentence-transformers) in ranker.py.

Remember that this is a basic example, and you can customize it according to your needs.
