Step by step: converting a RAG project to a bash script and running it from the CLI of an EC2 instance.

This time I'll show you how to deploy a RAG (retrieval-augmented generation) project on an EC2 instance using programmatic access from the CLI, a simpler approach when developing this type of project.

Published Jun 12, 2024
Engineer Enrique Aguilar Martinez
AWS Community Builders
This article would not have been possible without the help of Sree Deekshitha Yerra
1. Installation of Dependencies:
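# Update the package index and install pip and venv
sudo apt-get update
sudo apt-get install -y python3-pip python3-venv
# Create and activate a virtual environment (recent Ubuntu releases block system-wide pip installs)
# Re-run the "source" line in every new SSH session
python3 -m venv rag-env
source rag-env/bin/activate
# Install PyTorch (the pip package is named "torch", not "pytorch")
pip install torch
# Or install PyTorch with CUDA 11.8 support (if your instance has an NVIDIA GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install Transformers
pip install transformers
# Install Datasets (to load the Wikipedia dataset)
pip install datasets
# Install the libraries used by ranker.py
pip install scikit-learn numpy sentence-transformers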
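Before writing any code, it can be worth confirming that the libraries import correctly in the environment you installed them into. The script below is a minimal sketch that is not part of the original project: the file name check_deps.py and the list of import names are assumptions based on what the three scripts import (note that scikit-learn is imported as sklearn and sentence-transformers as sentence_transformers).

```python
import importlib.util

# Import names (not pip package names) of the libraries the RAG scripts rely on.
# Assumption: this list mirrors the imports in retriever.py, generator.py and ranker.py.
REQUIRED = ["torch", "transformers", "datasets", "sklearn", "numpy", "sentence_transformers"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import torch  # only imported once we know it is installed

        print("All packages found. CUDA available:", torch.cuda.is_available())
```

Run it with python3 check_deps.py. On a GPU instance, a CUDA value of False usually points to the CPU build of PyTorch or missing NVIDIA drivers.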
2. Creating Python files:
Create three separate Python files, one for each part of the RAG project: retriever.py, generator.py and ranker.py.
  • Use a text editor such as nano or vim to create each file, starting with retriever.py. The examples below are basic implementations of each part.
  • (Optional) Combine the files into a single module: if you prefer to have a single Python module, you can combine the code from all three files into one file called rag_project.py.
  • Test the files: you can test each file, or the combined module, by running it from the CLI. For example, to test retriever.py:
python3 retriever.py
  • Remember to replace the placeholder implementations with your actual retrieval, response generation, and ranking logic.
a) retriever.py
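import sys
from datasets import load_dataset

def retrieve(query, dataset, top_k=5):
    # Placeholder retrieval function: ignores the query and returns the first top_k articles
    return dataset.select(range(top_k))

if __name__ == "__main__":
    # Take the question from the command line, falling back to an example question
    query = sys.argv[1] if len(sys.argv) > 1 else "What is the capital of India?"
    # Note: the first run downloads the full English Wikipedia dataset
    dataset = load_dataset('wikipedia', '20220301.en')['train']
    retrieved_docs = retrieve(query, dataset)
    # Print one document per line (newlines removed, first 2000 characters) for run_rag.sh
    for doc in retrieved_docs:
        print(doc["text"].replace("\n", " ")[:2000])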
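The retrieve function above is a placeholder: it ignores the query and always returns the first top_k articles. As one hedged illustration of query-aware retrieval (not the author's implementation), the sketch below scores an in-memory list of document strings with a simple TF-IDF keyword score in plain Python; the names tokenize and tfidf_retrieve are hypothetical. For the full Wikipedia dataset you would likely want a proper search index or embedding-based retrieval rather than scanning every article.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_retrieve(query, documents, top_k=5):
    """Return up to top_k documents ranked by a simple TF-IDF keyword score."""
    doc_counts = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(documents)
    query_terms = set(tokenize(query))
    # Smoothed inverse document frequency: rarer terms get a higher weight
    idf = {}
    for term in query_terms:
        df = sum(1 for counts in doc_counts if term in counts)
        idf[term] = math.log((n_docs + 1) / (df + 1)) + 1.0
    scored = []
    for i, counts in enumerate(doc_counts):
        # Sum of term frequency * idf over the query terms that appear in the document
        score = sum(counts[term] * idf[term] for term in query_terms if term in counts)
        scored.append((score, i))
    # Highest score first; ties keep the original document order
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Documents that share no terms with the query are dropped
    return [documents[i] for score, i in scored[:top_k] if score > 0]


docs = [
    "New Delhi is the capital of India.",
    "Paris is the capital of France.",
    "Mumbai is the largest city in India.",
]
print(tfidf_retrieve("What is the capital of India?", docs, top_k=2))
# → ['New Delhi is the capital of India.', 'Paris is the capital of France.']
```

The score is a raw sum rather than a cosine-normalized TF-IDF similarity, so longer documents tend to score higher; that simplification keeps the sketch short.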
b) generator.py
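import argparse
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def generate_answer(context, question):
    # Put the question first so that truncation only cuts the (long) context
    input_text = f"question: {question} context: {context}"
    # Truncate to the 512-token input length T5 was trained with
    input_ids = tokenizer.encode(input_text, return_tensors="pt", truncation=True, max_length=512)
    generated_ids = model.generate(input_ids)
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    return generated_text

if __name__ == "__main__":
    # Accept --context and --question from run_rag.sh, with the original example as defaults
    parser = argparse.ArgumentParser()
    parser.add_argument("--context", default="New Delhi is the capital of India.")
    parser.add_argument("--question", default="What is the capital of India?")
    args = parser.parse_args()
    print(generate_answer(args.context, args.question))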
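generate_answer takes a single context string, while the retriever returns several documents, and Wikipedia articles are far longer than the 512-token input length t5-base was pre-trained with. One option, sketched here under the assumption that a rough character budget is good enough (2000 characters is an arbitrary choice, not a value from the article), is to clean and concatenate the retrieved texts before calling generate_answer; build_context is a hypothetical helper.

```python
def build_context(documents, max_chars=2000, separator=" "):
    """Concatenate retrieved documents into one context string of at most max_chars."""
    parts = []
    used = 0
    for doc in documents:
        doc = " ".join(doc.split())  # collapse newlines and repeated whitespace
        sep_len = len(separator) if parts else 0
        if used + sep_len + len(doc) > max_chars:
            # Fill whatever budget is left with the start of this document, then stop
            remaining = max_chars - used - sep_len
            if remaining > 0:
                parts.append(doc[:remaining])
            break
        parts.append(doc)
        used += sep_len + len(doc)
    return separator.join(parts)


docs = ["New Delhi is the capital\nof India.", "Paris is the capital of France."]
print(build_context(docs, max_chars=40))
# → New Delhi is the capital of India. Paris
```

With a helper like this, generate_answer(build_context(texts), question) runs the model once instead of once per document.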
c) ranker.py
import argparse
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

# Sentence embedding model; SentenceTransformer provides the .encode() method used below
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

def rank_responses(responses, query_embedding):
    response_embeddings = model.encode(responses)
    similarities = cosine_similarity([query_embedding], response_embeddings)
    # Indices sorted from most to least similar
    ranked_indices = np.argsort(similarities[0])[::-1]
    return [responses[i] for i in ranked_indices]

if __name__ == "__main__":
    # Accept newline-separated --responses and a --query from run_rag.sh
    parser = argparse.ArgumentParser()
    parser.add_argument("--responses", default="New Delhi is the capital of India\nParis is the capital of France.\nThe capital of Germany is Berlin.")
    parser.add_argument("--query", default="What is the capital of India?")
    args = parser.parse_args()
    responses = [r for r in args.responses.splitlines() if r.strip()]
    query_embedding = model.encode(args.query)
    ranked_responses = rank_responses(responses, query_embedding)
    print(ranked_responses[0])
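rank_responses relies on cosine_similarity followed by np.argsort(...)[::-1] to order the answers from most to least similar. To make that step concrete without loading a model, this sketch reimplements cosine similarity in plain Python on made-up 3-dimensional vectors (real all-mpnet-base-v2 embeddings have 768 dimensions); cosine and rank_by_cosine are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(responses, response_vectors, query_vector):
    """Return responses sorted from most to least similar to the query vector."""
    scores = [cosine(vec, query_vector) for vec in response_vectors]
    order = sorted(range(len(responses)), key=lambda i: scores[i], reverse=True)
    return [responses[i] for i in order]


# Toy "embeddings": made-up numbers for illustration, not real model output
responses = ["New Delhi is the capital of India", "Paris is the capital of France.", "The capital of Germany is Berlin."]
vectors = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]
query = [1.0, 0.0, 0.1]
print(rank_by_cosine(responses, vectors, query)[0])
# → New Delhi is the capital of India
```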
3. Creating the run_rag.sh bash script:
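#!/bin/bash
set -euo pipefail
# The question is the first argument: ./run_rag.sh "your question"
question="${1:?usage: ./run_rag.sh QUESTION}"
# Run retrieval (one document per line)
python3 retriever.py "$question" > retrieved_docs.txt
# Start with an empty answers file
: > generated_answers.txt
# Run the generator once per retrieved document (read line by line, not word by word)
while IFS= read -r response; do
    python3 generator.py --context "$response" --question "$question" >> generated_answers.txt
done < retrieved_docs.txt
# Read the generated answers
generated_answers=$(cat generated_answers.txt)
# Rank the answers and print the best one
python3 ranker.py --responses "$generated_answers" --query "$question"
# Clean up temporary files
rm retrieved_docs.txt generated_answers.txt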
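Because run_rag.sh starts a new Python process on every loop iteration, generator.py reloads t5-base each time. If you follow the optional step of combining everything into rag_project.py, one way to wire the stages together in a single process is sketched below. run_pipeline and the fake_* stubs are hypothetical; in a real module you would pass wrappers around the article's retrieve, generate_answer and rank_responses functions, so each model is loaded only once.

```python
import re
from typing import Callable, List


def run_pipeline(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, str], str],
    rank: Callable[[List[str], str], List[str]],
) -> str:
    """Retrieve documents, generate one answer per document, and return the top-ranked answer."""
    documents = retrieve(question)
    # One candidate answer per retrieved document, like the loop in run_rag.sh
    answers = [generate(doc, question) for doc in documents]
    answers = [answer for answer in answers if answer.strip()]
    if not answers:
        return ""
    return rank(answers, question)[0]


# Stub stages so the flow can be tried without downloading any model or dataset
def fake_retrieve(question: str) -> List[str]:
    return ["Paris is the capital of France.", "New Delhi is the capital of India."]


def fake_generate(context: str, question: str) -> str:
    return context  # a real generator would return a model-written answer


def fake_rank(answers: List[str], question: str) -> List[str]:
    # Rank by words shared with the question (a stand-in for embedding similarity)
    q_words = set(re.findall(r"\w+", question.lower()))
    return sorted(answers, key=lambda a: len(q_words & set(re.findall(r"\w+", a.lower()))), reverse=True)


if __name__ == "__main__":
    print(run_pipeline("What is the capital of India?", fake_retrieve, fake_generate, fake_rank))
    # → New Delhi is the capital of India.
```

Passing the stages in as functions keeps the orchestration testable with stubs, while the real models stay loaded in memory between questions.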
4. Running the RAG project:
1. Upload the Python files and the bash script to your EC2 instance.
2. Give the bash script execute permissions:
chmod +x run_rag.sh
3. Run the script, passing the question as an argument:
./run_rag.sh "What is the capital of Mexico?"
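If you want to call the script from other Python code, for example to run a batch of questions, a minimal sketch using the standard subprocess module follows. The file name ask_rag.py and the functions ask and ask_many are hypothetical, and treating the last non-empty line of standard output as the answer is an assumption based on the script printing the top-ranked response last.

```python
import subprocess
import sys


def ask(question, script="./run_rag.sh"):
    """Run the RAG script with the question as one argument and return its last output line."""
    # Passing a list (no shell) keeps the question intact without extra quoting
    result = subprocess.run([script, question], capture_output=True, text=True, check=True)
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def ask_many(questions, script="./run_rag.sh"):
    """Ask several questions in turn and map each question to its answer."""
    return {question: ask(question, script) for question in questions}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 ask_rag.py "What is the capital of Mexico?" "What is the capital of India?"
    for question, answer in ask_many(sys.argv[1:]).items():
        print(f"{question} -> {answer}")
```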
5. Output:
The script will print the response with the highest ranking.
Additional considerations:
  • Downloading the Wikipedia dataset may take time. You can download it beforehand and upload it to your EC2 instance.
  • You can modify the bash script to perform other operations, such as saving the output to a file or sending a notification.
  • The sentence embedding model in ranker.py (all-mpnet-base-v2) is meant to be loaded through the sentence-transformers package, so you may need to install it (pip install sentence-transformers).
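As one hedged way to implement the "saving the output to a file" idea, assuming you capture the ranked answer in the bash script (for example answer=$(python3 ranker.py ...)), a small helper can append each question and answer as a JSON line. The file name save_result.py, the output file rag_results.jsonl and both function names are hypothetical.

```python
import json
import sys
from datetime import datetime, timezone


def save_result(question, answer, path="rag_results.jsonl"):
    """Append one question/answer pair as a JSON line with a UTC timestamp."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_results(path="rag_results.jsonl"):
    """Read all saved records back as a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage from run_rag.sh: python3 save_result.py "$question" "$answer"
    print(save_result(sys.argv[1], sys.argv[2]))
```

JSON Lines keeps each run on its own line, so the file can be appended to safely and inspected later with standard tools.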
Remember that this is a basic example, and you can customize it according to your needs.
 
