
How to build a ChatGPT-Powered AI tool to learn technical things fast
A step-by-step guide to setting up a free ML environment, leveraging language models and ChatGPT APIs to extract insights from YouTube videos, and empowering yourself to learn faster and more efficiently like never before!
About | |
---|---|
✅ AWS Level | Intermediate - 200 |
⏱ Time to complete | 30 minutes |
💰 Cost to complete | Free when using OpenAI's free API credit; otherwise less than $0.10 |
🧩 Prerequisites | Amazon SageMaker Studio Lab account; foundational knowledge of Python |
⏰ Last Updated | 2023-07-24 |
What you will learn:
- How to set up a free ML development environment
- How to use pretrained open-source ML models
- How to use ChatGPT APIs

Tutorial outline:
- Part 1 - Setup: SageMaker Studio Lab and OpenAI API keys
- Part 2 - Obtaining a YouTube video transcript
- Part 3 - Summarizing and translating a transcript using ML models
- Part 4 - Extracting steps and creating a quiz using ChatGPT APIs
Note: This tutorial uses free resources. The only potential cost is for calling the ChatGPT APIs after you have used up your free OpenAI credit, in which case it will cost a few cents. When you create an OpenAI account, you receive $5 of credit to use within the first 3 months, which is enough for hundreds of API requests.
Part 1 - Setup: SageMaker Studio Lab and OpenAI API keys

To create a Studio Lab account, go to the Amazon SageMaker Studio Lab website and select Request free account. Fill in the required information in the form and submit your request. You will receive an email asking you to verify your email address; follow the instructions in it.

Please note that your account request needs to be approved before you can register for a Studio Lab account. The review process typically takes up to 5 business days. Once your request is approved, you will receive an email containing a link to the Studio Lab account registration page. This link remains active for 7 days after your request is approved.
To get an OpenAI API key, sign in to your OpenAI account, open the API keys page, and select Create new secret key. Provide a name, then copy the key and save it somewhere safe: you won't be able to view it again!

Please note that OpenAI currently offers a $5 credit for new users, allowing you to start experimenting with their APIs at no cost. This credit is available during the first 3 months after you create your account; after that, pricing transitions to a pay-as-you-go model. For detailed rates, see the pricing page on the OpenAI website. At the time of writing, GPT-3.5 Turbo, the model used in this tutorial, was priced as follows:
GPT-3.5 Turbo model | Input price | Output price |
---|---|---|
gpt-3.5-turbo (4K context) | $0.0015 / 1K tokens | $0.002 / 1K tokens |
gpt-3.5-turbo-16k (16K context) | $0.003 / 1K tokens | $0.004 / 1K tokens |
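To see how the rates above translate into the cost of a single request, here is a minimal sketch. It assumes you already know, or have estimated, the number of input and output tokens. The `estimate_cost` helper and the example token counts are hypothetical, and the prices are hard-coded from the table, so they may be out of date.

```python
# Prices in USD per 1K tokens for GPT-3.5 Turbo, copied from the table above
# (July 2023). They may have changed since, so check OpenAI's pricing page.
PRICES_PER_1K = {
    "4k": {"input": 0.0015, "output": 0.002},
    "16k": {"input": 0.003, "output": 0.004},
}


def estimate_cost(input_tokens: int, output_tokens: int, context: str = "16k") -> float:
    """Return the estimated cost in USD of a single API request."""
    price = PRICES_PER_1K[context]
    return input_tokens / 1000 * price["input"] + output_tokens / 1000 * price["output"]


# Hypothetical request: about 8,000 input tokens (a transcript plus instructions)
# and a 500-token reply, sent to the 16K-context model.
cost = estimate_cost(8_000, 500)
print(f"Estimated cost: ${cost:.4f}")
```

For this hypothetical size, each request costs under three cents. Even if the transcript is sent three times (summary, steps, and quiz), the total stays below $0.10.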
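Once your Studio Lab account is ready, start the project runtime, open the project, and create a new notebook (this tutorial names it learn-with-ai.ipynb). Copy each code snippet below into a notebook cell, then select the cell and press Shift + Enter, or click the Play button at the top, to execute it. The first cell installs the required libraries: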
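```python
# Install the required libraries
!pip install python-dotenv
# The code in this tutorial uses the pre-1.0 openai interface (openai.ChatCompletion),
# which was removed in openai 1.0, so pin an older release
!pip install "openai<1.0"
!pip install youtube_dl
!pip install youtube_transcript_api
!pip install torchaudio
!pip install sentencepiece
!pip install sacremoses
!pip install transformers
```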
```python
# Import dependencies
import re
from youtube_transcript_api import YouTubeTranscriptApi
import torch
import torchaudio
import openai
import textwrap
from transformers import pipeline
```
Part 2 - Obtaining a YouTube video transcript

Set the `youtube_url` variable in the next cell to the URL of the video you want to learn from. To get a YouTube video URL, copy the URL up to the "&" sign, as shown in the screenshot below.

Note: I recommend starting with a video that is under 30 minutes. Longer videos take more time to process, so a shorter one lets you complete the tutorial more quickly.
```python
# Specify the YouTube video URL
youtube_url = "https://www.youtube.com/watch?v=b9rs8yzpGYk"

# Extract the video ID from the URL using a regular expression
match = re.search(r"v=([A-Za-z0-9_-]+)", youtube_url)
if match:
    video_id = match.group(1)
else:
    raise ValueError("Invalid YouTube URL")

# Get the transcript from YouTube
transcript = YouTubeTranscriptApi.get_transcript(video_id)

# Concatenate the transcript into a single string
transcript_text = ""
for segment in transcript:
    transcript_text += segment["text"] + " "

print(transcript_text)
```
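The regular expression in the cell above only finds IDs in URLs that carry a `v=` query parameter, so short links such as `youtu.be/...` fail. The sketch below is an alternative that uses the standard library's `urllib.parse`. It assumes the URL shapes listed in the comments are the ones you need; they are common but not exhaustive. `extract_video_id` is a hypothetical helper name. Because it parses the query string properly, you no longer need to trim the URL at the "&" sign.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> str:
    """Return the video ID from common YouTube URL shapes (not exhaustive)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    video_id = ""
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>?si=...
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>&t=42s
            video_id = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/")):
            # Shorts and embed links: https://www.youtube.com/shorts/<id>
            video_id = parsed.path.split("/")[2]
    if not video_id:
        raise ValueError(f"Could not find a video ID in {url!r}")
    return video_id


print(extract_video_id("https://www.youtube.com/watch?v=b9rs8yzpGYk&t=42s"))
```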
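The code uses the `YouTubeTranscriptApi.get_transcript(video_id)` method from the youtube_transcript_api library to retrieve the captions attached to the video. Depending on the video, these captions are either created by the uploader or generated automatically by YouTube, so their accuracy can vary.

Part 3 - Summarizing and translating a transcript using ML models

Next, translate the transcript with a pretrained open-source model from Hugging Face. This example uses Helsinki-NLP/opus-mt-en-es, which translates English to Spanish. To translate into a different language, replace the value of the `model_checkpoint` variable in the code below, for example with Helsinki-NLP/opus-mt-en-fr for French.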
```python
from transformers import pipeline
import textwrap

# Replace this with your own checkpoint
model_checkpoint = "Helsinki-NLP/opus-mt-en-es"
translator = pipeline("translation", model=model_checkpoint)

# Maximum segment length in characters (for English text, 512 characters stays
# comfortably within the model's 512-token input limit)
max_length = 512

# Split the input text into segments of at most max_length characters,
# breaking at whitespace so that words are not cut in half
segments = textwrap.wrap(transcript_text, max_length)

# Translate each segment and join the results with spaces
translated_segments = []
for segment in segments:
    result = translator(segment)
    translated_segments.append(result[0]['translation_text'])
translated_text = " ".join(translated_segments)

print(translated_text)
```
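Splitting the transcript into fixed-size pieces can cut sentences in half, and plain character slicing can cut words in half too. Either way, the translation model gets less context. If your transcript is punctuated (manually created captions usually are; auto-generated ones often are not), packing whole sentences into each segment may translate more naturally.

Here is a sketch of that approach. `chunk_by_sentences` is a hypothetical helper, and its sentence-splitting regex is a simple heuristic that will mis-split abbreviations such as "e.g.". Sentences longer than the limit fall back to word-boundary wrapping with `textwrap.wrap`.

```python
import re
import textwrap


def chunk_by_sentences(text: str, max_chars: int = 512) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    Sentences longer than max_chars (common in unpunctuated auto-generated
    captions) are split at word boundaries with textwrap.wrap instead.
    """
    # Split after ".", "!" or "?" followed by whitespace (a simple heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        pieces = textwrap.wrap(sentence, max_chars) if len(sentence) > max_chars else [sentence]
        for piece in pieces:
            if not current:
                current = piece
            elif len(current) + 1 + len(piece) <= max_chars:
                current = f"{current} {piece}"
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

To try it, replace the segment-splitting line in the translation cell with `segments = chunk_by_sentences(transcript_text, max_length)`.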
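Next, summarize the transcript with a pretrained summarization model. By applying the model to the transcript, we can generate a concise summary of the video's content. To summarize the translated text instead, replace the `transcript_text` variable with the `translated_text` variable. Note, however, that this model was fine-tuned on English text, so it is likely to work best on the original English transcript.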
```python
from transformers import pipeline, AutoTokenizer

# Instantiate the tokenizer and the summarization pipeline
tokenizer = AutoTokenizer.from_pretrained('stevhliu/my_awesome_billsum_model')
summarizer = pipeline("summarization", model='stevhliu/my_awesome_billsum_model', tokenizer=tokenizer)

# Define chunk size in number of words; keep chunks well under the model's
# 512-token input limit (lower this value if you see truncation warnings)
chunk_size = 200

# Split the text into chunks
words = transcript_text.split()
chunks = [' '.join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]

# Summarize each chunk
summaries = []
for chunk in chunks:
    # Summarize the chunk
    summary = summarizer(chunk, max_length=100, min_length=30, do_sample=False)
    # Extract the summary text
    summary_text = summary[0]['summary_text']
    # Add the summary to our list of summaries
    summaries.append(summary_text)

# Join the summaries back together into a single summary
final_summary = ' '.join(summaries)
print(final_summary)
```
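Joining the per-chunk summaries, as the cell above does, can still leave a long result for a long video. One common extension, not part of the original tutorial, is a map-reduce pattern: summarize the joined summaries again until the text fits in one chunk.

Here is a sketch of that pattern. It accepts any function that maps a chunk of text to a summary string. `summarize_long_text` and `first_ten_words` are hypothetical names, and the stand-in summarizer exists only so the example runs without downloading a model.

```python
from typing import Callable


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],
    chunk_words: int = 200,
    max_rounds: int = 3,
) -> str:
    """Summarize text of any length with a summarizer that only accepts short inputs.

    Map step: split the text into chunks of chunk_words words and summarize each.
    Reduce step: join the chunk summaries and repeat until the joined text fits
    in one chunk, then summarize it one last time. If max_rounds is reached
    first, the final input may still be longer than one chunk.
    """
    for _ in range(max_rounds):
        words = text.split()
        if len(words) <= chunk_words:
            break
        chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
        text = " ".join(summarize(chunk) for chunk in chunks)
    return summarize(text)


def first_ten_words(chunk: str) -> str:
    # Stand-in "summarizer" for demonstration: keeps the first 10 words.
    return " ".join(chunk.split()[:10])


demo_text = " ".join(f"w{i}" for i in range(1000))
print(summarize_long_text(demo_text, first_ten_words))

# With the Hugging Face pipeline from the cell above (assumes `summarizer` exists):
# final_summary = summarize_long_text(
#     transcript_text,
#     lambda chunk: summarizer(chunk, max_length=100, min_length=30, do_sample=False)[0]["summary_text"],
# )
```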
Part 4 - Extracting steps and creating a quiz using ChatGPT APIs

Now let's use the ChatGPT API to summarize the transcript. Paste the OpenAI API key you created in Part 1 into the `openai.api_key` variable in the code below.

Note: I recommend using the OpenAI Playground to further explore and experiment with the OpenAI API models. The Playground is a user-friendly, web-based tool for testing prompts and getting familiar with the API's functionality. It provides an interactive environment where you can refine your prompts and observe the model's responses.
```python
def split_text_into_chunks(text, max_chunk_size):
    # Split text into chunks of at most max_chunk_size characters, breaking at whitespace
    return textwrap.wrap(text, max_chunk_size)

openai.api_key = "provide your key here"

# 4,000 characters is roughly 1,000 tokens of English text
max_chunk_size = 4000
transcript_chunks = split_text_into_chunks(transcript_text, max_chunk_size)

# Summarize each chunk and collect the results
summaries = ""
for chunk in transcript_chunks:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-16k",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"{chunk}\n\nCreate short concise summary"}
        ],
        max_tokens=250,
        temperature=0.5
    )
    summaries += response['choices'][0]['message']['content'].strip() + " "

print("Summary:")
print(summaries)
```
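Hard-coding the key in the notebook, as above, makes it easy to leak if you share the notebook. The tutorial installs `python-dotenv` but never uses it.

Here is a sketch of one way to use it. It assumes you store the key in an environment variable named `OPENAI_API_KEY`, which is also the variable the `openai` library reads by default, or in a `.env` file in the notebook's working directory. `get_openai_api_key` is a hypothetical helper.

```python
import os


def get_openai_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the OpenAI API key from the environment instead of hard-coding it."""
    try:
        # Optional: python-dotenv loads KEY=value lines from a .env file in the
        # current directory into os.environ, without overriding variables that
        # are already set. Nothing happens if the file does not exist.
        from dotenv import load_dotenv
        load_dotenv(".env")
    except ImportError:
        pass  # Fall back to variables that are already set in the environment.
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable or add it to a .env file.")
    return key


# Usage in the notebook (assumes the openai package imported earlier):
# openai.api_key = get_openai_api_key()
```

A `.env` file for this would contain a single line such as `OPENAI_API_KEY=<your key>`. Keep that file out of anything you share.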
Each request sends a list of messages, and each message has a role. The `system` message provides the instructions or context that guide the model's behavior: it sets the overall behavior, tone, or role of the AI assistant, for example "You are a technical instructor that provides step-by-step guidance". This sets expectations for the model and guides how it should respond. The `user` messages carry the input from the user: your specific requests, questions, or instructions. For example, to turn the transcript into a tutorial, you might use a `user` prompt like "Generate steps to follow from the transcript text":
```python
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[
        {"role": "system", "content": "You are a technical instructor."},
        {"role": "user", "content": transcript_text},
        {"role": "user", "content": "Generate steps to follow from text."},
    ]
)

# The assistant's reply
guide = response['choices'][0]['message']['content']
print("Steps:")
print(guide)
```
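The steps cell above, and the quiz cell that follows, both send one `system` message and two `user` messages. Here is a sketch of a small hypothetical helper, `build_messages`, that builds that list so the two cells stay consistent.

It also takes an optional, assumed character budget (`MAX_TRANSCRIPT_CHARS`) as a crude guard against sending a transcript too long for the model's context window. The budget uses OpenAI's rule of thumb of about four characters per token for English text, so it is an estimate, not an exact token count.

```python
from typing import Dict, List, Optional


def build_messages(
    system_prompt: str,
    transcript: str,
    instruction: str,
    max_transcript_chars: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Build a Chat Completions messages list: a system message that sets the
    assistant's role, then the transcript and the request as user messages."""
    if max_transcript_chars is not None and len(transcript) > max_transcript_chars:
        # Crude guard against exceeding the context window: keep only the start.
        transcript = transcript[:max_transcript_chars]
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": transcript},
        {"role": "user", "content": instruction},
    ]


# Hypothetical budget: at roughly 4 characters per token, 48,000 characters is
# about 12,000 tokens, leaving room in a 16K context for the reply.
MAX_TRANSCRIPT_CHARS = 48_000

# Usage mirroring the steps cell above (assumes openai and transcript_text exist):
# messages = build_messages("You are a technical instructor.", transcript_text,
#                           "Generate steps to follow from text.", MAX_TRANSCRIPT_CHARS)
# response = openai.ChatCompletion.create(model="gpt-3.5-turbo-16k", messages=messages)
```

Truncating silently drops the end of long transcripts. For long videos, the chunk-and-summarize approach used earlier is the safer choice.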
Finally, to test your understanding, ask ChatGPT to generate a multiple-choice quiz based on the transcript:

```python
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that generates questions."},
        {"role": "user", "content": transcript_text},
        {"role": "user", "content": "Generate 10 quiz questions based on the text with multiple choices."},
    ]
)

# The assistant's reply
quiz_questions = response['choices'][0]['message']['content']
print("Quiz Questions:")
print(quiz_questions)
```
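The quiz comes back as free-form text, which is fine for reading but hard to reuse, for example to check answers programmatically. One option, not part of the original tutorial, is to ask for JSON and validate the reply.

Here is a sketch of that approach. `QUIZ_INSTRUCTION`, `parse_quiz`, and the JSON keys are all hypothetical choices, and the sample reply is made-up data. The model is not guaranteed to follow the requested format, so `parse_quiz` raises `ValueError` on malformed replies.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that models sometimes add

# Hypothetical instruction that asks the model for machine-readable output.
QUIZ_INSTRUCTION = (
    "Generate 10 multiple-choice quiz questions based on the text. "
    'Reply with only a JSON array of objects with the keys "question", '
    '"choices" (a list of strings), and "answer" (the correct choice).'
)


def parse_quiz(reply: str) -> list:
    """Parse a JSON quiz reply and check its structure; raise ValueError if malformed."""
    text = reply.strip()
    # Strip a surrounding Markdown code fence (optionally tagged "json") if present.
    if text.startswith(FENCE):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    quiz = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(quiz, list):
        raise ValueError("Expected a JSON array of questions")
    for item in quiz:
        if not isinstance(item, dict) or not all(k in item for k in ("question", "choices", "answer")):
            raise ValueError(f"Malformed question: {item!r}")
        if not isinstance(item["choices"], list) or item["answer"] not in item["choices"]:
            raise ValueError(f"Answer is not one of the choices: {item!r}")
    return quiz


# Made-up reply in the requested format, wrapped in a code fence, for demonstration only.
sample_reply = FENCE + "json\n" + json.dumps([{
    "question": "Which service provides the free ML environment in this tutorial?",
    "choices": ["Amazon SageMaker Studio Lab", "Amazon S3", "AWS Lambda"],
    "answer": "Amazon SageMaker Studio Lab",
}]) + "\n" + FENCE

for number, item in enumerate(parse_quiz(sample_reply), start=1):
    print(f"{number}. {item['question']}")
    for choice in item["choices"]:
        print(f"   - {choice}")
```

To use it, pass `QUIZ_INSTRUCTION` as the last `user` message in the quiz cell. Then call `parse_quiz(quiz_questions)` inside a `try`/`except ValueError` block, falling back to printing the raw text.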
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.