
Creating Training Videos from Articles: A Generative AI, AWS, and Python Approach
Dive into an innovative content creation process that harnesses the power of large language models (LLMs) on Amazon Bedrock to convert written articles into dynamic video presentations. This approach uses Amazon Polly for lifelike text-to-speech conversion, combined with Python libraries to generate the slide visuals and assemble the final video. The walkthrough below uses the following Python packages:
- Python 3.x
- boto3
- botocore
- requests
- beautifulsoup4
- python-pptx
- pymupdf (fitz)
- moviepy
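If these aren't already available in your environment, a typical install looks like the line below. This exact command isn't part of the original walkthrough; note that the pptx module is installed as python-pptx, and the moviepy.editor import used later exists in MoviePy 1.x, so pinning below 2.0 may be necessary.

pip install boto3 botocore requests beautifulsoup4 python-pptx pymupdf "moviepy<2"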
- Select an AWS Region that supports the Anthropic Claude 3.5 Sonnet model. This walkthrough uses us-west-2 (Oregon); check the documentation for model support by Region.
- Configure Amazon Bedrock model access for your account and Region (a quick access check is sketched after this list).
- An execution role for Amazon SageMaker Studio; for this demo it needs access to Amazon Bedrock (and to Amazon Polly for the audio step).
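As an optional sanity check (not part of the original walkthrough), you can confirm that the Claude 3.5 Sonnet model is visible to your account in the chosen Region. This uses the Bedrock control-plane client rather than bedrock-runtime:

import boto3

# List Anthropic foundation models visible to this account in us-west-2
bedrock = boto3.client("bedrock", region_name="us-west-2")
for summary in bedrock.list_foundation_models(byProvider="Anthropic")["modelSummaries"]:
    print(summary["modelId"])
# anthropic.claude-3-5-sonnet-20240620-v1:0 should appear once model access is granted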
import os
import json

import boto3
from botocore.exceptions import BotoCoreError, ClientError

bedrock_runtime = boto3.client("bedrock-runtime")
def invoke_model(user_message, model_id):
    conversation = [
        {
            "role": "user",
            "content": [{"text": user_message}],
        }
    ]
    complete_response = ""
    try:
        # Send the message to the model, using a basic inference configuration.
        streaming_response = bedrock_runtime.converse_stream(
            modelId=model_id,
            messages=conversation,
            inferenceConfig={"maxTokens": 2000, "temperature": 0.5, "topP": 0.9},
        )
        # Extract and print the streamed response text in real time.
        for chunk in streaming_response["stream"]:
            if "contentBlockDelta" in chunk:
                text = chunk["contentBlockDelta"]["delta"]["text"]
                complete_response += text
                print(text, end="")
        print()
        return complete_response
    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)
# Function to clean up and create a folder
def setup_folder(folder_path):
    if os.path.exists(folder_path):
        # Remove all files in the folder
        for filename in os.listdir(folder_path):
            file_path = os.path.join(folder_path, filename)
            try:
                if os.path.isfile(file_path):
                    os.unlink(file_path)
            except Exception as e:
                print(f"Error deleting {file_path}: {e}")
    else:
        # Create the folder if it doesn't exist
        os.makedirs(folder_path)
    print(f"Folder setup completed: {folder_path}")
# Set up the output folders
audio_folder = 'lecture_audio'
setup_folder(audio_folder)

image_folder = 'slide_images'
setup_folder(image_folder)
# Read the latest What's New post from the link provided
import requests
from bs4 import BeautifulSoup

def scrape_content(url):
    # Send a GET request to the URL
    response = requests.get(url)
    # Initialize an empty string to store the full output
    full_output = ""
    # Check if the request was successful
    if response.status_code == 200:
        # Parse the HTML content
        soup = BeautifulSoup(response.text, 'html.parser')
        # Find all elements with class "wn-title"
        titles = soup.find_all(class_='wn-title')
        # Find all elements with class "wn-body"
        bodies = soup.find_all(class_='wn-body')

        # Helper to extract text and links from an element
        def extract_content(element):
            content = []
            for child in element.descendants:
                if child.name == 'a' and child.has_attr('href'):
                    content.append(f"{child.text.strip()} ({child['href']})")
                elif isinstance(child, str) and child.strip():
                    content.append(child.strip())
            return ' '.join(content)

        # Process and store the results
        full_output += "Titles:\n"
        for title in titles:
            title_content = extract_content(title)
            full_output += f"{title_content}\n"

        full_output += "\nBodies:\n"
        for body in bodies:
            body_content = extract_content(body)
            full_output += f"{body_content}\n---\n"
    else:
        error_message = f"Failed to retrieve the webpage. Status code: {response.status_code}"
        full_output += error_message
        print(error_message)
    return full_output

url = "https://aws.amazon.com/about-aws/whats-new/2024/09/llama-3-2-generative-ai-models-amazon-bedrock/"  # Replace with your target URL
result = scrape_content(url)
print(result)
Titles:
Llama 3.2 generative AI models now available in Amazon Bedrock

Bodies:
Posted on: Sep 25, 2024
---
The Llama 3.2 collection of models are now available in Amazon Bedrock (https://aws.amazon.com/bedrock/) Amazon Bedrock . Llama 3.2 represents Meta’s latest advancement in large language models (LLMs). Llama 3.2 models are offered in various sizes, from small and medium-sized multimodal models, 11B and 90B parameter models, capable of sophisticated reasoning tasks including multimodal support for high resolution images to lightweight text-only 1B and 3B parameter models suitable for edge devices. Llama 3.2 is the first Llama model to support vision tasks, with a new model architecture that integrates image encoder representations into the language model. In addition to the existing text capable Llama 3.1 8B, 70B, and 405B models, Llama 3.2 supports multimodal use cases. You can now use four new Llama 3.2 models — 90B, 11B, 3B, and 1B — from Meta in Amazon Bedrock to unlock the next generation of AI possibilities. With a focus on responsible innovation and system-level safety, Llama 3.2 models help you build and deploy cutting-edge generative AI models and applications, leveraging Llama in Amazon Bedrock to ignite new innovations like image reasoning and are also more accessible for on edge applications. The new models are also designed to be more efficient for AI workloads, with reduced latency and improved performance, making them suitable for a wide range of applications. Meta’s Llama 3.2 90B and 11B models are available in Amazon Bedrock in the US West (Oregon) Region, and in the US East (Ohio, N. Virginia) Regions via cross-region inference (https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference-support.html) cross-region inference . Llama 3.2 1B and 3B models are available in the US West (Oregon) and Europe (Frankfurt) Regions, and in the US East (Ohio, N. Virginia) and Europe (Ireland, Paris) Regions via cross-region inference. To learn more, read the launch blog (https://aws.amazon.com/blogs/aws/introducing-llama-3-2-models-from-meta-in-amazon-bedrock-a-new-generation-of-multimodal-vision-and-lightweight-models) launch blog , Llama product page (https://aws.amazon.com/bedrock/llama/) Llama product page , and documentation (https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html) documentation . To get started with Llama 3.2 in Amazon Bedrock, visit the Amazon Bedrock console Amazon Bedrock console .
---
article = result
number_of_slides=2
# language = spanish | english | hindi
language = 'hindi'

if language == 'spanish':
    voice_id = 'Mia'
    language_code = 'es-MX'
elif language == 'hindi':
    voice_id = 'Kajal'
    language_code = 'en-IN'
else:
    voice_id = 'Joanna'
    language_code = 'en-US'
prompt = f"""
You are a video instructor who needs to create a lecture to teach a class of technical employees for training.
Create a {number_of_slides}-slide lecture based on the following article and translate it to {language}.
<article>
{article}
</article>
Each slide should follow these instructions:
<instructions>
1. title: a single sentence that summarizes the main point
2. key_points: a list of between 2 and 5 bullet points. Use phrases or code snippets, not full sentences.
3. lecture_notes: 3-5 sentences explaining the key points in easy-to-understand language. Expand on the points using other information from the article. If the bullet point is code, explain what the code does.
4. Add two extra slides: a first slide with just the title of the article, a welcome message, and what you will cover, and a last slide as a thank-you outro.
5. Translate the numbers in title, key_points and lecture_notes into the required language as well.
6. Output should be only a JSON array with no preamble.
7. Add Amazon Polly SSML tags in all lecture_notes where you think they fit, such as <break> for a pause, <s> for a pause between sentences, and <p> for a pause between paragraphs. Add a 2-second pause towards the end of each lecture_notes. Example:
<example>
<speak>Mary had a little lamb <break time="3s"/>Whose fleece was white as snow.</speak>
</example>
</instructions>
"""
response = invoke_model(prompt, "anthropic.claude-3-5-sonnet-20240620-v1:0")
[
  {
    "title": "Llama ३.२ जनरेटिव AI मॉडल्स अब Amazon Bedrock में उपलब्ध",
    "key_points": [
      "स्वागत संदेश",
      "कवर किए जाने वाले विषय"
    ],
    "lecture_notes": "<speak>नमस्कार और इस व्याख्यान में आपका स्वागत है। <break time=\"1s\"/> आज हम Amazon Bedrock में नए Llama ३.२ जनरेटिव AI मॉडल्स के बारे में चर्चा करेंगे। <s>हम इन मॉडल्स की विशेषताओं, उपलब्धता और उनके संभावित अनुप्रयोगों पर ध्यान केंद्रित करेंगे।</s> <p>चलिए शुरू करते हैं!</p> <break time=\"2s\"/></speak>"
  },
  {
    "title": "Llama ३.२ मॉडल्स की विशेषताएं और क्षमताएं",
    "key_points": [
      "विभिन्न आकार: १B से ९०B पैरामीटर",
      "मल्टीमोडल समर्थन",
      "उच्च रिज़ॉल्यूशन छवि प्रसंस्करण",
      "एज डिवाइस के लिए उपयुक्त"
    ],
    "lecture_notes": "<speak>Llama ३.२ मॉडल्स विभिन्न आकारों में आते हैं, <break time=\"0.5s\"/> १ बिलियन से लेकर ९० बिलियन पैरामीटर तक। <s>ये मॉडल मल्टीमोडल कार्यों का समर्थन करते हैं, जिसमें उच्च रिज़ॉल्यूशन छवियों का प्रसंस्करण शामिल है।</s> <p>छोटे मॉडल्स एज डिवाइस पर भी चल सकते हैं, जो इन्हें विभिन्न अनुप्रयोगों के लिए उपयुक्त बनाता है।</p> <break time=\"2s\"/></speak>"
  },
  {
    "title": "Amazon Bedrock में Llama ३.२ की उपलब्धता",
    "key_points": [
      "US West (Oregon), US East (Ohio, N. Virginia)",
      "Europe (Frankfurt, Ireland, Paris)",
      "क्रॉस-रीजन इन्फरेंस",
      "Amazon Bedrock कंसोल पर उपलब्ध"
    ],
    "lecture_notes": "<speak>Llama ३.२ मॉडल्स अब Amazon Bedrock पर विभिन्न क्षेत्रों में उपलब्ध हैं। <break time=\"0.5s\"/> ये US West और US East के साथ-साथ यूरोप के कुछ क्षेत्रों में भी मिल सकते हैं। <s>क्रॉस-रीजन इन्फरेंस की सुविधा भी उपलब्ध है।</s> <p>आप इन मॉडल्स का उपयोग करने के लिए Amazon Bedrock कंसोल पर जा सकते हैं।</p> <break time=\"2s\"/></speak>"
  },
  {
    "title": "धन्यवाद और समापन",
    "key_points": [
      "व्याख्यान का सारांश",
      "अतिरिक्त संसाधन"
    ],
    "lecture_notes": "<speak>आज के व्याख्यान के लिए धन्यवाद। <break time=\"1s\"/> हमने Llama ३.२ मॉडल्स की विशेषताओं और Amazon Bedrock में उनकी उपलब्धता पर चर्चा की। <s>अधिक जानकारी के लिए, कृपया लॉन्च ब्लॉग, Llama प्रोडक्ट पेज और दस्तावेज़ीकरण देखें।</s> <p>अपने AI प्रोजेक्ट्स में इन नए मॉडल्स का लाभ उठाएं!</p> <break time=\"2s\"/></speak>"
  }
]
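The slide-deck code below iterates over lecture_json, which the listing above never defines. A minimal parsing step, assuming the model honored the "JSON array only, no preamble" instruction, looks like this; if the model wraps the array in extra text, you may need to strip that text before parsing.

import json

# Parse the streamed model output into a list of slide definitions
lecture_json = json.loads(response)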
# PPT creation process
from pptx import Presentation
from pptx.util import Pt, Inches
from pptx.dml.color import RGBColor

presentation = Presentation()
print(len(lecture_json))
total_slide_count = len(lecture_json)
slide_count = 0

for lecture in lecture_json:
    slide_count = slide_count + 1
    if slide_count == 1:
        slide = presentation.slides.add_slide(presentation.slide_layouts[0])
    elif slide_count == total_slide_count:
        slide = presentation.slides.add_slide(presentation.slide_layouts[0])
    else:
        slide = presentation.slides.add_slide(presentation.slide_layouts[1])

    # Add a light blue background
    background = slide.background
    fill = background.fill
    fill.solid()
    fill.fore_color.rgb = RGBColor(230, 240, 250)  # Light blue color

    # Set the slide title
    title = slide.shapes.title
    title.text = lecture['title']
    title_font = title.text_frame.paragraphs[0].font
    title_font.name = 'Arial'
    title_font.size = Pt(32)
    title_font.color.rgb = RGBColor(0, 0, 0)  # Black color

    # Add key points as bullet points
    textframe = slide.placeholders[1].text_frame
    for key_point in lecture['key_points']:
        p = textframe.add_paragraph()
        p.text = key_point
        p.level = 1
        p_font = p.font
        p_font.name = 'Arial'
        p_font.size = Pt(18)
        p_font.color.rgb = RGBColor(0, 0, 0)  # Black color

    # Add lecture notes to the notes section of the slide
    notes_frame = slide.notes_slide.notes_text_frame
    notes_frame.text = lecture['lecture_notes']
    notes_font = notes_frame.paragraphs[0].font
    notes_font.name = 'Arial'
    notes_font.size = Pt(12)

presentation.save('lecture.pptx')
print("lecture.pptx successfully created")
- Take each of the "lecture_notes" entries from the JSON in order and convert them to audio using Amazon Polly
- Take each of the PDF pages and convert it to an image (the deck first has to be exported from PPTX to PDF; one option is sketched below)
- Then, using the Python library moviepy, stitch everything together.
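The deck was saved as lecture.pptx, but the image-extraction step below reads lecture.pdf, and the original walkthrough doesn't show that conversion. One way to bridge the gap, assuming LibreOffice is installed on the instance (the soffice binary on PATH is an assumption, not part of the original), is a headless export:

import subprocess

# Convert lecture.pptx to lecture.pdf with LibreOffice in headless mode
# (assumes the soffice binary is available on this machine)
subprocess.run(
    ["soffice", "--headless", "--convert-to", "pdf", "lecture.pptx", "--outdir", "."],
    check=True,
)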
# Start the Polly process
from contextlib import closing

# Initialize a session using Amazon Polly
polly_client = boto3.Session().client('polly')

def generate_audio(text, output_filename):
    try:
        # Request speech synthesis
        response = polly_client.synthesize_speech(
            Engine='neural',
            Text=text,
            OutputFormat='mp3',
            VoiceId=voice_id,  # You can change the voice as needed
            TextType='ssml',
            LanguageCode=language_code
        )
    except (BotoCoreError, ClientError) as error:
        print(f"Error generating audio for {output_filename}: {error}")
        return False

    # Access the audio stream from the response
    if "AudioStream" in response:
        with closing(response["AudioStream"]) as stream:
            try:
                # Write the audio to a file
                with open(output_filename, "wb") as file:
                    file.write(stream.read())
            except IOError as error:
                print(f"Error writing audio to file {output_filename}: {error}")
                return False
    else:
        print(f"Could not generate audio for {output_filename}")
        return False

    print(f"Successfully created audio file: {output_filename}")
    return True

# Create audio files for each lecture
for index, lecture in enumerate(lecture_json, start=1):
    lecture_notes = lecture['lecture_notes']
    output_filename = os.path.join(audio_folder, f"lecture_audio_{index:02d}.mp3")
    success = generate_audio(lecture_notes, output_filename)
    if not success:
        print(f"Failed to generate audio for lecture {index}")

print("Audio generation process completed.")
Successfully created audio file: lecture_audio/lecture_audio_01.mp3
Successfully created audio file: lecture_audio/lecture_audio_02.mp3
Successfully created audio file: lecture_audio/lecture_audio_03.mp3
Successfully created audio file: lecture_audio/lecture_audio_04.mp3
Audio generation process completed.
import fitz  # PyMuPDF
import os

def pdf_to_images(pdf_path, output_folder='slide_images', zoom=2):
    # Create output folder if it doesn't exist
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)
    # Open the PDF
    pdf = fitz.open(pdf_path)
    # Iterate over each page
    for page_num in range(len(pdf)):
        page = pdf[page_num]
        # Render page to an image
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        # Save the image with the correct naming convention
        image_path = os.path.join(output_folder, f'slide_{page_num+1:02d}.png')
        pix.save(image_path)
        print(f'Saved slide {page_num+1} as {image_path}')
    print(f'Converted {len(pdf)} slides to images in {output_folder}')
    # Close the PDF
    pdf.close()

# Usage
pdf_path = 'lecture.pdf'  # Replace with your PDF file path
pdf_to_images(pdf_path)
Saved slide 1 as slide_images/slide_01.png
Saved slide 2 as slide_images/slide_02.png
Saved slide 3 as slide_images/slide_03.png
Saved slide 4 as slide_images/slide_04.png
Converted 4 slides to images in slide_images
import os
from moviepy.editor import *

def create_lecture_video(slide_folder, audio_folder, output_file, fps=24):
    slide_files = sorted([f for f in os.listdir(slide_folder) if f.endswith('.png')])
    audio_files = sorted([f for f in os.listdir(audio_folder) if f.endswith('.mp3')])

    clips = []
    for slide, audio in zip(slide_files, audio_files):
        audio_clip = AudioFileClip(os.path.join(audio_folder, audio))
        slide_clip = ImageClip(os.path.join(slide_folder, slide)).set_duration(audio_clip.duration)
        slide_clip = slide_clip.set_audio(audio_clip)
        clips.append(slide_clip)

    full_video = concatenate_videoclips(clips)
    print(f"Video duration: {full_video.duration} seconds")
    print(f"Video fps: {fps}")

    # Set the fps for the full video
    full_video = full_video.set_fps(fps)

    # Write the final video
    print("Writing final video...")
    full_video.write_videofile(
        output_file,
        fps=fps,
        codec="libx264",
        audio_codec="aac",
        temp_audiofile="temp_audio.m4a",
        remove_temp=True
    )
    print(f"Video created successfully: {output_file}")

# Usage
slide_folder = 'slide_images'
audio_folder = 'lecture_audio'
output_file = 'lecture.mp4'
create_lecture_video(slide_folder, audio_folder, output_file)
Video duration: 85.91 seconds
Video fps: 24
Writing final video...
Moviepy - Building video lecture.mp4.
MoviePy - Writing audio in temp_audio.m4a
MoviePy - Done.
Moviepy - Writing video lecture.mp4
Moviepy - Done !
Moviepy - video ready lecture.mp4
Video created successfully: lecture.mp4
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.