
Creating Training Videos from Articles: Generative AI, AWS, and Python Approach

Dive into an innovative content creation process that harnesses the power of LLMs on Amazon Bedrock to convert written articles into dynamic video presentations. This approach uses Amazon Polly for lifelike text-to-speech conversion, combined with Python libraries for generating the visuals of the video.

Nitin Eusebius
Amazon Employee
Published Oct 3, 2024
Last Modified Nov 4, 2024
Today we will look at an art-of-the-possible demo that leverages generative AI to transform articles into video presentations.
This content creation workflow uses Amazon Bedrock's large language models (LLMs) to transform written articles into lecture notes, and then uses services like Amazon Polly and Python to create engaging video presentations. The process integrates Amazon Polly for natural-sounding text-to-speech conversion and Python libraries for visual content generation, producing a dynamic video from textual input.
Note: This is demo code for illustrative purposes only. Not intended for production use.

Prerequisites

Libraries
  • Python 3.x
  • boto3
  • botocore
  • requests
  • beautifulsoup4
  • pptx (python-pptx)
  • pymupdf
  • moviepy
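If you need to install these, a single pip command along the following lines should work. Note that the pptx package is published on PyPI as python-pptx, and the demo's "from moviepy.editor import *" import assumes the MoviePy 1.x layout, so you may need to pin moviepy below 2.0.

pip install boto3 botocore requests beautifulsoup4 python-pptx pymupdf "moviepy<2"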
Setup
  • Select an AWS Region that supports the Anthropic Claude 3.5 Sonnet model. I'm using us-west-2 (Oregon). You can check the documentation for model support by Region.
  • Configure Amazon Bedrock model access for your account and Region. Example here
  • An execution role for Amazon SageMaker Studio with access to Amazon Bedrock, since this demo calls Bedrock from a Studio notebook (a quick access check is sketched below).
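As an optional sanity check (assuming your AWS credentials and Region are already configured, and that your role has Bedrock permissions), you can list the Anthropic models visible to your account with the Bedrock control-plane client:

import boto3

# Optional check: list Anthropic foundation models available in your Region
bedrock = boto3.client("bedrock", region_name="us-west-2")
for model in bedrock.list_foundation_models(byProvider="Anthropic")["modelSummaries"]:
    print(model["modelId"])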
We will first import boto3 and the other required libraries and initialize the Bedrock Runtime client.
import os
import boto3
import json
from botocore.exceptions import BotoCoreError, ClientError

bedrock_runtime = boto3.client("bedrock-runtime")
Now we will define the function that calls the LLM. It uses the Converse API (streaming variant) and accumulates the streamed output into a single string, which is returned so it can be used later in the pipeline.
def invoke_model(user_message, model_id):
    conversation = [
        {
            "role": "user",
            "content": [{"text": user_message}],
        }
    ]

    complete_response = ""

    try:
        # Send the message to the model, using a basic inference configuration.
        streaming_response = bedrock_runtime.converse_stream(
            modelId=model_id,
            messages=conversation,
            inferenceConfig={"maxTokens": 2000, "temperature": 0.5, "topP": 0.9},
        )

        # Extract and print the streamed response text in real-time.
        for chunk in streaming_response["stream"]:
            if "contentBlockDelta" in chunk:
                text = chunk["contentBlockDelta"]["delta"]["text"]
                complete_response += text
                print(text, end="")
        print()

        return complete_response

    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)
We will now create a helper function to set up the folders used in this demo.
# Function to clean up and create a folder
def setup_folder(folder_path):
    if os.path.exists(folder_path):
        # Remove all files in the folder
        for filename in os.listdir(folder_path):
            file_path = os.path.join(folder_path, filename)
            try:
                if os.path.isfile(file_path):
                    os.unlink(file_path)
            except Exception as e:
                print(f"Error deleting {file_path}: {e}")
    else:
        # Create the folder if it doesn't exist
        os.makedirs(folder_path)
    print(f"Folder setup completed: {folder_path}")
Now set up the audio and slide image folders.
# Set up the folders
audio_folder = 'lecture_audio'
setup_folder(audio_folder)

images_folder = 'slide_images'
setup_folder(images_folder)
Next, for the purpose of the demo, we will create a function that reads the content of an AWS What's New post from a provided URL and prepares the article text for our training video.
# Read from the latest What's New post based on the link provided
import requests
from bs4 import BeautifulSoup

def scrape_content(url):
    # Send a GET request to the URL
    response = requests.get(url)

    # Initialize an empty string to store the full output
    full_output = ""

    # Check if the request was successful
    if response.status_code == 200:
        # Parse the HTML content
        soup = BeautifulSoup(response.text, 'html.parser')

        # Find all elements with class "wn-title"
        titles = soup.find_all(class_='wn-title')

        # Find all elements with class "wn-body"
        bodies = soup.find_all(class_='wn-body')

        # Function to extract text and links from an element
        def extract_content(element):
            content = []
            for child in element.descendants:
                if child.name == 'a' and child.has_attr('href'):
                    content.append(f"{child.text.strip()} ({child['href']})")
                elif isinstance(child, str) and child.strip():
                    content.append(child.strip())
            return ' '.join(content)

        # Process and store the results
        full_output += "Titles:\n"
        for title in titles:
            title_content = extract_content(title)
            full_output += f"{title_content}\n"

        full_output += "\nBodies:\n"
        for body in bodies:
            body_content = extract_content(body)
            full_output += f"{body_content}\n---\n"
    else:
        error_message = f"Failed to retrieve the webpage. Status code: {response.status_code}"
        full_output += error_message
        print(error_message)

    return full_output

url = "https://aws.amazon.com/about-aws/whats-new/2024/09/llama-3-2-generative-ai-models-amazon-bedrock/"  # Replace with your target URL
result = scrape_content(url)
print(result)
You will see the following output; this will be used as context for the prompt.
Titles:
Llama 3.2 generative AI models now available in Amazon Bedrock

Bodies:
Posted on: Sep 25, 2024
---
The Llama 3.2 collection of models are now available in Amazon Bedrock (https://aws.amazon.com/bedrock/) Amazon Bedrock . Llama 3.2 represents Meta’s latest advancement in large language models (LLMs). Llama 3.2 models are offered in various sizes, from small and medium-sized multimodal models, 11B and 90B parameter models, capable of sophisticated reasoning tasks including multimodal support for high resolution images to lightweight text-only 1B and 3B parameter models suitable for edge devices. Llama 3.2 is the first Llama model to support vision tasks, with a new model architecture that integrates image encoder representations into the language model. In addition to the existing text capable Llama 3.1 8B, 70B, and 405B models, Llama 3.2 supports multimodal use cases. You can now use four new Llama 3.2 models — 90B, 11B, 3B, and 1B — from Meta in Amazon Bedrock to unlock the next generation of AI possibilities. With a focus on responsible innovation and system-level safety, Llama 3.2 models help you build and deploy cutting-edge generative AI models and applications, leveraging Llama in Amazon Bedrock to ignite new innovations like image reasoning and are also more accessible for on edge applications. The new models are also designed to be more efficient for AI workloads, with reduced latency and improved performance, making them suitable for a wide range of applications. Meta’s Llama 3.2 90B and 11B models are available in Amazon Bedrock in the US West (Oregon) Region, and in the US East (Ohio, N. Virginia) Regions via cross-region inference (https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference-support.html) cross-region inference . Llama 3.2 1B and 3B models are available in the US West (Oregon) and Europe (Frankfurt) Regions, and in the US East (Ohio, N. Virginia) and Europe (Ireland, Paris) Regions via cross-region inference. To learn more, read the launch blog (https://aws.amazon.com/blogs/aws/introducing-llama-3-2-models-from-meta-in-amazon-bedrock-a-new-generation-of-multimodal-vision-and-lightweight-models) launch blog , Llama product page (https://aws.amazon.com/bedrock/llama/) Llama product page , and documentation (https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html) documentation . To get started with Llama 3.2 in Amazon Bedrock, visit the Amazon Bedrock console Amazon Bedrock console .
---
Now we will set the article from the result above and specify how many slides we want to create. This will be used to create the final video.
article = result
number_of_slides=2
Now we will select the language and set the corresponding voice ID and language code for Amazon Polly. The values below are for neural-engine voices.
# language = spanish | english | hindi

language = 'hindi'

if language == 'spanish':
    voice_id = 'Mia'
    language_code = 'es-MX'
elif language == 'hindi':
    voice_id = 'Kajal'
    language_code = 'en-IN'
else:
    voice_id = 'Joanna'
    language_code = 'en-US'
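If you want to confirm that the chosen voice supports the neural engine for the selected language code, you can optionally query Polly's voice catalog. This is just a sanity check and is not required for the demo:

import boto3

# Optional: list the neural voices Polly reports for the selected language code
polly = boto3.client("polly")
voices = polly.describe_voices(Engine="neural", LanguageCode=language_code)["Voices"]
print([voice["Id"] for voice in voices])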
Now let's look at the prompt. This prompt instructs the LLM to create comprehensive video lecture content for technical employee training. It takes the input article as context and transforms it into structured presentation content with a specified number of slides, translating the content into the designated language. Each slide is organized with a concise title, key bullet points, and detailed lecture notes. The LLM enhances the presentation by adding an introductory slide and a concluding thank-you slide. To facilitate text-to-speech conversion with Amazon Polly, the prompt directs the LLM to incorporate SSML tags for natural-sounding narration. The final output is formatted as a JSON array, ready for further processing in the video creation pipeline.
Note: You can always change this based on your requirements; this prompt is for illustration and art-of-the-possible purposes only.
prompt = f"""
You are a video instructor who needs to create a lecture to teach a class of technical employees for training.

Create a {number_of_slides} slides lecture based on the following article and translate to {language}.
<article>
{article}
</article>
Each slide should contain the following instructions:
<instructions>
1. title: a single sentence that summarizes the main point
2. key_points: a list of between 2 and 5 bullet points. Use phrases or code snippets, not full sentences.
3. lecture_notes: 3-5 sentences explaining the key points in easy-to-understand language. Expand on the points using other information from the article. If the bullet point is code, explain what the code does.
4. Add one extra slide, the first with just title of the article and with welcome message and what you will cover and last slide as a thank you outro slide
5. Translate the numbers in title, key_points and lecture_notes into the required language also
6. Output should be only in json array format with no preamble
7. Please add amazon polly SSML tags in all lecture_notes where you think is fit like Adding <break> - adding a pause, <s> - adding pause between sentence and <p> - adding pause between paragraphs and max 1-2 words only. Add a 2 sec pause towards the end of each lecture notes. Example
<example>
<speak>Mary had a little lamb <break time="3s"/>Whose fleece was white as snow.</speak>
</example>
</instructions>
"""
Now we will invoke our model and see the response. We are using Claude 3.5 Sonnet for this demo.
response = invoke_model(prompt, "anthropic.claude-3-5-sonnet-20240620-v1:0")
As you can see, the response is in Hindi, the selected language for our demo. It also includes Amazon Polly SSML tags. You can always test other languages in this demo.
[
{
"title": "Llama ३.२ जनरेटिव AI मॉडल्स अब Amazon Bedrock में उपलब्ध",
"key_points": [
"स्वागत संदेश",
"कवर किए जाने वाले विषय"
],
"lecture_notes": "<speak>नमस्कार और इस व्याख्यान में आपका स्वागत है। <break time=\"1s\"/> आज हम Amazon Bedrock में नए Llama ३.२ जनरेटिव AI मॉडल्स के बारे में चर्चा करेंगे। <s>हम इन मॉडल्स की विशेषताओं, उपलब्धता और उनके संभावित अनुप्रयोगों पर ध्यान केंद्रित करेंगे।</s> <p>चलिए शुरू करते हैं!</p> <break time=\"2s\"/></speak>"
},
{
"title": "Llama ३.२ मॉडल्स की विशेषताएं और क्षमताएं",
"key_points": [
"विभिन्न आकार: १B से ९०B पैरामीटर",
"मल्टीमोडल समर्थन",
"उच्च रिज़ॉल्यूशन छवि प्रसंस्करण",
"एज डिवाइस के लिए उपयुक्त"
],
"lecture_notes": "<speak>Llama ३.२ मॉडल्स विभिन्न आकारों में आते हैं, <break time=\"0.5s\"/> १ बिलियन से लेकर ९० बिलियन पैरामीटर तक। <s>ये मॉडल मल्टीमोडल कार्यों का समर्थन करते हैं, जिसमें उच्च रिज़ॉल्यूशन छवियों का प्रसंस्करण शामिल है।</s> <p>छोटे मॉडल्स एज डिवाइस पर भी चल सकते हैं, जो इन्हें विभिन्न अनुप्रयोगों के लिए उपयुक्त बनाता है।</p> <break time=\"2s\"/></speak>"
},
{
"title": "Amazon Bedrock में Llama ३.२ की उपलब्धता",
"key_points": [
"US West (Oregon), US East (Ohio, N. Virginia)",
"Europe (Frankfurt, Ireland, Paris)",
"क्रॉस-रीजन इन्फरेंस",
"Amazon Bedrock कंसोल पर उपलब्ध"
],
"lecture_notes": "<speak>Llama ३.२ मॉडल्स अब Amazon Bedrock पर विभिन्न क्षेत्रों में उपलब्ध हैं। <break time=\"0.5s\"/> ये US West और US East के साथ-साथ यूरोप के कुछ क्षेत्रों में भी मिल सकते हैं। <s>क्रॉस-रीजन इन्फरेंस की सुविधा भी उपलब्ध है।</s> <p>आप इन मॉडल्स का उपयोग करने के लिए Amazon Bedrock कंसोल पर जा सकते हैं।</p> <break time=\"2s\"/></speak>"
},
{
"title": "धन्यवाद और समापन",
"key_points": [
"व्याख्यान का सारांश",
"अतिरिक्त संसाधन"
],
"lecture_notes": "<speak>आज के व्याख्यान के लिए धन्यवाद। <break time=\"1s\"/> हमने Llama ३.२ मॉडल्स की विशेषताओं और Amazon Bedrock में उनकी उपलब्धता पर चर्चा की। <s>अधिक जानकारी के लिए, कृपया लॉन्च ब्लॉग, Llama प्रोडक्ट पेज और दस्तावेज़ीकरण देखें।</s> <p>अपने AI प्रोजेक्ट्स में इन नए मॉडल्स का लाभ उठाएं!</p> <break time=\"2s\"/></speak>"
}
]
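The PPT creation step below iterates over a lecture_json list, while invoke_model returns the response as a string. A minimal parsing step, assuming the model followed the instruction to return a JSON array with no preamble, is:

# Parse the model's JSON array response into a Python list of slide dictionaries
lecture_json = json.loads(response)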
Now we will start the process of creating a PPT with the content above. It iterates over the JSON response. You can also use this pptx directly for your own presentations, change the theme, and so on.
# PPT creation process
from pptx import Presentation
from pptx.util import Pt, Inches
from pptx.dml.color import RGBColor

presentation = Presentation()

print(len(lecture_json))
total_slide_count = len(lecture_json)
slide_count = 0
for lecture in lecture_json:
    slide_count = slide_count + 1
    if slide_count == 1:
        slide = presentation.slides.add_slide(presentation.slide_layouts[0])
    elif slide_count == total_slide_count:
        slide = presentation.slides.add_slide(presentation.slide_layouts[0])
    else:
        slide = presentation.slides.add_slide(presentation.slide_layouts[1])

    # Add a light blue background
    background = slide.background
    fill = background.fill
    fill.solid()
    fill.fore_color.rgb = RGBColor(230, 240, 250)  # Light blue color

    # Set the slide title
    title = slide.shapes.title
    title.text = lecture['title']
    title_font = title.text_frame.paragraphs[0].font
    title_font.name = 'Arial'
    title_font.size = Pt(32)
    title_font.color.rgb = RGBColor(0, 0, 0)  # Black color

    # Add key points as bullet points
    textframe = slide.placeholders[1].text_frame
    for key_point in lecture['key_points']:
        p = textframe.add_paragraph()
        p.text = key_point
        p.level = 1
        p_font = p.font
        p_font.name = 'Arial'
        p_font.size = Pt(18)
        p_font.color.rgb = RGBColor(0, 0, 0)  # Black color

    # Add lecture notes to the notes section of the slide
    notes_frame = slide.notes_slide.notes_text_frame
    notes_frame.text = lecture['lecture_notes']
    notes_font = notes_frame.paragraphs[0].font
    notes_font.name = 'Arial'
    notes_font.size = Pt(12)

presentation.save('lecture.pptx')
print("lecture.pptx successfully created")
Note: Since I was on a Mac, I was not able to convert the pptx to PDF reliably using Python. For this step, download the pptx, export it manually as lecture.pdf, and then move it to the same location as your code.
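If you prefer to script this conversion instead, one option (not part of the original demo, and assuming LibreOffice is installed with soffice on your PATH) is to run its headless converter from Python:

import subprocess

# Convert lecture.pptx to lecture.pdf using LibreOffice in headless mode
subprocess.run(
    ["soffice", "--headless", "--convert-to", "pdf", "lecture.pptx", "--outdir", "."],
    check=True,
)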
We will now do the following:
  1. Take each of the "lecture_notes" entries from the JSON, in order, and convert them to audio using Amazon Polly
  2. Take each page of the PDF and convert it to an image
  3. Then, using the Python library moviepy, stitch everything together.
Let's start the Amazon Polly process.
# Start the Amazon Polly process
from contextlib import closing

# Initialize a session using Amazon Polly
polly_client = boto3.Session().client('polly')

def generate_audio(text, output_filename):
    try:
        # Request speech synthesis
        response = polly_client.synthesize_speech(
            Engine='neural',
            Text=text,
            OutputFormat='mp3',
            VoiceId=voice_id,  # You can change the voice as needed
            TextType='ssml',
            LanguageCode=language_code
        )
    except (BotoCoreError, ClientError) as error:
        print(f"Error generating audio for {output_filename}: {error}")
        return False

    # Access the audio stream from the response
    if "AudioStream" in response:
        with closing(response["AudioStream"]) as stream:
            try:
                # Write the audio to a file
                with open(output_filename, "wb") as file:
                    file.write(stream.read())
            except IOError as error:
                print(f"Error writing audio to file {output_filename}: {error}")
                return False
    else:
        print(f"Could not generate audio for {output_filename}")
        return False

    print(f"Successfully created audio file: {output_filename}")
    return True

# Create audio files for each lecture
for index, lecture in enumerate(lecture_json, start=1):
    lecture_notes = lecture['lecture_notes']
    output_filename = os.path.join(audio_folder, f"lecture_audio_{index:02d}.mp3")

    success = generate_audio(lecture_notes, output_filename)

    if not success:
        print(f"Failed to generate audio for lecture {index}")

print("Audio generation process completed.")
Successfully created audio file: lecture_audio/lecture_audio_01.mp3
Successfully created audio file: lecture_audio/lecture_audio_02.mp3
Successfully created audio file: lecture_audio/lecture_audio_03.mp3
Successfully created audio file: lecture_audio/lecture_audio_04.mp3
Audio generation process completed.
Now convert each page of the PDF to an image.
import fitz  # PyMuPDF
import os

def pdf_to_images(pdf_path, output_folder='slide_images', zoom=2):
    # Create output folder if it doesn't exist
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    # Open the PDF
    pdf = fitz.open(pdf_path)

    # Iterate over each page
    for page_num in range(len(pdf)):
        page = pdf[page_num]

        # Render page to an image
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))

        # Save the image with the correct naming convention
        image_path = os.path.join(output_folder, f'slide_{page_num+1:02d}.png')
        pix.save(image_path)

        print(f'Saved slide {page_num+1} as {image_path}')

    print(f'Converted {len(pdf)} slides to images in {output_folder}')

    # Close the PDF
    pdf.close()

# Usage
pdf_path = 'lecture.pdf'  # Replace with your PDF file path

pdf_to_images(pdf_path)
Saved slide 1 as slide_images/slide_01.png
Saved slide 2 as slide_images/slide_02.png
Saved slide 3 as slide_images/slide_03.png
Saved slide 4 as slide_images/slide_04.png
Converted 4 slides to images in slide_images
Now the final step is to bring all of this together into the final video using Python's MoviePy library.
import os
from moviepy.editor import *

def create_lecture_video(slide_folder, audio_folder, output_file, fps=24):

    slide_files = sorted([f for f in os.listdir(slide_folder) if f.endswith('.png')])
    audio_files = sorted([f for f in os.listdir(audio_folder) if f.endswith('.mp3')])

    clips = []
    for slide, audio in zip(slide_files, audio_files):
        audio_clip = AudioFileClip(os.path.join(audio_folder, audio))
        slide_clip = ImageClip(os.path.join(slide_folder, slide)).set_duration(audio_clip.duration)
        slide_clip = slide_clip.set_audio(audio_clip)
        clips.append(slide_clip)

    full_video = concatenate_videoclips(clips)

    print(f"Video duration: {full_video.duration} seconds")
    print(f"Video fps: {fps}")

    # Set the fps for the full video
    full_video = full_video.set_fps(fps)

    # Write the final video
    print("Writing final video...")
    full_video.write_videofile(
        output_file,
        fps=fps,
        codec="libx264",
        audio_codec="aac",
        temp_audiofile="temp_audio.m4a",
        remove_temp=True
    )

    print(f"Video created successfully: {output_file}")

# Usage
slide_folder = 'slide_images'
audio_folder = 'lecture_audio'
output_file = 'lecture.mp4'

create_lecture_video(slide_folder, audio_folder, output_file)
Video duration: 85.91 seconds
Video fps: 24
Writing final video...
Moviepy - Building video lecture.mp4.
MoviePy - Writing audio in temp_audio.m4a

MoviePy - Done.
Moviepy - Writing video lecture.mp4

Moviepy - Done !
Moviepy - video ready lecture.mp4
Video created successfully: lecture.mp4
You can view the full demo and also the final video here.
Happy building!

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
