
Creating Training Videos from Articles: Generative AI, AWS, and Python Approach
Dive into an innovative content creation process that harnesses the power of LLM models on Amazon Bedrock to convert written articles into dynamic video presentations. This approach uses Amazon Polly for lifelike text-to-speech conversion, combined with Python libraries for visual content generation and video assembly.
Nitin Eusebius
Amazon Employee
Published Oct 3, 2024
Last Modified Nov 4, 2024
Today we will walk through an art-of-the-possible demo that leverages generative AI to transform articles into video presentations.
This innovative content creation workflow leverages Amazon Bedrock's Large Language Models (LLMs) to transform written articles into lecture notes, then uses services like Amazon Polly and Python to create engaging video presentations. The process integrates Amazon Polly for natural-sounding text-to-speech conversion and Python libraries for visual content generation, resulting in dynamic video output from textual input.

Note: This is demo code for illustrative purposes only. Not intended for production use.
Libraries
- Python 3.x
- boto3
- botocore
- beautifulsoup4
- python-pptx
- pymupdf
- moviepy
Setup
- Select an AWS region that supports the Anthropic Claude 3.5 Sonnet model. I'm using us-west-2 (Oregon). You can check the documentation for model support by region.
- Configure Amazon Bedrock model access for your account and region (see the Amazon Bedrock documentation for an example).
- An execution role for Amazon SageMaker Studio; for this demo it needs access to Amazon Bedrock.
We will first import boto3 and the other required libraries and initialize our Amazon Bedrock client SDK.
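A minimal sketch of what that setup could look like (the client variable name and region are illustrative):

```python
import json
import os

import boto3

# Bedrock Runtime client in a region that supports Claude 3.5 Sonnet
region = "us-west-2"
bedrock_client = boto3.client("bedrock-runtime", region_name=region)
```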
Now we will define the function that calls our LLM. It uses the Converse API and stores the final output so it can be passed back into the LLM prompt as context.
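A hedged sketch of such a function, assuming Claude 3.5 Sonnet's model ID and a simple single-turn Converse call:

```python
# Illustrative helper around the Bedrock Converse API; the model ID,
# function name, and inference settings are assumptions for this sketch.
model_id = "anthropic.claude-3-5-sonnet-20240620-v1:0"

def call_llm(prompt, max_tokens=4096, temperature=0.2):
    response = bedrock_client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": max_tokens, "temperature": temperature},
    )
    # Return the generated text so it can be stored and reused as context
    return response["output"]["message"]["content"][0]["text"]
```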
We will now create a helper function to create the folders used in this demo.
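For example, a small helper along these lines:

```python
# Create a folder if it does not already exist and return its name
def create_folder(folder_name):
    os.makedirs(folder_name, exist_ok=True)
    return folder_name
```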
Now set up the audio and images folders.
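Using the helper above (the folder names are the ones referenced later in the demo):

```python
# Folders used later for the Polly audio clips and the PDF page images
audio_dir = create_folder("audio")
images_dir = create_folder("images")
```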
Next, for the purposes of the demo, we will create a function that reads the content of an AWS What's New post URL and prepares the article for our training video.
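One possible sketch using requests and BeautifulSoup; the URL and the HTML selectors are assumptions you would adapt to the page you are scraping:

```python
import requests
from bs4 import BeautifulSoup

def prepare_article(url):
    # Fetch the What's New page and extract its title and body text.
    # The h1/p selectors below are assumptions; adjust to the page structure.
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1").get_text(strip=True) if soup.find("h1") else ""
    body = "\n".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
    return f"Titles:\n{title}\nBodies:\n{body}"

# Assumed URL; replace with the What's New post you want to convert
article_url = (
    "https://aws.amazon.com/about-aws/whats-new/2024/09/"
    "llama-3-2-generative-ai-models-amazon-bedrock/"
)
article_text = prepare_article(article_url)
print(article_text)
```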
You will see the following output, which will be used as context for the prompt:
Titles:
Llama 3.2 generative AI models now available in Amazon Bedrock
Bodies:
Posted on: Sep 25, 2024
---
The Llama 3.2 collection of models are now available in Amazon Bedrock (https://aws.amazon.com/bedrock/) Amazon Bedrock . Llama 3.2 represents Meta’s latest advancement in large language models (LLMs). Llama 3.2 models are offered in various sizes, from small and medium-sized multimodal models, 11B and 90B parameter models, capable of sophisticated reasoning tasks including multimodal support for high resolution images to lightweight text-only 1B and 3B parameter models suitable for edge devices. Llama 3.2 is the first Llama model to support vision tasks, with a new model architecture that integrates image encoder representations into the language model. In addition to the existing text capable Llama 3.1 8B, 70B, and 405B models, Llama 3.2 supports multimodal use cases. You can now use four new Llama 3.2 models — 90B, 11B, 3B, and 1B — from Meta in Amazon Bedrock to unlock the next generation of AI possibilities. With a focus on responsible innovation and system-level safety, Llama 3.2 models help you build and deploy cutting-edge generative AI models and applications, leveraging Llama in Amazon Bedrock to ignite new innovations like image reasoning and are also more accessible for on edge applications. The new models are also designed to be more efficient for AI workloads, with reduced latency and improved performance, making them suitable for a wide range of applications. Meta’s Llama 3.2 90B and 11B models are available in Amazon Bedrock in the US West (Oregon) Region, and in the US East (Ohio, N. Virginia) Regions via cross-region inference (https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference-support.html) cross-region inference . Llama 3.2 1B and 3B models are available in the US West (Oregon) and Europe (Frankfurt) Regions, and in the US East (Ohio, N. Virginia) and Europe (Ireland, Paris) Regions via cross-region inference. To learn more, read the launch blog (https://aws.amazon.com/blogs/aws/introducing-llama-3-2-models-from-meta-in-amazon-bedrock-a-new-generation-of-multimodal-vision-and-lightweight-models) launch blog , Llama product page (https://aws.amazon.com/bedrock/llama/) Llama product page , and documentation (https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html) documentation . To get started with Llama 3.2 in Amazon Bedrock, visit the Amazon Bedrock console Amazon Bedrock console .
---
Now we will set the article from the result above and specify how many slides we want to create. This will be used to create the final video.
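For example (the values are illustrative):

```python
# The article text produced above becomes the prompt context, and we pick
# how many content slides the lecture should have.
article = article_text
num_slides = 5  # illustrative value; adjust as needed
```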
Now we will provide options to select the language and set the corresponding voice ID and language code for Amazon Polly. The options below are neural engine voices.
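A sketch of one way to map languages to Polly neural voices; the specific voices shown are examples, not an exhaustive list:

```python
# Example mapping of languages to Amazon Polly neural voices
voice_options = {
    "English": {"voice_id": "Joanna", "language_code": "en-US"},
    "Hindi":   {"voice_id": "Kajal",  "language_code": "hi-IN"},
    "Spanish": {"voice_id": "Lupe",   "language_code": "es-US"},
    "German":  {"voice_id": "Vicki",  "language_code": "de-DE"},
}

selected_language = "Hindi"  # language used in this demo
voice_id = voice_options[selected_language]["voice_id"]
language_code = voice_options[selected_language]["language_code"]
```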
Now let's look at the prompt. This prompt instructs an LLM to create comprehensive video lecture content for technical employee training. It takes an input article as context and transforms it into structured presentation content with a specified number of slides, translating the content into a designated language. Each slide is organized with a concise title, key bullet points, and detailed lecture notes. The LLM enhances the presentation by adding an introductory slide and a concluding thank-you slide. To facilitate text-to-speech conversion with Amazon Polly, the prompt directs the LLM to incorporate SSML tags for natural-sounding narration. The final output is formatted as a JSON array, ready for further processing in the video creation pipeline.
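A condensed, illustrative version of such a prompt might look like this (the JSON keys are assumed to match the ones used later in the demo, including "lector_notes"):

```python
# A condensed sketch of the prompt; the full prompt in the demo is longer.
prompt = f"""You are creating a technical training lecture.
Using only the article below, produce {num_slides} slides in {selected_language},
plus an introduction slide and a closing thank-you slide.
Return a JSON array where each element has:
  "title": a concise slide title,
  "bullet_points": a list of key points,
  "lector_notes": detailed narration wrapped in SSML <speak> tags for Amazon Polly.
Article:
{article}
"""
```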
Note: You can always change this based on your requirements; it is for illustration and art-of-the-possible purposes.
Now we will invoke our model and see the response. We are using Claude 3.5 Sonnet for this demo.
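For example, using the helper defined earlier (this sketch assumes the model returns a clean JSON array; in practice you may need to strip markdown fences first):

```python
# Invoke Claude 3.5 Sonnet and parse the JSON array of slides
llm_response = call_llm(prompt)
slides = json.loads(llm_response)
print(json.dumps(slides[:1], ensure_ascii=False, indent=2))  # peek at the first slide
```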
As you can see, the response is in Hindi, the language selected for our demo. It also includes SSML tags for Amazon Polly. You can always test other languages in this demo.
Now we will start the process of creating a PPT with the content above. It iterates over the JSON response. You can also use this pptx for your own presentation, change the theme, and so on.
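A minimal python-pptx sketch of that loop, assuming the JSON keys from the prompt above:

```python
from pptx import Presentation

# Build a simple deck from the JSON slides; layout index 1 is the standard
# "Title and Content" layout in the default template.
prs = Presentation()
layout = prs.slide_layouts[1]

for slide_data in slides:
    slide = prs.slides.add_slide(layout)
    slide.shapes.title.text = slide_data["title"]
    body = slide.placeholders[1].text_frame
    body.clear()
    for i, point in enumerate(slide_data["bullet_points"]):
        para = body.paragraphs[0] if i == 0 else body.add_paragraph()
        para.text = point

prs.save("lecture.pptx")
```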
Note: Since I was on a Mac, I was not able to properly convert the pptx to PDF using Python. For this step, download the pptx manually and export it as lecture.pdf, then move it to the same location as your code.
We will now do the following
- Take each of the "lector_notes" entries from the JSON in order and convert them to audio using Amazon Polly
- Take each of the PDF pages and convert it to an image
- Then, using the Python library moviepy, stitch everything together
Let's start with the Amazon Polly process.
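A sketch of the Polly step, writing one MP3 per slide (the file naming is illustrative):

```python
polly_client = boto3.client("polly", region_name=region)
audio_files = []

for idx, slide_data in enumerate(slides):
    notes = slide_data["lector_notes"]
    # Ensure the narration is wrapped in <speak> tags for SSML input
    if not notes.strip().startswith("<speak>"):
        notes = f"<speak>{notes}</speak>"
    response = polly_client.synthesize_speech(
        Text=notes,
        TextType="ssml",
        OutputFormat="mp3",
        VoiceId=voice_id,
        LanguageCode=language_code,
        Engine="neural",
    )
    path = os.path.join(audio_dir, f"slide_{idx}.mp3")
    with open(path, "wb") as f:
        f.write(response["AudioStream"].read())
    audio_files.append(path)
```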
Now convert each page of the PDF to an image.
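A sketch using PyMuPDF (imported as fitz); the 150 DPI value is an arbitrary choice:

```python
import fitz  # PyMuPDF

doc = fitz.open("lecture.pdf")
image_files = []

for i, page in enumerate(doc):
    # Render each PDF page to a PNG at a readable resolution
    pix = page.get_pixmap(dpi=150)
    path = os.path.join(images_dir, f"slide_{i}.png")
    pix.save(path)
    image_files.append(path)

doc.close()
```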
The final step is to bring all of this together and create the final video using Python's moviepy library.
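A sketch with moviepy (1.x API), assuming one PDF page per slide so the images and audio clips pair up one-to-one:

```python
from moviepy.editor import AudioFileClip, ImageClip, concatenate_videoclips

clips = []
for img_path, audio_path in zip(image_files, audio_files):
    audio = AudioFileClip(audio_path)
    # Show each slide image for the duration of its narration
    clip = ImageClip(img_path).set_duration(audio.duration).set_audio(audio)
    clips.append(clip)

final_video = concatenate_videoclips(clips, method="compose")
final_video.write_videofile("lecture_video.mp4", fps=24)
```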
You can view the full demo and also the final video here.
Happy Building !
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.