Automating Short-form Content Using Amazon Bedrock

This article presents a Gen AI-powered tool that automatically edits long videos into short ones with a case study from the AWS Korea YouTube channel.

Kihoon Kwon
Amazon Employee
Published Oct 10, 2024
This article was written by Kihoon Kwon and Sukwon Lee, two AWS Solutions Architects based in Seoul, South Korea.

Introduction

This article introduces a tool that automates the process of editing long-form videos into short-form videos using generative AI models, and shares a case study of how it was used on the AWS Korea YouTube channel.
Ever since its appearance, generative AI has been adopted across various industries. From simple chatbots to customer service contact centers, metadata extraction from images, and video generation, generative AI is no longer a new or distant-future technology but is gradually becoming part of our daily lives. The tool introduced here automates the editing of short-form videos using generative AI and can be deployed and used in your own environment.
This sample tool was used to manage the AWS Korea YouTube channel. Videos previously uploaded to the channel were re-edited into short-form content without an additional professional editor, and 1-2 videos were uploaded per day for about 7 weeks. Users can automatically edit horizontal videos up to 1 hour long into 10+ vertical short-form videos under 1 minute each. The tool lets users select the desired Amazon Bedrock model, and also generates essential elements for short-form videos such as subtitles and titles. Because converting to a vertical format requires choosing which portion of the screen to keep, the tool also lets reviewers select the desired region of the frame.
The tool is built on AWS services such as Amazon Bedrock, the easiest way to build and scale generative AI applications; Amazon Transcribe, an automatic speech recognition (ASR) service; and AWS Elemental MediaConvert, a media service for processing and editing video files. The entire pipeline is composed as a serverless architecture using AWS Step Functions and AWS Lambda, resulting in a highly cost-effective structure that only charges for the amount of video processed. The tool is developed as a web application, with backend resources deployed and hosted through AWS Amplify.

What are Short-form Videos?

Short-form videos generally refer to short video content with a length of less than 1 minute, and are characterized by a vertical 9:16 format suitable for mobile phones rather than the traditional horizontal 16:9 format. Since gaining popularity on various platforms in the 2020s, they have become the most consumed content on all mainstream social media platforms. In fact, according to an OpenSurvey report from February 2023, 7 out of 10 consumers (68.9%) consume short-form videos.
This trend has led to the use of short-form videos as a marketing tool in all industries beyond retail. Various content created to raise brand awareness using short-form videos can be found on social media platforms. Among short-form videos, a significant portion consists of content that re-edits highlights from long-form videos. They serve as a gateway to drive traffic to the original video by leveraging highlights, or are used to deliver messages concisely by dividing longer videos into more digestible segments.
However, the re-editing process involves repetitive and time-consuming tasks such as understanding the entire content, selecting which parts to extract, and inserting subtitles while adapting the format to fit short-form videos. The tool introduced here automates this editing work using generative AI, while still letting humans easily review and edit where needed.

Use Case of the AWS Korea YouTube Channel

AWS Korea YouTube Channel
AWS Korea regularly uploads session videos covering customer case studies and guides for over 200 AWS services to the AWS Korea YouTube channel. Over the past 10 years, the number of uploaded videos has reached 2,000. To make the content of these videos more digestible and quickly reviewable for those who want to use AWS, AWS Korea planned to re-edit them into short-form videos. At the same time, a sustainable solution was needed to continuously re-edit past, present, and future videos without a dedicated editor. The tool introduced below was developed and used for this purpose.
Using this tool, AWS Korea re-edited 8 previously uploaded long-form videos over a 7-week period, uploading a total of about 80 short-form videos. Most of the editing was automated, and the entire editing process could be completed in less than an hour, including the time to select desired scenes. These edited videos have so far recorded more than 25,000 total views.
AWS Korea YouTube Shorts Tab

Gen AI Short-form Generator

This tool can be found in the Gen AI Video Short-form Generator Repo. A detailed deployment guide is in the README file, and the tool can be deployed and used in your own AWS account.

Architecture Diagram

Architecture Diagram
The overall architecture and flow of the tool is as follows:
First, the frontend is built and hosted through AWS Amplify. Authentication and authorization are configured with Amazon Cognito using the AWS Amplify libraries.
Next, communication between the frontend and backend is based on AWS AppSync, a GraphQL service, and Amazon EventBridge, an event-driven integration service.
Videos, subtitle files, transcription results, and so on are stored and managed in Amazon S3, while per-user data about processed videos and extracted highlight sections is stored and managed in Amazon DynamoDB, a NoSQL database service.
Two main Step Functions workflows drive the automation. Workflow 1 handles everything from video upload through highlight extraction and editing up to the review stage, and Workflow 2 produces the final short-form video after review. A sketch of how an upload could trigger Workflow 1 is shown below.
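As a reference point, here is a minimal sketch of how the upload could kick off Workflow 1: a Lambda function subscribed to S3 upload events starts the state machine. The handler name, environment variable, and event wiring are illustrative assumptions, not the repository's actual code.

```python
# Hypothetical Lambda handler: starts Step Functions Workflow 1 when a new
# video object lands in the upload bucket (names below are assumptions).
import json
import os
import urllib.parse

import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    # An S3 put event contains one record per uploaded object
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        sfn.start_execution(
            stateMachineArn=os.environ["WORKFLOW1_ARN"],  # assumed env var
            input=json.dumps({"bucket": bucket, "key": key}),
        )
```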
The user experience-based flow is as follows:

1. Select the model you want to use and upload the video

The first step of demo
Users upload the video they want to edit through a UI like the one above, and select from the dropdown menu the LLM to be used for topic extraction and highlight derivation.
A higher-performing model may extract highlight sections more accurately and with fewer errors, but the right choice also depends on the length of the video, so it is important to weigh the length of the video to be processed against the model's capability. There is always a tradeoff between price and performance, and no single model fits all cases.

2. Highlight section extraction using generative AI (Step Functions Workflow 1)

The second step of demo
AWS Step Functions Workflow 1 is invoked when the user first uploads a video, and it outputs multiple highlight videos. Users can perform a final review and edit of the videos generated by this workflow before creating the final short-form.
The sequence is as follows:
  1. When the video upload is complete, the first Step Functions Workflow is invoked.
  2. Amazon Transcribe generates a transcript of the video.
  3. The transcription of the video is input into the selected Amazon Bedrock model to derive the main topics covered in the video.
  4. The main topics derived in step (3) and the transcription result are fed back into Amazon Bedrock to derive the script parts (highlights) covering each topic. The LLM selects highlights from parts throughout the video that cover the given topic, extracting only the key points within the given word count.
  5. Using AWS Lambda, the actual video timeframes corresponding to the script parts extracted in (4) are located in the transcription result generated by Amazon Transcribe in (2). The timeframes are calculated using the Python difflib library and word-level timestamps (see the sketch after this list).
  6. Based on the extracted timeframes and the original video, AWS Elemental MediaConvert is used to re-edit the long video into short, horizontal videos for the corresponding parts.
  7. For the re-edited videos under 1 minute, subtitle files are generated using Amazon Transcribe.
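Since step (5) matches highlight scripts to word-level timestamps with difflib, here is a minimal sketch of one way that matching could work. The Transcribe item layout follows the service's JSON output, but the matching strategy and function name are illustrative assumptions rather than the repository's exact algorithm.

```python
# Sketch of step 5: locate a highlight script inside the transcript and map
# it to start/end timestamps using difflib (illustrative, not the repo's code).
import difflib

def find_timeframe(transcribe_items, highlight_text):
    # Keep only spoken words; punctuation items carry no timestamps
    words = [it for it in transcribe_items if it["type"] == "pronunciation"]
    transcript_words = [w["alternatives"][0]["content"].lower() for w in words]
    highlight_words = highlight_text.lower().split()

    # Find the longest contiguous run of the highlight inside the transcript
    matcher = difflib.SequenceMatcher(None, transcript_words, highlight_words)
    match = matcher.find_longest_match(
        0, len(transcript_words), 0, len(highlight_words)
    )
    if match.size == 0:
        return None  # highlight not found, e.g. heavily paraphrased by the LLM

    start = float(words[match.a]["start_time"])
    end = float(words[match.a + match.size - 1]["end_time"])
    return start, end
```

Because the LLM may lightly paraphrase the script, fuzzy matching like this is more robust than an exact string search.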

3. Review process

In the review process, the user performs the following:
  1. Review and modify the generated video title.
  2. Review and modify the subtitles.
  3. Select screen frames for each section.
When the title or subtitles are modified, the changes are written back to DynamoDB and S3, where the originals are stored, and are applied during final video production; a hypothetical sketch of such an update follows below.
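For illustration, a write-back of a reviewed title to DynamoDB could look like the following; the table name and key schema are assumptions for demonstration, not the deployed resource names.

```python
# Hypothetical sketch: persist a reviewer-edited title to DynamoDB.
import boto3

table = boto3.resource("dynamodb").Table("ShortformHighlights")  # assumed name

def save_reviewed_title(video_id: str, highlight_index: int, new_title: str):
    table.update_item(
        Key={"videoId": video_id, "highlightIndex": highlight_index},  # assumed keys
        # "title" is a DynamoDB reserved word, so alias it
        UpdateExpression="SET #t = :t",
        ExpressionAttributeNames={"#t": "title"},
        ExpressionAttributeValues={":t": new_title},
    )
```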
The third step of demo
  1. Users can modify the generated title in Edit Title.
  2. In Edit Video Frame, users can trim the section as desired and select, as a 1:1 region of the frame, the scene they want to show in the final video for that section. Users can crop and position the selection freely. Currently the tool uses a fixed 1:1 ratio, but this can be changed with simple modifications to the application code and backend logic.
  3. Users can correct inaccurate transcription results (typos, misspellings, and so on) in the subtitle file to be inserted into the final video.

4. Create a final short-form video (Step Functions Workflow 2)

The fourth step of demo
After the review and frame selection for each section, clicking the "Shortify" button generates the final short-form. The final short-form generation is executed as Step Functions Workflow 2.
  1. AWS Lambda generates a job template for AWS Elemental MediaConvert based on the user-selected scene for each section.
  2. AWS Lambda then invokes AWS Elemental MediaConvert to generate the final video; a sketch of such a job follows below.
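To make the mechanics concrete, here is a hedged sketch of such a MediaConvert job: it clips the selected timeframe, applies the reviewer's crop rectangle, and writes a vertical MP4. The endpoint, role ARN, S3 paths, and crop values are placeholders, and the real job template also composes elements such as titles and burned-in subtitles that this sketch omits.

```python
# Illustrative MediaConvert job for one short-form clip (placeholder values).
import boto3

# MediaConvert uses an account-specific endpoint (see describe_endpoints)
mc = boto3.client(
    "mediaconvert",
    endpoint_url="https://abcd1234.mediaconvert.us-east-1.amazonaws.com",
)

mc.create_job(
    Role="arn:aws:iam::123456789012:role/MediaConvertRole",  # placeholder
    Settings={
        "Inputs": [{
            "FileInput": "s3://my-bucket/input/session.mp4",  # placeholder
            "TimecodeSource": "ZEROBASED",
            # Clip only the highlight timeframe computed earlier
            "InputClippings": [
                {"StartTimecode": "00:01:10:00", "EndTimecode": "00:02:05:00"}
            ],
            # Reviewer-selected 1:1 region of the 16:9 frame
            "Crop": {"X": 420, "Y": 0, "Width": 1080, "Height": 1080},
            "VideoSelector": {},
            "AudioSelectors": {"Audio Selector 1": {"DefaultSelection": "DEFAULT"}},
        }],
        "OutputGroups": [{
            "OutputGroupSettings": {
                "Type": "FILE_GROUP_SETTINGS",
                "FileGroupSettings": {"Destination": "s3://my-bucket/output/"},
            },
            "Outputs": [{
                "ContainerSettings": {"Container": "MP4", "Mp4Settings": {}},
                "VideoDescription": {
                    "Width": 1080, "Height": 1920,  # vertical 9:16 canvas
                    "CodecSettings": {
                        "Codec": "H_264",
                        "H264Settings": {
                            "RateControlMode": "QVBR",
                            "MaxBitrate": 5000000,
                        },
                    },
                },
                "AudioDescriptions": [{
                    "CodecSettings": {
                        "Codec": "AAC",
                        "AacSettings": {
                            "Bitrate": 96000,
                            "CodingMode": "CODING_MODE_2_0",
                            "SampleRate": 48000,
                        },
                    },
                }],
            }],
        }],
    },
)
```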

5. Check out the final video

The final step of demo
An example of a short-form video generated through the architecture and flow described above is shown in the screen above.

Prompts for Extracting Topics and Highlight Sections

The prompts used to instruct the foundation model for topic and highlight section extraction are as follows:
  1. Topics Extraction
This prompt instructs the model to find topics, in order, from the entire video content. It provides the video script to the model and asks it to find 15 topics, with the output format specified as JSON. The number of topics can be adjusted as needed, and the prompt can be optimized for the type of video being processed.
  2. Highlights Extraction
Here, the model is instructed to extract the script parts corresponding to each previously extracted topic and reconstruct them into a complete script. After understanding the whole, the model also generates a video title for the part covering that topic. The output format is again specified as JSON, and the model is instructed to keep each extracted part under 200 words; reduce this limit if you want shorter videos. A hedged reconstruction of the topics-extraction call is sketched below.
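Here is what the topics-extraction call could look like via the Bedrock Converse API. The prompt wording, model ID, and function name follow the description above but are assumptions, not the repository's exact prompt.

```python
# Illustrative topics-extraction call (the real prompt also includes few-shot
# examples and more instructions; see the prompt engineering notes below).
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def extract_topics(transcript: str, n_topics: int = 15) -> list:
    prompt = (
        "Here is a video transcript inside <script> tags:\n"
        f"<script>{transcript}</script>\n"
        f"Think step by step and find the {n_topics} main topics covered, "
        "in the order they appear. Respond only with JSON in the form "
        '{"topics": ["topic 1", "topic 2", ...]}.'
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # user-selected model
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 2048, "temperature": 0},
    )
    text = response["output"]["message"]["content"][0]["text"]
    return json.loads(text)["topics"]
```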
The tool ships with these prompts, but continuous improvement is essential for effective prompt engineering. By checking and monitoring the results and gradually refining the prompts, you can create a tool optimized for your own videos. Various prompt engineering techniques, such as step-by-step thinking, XML tags, and examples, were used; the actual prompts provide 5 examples to the model (5-shot prompting). More detailed guidance on prompt engineering can be found at this link.

Conclusion

The generative AI-based short-form video automatic editing tool introduced in this article presents new possibilities in the fields of content creation and marketing. It goes beyond simply automating video editing to suggest ways to maximize the value of existing content and reach wider target audiences more quickly and effectively.
To effectively utilize this tool, consider the following points:
  1. Quality of the original content: The quality of the final short-form depends heavily on the quality of the source video.
  2. Model selection and prompt optimization: Refer to the provided prompts, but adjust them to fit the characteristics of your content to create optimal results.
  3. Importance of the review process: The review process is essential for the quality of the final video.
  4. Continuous feedback and improvement: Treat the tool as a baseline with the most basic functions, and continuously monitor the results and improve it.
  5. Keeping up with the latest model trends: As AI models are rapidly evolving, periodically test and apply the latest models.
The combination of generative AI and AWS services can create new solutions that were previously unimaginable. Consider introducing generative AI in areas of your business where repetitive and time-consuming tasks can be automated in a similar way.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
