I Built an App to Dub Videos Using AI
AI tools can feel overwhelming, but they're basically just new endpoints to call. Here's what I built to help serve my Spanish-speaking audience.
- The state machine is triggered by an event. This is simple to define with EventBridge rules when you create the state machine, for example using AWS SAM.
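As a sketch of what such a trigger looks like in a SAM template (the `DubbingStateMachine` resource, `statemachine/dubbing.asl.yaml` path, and `VideoBucket` reference here are illustrative names, not taken from the original project):

```yaml
DubbingStateMachine:
  Type: AWS::Serverless::StateMachine
  Properties:
    DefinitionUri: statemachine/dubbing.asl.yaml
    Events:
      NewVideoUploaded:
        Type: EventBridgeRule
        Properties:
          # Start an execution whenever a new object lands in the source bucket.
          # Requires EventBridge notifications to be enabled on the bucket.
          Pattern:
            source:
              - aws.s3
            detail-type:
              - Object Created
            detail:
              bucket:
                name:
                  - !Ref VideoBucket
```

The S3 "Object Created" event is what later states read from, via paths like `$.detail.bucket.name` and `$.detail.object.key`.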
- Most of the logic of this state machine calls AWS services directly, using the direct integration that AWS Step Functions provides with over 200 services. The following example shows how to start a transcription job directly from the state machine, passing all the required parameters. It is written in Amazon States Language (ASL), the language used to define state machines.
```yaml
TranscribeVideo:
  Comment: 'Given the input video starts a transcription job'
  Type: Task
  Next: WaitForTranscribe
  Resource: 'arn:aws:states:::aws-sdk:transcribe:startTranscriptionJob'
  Parameters:
    Media:
      MediaFileUri.$: States.Format('s3://{}/{}', $.detail.bucket.name, $.detail.object.key)
    TranscriptionJobName.$: $$.Execution.Name
    OutputBucketName: ${TranscribedBucket}
    OutputKey.$: States.Format('{}.txt', $.detail.object.key)
    LanguageCode: en-US
```
- The use of AWS Step Functions intrinsic functions: intrinsic functions let you perform basic data-processing operations without a dedicated task. You can manipulate arrays, strings, and hashes, create unique IDs, Base64-encode or -decode, and run many other operations directly from the state machine. Whenever you see `States.XXX`, an intrinsic function is being used. The following example nests two intrinsic functions when building the key for the object to store in S3: it first splits a string (`States.StringSplit`) and then fetches the element at index 3 (`States.ArrayGetItem`).
```yaml
Store Transcript in S3:
  Type: Task
  Next: FormatURI
  Resource: arn:aws:states:::aws-sdk:s3:putObject
  ResultPath: $.result
  Parameters:
    Bucket: ${TranscribedBucket}
    Key.$: States.ArrayGetItem(States.StringSplit($.TranscriptionJob.Transcript.TranscriptFileUri, '/'), 3)
    Body.$: $.transcription.filecontent.results.transcripts[0].transcript
```
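To see what that nested call computes, here is a plain-Python stand-in (the URI is hypothetical, and the sketch assumes `States.StringSplit` drops empty substrings, which is why index 3 lands on the object key):

```python
def string_split(s, sep):
    """Python stand-in for States.StringSplit (empty substrings are dropped)."""
    return [part for part in s.split(sep) if part]

def array_get_item(arr, index):
    """Python stand-in for States.ArrayGetItem."""
    return arr[index]

# Hypothetical TranscriptFileUri returned by Amazon Transcribe
uri = "https://s3.us-east-1.amazonaws.com/transcribed-bucket/my-video.mp4.txt"

# Equivalent of States.ArrayGetItem(States.StringSplit(uri, '/'), 3)
key = array_get_item(string_split(uri, "/"), 3)
print(key)  # my-video.mp4.txt
```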
- Generating the video metadata is handled by a Lambda function with permission to invoke Amazon Bedrock, defined in the SAM template:

```yaml
GenerateVideoMetadataFunction:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: src/
    Handler: index.lambda_handler
    Runtime: python3.9
    MemorySize: 128
    Timeout: 600
    Policies:
      - Statement:
          - Effect: Allow
            Action: 'bedrock:*'
            Resource: '*'
```
The function's handler sends the transcript to the AI21 Jurassic-2 Ultra model through the Bedrock Runtime API:

```python
import boto3
import json
import os

bedrock = boto3.client(service_name='bedrock-runtime')

def lambda_handler(event, context):
    # The S3 object key looks like "<prefix>/<filename>"; keep only the filename
    key = event['detail']['object']['key']
    key = key.split('/')[1]

    prompt = "Given the transcript provided at the end of the prompt, return a JSON object with the following properties: description, titles, and tags. For the description write a compelling description for a YouTube video, that is maximum two paragraphs long and has good SEO. Don't focus on the person who is mentioned in the video, just focus on the content of the video. The description should be in the same language as the video. For the title, return an array of 5 different title options for this video. For the tags, provide an array of 20 tags for the video. Here is the transcript of the video: {}".format(event['body']['filecontent'])

    # Request body in the format expected by the AI21 Jurassic-2 model
    body = json.dumps({
        "prompt": prompt,
        "maxTokens": 1525,
        "temperature": 0.7,
        "topP": 1,
        "stopSequences": [],
        "countPenalty": {"scale": 0},
        "presencePenalty": {"scale": 0},
        "frequencyPenalty": {"scale": 0}})

    modelId = 'ai21.j2-ultra-v1'
    accept = 'application/json'
    contentType = 'application/json'

    response = bedrock.invoke_model(body=body, modelId=modelId,
                                    accept=accept, contentType=contentType)
    response_body = json.loads(response.get('body').read())
    description = json.dumps(response_body.get("completions")[0].get("data").get("text"))

    return {"key": key,
            "description": description,
            "region": os.environ['AWS_REGION']}
```
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.