Tutorial

Build A Translator App in 30 Min or Less

Use Amazon Translate, Amazon Comprehend, AWS Lambda, Amazon Polly, and Amazon Lex to bring a translation application to life and test it in 30 minutes or less.

Published Sep 15, 2023

Half an hour might not seem like enough time for an important project, but it's enough to build and test a language application on AWS. There are hundreds of translator apps to help us engage with various cultures, people, and the 7,000+ languages spoken globally. However, building your own app gives you hands-on experience. Creating something yourself, piece by piece, is where the real learning happens: the key is gaining new skills and developing your abilities through practice.

In this blog, you'll build a translation app. In just a few steps, you can make one capable of identifying the input language, translating into multiple languages, and generating audio files with correct pronunciation. I'll guide you step-by-step on how to combine AWS services to bring your app to life and test it.

✅ AWS Level: Intermediate - 200
⏱ Time to complete: 30 minutes
💰 Cost to complete:
⏰ Last Updated: 2023-09-15

  • How to use an AI service to detect the dominant language in text.
  • How to use an AI service to translate text.
  • How to use an AI service to convert text into lifelike speech.
  • How to create an AI conversational interface (bot) to handle translation requests.

In this tutorial you are going to create a translator chatbot app. Amazon Lex will handle the frontend interaction with the user, and the backend will be handled by an AWS Lambda function written with Boto3 (the AWS SDK for Python) using the following AWS services:

Fig 1. Diagram translator chatbot app.

  • Part 1 - Create the Function That Detects the Language and Translates It Into the Desired Language 🌎.
  • Part 2 - Create the Function That Converts Text Into Lifelike Speech 🦜.
  • Part 3 - Configure the Chatbot Interface With Amazon Lex 🤖.
  • Part 4 - Build the Interface Between the Backend and the Frontend.
  • Part 5 - Integrate the Backend With the Frontend.
  • Part 6 - Let's Get It to Work!
  • Part 7 - Deploy Your Translator App.

You may doubt that this can be built so quickly, but keep reading and you'll discover it can be done in under 30 minutes.

Let's get started!

In this part you are going to use two fully managed AI services: Amazon Translate, to translate unstructured text (UTF-8) documents across common languages or to build applications that work in multiple languages, using the TranslateText API from the Boto3 Translate client; and Amazon Comprehend, to detect the dominant language of the text you want to translate, using the DetectDominantLanguage API from the Boto3 Comprehend client.

You can also let Amazon Translate determine the source language of the text; it makes an internal call to Amazon Comprehend to detect the source language. I'll explain how.

  • Text (string): The text to translate.
  • SourceLanguageCode (string): One of the supported language codes for the source text. If you specify auto, Amazon Translate will call Amazon Comprehend to determine the source language ✅.
  • TargetLanguageCode (string): One of the supported language codes for the target text.

import boto3

translate_client = boto3.client('translate')

def TranslateText(text, language):
    response = translate_client.translate_text(
        Text=text,
        SourceLanguageCode="auto",
        TargetLanguageCode=language
    )
    text_ready = response['TranslatedText']
    return text_ready
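Because SourceLanguageCode is set to auto, Amazon Translate detects the source language for you. If you prefer to detect the dominant language yourself with Amazon Comprehend, here is a minimal sketch (not required by the app) that calls the DetectDominantLanguage API from the Boto3 Comprehend client:

import boto3

comprehend_client = boto3.client('comprehend')

def detect_language(text):
    # Returns the language code with the highest confidence score, e.g. "en" or "es".
    response = comprehend_client.detect_dominant_language(Text=text)
    languages = response['Languages']
    best_match = max(languages, key=lambda lang: lang['Score'])
    return best_match['LanguageCode']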

To create the Text to Speech function you are going to use Amazon Polly, an AI service that uses advanced deep learning technologies to synthesize natural sounding human speech. It allows developers to convert text into lifelike speech that can be integrated into their applications.

In this part you'll use the Boto3 Polly client to call the StartSpeechSynthesisTask API, and the GetSpeechSynthesisTask API to retrieve information about a SpeechSynthesisTask based on its TaskId. It returns the task status and a link to the Amazon S3 bucket containing the output of the task.

Fig 2. Converts text into lifelike speech.

StartSpeechSynthesisTask parameters:
  • OutputFormat (string): The format in which the result will be returned; it can be json, mp3, ogg_vorbis, or pcm.
  • OutputS3BucketName (string): The Amazon S3 bucket name to which the output file will be saved.
  • Text (string): The input text to synthesize.
  • Engine (string): Specifies the engine (standard or neural) for Amazon Polly to use when processing input text for speech synthesis.
  • VoiceId (string): The voice ID used for the synthesis.

GetSpeechSynthesisTask parameter:
  • TaskId (string): The Amazon Polly generated identifier for a speech synthesis task.

Amazon Polly supports multiple languages and voices that make synthesized speech sound natural and humanlike. To generate the best audio, you must choose the right voice for each language. Use the following dictionary in Python:

# Match the language code from Amazon Translate with the right voice from Amazon Polly.

def get_target_voice(language):
    to_polly_voice = dict([('en', 'Amy'), ('es', 'Conchita'), ('fr', 'Chantal'),
                           ('pt-PT', 'Cristiano'), ('it', 'Giorgio'),
                           ('sr', 'Carmen'), ('zh', 'Hiujin')])
    target_voice = to_polly_voice[language]
    return target_voice

  • StartSpeechSynthesisTask:
import boto3

polly_client = boto3.client('polly')

def start_taskID(target_voice, bucket_name, text):
    response = polly_client.start_speech_synthesis_task(
        VoiceId=target_voice,
        OutputS3BucketName=bucket_name,
        OutputFormat="mp3",
        Text=text,
        Engine="standard")

    task_id = response['SynthesisTask']['TaskId']
    object_name = response['SynthesisTask']['OutputUri'].split("/")[-1]
    return task_id, object_name
  • GetSpeechSynthesisTask:
import time

def get_speech_synthesis(task_id):
    max_time = time.time() + 2
    while time.time() < max_time:
        response_task = polly_client.get_speech_synthesis_task(
            TaskId=task_id
        )
        status = response_task['SynthesisTask']['TaskStatus']
        print("Polly SynthesisTask: {}".format(status))
        if status == "completed" or status == "failed":
            if status == "failed":
                TaskStatusReason = response_task['SynthesisTask']['TaskStatusReason']
                print("TaskStatusReason: {}".format(TaskStatusReason))
            else:
                value = response_task['SynthesisTask']['OutputUri']
                print("OutputUri: {}".format(value))
            break

        time.sleep(2)
    return status

🚨Note: This application will not wait for the SpeechSynthesisTask to finish, since its duration depends on the length of the text. GetSpeechSynthesisTask only reports the status of the task for a given task ID.

By default, the files in an S3 bucket are private; only the object owner has permission to access them. However, the object owner can share objects with others by creating a presigned URL. For this application you do that using the GeneratePresignedUrl API from the Boto3 S3 client:

s3_client = boto3.client("s3")

def create_presigned_url(bucket_name, object_name, expiration=3600):
    value = object_name.split("/")[-1]
    response = s3_client.generate_presigned_url('get_object',
                                                Params={'Bucket': bucket_name,
                                                        'Key': value},
                                                ExpiresIn=expiration)
    return response

Up to this point, you have learned how to develop the backend of an application that can take in text input, translate it into a chosen output language, and produce audio with correct pronunciation. Moving forward, our focus will shift to constructing the user interface so users can engage with the translation and text-to-speech features.

Amazon Lex is an AWS service that allows developers to build conversational interfaces for applications using voice and text. It provides the deep functionality and flexibility of natural language understanding (NLU) and automatic speech recognition (ASR) and simplifies building natural conversation experiences for applications without needing specialized AI/ML skills. It can also be integrated with mobile apps, websites, contact centers, messaging platforms, and other AWS services like AWS Lambda functions.

Amazon Lex has the following components:

  • Language: You can select any of the languages supported by Amazon Lex V2. Value: English (US).
  • Intent: An intent represents an action that the user wants to perform. Value: TranslateIntent.
  • Slot types: Allow a Lex bot to dynamically collect data from users during a conversational flow in order to complete actions and provide customized responses. There are built-in slot types and custom slot types. In this tutorial you're going to create custom slots:
      - text_to_translate: the text you want to translate.
      - language: the language into which you want to translate the text.
  • Utterances: Indicate the user's intent and activate the chat. They should be in the language of the chatbot:
      - I want to translate
      - I want to do a translation
      - I want to translate to {language}

Follow the steps below to set up Amazon Lex on the console:

  1. Sign in to the AWS Management Console and open the Amazon Lex console.
  2. Follow the instructions in To create an Amazon Lex V2 bot (Console): for the Creation method choose Create a blank bot, and then follow To add a language to a bot.

The next step is to create the flow of the conversation using the information from the components.

The language was already selected in the previous step, so change the name of the intent in Intent details -> Intent name and then Save intent. Now create the slot types (text_to_translate, language).

  1. In the left menu, choose Slot types, then Add slot type and select Add blank slot type.
  2. Complete the following parameters for each slot type:

language slot type:
  • Slot type name: language
  • Slot value resolution: Restrict to slot values
  • Slot type values: Value: Language Code - new Value: Language (find the Language code and Language values here). Add as many as you want.

text_to_translate slot type:
  • Slot type name: text_to_translate
  • Slot value resolution: built-in slot type
  • Slot type values: [0-9][a-z][A-Z]

Fig 3. Slot language type.
Fig 4. Slot text_to_translate type.

Configure the TranslateIntent intent, which attempts to fulfill a user's request to make a translation, with the following values:

  • Sample utterances: Type the Utterances values from the components list above.
  • Slots: Select Add slot and complete the following information:
language slot:
  • Name: language
  • Slot type: language
  • Prompt: What language do you want to translate?

text_to_translate slot:
  • Name: text_to_translate
  • Slot type: text_to_translate
  • Prompt: What do you want to translate? Just type and I'll tell you the language... magic!

Fig 5. Slot language required for intent fulfillment.
Fig 6. Slot language required for intent fulfillment.

🚨Important: The order matters; make sure that the language slot is the first slot to be required.

This bot will invoke a Lambda function as a dialog code hook to validate user input and to fulfill the intent. For that, select Use a Lambda function for initialization and validation (Fig. 7).

Fig 7. Use a Lambda function for initialization and validation.

To finish creating the chatbot, press Save intent and then Build in the top left.

👩🏻‍💻 Note: When you build a Lex bot, you are re-training the bot with updated configurations and logic, allowing it to learn from the new parameters.

The interaction from backend to frontend will be handled through specific states called dialog actions. A dialog action refers to the next action that the bot must perform in its interaction with the user. Possible values are:

  • ConfirmIntent - The next action is asking the user if the intent is complete and ready to be fulfilled. This is a yes/no question such as "Place the order?"
  • Close - Indicates there will not be a response from the user. For example, the statement "Your order has been placed" does not require a response.
  • Delegate - The next action is determined by Amazon Lex.
  • ElicitIntent - The next action is to determine the intent that the user wants to fulfill.
  • ElicitSlot - The next action is to elicit a slot value from the user.

Fig 8. Conversation flow.
  1. The user starts the intent, triggering the Lambda function, which is always listening. Lambda does not receive the expected language value, so it uses Delegate to let Lex continue handling the conversation by eliciting the language slot.
  2. The user provides the language and Lex interprets the value as a language code. The Lambda function sees the language value and asks Lex to ElicitSlot text_to_translate.
  3. The user provides the text to translate. Since the text is variable and unpredictable, Lex cannot interpret it as the text_to_translate value. So the Lambda function interprets the text instead of Lex and starts the translation and text-to-speech process. Finally, it replies to Lex with an ElicitIntent containing the translated text and a pre-signed link.

To integrate the backend and frontend, the Lambda Function needs to interpret the format of the Lex output events, which get passed to the Lambda function as input events. The Interpreting the input event format developer guide provides more details. In this tutorial, you will learn how to extract the necessary information from the input events to get the translation application running.

To begin, the Lambda function must extract the values in interpretations, the list of possible matches to the user's utterance:

def get_intent(intent_request):
    interpretations = intent_request['interpretations']
    if len(interpretations) > 0:
        return interpretations[0]['intent']
    else:
        return None

interpretations is a list containing the values of the slots and the state of the conversation.
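For orientation, here is an abbreviated, illustrative sketch of what one element of interpretations can look like inside the Lex V2 input event; the slot values shown are made up for this example:

# Illustrative fragment of the Lex V2 input event received by the Lambda function.
example_interpretations = [
    {
        "intent": {
            "name": "TranslateIntent",
            "slots": {
                "language": {
                    "value": {
                        "originalValue": "spanish",
                        "interpretedValue": "es"
                    }
                },
                "text_to_translate": None
            },
            "state": "InProgress"
        }
    }
]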

To extract the values of slots interpreted by Lex you use the following function:

def get_slot(slotname, intent, **kwargs):
    try:
        slot = intent['slots'].get(slotname)
        if not slot:
            return None
        slotvalue = slot.get('value')
        if slotvalue:
            interpretedValue = slotvalue.get('interpretedValue')
            originalValue = slotvalue.get('originalValue')
            if kwargs.get('preference') == 'interpretedValue':
                return interpretedValue
            elif kwargs.get('preference') == 'originalValue':
                return originalValue
            # where there is no preference
            elif interpretedValue:
                return interpretedValue
            else:
                return originalValue
        else:
            return None
    except:
        return None

To maintain the dialogue between the Lambda function and Lex, you need to know the contexts that are active in a user's session (activeContexts), which are part of the state of the user's session (sessionState). To get them, use:

def get_active_contexts(event):
    try:
        return event['sessionState'].get('activeContexts')
    except:
        return []

You also need the session-specific context information in sessionAttributes:

def get_session_attributes(event):
    try:
        return event['sessionState']['sessionAttributes']
    except:
        return {}

To send each DialogAction state back to Lex, use the following function definitions:

  • Delegate:
def remove_inactive_context(context_list):
    if not context_list:
        return context_list
    new_context = []
    for context in context_list:
        time_to_live = context.get('timeToLive')
        if time_to_live and time_to_live.get('turnsToLive') != 0:
            new_context.append(context)
    return new_context

def delegate(active_contexts, session_attributes, intent):
    print('delegate!')
    active_contexts = remove_inactive_context(active_contexts)
    return {
        'sessionState': {
            'activeContexts': active_contexts,
            'sessionAttributes': session_attributes,
            'dialogAction': {
                'type': 'Delegate'
            },
            'intent': intent,
            'state': 'ReadyForFulfillment'
        },
    }
  • ElicitSlot:
def elicit_slot(slotToElicit, active_contexts, session_attributes, intent, messages):
    intent['state'] = 'InProgress'
    active_contexts = remove_inactive_context(active_contexts)
    if not session_attributes:
        session_attributes = {}
    session_attributes['previous_message'] = json.dumps(messages)
    session_attributes['previous_dialog_action_type'] = 'ElicitSlot'
    session_attributes['previous_slot_to_elicit'] = slotToElicit

    return {
        'sessionState': {
            'sessionAttributes': session_attributes,
            'activeContexts': active_contexts,
            'dialogAction': {
                'type': 'ElicitSlot',
                'slotToElicit': slotToElicit
            },
            'intent': intent
        },
    }
  • ElicitIntent:
def elicit_intent(active_contexts, session_attributes, intent, messages):
    intent['state'] = 'Fulfilled'
    active_contexts = remove_inactive_context(active_contexts)
    if not session_attributes:
        session_attributes = {}
    session_attributes['previous_message'] = json.dumps(messages)
    session_attributes['previous_dialog_action_type'] = 'ElicitIntent'
    session_attributes['previous_slot_to_elicit'] = None
    session_attributes['previous_intent'] = intent['name']

    return {
        'sessionState': {
            'sessionAttributes': session_attributes,
            'activeContexts': active_contexts,
            'dialogAction': {
                'type': 'ElicitIntent'
            },
            "state": "Fulfilled"
        },
        'requestAttributes': {},
        'messages': messages
    }

With the backend and frontend functions built, it's time to integrate them!

With all the code built up in the previous parts, you are going to assemble the Lambda Handler as follows.

def lambda_handler(event, context):
    print(event)
    # Lambda Function Input Event and Response Format
    interpretations = event['interpretations']
    intent_name = interpretations[0]['intent']['name']
    intent = get_intent(event)
    # Needed for the Response Format
    active_contexts = get_active_contexts(event)
    session_attributes = get_session_attributes(event)
    # Used to find out when Amazon Lex is asking for text_to_translate and join the conversation.
    previous_slot_to_elicit = session_attributes.get("previous_slot_to_elicit")
    print(session_attributes)

    if intent_name == 'TranslateIntent':
        print(intent_name)
        print(intent)
        language = get_slot('language', intent)
        text_to_translate = get_slot("text_to_translate", intent)
        print(language, text_to_translate)

        if language == None:
            print(language, text_to_translate)
            return delegate(active_contexts, session_attributes, intent)

        if (text_to_translate == None) and (language != None) and (previous_slot_to_elicit != "text_to_translate"):
            print(language, text_to_translate)
            response = "What text do you want to translate?"
            messages = [{'contentType': 'PlainText', 'content': response}]
            print(elicit_slot("text_to_translate", active_contexts, session_attributes, intent, messages))
            return elicit_slot("text_to_translate", active_contexts, session_attributes, intent, messages)

        if previous_slot_to_elicit == "text_to_translate":
            print("different from none")
            text_to_translate = event["inputTranscript"]
            text_ready = TranslateText(text_to_translate, language)
            target_voice = get_target_voice(language)
            # start_taskID returns the task ID first, then the S3 object name.
            task_id, object_name = start_taskID(target_voice, bucket_name, text_ready)

            url_short = create_presigned_url(bucket_name, object_name, expiration=3600)

            print("text_ready: ", text_ready)
            status = get_speech_synthesis(task_id)

            response = f"The translated text is: {text_ready}. Hear the pronunciation here {url_short}"
            messages = [{'contentType': 'PlainText', 'content': response}]

            print(elicit_intent(active_contexts, session_attributes, intent, messages))
            return elicit_intent(active_contexts, session_attributes, intent, messages)

🚨Important: Import the necessary libraries (boto3, json, time), define the bucket_name variable, initialize the Boto3 clients for the AWS services, and Deploy to save.
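For reference, here is a minimal sketch of what the setup at the top of the Lambda function file could look like; the bucket name is a placeholder you must replace with your own:

import json
import time
import boto3

# Placeholder: replace with the S3 bucket where Amazon Polly will store the audio files.
bucket_name = "YOU-BUCKET-NAME"

translate_client = boto3.client('translate')
polly_client = boto3.client('polly')
s3_client = boto3.client("s3")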

To allow the Lambda function to invoke AWS services and resources, an execution role with the required permissions must be created. Follow these steps to create it:

  1. Open the Functions page of the Lambda console, and choose the name of a function.
  2. Choose Configuration, and then choose Permissions, then click on the Role name (Fig. 9).
Fig 9. Role name.
  3. In the Identity and Access Management (IAM) console, go to Add permissions --> Create inline policy.
  4. Select JSON in the Policy editor and copy this JSON:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "polly:SynthesizeSpeech",
                "polly:StartSpeechSynthesisTask",
                "polly:GetSpeechSynthesisTask",
                "comprehend:DetectDominantLanguage",
                "translate:TranslateText"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::YOU-BUCKET-NAME/*",
                "arn:aws:s3:::YOU-BUCKET-NAME"
            ]
        }
    ]
}

🚨Important: Replace YOU-BUCKET-NAME with your bucket name.

  5. Select Next, write the Policy name, and then Create policy.

To trigger a Lambda function when a user interacts with Amazon Lex, you attach the function to a bot alias by following these steps:

  1. Open the Amazon Lex console and choose the name of the bot that you created in Part 3.
  2. In the left panel choose Aliases, then choose the name of the alias (Fig. 10).
Fig 10. Choose Aliases name.
  3. From the list of supported languages, choose the language (English (US)).
  4. Choose the name of the Lambda function to use, then choose the version or alias of the function and choose Save.

Fig 11. The Lambda function is invoked for initialization, validation, and fulfillment.

To finish the integration, click Build to create and configure the bot with the new logic. Once the build process finishes, you can test the bot (Fig. 12).

Fig 12. Testing the bot.

You now have a functional translator conversational bot with text-to-speech that you built and tested quickly using Amazon Lex. However, it is only accessible through the console, and you have been working with a draft version.

Draft is the working copy of your bot. You can only update the Draft version, and until you publish your first version, Draft is the only version of the bot you have.

You need to create immutable versions in order to bring your bot into production. A version is a numbered snapshot of your work that you can publish for use in different parts of your workflow, such as development, beta deployment, and production.

An alias is a pointer to a specific version of a bot. With an alias, you can easily update the version that your client applications are using without having to change any code. Aliases allow you to seamlessly direct traffic to different versions as needed.

Now that I've explained what a version is, you'll learn how to create versions of your bot and how to point the alias to it.

  1. Go to your bot, then in the left panel select Bot versions and select Create version (Fig. 13).
Fig 13. Create a new bot version.
  2. Create a bot version and select Create.
  3. Go to Deployment --> Aliases and select Create Alias.
  4. Name the alias, and in Associate with a version choose the new version (Fig. 14), then select Create.
Fig 14. Associate with a version, choose the new version.

You already have everything you need to integrate your bot with messaging platforms, mobile apps, and websites; build your application by following the Amazon Lex integration instructions for the channel you choose.
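As a quick, illustrative starting point (not part of the original tutorial), here is a minimal sketch of how a client application could send text to the published bot using the Boto3 lexv2-runtime client; the bot ID, alias ID, locale, and session ID are placeholders you must replace with your own values:

import boto3

lex_runtime = boto3.client('lexv2-runtime')

def ask_translator_bot(text, session_id="demo-session"):
    # Placeholders: replace botId and botAliasId with the values shown
    # in your Amazon Lex console for the published alias.
    response = lex_runtime.recognize_text(
        botId="YOUR_BOT_ID",
        botAliasId="YOUR_BOT_ALIAS_ID",
        localeId="en_US",
        sessionId=session_id,
        text=text
    )
    # The bot's reply messages, e.g. the translated text and the audio link.
    return [message['content'] for message in response.get('messages', [])]

print(ask_translator_bot("I want to translate to es"))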

Building this multilingual app using AWS was, hopefully, an eye-opening experience for you. By leveraging Amazon Comprehend, Amazon Translate, Amazon Polly, and Amazon Lex, you were able to create a powerful translation tool with text-to-speech capabilities in a short amount of time.

The process demonstrated how easy it is to integrate AWS AI services through AWS Lambda functions. With some coding knowledge, anyone can build sophisticated applications like language translation and speech synthesis.

Experimenting hands-on is the best way to gain skills. Though translation apps already exist, creating your own solution drives learning. Building things yourself matters more than whether it's already been done.

To read more, visit the Amazon Polly Sample Code documentation: https://docs.aws.amazon.com/polly/latest/dg/sample-code-overall.html