All the things that Amazon Comprehend, Rekognition, Textract, Polly, Transcribe, and Others Do
Developers write code, but they are not necessarily experts in every field that code touches. Building ML-based features requires knowledge of models and algorithms that not everyone has. Fortunately, there are ready-to-use APIs that rely on pre-trained models, so you can add ML features securely without ML expertise. Using them follows the same three steps:
- Define the input: either text or the location of an object in an Amazon S3 bucket.
- Invoke the API with this input.
- Read the output, which is returned in JSON format.
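As a minimal sketch of these three steps, here is how a sentiment call to Amazon Comprehend could look with boto3. The helper names `analyze_sentiment` and `s3_object` are made up for this post; the client is passed in as a parameter so the function can be tried with a stub before pointing it at a real account.

```python
def analyze_sentiment(comprehend, text, language_code="en"):
    """Define the input, invoke the API, and read the JSON-style output."""
    # Step 1: define the input. Here it is plain text; image, video, and
    # document APIs take an S3 location instead (see s3_object below).
    request = {"Text": text, "LanguageCode": language_code}
    # Step 2: invoke the API with this input.
    response = comprehend.detect_sentiment(**request)
    # Step 3: boto3 has already parsed the JSON response into a dict.
    return {
        "sentiment": response["Sentiment"],
        "scores": response["SentimentScore"],
    }


def s3_object(bucket, key):
    """Build the S3 location structure used by Rekognition and Textract calls."""
    return {"S3Object": {"Bucket": bucket, "Name": key}}


# Real usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   print(analyze_sentiment(boto3.client("comprehend"), "I love this service!"))
```

The other services follow the same shape: only the client name, the operation, and the input structure change.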
API Type | What you can do | Service Name |
---|---|---|
🔎 Analysis of images (.png, .jpg) and videos (.mp4) | Label detection (predefined or custom). Image properties and moderation. Facial detection, comparison, and analysis. Face search. People paths. Personal Protective Equipment detection. Celebrity recognition. Text in image. Inappropriate or offensive content detection. | Amazon Rekognition |
🔎 Detection and analysis of text in documents (PNG, JPG, PDF or TIFF) | Processes individual or bundled documents. Detects typed and handwritten text. Recognizes documents such as financial reports, medical records, ID documents (driver's licenses and passports), and tax forms. Extracts text, forms, and tables from documents with structured data. | Amazon Textract |
🔎 Natural Language Processing (NLP) and text analysis | Processes documents and extracts information such as entities, events, key phrases, dominant language, sentiment, targeted sentiment, and syntax. Custom classification and entity recognition. Management of custom models. | Amazon Comprehend |
🔎 Text to speech | Supports multiple languages and includes a variety of lifelike voices. Includes a number of Neural Text-to-Speech (NTTS) voices, which use a newer machine learning approach to improve speech quality and sound more natural and human-like. Neural TTS also supports a Newscaster speaking style tailored to news narration. | Amazon Polly |
🔎 Speech to text | Converts audio (in supported formats) to text. Transcribes media in real time (streaming) or from media files located in an Amazon S3 bucket (batch). Improves accuracy for your use case with language customization, filters content to protect customer privacy or keep language audience-appropriate, analyzes multi-channel audio, and separates the speech of individual speakers. | Amazon Transcribe |
🔎 Translate | Translates unstructured text (UTF-8) documents and helps you build applications that work in multiple languages. | Amazon Translate |
- Use case 1: Create subtitles for a video and translate them
- Upload the .mp4 video to an Amazon S3 bucket.
- A Lambda function calls the Transcribe API.
- The subtitle file in the original language is saved to the S3 bucket.
- A Lambda function calls the Translate API.
- The subtitle file in the new language is saved to the S3 bucket.
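The subtitle steps above can be sketched roughly as follows. This is a hedged example, not the exact Lambda code: `start_subtitle_job` and `translate_srt` are names invented for this post, the `.srt` format and the default language codes are assumptions, and the clients are passed in so the logic can be tested with stubs. Translating line by line keeps the timestamps intact but loses context between lines, which is a simplification.

```python
def start_subtitle_job(transcribe, job_name, bucket, video_key, language_code="en-US"):
    """Start an asynchronous Transcribe job that also writes an .srt file to the bucket."""
    return transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{video_key}"},
        MediaFormat="mp4",
        LanguageCode=language_code,
        OutputBucketName=bucket,
        Subtitles={"Formats": ["srt"]},
    )


def translate_srt(translate, srt_text, source_lang, target_lang):
    """Translate only the spoken lines of an SRT file; keep indexes and timestamps."""
    translated_lines = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        # Blank lines, cue numbers, and timestamp lines are copied unchanged.
        # Simplification: a spoken line made only of digits is treated as a cue number.
        if not stripped or stripped.isdigit() or "-->" in stripped:
            translated_lines.append(line)
            continue
        result = translate.translate_text(
            Text=line,
            SourceLanguageCode=source_lang,
            TargetLanguageCode=target_lang,
        )
        translated_lines.append(result["TranslatedText"])
    return "\n".join(translated_lines)


# Real usage (assumes boto3, credentials, and an existing bucket and video):
#   import boto3
#   start_subtitle_job(boto3.client("transcribe"), "demo-job", "my-bucket", "video.mp4")
```

In a Lambda setup, the second function would read the `.srt` object from S3, pass its text through `translate_srt`, and write the result back with `put_object`.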
- Use case 2: Extract, analyze, and translate the text of a document
- Upload the document (PNG, JPG, PDF or TIFF) to an S3 bucket.
- A Lambda function calls the Textract API.
- With the response from Textract, the Lambda function calls the Comprehend API.
- A Lambda function calls the Translate API.
- The response is saved in an S3 bucket.
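Here is one way the document steps above could be chained, written as a single function for readability. The steps do not say which Comprehend operations are used, so dominant-language detection and entity detection are assumptions here, as are the function name `analyze_document`, the `results/` key prefix, and the default target language. The sketch also ignores service size limits and uses the synchronous Textract call; multi-page documents would need the asynchronous `start_document_text_detection` operation instead.

```python
import json


def analyze_document(textract, comprehend, translate, s3, bucket, key, target_lang="es"):
    """Textract -> Comprehend -> Translate, then store the combined result in S3."""
    # 1. Extract the text lines from the document stored in S3.
    doc = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    lines = [b["Text"] for b in doc["Blocks"] if b["BlockType"] == "LINE"]
    text = "\n".join(lines)

    # 2. Analyze the text: pick the most likely language, then find entities.
    #    (Entity detection only supports some languages; not handled here.)
    languages = comprehend.detect_dominant_language(Text=text)["Languages"]
    source_lang = max(languages, key=lambda lang: lang["Score"])["LanguageCode"]
    entities = comprehend.detect_entities(Text=text, LanguageCode=source_lang)["Entities"]

    # 3. Translate the extracted text.
    translated = translate.translate_text(
        Text=text, SourceLanguageCode=source_lang, TargetLanguageCode=target_lang
    )["TranslatedText"]

    # 4. Save everything as one JSON object (hypothetical key layout).
    result = {
        "source_language": source_lang,
        "text": text,
        "entities": [{"text": e["Text"], "type": e["Type"]} for e in entities],
        "target_language": target_lang,
        "translated_text": translated,
    }
    s3.put_object(
        Bucket=bucket,
        Key=f"results/{key}.json",
        Body=json.dumps(result).encode("utf-8"),
    )
    return result
```

With real clients, each argument would come from `boto3.client("textract")`, `boto3.client("comprehend")`, `boto3.client("translate")`, and `boto3.client("s3")`.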
- Use case 3: Make Polly Talk 🦜
- From a Jupyter notebook, call the Polly API.
- Polly stores the result in an S3 bucket.
- Retrieve the audio from the bucket.
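In a notebook, the Polly steps above might look like the sketch below. The helper names, the `Joanna` voice, the neural engine, and the polling interval are choices made for this example, and `fetch_audio` assumes Polly returns a path-style URI of the form `https://s3.<region>.amazonaws.com/<bucket>/<key>`.

```python
import time
from urllib.parse import urlparse


def synthesize_to_s3(polly, text, bucket, voice_id="Joanna", poll_seconds=2.0, max_polls=30):
    """Start an asynchronous synthesis task and wait until the audio is in S3."""
    task = polly.start_speech_synthesis_task(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine="neural",  # assumption: the voice and region support the neural engine
        OutputS3BucketName=bucket,
    )["SynthesisTask"]
    for _ in range(max_polls):
        if task["TaskStatus"] == "completed":
            return task["OutputUri"]
        if task["TaskStatus"] == "failed":
            raise RuntimeError(task.get("TaskStatusReason", "synthesis failed"))
        time.sleep(poll_seconds)
        task = polly.get_speech_synthesis_task(TaskId=task["TaskId"])["SynthesisTask"]
    raise TimeoutError("synthesis task did not finish in time")


def fetch_audio(s3, output_uri):
    """Download the audio bytes, assuming a path-style URI: /<bucket>/<key>."""
    bucket, key = urlparse(output_uri).path.lstrip("/").split("/", 1)
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()


# Real usage in a notebook (assumes boto3 and configured credentials):
#   import boto3
#   uri = synthesize_to_s3(boto3.client("polly"), "Hello from Polly!", "my-bucket")
#   audio = fetch_audio(boto3.client("s3"), uri)
```

For short texts, the synchronous `synthesize_speech` call returns the audio stream directly and skips the bucket, but the steps above describe the S3 route.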
- Use case 4: Video content moderation ⏯️ 🔫 🚬
- Upload the .mp4 video to an S3 bucket.
- A Lambda function calls the Rekognition API.
- Once the video review is finished, a new Lambda function retrieves the result and stores it in an S3 bucket.
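The two Lambda functions in this use case could be sketched as the two helpers below. The function names, the confidence threshold, and the shape of the stored JSON are assumptions for illustration. How the second function gets triggered is not shown; a common option is a notification that Rekognition publishes when the job completes.

```python
import json


def start_moderation(rekognition, bucket, video_key, min_confidence=60.0):
    """First Lambda: start the asynchronous moderation job for a video in S3."""
    response = rekognition.start_content_moderation(
        Video={"S3Object": {"Bucket": bucket, "Name": video_key}},
        MinConfidence=min_confidence,
    )
    return response["JobId"]


def store_moderation_result(rekognition, s3, job_id, bucket, result_key):
    """Second Lambda: collect all result pages and save them as JSON in S3."""
    labels = []
    token = None
    while True:
        kwargs = {"JobId": job_id}
        if token:
            kwargs["NextToken"] = token
        page = rekognition.get_content_moderation(**kwargs)
        if page["JobStatus"] != "SUCCEEDED":
            raise RuntimeError(f"job {job_id} is {page['JobStatus']}")
        for item in page["ModerationLabels"]:
            labels.append({
                "timestamp_ms": item["Timestamp"],
                "name": item["ModerationLabel"]["Name"],
                "confidence": item["ModerationLabel"]["Confidence"],
            })
        token = page.get("NextToken")
        if not token:
            break
    s3.put_object(Bucket=bucket, Key=result_key, Body=json.dumps(labels).encode("utf-8"))
    return labels
```

Each stored entry records when in the video (in milliseconds) a label was detected, which makes it easy to jump to the flagged moments later.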
- Amazon Translate Code Samples (and more code samples)
- Amazon Transcribe and Amazon Comprehend Code Samples
- Amazon Polly Code Samples
- Amazon Rekognition Code Samples
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.