AWS Transcribe: Converting Audio Files to Text

Turn audio files into text easily with AWS Transcribe

Published Nov 8, 2024
Last Modified Nov 12, 2024
Introduction
  • Benefits: Amazon Transcribe offers scalable, accurate, and cost-effective audio transcription. It processes many audio types and supports use cases such as meetings, customer calls, and medical transcriptions. With event-driven automation and integration with other AWS services, it saves time and adapts to businesses of all sizes. This is the first article in a three-part series. Each article can be used on its own to solve a specific business problem or as part of a complete solution.
  • Overview: Amazon Transcribe supports channel identification, which transcribes each channel of a multi-channel recording separately (for example, a call recorded with the agent on one channel and the customer on another). In this example, however, we will not use the channel identification option. Instead, we will discuss the use case and the architectural design needed to achieve our goal without it. Amazon Transcribe's standard transcription capabilities can still process the audio effectively, which makes them suitable for scenarios such as meeting transcriptions, call centre recordings, and more.
  • Purpose: Amazon Transcribe is a fully managed, cost-effective AWS service that automates audio transcription for various file formats (e.g., MP4) across many industries and business use cases, such as meeting summarization, audio conversations, and other audio recordings. Amazon Transcribe Medical is also available and is specifically trained to recognize medical vocabulary and terminology. The service offers numerous features, including custom language models, custom vocabulary, and vocabulary filtering. It can also be integrated with other AWS services to extend its capabilities.
  • Objective: In this article, I will provide a high-level overview of how to use the service, based on the architecture diagram presented. I will include a few concise code samples focused on an AWS Lambda function that starts an Amazon Transcribe job, while the remaining parts of the architecture will be described in detail. This design follows an event-driven approach but can be adapted to use AWS Step Functions for orchestration if needed. I chose this design because it is a simple and straightforward pattern.
  • Security: Security must always be part of the design and implementation of a solution to ensure data privacy, security, and integrity. This means following a least-privilege approach and ensuring that data is encrypted at rest and in transit. In this solution, Amazon Transcribe keeps the data encrypted in transit, the S3 buckets that hold the source and destination data are encrypted at rest, and the permissions granted to the AWS Lambda function are limited to the services that the function uses.

Architecture

  • Workflow Overview: In this workflow, a user or a system uploads audio files to an Amazon S3 bucket. Amazon EventBridge is configured to detect the object creation event, which triggers an AWS Lambda function. The function starts an Amazon Transcribe job to process the uploaded file. Once the transcription is complete, the output is stored in a second S3 bucket in JSON format for future use and integration with other services.
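To make the EventBridge step in the workflow more concrete, here is a minimal sketch of an event pattern that matches new audio objects in the source bucket. The bucket name, rule name, and list of suffixes are placeholders I chose for illustration; they are not taken from the original solution. The sketch also assumes EventBridge notifications are enabled on the bucket, which S3 requires before it sends events to EventBridge.

```python
import json


def build_event_pattern(bucket_name, suffixes=(".mp3", ".mp4", ".wav")):
    """Return an EventBridge event pattern for new audio objects in one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket_name]},
            # Suffix matching keeps non-audio uploads from starting a job.
            "object": {"key": [{"suffix": s} for s in suffixes]},
        },
    }


def create_rule(rule_name, bucket_name):
    """Enable EventBridge delivery on the bucket and create the rule."""
    import boto3  # imported lazily so the pattern builder has no AWS dependency

    # Caution: this call replaces the bucket's existing notification configuration.
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration={"EventBridgeConfiguration": {}},
    )
    boto3.client("events").put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern(bucket_name)),
        State="ENABLED",
    )


if __name__ == "__main__":
    # Hypothetical bucket name, printed for inspection only.
    print(json.dumps(build_event_pattern("example-audio-bucket"), indent=2))
```

The sketch stops at the rule itself. Attaching the Lambda function as the rule target and granting EventBridge permission to invoke it are still required and are left out here for brevity.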
[Architecture diagram: Amazon Transcribe]
Services Used
  • Amazon S3: In this solution, we use two S3 buckets: one for storing audio files and another for storing the transcribed JSON files. We configure Amazon EventBridge to trigger an AWS Lambda function when an audio file is uploaded.
  • Amazon EventBridge: A rule triggers the AWS Lambda function when an S3 Object Created event occurs.
  • AWS Lambda: The function is developed in Python using the AWS SDK, Boto3. It accepts a JSON payload from Amazon EventBridge and creates an Amazon Transcribe job, so it is triggered automatically whenever a file is uploaded. The Python code reads the input payload, configures the transcription parameters, and starts the job, so that each audio file is processed and its transcript is stored in the specified S3 bucket. By default, an Amazon Transcribe job remains available for 90 days after completion. To delete the job once it has completed, you can call the DeleteTranscriptionJob API (delete_transcription_job in Boto3).
  • Amazon Transcribe: We start one Amazon Transcribe job per audio file. Each job processes the uploaded audio file and generates a JSON output stored in the designated S3 bucket. Because every job runs independently, this setup scales to many audio files and integrates well with other AWS services.
  • IAM Roles: The IAM role for the AWS Lambda function includes only the policies needed to access Amazon Transcribe and the two S3 buckets.
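As an illustration of the least-privilege idea behind the IAM role, the following sketch builds a policy document for the Lambda function. The exact set of actions is my assumption rather than the policy used in the original solution: it assumes the job reads the audio and writes the transcript with the function's own permissions, and it leaves out CloudWatch Logs and any KMS permissions your buckets may need. The bucket names are placeholders.

```python
import json


def build_lambda_policy(input_bucket, output_bucket):
    """Return a least-privilege policy sketch for the transcription Lambda."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "StartTranscriptionJobs",
                "Effect": "Allow",
                "Action": ["transcribe:StartTranscriptionJob"],
                # Assumption: not scoped to specific job names in this sketch.
                "Resource": "*",
            },
            {
                "Sid": "ReadSourceAudio",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{input_bucket}/*",
            },
            {
                "Sid": "WriteTranscripts",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{output_bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket names, printed for inspection only.
    policy = build_lambda_policy("example-audio-bucket", "example-transcript-bucket")
    print(json.dumps(policy, indent=2))
```

Keeping read access on the source bucket and write access on the destination bucket in separate statements makes it easier to review what the function can touch.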
Here is a code snippet of the AWS Lambda function for demonstration only:
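The original snippet is not reproduced in this text, so what follows is a minimal sketch of such a function rather than the exact code from the solution. It assumes the event is an S3 Object Created event delivered by EventBridge, that the destination bucket name comes from an environment variable I have called OUTPUT_BUCKET, and that the audio is in US English. The transcripts/ output prefix and the job naming scheme are also my own choices.

```python
import os
import re
import uuid


def build_job_request(event, output_bucket, language_code="en-US"):
    """Translate an EventBridge S3 event into StartTranscriptionJob parameters."""
    detail = event["detail"]
    bucket = detail["bucket"]["name"]
    key = detail["object"]["key"]

    # Job names accept only letters, digits, '.', '_' and '-', and must be
    # unique, so sanitise the file name and append a short random suffix.
    stem = os.path.splitext(os.path.basename(key))[0]
    safe_stem = re.sub(r"[^0-9a-zA-Z._-]", "_", stem)[:150]
    job_name = f"{safe_stem}-{uuid.uuid4().hex[:8]}"

    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "OutputBucketName": output_bucket,
        "OutputKey": f"transcripts/{safe_stem}.json",
    }


def lambda_handler(event, context):
    """Start one transcription job for the uploaded audio file."""
    import boto3  # imported here so build_job_request can be tested without AWS

    transcribe = boto3.client("transcribe")
    request = build_job_request(event, os.environ["OUTPUT_BUCKET"])
    response = transcribe.start_transcription_job(**request)
    job = response["TranscriptionJob"]
    return {
        "jobName": job["TranscriptionJobName"],
        "status": job["TranscriptionJobStatus"],
    }
```

The handler returns as soon as the job is started, because transcription runs asynchronously. Splitting the request building into its own function keeps the AWS call thin and lets you check the parameters locally before deploying.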
Test Payload
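The original payload is not reproduced in this text. As a stand-in, here is a representative S3 Object Created event in the shape EventBridge delivers it, written as a Python dictionary so it can be passed straight to a handler in a local test. Every value (account ID, bucket, key, timestamps) is a placeholder I made up.

```python
# Representative EventBridge payload for an S3 "Object Created" event.
# All values are placeholders for local testing.
test_event = {
    "version": "0",
    "id": "11111111-2222-3333-4444-555555555555",
    "detail-type": "Object Created",
    "source": "aws.s3",
    "account": "123456789012",
    "time": "2024-11-08T12:00:00Z",
    "region": "us-east-1",
    "resources": ["arn:aws:s3:::example-audio-bucket"],
    "detail": {
        "version": "0",
        "bucket": {"name": "example-audio-bucket"},
        "object": {"key": "uploads/meeting-01.mp3", "size": 1048576},
        "reason": "PutObject",
    },
}


def media_uri(event):
    """Build the S3 URI of the uploaded object from the event detail."""
    detail = event["detail"]
    return f"s3://{detail['bucket']['name']}/{detail['object']['key']}"


if __name__ == "__main__":
    print(media_uri(test_event))
```

The two fields the function actually depends on are detail.bucket.name and detail.object.key. The rest is included so the payload resembles a real event.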
Output JSON transcription file of the AWS Transcribe job
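The original output file is not reproduced in this text. The sketch below uses a heavily trimmed sample in the general shape of an Amazon Transcribe result, with the full text under results.transcripts and per-word details under results.items, and shows one way to read it. The job name, account ID, and words are invented, and real output contains more fields than shown here.

```python
# Trimmed, invented sample in the shape of an Amazon Transcribe result file.
sample_output = {
    "jobName": "meeting-01-ab12cd34",
    "accountId": "123456789012",
    "status": "COMPLETED",
    "results": {
        "transcripts": [{"transcript": "Hello world."}],
        "items": [
            {"type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
             "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
            {"type": "pronunciation", "start_time": "0.4", "end_time": "0.9",
             "alternatives": [{"confidence": "0.75", "content": "world"}]},
            # Punctuation items carry no timestamps.
            {"type": "punctuation",
             "alternatives": [{"confidence": "0.0", "content": "."}]},
        ],
    },
}


def extract_transcript(result):
    """Join the transcript segments into a single string."""
    return " ".join(t["transcript"] for t in result["results"]["transcripts"])


def low_confidence_words(result, threshold=0.8):
    """Return spoken words whose best alternative scores below the threshold."""
    words = []
    for item in result["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # skip punctuation
        best = item["alternatives"][0]
        if float(best["confidence"]) < threshold:
            words.append(best["content"])
    return words


if __name__ == "__main__":
    print(extract_transcript(sample_output))
    print(low_confidence_words(sample_output))
```

Note that confidence values and timestamps are stored as strings in the JSON, so they need to be converted before any numeric comparison.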

Conclusion

  • Summary: Audio files are uploaded to an S3 bucket, and the upload triggers an AWS Lambda function via EventBridge that starts an Amazon Transcribe job. The transcribed text is stored as a JSON file in another S3 bucket for future use.
  • Next Steps: In the next post, we will embed the transcribed text using an Amazon Bedrock embedding model and store the embeddings in an Amazon OpenSearch vector database. This will make it possible to run RAG searches against the recordings.
     
