
Automating Knowledge Management with Amazon BDA
Amazon Bedrock Data Automation extracts insights from video, audio, and text, enabling multimodal knowledge management and contextual search using RAG.
Published Mar 10, 2025
Amazon Bedrock continues to innovate, enabling developers to build intelligent workflows with ease. One of its latest features, Bedrock Data Automation (BDA), empowers organizations to automate data processing and integrate AI-driven insights into their workflows. In this post, we will explore how BDA can be used to enhance knowledge management by transforming meeting recordings into actionable insights stored in a knowledge database.
Imagine a team recording their weekly meetings in Microsoft Teams. These recordings contain valuable discussions, decisions, and deadlines that need to be easily accessible later. With Amazon Bedrock Data Automation, the team can extract those details automatically and make them searchable.
The architecture for this solution revolves around automating the process of ingesting meeting recordings from Microsoft Teams, extracting insights using Amazon Bedrock models, and storing these insights in a vector database for retrieval and contextual analysis. Below is a high-level view of the architecture:

- Microsoft Teams Recording: Meetings are recorded in Teams, capturing both audio and video content.
- AWS Lambda: A Lambda function retrieves meeting recordings and initiates the Bedrock Data Automation workflow.
- Amazon Bedrock Data Automation (BDA): Processes the recordings, extracting insights from conversations, video content, and images.
- Amazon S3: Stores processed outputs from BDA for further use.
- Amazon Bedrock Knowledge Base: Indexes the insights in a vector database for efficient storage and retrieval.
- Retrieval-Augmented Generation (RAG): Enables contextual queries by retrieving relevant information from the knowledge base using Amazon Bedrock models.
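As a rough illustration of the Lambda component listed above, the sketch below starts a BDA job for a recording. It makes an assumption the post does not state: the recording has already been copied into an S3 bucket, and the function is triggered by that bucket's object-created event. The environment variable names (OUTPUT_BUCKET, BDA_PROJECT_ARN, BDA_PROFILE_ARN) and the bda-output/ prefix are hypothetical; invoke_data_automation_async is the boto3 call on the bedrock-data-automation-runtime client.

```python
import os
from urllib.parse import unquote_plus


def build_bda_request(event, output_bucket, project_arn, profile_arn):
    """Turn an S3 object-created event into kwargs for invoke_data_automation_async."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys in S3 events are URL-encoded (a space arrives as '+').
    key = unquote_plus(record["object"]["key"])
    return {
        "inputConfiguration": {"s3Uri": f"s3://{bucket}/{key}"},
        # Hypothetical layout: one output prefix per recording.
        "outputConfiguration": {"s3Uri": f"s3://{output_bucket}/bda-output/{key}/"},
        "dataAutomationConfiguration": {"dataAutomationProjectArn": project_arn},
        "dataAutomationProfileArn": profile_arn,
    }


def lambda_handler(event, context, bda_runtime=None):
    """Start an asynchronous BDA job for the uploaded recording."""
    if bda_runtime is None:
        # Imported lazily so build_bda_request can be exercised without boto3.
        import boto3
        bda_runtime = boto3.client("bedrock-data-automation-runtime")

    request = build_bda_request(
        event,
        output_bucket=os.environ["OUTPUT_BUCKET"],      # assumed variable name
        project_arn=os.environ["BDA_PROJECT_ARN"],      # assumed variable name
        profile_arn=os.environ["BDA_PROFILE_ARN"],      # assumed variable name
    )
    response = bda_runtime.invoke_data_automation_async(**request)
    # The invocation ARN is what you later pass to get_data_automation_status.
    return {"invocationArn": response["invocationArn"]}
```

The optional bda_runtime argument exists only so the handler can be tested with a stub client. In Lambda, the function is called with event and context alone and creates the real client itself.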
The workflow runs in four steps:
- Recording Retrieval: The process begins with an AWS Lambda function retrieving the meeting recording from Microsoft Teams.
- Data Processing with BDA: Amazon Bedrock Data Automation processes the recording:
  - Extracts text from audio (transcript generation).
  - Analyzes video content for additional context (e.g., shared slides or visual cues).
  - Identifies key insights such as deadlines, action items, or important discussions.
  (Figure: BDA output transcript)
- Storage in Knowledge Base: Insights are pushed into a vector database within the Amazon Bedrock Knowledge Base for efficient storage and retrieval.
- Contextual Queries with RAG: Users can query the knowledge base using natural language prompts like:
  - "What was the last deadline we mentioned in the last meeting?"
  - "What were the key action items discussed last week?"
The system retrieves relevant context using RAG and returns answers grounded in that context, generated by Amazon Bedrock models.
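The query step can be sketched with the retrieve_and_generate call on the boto3 bedrock-agent-runtime client. This is a minimal sketch, not the post's implementation: the helper name ask_knowledge_base is hypothetical, and the knowledge base ID and model ARN are values you supply from your own account.

```python
def ask_knowledge_base(agent_runtime, question, knowledge_base_id, model_arn):
    """Ask a natural-language question against a Bedrock Knowledge Base.

    Returns the generated answer and the S3 URIs of the documents it cites.
    """
    response = agent_runtime.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    )
    answer = response["output"]["text"]

    # Collect the source documents behind the answer, skipping any
    # reference that does not carry an S3 location.
    sources = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in sources:
                sources.append(uri)
    return answer, sources
```

With real credentials, you would pass boto3.client("bedrock-agent-runtime") as agent_runtime and a question such as "What were the key action items discussed last week?". Returning the cited S3 URIs alongside the answer makes it easy to check which meeting output a response came from.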
Start by cloning the Amazon Bedrock Data Automation GitHub repository.
The repository contains a simple “bda_manager.py” script that demonstrates how to set up, manage, and run Amazon Bedrock Data Automation (BDA) projects. It includes functionality to create a new project, invoke inference on video or image data, retrieve the resulting metadata, and process those results. Before running it, note the following prerequisites:
- Bedrock Data Automation is currently available only in the us-east-1 (N. Virginia) region.
- You must have a valid S3 bucket (bda_bucket_name) ready for input and output artifacts. You can create it through the console or with Boto3.
- Before using this script, create a BDA project at least once through the AWS Console. This ensures that the required service roles (such as data_automation_profile) are created and available.
- Ensure you have Python 3.7+ installed.
- Install required dependencies: pip install boto3 awswrangler pandas
- Provide valid AWS credentials and region in your environment (e.g., using AWS CLI, environment variables, credentials file, etc.).
- Update the main() function parameters with your own:
  - Project name and description
  - Input S3 URI (e.g., an image or video file)
  - Output S3 URI
  - Data automation profile ARN
- Run the script: python bda_manager.py
- Once run, the script will:
  - Create or retrieve the specified BDA project
  - Trigger the data automation pipeline
  - Periodically check and print status updates
  - Upon completion, retrieve the S3 output, process the results into a Parquet file (or CSV if needed), and store them back in an S3 location for future reference
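The status-checking step can be sketched as a simple polling loop. This is not the code from bda_manager.py, only an illustration under stated assumptions: the helper name wait_for_bda_job, the poll interval, and the retry limit are hypothetical. get_data_automation_status is the boto3 call on the bedrock-data-automation-runtime client, and the invocation ARN is the value returned by invoke_data_automation_async.

```python
import time

# Statuses after which the job will not change any more.
TERMINAL_STATES = {"Success", "ServiceError", "ClientError"}


def wait_for_bda_job(bda_runtime, invocation_arn, poll_seconds=10,
                     max_polls=60, sleep=time.sleep):
    """Poll a BDA invocation until it finishes; return the output S3 URI."""
    for attempt in range(1, max_polls + 1):
        response = bda_runtime.get_data_automation_status(
            invocationArn=invocation_arn
        )
        status = response["status"]
        print(f"[poll {attempt}] status: {status}")

        if status in TERMINAL_STATES:
            if status != "Success":
                message = response.get("errorMessage", "no error message")
                raise RuntimeError(f"BDA job failed ({status}): {message}")
            return response["outputConfiguration"]["s3Uri"]

        sleep(poll_seconds)

    raise TimeoutError(
        f"BDA job did not finish after {max_polls} status checks"
    )
```

The sleep parameter is injectable so the loop can be tested without waiting. For long recordings, a fixed retry limit may be too short; an event-driven notification would avoid polling altogether, but that is outside what the script described here does.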
The code repository serves as a starting point. To deploy a complete end-to-end solution using Infrastructure as Code (IaC), you can integrate this script into a broader architecture. For guidance on this, refer to my previous blog post, “Lightweight Multi-Agent Framework for Amazon Bedrock”.