
Processing WhatsApp Multimedia with Amazon Bedrock Agents: Images, Video, and Documents
Build a WhatsApp AI assistant using Amazon Bedrock and Amazon Nova models to processes multimedia content such as images, videos, documents, and audio. This serverless solution uses AWS End User Messaging for direct integration.
Your data will be securely stored in your AWS account and will not be shared or used for model training. It is not recommended to share private information because the security of data with WhatsApp is not guaranteed.
You can see the animated demo in the original repository: private-assistant-v2/README.md
- AWS CLI configured with appropriate permissions.
- Python 3.8 or later.
- AWS AWS Cloud Development Kit (CDK) v2.172.0 or later.
whatsapp_in
: Processes incoming WhatsApp messages.transcriber_done
: Handles completed transcription jobs.bedrock_agent
: Invokes the Amazon Bedrock Agent.
- Bucket for storing media files (voice, image, video, document).
messages
: Stores WhatsApp message dataagenthistory
: Stores conversation history for the Amazon Bedrock Converse API, when image, documents and video is processing.
- Topic for receiving WhatsApp events.
- Roles and policies for Lambda functions and Bedrock Agent.
- Agent configured for processing messages.
- Converse API invocation for processing documents, images and videos.
- Agent Alias for versioning.
- Used for transcribing audio messages.
- Natively links WhatsApp Business Account (WABA) and AWS account.
PrivateAssistantV2Stack
class within the private_assistant_v2_stack.py
file.- User sends a WhatsApp message.
- Message is published to the SNS Topic.
whatsapp_in
AWS Lambda function is triggered.- Message is processed based on its type:
- Text: Sent directly to Amazon Bedrock Agent.
- Audio: Transcribed using Amazon Transcribe, once the transcribe job is done.
transcriber_done
Lambda function is triggered and then sent the text to Amazon Bedrock Agent. - Image/Video/Document: Stored in S3, then analyzed by Amazon Bedrock Agent converse API, save the input and response as ConversationHistory Contents in an AgentHistory Amazon DynamoDB table.
bedrock_agent
Lambda function processes the message and generates a response- Response is sent back to the user via WhatsApp.
app.py
: Entry point for the CDK application.private_assistant_v2_stack.py
: Main stack definition for the AI assistant.lambdas/code/
: Contains Lambda functions for processing WhatsApp messages, invoking Bedrock Agent, and handling transcriptions.layers/
: Contains shared code and dependencies for AWS Lambda functions.agent_bedrock/create_agent.py
: Defines the Bedrock Agent configuration.
git clone https://github.com/build-on-aws/building-gen-ai-whatsapp-assistant-with-amazon-bedrock-and-python cd private_assistant_v2
python3 -m venv .venv source .venv/bin/activate # On Windows, use `.venv\Scripts\activate`
pip install -r requirements.txt
cdk synth
cdk deploy
Note the output values, especially the SNS Topic ARN, which will be used for configuring the WhatsApp integration.
You can also follow the more detailed steps in Automate workflows with WhatsApp using AWS End User Messaging Social blog.
agent_data.json
file in the private_assistant_v2/
directory to customize the Bedrock Agent's behavior.- Adjust environment variables in
private_assistant_v2_stack.py
if needed, such as S3 bucket prefixes or DynamoDB table names.
- Send a WhatsApp message to the configured phone number.
- The message will be processed by the
whatsapp_in
Lambda function. - For text messages, the Bedrock Agent will be invoked directly.
- For audio messages, they will be transcribed using Amazon Transcribe before being sent to the Bedrock Agent through the
bedrock_agent
Lambda function. - For images, videos, and documents, they will be stored in S3 and analyzed by the Bedrock converse API through the
bedrock_agent
Lambda function. - The assistant's response will be sent back to the user via WhatsApp.
- Delete the files from the Amazon S3 bucket created in the deployment.
- Run this command in your terminal:
cdk destroy
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.