Automate RAG Knowledge Base ingestion with Lambda and CDK

Automate your CV Chatbot's data ingestion quickly and simply!

Published Aug 7, 2024
In a recent article, I showed how to create a CV chatbot for your website using AWS Bedrock, Lambda and CDK. You can catch up on that article here. The system was simple and it worked, but what if we need to update our CV? Would we need to create a whole new bot, or manually sync the Knowledge Base? Luckily, neither: we can automate the ingestion with a Lambda that is triggered every time we upload an appropriate file to our S3 bucket. Here's how:
First we need our Lambda. It's a small and simple file:
We are using the @aws-sdk/client-bedrock-agent module to interact with Bedrock, the aws-lambda types to type the event, and uuid to create a unique ID string. Install them with npm i @aws-sdk/client-bedrock-agent uuid, plus npm i -D @types/aws-lambda for the event types (the import path is "aws-lambda", but the type definitions ship in the @types/aws-lambda package).
The KNOWLEDGE_BASE_ID and DATA_SOURCE_ID are passed into the Lambda as environment variables. We will set them in CDK shortly, along with a filter for the files that should trigger the Lambda.
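As a rough illustration of the handler described above (a minimal sketch, not the exact file from this project), the core step is turning the environment variables and a fresh unique token into the input for Bedrock's StartIngestionJob call. The helper name buildIngestionJobInput and the local S3EventLike and IngestionJobInput types are hypothetical stand-ins so the snippet compiles without any packages. The 200-character cap on the description is an assumption about the API limit. The commented section at the bottom shows how the helper might be wired to the real SDK.

```typescript
// Hypothetical stand-in for the S3Event type from "aws-lambda".
interface S3EventLike {
  Records: { s3: { bucket: { name: string }; object: { key: string } } }[];
}

// Hypothetical stand-in mirroring the fields of StartIngestionJobCommandInput
// (from "@aws-sdk/client-bedrock-agent") that this sketch uses.
interface IngestionJobInput {
  knowledgeBaseId: string;
  dataSourceId: string;
  clientToken: string; // idempotency token; a UUID string works here
  description?: string;
}

type Env = Record<string, string | undefined>;

// Build the StartIngestionJob input, failing fast if configuration is missing.
export function buildIngestionJobInput(
  env: Env,
  clientToken: string,
  event: S3EventLike,
): IngestionJobInput {
  const knowledgeBaseId = env["KNOWLEDGE_BASE_ID"];
  const dataSourceId = env["DATA_SOURCE_ID"];
  if (!knowledgeBaseId || !dataSourceId) {
    throw new Error("KNOWLEDGE_BASE_ID and DATA_SOURCE_ID must be set");
  }

  // S3 URL-encodes object keys in event payloads (spaces arrive as "+").
  const keys = event.Records.map((record) =>
    decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
  );

  return {
    knowledgeBaseId,
    dataSourceId,
    clientToken,
    // Assumption: keep the description short (capped at 200 characters here).
    description: `Sync triggered by upload of ${keys.join(", ")}`.slice(0, 200),
  };
}

// Possible wiring inside the real Lambda (needs the packages installed above):
//
//   import { BedrockAgentClient, StartIngestionJobCommand } from "@aws-sdk/client-bedrock-agent";
//   import { S3Event } from "aws-lambda";
//   import { v4 as uuid } from "uuid";
//
//   const client = new BedrockAgentClient({});
//
//   export const handler = async (event: S3Event) => {
//     const input = buildIngestionJobInput(process.env, uuid(), event);
//     const response = await client.send(new StartIngestionJobCommand(input));
//     console.log("Started ingestion job", response.ingestionJob?.ingestionJobId);
//   };
```

One ingestion job syncs the whole data source, so the sketch starts a single job per event rather than one per uploaded file.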
Now we need to add the Lambda to our CDK stack, inside the cv-bot-stack.ts file we previously created:
We've created our Lambda using the aws-cdk-lib/aws-lambda-nodejs module, as we did for the retrieval and generation Lambda in the previous article. We add S3 as a trigger to kick off the Lambda, filtered on the object key (every file whose name starts with 'CV'), and grant the Lambda the permissions it needs to start the ingestion job. Deploy the stack and we're done!
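The three pieces described above (the function, the S3 trigger with its prefix filter, and the IAM permission) could look roughly like this in CDK. This is a sketch under assumptions, not the stack from this project: the helper name addIngestionLambda, the construct ID, the entry path lambda/ingest.ts, the runtime and the timeout are all illustrative, and the bucket and IDs are expected to come from the stack built in the previous article.

```typescript
import { Duration, Stack } from "aws-cdk-lib";
import * as iam from "aws-cdk-lib/aws-iam";
import { Runtime } from "aws-cdk-lib/aws-lambda";
import { S3EventSource } from "aws-cdk-lib/aws-lambda-event-sources";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";
import * as s3 from "aws-cdk-lib/aws-s3";
import { Construct } from "constructs";

// Hypothetical helper; the same statements can sit directly in the stack constructor.
export function addIngestionLambda(
  scope: Construct,
  bucket: s3.Bucket,
  knowledgeBaseId: string,
  dataSourceId: string,
): NodejsFunction {
  const ingestionLambda = new NodejsFunction(scope, "IngestionLambda", {
    runtime: Runtime.NODEJS_20_X, // assumed runtime
    entry: "lambda/ingest.ts", // assumed path to the handler file
    handler: "handler",
    timeout: Duration.seconds(30), // assumed; the call only starts the job
    environment: {
      KNOWLEDGE_BASE_ID: knowledgeBaseId,
      DATA_SOURCE_ID: dataSourceId,
    },
  });

  // Trigger on new objects whose key starts with "CV".
  ingestionLambda.addEventSource(
    new S3EventSource(bucket, {
      events: [s3.EventType.OBJECT_CREATED],
      filters: [{ prefix: "CV" }],
    }),
  );

  // Allow the function to start ingestion jobs on this Knowledge Base only.
  ingestionLambda.addToRolePolicy(
    new iam.PolicyStatement({
      actions: ["bedrock:StartIngestionJob"],
      resources: [
        Stack.of(scope).formatArn({
          service: "bedrock",
          resource: "knowledge-base",
          resourceName: knowledgeBaseId,
        }),
      ],
    }),
  );

  return ingestionLambda;
}
```

Scoping the policy to the single Knowledge Base ARN, rather than "*", keeps the Lambda's permissions limited to the one action it needs.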
You can now upload any relevant file to your S3 bucket and watch as the Data Source for your Knowledge Base is automatically synced.
[Image: the upload page for an S3 bucket. Caption: Upload your CV to kick off the sync.]
[Image: Sync History for a Data Source showing a recent sync. Caption: Syncing starts automatically when you upload a file.]

 
