Automate RAG Knowledge Base ingestion with Lambda and CDK
Automate your CV Chatbot's data ingestion quickly and simply!
Published Aug 7, 2024
In a recent article, I showed how to create a CV chatbot for your website using Amazon Bedrock, Lambda and CDK. You can catch up on that article here. That setup works, but what if we need to update our CV? Would we need to create a whole new bot, or manually re-sync the Knowledge Base? Luckily, neither is necessary: we can automate the ingestion with a Lambda that is triggered every time we upload an appropriate file to our S3 bucket. Here's how:
First we need our Lambda. It's a small and simple file:
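As a rough sketch of what such a handler could look like (an illustration under stated assumptions, not necessarily the exact file used in this project): the Bedrock call is injected as a function so the block runs without any packages installed. The names S3EventLike, StartIngestionInput, requireEnv, buildIngestionInput, uploadedKeys and createHandler are hypothetical; only KNOWLEDGE_BASE_ID and DATA_SOURCE_ID come from the article.

```typescript
// Local stand-ins so the sketch compiles on its own. In the deployed Lambda
// the event type would be S3Event from "aws-lambda", and the input would be
// passed to StartIngestionJobCommand from "@aws-sdk/client-bedrock-agent".
interface S3EventLike {
  Records: { s3: { bucket: { name: string }; object: { key: string } } }[];
}

interface StartIngestionInput {
  knowledgeBaseId: string;
  dataSourceId: string;
  clientToken: string; // unique string per request, e.g. a UUID
}

type Env = Record<string, string | undefined>;

// Read a required environment variable and fail loudly if CDK did not set it.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing environment variable ${name}`);
  return value;
}

// Build the request for a single ingestion job.
function buildIngestionInput(env: Env, clientToken: string): StartIngestionInput {
  return {
    knowledgeBaseId: requireEnv(env, "KNOWLEDGE_BASE_ID"),
    dataSourceId: requireEnv(env, "DATA_SOURCE_ID"),
    clientToken,
  };
}

// S3 URL-encodes object keys in event notifications ("+" stands for a space).
function uploadedKeys(event: S3EventLike): string[] {
  return event.Records.map((r) =>
    decodeURIComponent(r.s3.object.key.replace(/\+/g, " "))
  );
}

interface Deps {
  env: Env;
  newToken: () => string;
  startIngestionJob: (input: StartIngestionInput) => Promise<unknown>;
}

// Returns the Lambda handler: one ingestion job per invocation.
function createHandler(deps: Deps) {
  return async (event: S3EventLike): Promise<string[]> => {
    await deps.startIngestionJob(buildIngestionInput(deps.env, deps.newToken()));
    return uploadedKeys(event); // handy for logging which uploads triggered the sync
  };
}

// Possible wiring in the real file (assumes the packages are installed):
//   const client = new BedrockAgentClient({});
//   export const handler = createHandler({
//     env: process.env,
//     newToken: uuidv4,
//     startIngestionJob: (input) => client.send(new StartIngestionJobCommand(input)),
//   });
```

Injecting the Bedrock call is a design choice made for this sketch only; it keeps the request-building logic easy to check locally. Calling the client directly inside the handler works just as well.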
We are using the @aws-sdk/client-bedrock-agent module to interact with Bedrock, aws-lambda for typing the event, and uuid to create a unique id string. Make sure you install all of these using npm i @aws-sdk/client-bedrock-agent aws-lambda uuid. Note that the S3Event type definitions ship in @types/aws-lambda, so add that as a dev dependency if TypeScript cannot resolve the import.
The KNOWLEDGE_BASE_ID and DATA_SOURCE_ID values are passed into the Lambda as environment variables. We will set these in CDK shortly, and also filter which files trigger the Lambda.
Now we need to add the Lambda to our CDK, inside the cv-bot-stack.ts we previously created:
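A minimal sketch of that wiring might look like the following. The helper name addIngestionLambda, the construct id, the entry path lambda/ingest.ts, the timeout and the props interface are all hypothetical; the idea is to call something like this from inside the existing stack, passing in the bucket and the IDs created there.

```typescript
import * as cdk from "aws-cdk-lib";
import * as iam from "aws-cdk-lib/aws-iam";
import * as s3 from "aws-cdk-lib/aws-s3";
import { Runtime } from "aws-cdk-lib/aws-lambda";
import { S3EventSource } from "aws-cdk-lib/aws-lambda-event-sources";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";
import { Construct } from "constructs";

// Hypothetical props: values assumed to exist already in cv-bot-stack.ts.
interface IngestionLambdaProps {
  bucket: s3.Bucket;       // bucket backing the Knowledge Base data source
  knowledgeBaseId: string;
  dataSourceId: string;
}

function addIngestionLambda(scope: Construct, props: IngestionLambdaProps): NodejsFunction {
  const ingestFn = new NodejsFunction(scope, "IngestionLambda", {
    entry: "lambda/ingest.ts", // assumed location of the handler file
    handler: "handler",
    runtime: Runtime.NODEJS_20_X,
    timeout: cdk.Duration.seconds(30),
    environment: {
      KNOWLEDGE_BASE_ID: props.knowledgeBaseId,
      DATA_SOURCE_ID: props.dataSourceId,
    },
  });

  // Trigger on newly created objects whose key starts with "CV".
  ingestFn.addEventSource(
    new S3EventSource(props.bucket, {
      events: [s3.EventType.OBJECT_CREATED],
      filters: [{ prefix: "CV" }],
    })
  );

  // Allow the function to start ingestion jobs on this Knowledge Base only.
  ingestFn.addToRolePolicy(
    new iam.PolicyStatement({
      actions: ["bedrock:StartIngestionJob"],
      resources: [
        cdk.Stack.of(scope).formatArn({
          service: "bedrock",
          resource: "knowledge-base",
          resourceName: props.knowledgeBaseId,
        }),
      ],
    })
  );

  return ingestFn;
}
```

Scoping the policy to the single Knowledge Base ARN, rather than using a wildcard resource, keeps the Lambda's permissions as narrow as the task requires.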
We've created our Lambda using the aws-cdk-lib/aws-lambda-nodejs module, as we did with the retrieval and generation Lambda in the previous article. We add S3 as a trigger to kick off our Lambda, filtering on the file name (all files starting with 'CV'), and add the permissions the Lambda needs to start the ingestion job. Deploy the CDK and we're done!
You can now upload any relevant file to your S3 bucket and watch as the Data Source for your Knowledge Base is automatically synced.