
DynamoDB Archival to S3 using AWS CDK

An example of archiving data from DynamoDB to S3 for long-term storage

Published Dec 30, 2024

What's the issue?

DynamoDB is a serverless database that scales quickly under load. Recently we implemented AWS Backup to manage backups for a few large DynamoDB tables, and our AWS bill certainly scaled at a lightning-fast rate that day.
AWS Backup does not currently support incremental backups. Every backup taken of a DynamoDB table is a full copy of all of the data. If you couple this with large tables, you can expect some nasty cost increases.
Let’s look at an example: storing 1 TB of data in DynamoDB for one month, taking one backup per day and keeping each backup for 30 days.
  • DynamoDB storage cost: $256 / month
  • AWS Backup cost: $2,960 / month
This backup cost does not include the cost for cross-region or cross-account backups that many organizations will want to leverage to ensure data security.
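To see roughly where those numbers come from, here is a back-of-the-envelope sketch. The per-GB prices are assumptions based on typical list pricing (around $0.25/GB-month for DynamoDB standard storage and $0.10/GB-month for AWS Backup warm storage); the exact backup figure will vary by region and by how the retained copies ramp up over the first month.

```typescript
// Rough cost sketch - the prices below are assumptions, check current pricing for your region
const tableSizeGb = 1024;                  // ~1 TB of table data
const dynamoStoragePerGbMonth = 0.25;      // assumed DynamoDB standard storage price
const backupStoragePerGbMonth = 0.10;      // assumed AWS Backup warm storage price
const retainedBackups = 30;                // one full copy per day, kept for 30 days

const storageCost = tableSizeGb * dynamoStoragePerGbMonth;                   // ≈ $256 / month
const backupCost = tableSizeGb * retainedBackups * backupStoragePerGbMonth;  // ≈ $3,072 / month

console.log({ storageCost, backupCost });
```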

Solution

For data retention purposes, many organizations will need to hold on to data for a number of years after it is no longer considered “useful” to the applications it belongs to.
Following this AWS blog from a few years ago, I created a proof of concept for archiving data from DynamoDB for long-term storage, with the aim of reducing backup costs.
You can view the example repository on GitHub - it is written in AWS CDK using TypeScript. Disclaimer: this was a proof of concept, so the code may not be optimized or follow best practices, but it works!

How does it work?

Architecture diagram: solution overview of the POC
The example application I created has a few extra components:
  • WriteLambda - A Lambda function that writes items to a DynamoDB table with a TTL
  • EventBridge Schedule - A trigger that invokes the WriteLambda to write a new record to the DynamoDB table every hour (a minimal CDK sketch of these pieces follows below)
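This is not the exact code from the repository, but a minimal CDK (TypeScript, v2) sketch of those components plus the table, stream and schedule they rely on. The construct names, the expirationTime attribute name and the Lambda runtime are assumptions.

```typescript
import { Duration, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as kinesis from 'aws-cdk-lib/aws-kinesis';
import * as lambda from 'aws-cdk-lib/aws-lambda';

export class ArchiveStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Kinesis stream that receives the table's change records, including TTL deletes
    const changeStream = new kinesis.Stream(this, 'ChangeStream');

    // DynamoDB table with a TTL attribute; DynamoDB deletes items once expirationTime passes
    const table = new dynamodb.Table(this, 'ItemsTable', {
      partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
      timeToLiveAttribute: 'expirationTime',
      kinesisStream: changeStream,
    });

    // Lambda that writes a new item (with a 1 hour TTL) into the table
    const writeLambda = new lambda.Function(this, 'WriteLambda', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'write.handler',
      code: lambda.Code.fromAsset('lambda'),
      environment: { TABLE_NAME: table.tableName },
    });
    table.grantWriteData(writeLambda);

    // EventBridge schedule that invokes the write Lambda every hour
    new events.Rule(this, 'HourlyWrite', {
      schedule: events.Schedule.rate(Duration.hours(1)),
      targets: [new targets.LambdaFunction(writeLambda)],
    });
  }
}
```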
The application flow consists of:
  1. Records are written to a DynamoDB table.
  2. Once the TTL expires, the items are deleted by DynamoDB, and these delete events are sent to Kinesis.
  3. The stream is filtered down to delete events only, and specifically to deletes that occur as part of a TTL removal.
  4. These records are sent to a Lambda function, which writes them as a JSON file to an S3 bucket (see the handler sketch below).
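To make steps 3 and 4 concrete, below is a minimal sketch of the archival Lambda handler. In the actual repository the TTL-delete filtering may be done with an event source mapping filter instead; this sketch applies the equivalent check in code. The bucket environment variable and the S3 key naming are assumptions.

```typescript
// archive.ts - sketch of the Lambda that archives TTL-deleted items to S3
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import type { KinesisStreamEvent } from 'aws-lambda';

const s3 = new S3Client({});

export const handler = async (event: KinesisStreamEvent): Promise<void> => {
  for (const record of event.Records) {
    // Kinesis delivers the DynamoDB change record as base64-encoded JSON
    const change = JSON.parse(
      Buffer.from(record.kinesis.data, 'base64').toString('utf-8'),
    );

    // Keep only deletes performed by DynamoDB's TTL process
    const isTtlDelete =
      change.eventName === 'REMOVE' &&
      change.userIdentity?.principalId === 'dynamodb.amazonaws.com';
    if (!isTtlDelete) continue;

    // Write the old image of the deleted item as a JSON file in the archive bucket
    const item = change.dynamodb.OldImage;
    await s3.send(
      new PutObjectCommand({
        Bucket: process.env.ARCHIVE_BUCKET_NAME,
        Key: `${item.id.S}.json`,
        Body: JSON.stringify(item),
        ContentType: 'application/json',
      }),
    );
  }
};
```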

Demo

Lambda Response after adding item to DynamoDB Table
Above you can see the response returned when we invoke the WriteLambda through the AWS Console. It's just a default response returned after adding an item to the DynamoDB table.
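For reference, a WriteLambda handler along these lines could look like the sketch below; the attribute names and the response shape are assumptions, not the exact code from the repository.

```typescript
// write.ts - sketch of the Lambda that writes an item with a 1 hour TTL
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb';
import { randomUUID } from 'crypto';

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const handler = async () => {
  // expirationTime is an epoch-seconds value one hour in the future;
  // DynamoDB deletes the item shortly after this time passes
  const expirationTime = Math.floor(Date.now() / 1000) + 60 * 60;

  await ddb.send(
    new PutCommand({
      TableName: process.env.TABLE_NAME,
      Item: { id: randomUUID(), createdAt: new Date().toISOString(), expirationTime },
    }),
  );

  // Default-style response shown in the console screenshot above
  return { statusCode: 200, body: JSON.stringify('Item written to DynamoDB') };
};
```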
Editing the TTL using the AWS Console
Above you can see how we can edit the TTL of an item in the DynamoDB console to make it expire sooner than the 1 hour we have set in the code. Using this epoch time converter tool, we can skip the 1-hour TTL delay by getting the current epoch timestamp and updating our expirationTime field to match it.
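If you have Node.js (or a browser console) handy, you can also skip the converter tool; the value to paste into expirationTime is simply the current time in epoch seconds:

```typescript
// Current epoch timestamp in seconds - paste into expirationTime to make the item expire now
console.log(Math.floor(Date.now() / 1000));
```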
Item Archived to S3 bucket after TTL delete completes
Above you can see the result of the archival process taking place successfully as item db8e2c07-e416-4047-9ea1-75b0c6e6010b is now appearing in our S3 bucket.

Closing Thoughts

Although I didn't quite come up with the idea for this, I couldn't find any implementation examples online, so I thought it would be worthwhile to share this one. If you find this useful, have a better way of doing it, or have questions, feel free to reach out to me!
Maybe one day we will get incremental backups for DynamoDB, but until then we should try to manage costs in other ways, such as the one above.
 
