Data for SaaS: With Aurora DSQL and S3 Tables

Next-Gen SaaS Data Solutions: Exploring Aurora DSQL and S3 Tables

Shubhankar Sumar
Amazon Employee
Published Feb 13, 2025
As we move into 2025, we're excited to build upon the momentum from our successful S.C.A.L.E series, which has helped architects navigate the complex world of SaaS data architecture decisions. This year brings transformative developments in the AWS data landscape, particularly with game-changing releases like Amazon Aurora DSQL and Amazon S3 Tables. Aurora DSQL is revolutionizing how SaaS applications handle distributed data with its ability to maintain strong consistency across Regions, while S3 Tables is simplifying data lake management with native Apache Iceberg support and ACID transactions.
In the coming months, we'll dive deep into the intersection of Generative AI and SaaS data architectures, exploring how services like Amazon Bedrock and Amazon Q can be integrated into multi-tenant data solutions. We'll also continue our S.C.A.L.E framework discussions, with practical implementations showcasing how modern AWS services address traditional database trade-offs in scalability, consistency, availability, latency, and evolution. Stay tuned for hands-on tutorials, architectural patterns, and best practices that will help you build resilient, scalable SaaS applications in this exciting era of AI-powered data solutions.
Building on the feedback from S.C.A.L.E Season 1, we're happy to announce that Season 2 will focus on practical implementations using cutting-edge AWS services such as Aurora DSQL, S3 Tables, DynamoDB, and Amazon Bedrock Knowledge Bases. Our upcoming episodes will showcase real-world architectures where these services are combined with GenAI capabilities, demonstrating how modern SaaS applications can balance the principles of our S.C.A.L.E framework.
Join us on this journey as we explore the evolving landscape of SaaS data architecture in 2025!

re:Invent and recent release highlights

This would be a very long post if I were to highlight all the exciting things that came out of re:Invent and the new year. Here are a few relevant data-themed releases:
Amazon Aurora DSQL - Serverless, distributed SQL. Particularly interesting for multi-Region deployments requiring strong consistency.
Amazon S3 Metadata - Automatically captures metadata from S3 objects and makes it queryable in near real time, supporting both system-defined and custom metadata.
Amazon S3 Tables and 1 million buckets per account - Being able to create more buckets and assign structured metadata enables better isolation and simplifies organizing customer data. With a larger bucket limit, SaaS providers can segment tenants or data sets more granularly, improving operational efficiency.
Amazon DynamoDB global tables multi-Region strong consistency - Until now, DynamoDB global tables always used eventual consistency for cross-Region reads. Now you can perform strongly consistent reads across AWS Regions.
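To make the last release concrete, here is a minimal sketch of a strongly consistent read against a global table replica using boto3. With multi-Region strong consistency enabled on the global table, a `GetItem` call with `ConsistentRead=True` returns the latest committed write regardless of which Region it landed in. The table name, key schema, and Region are hypothetical, chosen only for illustration.

```python
def consistent_get(client, table_name, key):
    """Issue a strongly consistent GetItem against a DynamoDB
    global table replica. On tables with multi-Region strong
    consistency enabled, ConsistentRead=True now reflects writes
    made in any Region, not just the local one."""
    return client.get_item(
        TableName=table_name,
        Key=key,
        ConsistentRead=True,  # previously only meaningful within a single Region
    )

# Usage (requires AWS credentials and a suitably configured global table):
# import boto3
# client = boto3.client("dynamodb", region_name="eu-west-1")
# item = consistent_get(client, "Tenants", {"TenantId": {"S": "t-001"}})
```

The function takes the client as a parameter so the request shape can be exercised without touching AWS, which also makes it easy to stub in tests.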

Blogs and Samples

Blog: Build a managed transactional data lake with Amazon S3 Tables: AWS introduced Amazon S3 Tables at re:Invent 2024 as the first cloud object store with built-in Apache Iceberg support, designed to simplify storing and managing tabular data at scale. S3 Tables offers automatic table maintenance, enhanced security through fine-grained IAM permissions, and up to 3x faster query performance compared to storing Iceberg tables in general purpose S3 buckets. The blog post demonstrates how to build a managed transactional data lake using S3 Tables with Apache Spark on Amazon EMR, including setup instructions, data loading, and performing Apache Iceberg queries.
Blog: Multi-tenant RAG with Amazon Bedrock Knowledge Bases: This post demonstrates how to implement multi-tenant RAG (Retrieval Augmented Generation) architectures using Amazon Bedrock Knowledge Bases with three distinct patterns - silo, pool, and bridge - each offering different trade-offs in tenant isolation, per-tenant variability, management simplicity, and cost-efficiency.
Podcast: Scaling your relational database on AWS - Are relational databases cool again? In this episode of the AWS Podcast, host Simon Elisha sits down with Josh Hart, Principal Solutions Architect at AWS, to explore how traditional databases are getting a modern makeover. They dive into three game-changing innovations from AWS that solve age-old database scaling headaches: Aurora Serverless v2 for seamless vertical scaling, Aurora Limitless Database for hassle-free sharding, and the RDS Data API for simplified connection management. Whether you're wrestling with dev/test environments or running production workloads at scale, this episode unpacks practical solutions that could save you time, money, and operational headaches.
Samples: Stream DynamoDB data to S3 Tables using Kinesis Firehose Delivery: This sample illustrates how to stream data from a DynamoDB table to S3 Tables in near real time using Amazon Kinesis Data Streams and Amazon Data Firehose. Once the data is in S3 Tables, it can be queried with Amazon Athena for your analytics purposes.
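Once a pipeline like the one in the sample above has landed data in S3 Tables, querying it from Athena is a single `StartQueryExecution` call. The sketch below assembles the request parameters for such a call; the namespace, table, workgroup, and results-bucket names are all hypothetical, and it assumes the table bucket has been integrated with AWS analytics services so Athena can see it.

```python
def build_athena_query_request(database, table, workgroup, output_s3):
    """Assemble the parameters for Athena's StartQueryExecution
    against an Iceberg table backed by S3 Tables. All names passed
    in are illustrative placeholders, not real resources."""
    query = (
        f'SELECT tenant_id, COUNT(*) AS events '
        f'FROM "{database}"."{table}" GROUP BY tenant_id'
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "WorkGroup": workgroup,
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

# Usage (requires AWS credentials and an S3 Tables catalog visible to Athena):
# import boto3
# athena = boto3.client("athena")
# req = build_athena_query_request("tenant_events_ns", "events", "primary",
#                                  "s3://my-athena-results/")
# execution = athena.start_query_execution(**req)
```

Separating request construction from the API call keeps the SQL and parameter shape easy to inspect before anything runs against your account.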

Additional Recent Data for SaaS Content


S.C.A.L.E Series 1 videos are on YouTube now!

External relevant SaaS reading

How SaaS remote deployment can destroy your value proposition

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
