Architecting for Zero Data Loss Disaster Recovery using Amazon RDS Solutions
Exploring how Amazon RDS database offerings can help achieve zero data loss in the event of a major disaster affecting the live database service.
What Does Zero Data Loss Disaster Recovery (ZDLDR) Actually Mean and When Is It Needed?
Cloud Enabled vs Cloud Native Databases
Amazon RDS database Transaction Logging
High Availability vs Backups vs Disaster Recovery
Logical Replication vs Physical Replication
Database Support for Logical Synchronous Replication
RDS Custom for Oracle and SQL Server
Architecting for Zero Data Loss Conclusions
Your resiliency strategy should also include Disaster Recovery (DR) objectives based on strategies to recover your workload in case of a disaster event.
- Cloud Enabled
- Cloud Native
Database Engine | Internal Change Tracking | Database Change Log |
---|---|---|
Amazon RDS for MySQL | Log Sequence Number (LSN) | Binary/Redo Log |
Amazon RDS for PostgreSQL | Log Sequence Number (LSN) | Transaction Log |
Amazon RDS for MariaDB | Log Sequence Number (LSN) | Binary/Redo Log |
Amazon RDS for Oracle | System Change Number (SCN) | Redo Log |
Amazon RDS for SQL Server | Log Sequence Number (LSN) | Transaction Log |
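
To make the change tracking identifiers in the table more concrete, here is a minimal sketch of how a PostgreSQL-style LSN can be converted to a byte position and compared between a primary and a replica. It assumes the textual `high/low` hexadecimal form that PostgreSQL uses for LSNs. The function names `parse_pg_lsn` and `lag_bytes` are hypothetical helpers for illustration, not part of any RDS API, and the other engines format their LSN or SCN values differently.

```python
def parse_pg_lsn(lsn: str) -> int:
    """Convert a PostgreSQL-style LSN such as '16/B374D848' to a byte position.

    The text form is two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Return how many bytes of change log the replica is behind the primary."""
    return parse_pg_lsn(primary_lsn) - parse_pg_lsn(replica_lsn)


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to illustrate the arithmetic.
    print(lag_bytes("16/B374D848", "16/B374D800"))
```

A lag of zero bytes at the moment of failure is what a zero data loss design is ultimately trying to guarantee for every acknowledged commit.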
Backup | Cadence |
---|---|
Full Physical | One off Initial Backup |
Incremental Physical | Daily Backup Window |
Cloud Enabled Transactional Logical | Every 5 Minutes |
Cloud Native Transactional Logical | Every 5 Minutes |
As stated, Aurora backups run every 5 minutes, so up to 5 minutes of data could be lost if recovery relied on backups alone. However, because Aurora is highly available and normally runs across 3 Availability Zones, it is less likely that the 5 minute backup cadence becomes an issue, or that a backup is prevented because Aurora storage is unavailable.
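
The effect of a fixed backup cadence on potential data loss can be sketched with some simple arithmetic. This is a simplified model, assuming transactional backups complete at exact fixed intervals and that backups are the only recovery source; the function name `latest_restorable_time` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def latest_restorable_time(
    failure: datetime, first_backup: datetime, cadence: timedelta
) -> datetime:
    """Return the time of the last backup completed at or before the failure."""
    if failure < first_backup:
        raise ValueError("failure occurred before the first backup")
    # timedelta // timedelta yields the whole number of completed intervals.
    completed_intervals = (failure - first_backup) // cadence
    return first_backup + completed_intervals * cadence


if __name__ == "__main__":
    first = datetime(2024, 1, 1, 0, 0, 0)
    failure = datetime(2024, 1, 1, 0, 13, 30)
    restorable = latest_restorable_time(failure, first, timedelta(minutes=5))
    # Data written between the last backup and the failure is at risk.
    print("restorable to:", restorable, "exposure:", failure - restorable)
```

Under this model the exposure is always less than one cadence interval, which is why a 5 minute cadence implies a backup-only RPO of up to 5 minutes.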
Replication Method | Description |
---|---|
Asynchronous | Send changes to a second site but do not wait for them to be applied |
Synchronous | Send changes to a second site and wait for them to be applied |
Semi-Synchronous | Send changes to two sites but wait for them to be received by at least one of the sites |
Protection Groups | Replicate the data in each protection group six times, spanning three AZs in the same region |
- Synchronous (SYNC) where we wait for data to be committed on both the primary RDS database in its AZ and standby RDS database in a different AZ before continuing.
- Asynchronous (ASYNC) where we send the data from the primary RDS database in its AZ to the standby RDS database in a different AZ but do not wait for data to be committed.
- Semi-Synchronous, which requires 2 replicas to support the primary database. The commit is only considered persisted when 1 of the 2 replicas has confirmed that the changes have been received. Note we say received, not applied, so in theory a double outage of both the primary and the replica in receipt of the change could still result in data loss. This is also referred to as a Multi AZ DB cluster.
- Protection Groups where the decoupled storage performs replication with no direct waiting by the compute tier.
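
The difference between these acknowledgement rules can be sketched as a small decision function. This is an illustrative model only, assuming each replica reports whether it has received and whether it has applied a change; the names `Mode`, `ReplicaState` and `commit_acknowledged` are hypothetical and do not correspond to any RDS API. Protection groups are left out because the storage tier replicates without the compute tier waiting directly.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ASYNC = "asynchronous"
    SYNC = "synchronous"
    SEMI_SYNC = "semi-synchronous"


@dataclass
class ReplicaState:
    received: bool  # the change has arrived at the replica
    applied: bool   # the change has been committed on the replica


def commit_acknowledged(mode: Mode, replicas: list[ReplicaState]) -> bool:
    """Decide whether the primary may acknowledge a commit to the client."""
    if mode is Mode.ASYNC:
        # Changes are sent, but the primary never waits for the replicas.
        return True
    if mode is Mode.SYNC:
        # Every replica must have applied the change.
        return all(r.applied for r in replicas)
    if mode is Mode.SEMI_SYNC:
        # At least one replica must have received (not applied) the change.
        return any(r.received for r in replicas)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    states = [ReplicaState(received=True, applied=False),
              ReplicaState(received=False, applied=False)]
    for mode in Mode:
        print(mode.value, commit_acknowledged(mode, states))
```

The semi-synchronous case shows why "received, not applied" matters: the commit is acknowledged while only one replica holds the change, and it has not yet applied it.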
- Using a physical DR copy, where the replication is at the physical level, tracking disk block changes. This type of replication is always synchronous. Within Amazon RDS this is referred to as Multi AZ replication and is only supported using Multi AZ, not Multi Region. One of the drawbacks of physical replication is that it will replicate disk corruptions. Persisted disk corruptions should be considered a very rare occurrence, but nevertheless they need to be catered for. As physical replication uses SYNC replication, we can assume an RPO of 0 when not dealing with disk corruptions.
- Using a logical DR copy, where the replication is at the database level, tracking transactional changes. This can be either synchronous, asynchronous, or semi-synchronous where supported. One of the key advantages of logical replication is that it supports both cross AZ and cross region replication. Logical database copies are referred to as replica copies. For a logical DR copy, disk corruption should not be replicated. This is a key point for any ZDLDR solution. If the logical replication is asynchronous, it has the potential to drift (lag) behind the primary site, but this doesn't necessarily mean data loss if site switches are planned or graceful.
- Using an Amazon Aurora storage cluster, where the replication is handled by the intelligent storage tier, mirroring multiple copies of data across protection groups and all available AZs. Here each change must be replicated at least 4 ways across all available AZs to be considered written, and eventually 6 ways, using what's called a 4 of 6 quorum for writes. Because Aurora storage replication is always synchronous, we can assume an RPO of 0. We should note that Amazon Aurora storage replication is only supported within the same region, though cross region replication is supported asynchronously. Disk corruptions present on Amazon Aurora will self heal from one of the healthy copies of the data, and there will be a minimum of 4 copies and eventually 6, so this is very effective against disk corruption.
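
The 4 of 6 write quorum described for Aurora storage can be sketched as a simple count of acknowledgements. This is a toy model, assuming six copies spread as two per AZ across three AZs; the constants and the function `write_is_durable` are hypothetical names for illustration and say nothing about how Aurora is implemented internally.

```python
COPIES = 6        # total copies of each protection group
WRITE_QUORUM = 4  # acknowledgements needed before a write counts as durable


def write_is_durable(acks: list[bool]) -> bool:
    """Return True when enough of the six storage copies acknowledged a write."""
    if len(acks) != COPIES:
        raise ValueError(f"expected {COPIES} acknowledgements, got {len(acks)}")
    return sum(acks) >= WRITE_QUORUM


if __name__ == "__main__":
    # Two copies per AZ: losing one whole AZ leaves 4 copies, so writes continue.
    one_az_down = [True, True, True, True, False, False]
    # Losing one AZ plus one more copy leaves 3 copies, so writes cannot complete.
    az_plus_one_down = [True, True, True, False, False, False]
    print(write_is_durable(one_az_down), write_is_durable(az_plus_one_down))
```

The design choice this illustrates is that the quorum tolerates the loss of an entire AZ without blocking writes, while still requiring a majority of copies to hold every acknowledged change.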
- Asynchronous for cloud enabled RDS replicas: RDS Oracle, RDS Postgres, RDS MySQL and RDS MariaDB.
- Semi-Synchronous if using RDS Postgres or RDS MySQL with 2 replicas.
- Synchronous using RDS SQL Server, which utilises a feature called 'Always On' or 'Mirroring' where the replication to the secondary site is a logical synchronous apply under the covers.
- Synchronous using Oracle RDS Custom, where we are able to access the underlying operating system and also make certain customisations that are not possible under standard Amazon RDS. One of the customisations that is possible for Oracle is to set up logical synchronous replication.
Database Engine | Storage Sub System | Multi AZ Physical Block Replication Support (SYNC) | Multi AZ DB Replica Logical Replication Support (SYNC) | Multi AZ DB Replica Logical Replication Support (SEMI-SYNC) |
---|---|---|---|---|
Amazon Aurora MySQL | Multi AZ Clustered | Implicit | No | No |
Amazon Aurora PostgreSQL | Multi AZ Clustered | Implicit | No | No |
Amazon RDS for MySQL | Single AZ Striped | Yes | No | Yes |
Amazon RDS for PostgreSQL | Single AZ Striped | Yes | No | Yes |
Amazon RDS for MariaDB | Single AZ Striped | Yes | No | No |
Amazon RDS for Oracle | Single AZ Striped | Yes | No | No |
Amazon RDS for SQL Server | Single AZ Striped | Yes | No | No |
Amazon RDS Custom Oracle | Single AZ Striped | No | Yes | Yes |
Amazon RDS Custom SQL Server | Single AZ Striped | Yes | Yes | No |
Service | Description |
---|---|
Amazon RDS | Database software running |
EC2 | Compute that the database software runs on |
EBS | Storage that holds the data and software |
KMS | Keys to support database encryption |
S3 | Storage to support database backups |
Engine | Zero Data Loss Disaster Recovery |
---|---|
Amazon RDS MySQL | Possible to achieve ZDLDR using Semi-Synchronous Replication of a Logical DR to 2 sites |
Amazon RDS MariaDB | Not Possible as Logical Synchronous Replication is not Supported. Physical replication could be susceptible to Disk Corruption |
Amazon RDS PostgreSQL | Possible to achieve ZDLDR using Semi-Synchronous Replication of a Logical DR to 2 sites |
Amazon RDS Oracle EE | Not Possible as Logical Synchronous Replication is not Supported. Physical replication could be susceptible to Disk Corruption |
Amazon RDS Oracle Custom EE | Possible to achieve ZDLDR using Synchronous Replication of a Logical DR site |
Amazon RDS Oracle SE2 | Not Possible as Logical Synchronous Replication is not Supported. Physical replication could be susceptible to Disk Corruption |
Amazon RDS SQL Server SE | Possible to achieve ZDLDR using Synchronous Replication of a Logical DR site |
Amazon RDS SQL Server EE | Possible to achieve ZDLDR using Synchronous Replication of a Logical DR site |
Amazon RDS SQL Server SE Custom | Possible to achieve ZDLDR using Synchronous Replication of a Logical DR site |
Amazon RDS SQL Server EE Custom | Possible to achieve ZDLDR using Synchronous Replication of a Logical DR site |
Amazon Aurora PostgreSQL | ZDLDR is supported out of the box due to decoupled clustered storage |
Amazon Aurora MySQL | ZDLDR is supported out of the box due to decoupled clustered storage |
The solution to architecting for ZDLDR is the use of database replicas that support synchronous or semi-synchronous replication, or Amazon Aurora storage replication.
- Documentation
- White Papers
- Videos
- How To Guides
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.