Mental model: Zero-ETL integrations between Amazon Redshift and Aurora PostgreSQL

Introduction

Amazon Redshift's Zero-ETL integrations enable seamless, near real-time analytics and machine learning on transaction and operational data from compatible Aurora PostgreSQL sources, eliminating the need for traditional time-consuming ETL pipelines. This integration feature automates the process of replicating data from the source to Amazon Redshift, streamlining data management and reducing complexity.

There are two type of seamless no-code integrations between Amazon Redshift and Amazon Aurora PostgreSQL:

Federated Queries: This allows to connect live data sources directly in Amazon Redshift for real-time reporting and analysis. Data resides in Aurora PostgreSQL, and you use a federated query for in-place transformation and querying of data assets.
Zero-ETL: This makes Aurora PostgreSQL data available in Amazon Redshift through a continuous replication mechanism within seconds, enabling access to more accurate and timely insights.

The image below illustrates the conceptual model of how Zero-ETL is organized from various perspectives. Before enabling Zero-ETL integration, it requires to set up the integration source and target with the required parameters and permissions. After completing the initial load, data gets replicated to the Amazon Redshift data warehouse in near real-time as per specified refresh interval, and it's ready for querying. You can monitor data sync from the Table statistics tab from the Amazon Redshift console.

Image not found

Figure 1: Zero-ETL between Amazon Aurora PostgreSQL and Amazon Redshift

The next image illustrates various options or working with replicated data through Zero-ETL and how it compares with the capabilities of Federated Queries:

Image not found

Figure 2: Comparison of Zero-ETL and Federated Query features

Filtering

You can use data filtering for Aurora zero-ETL integrations to define the scope of replication from the source Aurora to the target Amazon Redshift data warehouse. Rather than replicating all data to the target, you can define one or more filters that selectively include or exclude certain tables from being replicated. Only filtering at the database and table level is available for zero-ETL integrations. Data filtering can be useful when you want to:

Join certain tables from two or more different sources and you don't need complete data from either cluster.
Save costs by performing analytics using only a subset of tables rather than an entire fleet of databases.
Filter out sensitive information — such as those containing phone numbers, addresses, or credit card details.

Note: You can not filter by columns or rows.

Transformations

Zero-ETL integration replicates data from transactional data stores to Amazon Redshift without applying transformations during the process. Data is replicated as-is from the source database to the destination database. However, once the data is in Amazon Redshift, you can apply transformations on the replicated data using either materialized views or stored procedures to further process and analyze the replicated data or leverage machine learning capabilities of Redshift ML to gain valuable insights from your data.

Materialized views

When many users run the same query regularly or queries contain complex transformation, the query content must be retrieved again and again for each execution. The recommended practice is to consider caching frequent queries in Amazon Redshift using a materialized view.

The destination database is read-only and does not allow to create tables, views, or materialized views.

But materialized views with data replicated from zero-ETL database are supported when used either through cross-database queries or by creating an external schema referencing the destination database (refer to the Figure 2). Refreshing materialized views with data replicated from zero-ETL integrations leads to a full refresh of the view. You can then schedule the refresh of the materialized view to happen at a specific time, depending upon the change rate.

Note: Auto and incremental refresh are not supported for materialized views on external tables.

Runtimes

Aurora PostgreSQL zero-ETL integration with Amazon Redshift is available for Aurora provisioned clusters and Amazon Aurora Serverless v2 clusters.

Zero-ETL integration can be used to send data to Amazon Redshift Serverless workgroups and Amazon Redshift provisioned clusters using RA3 instance types.

Conclusion

This article provides a summarized overview of Zero-ETL integration between Amazon Aurora Postgres and Amazon Redshift. It equips you with essential information to quickly begin leveraging this powerful feature and unlock the value of your data.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.

Select your cookie preferences

Site Terms, Privacy, and more.