Near-realtime fraud detection of operational data using Zero-ETL and Redshift ML.
Fraud transaction often occurs in high concurrent operational databases(OLTP/RDBMS) used by businesses like on-line shopping, banking etc ... It is important to detect these fraud transactions quickly and take actions at the earliest.
Sherin Chandy
Amazon Employee
Published Dec 9, 2024
OLTP RDBMS systems are used heavily in high concurrent business transaction environments, performance is a key factor in these databases for business success. Integrating realtime fraud detection mechanisms using AI/ML adds performance overheads to these databases, this post shows how Zero-ETL can be used to offload threat detection to Redshift.
Similar to the existing near real time fraud detection reference architecture by using Kinesis Data Streams and Redshift ML feature, we can now run ML inference on transactional data replicated from Aurora to Redshift in near real time via zero-ETL Integration. This helps to offload anomaly/fraud detection tasks to Redshift and perform near real-time predictions on transactional data.
The following diagram illustrates the architecture for this use case.
Amazon Redshift ML makes it easy for SQL users to create, train, and deploy machine learning models using familiar SQL statements. Redshift ML leverages Amazon SageMaker to train, test and deploy ML models using the data in Redshift cluster. Redshift ML uses SageMaker Autopilot to automatically and intelligently explore different models to find the best one. Also, it offers the ability to use already trained and deployed models from SageMaker. Refer Tutorials for Amazon Redshift ML for more details.
For better operational experience, to view and monitor Amazon Aurora zero-ETL integration is crucial. There are several ways to monitor as the following:
- · Through Zero-ETL integrations pane of Amazon RDS Console from the current_status section
- · By querying Redshift system views, such as SVV_INTEGRATION that provides configuration details for your integrations, SYS_INTEGRATION_ACTIVITY that provides information about completed integration runs, and SVV_INTEGRATION_TABLE_STATE that describes the state of each table in an integration.
- · Through CW Metrics for a particular integration ,such as IntegrationNumTablesFailedReplication and IntegrationState in Amazon Redshift console.
While troubleshooting zero-ETL integrations, you can check the state of a zero-ETL integration by querying the SVV_INTEGRATION system view in Amazon Redshift. If the state column has a value of ErrorState, it means something's wrong with the integration. To find out common issues with zero-ETL integrations, you can refer he public documentation.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.