Mastering the art of Data Enrichment with Apache Flink | S02 EP41 | Let's Talk About Data

In this Twitch show, guests Subham and Luis share their expertise on data enrichment patterns using Apache Flink. They discuss scenarios where reference data is static, fetched from APIs, or available as a change data stream. The discussion covers the advantages of stateful stream processing with Flink and techniques for handling late or out-of-order events. Subham and Luis then demonstrate code examples and architectural patterns for enriching streaming data efficiently, covering preloading reference data, leveraging Flink state, async I/O, caching, and handling rapidly changing reference data. They also touch on the scalability and auto-scaling capabilities of AWS's managed Flink service.
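As a rough illustration of what enriching a late event with historically accurate reference data can mean, here is a minimal plain-Java sketch. It is not code from the episode and does not use the Flink API; the class name VersionedReferenceData and its methods are hypothetical. The assumption is that every version of a reference record is kept with the time it became valid, so that an event is joined with the version in force at the event's own timestamp rather than the latest one.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

/**
 * Hypothetical helper (not a Flink class): stores every version of one
 * reference record, keyed by the time at which that version became valid.
 */
final class VersionedReferenceData<V> {
    private final NavigableMap<Long, V> versions = new TreeMap<>();

    /** Records that `value` is valid from `validFromMillis` onward. */
    void put(long validFromMillis, V value) {
        versions.put(validFromMillis, value);
    }

    /** Returns the version that was in force at the event's timestamp, if any. */
    Optional<V> asOf(long eventTimeMillis) {
        // floorEntry finds the latest version whose start time is <= the event time.
        Map.Entry<Long, V> entry = versions.floorEntry(eventTimeMillis);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }

    public static void main(String[] args) {
        VersionedReferenceData<String> price = new VersionedReferenceData<>();
        price.put(1_000L, "USD 10");
        price.put(2_000L, "USD 12");

        // An event stamped 1_500 that arrives late, after the price changed at 2_000,
        // is still enriched with the price that applied when it happened.
        System.out.println(price.asOf(1_500L).orElse("unknown")); // prints "USD 10"
    }
}
```

In a Flink job, a structure like this would more likely live in keyed state and be pruned as watermarks advance; that part is left out of the sketch.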

Key Highlights:

Understanding stream processing and the need for data enrichment
Preloading static reference data into Flink operator memory for low-latency enrichment
Leveraging Flink state for scalable reference data storage when data is large
Asynchronous API calls with Flink for efficient enrichment without busy waiting
Implementing a local cache with Flink state for frequently changing reference data
Handling late events by enriching with historically accurate reference data
Comparing sync, async, and cached enrichment patterns in terms of performance
Auto-scaling capabilities of AWS's managed Flink service based on CPU or custom metrics
Enriching with rapidly changing reference data using Change Data Capture (CDC)
Exploring code examples and demos for various enrichment patterns
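
The asynchronous and cached enrichment patterns in the list above can be sketched in plain Java as follows. This is an assumption-laden stand-in, not the code shown in the episode: the class CachedAsyncEnricher, its constructor parameters, and the in-memory map are all hypothetical. In a real Flink job the remote call would typically sit in an async function and the cache could be held in Flink state, as the highlights describe. The sketch only shows the core idea: answer from the cache while an entry is fresh, otherwise call the reference API without blocking and remember the result for a fixed time-to-live.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical helper (not a Flink class): async lookup with a TTL cache in front. */
final class CachedAsyncEnricher<K, V> {

    /** A cached value together with the time at which it stops being fresh. */
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, CompletableFuture<V>> lookup; // stands in for the reference API call
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested

    CachedAsyncEnricher(Function<K, CompletableFuture<V>> lookup, long ttlMillis, LongSupplier clock) {
        this.lookup = lookup;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the reference data for `key` without blocking the caller. */
    CompletableFuture<V> enrich(K key) {
        long now = clock.getAsLong();
        Entry<V> hit = cache.get(key);
        if (hit != null && hit.expiresAtMillis > now) {
            // Fresh entry: no remote call is needed.
            return CompletableFuture.completedFuture(hit.value);
        }
        // Miss or stale entry: call the API asynchronously and cache the result.
        return lookup.apply(key).thenApply(value -> {
            cache.put(key, new Entry<>(value, now + ttlMillis));
            return value;
        });
    }
}
```

A caller would construct it with something like `new CachedAsyncEnricher<>(id -> fetchCustomer(id), 60_000L, System::currentTimeMillis)`, where `fetchCustomer` is a placeholder for whatever asynchronous client is in use. A shorter TTL keeps enrichment closer to the current reference data at the cost of more API calls, which is the trade-off behind comparing the sync, async, and cached patterns.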


Hosts of the show 🎤

Prasad Matkar - Database Specialist SA @ AWS

Guests 🎤

Subham Rakshit - Senior Analytics Solutions Architect @ AWS
Luis Morales - Senior Solutions Architect @ AWS

Links from today's episode

Amazon Managed Service for Apache Flink - https://aws.amazon.com/managed-service-apache-flink/
Blog - Common streaming data enrichment patterns in Amazon Kinesis Data Analytics for Apache Flink - https://aws.amazon.com/blogs/big-data/common-streaming-data-enrichment-patterns-in-amazon-kinesis-data-analytics-for-apache-flink/
GitHub - https://github.com/aws-samples/amazon-kinesis-data-analytics-examples/blob/master/EventEnrichment/src/main/java/com/amazonaws/operators/PreLoadEnrichmentDataInMemory.java
Blog - Implement Apache Flink real-time data enrichment patterns https://aws.amazon.com/blogs/big-data/implement-apache-flink-real-time-data-enrichment-patterns/
GitHub - https://github.com/aws-samples/apache-flink-near-online-data-enrichment-patterns
GitHub - https://github.com/srakshit/cdc-order-enrichment-flink-example
Blog - Perform Amazon Kinesis load testing with Locust - https://aws.amazon.com/blogs/big-data/perform-amazon-kinesis-load-testing-with-locust/
GitHub - https://github.com/aws-samples/amazon-kinesis-load-testing-with-locust
Workshop - https://catalog.workshops.aws/managed-flink/en-US/flink-on-msf/lab-3/scale-monitor/cw-metrics

Check out Past Shows

You can check out our past shows on our community page - https://community.aws/livestreams/lets-talk-about-data

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
