Mastering the art of Data Enrichment with Apache Flink | S02 EP41 | Lets Talk About Data

In this episode we discuss enrichment patterns for streaming data, and how to implement them using Apache Flink. We then cover patterns in scenarios where reference data is static, available through external APIs, or available as a change data stream. We also dive into internal details about Flink state and how it stores reference data.

Prasad Matkar
Amazon Employee
Published Nov 7, 2024
In this Twitch show, guests Subham and Luis share their expertise on data enrichment patterns using Apache Flink. They discuss scenarios where reference data is static, fetched from APIs, or available as a change data stream. The discussion covers the advantages of stateful stream processing with Flink and techniques for handling late or out-of-order events. Subham and Luis then demonstrate code examples and architectural patterns for enriching streaming data efficiently, covering topics such as preloading reference data, leveraging Flink state, async IO, caching, and handling rapidly changing reference data. They also touch upon the scalability and auto-scaling capabilities of AWS's managed Flink service.
Key Highlights:
  • Understanding stream processing and the need for data enrichment
  • Preloading static reference data into Flink operator memory for low-latency enrichment
  • Leveraging Flink state for scalable reference data storage when data is large
  • Asynchronous API calls with Flink for efficient enrichment without busy waiting
  • Implementing a local cache with Flink state for frequently changing reference data
  • Handling late events by enriching with historically accurate reference data
  • Comparing sync, async, and cached enrichment patterns in terms of performance
  • Auto-scaling capabilities of AWS's managed Flink service based on CPU or custom metrics
  • Enriching with rapidly changing reference data using Change Data Capture (CDC)
  • Exploring code examples and demos for various enrichment patterns
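
To make the async and cached enrichment patterns from the highlights more concrete, here is a minimal sketch in plain Python. It is not the code shown in the episode and does not use Flink's async I/O API; the reference table, field names (`customer_id`, `city`), and the `fetch_reference` function are all hypothetical stand-ins for an external lookup. It only illustrates the idea: lookups run concurrently instead of blocking, and a local cache avoids repeating a lookup for a key that has already been requested.

```python
import asyncio

# Hypothetical reference data; a real job would query an external API or database.
REFERENCE_DATA = {"c1": "Seattle", "c2": "Madrid"}

# Counts simulated remote calls so the effect of the cache is visible.
stats = {"calls": 0}


async def fetch_reference(customer_id):
    """Simulated slow external lookup (hypothetical stand-in for an API call)."""
    stats["calls"] += 1
    await asyncio.sleep(0.01)
    return REFERENCE_DATA.get(customer_id, "unknown")


async def enrich(event, cache):
    """Attach reference data to one event, consulting the local cache first."""
    key = event["customer_id"]
    if key not in cache:
        # Cache the in-flight task so concurrent events with the same key
        # share a single lookup instead of each issuing their own.
        cache[key] = asyncio.create_task(fetch_reference(key))
    city = await cache[key]
    return {**event, "city": city}


async def enrich_all(events):
    """Enrich a batch of events concurrently; output order matches input order."""
    cache = {}
    return await asyncio.gather(*(enrich(e, cache) for e in events))
```

Calling `asyncio.run(enrich_all(events))` on a list of event dictionaries returns the same events with a `city` field added. This sketch never expires cache entries; for frequently changing reference data, some form of expiry or refresh would be needed.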
Check out the recording here: [embedded video]

Hosts of the show 🎤

Prasad Matkar - Database Specialist SA @ AWS

Guests 🎤

Subham Rakshit - Senior Analytics Solutions Architect @ AWS
Luis Morales - Senior Solutions Architect @ AWS

Links from today's episode

Check out Past Shows

You can check out our past shows on our community page: https://community.aws/livestreams/lets-talk-about-data
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
