Extract, Transform, Lock and Load | S2 E18 | Build On Weekly

Jacquie and Darko are laundering some data with Amazon EMR and Apache Spark

Darko Mesaros
Amazon Employee
Published May 18, 2023

The architecture diagram of the thing Jacquie and Darko are trying to build

Today, Jacquie and Darko follow a blog post by our colleague Suman to build an ETL pipeline with Amazon EMR and Apache Spark. We gather data, process and clean (launder) it, crawl it, and put it into a Data Catalog. If you are learning how to get started with Amazon EMR (Elastic MapReduce) and PySpark, make sure to check out this live stream.
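To make the "process/clean (launder)" idea concrete, here is a minimal sketch of the extract, transform, and load shape in plain Python. It is not the code from Suman's post or from the stream, and it runs locally rather than on EMR. The sample rows, column names, and the `extract`/`transform`/`load` helpers are all hypothetical. On EMR, the same cleaning would typically be written with PySpark DataFrame methods such as `dropna` and `dropDuplicates`. The crawl and Data Catalog steps are not shown.

```python
import csv
import io

# Hypothetical raw input standing in for the data gathered into the pipeline.
RAW_CSV = """id,name,amount
1, Alice ,10.5
2,Bob,
1, Alice ,10.5
3,carol,7
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform ("launder"): drop rows with no amount, drop repeated ids,
    trim and title-case names, and convert types."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # similar in spirit to PySpark's dropna(subset=["amount"])
        if row["id"] in seen_ids:
            continue  # similar in spirit to PySpark's dropDuplicates(["id"])
        seen_ids.add(row["id"])
        cleaned.append({
            "id": int(row["id"]),
            "name": row["name"].strip().title(),
            "amount": float(amount),
        })
    return cleaned


def load(rows):
    """Load: serialize cleaned rows back to CSV text
    (a local stand-in for writing the output to storage)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "name", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(load(transform(extract(RAW_CSV))))
```

Keeping each stage as its own function mirrors how the pipeline is described: data comes in, gets laundered, and only the cleaned result moves on to the next step.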

Also: do you know how much data a 3M security tape holds at 6250 bytes per inch on 2000 inches of tape? The math is simple: not enough.
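For the curious, here is the arithmetic using only the two figures above. It is a rough upper bound that ignores any gaps or formatting overhead on the tape.

```python
# Figures taken from the question above.
BYTES_PER_INCH = 6250
TAPE_LENGTH_INCHES = 2000

# Raw capacity = density x length, ignoring any on-tape overhead.
capacity_bytes = BYTES_PER_INCH * TAPE_LENGTH_INCHES
capacity_mb = capacity_bytes / 1_000_000  # decimal megabytes

print(capacity_bytes)  # 12500000
print(capacity_mb)     # 12.5
```

About 12.5 MB, which is indeed not enough.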

Check out the recording here:

🐦 Reach out to the hosts and guests:

Jacquie: https://twitter.com/devopsjacquie

Darko: https://twitter.com/darkosubotica