Extract, Transform, Lock and Load | S2 E18 | Build On Weekly

Jacquie and Darko are laundering some data with Amazon EMR and Apache Spark

AWS Admin
Amazon Employee
Published May 18, 2023
Last Modified Jun 25, 2024
The architecture diagram of the thing Jacquie and Darko are trying to build
Today, Jacquie and Darko follow a blog post by our colleague Suman and build an ETL pipeline with Amazon EMR and Apache Spark. We gather data, process and clean (launder) it, crawl it, and put it into a Data Catalog. So if you are learning how to get started with Amazon EMR (Elastic MapReduce) and PySpark, make sure to check out this live stream.
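To give a feel for the extract, transform, and load steps, here is a minimal PySpark sketch. It is not the code from Suman's blog post or from the stream: the function names, the CSV input, the Parquet output, and the cleaning rules (tidy column names, drop empty rows, drop duplicates) are all assumptions made for illustration.

```python
import re


def normalize_column(name: str) -> str:
    """Lower-case a column name and collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def run_etl(source_path: str, target_path: str) -> None:
    """Hypothetical ETL job: read CSV, clean it, write Parquet."""
    # Imported here so normalize_column stays usable without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Extract: read the raw CSV files, treating the first row as a header.
    raw = spark.read.option("header", "true").csv(source_path)

    # Transform: tidy the column names, then drop empty and duplicate rows.
    renamed = raw.toDF(*[normalize_column(c) for c in raw.columns])
    cleaned = renamed.dropna(how="all").dropDuplicates()

    # Load: write the cleaned data as Parquet, replacing any earlier output.
    cleaned.write.mode("overwrite").parquet(target_path)

    spark.stop()
```

On an EMR cluster, a job like this would typically be called with S3 locations for `source_path` and `target_path` (placeholders here, not the paths used in the episode). The crawl and Data Catalog steps happen outside the Spark job, once the cleaned output has been written.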
Also - do you know how much data 3M security tape holds? At 6250 bytes per inch, on 2000 inches of tape? The math is simple - not enough.
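For the curious, the tape math can be checked in a few lines of Python. This is a back-of-the-envelope sketch that uses only the two numbers quoted above; the function and constant names are made up for illustration, and the result is a raw figure that ignores any formatting overhead on the tape.

```python
BYTES_PER_INCH = 6250        # recording density quoted above
TAPE_LENGTH_INCHES = 2000    # tape length quoted above


def tape_capacity_bytes(bytes_per_inch: int, length_inches: int) -> int:
    """Raw capacity: recording density multiplied by tape length."""
    return bytes_per_inch * length_inches


capacity = tape_capacity_bytes(BYTES_PER_INCH, TAPE_LENGTH_INCHES)
print(f"{capacity:,} bytes = {capacity / 1_000_000:.1f} MB")
# prints "12,500,000 bytes = 12.5 MB"
```

So the whole tape holds about 12.5 MB, which is indeed not enough.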
Check out the recording here:

Links from today's episode

🐦 Reach out to the hosts and guests:

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.