Extract, Transform, Lock and Load | S2 E18 | Build On Weekly

Jacquie and Darko are laundering some data with Amazon EMR and Apache Spark

Published May 18, 2023
Last Modified Jun 25, 2024
The architecture diagram of the thing Jacquie and Darko are trying to build
Today, Jacquie and Darko follow a blog post by our colleague Suman and build an ETL pipeline with Amazon EMR and Apache Spark. We gather the data, process and clean (launder) it, crawl it, and put it into a Data Catalog. So if you are learning how to get started with Amazon EMR (Elastic MapReduce) and PySpark, make sure to check out this live stream.
Also - do you know how much data 3M security tape holds? At 6250 bytes per inch, on 2000 inches of tape? The math is simple: 6250 × 2000 = 12,500,000 bytes, or about 12.5 MB. In other words - not enough.
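If you want to check the tape math yourself, here is a minimal sketch in Python. It uses only the two figures quoted above (6250 bytes per inch and 2000 inches). The function name `tape_capacity_bytes` is made up for illustration, and the calculation assumes every inch of tape holds data, ignoring any gaps between records, so treat the result as an upper bound.

```python
# Figures quoted in the episode notes.
BYTES_PER_INCH = 6250
TAPE_LENGTH_INCHES = 2000


def tape_capacity_bytes(density_bytes_per_inch: int, length_inches: int) -> int:
    """Raw capacity, assuming the whole tape length stores data (no gaps)."""
    return density_bytes_per_inch * length_inches


capacity = tape_capacity_bytes(BYTES_PER_INCH, TAPE_LENGTH_INCHES)

# Report in bytes and in decimal megabytes (1 MB = 1,000,000 bytes).
print(f"{capacity:,} bytes = {capacity / 1_000_000} MB")
# prints "12,500,000 bytes = 12.5 MB"
```

That is roughly the size of a few phone photos, which is why the answer is "not enough".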
