Extract, Transform, Lock and Load | S2 E18 | Build On Weekly
Jacquie and Darko are laundering some data with Amazon EMR and Apache Spark
AWS Admin
Amazon Employee
Published May 18, 2023
Last Modified Jun 25, 2024
![The architecture diagram of the thing Jacquie and Darko are trying to build](/_next/image?url=https%3A%2F%2Fcommunity.aws%2Fraw-post-images%2Flivestreams%2Fbuild-on-weekly%2F2023-05-18%2Fimages%2Farchitecture.png%3FimgSize%3D2123x1033&w=3840&q=75)
Today, Jacquie and Darko follow a blog post by our colleague Suman and build an ETL pipeline with Amazon EMR and Apache Spark. We gather the data, process and clean (launder) it, crawl it, and put it into a Data Catalog. So if you are learning how to get started with Amazon EMR (Elastic MapReduce) and PySpark, make sure to check out this live stream.
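If you want a feel for the "launder" step before watching, here is a minimal PySpark sketch. It is not the code from the stream or from Suman's blog post. The function names (`clean_column_name`, `run_etl`), the CSV input format, and the specific cleaning rules are all assumptions made for illustration. The crawl and catalog steps are not shown.

```python
import re


def clean_column_name(name: str) -> str:
    """Lowercase a column name and replace runs of non-alphanumerics with '_'."""
    # Hypothetical helper: the naming rule is an assumption, not from the stream.
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def run_etl(input_path: str, output_path: str) -> None:
    """Read raw CSV, clean it, and write Parquet that can be crawled later."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Extract: read the raw CSV files, treating the first row as a header.
    raw = spark.read.csv(input_path, header=True, inferSchema=True)

    # Transform: normalize column names, then drop rows that are entirely
    # empty and rows that are exact duplicates.
    renamed = raw.toDF(*[clean_column_name(c) for c in raw.columns])
    cleaned = renamed.dropna(how="all").dropDuplicates()

    # Load: write Parquet, overwriting any previous output at that path.
    cleaned.write.mode("overwrite").parquet(output_path)

    spark.stop()
```

On a cluster you would call something like `run_etl("s3://<your-bucket>/raw/", "s3://<your-bucket>/cleaned/")`, where both paths are placeholders for your own locations.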
Also - do you know how much data 3M security tape holds? At 6,250 bytes per inch, on 2,000 inches of tape, the math is simple - not enough.
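For the curious, the tape math can be checked in a few lines. This only multiplies the two figures quoted above. The decimal megabyte (1,000,000 bytes) is an assumption about which unit to report.

```python
# Figures quoted in the post.
BYTES_PER_INCH = 6250
TAPE_LENGTH_INCHES = 2000

# Total capacity is density times length.
capacity_bytes = BYTES_PER_INCH * TAPE_LENGTH_INCHES

# Assumption: report in decimal megabytes (1 MB = 1,000,000 bytes).
capacity_mb = capacity_bytes / 1_000_000

print(f"{capacity_bytes:,} bytes = {capacity_mb} MB")
```

That works out to 12.5 MB for the whole reel.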
Check out the recording here:
🐦 Reach out to the hosts and guests:
Jacquie: https://twitter.com/devopsjacquie
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.