Extract, Transform, Lock and Load | S2 E18 | Build On Weekly

The architecture diagram of the thing Jacquie and Darko are trying to build

Today, Jacquie and Darko follow a blog post by our colleague Suman and build an ETL pipeline with Amazon EMR and Apache Spark. We gather the data, process and clean ("launder") it, crawl it, and put it into a Data Catalog. So if you are learning how to get started with Amazon EMR (Elastic MapReduce) and PySpark, make sure to check out this live stream.
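To give a rough idea of what the "launder" part can look like in PySpark, here is a minimal sketch. It is not the code from Suman's blog post or from the stream: the column names (`order_id`, `amount`), the function names, and the CSV-in, Parquet-out layout are all assumptions made up for illustration. The crawl and Data Catalog steps happen outside of Spark and are not shown.

```python
# Hypothetical column names, chosen only for this sketch.
REQUIRED_COLUMNS = ["order_id", "amount"]


def launder(df):
    """Transform: drop incomplete rows and duplicate orders, cast amount to a number."""
    # Imported inside the function so the file can be read and loaded without Spark installed.
    from pyspark.sql import functions as F

    return (
        df.dropna(subset=REQUIRED_COLUMNS)
        .dropDuplicates(["order_id"])
        .withColumn("amount", F.col("amount").cast("double"))
    )


def run_etl(input_path, output_path):
    """Extract CSV from input_path, clean it, and load it as Parquet into output_path."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()
    raw = spark.read.option("header", True).csv(input_path)  # extract
    clean = launder(raw)                                      # transform
    clean.write.mode("overwrite").parquet(output_path)        # load
    spark.stop()
```

On an EMR cluster you would typically submit a script like this as a step and call `run_etl` with your own input and output locations; both paths are placeholders you supply.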

Also - do you know how much data 3M security tape holds? At 6250 bytes per inch, on 2000 inches of tape? The math is simple - not enough.
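If you want to check the tape math yourself, here is a quick back-of-the-envelope sketch. It uses only the two numbers above and assumes every inch of tape holds data, ignoring any gaps or formatting overhead, so the real figure would be even lower.

```python
BYTES_PER_INCH = 6250       # recording density quoted above
TAPE_LENGTH_INCHES = 2000   # tape length quoted above

# Raw capacity, assuming the whole tape is usable for data.
total_bytes = BYTES_PER_INCH * TAPE_LENGTH_INCHES
total_megabytes = total_bytes / 1_000_000  # decimal megabytes

print(f"{total_bytes:,} bytes = {total_megabytes} MB")  # 12,500,000 bytes = 12.5 MB
```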

Check out the recording here:

Links from today's episode

Blog post from Suman
Apache Spark

🐦 Reach out to the hosts and guests:

Jacquie: https://twitter.com/devopsjacquie

Darko: https://twitter.com/darkosubotica

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.

