Accelerating Data Engineering with RAPIDS Accelerator

Join us as we dive deep into accelerating your data engineering workloads using RAPIDS Accelerator for Apache Spark on AWS. In this episode, we'll explore how to leverage GPU acceleration to dramatically speed up your Spark applications while reducing costs.

Ibrahim Emara
Amazon Employee
Published Feb 20, 2025
In this show, we discuss how GPU (Graphics Processing Unit) technology accelerates data engineering. We explore the differences between CPUs and GPUs: CPUs excel at sequential processing, while GPUs are designed for highly parallel tasks. We also cover the collaboration between AWS and NVIDIA, a partnership that dates back to 2010, and the evolution of the GPU technology available on AWS.
A key focus is the practical application of GPUs in data processing, particularly for fraud detection in the financial sector. Our guest, Dr. Angelos Chionis, demonstrates how GPUs can significantly speed up data processing tasks, showing a comparison in which GPU-enabled clusters completed a task in 43 minutes while CPU-based instances took over 10 hours. The cost benefits are also highlighted, with GPU processing being up to 10 times less expensive for the same workload.
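To make the runtime and cost comparison concrete, here is a minimal sketch of the arithmetic. The runtimes come from the episode, with 600 minutes standing in as a lower bound for "over 10 hours". The hourly cluster prices are hypothetical placeholders chosen only to illustrate the formula. They are not AWS prices and were not quoted in the episode.

```python
def speedup(cpu_minutes: float, gpu_minutes: float) -> float:
    """Return how many times faster the GPU run finished than the CPU run."""
    return cpu_minutes / gpu_minutes


def cost_ratio(cpu_minutes: float, gpu_minutes: float,
               cpu_price_per_hour: float, gpu_price_per_hour: float) -> float:
    """Return CPU job cost divided by GPU job cost (values above 1 favor the GPU)."""
    cpu_cost = cpu_price_per_hour * cpu_minutes / 60
    gpu_cost = gpu_price_per_hour * gpu_minutes / 60
    return cpu_cost / gpu_cost


# Runtimes from the episode: "over 10 hours" on CPU (600 minutes used as a floor)
# and 43 minutes on GPU.
CPU_MINUTES = 600
GPU_MINUTES = 43

# Hypothetical whole-cluster prices in USD per hour (assumptions, not real quotes).
CPU_CLUSTER_PRICE = 10.0
GPU_CLUSTER_PRICE = 14.0

print(f"speedup: {speedup(CPU_MINUTES, GPU_MINUTES):.1f}x")
print(f"cost ratio: {cost_ratio(CPU_MINUTES, GPU_MINUTES, CPU_CLUSTER_PRICE, GPU_CLUSTER_PRICE):.1f}x")
```

The point of the sketch is that the cost advantage depends on both factors: a GPU cluster can cost more per hour and still be cheaper per job, as long as the speedup outweighs the price difference.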
The discussion covers the ease of implementing GPU acceleration in existing Spark workloads using Amazon EMR (Elastic MapReduce). Dr. Chionis explains that minimal configuration changes are needed to enable GPU usage, making it accessible for data engineers. The conversation also touches on best practices for sizing GPU clusters, monitoring tools for optimization, and the broader applications of GPU technology in areas like AI-driven drug discovery and advanced robotic simulation.
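As a rough illustration of what "minimal configuration changes" can look like, the sketch below collects a few Spark properties commonly used to enable the RAPIDS Accelerator and wraps them in the `spark-defaults` configuration classification that EMR accepts. The property values and the helper names (`emr_configurations`, `spark_submit_flags`) are assumptions for illustration, not the settings shown in the episode. A real EMR cluster will likely need additional cluster-level GPU settings, so treat this as a starting point and check the EMR and RAPIDS documentation.

```python
import json

# Spark properties commonly used to turn on the RAPIDS Accelerator.
# The values are illustrative assumptions and should be tuned per workload.
RAPIDS_SPARK_DEFAULTS = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",   # load the RAPIDS SQL plugin
    "spark.rapids.sql.enabled": "true",              # run supported SQL operators on GPU
    "spark.executor.resource.gpu.amount": "1",       # one GPU per executor
    "spark.task.resource.gpu.amount": "0.25",        # let four tasks share that GPU
    "spark.rapids.sql.concurrentGpuTasks": "2",      # tasks allowed on the GPU at once
}


def emr_configurations(spark_defaults: dict) -> list:
    """Wrap Spark properties in an EMR 'spark-defaults' configuration entry."""
    return [{"Classification": "spark-defaults", "Properties": dict(spark_defaults)}]


def spark_submit_flags(spark_defaults: dict) -> list:
    """Render the same properties as '--conf key=value' arguments for spark-submit."""
    flags = []
    for key, value in sorted(spark_defaults.items()):
        flags.extend(["--conf", f"{key}={value}"])
    return flags


# Print the JSON that could be supplied as cluster configurations at creation time.
print(json.dumps(emr_configurations(RAPIDS_SPARK_DEFAULTS), indent=2))
# Print the equivalent per-job flags.
print(" ".join(spark_submit_flags(RAPIDS_SPARK_DEFAULTS)))
```

Because the plugin is enabled through configuration rather than code, the existing Spark SQL and DataFrame job can usually stay unchanged, which matches the point made in the episode.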
Highlights:
  • GPUs can process data up to 15 times faster than CPUs for certain workloads
  • GPU processing can be up to 10 times less expensive than CPU processing for the same task
  • Minimal code changes are required to leverage GPU acceleration in Spark workloads
  • AWS and NVIDIA's partnership has led to significant advancements in GPU technology availability on the cloud
  • GPUs are beneficial for workloads processing more than 100 GB of data
  • Real-time fraud detection using GPU-accelerated processing can improve accuracy by 10-20%
  • EMR (Elastic MapReduce) provides an easy way to set up and manage GPU-enabled clusters

Hosts of the show 🎤

Ibrahim Emara - Database Specialist SA @AWS

Guests 🎤

Angelos Chionis - Data, Analytics and AI Lead @AWS

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.