Data Solutions Framework on AWS | S02 EP16
In this show we discuss Data Solutions Framework on AWS (DSF), an opinionated open source framework that accelerates building data solutions on AWS. Building end-to-end solutions on AWS with infrastructure as code (IaC) while following best practices can take days or weeks; with DSF it takes hours, so you can focus on your use case.
Prasad Matkar
Amazon Employee
Published May 3, 2024
In this episode, host Prasad is joined by guest Lotfi for an in-depth discussion of the Data Solutions Framework on AWS.
The Data Solutions Framework is a library built on top of the AWS CDK that helps deploy data workloads on AWS more quickly. It provides developers with pre-built constructs that accelerate setting up common data architecture patterns. Lotfi explains that it was built in response to customer feedback about the complexity and time needed to build data platforms. He gives examples of some of the key constructs provided, such as:
- S3-based data lake with encryption and lifecycle policies
- Data catalog powered by AWS Glue
- Utilities like bulk data copy between S3 buckets
- Spark job orchestration using AWS Step Functions
- Packaging PySpark code and dependencies for Amazon EMR
- Kafka API for Amazon MSK
The constructs are designed to be modular so they can be used independently or composed together. There is also flexibility to customize things like data tiering policies or encryption keys.
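To make "customizing a data tiering policy" concrete, here is a minimal Python sketch of the kind of S3 lifecycle rule that a data lake storage construct manages on your behalf. It does not use DSF itself: `TieringPolicy`, `lifecycle_configuration`, the rule ID, and the 30/90-day defaults are hypothetical names and values chosen for illustration. Only the shape of the returned dictionary follows the S3 lifecycle configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieringPolicy:
    """Hypothetical helper (not a DSF class): days before objects move to colder storage."""
    infrequent_access_days: int = 30  # assumed default, for illustration only
    archive_days: int = 90            # assumed default, for illustration only


def lifecycle_configuration(policy: TieringPolicy, prefix: str = "") -> dict:
    """Build a dictionary in the shape of an S3 lifecycle configuration."""
    # S3 requires objects to be at least 30 days old before moving to STANDARD_IA.
    if policy.infrequent_access_days < 30:
        raise ValueError("infrequent_access_days must be at least 30")
    if policy.archive_days <= policy.infrequent_access_days:
        raise ValueError("archive_days must be greater than infrequent_access_days")
    return {
        "Rules": [
            {
                "ID": "tiering",  # illustrative rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": policy.infrequent_access_days, "StorageClass": "STANDARD_IA"},
                    {"Days": policy.archive_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# Default tiering, and a customized policy scoped to a hypothetical "bronze/" prefix.
default_rules = lifecycle_configuration(TieringPolicy())
custom_rules = lifecycle_configuration(TieringPolicy(60, 180), prefix="bronze/")
```

With a framework like DSF, the idea is that you pass overrides such as these delays to a construct instead of writing the rules by hand; the exact property names are in the framework documentation linked below.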
In the second half of the episode, Lotfi does a live demo showing how to use the framework to stand up an end-to-end data lake architecture on AWS. This includes S3 buckets.
Hosts of the show 🎤
Prasad Matkar - Database Specialist Solutions Architect @ AWS
Lotfi Mouhib - Principal Solutions Architect @ AWS
- Framework documentation: https://awslabs.github.io/data-solutions-framework-on-aws/
- Spark Data lake with CI/CD: https://github.com/awslabs/data-solutions-framework-on-aws/tree/main/examples/spark-data-lake
You can check out our past shows on our community page: https://community.aws/livestreams/lets-talk-about-data
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.