Data Solutions Framework on AWS | S02 EP16
In this show we discuss Data Solutions Framework on AWS (DSF), an opinionated open source framework that accelerates building data solutions on AWS. Building an end-to-end solution on AWS with infrastructure as code (IaC) while following best practices can take days or weeks; with DSF it takes hours, so you can focus on your use case.
- S3-based data lake with encryption and lifecycle policies
- Data catalog powered by AWS Glue
- Utilities like bulk data copy between S3 buckets
- Spark job orchestration using AWS Step Functions
- Packaging PySpark code and dependencies for Amazon EMR
- Kafka API for Amazon MSK
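To make the first item above concrete, here is a minimal sketch of the kind of bucket settings that "encryption and lifecycle policies" refers to. This is not DSF's API: DSF builds on the AWS CDK and configures these settings for you. The sketch only builds plain dictionaries in the shape that boto3's `put_bucket_encryption` and `put_bucket_lifecycle_configuration` calls accept. The helper names, the `bronze/` prefix, the key alias, and the day counts are all hypothetical values chosen for illustration.

```python
def default_encryption(kms_key_id):
    """Server-side encryption settings using a KMS key (key ID is hypothetical)."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_id,
                },
                "BucketKeyEnabled": True,
            }
        ]
    }


def lifecycle_rule(rule_id, prefix, ia_after_days, glacier_after_days):
    """One lifecycle rule: move objects to cheaper storage classes as they age.

    The day counts are illustrative assumptions, not DSF defaults.
    """
    if glacier_after_days <= ia_after_days:
        raise ValueError("archive transition must come after the infrequent-access one")
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


# Hypothetical example values for a raw-data ("bronze") prefix.
encryption = default_encryption("alias/my-data-lake-key")
lifecycle = {"Rules": [lifecycle_rule("bronze-tiering", "bronze/", 90, 180)]}
```

With boto3 installed and credentials configured, these dictionaries could be passed as `ServerSideEncryptionConfiguration` and `LifecycleConfiguration` respectively. A framework like DSF spares you from writing and maintaining this per bucket.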
- Framework documentation: https://awslabs.github.io/data-solutions-framework-on-aws/
- Spark Data lake with CI/CD: https://github.com/awslabs/data-solutions-framework-on-aws/tree/main/examples/spark-data-lake
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.