
Getting Started with Amazon SageMaker Lakehouse
SageMaker Lakehouse helps you manage and analyze data seamlessly across your organization. In this session, we’ll show you how to set it up, use key features for data storage and processing, and integrate it into your workflows. We’ll also include a live demo to guide you through the setup and usage. By the end, you’ll have the tools to start using SageMaker Lakehouse to streamline data management and collaboration.
Tony Mullen
Amazon Employee
Published Apr 16, 2025
The episode focused on Amazon SageMaker Lakehouse, a new offering within the next generation of SageMaker that aims to unify data access across an organization's data estate. Rohan, an Enterprise Solutions Architect at AWS, explained how SageMaker Lakehouse breaks down data silos by providing a single interface to query and analyze data from sources such as S3 data lakes, Redshift data warehouses, and other databases. He demonstrated how the Unified Studio interface lets data teams and ML teams collaborate more efficiently by sharing data assets through projects.
A key demo showed how to join data from multiple sources, including federated catalogs, to create enriched datasets for machine learning. Rohan walked through querying data, publishing datasets as assets, and using them to train ML models, all within the SageMaker ecosystem. He highlighted the importance of Apache Iceberg as the underlying open table format, which enables transactional capabilities on data lakes and allows access via open APIs.
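The cross-source join described above can be sketched in Python as a SQL statement submitted through Amazon Athena. This is a minimal illustration, not the code used in the demo: the catalog, database, table, and column names, the workgroup, and the S3 output location are all hypothetical placeholders. The only AWS call used is the boto3 Athena client's `start_query_execution`, and boto3 is imported lazily so the query-building helpers can be tried without AWS credentials.

```python
from typing import Any, Dict, List


def build_enrichment_query(
    lake_table: str, warehouse_table: str, key: str, extra_columns: List[str]
) -> str:
    """Build a SQL join between a data lake table and a federated catalog table.

    All identifiers are supplied by the caller; the ones used in the example
    below are hypothetical and do not come from the episode.
    """
    extras = ", ".join(f"w.{col}" for col in extra_columns)
    return (
        f"SELECT l.*, {extras}\n"
        f"FROM {lake_table} AS l\n"
        f"JOIN {warehouse_table} AS w\n"
        f"  ON l.{key} = w.{key}"
    )


def build_athena_request(sql: str, workgroup: str, output_location: str) -> Dict[str, Any]:
    """Assemble keyword arguments for the Athena StartQueryExecution call."""
    return {
        "QueryString": sql,
        "WorkGroup": workgroup,
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit_query(request: Dict[str, Any], region: str) -> str:
    """Submit the query and return its execution ID (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("athena", region_name=region)
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical names: an S3-backed lake table and a federated warehouse table.
    sql = build_enrichment_query(
        '"awsdatacatalog"."sales_lake"."orders"',
        '"warehouse_catalog"."public"."customers"',
        key="customer_id",
        extra_columns=["segment"],
    )
    print(sql)
```

Keeping the SQL construction separate from the submission step makes it easy to review the generated query before running it, and the same statement could be run from the query editor in Unified Studio instead.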
The discussion touched on how existing data catalogs can be integrated into SageMaker Lakehouse through IAM configurations and tagging. Rohan emphasized the flexibility of the platform in supporting various query engines and programming languages. The episode concluded with information about an upcoming hands-on workshop for those interested in trying out SageMaker Lakehouse themselves.
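Because the tables are stored in Apache Iceberg format and exposed through open APIs, they can in principle be reached from any Iceberg-compatible client, not only from AWS query engines. The sketch below uses the open-source PyIceberg library with a REST catalog configuration. The endpoint URL pattern, the use of the account ID as the warehouse value, and the SigV4 signing properties are assumptions based on how the AWS Glue Iceberg REST endpoint is commonly configured; check them against the current AWS and PyIceberg documentation before relying on them. The region, account ID, and namespace shown are hypothetical.

```python
from typing import Dict, List, Tuple


def glue_rest_catalog_properties(region: str, account_id: str) -> Dict[str, str]:
    """Return PyIceberg REST catalog properties for an AWS-hosted Iceberg endpoint.

    Assumption: the catalog is served at https://glue.<region>.amazonaws.com/iceberg
    and requests are signed with SigV4 under the "glue" service name.
    """
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "warehouse": account_id,
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


def list_lakehouse_tables(properties: Dict[str, str], namespace: str) -> List[Tuple[str, ...]]:
    """List tables in a namespace (needs pyiceberg installed and AWS credentials)."""
    from pyiceberg.catalog import load_catalog  # lazy import: optional dependency

    catalog = load_catalog("lakehouse", **properties)
    return catalog.list_tables(namespace)


if __name__ == "__main__":
    # Hypothetical region and account ID, shown only to illustrate the shape
    # of the configuration; no network call is made here.
    props = glue_rest_catalog_properties("eu-west-1", "111122223333")
    for name, value in props.items():
        print(f"{name} = {value}")
```

With valid credentials and permissions, a call such as `list_lakehouse_tables(props, "sales_lake")` would return the table identifiers visible in that namespace.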
Key highlights:
- SageMaker Lakehouse unifies access to data across an organization's data estate
- Supports both managed catalogs and federated catalogs (external sources)
- Uses Apache Iceberg as the underlying open table format
- Enables data sharing and collaboration through projects in Unified Studio
- Allows querying data from multiple sources using various engines (Athena, Redshift, etc.)
- Integrates with existing data catalogs through IAM configurations and tagging
- Supports flexible compute options for data processing (EMR, Glue, local, etc.)
- Upcoming workshop on May 29th for hands-on experience with SageMaker Lakehouse
Tony Mullen - Senior Database Specialist @ AWS
Rohan Ghosh - Enterprise Solutions Architect @ AWS
- SageMaker - https://aws.amazon.com/sagemaker/
- SageMaker Lakehouse - https://aws.amazon.com/sagemaker/lakehouse/
- Iceberg library link - https://github.com/apache/iceberg
- Workshop registration link - https://aws-experience.com/emea/smb/e/fb3c5/amazon-sagemaker-unified-studio-immersion-day
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.