
Using Glue Catalog Assets in Amazon SageMaker Unified Studio
This post will walk through how to ingest and share Glue Data Catalog assets across AWS accounts using SageMaker Unified Studio Projects and Catalog.
Darren Roback
Amazon Employee
Published Feb 18, 2025
Amazon SageMaker Unified Studio provides an integrated experience to use all your data and tools for analytics and AI. You can use Amazon SageMaker Unified Studio to discover your data and put it to work using familiar AWS analytics and machine learning services for model development, generative AI, big data processing, and SQL analytics, assisted by Amazon Q Developer. You can also use Amazon SageMaker Unified Studio to work across compute resources using unified notebooks, discover and query diverse data sources with a built-in SQL editor, train and deploy AI models at scale, and rapidly build custom generative AI applications.
Amazon SageMaker Unified Studio is built on Amazon DataZone capabilities such as domains to organize your assets and users, and projects to collaborate with other users, securely share artifacts, and seamlessly work across compute services.
Amazon SageMaker Unified Studio offers the following capabilities:
- Use all your data and tools in a single development environment
- Build and scale generative AI applications with Amazon Bedrock
- Gain insights with the most price-performant SQL engine from Amazon Redshift
- Unify data access across Amazon S3 data lakes, Amazon Redshift, and federated data sources with Amazon SageMaker Lakehouse
- Build, train, and deploy machine learning and foundation models, with fully managed infrastructure, tools, and workflows from Amazon SageMaker AI
- Prepare, integrate, and orchestrate data for analytics and AI at petabyte scale with Amazon EMR, Amazon Athena, and AWS Glue
- Discover, govern, and collaborate on data and AI securely, with a unified catalog, built on Amazon DataZone
Many organizations have existing datasets managed in an AWS Glue catalog, and are seeking guidance on how to bring these datasets into the SageMaker ecosystem for advanced analytics and machine learning model development.
In this post we will walk through the end-to-end process to ingest your existing Glue catalog assets as a data source in SageMaker Unified Studio and demonstrate how to publish those assets to the SageMaker catalog to foster a data mesh culture across the organization. To model realistic scenarios, we will be sharing this data across AWS accounts, reflecting a true consumption pattern for many enterprise organizations.
- Access to two AWS accounts (we'll refer to these as the ProducerAccount and ConsumerAccount).
- Both accounts are members of the same AWS Organization.
- AWS Resource Access Manager (RAM) Trusted Access is enabled in AWS Organizations.
- A Glue database and table within the ProducerAccount.
- The Glue database and tables are not managed by AWS Lake Formation.
- Your AWS role is configured as a Lake Formation Data Lake Administrator in the producer account.
- IAM Identity Center is enabled within the AWS Organization.
- All deployment will take place in the AWS us-east-1 region.
In Amazon SageMaker Unified Studio, a domain is the organizing entity for connecting together your assets, users, and their projects. With Amazon SageMaker platform domains, you have the flexibility to reflect the data and analytics needs of your organizational structure, whether it's creating a single Amazon SageMaker platform domain for your enterprise or multiple domains for different business units.
The first step involves creating a new SageMaker platform domain, which we will do from the Amazon SageMaker console in the ProducerAccount.
(1) Navigate to the Amazon SageMaker console and select Create a Unified Studio domain.

(2) Choose the Manual setup option and provide a name for the domain, such as DemoDomain.
(3) In the Permissions section, select the option to Create and use a new service role for the Domain Execution and Domain Service roles.
- The AmazonSageMakerDomainExecution role has the AWS policy: SageMakerStudioDomainExecutionRolePolicy attached. This is an IAM role that Amazon SageMaker Unified Studio requires to call APIs on behalf of authorized users, including those logged in to Amazon SageMaker Unified Studio.
- The AmazonSageMakerDomainService role has the AWS policy: SageMakerStudioDomainServiceRolePolicy attached. This is a service role for domain level actions performed by Amazon SageMaker Unified Studio.
(4) Select Create domain when complete.

(5) When complete you will be prompted that the domain was successfully created.

We're now ready to configure SSO user access, which will allow us to onboard users into SageMaker Unified Studio projects.
(1) Under Next steps for your domain, select the Configure option next to Configure SSO user access.

(2) Under User authentication method, select IAM Identity Center and click Next.

(3) Verify that your organizational instance of IAM Identity Center is shown and select the Require assignments option to limit access to the SageMaker Unified Studio domain. Click Next to continue.

(4) Confirm your settings and select Save when finished.

(5) Add two test users from IAM Identity Center to the SageMaker Unified Studio domain. One of these users will act as our data producer, while the other will act as our data consumer. Select Add users and groups when complete.

Amazon SageMaker Studio domain units help to organize assets and other domain entities under specific business units and teams. Resource owners such as AWS account owners can use domain units to set up Amazon SageMaker Unified Studio authorization permissions on their resources. Domain units provide a delegated authority from account owners to domain unit owners, and they can set up authorization permissions on environment profiles (created using blueprint configurations) on behalf of account owners. This way, you can limit who can create and use environment profiles depending on the business units to which they belong.
As our SageMaker Unified Studio domain was recently created, we have not yet defined any custom domain units. For the purposes of this post, we will add our newly onboarded users as root domain owners, which will provide them the ability to:
- Create domain units
- Create projects with any existing project profiles
- Become project members
(1) In the SageMaker Studio domain, select the User management tab, scroll down to Root domain owners, click the Add drop-down, and select Add SSO users and groups.

(2) Select the two users onboarded previously, and select Add root domain owner(s) when complete.

(3) Verify that the two users were added successfully.

In Amazon SageMaker Unified Studio, associated accounts are other AWS accounts that can be associated with an Amazon SageMaker platform domain so that resources can be created and accessed in these accounts for various purposes. A common scenario here would involve the creation of projects for various personas within your organization that require different tooling to analyze data (e.g., Athena, EMR, Redshift, etc.).
When we created the SageMaker Studio domain, the account we created the domain within was automatically added as an account association. We'll want to consume data across AWS accounts and SageMaker Studio projects, so we'll need to add an account association for the ConsumerAccount here.
(1) In the SageMaker Studio domain, select the Account associations tab, and click Request association.

(2) Under AWS account ID, enter the account ID of the ConsumerAccount. Select Request association when finished.

(3) Returning to the SageMaker Studio domain, verify that the ProducerAccount and ConsumerAccount are both listed as Associated.

A blueprint defines what AWS tools and services members of the project to which the project profile belongs can use as they work with data in the Amazon SageMaker catalog.
We will be enabling two blueprints in each account - LakeHouseDatabase and Tooling:
- LakeHouseDatabase will provide us with an AWS Glue database for data management and an Amazon Athena workgroup for querying data, and will be all we need to share and query the data.
- Tooling creates resources for the project, including IAM user roles, security groups, and Amazon SageMaker platform domains.
(1) In the SageMaker Studio domain within the ProducerAccount, select the Blueprints tab, and select the LakeHouseDatabase blueprint. Click the Enable button to enable the blueprint.

(2) Under Permissions and resources, select the option to Create and use a new service role for both the Provisioning role and the Manage Access role.
- The AmazonSageMakerManageAccess role grants Amazon SageMaker Unified Studio permissions to publish, grant access, and revoke access to Amazon SageMaker Lakehouse, AWS Glue Data Catalog and Amazon Redshift data. It also grants Amazon SageMaker Unified Studio access to publish and manage subscriptions on Amazon SageMaker Catalog data and AI assets.
- The AmazonSageMakerProvisioning role is used by Amazon SageMaker Unified Studio to provision and manage resources defined in the selected blueprints in your account.

(3) Still within the SageMaker Studio domain within the ProducerAccount, select the Blueprints tab, and select the Tooling blueprint. Click the Enable button to enable the blueprint.

(4) Under Permissions and resources, accept the default option to use the Provisioning and Manage Access roles created when we enabled the LakeHouseDatabase blueprint.
(5) Under Networking, select a default VPC (if created) and at least three subnets for resources. Alternatively, create a new VPC for SageMaker Unified Studio and specify it here.
(6) Review the Tooling blueprint settings, and select Enable blueprint when complete.

Blueprints are account-specific, and to complete the setup we must enable the LakeHouseDatabase and Tooling blueprints within the ConsumerAccount.
(1) Navigate to the Amazon SageMaker console in the ConsumerAccount and you will see an option to View associated domains. Select this to view the associated domain.

(2) Click on the associated domain named DemoDomain that was created from the ProducerAccount.

(3) Within the associated domain, select the Blueprints tab, and select the LakeHouseDatabase blueprint. Click the Enable button to enable the blueprint.

(4) Under Permissions and resources, select the option to Create and use a new service role for both the Provisioning role and the Manage Access role.
- The AmazonSageMakerManageAccess role grants Amazon SageMaker Unified Studio permissions to publish, grant access, and revoke access to Amazon SageMaker Lakehouse, AWS Glue Data Catalog and Amazon Redshift data. It also grants Amazon SageMaker Unified Studio access to publish and manage subscriptions on Amazon SageMaker Catalog data and AI assets.
- The AmazonSageMakerProvisioning role is used by Amazon SageMaker Unified Studio to provision and manage resources defined in the selected blueprints in your account.

(5) Still within the SageMaker Studio domain within the ConsumerAccount, select the Blueprints tab, and select the Tooling blueprint. Click the Enable button to enable the blueprint.

(6) Under Permissions and resources, accept the default option to use the Provisioning and Manage Access roles created when we enabled the LakeHouseDatabase blueprint.
(7) Under Networking, select a default VPC (if created) and at least three subnets for resources. Alternatively, create a new VPC for SageMaker Unified Studio and specify it here.
(8) Review the Tooling blueprint settings, and select Enable blueprint when complete.

A project profile defines a higher-level template for projects in your Amazon SageMaker platform domains. A project profile is a collection of blueprints, which are configurations used to create projects. A project profile can define whether a particular blueprint is enabled during the creation of the project, or available later for the project users to enable on demand.
Three default project profiles are created for you - a Data analytics and AI-ML model development project profile, a Generative AI application development project profile, and a SQL analytics project profile - each providing access to a different set of blueprints, and therefore tailored to unique personas within your Amazon SageMaker Studio domain.
You also have the ability to create custom project profiles, which we will be doing here. This will give us complete control over the project profile settings, including which blueprints are included in the profile, allowing us to demonstrate the ingestion, access, and sharing capabilities of our Glue catalog data across Amazon SageMaker Studio projects and AWS accounts.
We'll be creating two project profiles - a ProducerProjectProfile that will serve as the data producer and will be deployed in the ProducerAccount, and a ConsumerProjectProfile that will serve as the data consumer and will be deployed in the ConsumerAccount.
(1) Return to the Amazon SageMaker console in the ProducerAccount. In the SageMaker Studio domain, select the Project profiles tab, and click Create.

(2) Specify ProducerProjectProfile as the name of the project profile.
(3) Under Project profile creation options, select the option for Custom create.
(4) Within the Blueprints setting drop-down, select the option for LakeHouseDatabase.
(5) Under the Default tooling blueprint deployment settings, provide the account ID of your ProducerAccount and provide a region name of us-east-1.
(6) Under the Authorization section, select the option for Selected users and groups, and select the data producer user you onboarded previously.
(7) Under profile readiness, select the option to Enable project profile on creation.
(8) Review your settings and select Create project profile when finished.

(1) Still within the ProducerAccount, following the same process in the SageMaker Studio domain, select the Project profiles tab, and click Create.

(2) Specify ConsumerProjectProfile as the name of the project profile.
(3) Under Project profile creation options, select the option for Custom create.
(4) Within the Blueprints setting drop-down, select the option for LakeHouseDatabase.
(5) Under the Default tooling blueprint deployment settings, provide the account ID of your ConsumerAccount and provide a region name of us-east-1.
(6) Under the Authorization section, select the option for Selected users and groups, and select the data consumer user you onboarded previously.
(7) Under profile readiness, select the option to Enable project profile on creation.
(8) Review your settings and select Create project profile when finished.

(9) Verify that both project profiles were created and show a status of Enabled.

We're now ready to create the producer project, which will serve as the project in which we onboard and share the existing Glue table.
(1) Still within the SageMaker Studio domain, open the Amazon SageMaker Unified Studio URL in a new tab and select the option to Sign in with SSO.

(2) Sign in to the application as the user we onboarded previously who will serve as our data producer. Recall that we had also authorized this user within the producer project profile.

(3) In the SageMaker Unified Studio landing page, select the option to Create project.

(4) Provide a project name of ProducerProject, and select the ProducerProjectProfile created earlier. When complete, click Continue.

(5) On the Customize blueprint parameters screen, leave all values at the default and select Continue.

(6) Review the producer project configuration and select Create project when complete.

(7) When the project creation process completes you will be taken to the Project overview page.

We're now ready to create the consumer project, which will serve as the project in which we subscribe to and query the shared Glue table.
(1) Open the Amazon SageMaker Unified Studio URL in a new browser or private window and select the option to Sign in with SSO.

(2) Sign in to the application as the user we onboarded previously who will serve as our data consumer. Recall that we had also authorized this user within the consumer project profile.

(3) In the SageMaker Unified Studio landing page, select the option to Create project.

(4) Provide a project name of ConsumerProject, and select the ConsumerProjectProfile created earlier. When complete, click Continue.

(5) On the Customize blueprint parameters screen, leave all values at the default and select Continue.

(6) Review the consumer project configuration and select Create project when complete.

(7) When the project creation process completes you will be taken to the Project overview page.

Note: SageMaker Unified Studio is in public preview as of 02/17/2025 and the team is working diligently on making it easier to ingest Glue data catalog assets into SageMaker Unified Studio projects. The steps below are required as of the time of this writing, but will likely change as the product continues to advance. We will make every attempt to update this post to reflect those changes.
Within the data producer account we have a Glue database named iceberg_tutorial_db and a Glue table within this database named nyc_taxi_curated that we will be ingesting into SageMaker Unified Studio. This database and table are currently using IAM policies to govern access, and we want to ensure that the onboarding process does not impact any existing integrations with these resources.
Your Glue database and table names are likely different, so substitute your own names, along with the S3 location for these resources, in the examples below.
SageMaker Unified Studio requires a tag on the Glue database within the data producer account following the form of
"AmazonDataZoneProject": "<project_id>"
with project_id being that of the ProducerProject.
(1) Navigate to the ProducerProject overview in SageMaker Unified Studio and copy the Project ID to a text editor on your local machine.

(2) In the ProducerAccount, open CloudShell in a new tab and tag the database you wish to ingest with the project ID of your ProducerProject using the command below.
(3) Verify the tag application using the command below.
(4) Verify that your Glue database tag looks similar to the output below.
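Steps (2) through (4) can be sketched with boto3, which is preinstalled in CloudShell, instead of the AWS CLI. This is a minimal sketch under stated assumptions: the function names are hypothetical, the account ID 111122223333 is a placeholder, and the helpers take an already-created Glue client so that nothing runs until you call them.

```python
def glue_database_arn(region: str, account_id: str, database: str) -> str:
    """Build the Glue database ARN that the tagging calls expect."""
    return f"arn:aws:glue:{region}:{account_id}:database/{database}"


def tag_database_for_project(glue, database_arn: str, project_id: str) -> dict:
    """Apply the AmazonDataZoneProject tag, then read the tags back."""
    glue.tag_resource(
        ResourceArn=database_arn,
        TagsToAdd={"AmazonDataZoneProject": project_id},
    )
    # get_tags returns {"Tags": {...}}; hand back only the tag map.
    return glue.get_tags(ResourceArn=database_arn)["Tags"]


# Example usage (placeholder values are assumptions; substitute your own):
#   import boto3
#   glue = boto3.client("glue", region_name="us-east-1")
#   arn = glue_database_arn("us-east-1", "111122223333", "iceberg_tutorial_db")
#   print(tag_database_for_project(glue, arn, "<project_id>"))
```

The returned tag map should contain the AmazonDataZoneProject key with the Project ID copied in step (1).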
Lake Formation cross-account capabilities allow users to securely share distributed data lakes across multiple AWS accounts, AWS organizations or directly with IAM principals in another account providing fine-grained access to the Data Catalog metadata and underlying data.
In the ProducerAccount we will need to set the Lake Formation cross-account version to Version 4 to support sharing of data catalog resources that are registered in hybrid mode.
(1) Open the Lake Formation console in the ProducerAccount, and navigate to Data catalog settings. Set the Current cross-account version to Version 4 and select Save when complete.
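If you would rather script this setting than use the console, the following is a hedged boto3 sketch. It assumes a Lake Formation client is passed in, and the function names are hypothetical. Because put_data_lake_settings replaces the whole settings object, the sketch reads the current settings first and changes only the CROSS_ACCOUNT_VERSION parameter.

```python
def with_cross_account_version(settings: dict, version: int = 4) -> dict:
    """Return a copy of the data lake settings with CROSS_ACCOUNT_VERSION set."""
    updated = dict(settings)
    parameters = dict(updated.get("Parameters", {}))
    parameters["CROSS_ACCOUNT_VERSION"] = str(version)
    updated["Parameters"] = parameters
    return updated


def set_cross_account_version(lakeformation, version: int = 4) -> dict:
    """Read the current settings, change only the version, and write them back."""
    current = lakeformation.get_data_lake_settings()["DataLakeSettings"]
    updated = with_cross_account_version(current, version)
    lakeformation.put_data_lake_settings(DataLakeSettings=updated)
    return updated


# Example usage (run as a Data Lake Administrator in the ProducerAccount):
#   import boto3
#   lf = boto3.client("lakeformation", region_name="us-east-1")
#   print(set_cross_account_version(lf)["Parameters"])
```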

AWS Lake Formation hybrid access mode supports two permission pathways to the same AWS Glue Data Catalog databases, tables, and views. In the first pathway, Lake Formation allows you to select specific principals, and grant them Lake Formation permissions to access databases and tables by opting in. The second pathway allows all other principals to access these resources through the default IAM principal policies for Amazon S3 and AWS Glue actions.
Configuring hybrid access mode for the Glue table we wish to ingest will allow us to preserve all existing IAM permissions over the Glue resource while also providing a mechanism to use Lake Formation permissions for the SageMaker Unified Studio project roles.
(1) Return to the CloudShell tab in the ProducerAccount and execute the command below to retrieve the data location for the Glue table you wish to onboard.
(2) Still within CloudShell, execute the command below to register the data location in Lake Formation in hybrid access mode, substituting the table location retrieved above.
(3) Verify the table location registration status using the command below.
(4) The output from the previous command will list the resource ARN for your table location along with the hybrid access mode setting.
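The lookup, registration, and verification steps above can be sketched with boto3 as follows. The function names are hypothetical, and registering with the Lake Formation service-linked role is an assumption made for this sketch; pass a RoleArn instead if your account uses a custom registration role.

```python
def s3_uri_to_arn(location: str) -> str:
    """Convert s3://bucket/prefix/ into arn:aws:s3:::bucket/prefix."""
    if not location.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {location}")
    return "arn:aws:s3:::" + location[len("s3://"):].rstrip("/")


def register_table_location_hybrid(glue, lakeformation, database: str, table: str) -> dict:
    """Register the table's S3 location in hybrid access mode and describe it."""
    table_info = glue.get_table(DatabaseName=database, Name=table)["Table"]
    resource_arn = s3_uri_to_arn(table_info["StorageDescriptor"]["Location"])
    lakeformation.register_resource(
        ResourceArn=resource_arn,
        UseServiceLinkedRole=True,  # assumption: service-linked role registration
        HybridAccessEnabled=True,   # keeps existing IAM-based access working
    )
    return lakeformation.describe_resource(ResourceArn=resource_arn)["ResourceInfo"]


# Example usage (database and table names come from this post; use your own):
#   import boto3
#   glue = boto3.client("glue", region_name="us-east-1")
#   lf = boto3.client("lakeformation", region_name="us-east-1")
#   print(register_table_location_hybrid(glue, lf, "iceberg_tutorial_db", "nyc_taxi_curated"))
```

The returned resource information should list the S3 ARN of the table location with HybridAccessEnabled set to true.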
We'll next need to verify that the Glue database and table are managed by Lake Formation and have IAM_ALLOWED_PRINCIPALS permissions on the Glue resources. This is the default setting to ensure compatibility with AWS Glue.
(1) Return to the CloudShell tab in the ProducerAccount and execute the command below to verify the database permissions.
(2) Verify that IAM_ALLOWED_PRINCIPALS has ALL permissions over the database, and note that you may see additional principals in this permissions list.
(3) Still within the CloudShell tab in the ProducerAccount, execute the command below to verify the table permissions, and note that you may see additional principals in this permissions list.
(4) Verify that IAM_ALLOWED_PRINCIPALS has ALL permissions over the table.
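The two checks above can be sketched with boto3 as shown here. The function names are hypothetical, and the sketch inspects only the first page of list_permissions results, so follow NextToken if your resources have many grants.

```python
def iam_allowed_principals_has_all(entries: list) -> bool:
    """True if IAM_ALLOWED_PRINCIPALS holds ALL in the given permission entries."""
    for entry in entries:
        principal = entry.get("Principal", {}).get("DataLakePrincipalIdentifier")
        if principal == "IAM_ALLOWED_PRINCIPALS" and "ALL" in entry.get("Permissions", []):
            return True
    return False


def check_iam_allowed_principals(lakeformation, database: str, table: str) -> dict:
    """Check the database and the table for the default IAM_ALLOWED_PRINCIPALS grant."""
    db_page = lakeformation.list_permissions(Resource={"Database": {"Name": database}})
    table_page = lakeformation.list_permissions(
        Resource={"Table": {"DatabaseName": database, "Name": table}}
    )
    return {
        "database": iam_allowed_principals_has_all(db_page["PrincipalResourcePermissions"]),
        "table": iam_allowed_principals_has_all(table_page["PrincipalResourcePermissions"]),
    }


# Example usage:
#   import boto3
#   lf = boto3.client("lakeformation", region_name="us-east-1")
#   print(check_iam_allowed_principals(lf, "iceberg_tutorial_db", "nyc_taxi_curated"))
```

Both values should be True before you continue; additional principals in the list do not affect the result.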
We'll next need to configure Lake Formation opt-in for the SageMaker Unified Studio producer project role, which will allow us to use Lake Formation permissions for the project, while preserving IAM permissions for all other requests to the Glue table.
(1) Navigate to the ProducerProject overview in SageMaker Unified Studio and copy the Project role ARN to a text editor on your local machine.

(2) Return to the CloudShell tab in the ProducerAccount and execute the command below to add opt-in for the Glue database.
(3) Execute the command below to add opt-in for the Glue table.
(4) Verify the opt-in status for the Glue database using the command below.
The output from this command will be similar to what's shown below.
(5) Verify the opt-in status for the Glue table using the command below.
The output from this command will be similar to what's shown below.
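As a rough boto3 sketch of steps (2) through (5), the helper below opts the project role in to Lake Formation permissions on the database and the table, then lists the opt-ins to confirm them. The function names are hypothetical, and the role ARN is whatever you copied in step (1).

```python
def database_resource(database: str) -> dict:
    """Lake Formation resource descriptor for a Glue database."""
    return {"Database": {"Name": database}}


def table_resource(database: str, table: str) -> dict:
    """Lake Formation resource descriptor for a Glue table."""
    return {"Table": {"DatabaseName": database, "Name": table}}


def opt_in_project_role(lakeformation, role_arn: str, resources: list) -> list:
    """Create an opt-in per resource, then return the listed opt-ins for each."""
    principal = {"DataLakePrincipalIdentifier": role_arn}
    for resource in resources:
        lakeformation.create_lake_formation_opt_in(Principal=principal, Resource=resource)
    listed = []
    for resource in resources:
        page = lakeformation.list_lake_formation_opt_ins(Principal=principal, Resource=resource)
        listed.append(page["LakeFormationOptInsInfoList"])
    return listed


# Example usage (the role ARN placeholder is an assumption):
#   import boto3
#   lf = boto3.client("lakeformation", region_name="us-east-1")
#   resources = [database_resource("iceberg_tutorial_db"),
#                table_resource("iceberg_tutorial_db", "nyc_taxi_curated")]
#   print(opt_in_project_role(lf, "<producer_project_role_arn>", resources))
```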
Next we'll need to configure Lake Formation permissions for the producer project role over the Glue database and table added above.
(1) Return to the CloudShell tab in the ProducerAccount, and execute the command below to grant DESCRIBE permissions on the database to the SageMaker Unified Studio project role.
(2) Verify the database permissions using the command below.
The output from this command will be similar to what's shown below, showing the project role having DESCRIBE permissions on the database. Note that you may see additional principals in this permissions list.
(3) Execute the command below to grant DESCRIBE and SELECT permissions on the table to the SageMaker Unified Studio project role.
(4) Verify the table permissions using the command below.
The output from this command will be similar to what's shown below, showing the project role having DESCRIBE and SELECT permissions on the table. Note that you may see additional principals in this permissions list.
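The grants in this section can be sketched with boto3 as below. The function names are hypothetical; the permissions are the ones named in the steps above (DESCRIBE on the database, DESCRIBE and SELECT on the table). Building the requests separately from applying them makes it easy to review them before anything is changed.

```python
def build_project_grants(role_arn: str, database: str, table: str) -> list:
    """Build grant_permissions requests for the database and the table."""
    principal = {"DataLakePrincipalIdentifier": role_arn}
    return [
        {
            "Principal": principal,
            "Resource": {"Database": {"Name": database}},
            "Permissions": ["DESCRIBE"],
        },
        {
            "Principal": principal,
            "Resource": {"Table": {"DatabaseName": database, "Name": table}},
            "Permissions": ["DESCRIBE", "SELECT"],
        },
    ]


def apply_grants(lakeformation, grants: list) -> int:
    """Send each grant to Lake Formation and return how many were applied."""
    for grant in grants:
        lakeformation.grant_permissions(**grant)
    return len(grants)


# Example usage (the role ARN placeholder is an assumption):
#   import boto3
#   lf = boto3.client("lakeformation", region_name="us-east-1")
#   grants = build_project_grants("<producer_project_role_arn>",
#                                 "iceberg_tutorial_db", "nyc_taxi_curated")
#   apply_grants(lf, grants)
```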
We'll next need to configure Lake Formation opt-in for the SageMaker Unified Studio consumer project role, which will allow us to use Lake Formation permissions for the project, while preserving IAM permissions for all other requests to the Glue table.
(1) Navigate to the ConsumerProject overview in SageMaker Unified Studio and copy the Project role ARN to a text editor on your local machine.

(2) Return to the CloudShell tab in the ProducerAccount, and execute the command below to add opt-in for the Glue database.
(3) Verify the opt-in status for the Glue database using the command below.
The output from this command will be similar to what's shown below.
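For the consumer role only the database opt-in is needed, so a shorter boto3 sketch is enough. The function name is hypothetical, and the helper returns True when the listed opt-ins include the consumer project role.

```python
def opt_in_consumer_role(lakeformation, consumer_role_arn: str, database: str) -> bool:
    """Opt the consumer project role in on the database and confirm it is listed."""
    principal = {"DataLakePrincipalIdentifier": consumer_role_arn}
    resource = {"Database": {"Name": database}}
    lakeformation.create_lake_formation_opt_in(Principal=principal, Resource=resource)
    page = lakeformation.list_lake_formation_opt_ins(Principal=principal, Resource=resource)
    return any(
        item.get("Principal", {}).get("DataLakePrincipalIdentifier") == consumer_role_arn
        for item in page.get("LakeFormationOptInsInfoList", [])
    )


# Example usage (run in the ProducerAccount; the ARN placeholder is an assumption):
#   import boto3
#   lf = boto3.client("lakeformation", region_name="us-east-1")
#   print(opt_in_consumer_role(lf, "<consumer_project_role_arn>", "iceberg_tutorial_db"))
```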
Note: We do not need to grant any permissions for the consumer project role as the AmazonSageMakerManageAccess role will automate this process for us.
The first step of working with this data in our SageMaker Unified Studio project is to add a data source in the producer project.
(1) Return to the ProducerProject in SageMaker Unified Studio, and select Data sources from the left-hand menu.

(2) On the Data sources page, select Create data source.

(3) On the Define source screen, provide a Name for your data source, and select AWS Glue for the data source type.
(4) From the Database name drop-down, select the name of the Glue database.
(5) In the Table selection criteria, enter the name of the Glue table, and select Next when complete.

(6) On the Add details screen, leave all settings at the default, and select Next.

(7) On the Set up schedule screen, leave all settings at the default, and select Next.

(8) Review the data source details and select Create when complete.

(9) Once the data source has been added, select Run to add it to the project. This process will take a few seconds and you will be prompted that the asset was Successfully created.

(10) Navigate to Assets on the left-hand menu, and select the newly added data source.
(11) You'll notice that business metadata for the data source was automatically generated based on the data source schema. Edit this as needed and select Accept all when complete.

(12) Click on the Actions drop-down and select Query with Athena. The LakeHouseDatabase blueprint we deployed earlier provisioned an Athena workgroup for you to query this data directly within the SageMaker Unified Studio project.

The ability to add this Glue catalog table as a data source and query it within the SageMaker Unified Studio project demonstrates that we were successful in the SageMaker Unified Studio and Lake Formation setup process.
We're now ready to publish the data asset in the SageMaker Catalog, which will make it discoverable from other projects (and AWS accounts) within our organization.
(1) Still within the ProducerProject in SageMaker Unified Studio, return to the newly added data asset, and select Publish Asset.
(2) When prompted to confirm publishing of the asset, select Publish Asset.

We're now ready to subscribe to the data asset in the SageMaker Catalog, which will make the data consumable within the consumer project.
(1) In a different browser or private window, log in to SageMaker Unified Studio and access the ConsumerProject.
(2) From the Discover menu, select Data Catalog.

(3) Select Browse assets.
(4) Locate the published data asset and select Subscribe.

(5) From the Project drop-down, select ConsumerProject, and provide a reason for the subscription request. Select Request when complete.

(6) Return to the ProducerProject in SageMaker Unified Studio and select Subscription requests from the left-hand menu. Select View request to view the new subscription request.

(7) In the response details, select Full access for Approval access, and add a decision comment. Select Approve when complete.

(8) Return to the ConsumerProject in SageMaker Unified Studio, select Assets from the left-hand menu, and select the Subscribed tab.

(9) From the Build menu, select Query Editor.

(10) Navigate to the newly subscribed data asset, click on the ellipsis, and select Query with Athena. Verify that you can query the data and that the data schema appears in the lower-left menu.

In this post we have demonstrated the process to add an existing Glue catalog table into Amazon SageMaker Unified Studio and query that table using tools available directly in the SageMaker Unified Studio project. We have further demonstrated how to publish data assets to the SageMaker Catalog, and subscribe to those assets from other SageMaker Unified Studio projects, even those that cross AWS account boundaries.
As SageMaker Unified Studio prepares for general availability (GA), we expect much of the Lake Formation configuration to be automated for us, and will make every attempt to keep this post up to date to reflect current onboarding and publishing processes.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.