When creating a project in Amazon SageMaker Unified Studio, users select a project profile to define resources and tools to be provisioned in the project. These are used by Amazon SageMaker Catalog to implement a data mesh pattern. Some users don’t want to take advantage of resources provisioned along with the project for various reasons. For instance, they may want to avoid making changes to their existing applications and data products.
This post shows you how to implement a data mesh pattern by using Amazon SageMaker Catalog while keeping your current data repositories and consumer applications unchanged.
In this post, you will simulate a scenario with a data producer and a data consumer that exist before Amazon SageMaker Catalog is adopted. For this purpose, you will use a sample dataset to simulate existing data and an AWS Lambda function to simulate an existing application. You can apply the same solution to your real-life data and workloads.
The following diagram illustrates the solution architecture’s key configurations. In this architecture, the Amazon Simple Storage Service (Amazon S3) bucket and the AWS Glue Data Catalog in the producer account simulate the existing data repository. The Lambda function in the consumer account simulates the existing consumer application.

Here is a description of the key configurations highlighted in the architecture:
The solution architecture is based on the following Amazon Web Services (AWS) services and features:
In this section, you will prepare the resources and configurations you need for this solution.
To follow this solution, you need three AWS accounts, and it’s better if they’re part of the same organization in AWS Organizations:
Each account must have an Amazon Virtual Private Cloud (Amazon VPC) with at least two private subnets in two different Availability Zones. For instructions, refer to Create a VPC plus other VPC resources. Make sure to create the VPCs in the same Region where you plan to apply this solution.
A governance account is used for convenience, but it’s not strictly needed because Amazon SageMaker can be configured and managed in the producer or consumer account. If you don’t have access to three accounts, you can still use this post to understand the key configurations required to implement a data mesh pattern with Amazon SageMaker Catalog while keeping your current data repositories and consumer applications unchanged.
First, create a sample dataset by following these instructions:
After you create the sample dataset, create an S3 bucket and an AWS Glue database in the producer account, which will act as the data repository.
Create the S3 bucket and upload the trees.csv file in the producer account:
Create the AWS Glue database and table in the producer account:
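As a rough programmatic equivalent of this step, the following sketch builds the definition of a CSV-backed table and creates the database and table through a boto3 AWS Glue client. It is an illustration under assumptions, not the procedure from the original post: the column list is supplied by the caller because the trees.csv schema is not reproduced here, the file is assumed to have a header row, and the helper names `build_csv_table_input` and `create_database_and_table` are hypothetical.

```python
"""Sketch: describe a CSV-backed AWS Glue table such as `trees`."""


def build_csv_table_input(table_name, s3_location, columns):
    """Return a TableInput dict suitable for glue.create_table().

    `columns` is a list of (name, glue_type) pairs supplied by the caller.
    """
    return {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",
        # Assumption: the CSV file starts with a header row that must be skipped.
        "Parameters": {"classification": "csv", "skip.header.line.count": "1"},
        "StorageDescriptor": {
            "Columns": [{"Name": name, "Type": glue_type} for name, glue_type in columns],
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                "Parameters": {"field.delim": ","},
            },
        },
    }


def create_database_and_table(glue, database_name, table_input):
    """Create the database, then the table, using a boto3 Glue client."""
    glue.create_database(DatabaseInput={"Name": database_name})
    glue.create_table(DatabaseName=database_name, TableInput=table_input)
```

With real credentials for the producer account, you could call `create_database_and_table(boto3.client("glue"), "collections", table_input)`, where `table_input` describes the trees table and points at the S3 prefix that holds trees.csv.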
Create the Lambda function in the consumer account. This will simulate a data consumer application. First, in the consumer account, create the IAM policy and the IAM role to be assigned to the Lambda function:
After the IAM role for the Lambda function is in place, you can create the Lambda function in the consumer account:
The code provided for the Lambda function includes some placeholders that you will replace later, after you have the required information. Don’t test the Lambda function at this time, because it will fail while the placeholders are still present.
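To make the role of the placeholders concrete, here is a minimal sketch of what such a consumer function could look like. It is not the function code from the original post. The placeholder strings, the `run_query` helper, the sample SQL statement, and the assumption that the Athena workgroup already defines a query result location are all illustrative.

```python
import time

# Hypothetical placeholder names; the original function code defines its own.
ROLE_ARN = "<CONSUMER_PROJECT_ROLE_ARN>"
DATABASE = "<CONSUMER_GLUE_DATABASE>"
WORKGROUP = "<CONSUMER_ATHENA_WORKGROUP>"


def run_query(athena, sql, database, workgroup, poll_seconds=1.0):
    """Start an Athena query, wait for it to finish, and return the first page of rows."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,  # assumption: the workgroup sets the output location
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if status["State"] != "SUCCEEDED":
        raise RuntimeError(f"Query {status['State']}: {status.get('StateChangeReason', '')}")
    result = athena.get_query_results(QueryExecutionId=query_id)
    # For SELECT statements the first row holds the column headers.
    return [[col.get("VarCharValue") for col in row["Data"]] for row in result["ResultSet"]["Rows"]]


def lambda_handler(event, context):
    import boto3  # imported here so run_query can be exercised without the SDK

    # Assume the Consumer project role, then query with its temporary credentials.
    creds = boto3.client("sts").assume_role(
        RoleArn=ROLE_ARN, RoleSessionName="consumer-function"
    )["Credentials"]
    athena = boto3.client(
        "athena",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    return run_query(athena, "SELECT * FROM trees LIMIT 10", DATABASE, WORKGROUP)
```

Keeping the query logic in a helper that receives the client makes it possible to check the polling loop with a stub before any AWS resources exist.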
Amazon SageMaker Unified Studio supports two distinct domain types: AWS IAM Identity Center based domains and IAM based domains. At the time of writing, only IAM Identity Center based domains support multi-account association, so this post uses that domain type, which requires IAM Identity Center.
In the governance account, you enable IAM Identity Center and create an administrative user to create and manage the Amazon SageMaker Unified Studio domain. Create a user with administrative access:
Sign in as the user with administrative access:
To create the Amazon SageMaker Unified Studio domain in the governance account, refer to Create an Amazon SageMaker Unified Studio domain – quick setup.
After your domain is created, you can navigate to the Amazon SageMaker Unified Studio portal (a browser-based web application) where you can use your data and configured tools for analytics and AI. Save the Amazon SageMaker Unified Studio portal URL because you will use this URL later.
Now that you have the prerequisites in place, you can complete the following ten high-level steps to implement the solution.
Start by associating the producer and consumer accounts with the newly created Amazon SageMaker Unified Studio domain. When you associate your producer and consumer accounts with the domain, make sure to select the option “IAM users and roles can access APIs and IAM users can log in to Amazon SageMaker Unified Studio” in the AWS RAM share managed permission section. For step-by-step instructions, refer to Associated accounts in Amazon SageMaker Unified Studio. If your AWS accounts are part of the same organization, your association requests are automatically accepted. However, if your AWS accounts aren’t part of the same organization, request association with the other AWS accounts in the governance account and then accept the association request in both the producer and consumer accounts.
Now, create two project profiles, one for the producer project and one for the consumer project.
In Amazon SageMaker Unified Studio, a project profile defines an uber template for projects in your Amazon SageMaker domain. A project profile is a collection of blueprints that provides reusable AWS CloudFormation templates used to create project resources.
A project profile is associated with a specific AWS account. This means that when a project is created, the blueprints listed in the project profile are deployed in the associated AWS account. To use a project profile, you must enable its blueprints in the AWS account associated with the project profile.
You’re going to create the producer project profile that is associated to the producer account. This project profile will be used to create the producer project. This profile includes by default the Tooling blueprint that creates resources for the project, including IAM user roles and security groups.
Before creating the project profile, you will enable the Tooling blueprint in the producer account using the following procedure:

Proceed to creating the project profile in the governance account:
You also create a consumer project profile and associate it to the consumer account. This profile will be used to create the consumer project. The consumer project profile includes the LakeHouseDatabase blueprint, which is needed to create a lakehouse environment with an AWS Glue database for data management and an Amazon Athena workgroup for querying. The Tooling blueprint is included by default in the project profile.
Before creating the project profile, enable the Tooling and LakeHouseDatabase blueprints in the consumer account:
After the blueprints are enabled in the consumer account, you can proceed to create the project profile:
In Amazon SageMaker Unified Studio, a project is a boundary within a domain where you can collaborate with other users to work on a business use case. In projects, you can create and share data and resources. To create the producer and consumer projects in Amazon SageMaker Unified Studio, use the following instructions:
After you’ve created the Producer project, note in a text file the Project role ARN that is displayed in the Project overview. The following image is shown for reference. The project role name is the string that follows arn:aws:iam::<account-id>:role/ in the project role Amazon Resource Name (ARN), where <account-id> is the ID of the AWS account that owns the role. You will use both the project role name and the ARN later.

Repeat the preceding procedure to create the Consumer project. Be sure to enter Consumer for Project name and then select consumer-project-profile for Project profile. After it’s created, note the Project role ARN in a text file. The project role name is the string that follows arn:aws:iam::<account-id>:role/ in the project role ARN. You will use both the project role name and the ARN later.
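If you prefer to derive the role name from the ARN in a script rather than by eye, a small helper such as the following works. The function name `role_name_from_arn` and the sample ARN in the usage note are illustrative only.

```python
def role_name_from_arn(role_arn: str) -> str:
    """Return the part of an IAM role ARN that follows ':role/'."""
    prefix, separator, name = role_arn.partition(":role/")
    if not separator or not prefix.startswith("arn:") or not name:
        raise ValueError(f"Not an IAM role ARN: {role_arn!r}")
    return name
```

For example, `role_name_from_arn("arn:aws:iam::123456789012:role/example-project-role")` returns `example-project-role`. A string without the `:role/` segment raises `ValueError` instead of returning a misleading value.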
Bring your own data to the Amazon SageMaker Unified Studio Producer project. AWS provides several options to achieve this onboarding. The first option is automated onboarding in Amazon SageMaker lakehouse, in which you ingest the Amazon SageMaker lakehouse metadata of datasets into Amazon SageMaker Catalog. With this option, you can onboard your Amazon SageMaker lakehouse data as part of creating a new Amazon SageMaker Unified Studio domain or for an existing domain.
For more information about automated onboarding of Amazon SageMaker lakehouse data, refer to Onboarding data in Amazon SageMaker Unified Studio. Alternatively, you can bring existing resources into your Amazon SageMaker Unified Studio project by using the Data and Compute pages in your project, or by using scripts provided in GitHub. For more information about either approach, refer to Bringing existing resources into Amazon SageMaker Unified Studio. In this post, you will use Amazon SageMaker lakehouse capabilities to import your trees AWS Glue table into the Producer project.
To use Lake Formation permissions for fine-grained access control to the trees table, you need to register the Amazon S3 location of the trees table in Lake Formation. To do that, complete the following actions:
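The registration can also be done programmatically. The sketch below calls the Lake Formation `register_resource` API through a boto3 client that you pass in. Using the Lake Formation service-linked role is an assumption made here for brevity (the original steps may register the location with a different role), and the helper name and bucket name are hypothetical.

```python
def register_s3_location(lakeformation, bucket_name, prefix=""):
    """Register an S3 bucket (or a prefix inside it) as a Lake Formation data location."""
    resource_arn = f"arn:aws:s3:::{bucket_name}"
    if prefix:
        resource_arn += "/" + prefix.strip("/")
    # Assumption: the service-linked role is acceptable for this location.
    lakeformation.register_resource(ResourceArn=resource_arn, UseServiceLinkedRole=True)
    return resource_arn
```

In the producer account this would be invoked as `register_s3_location(boto3.client("lakeformation"), "<your-bucket-name>")`.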
Grant database access to the IAM role that is associated with your Producer project. This role is called the project role, and it was created in IAM upon project creation.
To access the AWS Glue Data Catalog collections database from the Producer project in Amazon SageMaker Unified Studio, complete the following actions:
Grant trees table access to the IAM role that is associated with your Producer project. To grant these permissions use the following instructions:
You must revoke the IAMAllowedPrincipals group permissions on both the database and the table to enforce Lake Formation permissions for access. For more information, refer to Revoking permission using the Lake Formation console.
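The grants and revocations described above map onto the Lake Formation `grant_permissions` and `revoke_permissions` APIs. The following sketch shows one way to script them. The specific permissions (DESCRIBE on the database, SELECT and DESCRIBE on the table) are assumptions for illustration, since the exact console selections are not reproduced here, and the helper names are hypothetical.

```python
def permission_request(principal, resource, permissions):
    """Build the keyword arguments shared by grant_permissions and revoke_permissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal},
        "Resource": resource,
        "Permissions": list(permissions),
    }


def database_resource(database):
    return {"Database": {"Name": database}}


def table_resource(database, table):
    return {"Table": {"DatabaseName": database, "Name": table}}


def apply_access(lakeformation, project_role_arn, database="collections", table="trees"):
    """Grant the project role access, then revoke the IAMAllowedPrincipals defaults."""
    db = database_resource(database)
    tbl = table_resource(database, table)
    # Assumed permission sets; adjust them to match what your project needs.
    lakeformation.grant_permissions(**permission_request(project_role_arn, db, ["DESCRIBE"]))
    lakeformation.grant_permissions(**permission_request(project_role_arn, tbl, ["SELECT", "DESCRIBE"]))
    for resource in (db, tbl):
        # IAM_ALLOWED_PRINCIPALS is the API identifier of the IAMAllowedPrincipals group.
        lakeformation.revoke_permissions(
            **permission_request("IAM_ALLOWED_PRINCIPALS", resource, ["ALL"])
        )
```

The revoke calls assume that IAMAllowedPrincipals currently holds the ALL permission on both resources, which is the usual default for resources created before Lake Formation permissions were enforced.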

Verify that your collections database and trees table are accessible in the Producer project:
Even if it’s accessible in the project, to work with the trees table in Amazon SageMaker Catalog, you need to register the data source and create an Amazon SageMaker Catalog asset:
Publishing a data asset manually is a one-time operation that you need to perform to allow others to access the data asset through the catalog:
To consume data assets in the Consumer project, subscribe to the data asset by creating a subscription request:
By default, asset subscription requests require manual approval by a data owner. However, if the requester in the Consumer project is also a member of the Producer project, the subscription request is automatically approved. For information about approving subscription requests, refer to Approve or reject a subscription request in Amazon SageMaker Unified Studio.
To give your Lambda function access to the subscribed data asset, you need to allow the Lambda function to assume the Consumer project role. To do this, edit the Consumer project’s IAM role trust relationship:
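In API terms, editing the trust relationship means appending a statement to the role's trust policy and writing the policy back. The sketch below illustrates that idea. It assumes that a plain `sts:AssumeRole` statement for the Lambda execution role is sufficient, which may differ from the exact statement used in the original steps, and the helper names are hypothetical.

```python
import copy
import json


def allow_assume_role(trust_policy, principal_arn):
    """Return a copy of trust_policy with an extra statement for principal_arn."""
    updated = copy.deepcopy(trust_policy)
    updated.setdefault("Version", "2012-10-17")
    statements = updated.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    statements.append(
        {
            "Effect": "Allow",
            "Principal": {"AWS": principal_arn},
            "Action": "sts:AssumeRole",  # assumption: no extra actions are required
        }
    )
    updated["Statement"] = statements
    return updated


def update_trust_policy(iam, role_name, principal_arn):
    """Read the current trust policy with a boto3 IAM client, extend it, and save it."""
    current = iam.get_role(RoleName=role_name)["Role"]["AssumeRolePolicyDocument"]
    updated = allow_assume_role(current, principal_arn)
    iam.update_assume_role_policy(RoleName=role_name, PolicyDocument=json.dumps(updated))
    return updated
```

Here `role_name` is the Consumer project role name and `principal_arn` is the ARN of the smus_consumer_lambda role. Existing statements are preserved, so the project keeps working as before.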

Before you can test your Lambda function, you need to replace the placeholders in the function code and in the IAM policy. There are three placeholders to replace: the Consumer project role ARN, the AWS Glue Data Catalog database name, and the Athena workgroup ID. For the role ARN, you already have the actual value, which is the Consumer project’s role ARN that you noted in step 3 “Create SageMaker Unified Studio producer and consumer projects”. The next sections provide instructions to retrieve the values for the other two placeholders.
You need to find the name of the AWS Glue Data Catalog database that was created along with the Consumer project. You will then use this value to replace the placeholder in the consumer_function Lambda function code. To retrieve the AWS Glue Data Catalog database name, follow these instructions:

You need to find the ID of the Athena workgroup that was created along with the Consumer project. You will then use this value to replace the placeholder in the consumer_function Lambda function code and in the smus_consumer_athena_execution IAM policy. Use the following instructions to retrieve the Athena workgroup ID:
To replace the placeholder in the smus_consumer_athena_execution IAM policy, use the following procedure:
In this section, you will replace the three placeholders (project role ARN, AWS Glue database name, and Athena workgroup ID) in the consumer_function Lambda function code, and then test the function’s ability to access the data in the trees table.
If your Lambda function execution fails due to timeout, change the function timeout setting as follows:
After increasing the timeout, test the function again.
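The timeout can also be changed through the Lambda `update_function_configuration` API. The following sketch wraps that call. The helper name and the 60-second value in the usage note are illustrative, not values taken from the original steps.

```python
def set_function_timeout(lambda_client, function_name, timeout_seconds):
    """Set the timeout of a Lambda function and return the value Lambda reports back."""
    # Lambda accepts timeouts between 1 and 900 seconds.
    if not 1 <= timeout_seconds <= 900:
        raise ValueError("timeout_seconds must be between 1 and 900")
    response = lambda_client.update_function_configuration(
        FunctionName=function_name, Timeout=timeout_seconds
    )
    return response["Timeout"]
```

For example, `set_function_timeout(boto3.client("lambda"), "consumer_function", 60)` raises the limit to one minute, which leaves room for the Athena query to complete.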
If you no longer need the resources you created as you followed this post, delete them to prevent incurring additional charges. Start by deleting your Amazon SageMaker Unified Studio domain in the governance account. For more information, refer to Delete domains.
To remove the AWS Glue collections database from the producer account, follow these steps:
To remove the S3 bucket from the producer account, empty the bucket and then delete it. For information about emptying the bucket, refer to Emptying a general purpose bucket. For information about deleting the bucket, refer to Deleting a general purpose bucket.
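If you prefer to script this cleanup, the two steps can be expressed with the boto3 S3 resource API as in the following sketch. The helper name is hypothetical, and the bucket name in the usage note is a stand-in for the bucket you created earlier.

```python
def empty_and_delete_bucket(bucket):
    """Delete every object version in a boto3 s3.Bucket resource, then the bucket itself."""
    # object_versions also covers unversioned buckets, whose objects have a null version.
    bucket.object_versions.delete()
    bucket.delete()
```

With credentials for the producer account, you would call `empty_and_delete_bucket(boto3.resource("s3").Bucket("<your-bucket-name>"))`. The deletion is irreversible, so double-check the bucket name first.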
To remove the Lambda function from the consumer account, follow these steps:
To complete the cleanup, delete the IAM role named smus_consumer_lambda, then delete the IAM policy named smus_consumer_athena_execution in the consumer account. For information about removing an IAM role, refer to Delete roles or instance profiles. For information about removing an IAM policy, refer to Delete IAM policies.
In this post, we covered adopting Amazon SageMaker Catalog for data governance without rearchitecting your existing applications and data repositories. We walked through how to onboard existing data in Amazon SageMaker Unified Studio, then publish it in a catalog, and then subscribe and consume the data from resources deployed outside the context of an Amazon SageMaker Unified Studio project. This solution can help you accelerate your implementation of a data mesh pattern with Amazon SageMaker Catalog to publish, find, and access data securely in your organization.
For more information, refer to What is Amazon SageMaker? and work through the Amazon SageMaker Workshop to try the unified experience for data, analytics, and AI.
Paolo is a Senior Solutions Architect at AWS for Energy and Utilities. With 20+ years of experience in designing and building enterprise solutions, he works with global energy customers to design solutions to address customers’ business and technical needs. He is passionate about technology and enjoys running.
Joel is a Principal Specialist SA Analytics for AWS with 25 years’ experience working on enterprise architecture, data governance and analytics. He uses his experience to advise customers on their data strategy and technology foundations.