
Build & Automate ML Pipelines with SageMaker & GitHub

End-to-end SageMaker pipeline with GitHub and CodePipeline

Published Feb 24, 2025
Amazon SageMaker Pipelines is a purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for machine learning (ML). These workflow automation components enable you to easily scale your ability to build, train, test, and deploy hundreds of models in production, iterate faster, reduce errors due to manual orchestration, and build repeatable mechanisms.
SageMaker projects introduce MLOps templates that automatically provision the underlying resources needed to enable CI/CD capabilities for your ML development lifecycle. You can use a number of built-in templates or create your own custom template.
Let's use the built-in template: MLOps template for building, training, and deploying models with third-party Git repositories with CodePipeline.
The following is the architecture flow for the template.
Workflow
Prerequisites:
  • An IAM or IAM Identity Center account to sign in to SageMaker Studio. For information, see Amazon SageMaker AI domain overview.
  • To set up SageMaker Studio and a domain, follow these steps.
  • Permission to use SageMaker AI-provided project templates. For information, see Granting SageMaker Studio Permissions Required to Use Projects.
  • Two empty GitHub repositories without a README file. You input these repositories into the project template, which seeds them with model build and model deploy code.
  • An AWS CodeStar connection to your two empty GitHub repositories. To create the CodeStar connection, follow these steps, or see the sketch below.
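If you prefer to create the connection with the AWS SDK instead of the console, here is a minimal sketch using boto3 (the connection name is just an example); note that a new connection starts in PENDING status and must be completed through the GitHub handshake in the console before you can use its ARN:

```python
import boto3

# Create a CodeStar connection to GitHub (connection name is an example).
client = boto3.client("codestar-connections")

response = client.create_connection(
    ProviderType="GitHub",
    ConnectionName="my-sagemaker-github-connection",
)

# The connection is created in PENDING status; complete the GitHub handshake
# in the AWS console, then use this ARN in the project template.
print(response["ConnectionArn"])
```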

Step 1:

Open SageMaker Studio and navigate to the Projects section under the Deployments menu.
SageMaker Projects
Click Create project and choose the template MLOps template for building, training, and deploying models with third-party Git repositories with CodePipeline.
After you choose the template, you are prompted for a project name; enter an appropriate name. Under the Model build and Model deploy repository info, enter the branch to use from each of your repositories for pipeline activities, and for the full repository name, enter the repository name in the format username/repository-name or organization/repository-name.
For CodeStar Connection ARN, enter the ARN of the AWS CodeStar connection you created.
When the project is created from the MLOps template, the following AWS services and resources are deployed.
The MLOps templates that are made available through SageMaker projects are provided via an AWS Service Catalog portfolio that automatically gets imported when a user enables projects on the Studio domain.
Two GitHub repositories are populated with ModelBuild and ModelDeploy code:
The first repository contains scaffolding code to create a multi-step model building pipeline. This pipeline includes data processing, model training, model evaluation, and conditional model registration based on accuracy. As outlined in the pipeline.py file, it trains a regression model using the XGBoost algorithm on the well-known UCI Abalone Dataset. This repository also includes a build specification file, used by AWS CodePipeline and AWS CodeBuild to run the pipeline automatically.
The second repository contains code and configuration files for model deployment, as well as test scripts required to pass the quality gate. This repo also uses CodePipeline and CodeBuild, which run an AWS CloudFormation template to create model endpoints for staging and production.
Two CodePipeline pipelines:
  • The ModelBuild pipeline automatically triggers and runs the pipeline from end to end whenever a new commit is made to the ModelBuild code repository.
  • The ModelDeploy pipeline automatically triggers whenever a new model version is added to the model registry and the status is marked as Approved. Models that are registered with Pending or Rejected statuses aren’t deployed.
An Amazon Simple Storage Service (Amazon S3) bucket is created for output model artifacts generated from the pipeline.
Two SageMaker endpoints:
  • After a model is approved in the registry, the artifact is automatically deployed to a staging endpoint followed by a manual approval step.
  • If approved, it’s deployed to a production endpoint in the same AWS account.

Modifying the sample code for a custom use case

Step 2:

After your project has been created, the architecture described earlier is deployed and the visualization of the pipeline is available on the Pipelines drop-down menu within SageMaker Studio.
Now clone the GitHub repositories to a notebook instance in SageMaker Studio. If you don't have a notebook instance, create one in SageMaker Studio.
Repositories

Step 3:

ModelBuild repository
The ModelBuild repository contains the code for preprocessing, training, and evaluating the model. The sample code trains and evaluates a model on the UCI Abalone Dataset. We can modify the following files: codebuild-buildspec.yml, the abalone directory, evaluate.py, preprocess.py, and pipeline.py.
Create the dataset for the customer churn use case: run the following code to download the dataset and save it to an S3 bucket.
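A minimal sketch of that code, run from a Studio notebook, assuming the synthetic churn dataset published in the public sagemaker-sample-files bucket; the prefix is just an example, and the data lands in your default SageMaker bucket:

```python
import os

import boto3
import sagemaker

# Example prefix for where the raw data will live in your default bucket.
prefix = "sagemaker/DEMO-xgboost-churn"

session = sagemaker.session.Session()
default_bucket = session.default_bucket()

s3 = boto3.client("s3")

# Download the public synthetic churn dataset (path assumed from the AWS sample datasets).
s3.download_file(
    "sagemaker-sample-files", "datasets/tabular/synthetic/churn.txt", "churn.txt"
)

# Upload it to your own bucket so the pipeline can read it as raw input data.
s3.upload_file("churn.txt", default_bucket, os.path.join(prefix, "data/RawData.csv"))

# This is the S3 URL to use for the InputDataUrl pipeline parameter later.
print(f"s3://{default_bucket}/{prefix}/data/RawData.csv")
```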
Rename the abalone directory to customerchurn and update the pipeline module path inside codebuild-buildspec.yml so that it references the renamed customerchurn directory instead of abalone.
Replace the preprocess.py code with the customer churn preprocessing script found in the sample repository.
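For reference, a minimal sketch of what such a preprocessing script could look like; it assumes the synthetic churn dataset (a Churn? target with True./False. values and an identifier-like Phone column) and writes headerless CSVs with the label in the first column, which is the layout the XGBoost training and evaluation steps expect:

```python
import argparse
import os

import numpy as np
import pandas as pd

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Input location matches the ProcessingInput destination configured in pipeline.py (assumption).
    parser.add_argument("--input-data", type=str, default="/opt/ml/processing/input/RawData.csv")
    args = parser.parse_args()

    df = pd.read_csv(args.input_data)

    # Drop the identifier-like Phone column and map the Churn? target to 0/1.
    df = df.drop(columns=["Phone"], errors="ignore")
    df["Churn?"] = (df["Churn?"] == "True.").astype(int)

    # One-hot encode categorical features and put the label in the first column.
    y = df.pop("Churn?")
    x = pd.get_dummies(df)
    data = pd.concat([y, x], axis=1)

    # Shuffle, then split into train / validation / test.
    data = data.sample(frac=1, random_state=42)
    train, validation, test = np.split(data, [int(0.7 * len(data)), int(0.85 * len(data))])

    base_dir = "/opt/ml/processing"
    for name, split in [("train", train), ("validation", validation), ("test", test)]:
        os.makedirs(f"{base_dir}/{name}", exist_ok=True)
        split.to_csv(f"{base_dir}/{name}/{name}.csv", header=False, index=False)
```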
Edit the pipeline.py code for customer churn: replace the "InputDataUrl" default parameter with the Amazon S3 URL obtained in the step above.
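A minimal sketch of that change inside pipeline.py; the bucket and prefix below are placeholders for the URL printed in the download step:

```python
from sagemaker.workflow.parameters import ParameterString

# Default input data location for the pipeline; replace the placeholder with
# the s3:// URL printed when you uploaded RawData.csv.
input_data = ParameterString(
    name="InputDataUrl",
    default_value="s3://<your-default-bucket>/sagemaker/DEMO-xgboost-churn/data/RawData.csv",
)
```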
Update the conditional step in the pipeline definition to evaluate the classification model.
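A minimal sketch of what the updated condition could look like, assuming the evaluation report written by evaluate.py exposes binary_classification_metrics.accuracy.value, and that step_eval, evaluation_report, and step_register are the objects already defined in the template's pipeline.py (the step name is just an example):

```python
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.functions import JsonGet

# Register the model only if accuracy on the test set is at least 80%.
cond_gte = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,            # evaluation ProcessingStep from pipeline.py
        property_file=evaluation_report,     # PropertyFile attached to that step
        json_path="binary_classification_metrics.accuracy.value",
    ),
    right=0.8,
)

step_cond = ConditionStep(
    name="CheckAccuracyCustomerChurnEvaluation",  # example name
    conditions=[cond_gte],
    if_steps=[step_register],                     # registration step from pipeline.py
    else_steps=[],
)
```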
Also rename the training step, the evaluation report and processing step, and the registration step (step_register) to names that fit the customer churn use case.
The default ModelApprovalStatus is set to PendingManualApproval. If our model has greater than 80% accuracy, it’s added to the model registry, but not deployed until manual approval is complete.
Replace the evaluate.py code with the customer churn evaluation script found in the sample repository.
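For reference, a minimal sketch of what such an evaluation script could look like; it assumes an XGBoost model artifact, a headerless test CSV with the label in the first column (as produced by the preprocessing sketch above), and the report structure read by the condition step:

```python
import json
import pathlib
import pickle
import tarfile

import pandas as pd
import xgboost
from sklearn.metrics import accuracy_score

if __name__ == "__main__":
    # SageMaker Processing mounts the trained model and test data under /opt/ml/processing/.
    with tarfile.open("/opt/ml/processing/model/model.tar.gz") as tar:
        tar.extractall(path=".")
    model = pickle.load(open("xgboost-model", "rb"))

    df = pd.read_csv("/opt/ml/processing/test/test.csv", header=None)
    y_test = df.iloc[:, 0].to_numpy()                  # label is in the first column
    x_test = xgboost.DMatrix(df.iloc[:, 1:].values)

    # Predicted churn probabilities, rounded to 0/1 for accuracy.
    predictions = model.predict(x_test)
    acc = accuracy_score(y_test, predictions.round())

    # This structure must match the JSON path used by the condition step.
    report_dict = {
        "binary_classification_metrics": {
            "accuracy": {"value": acc, "standard_deviation": "NaN"},
        },
    }

    output_dir = "/opt/ml/processing/evaluation"
    pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)
    with open(f"{output_dir}/evaluation.json", "w") as f:
        f.write(json.dumps(report_dict))
```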

Step 4:

ModelDeploy repository
The ModelDeploy repository contains the AWS CloudFormation templates and build specification files for the deployment pipeline. We don't make any modifications to this code because it's sufficient for our customer churn use case. It's worth noting that model tests can be added to this repository to test model deployment.

Step 5:

Triggering a pipeline run
Committing these changes to the GitHub repository (easily done on the Studio source control tab) triggers a new pipeline run, because the ModelBuild pipeline's source stage monitors the repository for new commits through the CodeStar connection. After a few moments, we can monitor the run by choosing the pipeline inside the SageMaker project.
Source control
The following screenshot shows our pipeline details.
Pipeline
Model Stages
Choosing the pipeline run displays the steps of the pipeline, which you can monitor.

Step 6:

When the pipeline is complete, you can go to the Models tab inside SageMaker Studio and inspect the model artifacts.
If everything looks fine, we can manually approve the model by selecting the model version and updating its status to Approved.
Model approval
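The approval can also be done programmatically instead of through the Studio UI; a minimal sketch with boto3, where the model package group name is an assumption based on how the template names it after your project:

```python
import boto3

sm = boto3.client("sagemaker")

# The project template typically creates a model package group named after the
# project (assumption; check the Model registry in Studio for the exact name).
group_name = "<your-project-name>-<project-id>"

# Find the most recently registered model version in the group.
packages = sm.list_model_packages(
    ModelPackageGroupName=group_name,
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)
latest_arn = packages["ModelPackageSummaryList"][0]["ModelPackageArn"]

# Marking the version as Approved triggers the ModelDeploy pipeline.
sm.update_model_package(ModelPackageArn=latest_arn, ModelApprovalStatus="Approved")
```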
Approving the model version triggers the ModelDeploy pipeline, which deploys the model to a staging endpoint and runs the tests against it. Once these steps complete, you can manually approve the staging deployment in CodePipeline, which deploys the model to a production endpoint for real-time inference. The pipeline looks like this:
Pipeline stages
Navigate to the Endpoints tab, where you will see the staging and production endpoints for inference.
Endpoints
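Once an endpoint is in service, you can send a test request for real-time inference; a minimal sketch with boto3, where the endpoint name and the CSV payload are placeholders (the template typically names the staging endpoint after your project):

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# Placeholder endpoint name; copy the real one from the Endpoints tab in Studio.
endpoint_name = "<your-project-name>-staging"

# One headerless CSV row of features, in the same column order produced by preprocess.py.
payload = "<comma-separated feature values>"

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Body=payload,
)

# The XGBoost endpoint returns the predicted churn probability for the row.
print(response["Body"].read().decode("utf-8"))
```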

Conclusion

SageMaker Pipelines enables teams to leverage best practice CI/CD methods within their ML workflows. To learn more about SageMaker Pipelines, check out the website and the documentation.