
Create and run SageMaker pipelines using AWS SDKs

Amazon SageMaker is a powerful tool for machine learning, but did you know you can automate SageMaker operations using pipelines with AWS SDKs?

rlhagerm
Amazon Employee
Published Aug 18, 2023
Last Modified Apr 23, 2024
Are you using Amazon SageMaker for machine learning operations? If you are, did you also know you can automate your ML operations using an Amazon SageMaker Model Building Pipeline, and deploy and run those pipelines using AWS SDKs?
A new example scenario demonstrating how to work with Amazon SageMaker pipelines and geospatial jobs is available now as part of the AWS SDK Code Example Library.

What is the code example library?

The code example library is a collection of code examples that shows you how to use AWS software development kits (SDKs) with your development language of choice to work with AWS services and tools. You can download all of the examples, including this Amazon SageMaker scenario, from the GitHub repository.
The code in this example uses AWS SDKs to set up and run a scenario that creates, manages, and executes a SageMaker pipeline. You can create and run pipelines from SageMaker Studio using Python, but you can also use an AWS SDK in your preferred language. Using an SDK, you can create and run SageMaker pipelines and also monitor pipeline operations.
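For example, here is a minimal sketch of creating and starting a pipeline with the AWS SDK for Python (Boto3); the pipeline name, role ARN, and definition file below are placeholders, not the scenario's exact values.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Load a pipeline definition (a JSON document describing the steps and
# parameters). The file name, pipeline name, and role ARN are placeholders.
with open("pipeline-definition.json") as f:
    definition = f.read()

sagemaker.create_pipeline(
    PipelineName="geospatial-pipeline",
    PipelineDefinition=definition,
    RoleArn="arn:aws:iam::123456789012:role/SageMakerPipelineRole",
)

# Start an execution and capture its ARN for monitoring.
execution = sagemaker.start_pipeline_execution(PipelineName="geospatial-pipeline")
print(execution["PipelineExecutionArn"])
```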
To follow along with this example, clone the repository and choose your preferred language SDK from the list below. Each language link takes you to a README file that includes setup instructions and prerequisites for that specific version.

Amazon SageMaker Pipelines

A SageMaker pipeline is a series of interconnected steps that can be used to automate machine learning workflows. Pipelines use these steps and shared parameters to support repeatable workflows that can be customized for your specific use case.

Explore a runnable scenario

This example scenario demonstrates using AWS Lambda and Amazon Simple Queue Service (Amazon SQS) as part of an Amazon SageMaker pipeline. The pipeline itself executes a geospatial job to reverse geocode a sample set of coordinates into human-readable addresses. Input and output files are located in an Amazon Simple Storage Service (Amazon S3) bucket.
[Diagram: pipeline architecture]
When you run the example console application, it executes the following steps:
  • Create the AWS resources and roles needed for the pipeline.
  • Create the AWS Lambda function.
  • Create the SageMaker pipeline.
  • Upload an input file into an Amazon S3 bucket.
  • Execute the pipeline and monitor its status (a polling sketch follows this list).
  • Display some output from the output file.
  • Clean up the pipeline resources.
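To picture the execute-and-monitor step, here is a hedged Boto3 sketch that polls the execution status until it reaches a terminal state (the pipeline name is the placeholder from the sketch above):

```python
import time

import boto3

sagemaker = boto3.client("sagemaker")

# Start an execution, then poll until it reaches a terminal state.
execution_arn = sagemaker.start_pipeline_execution(
    PipelineName="geospatial-pipeline"  # placeholder name
)["PipelineExecutionArn"]

while True:
    status = sagemaker.describe_pipeline_execution(
        PipelineExecutionArn=execution_arn
    )["PipelineExecutionStatus"]
    print(f"Pipeline status: {status}")
    if status in ("Succeeded", "Failed", "Stopped"):
        break
    time.sleep(30)
```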
All of these steps are executed by the language code from the repository. For example, in the .NET solution the scenario is available within the SageMakerExamples.sln solution. You can run the example as-is or experiment by making changes to the pipeline, Lambda function, or input and output processing options.
[Screenshot: example code in the IDE]
The console example can be executed directly from your IDE, and includes interactive options to support running multiple times without having to create new resources. For .NET, run the example by selecting Start Debugging in the toolbar or by using the command dotnet run. The example will guide you through the setup and execution of the pipeline.
After the pipeline execution completes, sample output is displayed, and interactive options then guide you through cleaning up the resources.
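If you were scripting that cleanup yourself with Boto3, it might look roughly like the following; the resource names are placeholders, and the full scenario also removes the SQS queue and IAM roles it created:

```python
import boto3

sagemaker = boto3.client("sagemaker")
lambda_client = boto3.client("lambda")
s3 = boto3.resource("s3")

# Remove the pipeline, then the supporting resources (names are placeholders).
sagemaker.delete_pipeline(PipelineName="geospatial-pipeline")
lambda_client.delete_function(FunctionName="geospatial-example-function")

bucket = s3.Bucket("amzn-s3-demo-bucket")
bucket.objects.all().delete()  # a bucket must be empty before it can be deleted
bucket.delete()
```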

What features are included in the example?

The following sections describe some primary features of this code example. This information, along with related details, is also available in each language's README in the GitHub repository.

Pipeline steps

Pipeline steps define the actions and relationships of the pipeline operations. The pipeline in this example includes an AWS Lambda step and a callback step. Both steps are processed by the same example Lambda function.
The Lambda function handler is included as part of the example, with the following functionality:
  • Starts a SageMaker Vector Enrichment Job with the provided job configuration.
  • Processes Amazon SQS queue messages from the SageMaker pipeline.
  • Starts the export function with the provided export configuration.
  • Completes the pipeline when the export is complete.
[Diagram: pipeline steps]
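As a rough Python sketch of how a single handler might serve both steps, the branch below distinguishes an SQS callback message from a direct Lambda-step invocation. The payload keys (token, vej_arn, and so on) are illustrative assumptions; see the repository for the example's real contract.

```python
import json

import boto3

geospatial = boto3.client("sagemaker-geospatial")
sagemaker = boto3.client("sagemaker")


def handler(event, context):
    # Callback step: the SQS message body carries a callback token that the
    # pipeline is waiting on. The keys below are illustrative assumptions.
    if "Records" in event:
        for record in event["Records"]:
            body = json.loads(record["body"])
            job = geospatial.get_vector_enrichment_job(
                Arn=body["arguments"]["vej_arn"]
            )
            if job["Status"] == "COMPLETED":
                # Signal the pipeline that the callback step has finished.
                sagemaker.send_pipeline_execution_step_success(
                    CallbackToken=body["token"],
                    OutputParameters=[
                        {"Name": "statusJob", "Value": job["Status"]}
                    ],
                )
        return None

    # Lambda step: start the Vector Enrichment Job with the provided config.
    response = geospatial.start_vector_enrichment_job(
        Name="reverse-geocode-job",
        ExecutionRoleArn=event["role"],  # illustrative event keys
        InputConfig={
            "DataSourceConfig": {"S3Data": {"S3Uri": event["input_uri"]}},
            "DocumentType": "CSV",
        },
        JobConfig={
            "ReverseGeocodingConfig": {
                "XAttributeName": "Longitude",
                "YAttributeName": "Latitude",
            }
        },
    )
    return {"vej_arn": response["Arn"]}
```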

Pipeline parameters

The example pipeline uses parameters that you can reference throughout the steps. You can also use the parameters to change values between runs and control the input and output settings. In this example, the parameters set the Amazon S3 locations for the input and output files, along with the identifiers for the role and queue to use in the pipeline.
The example demonstrates how to set and access these parameters before executing the pipeline using an SDK.
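For instance, with Boto3 you could override parameter values at execution time. The parameter names and S3 URIs here are illustrative and must match the parameters declared in the pipeline definition:

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Override pipeline parameters for this run. The names must match the
# parameters declared in the pipeline definition; these are illustrative.
sagemaker.start_pipeline_execution(
    PipelineName="geospatial-pipeline",
    PipelineParameters=[
        {"Name": "InputBucketUri", "Value": "s3://amzn-s3-demo-bucket/input/coordinates.csv"},
        {"Name": "OutputBucketUri", "Value": "s3://amzn-s3-demo-bucket/output/"},
    ],
)
```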

Geospatial jobs

A SageMaker pipeline can be used for model training, setup, testing, or validation. This example uses a simple job for demonstration purposes: a Vector Enrichment Job (VEJ) that processes a set of coordinates to produce human-readable addresses, powered by Amazon Location Service. Other types of jobs can be substituted in the pipeline instead.
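As a hedged sketch of driving a VEJ directly with Boto3's sagemaker-geospatial client (the job ARN, role, and bucket URI are placeholders):

```python
import time

import boto3

geospatial = boto3.client("sagemaker-geospatial")

# Wait for a Vector Enrichment Job to finish, then export its results to
# Amazon S3. The ARN, role, and bucket URI are placeholders.
vej_arn = "arn:aws:sagemaker-geospatial:us-west-2:123456789012:vector-enrichment-job/example"
while geospatial.get_vector_enrichment_job(Arn=vej_arn)["Status"] == "IN_PROGRESS":
    time.sleep(15)

geospatial.export_vector_enrichment_job(
    Arn=vej_arn,
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerPipelineRole",
    OutputConfig={"S3Data": {"S3Uri": "s3://amzn-s3-demo-bucket/output/"}},
)
```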

Try it for yourself

Using the AWS SDKs, you can manage, execute, and update Amazon SageMaker pipelines in your preferred programming language. Using pipeline parameters and steps alongside SDK actions, you can use Amazon SageMaker's machine learning features in a dynamic way that can be integrated into larger systems.
Check out the code example library for this and other detailed examples for the AWS services you use.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
