We Built and Deployed a Lead Scoring ML Model - Here's What and How

Use Amazon SageMaker and AutoML to automate lead prioritization and customer targeting in marketing.

Dennis Liang
Amazon Employee
Published Jan 16, 2024
Last Modified Jan 17, 2024
Authors: Oshry Ben-Harush, Joe Standerfer, Dennis Liang, Priyanvada Barve, Yuan Feng, Tim Wu
The goal of marketing and sales teams is to optimize their lead funnel and target the leads most likely to convert to a sales opportunity. A lead is a potential customer who has expressed interest in a company's product or service. To address this challenge, we developed and deployed a machine learning (ML) solution. With Amazon SageMaker and AutoML, marketers can build lead scoring models that identify potential opportunities and prioritize leads more effectively. By defining the target variable, gathering and cleaning data, and training and deploying the model, businesses can learn which features are most predictive of conversion and optimize their marketing efforts accordingly. This lets us reach out to customers and address their needs faster.
In this post, we walk step by step through how this solution was designed and implemented, from problem definition to deployment.

Why Score and Prioritize Leads?

As organizations grow, effective lead prioritization demands more attention from sales and marketing teams. With limited bandwidth across a growing customer base, sellers and marketers must prioritize high-value leads efficiently. Better lead prioritization lets sales and marketing optimize their outreach and focus on the most promising potential customers, those most likely to benefit from future engagement. Rather than relying on simplistic rule-based scoring, we propose a predictive ML solution that integrates customer data with demographics, company information, and online and offline event engagement. By taking a data-driven approach to lead scoring, sales and marketing teams can allocate resources to reach the right customers at the right scale.

Solution Overview

The diagram outlines our high-level ML solution for lead scoring and prioritization. As described in the previous section, the model's objective is to predict each lead's propensity to convert. We build two pipelines in parallel to achieve this goal: one that trains on historical records and one that runs inference on incoming leads. Both pipelines use the same feature extraction and engineering process, so common functions are shared between them. The trained model is deployed to an Amazon SageMaker endpoint for real-time inference.
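The post does not show its feature code, so the following is only a minimal sketch of the "shared functions" idea. It assumes a hypothetical `LeadFeatureTransformer` that does median imputation for numerical columns and one-hot encoding for categorical columns: the training pipeline calls `fit`, and the inference pipeline reuses the same fitted object and only calls `transform`.

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


@dataclass
class LeadFeatureTransformer:
    """Hypothetical shared feature step: fit on training data, then reuse unchanged at inference."""

    numeric_cols: List[str]
    categorical_cols: List[str]
    medians: Dict[str, float] = field(default_factory=dict)
    categories: Dict[str, List[str]] = field(default_factory=dict)

    def fit(self, rows: List[Row]) -> "LeadFeatureTransformer":
        for col in self.numeric_cols:
            values = [float(r[col]) for r in rows if r.get(col) is not None]
            # Learn a median to impute missing values later (0.0 if the column is all missing).
            self.medians[col] = statistics.median(values) if values else 0.0
        for col in self.categorical_cols:
            # Remember the category vocabulary seen during training, in a stable order.
            self.categories[col] = sorted({str(r[col]) for r in rows if r.get(col) is not None})
        return self

    def transform(self, row: Row) -> List[float]:
        features: List[float] = []
        for col in self.numeric_cols:
            value = row.get(col)
            features.append(float(value) if value is not None else self.medians[col])
        for col in self.categorical_cols:
            value = row.get(col)
            # One-hot encode; a category never seen in training maps to all zeros.
            features.extend(
                1.0 if value is not None and str(value) == c else 0.0
                for c in self.categories[col]
            )
        return features
```

Persisting the fitted object next to the model artifact (for example with `pickle`) is one way to guarantee that inference applies exactly the transformations learned in training.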

Data Processing and Workflow

Lead data is stored in a secure Amazon Redshift cluster. We clean, transform, and create model-ready input features at the lead level from demographic data, online engagement (documentation views), offline engagement (marketing and sales activities), and product usage behavior. We use AWS Glue to author, orchestrate, schedule, and monitor the data ETL (extract, transform, load) workflows. Once AWS Glue generates the post-processed data, we store the output in a dedicated Amazon S3 location.
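In the real pipeline this roll-up would live in the AWS Glue job or in Redshift SQL; the plain-Python sketch below only illustrates the shape of "features at the lead level". The event type names and output columns are hypothetical, since the post does not give its schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event types: the post mentions documentation views (online)
# and marketing/sales touches (offline) but not the exact schema.
ONLINE_TYPES = {"doc_view"}
OFFLINE_TYPES = {"event_attended", "sales_call"}


def aggregate_lead_features(events: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Roll raw (lead_id, event_type) records up to one feature row per lead."""
    features: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"online_count": 0, "offline_count": 0, "distinct_event_types": 0}
    )
    seen_types: Dict[str, Set[str]] = defaultdict(set)
    for lead_id, event_type in events:
        row = features[lead_id]
        if event_type in ONLINE_TYPES:
            row["online_count"] += 1
        elif event_type in OFFLINE_TYPES:
            row["offline_count"] += 1
        # Track engagement breadth, including event types outside the two known groups.
        seen_types[lead_id].add(event_type)
        row["distinct_event_types"] = len(seen_types[lead_id])
    return dict(features)
```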
The data preparation steps for training and inference differ, however. The training script learns from historical data and defines what features to build and how to transform them. We package the fitted feature engineering step together with the model artifact in the endpoint. For real-time inference, each incoming lead triggers an Amazon EventBridge event and is dispatched to an orchestration service, which invokes the matching model endpoint in SageMaker. The inference endpoint applies the feature transformations learned during training, returns a prediction for the lead, and stores the record in Amazon DynamoDB for analysis and reporting.
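The orchestration service's code is not described in the post. As a rough sketch, assuming a hypothetical routing table keyed on a `segment` field and injected `invoke` and `store` callables, the dispatch step could look like the following. The only AWS detail relied on is that EventBridge delivers a custom event's payload under its `detail` field; everything else is illustrative.

```python
import json
from typing import Callable, Dict

# Hypothetical routing table from a lead attribute to a SageMaker endpoint name.
ENDPOINTS_BY_SEGMENT = {"smb": "lead-scoring-smb", "enterprise": "lead-scoring-enterprise"}
DEFAULT_ENDPOINT = "lead-scoring-smb"


def route_lead(
    event: Dict,
    invoke: Callable[[str, bytes], Dict],
    store: Callable[[Dict], None],
) -> Dict:
    """Dispatch one incoming lead event to its matching endpoint and persist the score."""
    lead = event["detail"]  # EventBridge places the custom payload under "detail"
    endpoint = ENDPOINTS_BY_SEGMENT.get(lead.get("segment"), DEFAULT_ENDPOINT)
    prediction = invoke(endpoint, json.dumps(lead).encode("utf-8"))
    record = {"lead_id": lead["lead_id"], "endpoint": endpoint, **prediction}
    store(record)  # e.g. a DynamoDB write in a real deployment
    return record
```

In a deployment, `invoke` might wrap the boto3 `sagemaker-runtime` client's `invoke_endpoint(EndpointName=..., ContentType=..., Body=...)` and parse the response body, and `store` might write to a DynamoDB table. The post places the storage step on the endpoint side; it sits in the orchestrator here only to keep the sketch short.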

Model Training and Evaluation

To rapidly explore the potential for predicting which leads will be qualified by sales, we used Amazon SageMaker Autopilot. Autopilot let us quickly evaluate multiple ML models, as well as ensembles of models, on a dataset that combined numerical and categorical variables and contained many missing values.
The source dataset contained 300k+ rows and 100+ features. The task is time dependent: in production, we train the model on all historical data available up to a point in time and then apply it to score future leads. To stay close to these production constraints, we split the data into training and test sets by time. A threshold date determines the split: leads on or before the threshold date go into the training set, and leads after it go into the test set.
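The time-based split can be sketched in a few lines. The `created_date` field name is a hypothetical stand-in for whatever lead timestamp the real data uses.

```python
from datetime import date
from typing import Dict, List, Tuple


def split_by_date(
    leads: List[Dict], threshold: date, date_key: str = "created_date"
) -> Tuple[List[Dict], List[Dict]]:
    """Train on leads on or before the threshold date; test on leads strictly after it."""
    train = [lead for lead in leads if lead[date_key] <= threshold]
    test = [lead for lead in leads if lead[date_key] > threshold]
    return train, test
```

Unlike a random split, this keeps every test lead in the "future" relative to the training data, so the test score reflects how the model behaves when it scores leads it has never seen.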
We then created the Amazon SageMaker Autopilot job and let it search for the best model, optimizing for the area under the ROC curve (AUC). We chose AUC because the main business goal for this use case is to correctly identify as many sales-qualified leads as possible while keeping false positives to a minimum, and AUC summarizes the trade-off between true positive and false positive rates across all classification thresholds.
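As a sketch of what such a job request might contain, the helper below builds keyword arguments for the boto3 SageMaker client's `create_auto_ml_job` call with `ProblemType="BinaryClassification"` and the `AUC` objective. The job name, S3 URIs, role ARN, and target column are placeholders, not the values used in the post.

```python
from typing import Dict


def autopilot_job_request(
    job_name: str, train_s3_uri: str, output_s3_uri: str, role_arn: str, target: str
) -> Dict:
    """Build keyword arguments for boto3's SageMaker create_auto_ml_job call."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": train_s3_uri}},
                "TargetAttributeName": target,  # column holding the 0/1 "sales qualified" label
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ProblemType": "BinaryClassification",
        "AutoMLJobObjective": {"MetricName": "AUC"},  # optimize for area under the ROC curve
        "RoleArn": role_arn,
    }
```

With AWS credentials configured, the request would be submitted as `boto3.client("sagemaker").create_auto_ml_job(**autopilot_job_request(...))`.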
Once the job finished, we examined the results in Amazon SageMaker Studio. The Autopilot job reported the many models and hyperparameter configurations it had tried and highlighted the model with the best performance on the training and validation sets.
By default, inference containers are configured to generate only the predicted label. However, to calculate and visualize the ROC curve, we needed prediction probabilities in addition to the predicted labels. To select additional inference content, we can update the inference_response_keys parameter to include up to these three environment variables:
  • SAGEMAKER_INFERENCE_SUPPORTED: This is set to provide hints to you about what content each container supports.
  • SAGEMAKER_INFERENCE_INPUT: This should be set to the keys that the container expects in the input payload.
  • SAGEMAKER_INFERENCE_OUTPUT: This should be populated with the set of keys that the container outputs.
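With probabilities available, the ROC AUC on the time-based test set can be computed directly. The dependency-free sketch below uses the rank-sum (Mann-Whitney U) formulation with averaged ranks for ties. It assumes `scores` holds each lead's probability of the positive (sales-qualified) class, so confirm what the container's probability output represents before feeding it in; in practice a library function such as scikit-learn's `roc_auc_score` does the same job.

```python
from typing import List


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation, averaging tied ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of equal scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    # U statistic for positives, normalized by the number of positive/negative pairs.
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```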
One important aspect of machine learning development and use is interpretability. Our customers want to understand the reasons behind the model's decisions. To explain the model's outcomes, we use SHAP (Shapley) values as follows:
  • Calculate SHAP values for the full dataset.
  • Group the binary (one-hot) features back into their full categories. Compute the importance of each categorical and numerical feature as the sum of the absolute SHAP values, and rank the features by this importance.
  • Create sub-plots under each aggregated feature showing the average contribution of each binary value (or numerical bucket) to the target.
  • Show sub-plots of the top X contributors and detractors.
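The aggregation steps above can be sketched without any plotting. The sketch assumes one-hot columns are named `feature=value` (a hypothetical convention) and that SHAP values have already been computed, one row per lead. Grouped importance follows the list literally (sum of absolute SHAP values per source feature); for the sub-plot data, it assumes "average contribution" means the mean signed SHAP value of a binary column over the rows where that column is 1.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_of(column: str) -> str:
    """Map a one-hot column like 'industry=retail' back to its source feature 'industry'."""
    return column.split("=", 1)[0]


def grouped_importance(columns: List[str], shap_rows: List[List[float]]) -> List[Tuple[str, float]]:
    """Sum |SHAP| over all rows and all one-hot columns of each feature, ranked high to low."""
    totals: Dict[str, float] = defaultdict(float)
    for row in shap_rows:
        for col, value in zip(columns, row):
            totals[group_of(col)] += abs(value)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def mean_active_contribution(
    columns: List[str], shap_rows: List[List[float]], feature_rows: List[List[float]]
) -> Dict[str, float]:
    """Average signed SHAP of each binary column over the rows where that column equals 1."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for shap_row, feat_row in zip(shap_rows, feature_rows):
        for col, s, x in zip(columns, shap_row, feat_row):
            if "=" in col and x == 1:
                sums[col] += s
                counts[col] += 1
    return {col: sums[col] / counts[col] for col in counts}


def top_contributors_and_detractors(contribs: Dict[str, float], k: int):
    """Return up to k positive contributors (largest first) and k detractors (most negative first)."""
    ranked = sorted(contribs.items(), key=lambda kv: kv[1])
    detractors = [kv for kv in ranked[:k] if kv[1] < 0]
    contributors = [kv for kv in reversed(ranked[-k:]) if kv[1] > 0]
    return contributors, detractors
```

Bucketing numerical features before computing their per-bucket averages would follow the same pattern as the binary columns.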
We found that Shapley explanations for grouped binary features are the most interpretable for business users. This approach allows us to:
  • rank the importance of full categorical features against numerical ones;
  • show the direction of each feature's contribution;
  • convey the most information in a condensed format.

Solution Deployment

Building, iterating on, and deploying a machine learning model can be time-consuming and resource-intensive, with multiple manual steps such as data collection, data processing, training, model evaluation, and deployment to an inference endpoint. To build, iterate on, and deploy quality ML models rapidly, we apply MLOps CI/CD practices to automate the full life cycle of our ML models.

Automated Data Collection

As mentioned earlier, we used an internal workflow management service to run scheduled queries on Amazon Redshift and load the results into a dedicated S3 bucket. We set up Amazon S3 Event Notifications to alert us when a new dataset arrives. Each notification invokes a dedicated AWS Lambda function, which processes it and triggers the training workflow accordingly.
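A minimal sketch of such a Lambda function is below. It assumes the standard S3 event notification structure and a hypothetical `TRAINING_STATE_MACHINE_ARN` environment variable (the default ARN is a placeholder), and it starts the training workflow with the boto3 Step Functions client's `start_execution` call. The optional `sfn_client` parameter exists only so the handler can be exercised without AWS credentials.

```python
import json
import os
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def new_dataset_uris(event: Dict[str, Any]) -> List[str]:
    """Extract s3:// URIs of newly written objects from an S3 event notification."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        uris.append(f"s3://{bucket}/{key}")
    return uris


def handler(event: Dict[str, Any], context: Any, sfn_client: Any = None) -> Dict[str, List[str]]:
    """Lambda entry point: start one training workflow execution per new dataset."""
    if sfn_client is None:
        import boto3  # imported lazily so the sketch can be tested without boto3

        sfn_client = boto3.client("stepfunctions")
    # Placeholder default; a real deployment would always set the environment variable.
    state_machine_arn = os.environ.get(
        "TRAINING_STATE_MACHINE_ARN",
        "arn:aws:states:us-east-1:111122223333:stateMachine:lead-scoring-training",
    )
    started = []
    for uri in new_dataset_uris(event):
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps({"dataset_uri": uri}),
        )
        started.append(response["executionArn"])
    return {"started": started}
```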

Build the Training Workflow in Step Functions

A machine learning training workflow often involves multiple steps that can be complex and time-consuming to manage. AWS Step Functions simplifies this process by breaking the workflow into smaller, more manageable steps that can run in sequence or in parallel. We therefore broke the lead scoring training workflow into multiple steps, ran each step as an Amazon SageMaker job, and used Step Functions to chain the jobs together. Managing the training workflow with Step Functions also lets us trigger training from any event or source and integrate it with other workflows.
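The chaining can be sketched as a small builder for an Amazon States Language definition in which each Task state calls a SageMaker job through the Step Functions service integration, and the `.sync` suffix makes the state wait for the job to finish. The state names are hypothetical, and the `Parameters` blocks are deliberately abbreviated: a real request must also carry the remaining required fields of `CreateProcessingJob` and `CreateTrainingJob` (resources, role, container specification, and so on).

```python
import json
from typing import Any, Dict, List, Tuple

Step = Tuple[str, str, Dict[str, Any]]  # (state name, resource ARN, Parameters)


def chain_sagemaker_jobs(steps: List[Step], comment: str = "Lead scoring training workflow") -> Dict[str, Any]:
    """Build an Amazon States Language definition that runs the given Task states in order."""
    if not steps:
        raise ValueError("need at least one step")
    states: Dict[str, Any] = {}
    for i, (name, resource, parameters) in enumerate(steps):
        state: Dict[str, Any] = {
            "Type": "Task",
            "Resource": resource,
            "Parameters": parameters,
            "ResultPath": f"$.{name}",  # keep each job's output alongside the original input
        }
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {"Comment": comment, "StartAt": steps[0][0], "States": states}


# Abbreviated, hypothetical example: job names are read from the execution input via ".$" keys.
EXAMPLE_DEFINITION = chain_sagemaker_jobs([
    ("PrepareFeatures", "arn:aws:states:::sagemaker:createProcessingJob.sync",
     {"ProcessingJobName.$": "$.processing_job_name"}),
    ("TrainModel", "arn:aws:states:::sagemaker:createTrainingJob.sync",
     {"TrainingJobName.$": "$.training_job_name"}),
])

if __name__ == "__main__":
    print(json.dumps(EXAMPLE_DEFINITION, indent=2))
```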

Off To Production

To deploy the model to production, we further extended the training workflow in Step Functions with a step that creates the model in Amazon SageMaker and a step that starts an AWS Fargate task to create or update the Amazon SageMaker endpoint. We run the endpoint step in Fargate because creating or updating an endpoint can take a long time.
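The post does not show the Fargate task's logic. A plausible sketch, assuming the boto3 SageMaker client and hypothetical names for the endpoint, endpoint configuration, and instance type, creates a fresh endpoint configuration for the new model, then either creates the endpoint or updates it in place, and blocks on the `endpoint_in_service` waiter until the endpoint is ready.

```python
from typing import Any


def deploy_model(
    sm: Any,
    model_name: str,
    endpoint_name: str,
    config_name: str,
    instance_type: str = "ml.m5.large",  # hypothetical instance type
) -> str:
    """Create an endpoint config for the model, then create or update the endpoint and wait."""
    sm.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,
            }
        ],
    )
    existing = sm.list_endpoints(NameContains=endpoint_name, MaxResults=100)["Endpoints"]
    # NameContains is a substring filter, so compare names exactly.
    if any(ep["EndpointName"] == endpoint_name for ep in existing):
        sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
        action = "updated"
    else:
        sm.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
        action = "created"
    # Block until the endpoint is InService; this long wait is why it runs outside Lambda.
    sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    return action
```

In the container, the function would be called with `boto3.client("sagemaker")` as `sm`.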
By using Amazon SageMaker and Step Functions, we built a fully automated MLOps CI/CD pipeline that retrains models regularly and deploys them rapidly. This mitigates the risk that real-world data drifts away from the data used for training, so our ML model keeps predicting with quality.


In this post, we covered how we used Amazon SageMaker Autopilot to train and deploy new models in a matter of weeks. These models are trained on historical sales and marketing data. Sellers can now consume the models' outputs and quickly prioritize the customers with the highest chance of generating revenue, which increases their productivity and drives revenue faster.
Overall, Amazon SageMaker Autopilot allowed us to rapidly experiment with and evaluate multiple machine learning models and ensembles of models on rough, unclean data. Autopilot cleaned the data, identified and extracted relevant features from numerical and textual columns, and fit and evaluated multiple models on that data. The results are far from perfect, but they helped establish confidence in the relevance of the data and in the ML/AI direction we took to solve the problem.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.