AutoML is a technique where customers bring their data and walk away with a model without worrying about the complex workflow involved in training machine learning models. It dramatically simplifies data preparation, feature engineering, model selection, and hyperparameter tuning by automating them with proven algorithms.
Public cloud-based ML Platform as a Service (PaaS) offerings such as Azure ML, IBM Watson Studio and Google Cloud AI have an AutoML component. AWS was late to bring AutoML capabilities to its SageMaker platform. Since the announcement of SageMaker Autopilot in 2019, Amazon has been consistently improving the AutoML capabilities of its managed ML platform.
With the recent addition of SageMaker JumpStart, AWS now has the entire spectrum of AutoML capabilities covering the areas of regression, classification, vision and natural language processing.
Though AWS doesn’t officially call Amazon SageMaker Autopilot and JumpStart services AutoML, they are alternatives to Azure AutoML and Google Cloud AutoML.
Amazon SageMaker Autopilot targets scenarios such as sales forecasting, recommendation systems, call center routing, and advertisement optimization that rely on datasets typically stored in CSV files, relational databases, and NoSQL databases.
Based on XGBoost and Linear Learner algorithms, Autopilot is ideal for dealing with linear regression, logistic regression and binary or multiclass classification problems. With the addition of deep learning algorithms, Autopilot can handle complex data that is not linearly separable.
The key differentiator of Amazon SageMaker Autopilot is the auto-generation of notebooks as part of the AutoML workflow. Customers can create an Autopilot job that only generates the notebooks instead of running the entire process. These notebooks are based on the standard, open source Jupyter notebooks popular in the data science community. Developers and data scientists can download the notebooks to understand how the data preparation was done and which algorithm was used in the pipeline built for each candidate.
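The notebook-only mode can be sketched with the low-level boto3 SageMaker client, whose `create_auto_ml_job` call accepts a `GenerateCandidateDefinitionsOnly` flag. This is a minimal sketch rather than AWS's reference workflow: the job name, bucket, target column and role ARN are hypothetical placeholders, and the actual API call is kept in a separate function so the request can be inspected without AWS credentials.

```python
def build_autopilot_request(job_name, train_s3_uri, target_column,
                            output_s3_uri, role_arn, notebooks_only=True):
    """Assemble keyword arguments for SageMaker's CreateAutoMLJob API."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": train_s3_uri}
            },
            # Column in the tabular data that Autopilot should learn to predict.
            "TargetAttributeName": target_column,
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "RoleArn": role_arn,
        # True: stop after generating the notebooks instead of
        # training and tuning every candidate pipeline.
        "GenerateCandidateDefinitionsOnly": notebooks_only,
    }


def submit_autopilot_job(request):
    """Send the request to SageMaker (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies
    return boto3.client("sagemaker").create_auto_ml_job(**request)


# Hypothetical values, for illustration only.
request = build_autopilot_request(
    job_name="churn-notebooks-only",
    train_s3_uri="s3://example-bucket/churn/train/",
    target_column="churned",
    output_s3_uri="s3://example-bucket/churn/output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
```

Passing `notebooks_only=False` would ask Autopilot to run the full process; the generated notebooks are written under the S3 output path in either case.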
Announced at re:Invent 2020, Amazon SageMaker JumpStart is the latest addition to Amazon SageMaker Studio, the integrated ML development platform for AWS customers. While Amazon SageMaker Autopilot deals with structured data typically stored in a tabular format, SageMaker JumpStart focuses on vision and NLP domains.
There are three components to Amazon SageMaker JumpStart: open source model deployment, solutions, and customized models fine-tuned from existing open source models on smaller, custom datasets.
Amazon tapped into the official model zoos offered by TensorFlow and PyTorch to provide open source model deployment. More than 150 models are available within SageMaker Studio, each of which can be deployed with a single click. AWS downloads the models, registers them with SageMaker, and exposes an endpoint for inference.
For example, you can expose a ResNet or a MobileNet SSD model for image classification and object detection with just one click. Once the model is deployed, SageMaker points you to a Jupyter Notebook with sample code to invoke the inference endpoint.
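Invoking such an endpoint from Python might look like the sketch below, which uses the `invoke_endpoint` call of the boto3 `sagemaker-runtime` client. The endpoint name and the response layout are assumptions: the code expects the model to accept `application/x-image` and to return JSON with a `probabilities` list, so the notebook generated for the deployed model remains the authority on the real contract.

```python
import json


def classify_image(runtime_client, endpoint_name, image_bytes):
    """Send raw image bytes to an endpoint; return (class index, probability).

    Assumes a JSON reply of the form {"probabilities": [...]}.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-image",
        Accept="application/json",
        Body=image_bytes,
    )
    payload = json.loads(response["Body"].read())
    probabilities = payload["probabilities"]
    # Index of the highest-scoring class.
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best, probabilities[best]


# With real credentials the client would be created like this
# (endpoint and file names are hypothetical):
#   import boto3
#   runtime_client = boto3.client("sagemaker-runtime")
#   with open("cat.jpg", "rb") as f:
#       index, score = classify_image(runtime_client, "example-endpoint", f.read())
```

Taking the client as a parameter keeps the function easy to exercise with a stub before any endpoint, and its hourly cost, exists.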
Deploying existing computer vision models trained with public datasets such as ImageNet or CIFAR-100 may not be very useful for businesses. They need models trained with custom datasets aligned with a specific business problem. For example, an organization might need to identify people with no masks waiting in the reception area. There is no publicly available model to reliably detect faces with no mask.
With Amazon SageMaker JumpStart, customers can bring a labeled dataset and fine-tune an existing open source model to meet their requirements. This can be achieved by simply uploading the images to an Amazon S3 bucket and pointing SageMaker JumpStart to it. This approach does not demand large datasets. With 100 or more images for each class, you can typically expect a reasonably accurate model, although the result still depends on the quality and variety of the images.
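Before uploading, it can help to check that every class has enough examples. The sketch below assumes a one-folder-per-class layout (for example `dataset/mask/` and `dataset/no_mask/`), a common convention for image classification that should be confirmed against the documentation of the chosen model. The folder names and the helper functions are hypothetical; the threshold of 100 simply mirrors the figure mentioned above.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_per_class(dataset_root):
    """Count image files in each immediate subdirectory (one per class)."""
    counts = {}
    for class_dir in sorted(Path(dataset_root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for p in class_dir.iterdir()
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def underfilled_classes(counts, minimum=100):
    """Return the names of classes with fewer than `minimum` images."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

Calling `underfilled_classes(images_per_class("dataset"))` lists the classes that need more images before the folder is copied to S3.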
Behind the scenes, Amazon SageMaker JumpStart uses transfer learning, which is a proven technique for AutoML. If you are a data scientist or an ML engineer familiar with hyperparameter tuning, you can also tweak some settings such as the learning rate and the number of epochs used for fine-tuning the model.
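To make the idea concrete, here is a deliberately tiny, dependency-free illustration of transfer learning by feature extraction: a frozen "backbone" supplies features, and only a small classification head is trained, with the learning rate and number of epochs as the tunable settings. It is a conceptual toy and not the code JumpStart runs; the backbone here is a fixed stand-in function rather than a genuinely pretrained network, and the dataset is invented.

```python
import math


def frozen_backbone(x):
    """Stand-in for a pretrained feature extractor; it is never updated."""
    a, b = x
    return [a + b, a - b]


def train_head(samples, labels, learning_rate=0.5, epochs=200):
    """Fit a logistic-regression head on top of the frozen features."""
    features = [frozen_backbone(x) for x in samples]
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = sum(w * v for w, v in zip(weights, f)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            error = p - y  # gradient of the log loss with respect to z
            weights = [w - learning_rate * error * v for w, v in zip(weights, f)]
            bias -= learning_rate * error
    return weights, bias


def predict(x, weights, bias):
    """Classify one input using the frozen backbone plus the trained head."""
    z = sum(w * v for w, v in zip(weights, frozen_backbone(x))) + bias
    return 1 if z > 0 else 0


# Invented dataset: the label is 1 when the two inputs sum to more than 1.
samples = [(0.0, 0.0), (0.2, 0.1), (0.9, 0.8), (1.0, 1.0)]
labels = [0, 0, 1, 1]
weights, bias = train_head(samples, labels)
```

Because only the head's three parameters are learned, a handful of examples is enough here, which is the same reason fine-tuning a pretrained vision model needs far fewer images than training one from scratch.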
Almost all of the available AutoML solutions rely on transfer learning, but what's unique about Amazon SageMaker JumpStart is the transparency. You can choose any model that supports retraining and customize it through the fine-tuning option available in SageMaker Studio. You can also select the type of EC2 instance used for transfer learning and inference.
Both Amazon SageMaker Autopilot and JumpStart are accessible through the native Python SDK, which integrates with Jupyter Notebooks and makes data scientists feel at home.
The models deployed through SageMaker JumpStart can be optimized for cloud and edge deployments through SageMaker Neo, the component of the platform that’s meant to compile the models for diverse environments.
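A Neo compilation can be requested through the boto3 SageMaker client's `create_compilation_job` call. The sketch below only assembles the request; the job name, S3 locations, role ARN, input name and shape, and the choice of `jetson_nano` as the target device are illustrative assumptions that must match the actual model and hardware.

```python
import json


def build_neo_compilation_request(job_name, model_s3_uri, output_s3_uri,
                                  role_arn, input_shape, framework="PYTORCH",
                                  target_device="jetson_nano"):
    """Assemble keyword arguments for SageMaker's CreateCompilationJob API."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,  # location of the model artifact
            # Neo expects the input names and shapes as a JSON string.
            "DataInputConfig": json.dumps(input_shape),
            "Framework": framework,
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


def submit_compilation_job(request):
    """Send the request to SageMaker (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs anywhere
    return boto3.client("sagemaker").create_compilation_job(**request)


# Hypothetical values, for illustration only.
request = build_neo_compilation_request(
    job_name="example-resnet-jetson",
    model_s3_uri="s3://example-bucket/models/resnet/model.tar.gz",
    output_s3_uri="s3://example-bucket/models/resnet/compiled/",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    input_shape={"input0": [1, 3, 224, 224]},
)
```

Switching `target_device` to a cloud instance family such as `ml_c5` would target cloud inference instead of an edge device.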
AWS is continuously adding features and capabilities to Amazon SageMaker to make it one of the most comprehensive managed ML platforms. With an emphasis on transparency and explainability, SageMaker offers some of the best AutoML capabilities available in the form of Autopilot and JumpStart.