Automated Machine Learning
Course Title: Environmental Data Analytics.
Degree Program: Masters in data science.
Instructor: Mohammad Mahdi Rajabi
Model Selection
A critical and often challenging step in machine learning is selecting the right model
type and architecture that best suits the specific task at hand. With numerous model
types and architectural variations, finding the optimal configuration among countless
possibilities requires careful evaluation. This process significantly impacts
performance, as the chosen model must balance accuracy, efficiency, and task-specific
constraints to achieve the best results.
In the context of AutoML, this challenge can be addressed either through Automated
Model Selection, which applies to a range of model types, or, if the focus is specifically
on neural networks, through Neural Architecture Search (NAS). Both often need to
be performed in conjunction with Hyperparameter Optimization (HPO) to fine-tune
the selected models or architectures for optimal performance.
Model selection is the process of choosing the best-performing model from a set of
predefined models or configurations. It involves evaluating multiple candidate models
and comparing their performance on a given dataset. The candidates are typically
predefined by the practitioner, and selection is carried out through systematic
(often exhaustive, brute-force) evaluation, with early termination based on
performance thresholds to save computational resources.
• Early termination in model selection refers to stopping the evaluation of a
model or configuration before it runs to completion once it becomes evident
that the candidate is unlikely to outperform the others. This saves
computational resources by avoiding unnecessary evaluation of sub-optimal
models. Early termination can be achieved through several methods, including
the following (a code sketch follows the list):
1. Performance Thresholds: Stop training if performance metrics (e.g.,
validation loss) fail to improve after a set number of iterations.
2. Budget Constraints: Halt models exceeding allocated resources, such as
time or computational cost, without showing promising results.
3. Confidence Intervals: Use statistical methods to stop evaluations when
the likelihood of outperforming the best model so far is low.
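Below is a minimal sketch of this kind of model selection with threshold-based early termination, written with scikit-learn; the candidate models, the margin value, and the synthetic dataset are illustrative assumptions rather than a prescribed recipe.

# Minimal sketch: model selection over predefined candidates with a simple
# performance-threshold early termination. Candidates, margin, and data are
# illustrative assumptions.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm_rbf": SVC(kernel="rbf"),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
best_name, best_score = None, -np.inf
margin = 0.05  # abandon a candidate whose running mean trails the best by this much

for name, model in candidates.items():
    fold_scores = []
    for train_idx, val_idx in cv.split(X, y):
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        fold_scores.append(fitted.score(X[val_idx], y[val_idx]))
        # Early termination: stop evaluating this candidate once it is
        # clearly behind the best model found so far.
        if len(fold_scores) >= 2 and np.mean(fold_scores) < best_score - margin:
            break
    mean_score = float(np.mean(fold_scores))
    if mean_score > best_score:
        best_name, best_score = name, mean_score

print(f"Selected model: {best_name} (CV accuracy ~ {best_score:.3f})")

The margin-based check is a simple stand-in for the threshold and confidence-interval ideas above: a candidate is abandoned as soon as its running cross-validation score trails the best result found so far by a clear gap.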
Neural Architecture Search (NAS) is a systematic method for creating and evaluating
new neural network architectures to find the best one for a given problem. It often
incorporates Hyperparameter Optimization (HPO) as part of the process to fine-tune
the architectures for optimal performance. NAS is highly automated, often requiring
minimal manual intervention beyond defining the search space and objective.
The NAS process involves the following key steps:
1. Defining the Search Space: The search space specifies the set of all possible
neural network architectures that can be explored. This includes options like the
number and type of layers (e.g., convolutional, recurrent), their connections
(e.g., sequential, skip connections), activation functions, and more. A small
illustrative sketch of such a search space is shown below.
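As an illustration of this first step only, the following sketch defines a small, hypothetical search space for feed-forward architectures and draws random candidates from it; the specific options and the random sampling are assumptions made for illustration, not part of any particular NAS method.

# Minimal sketch of a NAS search space for small feed-forward networks.
# The options and the random sampling strategy are illustrative assumptions.
import random

search_space = {
    "num_layers": [2, 3, 4, 5],
    "units_per_layer": [32, 64, 128, 256],
    "activation": ["relu", "tanh", "elu"],
    "skip_connections": [False, True],
    "dropout": [0.0, 0.2, 0.5],
}

def sample_architecture(space, rng):
    """Draw one candidate architecture (a plain dict) from the search space."""
    return {key: rng.choice(options) for key, options in space.items()}

rng = random.Random(42)
for _ in range(5):
    candidate = sample_architecture(search_space, rng)
    # In a full NAS loop, each candidate would be built, trained (or scored with
    # a cheaper proxy), and fed back to the search strategy at this point.
    print(candidate)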
Meta-learning
Pipeline Automation
Auto-sklearn
Auto-sklearn automates the process of selecting and tuning machine learning models
using scikit-learn's library of algorithms. It excels on structured (tabular) datasets
and provides robust solutions for supervised learning tasks such as classification and
regression. Its strength lies in its built-in ensemble construction and its use of
Bayesian optimization to identify the best models.
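A minimal usage sketch for a classification task is shown below; the time budgets and dataset are placeholders, and the snippet assumes auto-sklearn is installed (it primarily targets Linux environments).

# Minimal auto-sklearn sketch for a tabular classification task.
# Time budgets and the dataset are illustrative placeholders.
import autosklearn.classification
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,  # total search budget in seconds
    per_run_time_limit=30,        # budget per candidate pipeline
)
automl.fit(X_train, y_train)

print(accuracy_score(y_test, automl.predict(X_test)))
print(automl.leaderboard())       # ranked candidates found during the search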
TPOT
TPOT uses genetic algorithms to optimize machine learning pipelines, making it
particularly useful for automating feature engineering and model selection. Its
evolutionary approach iteratively improves pipeline performance, which is ideal for
complex datasets where unconventional pipeline configurations may outperform
standard methods.
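A minimal TPOT sketch is shown below, assuming the classic TPOT interface; the generation and population sizes are kept deliberately small for illustration.

# Minimal TPOT sketch: evolve a classification pipeline with a genetic algorithm.
# Generations and population size are kept small for illustration only.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=0)
tpot.fit(X_train, y_train)

print(tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")  # export the winning pipeline as plain scikit-learn code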
H2O AutoML
H2O AutoML offers a fully automated end-to-end machine learning pipeline that
includes data preprocessing, model training, hyperparameter tuning, and model
stacking. It's especially powerful for large-scale enterprise applications, thanks to its
scalability and support for distributed computing, making it suitable for big data and
cloud environments.
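The sketch below shows a minimal H2O AutoML run; the CSV path, target column name, and search budgets are hypothetical placeholders.

# Minimal H2O AutoML sketch. The file path, target column, and budgets
# are hypothetical placeholders.
import h2o
from h2o.automl import H2OAutoML

h2o.init()  # start (or connect to) a local H2O cluster

train = h2o.import_file("train.csv")      # hypothetical training data
target = "label"                          # hypothetical target column
features = [c for c in train.columns if c != target]
train[target] = train[target].asfactor()  # treat the target as categorical

aml = H2OAutoML(max_models=20, max_runtime_secs=600, seed=1)
aml.train(x=features, y=target, training_frame=train)

print(aml.leaderboard.head())  # ranked models, including stacked ensembles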
PyCaret
PyCaret is a low-code AutoML library designed for Python, simplifying model
development for beginners and rapid prototyping. It covers the entire machine
learning pipeline, from data preprocessing to deployment, and is ideal for users
looking for ease of use with minimal coding in scenarios like business analytics and
small-scale projects.
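A minimal PyCaret sketch is shown below, using one of the example datasets bundled with the library; the dataset name and target column come from PyCaret's own examples and should be treated as placeholders.

# Minimal PyCaret sketch for a classification task, using the functional API.
# The bundled dataset and target column are illustrative placeholders.
from pycaret.classification import compare_models, predict_model, setup
from pycaret.datasets import get_data

data = get_data("juice")           # example dataset shipped with PyCaret
setup(data, target="Purchase", session_id=123)

best = compare_models()            # train and rank a suite of candidate models
predictions = predict_model(best)  # score on the hold-out split created by setup()
print(predictions.head())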
Google Cloud AutoML
Google Cloud AutoML provides scalable, cloud-based solutions for automating
machine learning tasks. It supports diverse data types, including structured data, text,
images, and video, making it highly suitable for businesses seeking customizable and
production-ready models integrated with Google Cloud's infrastructure.