Democratizing AI, and Surviving Titanic With Automated Machine Learning - Adnan Masood
[Figure: ML workflow diagram. Labels recovered from the image include: organization-specific definition, data connectors, data cleansing, data exploration, data versioning, model versioning, standard metrics, custom metrics, metrics marketplace, Kubeflow pipeline, performance monitoring, and residual-error monitoring.]
Image source: R. Olson & W. La Cava et al. (2017) “Data-driven advice for applying machine learning to bioinformatics problems.”
AutoML is a huge time-saver
AutoML handles (some of) this for you!
Image source: R. Olson et al. (2016) “Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science.”
Early AutoML focused only on hyperparameter tuning
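To make "tuning only" concrete, here is a minimal sketch of a grid search over a single hyperparameter with the model type held fixed. The toy dataset, the k-nearest-neighbours model, and the helper names (`knn_predict`, `loo_accuracy`, `grid_search`) are all assumptions chosen for illustration; they are not taken from any tool named in this talk.

```python
# Toy, hand-made dataset (an assumption for illustration): one feature, binary label.
X = [1.0, 1.2, 1.9, 2.1, 2.8, 6.0, 6.3, 7.1, 7.4, 8.0]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of k-NN for a given k."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], k) == y[i]
    return hits / len(X)

def grid_search(X, y, k_grid):
    """Score every k in the grid; return the best k and all scores."""
    scores = {k: loo_accuracy(X, y, k) for k in k_grid}
    best_k = max(scores, key=scores.get)
    return best_k, scores

best_k, scores = grid_search(X, y, k_grid=[1, 3, 5, 7, 9])
print(best_k, scores)
```

Only `k` varies here: data preparation and the choice of model are decided by hand before the search starts.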
Image source: R. Olson et al. (2016) “Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science.”
Modern AutoML optimizes the entire ML workflow
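As a rough illustration of searching over the whole workflow rather than one hyperparameter, the sketch below scores every combination of a preprocessing step and a model. Everything in it is an assumption for illustration: the toy data, the two scalers, the three models, and the exhaustive search strategy. Real AutoML tools search far larger spaces with smarter strategies.

```python
from itertools import product

# Toy data (an assumption for illustration): two features on very different
# scales; the label follows the first feature, the second is noise.
X = [(0.1, 500.0), (0.2, 100.0), (0.3, 900.0), (0.4, 300.0),
     (0.6, 800.0), (0.7, 200.0), (0.8, 600.0), (0.9, 400.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def no_scaling(train, rows):
    return list(rows)

def min_max(train, rows):
    """Rescale each feature using the min and max learned from the training rows."""
    cols = list(zip(*train))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - l) / s for v, l, s in zip(r, lo, span)) for r in rows]

def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def make_knn(k):
    def predict(train_X, train_y, x):
        nearest = sorted(zip(train_X, train_y), key=lambda p: dist2(p[0], x))[:k]
        return 1 if 2 * sum(label for _, label in nearest) > k else 0
    return predict

def nearest_centroid(train_X, train_y, x):
    centres = {}
    for label in set(train_y):
        members = [r for r, l in zip(train_X, train_y) if l == label]
        centres[label] = tuple(sum(c) / len(members) for c in zip(*members))
    return min(centres, key=lambda label: dist2(centres[label], x))

# The search space: one choice per pipeline step.
SCALERS = {"none": no_scaling, "minmax": min_max}
MODELS = {"knn1": make_knn(1), "knn3": make_knn(3), "centroid": nearest_centroid}

def loo_accuracy(scale, predict):
    """Leave-one-out accuracy of a (scaler, model) pipeline."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        scaled_train = scale(train_X, train_X)
        scaled_x = scale(train_X, [X[i]])[0]
        hits += predict(scaled_train, train_y, scaled_x) == y[i]
    return hits / len(X)

results = {(s, m): loo_accuracy(SCALERS[s], MODELS[m])
           for s, m in product(SCALERS, MODELS)}
best = max(results, key=results.get)
print(best, results[best])
```

The scaler is fitted inside each validation fold, so the preprocessing choice is evaluated as part of the pipeline instead of being fixed beforehand.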
Open source AutoML tools
▪ auto-sklearn [Python]
Bayesian optimization over a fixed 3-step ML pipeline
github.com/automl/auto-sklearn
▪ auto-Weka [Java]
Similar to auto-sklearn, but built on top of Weka
github.com/automl/autoweka
▪ TPOT [Python]
Genetic Programming over a configurable ML pipeline
github.com/rhiever/tpot
▪ H2O.ai AutoML [Java w/ Python, Scala, & R APIs and web GUI]
Basic data prep w/ mix of grid and random search over ML algorithms
github.com/h2oai/h2o-3
▪ devol [Python]
Deep Learning architecture search via Genetic Programming
github.com/joeddav/devol
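Several of the tools above search pipeline space with evolutionary methods. The sketch below shows the general idea with a simple genetic algorithm over fixed-length pipeline configurations. It is not TPOT's or devol's actual algorithm (TPOT evolves tree-structured pipelines with genetic programming), and the search space `SPACE`, the scoring function `toy_score`, and its numbers are arbitrary stand-ins, not measured results.

```python
import random

# Hypothetical search space (assumption): one option per pipeline step.
SPACE = {
    "scaler": ["none", "minmax", "standard"],
    "selector": ["all", "top5", "top10"],
    "model": ["tree", "forest", "logreg"],
    "depth": [2, 4, 6, 8],
}

def toy_score(cfg):
    """Stand-in for cross-validated accuracy; the numbers are arbitrary.
    A real tool would train and validate the pipeline here."""
    score = 0.60
    score += {"none": 0.00, "minmax": 0.03, "standard": 0.05}[cfg["scaler"]]
    score += {"all": 0.00, "top5": 0.02, "top10": 0.04}[cfg["selector"]]
    score += {"tree": 0.00, "forest": 0.08, "logreg": 0.05}[cfg["model"]]
    score -= 0.01 * abs(cfg["depth"] - 6)
    return score

def random_pipeline(rng):
    return {step: rng.choice(options) for step, options in SPACE.items()}

def crossover(a, b, rng):
    """Child takes each step from one of its two parents at random."""
    return {step: rng.choice([a[step], b[step]]) for step in SPACE}

def mutate(cfg, rng, rate=0.2):
    """Re-sample each step with probability `rate`."""
    return {step: rng.choice(SPACE[step]) if rng.random() < rate else value
            for step, value in cfg.items()}

def evolve(score, generations=10, pop_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(pop_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        elite = population[: pop_size // 2]  # survivors carried over unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    return max(population, key=score), history

best, history = evolve(toy_score)
print(best, history)
```

Because the top half of each generation survives unchanged, the best score never decreases from one generation to the next.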
© Microsoft Azure + AI Conference All rights reserved.
AutoMLaaS: Commercial AutoML tools
▪ DataRobot
Web-based interface
Fixed search over thousands of ML pipelines
▪ H2O.ai Driverless AI
Web-based interface
H2O.ai AutoML + better feature construction
▪ Google AutoML
Integrated into Google Cloud Platform
DNN architecture search
▪ SAS Factory Miner
Fixed search over a handful of ML methods
▪ IBM SPSS Modeler
Basic automated data preparation and ML modeling
Thank you!