Democratizing AI, and Surviving Titanic with Automated Machine Learning


Adnan Masood, Ph.D.
@adnanmasood
Adnan Masood, Ph.D. - Chief Architect, Artificial Intelligence and Machine Learning at UST Global; Visiting Scholar at Stanford University; Microsoft MVP (Most Valuable Professional) for AI.

Dr. Adnan Masood


[Figure: end-to-end ML workflow, from problem analysis to deployment.
Stages: Analyze Problem; Fetch and Store Data; Explore Data; Select Data-driven Business Metrics; Prepare Data; Select Model; Train Models; Evaluate Models; Deploy Models; Workflow (Training and Inference).
Capabilities listed under the stages: organization-specific connectors, universal data model, automatic metrics definition, automated data exploration, data visualization, data cleansing, data versioning, vectorization, Auto ML, model marketplace, standard marketplace, integration of custom code, model versioning, experiment versioning, model pause & restart, model comparison, standard and custom metrics, storage of performance/accuracy metrics with data & model versions, monitoring of performance and residual errors, Kubeflow pipeline, Kubernetes deployment, Spark deployment, CI/CD.
Cross-cutting: Reliability, Availability, Serviceability, Security, Performance, Management Console.]


“AutoML is a quiet revolution in AI…”
ML still requires a lot of manual programming
AutoML aims to automate the entire ML workflow
Default parameters are almost always bad
AutoML handles this for you!
AutoML is a huge time-saver
AutoML handles (some of) this for you!

Image source: R. Olson, W. La Cava et al. (2017) “Data-driven advice for applying machine learning to bioinformatics problems.”
AutoML is a huge time-saver
AutoML handles (some of) this for you!

Image source: visit.crowdflower.com/data-science-report.html


The business case for AutoML

Image source: kaggle.com/surveys/2017


Early AutoML focused only on parameter tuning

Image source: R. Olson et al. (2016) “Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science.”
Early AutoML focused only on parameter tuning

… and maybe (limited) model selection


We mostly used grid search and random search

Nowadays, we wouldn’t really call this AutoML
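
The two search strategies named above can be sketched in a few lines of plain Python. This is a minimal illustration, not a tool from the talk: `validation_score` is a hypothetical stand-in for "train a model and measure validation accuracy", and the hyperparameter names and ranges are assumptions chosen for the example. In practice, scikit-learn's `GridSearchCV` and `RandomizedSearchCV` provide the same two strategies with cross-validation built in.

```python
import itertools
import random


def validation_score(learning_rate, max_depth):
    """Hypothetical stand-in for training a model and scoring it.

    The toy surface peaks at learning_rate=0.1, max_depth=6.
    """
    return 1.0 - (learning_rate - 0.1) ** 2 * 4 - ((max_depth - 6) / 10) ** 2


def grid_search(grid):
    """Score every combination in the Cartesian product of the grid."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(samplers, n_iter, seed=0):
    """Score n_iter configurations drawn independently from the samplers."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: draw(rng) for name, draw in samplers.items()}
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Assumed search space: 3 x 3 grid points versus 9 random draws (same budget).
grid = {"learning_rate": [0.01, 0.1, 0.5], "max_depth": [2, 6, 10]}
samplers = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform
    "max_depth": lambda rng: rng.randint(1, 12),
}

grid_best, grid_score = grid_search(grid)
rand_best, rand_score = random_search(samplers, n_iter=9)
print("grid  :", grid_best, round(grid_score, 4))
print("random:", rand_best, round(rand_score, 4))
```

Both searches only tune the parameters of one fixed model, which is why the slide above no longer counts this as AutoML: nothing about preprocessing, feature construction, or model choice is searched.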


Modern AutoML optimizes the entire ML workflow

Image source: R. Olson et al. (2016) “Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science.”
Open source AutoML tools
▪ auto-sklearn [Python]
  ▫ Bayesian optimization over a fixed 3-step ML pipeline
  ▫ github.com/automl/auto-sklearn
▪ auto-Weka [Java]
  ▫ Similar to auto-sklearn, but built on top of Weka
  ▫ github.com/automl/autoweka
▪ TPOT [Python]
  ▫ Genetic Programming over a configurable ML pipeline
  ▫ github.com/rhiever/tpot
▪ H2O.ai AutoML [Java w/ Python, Scala, & R APIs and web GUI]
  ▫ Basic data prep w/ mix of grid and random search over ML algorithms
  ▫ github.com/h2oai/h2o-3
▪ devol [Python]
  ▫ Deep Learning architecture search via Genetic Programming
  ▫ github.com/joeddav/devol
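
To make "searching over a whole pipeline" concrete, here is a toy sketch of an evolutionary search over pipeline configurations. It is not the API of TPOT or any other tool listed above, and it is a simple evolutionary algorithm over a fixed-length configuration rather than tree-based genetic programming. The search space, the step names, and the `TOY_SCORES` table (standing in for cross-validated accuracy) are all assumptions made for illustration.

```python
import random

# Hypothetical search space: one choice per pipeline step.
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "selector": ["none", "variance", "top_k"],
    "model": ["logistic", "tree", "forest"],
}

# Hypothetical contribution of each choice; a real tool would instead
# fit the pipeline and measure cross-validated accuracy.
TOY_SCORES = {
    "scaler": {"none": 0.00, "standard": 0.05, "minmax": 0.03},
    "selector": {"none": 0.00, "variance": 0.02, "top_k": 0.04},
    "model": {"logistic": 0.70, "tree": 0.72, "forest": 0.80},
}


def fitness(pipeline):
    """Toy stand-in for evaluating a full pipeline."""
    return sum(TOY_SCORES[step][choice] for step, choice in pipeline.items())


def random_pipeline(rng):
    return {step: rng.choice(options) for step, options in SEARCH_SPACE.items()}


def mutate(pipeline, rng):
    """Re-draw the choice for one randomly selected step."""
    child = dict(pipeline)
    step = rng.choice(list(SEARCH_SPACE))
    child[step] = rng.choice(SEARCH_SPACE[step])
    return child


def crossover(a, b, rng):
    """Build a child by taking each step from one of the two parents."""
    return {step: rng.choice([a[step], b[step]]) for step in SEARCH_SPACE}


def evolve(generations=20, population_size=10, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # keep the fitter half
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(best, round(fitness(best), 2))
```

Because the fitter half of each generation is carried over unchanged, the best score never decreases from one generation to the next. The expensive part in a real system is `fitness`, since every candidate pipeline has to be trained and validated.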
© Microsoft Azure + AI Conference All rights reserved.
AutoMLaaS: Commercial AutoML tools
▪ DataRobot
  ▫ Web-based interface
  ▫ Fixed search over thousands of ML pipelines
▪ H2O.ai Driverless AI
  ▫ Web-based interface
  ▫ H2O.ai AutoML + better feature construction
▪ Google AutoML
  ▫ Integrated into Google Cloud Platform
  ▫ DNN architecture search
▪ SAS Factory Miner
  ▫ Fixed search over a handful of ML methods
▪ IBM SPSS Modeler
  ▫ Basic automated data preparation and ML modeling



AutoML in the near future
▪ AutoML will also handle most of the data cleaning process
  ▫ Unstructured data → tabular data ready for analysis
  ▫ Capture & automate human approaches to data cleaning

▪ AutoML will vastly improve Deep Learning
  ▫ Automated DNN architecture design
  ▫ Automated preprocessing of data prior to modeling

▪ AutoML will scale to large datasets
  ▫ AutoML is very slow right now on “Big Data”
  ▫ Spark, dask, TensorFlow, etc. will help bring AutoML to scale

▪ AutoML will become human-competitive
  ▫ Already human-competitive on several Kaggle challenges
  ▫ Already human-competitive in DNN architecture design (Google AutoML)



AutoML in the future
▪ AutoML will transform the practice of data science as we know it
  ▫ “Data Science Assistant” → Junior Data Scientist level
  ▫ Less focus on choosing the right ML workflow
  ▫ More focus on posing the right questions, collecting & curating the right data, and “thinking like a data scientist”

▪ AutoML will become productized
  ▫ Not AutoMLaaS!
  ▫ “Siri, set an alarm for 6am” → “Siri, set an alarm for the best time for me to wake up”
  ▫ “Siri, [given my personal medical history] should I worry about this rash on my face?”

▪ AutoML is only a small part of a greater meta-learning movement
  ▫ Computer programming is focused on automating rote tasks
  ▫ Machine learning is focused on automating the automation of rote tasks
  ▫ Meta-learning is focused on automating the automation of automation
  ▫ i.e., enabling the machine to learn how to learn in the best way possible



Questions


Thank you!
