Chapter 02 Overview - 4
Shmueli, Bruce, Deokar & Patel
© Galit Shmueli, Peter Bruce and Amit Deokar 2023
Core Ideas in Machine Learning
●Classification
●Prediction
●Association Rules & Recommenders
●Data & Dimension Reduction
●Data Exploration
●Visualization
Paradigms for Machine Learning
(variations)
●SEMMA (from SAS)
• Sample
• Explore
• Modify
• Model
• Assess
●CRISP-DM (SPSS/IBM)
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment
Supervised Learning
●Goal: Predict a single “target” or “outcome”
variable
●Categorical variables
●Naïve Bayes can use them as-is
●Most other algorithms require n or n-1 binary dummy variables, where n is the number of categories
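The dummy-variable step can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the course software: the function name `dummy_code`, the `drop_first` flag, and the sample category values are all hypothetical. With `drop_first=False` it creates n dummies, and with `drop_first=True` it creates n-1, treating the omitted category as the reference level.

```python
def dummy_code(values, drop_first=False):
    """Return (category_names, rows) of 0/1 dummies for one categorical column."""
    categories = sorted(set(values))
    if drop_first:
        # The first category becomes the reference level: a row of all zeros.
        categories = categories[1:]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical categorical column with n = 3 categories.
remodel = ["None", "Old", "Recent", "None"]

full_names, full_rows = dummy_code(remodel)                  # n dummies
ref_names, ref_rows = dummy_code(remodel, drop_first=True)   # n-1 dummies

print(full_names, full_rows)
print(ref_names, ref_rows)
```

The n-1 form is typically preferred for methods such as linear regression, where a full set of n dummies is perfectly collinear with the intercept.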
Data Pre-processing in RM - West Roxbury
data
Detecting Outliers
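One common rule of thumb, assumed here because the slide does not name a specific method, flags a value as a potential outlier when it lies more than three standard deviations from the mean. The sketch below uses only the Python standard library; the function name `flag_outliers`, the threshold default, and the sample values are hypothetical.

```python
import statistics


def flag_outliers(values, threshold=3.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) / sd > threshold]


# Hypothetical measurements with one suspicious entry (95).
measurements = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 95]
outliers = flag_outliers(measurements)
print(outliers)
```

A flagged value is only a candidate for review. Whether it is a data-entry error or a legitimate extreme case is a judgment that usually requires domain knowledge.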
Data store (data warehouse or data lake); also Analytic Base Table
(ABT) with derivatives more suited to analysis
AI Engineering, ML-Ops
Tools to create the ABT derivatives that are in the Data Collection layer.
Delivery is how user views the system (text file, spreadsheet, interface
with Tableau or Power BI, …)
Summary
⚫ Machine Learning consists of supervised methods
(Classification & Prediction) and unsupervised methods
(Association Rules, Data Reduction, Data Exploration &
Visualization)
⚫ Before algorithms can be applied, data must be explored
and pre-processed
⚫ To evaluate performance and to avoid overfitting, data
partitioning is used
⚫ Models are fit to the training partition and assessed on
the validation and holdout partitions
⚫ Machine Learning methods are usually applied to a
sample from a large database, and then the best model
is used to score the entire database
⚫ Once a model is developed, AI Engineering (ML-Ops)
skills and tools are required to deploy it
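The partitioning idea from the summary can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `partition`, the 50/30/20 split, and the fixed seed are hypothetical choices, not values given in the slides. Records are shuffled once and then cut into training, validation, and holdout partitions that do not overlap.

```python
import random


def partition(records, train=0.5, validation=0.3, seed=1):
    """Shuffle records and split them into training, validation, and holdout lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    train_part = shuffled[:n_train]
    valid_part = shuffled[n_train:n_train + n_valid]
    holdout_part = shuffled[n_train + n_valid:]  # whatever remains
    return train_part, valid_part, holdout_part


# Hypothetical sample of 1000 record IDs drawn from a larger database.
rows = list(range(1000))
train_part, valid_part, holdout_part = partition(rows)
print(len(train_part), len(valid_part), len(holdout_part))
```

Models would be fit on `train_part`, compared on `valid_part`, and the chosen model checked one final time on `holdout_part` before it is used to score the full database.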