RapidMiner Education
Introduction
Course Introduction
Introduction to RapidMiner
1. Purpose
2. Platform Overview
3. Basics of using RapidMiner Studio
4. Continued learning
Introduction
Target Audience
At the end of this course, you should be able to understand and use the following machine learning tools in RapidMiner Studio's Process Designer and Auto Model:
• Classification and Regression
• Split Validation
• Scoring
• Correlations
• Feature Importance
• Clustering and Association Analysis
Course Outline
1. Introduction
2. Introduction to Machine Learning
3. Supervised Learning
4. Deployment & Scoring
5. Unsupervised Learning
6. Feature Engineering
7. Auto Model
Getting Started with RapidMiner
RapidMiner Platform
[Platform architecture diagram: RapidMiner Studio and the Server/Engine (Java SE/EE application) with Web Services, user/group access, rights management, and a process scheduler; RapidMiner Marketplace with industry, application & ML extensions; RapidMiner Radoop to compile and execute processes in Hadoop; RapidMiner AI Cloud with managed services on AWS and Azure; integration with databases/DWHs and applications (BI, ERP, CRM, portals) via Web Service and SQL operators; use any data; R/Python/SQL scripting; run on multiple compute engines (in-memory, H2O, Weka, in-Hadoop & Spark)]
Creating Repositories & Folders
• Repositories – collections of projects
  - Local Repository
  - Server Repository
• Best practice: create a folder for each project under the repository and, within this folder, sub-folders for Data, Processes & Results (and more)
RapidMiner Studio
Visual Workflow Designer for Data Scientists
• RapidMiner Academy: https://academy.rapidminer.com
• Online Documentation: https://docs.rapidminer.com/studio
• Online Community: https://community.rapidminer.com/
Introduction to Machine Learning
Introduction to Machine Learning
Course outline:
1. Introduction
2. Introduction to Machine Learning
3. Supervised Learning
4. Deployment & Scoring
5. Unsupervised Learning
6. Feature Engineering
7. Auto Model

This module:
1. Introduction
2. k-NN
3. Model Validation
4. Normalize & Group Models
Introduction to Machine Learning
Unsupervised vs Supervised Learning
[Illustration: an unlabeled scatter plot asking "Can we add structure?" (unsupervised) next to a labeled scatter plot asking whether an unknown point belongs to one class or the other (supervised)]
Machine Learning
• Supervised Learning (aka Predictive Analytics)
  - Classification: Is this A or B? Will this be A or B?
  - Regression: How much or how many? How many will happen?
• Unsupervised Learning
  - Clustering: How is this organized? What belongs together?
  - Outlier Detection: Is this weird?
  - Associations & Correlations: What happens together? What belongs together?
• Feature Engineering
  - Feature Generation: create useful attributes
  - Feature Selection: weight & select attributes
Underfit and Overfit
[Illustration: models of increasing complexity (P1 through P4) fitted to the same data, ranging from too simple (underfit) to too complex (overfit)]
Distance Types & Measures
• Numerical
• Nominal
• Mixed
• Bregman Divergence
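As a minimal sketch of how these distance types can be computed in plain Python (the attribute values below are hypothetical and not from the course data), Euclidean distance handles numerical attributes and a simple mismatch count handles nominal ones; a mixed measure would combine the two:

```python
import math

def euclidean(a, b):
    # Numerical distance: square root of the summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nominal_mismatch(a, b):
    # Nominal distance: count of attributes whose values differ
    return sum(1 for x, y in zip(a, b) if x != y)

# Hypothetical examples with numerical and nominal attributes
print(euclidean([25.0, 12.5], [40.0, 9.0]))                    # numerical part
print(nominal_mismatch(["sunny", "high"], ["rain", "high"]))   # nominal part
```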
Splitting Data for Training & Testing
A common split holds out 70% of the examples for training and the remaining 30% for testing.
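A minimal sketch of the same 70/30 split outside RapidMiner, assuming a scikit-learn environment and using a stand-in data set:

```python
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Stand-in data set for any example set
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for testing; train on the remaining 70%
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

print(len(X_train), len(X_test))  # roughly a 70/30 split
```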
RapidMiner Studio
Visual Workflow Designer for Data Scientists
Conclusion
Training vs Testing Error
[Chart: error rate vs model complexity; training error keeps falling while error on new data rises again past the sweet spot between underfit and overfit]
• Performance can only be measured by testing predictions with new data
• When validating a model, ignore performance on the training data
• Don't over-optimize. Everything you optimize needs to be validated!
Training vs Testing Error
http://gerardnico.com/wiki/data_mining/overfitting
Anatomy of Machine Learning
[Diagram: Data Preparation, Modeling, Evaluation, and Deployment]
Performance Measurement - Accuracy
[Confusion matrix: predicted vs actual class, with true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN)]
Accuracy = (TP+TN)/(TP+FP+TN+FN)
Class Precision = TP/(TP+FP) or TN/(TN+FN)
Class Recall = TP/(TP+FN) or TN/(TN+FP)
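As a minimal check of these formulas with hypothetical confusion-matrix counts:

```python
# Hypothetical confusion-matrix counts for the positive class
TP, FP, FN, TN = 40, 10, 5, 45

accuracy = (TP + TN) / (TP + FP + TN + FN)   # overall hit rate
precision = TP / (TP + FP)                    # class precision for the positive class
recall = TP / (TP + FN)                       # class recall for the positive class

print(accuracy, precision, recall)            # 0.85 0.8 0.888...
```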
Performance Measurement - Costs
Normalization
• Range Transformation
• Proportional (Sum)
• Interquartile Range
• Z-Transformation
http://www.statistics4u.info/fundstat_eng/ee_ztransform.html
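A minimal sketch of the Z-Transformation on a hypothetical attribute (the other methods would rescale by range, sum, or interquartile range instead):

```python
import statistics

values = [12.5, 30.0, 7.0, 22.5, 18.0]   # hypothetical attribute values

mean = statistics.mean(values)
stdev = statistics.stdev(values)          # sample standard deviation

# Z-Transformation: subtract the mean, divide by the standard deviation
z_scores = [(v - mean) / stdev for v in values]
print(z_scores)
```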
Group Models
Linear Regression
• Model: y_i = α + β·x_i + ε_i, where α = intercept and β = slope
• Analytical solution: β̂ = (X′X)⁻¹ X′y

Ridge Regression
• Minimizes RSS + λ·Σ β_j²; as λ grows, the penalty term dominates the RSS
• Analytical solution: β̂ = (X′X + λI)⁻¹ X′y
• Helps with multicollinearity
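A minimal NumPy sketch of both closed-form solutions on hypothetical data; the first column of X is a constant 1 so the intercept is estimated alongside the slope:

```python
import numpy as np

# Hypothetical data: one predictor plus an intercept column of ones
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: beta = (X'X)^-1 X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: beta = (X'X + lambda*I)^-1 X'y
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print(beta_ols, beta_ridge)   # intercept and slope estimates
```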
Converting Nominal to Numerical Data
• Dummy Coding
• Effect Coding
• Integer Coding
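A minimal pandas sketch of dummy coding and integer coding on a hypothetical nominal attribute (effect coding would use -1/0/1 values for a reference category instead of plain 0/1):

```python
import pandas as pd

df = pd.DataFrame({"Outlook": ["sunny", "overcast", "rain", "sunny"]})

# Dummy coding: one 0/1 column per nominal value
dummies = pd.get_dummies(df["Outlook"], prefix="Outlook")
print(dummies)

# Integer coding: map each nominal value to an integer code
df["Outlook_int"] = df["Outlook"].astype("category").cat.codes
print(df)
```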
Logistic Regression
• Logit: ln[p/(1-p)] = a + BX
• Logistic: p = e^(a+BX) / (1 + e^(a+BX))
• where:
  - ln is the natural logarithm, log_e
  - p is the probability that the event Y occurs, p(Y=1)
  - p/(1-p) = "odds ratio"
  - ln[p/(1-p)] = log odds ratio, or "logit"
• Otherwise like a linear model
• Resulting B coefficient is the effect on the "odds ratio"
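As a minimal sketch with hypothetical coefficients a and B, the logistic form maps the linear part back to a probability, and taking the log odds recovers the logit:

```python
import math

a, B = -1.5, 0.8   # hypothetical intercept and coefficient
x = 2.0            # hypothetical predictor value

logit = a + B * x                              # ln[p/(1-p)]
p = math.exp(logit) / (1 + math.exp(logit))    # logistic: back to a probability

print(p, math.log(p / (1 - p)))                # second value recovers the logit
```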
Converting Nominal to Binominal Data
RapidMiner Studio
Visual Workflow Designer for Data Scientists
Golf data set – When to play golf?
[A new example day with an unknown Play label is to be classified]
Likelihood Calculation (counts, Yes / No):
  Outlook:     sunny 2 / 3, overcast 4 / 0, rain 3 / 2
  Temperature: Cool 4 / 1, Mild 3 / 3, Hot 2 / 1
  Humidity:    Normal 4 / 1, High 5 / 4
  Wind:        false 6 / 2, true 3 / 3
  Play:        Yes 9, No 5
Probabilities (Yes / No):
  Outlook:     sunny 0.2 / 0.6, overcast 0.4 / 0.0, rain 0.3 / 0.4
  Temperature: Cool 0.4 / 0.2, Mild 0.3 / 0.6, Hot 0.2 / 0.2
  Humidity:    Normal 0.4 / 0.2, High 0.6 / 0.8
  Wind:        false 0.7 / 0.4, true 0.3 / 0.6
  Play:        Yes 0.6, No 0.4
For the example day, the normalized class probabilities come out to 36.4% and 63.6%.
Bayes' Rule
P(c_i | x) = P(c_i) · P(x | c_i) / P(x)
• P(c_i | x): posterior probability of class c_i given instance x
• P(c_i): prior probability of the class
• P(x | c_i): likelihood of the instance given the class
• P(x): evidence – constant across classes!
• Prediction: compare the posterior probabilities across classes and predict the class with the highest value
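A minimal sketch of this calculation using the probability tables from the golf example, for a hypothetical day (sunny, Cool, Normal humidity, no wind); since the evidence P(x) is constant across classes, it cancels when the scores are normalized:

```python
# Class-conditional probabilities taken from the tables above
p_yes = {"sunny": 0.2, "Cool": 0.4, "Normal": 0.4, "false": 0.7}
p_no  = {"sunny": 0.6, "Cool": 0.2, "Normal": 0.2, "false": 0.4}
prior_yes, prior_no = 0.6, 0.4

day = ["sunny", "Cool", "Normal", "false"]   # hypothetical query instance

score_yes, score_no = prior_yes, prior_no
for value in day:
    score_yes *= p_yes[value]   # prior times likelihoods for class "yes"
    score_no *= p_no[value]     # prior times likelihoods for class "no"

# Normalize so both posteriors sum to 1 (the evidence cancels out)
total = score_yes + score_no
print(score_yes / total, score_no / total)
```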
RapidMiner Studio
Visual Workflow Designer for Data Scientists
Lecture slides for the textbook Machine Learning, © Tom M. Mitchell, McGraw Hill, 1997
Extrapolation
Training data:
  Number of Candy Bars (Num) | Cost
  1                          | $1
  2                          | $2
  3                          | $3
[Decision tree learned from this data: Num >= 2? No → $1; Yes → Num >= 3? No → $2; Yes → $3]
A tree built this way returns $3 for any Num above 3; it cannot extrapolate beyond the values seen in training.
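As a sketch, the same splits written as plain code make the extrapolation problem visible: any number of candy bars above 3 still predicts $3:

```python
def predict_cost(num: int) -> int:
    # Splits mirror the tree above: Num >= 2, then Num >= 3
    if num >= 2:
        if num >= 3:
            return 3   # leaf: $3
        return 2       # leaf: $2
    return 1           # leaf: $1

for n in [1, 2, 3, 10]:
    print(n, predict_cost(n))   # 10 candy bars still predicts $3 – no extrapolation
```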
Decision Tree Operator
RapidMiner Studio
Visual Workflow Designer for Data Scientists
[Neural network diagram: inputs such as Age and Last Transaction feed an input layer, one or more hidden layers, and an output layer; the output node with the highest value is chosen as the predicted class]
How does it work?
• Every connection carries a weight, positive or negative (α1, α2, β1, β3, ...)
• Each hidden node computes a function of the weighted inputs, e.g.
  f1 = f(α1 · Age, β1 · Avg Transaction, ...)
  f2 = f(α2 · Age, β2 · Avg Transaction, ...)
[Diagram build-up: inputs Age, Avg Transaction, Gender, and Last Transaction connect through weighted edges to hidden nodes f1 to f5, which in turn connect to the output nodes Churn and Loyal]
• Worked example: a customer with Age = 25, Avg Transaction = 12.5, Gender = 1, Last Transaction = 10-08-2015 is fed through the network, and the output node with the higher value (Loyal) becomes the predicted class
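A minimal sketch of one forward pass through such a network; the weights, the scaled input values, and the sigmoid activation are all hypothetical choices for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, already-scaled inputs: Age, Avg Transaction, Gender, days since Last Transaction
inputs = [0.25, 0.125, 1.0, 0.3]

# Hypothetical weights: 2 hidden nodes x 4 inputs, then 2 output nodes (Churn, Loyal) x 2 hidden
hidden_weights = [[0.4, -0.2, 0.1, 0.05],
                  [-0.3, 0.6, 0.2, -0.1]]
output_weights = [[0.7, -0.5],    # Churn
                  [-0.2, 0.9]]    # Loyal

hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in hidden_weights]
outputs = [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in output_weights]

labels = ["Churn", "Loyal"]
print(labels[outputs.index(max(outputs))])   # class with the highest output value wins
```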
Where do Weights Come From?
• Performance is measured by some loss or cost function that tells how bad our perceptron is doing – we want to minimize loss
• Loss and activation functions are chosen to make calculating the slope trivial
• The slope indicates the direction of reduced loss
How Does Optimization Work?
• Gradient descent
• We can't analytically solve for a minimum cost, but for a given set of weights we can compute the slope of the loss and take a step in the direction that reduces it
[Plot: loss (cost or error) as a function of a weight value, with gradient-descent steps moving toward the minimum]
Learning Rate
• The learning rate controls the size of the steps
• A high learning rate may skip over the minimum
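A minimal sketch of gradient descent on a one-weight problem; the quadratic loss and the learning rate are hypothetical choices for illustration:

```python
# Hypothetical loss: L(w) = (w - 3)^2, minimized at w = 3
def loss(w):
    return (w - 3.0) ** 2

def slope(w):
    return 2.0 * (w - 3.0)   # derivative of the loss

w = 0.0               # initial weight
learning_rate = 0.1   # step size; too large and we may skip over the minimum

for step in range(50):
    w -= learning_rate * slope(w)   # move in the direction of reduced loss

print(w, loss(w))     # w approaches 3, loss approaches 0
```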
Course outline:
1. Introduction
2. Introduction to Machine Learning
3. Supervised Learning
4. Deployment & Scoring
5. Unsupervised Learning
6. Feature Engineering
7. Auto Model

This module:
1. Deployment
2. Scoring
Deployment
Deployment Overview
Terminology
• The model is trained, validated, and ready for production
• A deployment is a place for models of one purpose
• A deployment location is a place for deployments of similar access methods
• Deploy is the action of putting a model in its place
Deployment in RapidMiner
Transaction Score
Scoring
k-Means Clustering
Initialization
• Randomly pick k new points and assign them to k unique clusters
• Each of these k points becomes the centroid of its own cluster
[Scatter-plot illustration: unlabeled points with three centroids labelled A, B, and C]
Segmentation
• For each observed point, find which centroid it is closest to, and assign it to that cluster
[Series of scatter-plot illustrations: each point is assigned to the nearest of the centroids A, B, and C, and the centroids shift as the procedure is repeated over several iterations]
k-Means Clustering
k sets the number of clusters
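A minimal NumPy sketch of the two steps described above, initialization and repeated segmentation, on hypothetical 2-D data with k = 3; recomputing each centroid as the mean of its assigned points is the standard k-Means update:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D data drawn around three centers
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + rng.normal(scale=0.7, size=(20, 2)) for c in centers])
k = 3

# Initialization: randomly pick k points and make each the centroid of its own cluster
centroids = points[rng.choice(len(points), size=k, replace=False)]

for _ in range(10):
    # Segmentation: assign every point to the cluster of its closest centroid
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update: move each centroid to the mean of its assigned points (keep it if the cluster is empty)
    centroids = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                          else centroids[j] for j in range(k)])

print(centroids)
```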
FP-Growth
[FP-tree diagram: frequent items (such as Item 2) arranged in a tree under a Root node]
Feature Engineering
Weight attributes by:
- Correlation
- Information Gain
- Relief
Feature Generation
Feature Generation or Feature Engineering is the process of transforming raw data in order to make it more useful or more stable for predictive modeling purposes.
• Selected approaches to feature engineering (a few are sketched in code below):
  - Functional transformations
  - Counts, sums, averages, min/max/range, ratios
  - Interaction effect variables
  - Binning continuous variables
  - Combining high-cardinality nominal variables
  - Date/time calculations
Automatic Feature Engineering tools can create many new features using these techniques.
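A minimal pandas sketch of a few of these approaches on a hypothetical customer table: a ratio, a date/time calculation, and binning a continuous variable:

```python
import pandas as pd

df = pd.DataFrame({
    "total_spend": [120.0, 300.0, 45.0],
    "num_orders": [4, 10, 3],
    "last_order": pd.to_datetime(["2015-08-10", "2015-07-01", "2015-08-01"]),
})

# Ratio: average spend per order
df["avg_order_value"] = df["total_spend"] / df["num_orders"]

# Date/time calculation: days since the last order, relative to a fixed reference date
reference = pd.Timestamp("2015-09-01")
df["days_since_last_order"] = (reference - df["last_order"]).dt.days

# Binning a continuous variable into three spend bands
df["spend_band"] = pd.cut(df["total_spend"], bins=[0, 100, 250, float("inf")],
                          labels=["low", "medium", "high"])

print(df)
```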
Feature Selection
• http://www.jmlr.org/papers/volume3/guyon03a/guyon03a.pdf
RapidMiner Studio
Visual Workflow Designer for Data Scientists
Course outline:
1. Introduction
2. Introduction to Machine Learning
3. Supervised Learning
4. Deployment & Scoring
5. Unsupervised Learning
6. Auto Model

This module:
1. Clustering
2. Supervised Learning
3. Deployment
Auto Model
Auto Model for Clustering
• Supervised Learning (aka Predictive Analytics)
  - Classification: Is this A or B? Will this be A or B?
  - Regression: How much or how many? How many will happen?
• Feature Engineering
  - Feature Generation: create useful attributes
You should be able to understand and use the following machine learning tools in RapidMiner Studio's Process Designer and Auto Model:
• Classification and Regression
• Split Validation
• Scoring
• Correlations
• Feature Importance
• Clustering and Association Analysis
Next Steps
Get Certified!
https://academy.rapidminer.com/pages/certification
Area                                        | Next Course
Data Understanding and Data Preparation    | Data Engineering Master
Model Selection, Evaluation, and Validation | Machine Learning Master