M2 - Supervised Machine Learning
Machine Learning
2 of 3 modules
Supervised Learning
Data includes both the input and the desired results.
Training and Test Sets
Resampling
Imbalanced Datasets
Ensemble (combined) Models
Linear Regression
Getting our line straight!
Introduction to Regression Analysis
Regression analysis is used to:
• Predict the value of a dependent variable based on the value of at least one independent variable
• Explain the impact of changes in an independent variable on the dependent variable
Definitions:
• Dependent variable: the variable we wish to predict or explain
• Independent variable: the variable used to explain the dependent variable
Simple linear regression: only one independent variable, X
Types of Relationships
Strong relationships vs. weak relationships
[Figure: scatter plots of Y against X illustrating strong and weak relationships]
Types of Relationships
No relationship
[Figure: scatter plot against X showing no relationship]
Simple Linear Regression Model
Yi = b + MXi + εi
• Yi: dependent variable
• b: population Y intercept
• M: population slope coefficient
• Xi: independent variable
• εi: random error term
Linear component: b + MXi
Random error component: εi
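As a rough illustration of the model above, the sketch below simulates data from Yi = b + MXi + εi and then estimates b and M by ordinary least squares. The population values (`true_b`, `true_M`), the noise level, and the helper name `fit_line` are assumptions chosen for the example; they do not come from the slides.

```python
import random

def fit_line(xs, ys):
    """Return (b, M): least-squares intercept and slope for Y = b + M*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    M = sxy / sxx
    b = mean_y - M * mean_x
    return b, M

# Simulate Yi = b + M*Xi + εi with hypothetical population values.
random.seed(0)
true_b, true_M = 2.0, 0.5
xs = [float(x) for x in range(1, 51)]
ys = [true_b + true_M * x + random.gauss(0, 1) for x in xs]  # εi ~ N(0, 1)

b_hat, M_hat = fit_line(xs, ys)
print(f"estimated intercept b = {b_hat:.2f}, estimated slope M = {M_hat:.2f}")
```

The estimates will not match the population values exactly, because each observation carries its own random error εi.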
Simple Linear Regression Model - Errors
Yi = b + MXi + εi
[Figure: regression line on X-Y axes with Intercept = b and Slope = M. At a given Xi, the observed value of Y and the predicted value of Y on the line differ by εi, the random error for this Xi value.]
Interpretation of the Slope and the Intercept
Sales = 168 + 23 × Advertising
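One way to read the fitted equation: the intercept (168) is the predicted Sales when Advertising is 0, and the slope (23) is the change in predicted Sales for each one-unit increase in Advertising. The short sketch below checks both readings; the function name `predicted_sales` is hypothetical, and the slides do not state the units.

```python
def predicted_sales(advertising):
    """Fitted line from the slide: Sales = 168 + 23 * Advertising."""
    return 168 + 23 * advertising

# Intercept: predicted sales with no advertising.
print(predicted_sales(0))                         # 168

# Slope: change in predicted sales for one extra unit of advertising.
print(predicted_sales(11) - predicted_sales(10))  # 23
```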
Example – Logistic Regression – Scoring Goals!
• Suppose we kick a soccer ball at the goal from a variety of distances.
• Each kick has only two possible results: Goal or No Goal.
• Standard linear regression will not work in this scenario, because the outcome is a category rather than a continuous value.
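To make the goal example concrete, here is a minimal sketch of how logistic regression turns a distance into a probability between 0 and 1 and then into a Goal / No goal label. The coefficients (`intercept=4.0`, `slope=-0.2`), the 0.5 cut-off, and the function names are hypothetical values picked for illustration, not fitted to any data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def goal_probability(distance_m, intercept=4.0, slope=-0.2):
    """Probability of scoring from a given distance (hypothetical coefficients)."""
    return sigmoid(intercept + slope * distance_m)

for distance in (5, 20, 35):
    p = goal_probability(distance)
    label = "Goal" if p >= 0.5 else "No goal"  # assumed 0.5 decision threshold
    print(f"distance {distance:>2} m: P(goal) = {p:.2f} -> {label}")
```

With a negative slope, the predicted probability falls as the distance grows, but it never leaves the 0 to 1 range, which is what a straight line cannot guarantee.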
Good to know!
Nominal
• Nominal scales are used for labeling variables, without any quantitative value. “Nominal” scales could simply be called “labels.”
• E.g. Male/Female, Red/Green/Yellow
Ordinal
• With ordinal scales, the order of the values is what’s important and significant, but the differences between the values are not really known.
• E.g. Good, Very good, Excellent, Fantastic – ranked #1, #2, #3, #4
What is logistic regression?
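Confusion matrices for two models (rows: predicted class, columns: actual class)

              First model           Second model
              Actual +  Actual -    Actual +  Actual -
Predicted +   8         1           10        10
Predicted -   2         89          0         80

Metric      Formula                 First model   Second model
Accuracy    (TP+TN)/(TP+TN+FP+FN)   97%           90%
Precision   TP/(TP+FP)              89%           50%
Recall      TP/(TP+FN)              80%           100%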
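The three formulas can be checked with a few lines of code. This sketch assumes the counts are read from the two confusion matrices as TP=8, FP=1, FN=2, TN=89 and TP=10, FP=10, FN=0, TN=80, a reading that reproduces the reported percentages; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Counts assumed from the two confusion matrices on the slide.
first = classification_metrics(tp=8, fp=1, fn=2, tn=89)
second = classification_metrics(tp=10, fp=10, fn=0, tn=80)

for name, (acc, prec, rec) in (("first", first), ("second", second)):
    print(f"{name} model: accuracy={acc:.0%} precision={prec:.0%} recall={rec:.0%}")
```

The second model finds every positive case (recall 100%) but only half of its positive predictions are right (precision 50%), which is why accuracy alone can be misleading.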
Precision and Recall
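Precision attempts to answer the following question:
What proportion of positive identifications was correct?
Recall attempts to answer the following question:
What proportion of actual positives was identified correctly?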
Introduction to Decision Trees
The classification model is a tree, called a decision tree.
A decision tree can be converted to a set of rules.
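The conversion from a tree to rules can be sketched by walking every path from the root to a leaf and joining the decisions along the way. The nested-dictionary tree format, the helper name `tree_to_rules`, and the toy tree about goals are all assumptions made for this illustration.

```python
def tree_to_rules(node, conditions=()):
    """Turn each root-to-leaf path of a tree into one IF ... THEN ... rule."""
    if "label" in node:  # leaf node: emit the rule collected so far
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN {node['label']}"]
    rules = []
    # Follow the branch where the test holds, then the branch where it fails.
    rules += tree_to_rules(node["yes"], conditions + (node["test"],))
    rules += tree_to_rules(node["no"], conditions + (f"NOT ({node['test']})",))
    return rules

# Hypothetical toy tree: inner nodes hold a test, leaves hold a class label.
toy_tree = {
    "test": "distance < 10",
    "yes": {"label": "Goal"},
    "no": {
        "test": "wind < 5",
        "yes": {"label": "Goal"},
        "no": {"label": "No goal"},
    },
}

rules = tree_to_rules(toy_tree)
for rule in rules:
    print(rule)
```

Each leaf yields exactly one rule, so a tree with three leaves produces three rules.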
How do we split our tree?
• Build the tree split by split.
• Find the best split you can at each step.
• Choosing the best available split at each step is known as greedy search.
• We can put a number on the quality of each split with the Gini index.
Gini Index
Gini = 1 - Σ pi^2 (summed over all classes i)
• Where pi is the probability of an object being classified to a particular class.
• While building the decision tree, we would prefer choosing the attribute/feature with the least Gini index as the root node.
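As a small worked example, the sketch below computes the Gini index of a set of class labels, assuming the common definition Gini = 1 - Σ pi^2, and then scores a candidate split by the size-weighted Gini of its two branches. The function names and the toy labels are hypothetical.

```python
from collections import Counter

def gini_index(labels):
    """Gini index of a list of class labels: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

def split_gini(left, right):
    """Size-weighted Gini index of the two branches produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * gini_index(left) + len(right) / n * gini_index(right)

mixed = ["Died", "Died", "Survived", "Survived"]
print(gini_index(mixed))                 # evenly mixed node
print(gini_index(["Died", "Died"]))      # pure node

# A split that separates the classes perfectly scores lower than one that does not.
print(split_gini(["Died", "Died"], ["Survived", "Survived"]))
print(split_gini(["Died", "Survived"], ["Died", "Survived"]))
```

A greedy tree builder would evaluate `split_gini` for every candidate split at a node and keep the one with the lowest value.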
Each inner node is a decision based on a feature
Each leaf node is a class label
[Figure: a decision tree built split by split, finding the best split you can at each step. Node labels recovered from the figure (class, node value, share of samples): Survived 0.73 36%; Died 0.17 61%; Died 0.05 2%; Survived 0.89 2%.]
Strengths of decision tree methods
• Generates understandable rules.
• Performs classification without requiring much computation.
• Able to handle both continuous and categorical variables.
• Provides a clear indication of which fields are most important for prediction or classification.
• Natural multiclass classifier.
Weaknesses of decision tree methods
• Less appropriate for estimation tasks where the goal is to predict the value of a continuous attribute.
• Prone to errors in classification problems with many classes and a relatively small number of training examples.
Support Vector Machine (SVM)
[Figure: series of panels plotting patient status after 5 yr. (0 to 1) against Age. The first panel reports Accuracy: 78%; the following panels report Accuracy: 100%.]
What is a hyperplane?
How do we choose the best hyperplane?
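A hedged sketch of one answer to both questions: a hyperplane is the set of points where w·x + b = 0 (a straight line when there are two features), and an SVM prefers the separating hyperplane with the largest margin, meaning the largest distance to the closest training point. The toy points, the two candidate hyperplanes, and the helper names below are assumptions made for illustration; a real SVM finds the best hyperplane by optimization rather than by comparing two hand-picked candidates.

```python
import math

def signed_score(w, b, x):
    """Value of w·x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def separates(w, b, points, labels):
    """True if every point lies on the side matching its label (+1 or -1)."""
    return all(label * signed_score(w, b, x) > 0 for x, label in zip(points, labels))

def margin(w, b, points):
    """Distance from the hyperplane w·x + b = 0 to the closest point."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(signed_score(w, b, x)) / norm for x in points)

# Hypothetical toy data: two features, labels +1 and -1.
points = [(3.0, 3.0), (4.0, 4.0), (0.0, 0.0), (1.0, 1.0)]
labels = [1, 1, -1, -1]

# Two candidate hyperplanes that both classify the toy data perfectly.
candidate_a = ((1.0, 1.0), -4.0)   # x1 + x2 - 4 = 0
candidate_b = ((1.0, 0.0), -2.5)   # x1 - 2.5 = 0

margin_a = margin(*candidate_a, points)
margin_b = margin(*candidate_b, points)
best = "A" if margin_a > margin_b else "B"
print(f"margin A = {margin_a:.3f}, margin B = {margin_b:.3f}, preferred: {best}")
```

Both candidates reach 100% accuracy on the toy data, so accuracy alone cannot choose between them; the margin can.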
4/5/2023 78
Thank you!
I welcome your questions.