M2 - Supervised Machine Learning

The document provides an overview of supervised machine learning, focusing on regression analysis, feature engineering, and various classification techniques such as logistic regression, decision trees, and support vector machines (SVM). It explains the concepts of dependent and independent variables, performance metrics for regression and classification, and the importance of model evaluation. Additionally, it discusses the strengths and weaknesses of different machine learning methods, highlighting their applications and effectiveness in various scenarios.


Supervised Machine Learning
Module 2 of 3
Supervised Learning
Data includes both the input and the desired results.
Training and Test Sets
Resampling
Imbalanced Datasets
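A minimal sketch of a train/test split in plain Python (the function name and toy dataset are illustrative; in practice a library routine such as scikit-learn's `train_test_split` is typically used):

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    rows = list(data)
    rng.shuffle(rows)
    n_test = int(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]  # (train, test)

# Toy dataset: (feature, label) pairs
dataset = [(i, i % 2) for i in range(10)]
train, test = train_test_split(dataset, test_ratio=0.2)
print(len(train), len(test))  # 8 2
```

Fixing the random seed makes the split reproducible, which matters when comparing models.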

• Resampling + synthesis of artificial data
• SMOTE – Synthetic Minority Oversampling Technique
Ensemble (Combined) Models
Linear Regression
Getting our line straight!
Introduction to Regression Analysis
Regression analysis is used to:
• Predict the value of a dependent variable based on the value of at least one independent variable
• Explain the impact of changes in an independent variable on the dependent variable
• Dependent variable: the variable we wish to predict or explain
• Independent variable: the variable used to explain the dependent variable
Simple Linear Regression Model
• Only one independent variable, X
• The relationship between X and Y is described by a linear function
• Changes in Y are assumed to be caused by changes in X

Multiple Linear Regression Model
• More than one independent variable, X1, X2, …
• The relationship between the Xs and Y is described by a linear function
• Changes in Y are assumed to be caused by changes in the Xs
Types of Relationships
• Linear relationships vs. curvilinear relationships
• Strong relationships vs. weak relationships
• No relationship
(Scatter plots of Y against X illustrating each type omitted.)
Simple Linear Regression Model

Yi = b + M·Xi + εi

where Yi is the dependent variable, b is the population Y intercept, M is the population slope coefficient, Xi is the independent variable, and εi is the random error term. b + M·Xi is the linear component; εi is the random error component.
Simple Linear Regression Model – Errors

For each observed value of Y at Xi, the model Yi = b + M·Xi + εi splits the observation into a predicted value b + M·Xi (the point on the line with intercept b and slope M) and a random error εi: the vertical distance between the observed value of Y for Xi and the predicted value of Y for Xi. (Scatter plot omitted.)
Interpretation of the Slope and the Intercept

• b is the estimated average value of Y when the value of X is zero
• M is the estimated change in the average value of Y as a result of a one-unit change in X
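As a sketch, the intercept b and slope M can be estimated with ordinary least squares in a few lines of plain Python (toy data; the slide's b/M notation is kept, though textbooks usually write b0/b1):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b + M*x (slide notation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope: covariance of x and y divided by variance of x
    M = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - M * mean_x  # line passes through (mean_x, mean_y)
    return b, M

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # exactly y = 1 + 2x
b, M = fit_line(xs, ys)
print(b, M)  # 1.0 2.0
```

Here b = 1 is the average Y when X is zero, and M = 2 is the change in Y per one-unit change in X.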
How do we determine if our
Regression model is doing well or not?
Performance Metrics (Regression)

• Mean Absolute Error – the average of the absolute differences between predictions and actual values.
• Mean Squared Error – the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values.
Let’s dive straight to the Hands-on
using Jupyter notebooks
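Both regression metrics can be computed directly (toy numbers, for illustration):

```python
def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean Squared Error: average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual    = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(mae(actual, predicted))  # (1 + 0 + 2) / 3 = 1.0
print(mse(actual, predicted))  # (1 + 0 + 4) / 3 ≈ 1.667
```

Note how MSE punishes the error of 2 more heavily than MAE does, because it squares the errors.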
Feature Engineering
Improving the model!
Feature engineering

• The first thing we need to do when creating a machine learning model is to decide what to use as features.
• Features are key to a model: pieces of information (like a person's name or favorite color) that we take from the data and give to the algorithm so it can work its magic.
• E.g., if we do classification on health, some features could be a person's height, weight, gender, and so on.
• We would exclude things that may be known but aren't useful.
Benefits of Feature Engineering

• Reduces overfitting: less redundant data means less opportunity to make decisions based on noise.
• Improves accuracy: less misleading data means modeling accuracy improves.
• Reduces training time: fewer data points reduce algorithm complexity and algorithms train faster.
Techniques of Feature Engineering

• Introducing interaction terms
Let’s dive straight to the Hands-on
using Jupyter notebooks
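An interaction term is simply a new feature formed as the product of two existing features, letting a linear model capture their combined effect. A minimal sketch (feature rows are hypothetical):

```python
def add_interaction(rows, i, j):
    """Append the product of features i and j as a new column."""
    return [row + (row[i] * row[j],) for row in rows]

X = [(2.0, 3.0), (4.0, 0.5)]
X_int = add_interaction(X, 0, 1)
print(X_int)  # [(2.0, 3.0, 6.0), (4.0, 0.5, 2.0)]
```

Libraries such as scikit-learn automate this (e.g. `PolynomialFeatures`), but the idea is just the product column shown here.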
Logistic Regression
What is it and what is the algorithm?
What is the difference
between Linear Regression
& Logistic Regression?
Recap: What is linear regression?

• Linear regression quantifies the relationship between one or more predictor variables and one outcome variable.
• For example, linear regression can be used to quantify the relative impacts of age, gender, and diet (the predictor variables) on height (the outcome variable).
Recap: Example

Sales = 168 + 23 × Advertising
Example – Log Reg – Scoring Goals!
• Suppose we are kicking our soccer ball from a variety of distances.
• The results are going to be only Goal or No Goal.
• Our standard linear regression will not work in this scenario!
Good to know!

Nominal
• Nominal scales are used for labeling variables, without any quantitative value. "Nominal" scales could simply be called "labels."
• E.g. Male/Female, Red/Green/Yellow

Ordinal
• With ordinal scales, the order of the values is what's important and significant, but the differences between each one are not really known.
• E.g. Good, Very good, Excellent, Fantastic – #1, #2, #3, #4
What is logistic regression?

• Logistic regression is the appropriate regression analysis to conduct when the dependent variable is binary (dichotomous).
• Like all regression analyses, logistic regression is a predictive analysis.
• Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
The Sigmoid function
• We apply the sigmoid function to the linear regression equation.
• By doing so, we push our straight line into an S shape, the sigmoid curve.
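A minimal sketch of that idea: the sigmoid squashes the linear output b + M·x into a value between 0 and 1 that can be read as a probability (the b, M and x values below are illustrative, not from the slides):

```python
import math

def sigmoid(z):
    """Map any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

b, M = -4.0, 0.5  # hypothetical fitted coefficients
for x in (0, 8, 16):
    p = sigmoid(b + M * x)       # probability of the positive class
    print(round(p, 3))           # 0.018, 0.5, 0.982
```

Far below the midpoint the probability is near 0, far above it is near 1, and at b + M·x = 0 it is exactly 0.5, producing the S-shaped curve.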
Model Evaluation

• Model evaluation is an integral part of the model development process.
• It helps to find the best model that represents our data, and how well the chosen model will work in the future.
Performance Metrics (Classification)

• Confusion Matrix
• Accuracy
• Precision and Recall
How do you evaluate classifiers?
Accuracy!
Confusion Matrix

• It is a performance measurement for machine learning classification problems where the output can be two or more classes.
• For binary classification, it is a table with 4 different combinations of predicted and actual values.
So how can we use the metrics?
Say we have confusion matrices from 2 models:

Logistic Regression                SVM
              Actual                            Actual
               +    -                            +    -
Predicted +    8    1          Predicted +      10   10
Predicted -    2   89          Predicted -       0   80

We can compare them!

                                   Logistic Regression    SVM
Accuracy:  (TP+TN)/(TP+TN+FP+FN)          97%             90%
Precision: TP/(TP+FP)                     89%             50%
Recall:    TP/(TP+FN)                     80%            100%
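The comparison above can be reproduced directly from the cell counts of the two matrices on the slide:

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    return accuracy, precision, recall

# Logistic regression matrix: TP=8, FP=1, FN=2, TN=89
acc_lr, prec_lr, rec_lr = metrics(8, 1, 2, 89)     # 0.97, 0.889, 0.80
# SVM matrix: TP=10, FP=10, FN=0, TN=80
acc_svm, prec_svm, rec_svm = metrics(10, 10, 0, 80)  # 0.90, 0.50, 1.00
```

The SVM here catches every positive (recall 100%) but at the cost of many false alarms (precision 50%); which trade-off is better depends on the application.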
Precision and Recall

• Precision attempts to answer the following question: What proportion of positive identifications was correct?
• Recall attempts to answer the following question: What proportion of actual positives was identified correctly?
Decision Trees

Introduction
• Decision tree learning is one of the most widely used techniques for classification.
• The classification model is a tree, called a decision tree.
• A decision tree can be converted to a set of rules.
How do we do our tree split?
• Build the tree split by split.
• Find the best split you can at each step.
• This step-by-step search for the locally best split is known as greedy search.
• We can put a number to our splitting step with the Gini index.
Gini Index

Gini = 1 − Σi pi²

• where pi is the probability of an object being classified to a particular class.
• While building the decision tree, we would prefer choosing the attribute/feature with the least Gini index as the root node.
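The Gini index of a node can be computed from the class labels it contains; a pure node scores 0 and a maximally mixed two-class node scores 0.5:

```python
def gini(labels):
    """Gini index 1 - sum(p_i^2) over the classes in a node."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini(["a", "a", "b", "b"]))  # 0.5  (maximally mixed, 2 classes)
print(gini(["a", "a", "a", "a"]))  # 0.0  (pure node)
```

Splits are then scored by the weighted average Gini of the child nodes, and the split with the lowest score is preferred.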
Each inner node is a decision based on a feature.
Each leaf node is a class label.

Predicting Titanic Survivors

Is sex male?
├─ No  → Survived (0.73, 36%)
└─ Yes → Is age > 9.5?
         ├─ No  → Is sibsp > 2.5?
         │        ├─ No  → Survived (0.89, 2%)
         │        └─ Yes → Died (0.05, 2%)
         └─ Yes → Died (0.17, 61%)

(Each leaf shows the probability of survival and the percentage of passengers reaching it.)
Build the tree split by split, finding the best split you can at each step:
1. First split: "Is sex male?" – No → Survived (0.73, 36%).
2. Second split, for males: "Is age > 9.5?" – Yes → Died (0.17, 61%).
3. Third split, for males aged 9.5 or under: "Is sibsp > 2.5?" – Yes → Died (0.05, 2%); No → Survived (0.89, 2%).
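The greedy search described above can be sketched for a single numeric feature: try every candidate threshold and keep the one with the lowest weighted Gini index of the two children (ages and labels below are toy values, not the real Titanic data):

```python
def gini(labels):
    """Gini index 1 - sum(p_i^2) over the classes in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Greedy search: try every threshold, keep the one with the
    lowest weighted Gini index of the two child nodes."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left  = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [5, 8, 30, 40, 50]  # ages
ys = ["survived", "survived", "died", "died", "died"]
t, score = best_split(xs, ys)
print(t, score)  # 8 0.0 (age <= 8 separates the toy data perfectly)
```

A real tree builder repeats this over every feature at every node, which is why each candidate splitting field must be examined at each step.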
Strengths of decision tree methods
• Generate understandable rules.
• Perform classification without requiring much computation.
• Able to handle both continuous and categorical variables.
• Provide a clear indication of which fields are most important for prediction or classification.
• Natural multiclass classifiers.

Weaknesses of decision tree methods
• Less appropriate for estimation tasks where the goal is to predict the value of a continuous attribute.
• Prone to errors in classification problems with many classes and a relatively small number of training examples.
• Computationally expensive to train: growing a decision tree is costly because, at each node, each candidate splitting field must be sorted before its best split can be found.
• Small changes in input data can result in totally different trees.
• Can make mistakes with unbalanced classes.
Support Vector Machines

What are SVMs?
• SVMs are linear or non-linear classifiers that find a hyperplane to separate two classes of data, positive and negative.
• SVM not only has a rigorous theoretical foundation, but also performs classification more accurately than most other methods in many applications, especially for high-dimensional data.
Support Vector Machine (SVM)

(Figures omitted: plots of patient status after 5 yr. (0/1) against number of positive nodes, with candidate decision boundaries.)

• Find the best boundary that separates the two classes.
• A bad boundary: 3 misclassifications, accuracy 67%.
• A better boundary: one misclassification, accuracy 89%.
• Another boundary: accuracy 78%.
• Several different boundaries all reach accuracy 100% on the training data; which one should we pick?
• The margin: the "no man's land" between the two classes.


2 features (number of positive nodes, age), 2 labels (Survived / Lost): find the line that separates the classes best. (Scatter plots of age against number of positive nodes omitted.)

3 features: find the best boundary plane. (More features: hyperplane.)
What is a hyperplane?
• The hyperplane that separates positive and negative training data is
⟨w ⋅ x⟩ + b = 0
• It is also called the decision boundary (surface).
How to choose the best hyperplane?
• SVM looks for the separating hyperplane with the largest margin.
• Machine learning theory says this hyperplane minimizes the error bound.
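A minimal sketch of why margin matters: for a hyperplane w·x + b = 0, the geometric margin on a dataset is the smallest signed distance of any point from the boundary, and the largest-margin hyperplane wins. The two candidate hyperplanes and the four points below are hypothetical:

```python
import math

def margin(w, b, points):
    """Smallest signed distance y_i * (w.x_i + b) / ||w|| over the data.
    Positive means every point sits on the correct side."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in points)

# Two classes in 2-D (e.g. number of positive nodes, age), labels +1/-1
points = [((1.0, 1.0), -1), ((2.0, 1.0), -1),
          ((4.0, 3.0), +1), ((5.0, 4.0), +1)]

# Two candidate separating hyperplanes w.x + b = 0
m1 = margin((1.0, 1.0), -5.0, points)   # boundary x1 + x2 = 5
m2 = margin((1.0, 0.0), -3.0, points)   # boundary x1 = 3
print(m1, m2)  # both separate the data; m1 has the larger margin
```

An actual SVM solves a quadratic program to find the w and b maximizing this margin, rather than comparing hand-picked candidates.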
Pros
• Accuracy.
• Works well on smaller, cleaner datasets.
• Can be more efficient because it uses a subset of training points.

Cons
• Isn't suited to larger datasets, as the training time with SVMs can be high.
• Less effective on noisier datasets with overlapping classes.
What have you learned?

4/5/2023
Thank you !!
I welcome your questions.
