M2 - Supervised Machine Learning

The document provides an overview of supervised machine learning, focusing on regression analysis, feature engineering, and various classification techniques such as logistic regression, decision trees, and support vector machines (SVM). It explains the concepts of dependent and independent variables, performance metrics for regression and classification, and the importance of model evaluation. Additionally, it discusses the strengths and weaknesses of different machine learning methods, highlighting their applications and effectiveness in various scenarios.


Supervised Machine Learning
Module 2 of 3
Supervised Learning
Data includes both the input and the desired results.
Training and Test Sets
Resampling
Imbalanced Datasets
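A minimal sketch of a train/test split in plain Python (the function name and toy dataset are illustrative; in practice a library routine such as scikit-learn's `train_test_split` is typically used):

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    rows = list(data)
    rng.shuffle(rows)
    n_test = int(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]  # (train, test)

# Toy dataset: (feature, label) pairs
dataset = [(i, i % 2) for i in range(10)]
train, test = train_test_split(dataset, test_ratio=0.2)
print(len(train), len(test))  # 8 2
```

Fixing the random seed makes the split reproducible, which matters when comparing models.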

• Resampling + synthesis of artificial data
• SMOTE – Synthetic Minority Oversampling Technique
Ensemble (Combined) Models
Linear Regression
Getting our line straight!
Introduction to Regression Analysis
Regression analysis is used to:
• Predict the value of a dependent variable based on the value of at least one independent variable
• Explain the impact of changes in an independent variable on the dependent variable
• Dependent variable: the variable we wish to predict or explain
• Independent variable: the variable used to explain the dependent variable
Simple Linear Regression Model
• Only one independent variable, X
• The relationship between X and Y is described by a linear function
• Changes in Y are assumed to be caused by changes in X

Multiple Linear Regression Model
• More than one independent variable, X1, X2, …
• The relationship between the Xs and Y is described by a linear function
• Changes in Y are assumed to be caused by changes in the Xs
Types of Relationships
• Linear relationships vs. curvilinear relationships
• Strong relationships vs. weak relationships
• No relationship
(Scatter plots of Y against X illustrating each type omitted.)
Simple Linear Regression Model

Yi = b + M·Xi + εi

where Yi is the dependent variable, b is the population Y intercept, M is the population slope coefficient, Xi is the independent variable, and εi is the random error term. b + M·Xi is the linear component; εi is the random error component.
Simple Linear Regression Model – Errors

For each observed value of Y at Xi, the model Yi = b + M·Xi + εi splits the observation into a predicted value b + M·Xi (the point on the line with intercept b and slope M) and a random error εi: the vertical distance between the observed value of Y for Xi and the predicted value of Y for Xi. (Scatter plot omitted.)
Interpretation of the Slope and the Intercept

• b is the estimated average value of Y when the value of X is zero
• M is the estimated change in the average value of Y as a result of a one-unit change in X
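As a sketch, the intercept b and slope M can be estimated with ordinary least squares in a few lines of plain Python (toy data; the slide's b/M notation is kept, though textbooks usually write b0/b1):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b + M*x (slide notation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope: covariance of x and y divided by variance of x
    M = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - M * mean_x  # line passes through (mean_x, mean_y)
    return b, M

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # exactly y = 1 + 2x
b, M = fit_line(xs, ys)
print(b, M)  # 1.0 2.0
```

Here b = 1 is the average Y when X is zero, and M = 2 is the change in Y per one-unit change in X.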
How do we determine if our
Regression model is doing well or not?
Performance Metrics (Regression)

• Mean Absolute Error – the average of the absolute differences between predictions and actual values.
• Mean Squared Error – the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values.
Let’s dive straight to the Hands-on
using Jupyter notebooks
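Both regression metrics can be computed directly (toy numbers, for illustration):

```python
def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean Squared Error: average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual    = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(mae(actual, predicted))  # (1 + 0 + 2) / 3 = 1.0
print(mse(actual, predicted))  # (1 + 0 + 4) / 3 ≈ 1.667
```

Note how MSE punishes the error of 2 more heavily than MAE does, because it squares the errors.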
Feature Engineering
Improving the model!
Feature engineering

• The first thing we need to do when creating a machine learning model is to decide what to use as features.
• Features are key to a model: pieces of information (like a person's name or favorite color) that we take from the data and give to the algorithm so it can work its magic.
• E.g., if we do classification on health, some features could be a person's height, weight, gender, and so on.
• We would exclude things that may be known but aren't useful.
Benefits of Feature Engineering

• Reduces overfitting: less redundant data means less opportunity to make decisions based on noise.
• Improves accuracy: less misleading data means modeling accuracy improves.
• Reduces training time: fewer data points reduce algorithm complexity and algorithms train faster.
Techniques of Feature Engineering

• Introducing interaction terms
Let’s dive straight to the Hands-on
using Jupyter notebooks
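An interaction term is simply a new feature formed as the product of two existing features, letting a linear model capture their combined effect. A minimal sketch (feature rows are hypothetical):

```python
def add_interaction(rows, i, j):
    """Append the product of features i and j as a new column."""
    return [row + (row[i] * row[j],) for row in rows]

X = [(2.0, 3.0), (4.0, 0.5)]
X_int = add_interaction(X, 0, 1)
print(X_int)  # [(2.0, 3.0, 6.0), (4.0, 0.5, 2.0)]
```

Libraries such as scikit-learn automate this (e.g. `PolynomialFeatures`), but the idea is just the product column shown here.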
Logistic Regression
What is it and what is the algorithm?
What is the difference
between Linear Regression
& Logistic Regression?
Recap: What is linear regression?

• Linear regression quantifies the relationship between one or more predictor variables and one outcome variable.
• For example, linear regression can be used to quantify the relative impacts of age, gender, and diet (the predictor variables) on height (the outcome variable).
Recap: Example

Sales = 168 + 23 × Advertising
Example – Log Reg – Scoring Goals!
• Suppose we are kicking our soccer ball from a variety of distances.
• The results are going to be only Goal or No Goal.
• Our standard linear regression will not work in this scenario!
Good to know!

Nominal
• Nominal scales are used for labeling variables, without any quantitative value. "Nominal" scales could simply be called "labels."
• E.g. Male/Female, Red/Green/Yellow

Ordinal
• With ordinal scales, the order of the values is what's important and significant, but the differences between each one are not really known.
• E.g. Good, Very good, Excellent, Fantastic – #1, #2, #3, #4
What is logistic regression?

• Logistic regression is the appropriate regression analysis to conduct when the dependent variable is binary (dichotomous).
• Like all regression analyses, logistic regression is a predictive analysis.
• Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
The Sigmoid function
• We apply the sigmoid function to the linear regression equation.
• By doing so, we push our straight line into an S shape, the sigmoid curve.
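A minimal sketch of that idea: the sigmoid squashes the linear output b + M·x into a value between 0 and 1 that can be read as a probability (the b, M and x values below are illustrative, not from the slides):

```python
import math

def sigmoid(z):
    """Map any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

b, M = -4.0, 0.5  # hypothetical fitted coefficients
for x in (0, 8, 16):
    p = sigmoid(b + M * x)       # probability of the positive class
    print(round(p, 3))           # 0.018, 0.5, 0.982
```

Far below the midpoint the probability is near 0, far above it is near 1, and at b + M·x = 0 it is exactly 0.5, producing the S-shaped curve.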
Model Evaluation

• Model evaluation is an integral part of the model development process.
• It helps to find the best model that represents our data, and how well the chosen model will work in the future.
Performance Metrics (Classification)

• Confusion Matrix
• Accuracy
• Precision and Recall
How do you evaluate classifiers?
Accuracy!
Confusion Matrix

• It is a performance measurement for machine learning classification problems where the output can be two or more classes.
• For binary classification, it is a table with 4 different combinations of predicted and actual values.
So how can we use the metrics?
Say we have confusion matrices from 2 models:

Logistic Regression                SVM
              Actual                            Actual
               +    -                            +    -
Predicted +    8    1          Predicted +      10   10
Predicted -    2   89          Predicted -       0   80

We can compare them!

                                   Logistic Regression    SVM
Accuracy:  (TP+TN)/(TP+TN+FP+FN)          97%             90%
Precision: TP/(TP+FP)                     89%             50%
Recall:    TP/(TP+FN)                     80%            100%
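The comparison above can be reproduced directly from the cell counts of the two matrices on the slide:

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    return accuracy, precision, recall

# Logistic regression matrix: TP=8, FP=1, FN=2, TN=89
acc_lr, prec_lr, rec_lr = metrics(8, 1, 2, 89)     # 0.97, 0.889, 0.80
# SVM matrix: TP=10, FP=10, FN=0, TN=80
acc_svm, prec_svm, rec_svm = metrics(10, 10, 0, 80)  # 0.90, 0.50, 1.00
```

The SVM here catches every positive (recall 100%) but at the cost of many false alarms (precision 50%); which trade-off is better depends on the application.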
Precision and Recall

• Precision attempts to answer the following question: What proportion of positive identifications was correct?
• Recall attempts to answer the following question: What proportion of actual positives was identified correctly?
Decision Trees

Introduction
• Decision tree learning is one of the most widely used techniques for classification.
• The classification model is a tree, called a decision tree.
• A decision tree can be converted to a set of rules.
How do we do our tree split?
• Build the tree split by split.
• Find the best split you can at each step.
• This step-by-step search for the locally best split is known as greedy search.
• We can put a number to our splitting step with the Gini index.
Gini Index

Gini = 1 − Σi pi²

• where pi is the probability of an object being classified to a particular class.
• While building the decision tree, we would prefer choosing the attribute/feature with the least Gini index as the root node.
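The Gini index of a node can be computed from the class labels it contains; a pure node scores 0 and a maximally mixed two-class node scores 0.5:

```python
def gini(labels):
    """Gini index 1 - sum(p_i^2) over the classes in a node."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini(["a", "a", "b", "b"]))  # 0.5  (maximally mixed, 2 classes)
print(gini(["a", "a", "a", "a"]))  # 0.0  (pure node)
```

Splits are then scored by the weighted average Gini of the child nodes, and the split with the lowest score is preferred.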
Each inner node is a decision based on a feature.
Each leaf node is a class label.

Predicting Titanic Survivors

Is sex male?
├─ No  → Survived (0.73, 36%)
└─ Yes → Is age > 9.5?
         ├─ No  → Is sibsp > 2.5?
         │        ├─ No  → Survived (0.89, 2%)
         │        └─ Yes → Died (0.05, 2%)
         └─ Yes → Died (0.17, 61%)

(Each leaf shows the probability of survival and the percentage of passengers reaching it.)
Build the tree split by split, finding the best split you can at each step:
1. First split: "Is sex male?" – No → Survived (0.73, 36%).
2. Second split, for males: "Is age > 9.5?" – Yes → Died (0.17, 61%).
3. Third split, for males aged 9.5 or under: "Is sibsp > 2.5?" – Yes → Died (0.05, 2%); No → Survived (0.89, 2%).
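The greedy search described above can be sketched for a single numeric feature: try every candidate threshold and keep the one with the lowest weighted Gini index of the two children (ages and labels below are toy values, not the real Titanic data):

```python
def gini(labels):
    """Gini index 1 - sum(p_i^2) over the classes in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Greedy search: try every threshold, keep the one with the
    lowest weighted Gini index of the two child nodes."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left  = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [5, 8, 30, 40, 50]  # ages
ys = ["survived", "survived", "died", "died", "died"]
t, score = best_split(xs, ys)
print(t, score)  # 8 0.0 (age <= 8 separates the toy data perfectly)
```

A real tree builder repeats this over every feature at every node, which is why each candidate splitting field must be examined at each step.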
Strengths of decision tree methods
• Generate understandable rules.
• Perform classification without requiring much computation.
• Able to handle both continuous and categorical variables.
• Provide a clear indication of which fields are most important for prediction or classification.
• Natural multiclass classifiers.

Weaknesses of decision tree methods
• Less appropriate for estimation tasks where the goal is to predict the value of a continuous attribute.
• Prone to errors in classification problems with many classes and a relatively small number of training examples.
• Computationally expensive to train: growing a decision tree is costly because, at each node, each candidate splitting field must be sorted before its best split can be found.
• Small changes in input data can result in totally different trees.
• Can make mistakes with unbalanced classes.
Support Vector Machines

What are SVMs?
• SVMs are linear or non-linear classifiers that find a hyperplane to separate two classes of data, positive and negative.
• SVM not only has a rigorous theoretical foundation, but also performs classification more accurately than most other methods in many applications, especially for high-dimensional data.
Support Vector Machine (SVM)

(Figures omitted: plots of patient status after 5 yr. (0/1) against number of positive nodes, with candidate decision boundaries.)

• Find the best boundary that separates the two classes.
• A bad boundary: 3 misclassifications, accuracy 67%.
• A better boundary: one misclassification, accuracy 89%.
• Another boundary: accuracy 78%.
• Several different boundaries all reach accuracy 100% on the training data; which one should we pick?
• The margin: the "no man's land" between the two classes.


2 features (number of positive nodes, age), 2 labels (Survived / Lost): find the line that separates the classes best. (Scatter plots of age against number of positive nodes omitted.)

3 features: find the best boundary plane. (More features: hyperplane.)
What is a hyperplane?
• The hyperplane that separates positive and negative training data is
⟨w ⋅ x⟩ + b = 0
• It is also called the decision boundary (surface).
How to choose the best hyperplane?
• SVM looks for the separating hyperplane with the largest margin.
• Machine learning theory says this hyperplane minimizes the error bound.
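A minimal sketch of why margin matters: for a hyperplane w·x + b = 0, the geometric margin on a dataset is the smallest signed distance of any point from the boundary, and the largest-margin hyperplane wins. The two candidate hyperplanes and the four points below are hypothetical:

```python
import math

def margin(w, b, points):
    """Smallest signed distance y_i * (w.x_i + b) / ||w|| over the data.
    Positive means every point sits on the correct side."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in points)

# Two classes in 2-D (e.g. number of positive nodes, age), labels +1/-1
points = [((1.0, 1.0), -1), ((2.0, 1.0), -1),
          ((4.0, 3.0), +1), ((5.0, 4.0), +1)]

# Two candidate separating hyperplanes w.x + b = 0
m1 = margin((1.0, 1.0), -5.0, points)   # boundary x1 + x2 = 5
m2 = margin((1.0, 0.0), -3.0, points)   # boundary x1 = 3
print(m1, m2)  # both separate the data; m1 has the larger margin
```

An actual SVM solves a quadratic program to find the w and b maximizing this margin, rather than comparing hand-picked candidates.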
Pros
• Accuracy.
• Works well on smaller, cleaner datasets.
• Can be more efficient because it uses a subset of training points.

Cons
• Isn't suited to larger datasets, as the training time with SVMs can be high.
• Less effective on noisier datasets with overlapping classes.
What have you learned?

4/5/2023
Thank you !!
I welcome your questions.
