Model Evaluation-I

The document discusses evaluating machine learning algorithms and model selection. It covers topics like evaluating performance using error, accuracy, and other metrics. Cross-validation techniques like k-fold, leave-one-out, and bootstrap are presented. Metrics for evaluating predictions in regression and classification problems are defined. Key concepts related to evaluating algorithms like bias, variance, underfitting, and overfitting are explained.


MATRUSRI ENGINEERING COLLEGE


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SUBJECT NAME: Machine Learning

FACULTY NAME: Mrs. J. Samatha



MACHINE LEARNING
COURSE OBJECTIVES:
•To explore the supervised learning paradigms of machine learning.
•To explore the unsupervised learning paradigms of machine learning.
•To evaluate various machine learning algorithms and techniques.
•To explore deep learning techniques and various feature extraction strategies.
•To explore recent trends in machine learning methods for IoT applications.
COURSE OUTCOMES:
•Extract features and apply supervised learning paradigms.
•Illustrate several clustering algorithms on the given data set.
•Compare and contrast various machine learning algorithms and gain insight into when to apply a particular machine learning approach.
•Apply basic deep learning algorithms and feature extraction strategies.
•Become familiar with advanced topics of machine learning.

MODULE-I

Evaluating machine learning algorithms and model selection

OUTCOMES:
• Able to evaluate different machine learning algorithms and select a
model

Evaluation of Learning Algorithms


Evaluating the performance of learning systems is important because:
-Learning systems are usually designed to predict the class of future unlabeled data points.

Typical choices for performance evaluation:
-Error
-Accuracy
-Precision/recall

Typical choices for sampling methods:
-Hold-out method (training/test set)
-k-fold cross-validation
-Leave-one-out cross-validation
-Bootstrap method

Hold-out Method (Training/Validation/Test set)

The available data is split into disjoint subsets: the model is fit on the training set, hyperparameters are tuned on the validation set, and the final performance is reported on the held-out test set, which is never used during training.
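A minimal sketch of a hold-out split, assuming scikit-learn and NumPy are available; the dataset and the 60/20/20 proportions are illustrative, not taken from the slides:

```python
# Hold-out split sketch (assumes scikit-learn and NumPy are installed).
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration: 1000 examples, 5 features.
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, size=1000)

# First carve out the test set (20%), then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 600 / 200 / 200
```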

k-fold Cross-Validation

1. Split the data into k equal subsets.
2. Perform k rounds of learning; on each round:
   -1/k of the data is held out as a test set, and
   -the remaining examples are used as training data.
3. Compute the average test-set score over the k rounds.
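A minimal k-fold cross-validation sketch, assuming scikit-learn is available; the model and the synthetic data are illustrative choices, not prescribed by the slides:

```python
# k-fold cross-validation sketch (assumes scikit-learn; model and data are illustrative).
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LogisticRegression

X = np.random.rand(200, 4)              # synthetic features
y = np.random.randint(0, 2, size=200)   # synthetic binary labels

model = LogisticRegression()
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Each round trains on 4/5 of the data and scores on the held-out 1/5.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(scores, scores.mean())
```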

Bootstrap Method

•In statistics, the term "bootstrap sampling", or "bootstrapping" for short, refers to the process of random sampling with replacement.
•It involves repeated sampling from the data with replacement and repeated estimation.
•Each bootstrap sample has the same number of observations as the original data set.
•The same observation can be selected many times.
•Every observation has the same probability of being selected.
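A minimal bootstrap-sampling sketch using only NumPy; the data and the choice of the mean as the estimated statistic are illustrative assumptions:

```python
# Bootstrap sampling sketch (NumPy only; data and statistic are illustrative).
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)   # synthetic sample

n_boot = 1000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    # Sample with replacement, same size as the original data.
    sample = rng.choice(data, size=len(data), replace=True)
    boot_means[b] = sample.mean()

# The spread of the bootstrap estimates approximates the estimator's variability.
print(boot_means.mean(), boot_means.std())
```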



Leave-One-Out Cross-Validation

•A special case of k-fold cross-validation with k = N (the number of examples).

•Advantage:
-Uses almost all of the data for training in each round, so it is a thorough way to validate.
•Disadvantage:
-High computation time (the model is trained N times).
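A minimal leave-one-out sketch, assuming scikit-learn; the k-nearest-neighbours classifier and the synthetic data are illustrative assumptions:

```python
# Leave-one-out cross-validation sketch (assumes scikit-learn; data is synthetic).
import numpy as np
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(50, 3)
y = np.random.randint(0, 2, size=50)

# LeaveOneOut is equivalent to k-fold with k = N: one test example per round.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=LeaveOneOut())
print(scores.mean())   # average accuracy over the N rounds
```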

Evaluating Predictions
Suppose we want to make a prediction of the value of a target feature on example x:

-Y is the observed value of the target feature on example x.

-Y' is the predicted value of the target feature on example x.

-How is the error measured?



Terminology related to Evaluation of Algorithms


Supervised learning: Supervised learning teaches a model from labeled training data and helps you make predictions about unseen or future data. During training, you give the algorithm a dataset that contains correct answers (label y). Then, you validate the model's accuracy with a test data set that also has correct answers. A data set must therefore be split into training and test sets.

Classification: With classification, you're trying to predict one of a small number of discrete-valued outputs. The label may be binary (binary classification) or categorical (multiclass classification).

Regression: In regression, the goal of the learning problem is to predict a continuous-valued output.

Ranking: Order items according to some criterion. Example: web search, returning web pages relevant to a search query. Many similar ranking problems arise in the design of information extraction and natural language processing systems.

Unsupervised learning: Given a data set, try to find tendencies in the data by using techniques like clustering.

Feature: A feature is an attribute that is used as input for the model to train. Other names include dimension or column.

Terminology related to Evaluation of Algorithms


Bias: Bias is the expected difference between the parameters of a model that perfectly fits your data and the parameters that your algorithm learned. The sample error is a poor estimator of the true error.

Variance: Variance is how much the algorithm is impacted by the training data and how much the parameters change with new training data. The smaller the test set, the greater the expected variance.

Underfitting: The model is too simple to capture the patterns within the data. The model performs poorly both on the data it was trained on and on unseen data. High bias, low variance. High training error and high test error.

Overfitting: The model is too complicated or too specific, capturing trends that don't generalize. The model accurately predicts the data it was trained on but doesn't accurately predict unseen data. Low bias, high variance. Low training error and high test error.

Bias-Variance Trade-off: The Bias-Variance Trade-off refers to finding a model with the right level of complexity, balancing bias against variance so that it generalizes well to unseen data.

Bias
Bias is the difference between our actual and predicted values. It arises from the simplifying assumptions that our model makes about the data in order to predict new data.

Bias
When the bias is high, the assumptions made by our model are too basic, and the model can't capture the important features of our data. This means that our model hasn't captured the patterns in the training data and hence cannot perform well on the testing data either. If this is the case, our model cannot perform on new data and cannot be sent into production.
This situation, where the model cannot find patterns in our training set and hence fails on both seen and unseen data, is called Underfitting.

Variance
We can define variance as the model's sensitivity to fluctuations in the data. Our model may learn from noise, causing it to treat trivial features as important.

For example, suppose a model has learned extremely well on its training pictures, which has taught it to identify cats. When given new data, such as the picture of a fox, the model still predicts a cat, because that is what it has learned. This happens when the variance is high: the model captures all the features of the data given to it, including the noise, tunes itself to that data, and predicts it very well, but it cannot predict on new data because it is too specific to the training data.

Variance
Hence, our model will perform really well on the training data and get high accuracy, but will fail to perform on new, unseen data. New data may not have exactly the same features, and the model won't be able to predict it very well. This is called Overfitting.

(Figure: an over-fitted model's performance on (a) training data and (b) new data.)

Bias & Variance



Evaluation Metrics
REGRESSION:
•Mean Absolute Error (MAE)
•Mean Squared Error (MSE)
•Root Mean Squared Error (RMSE)
•Mean Absolute Percentage Error (MAPE)

CLASSIFICATION:
•Confusion matrix

Mean Absolute Error


We will calculate the residual for every data point, taking only the
absolute value of each so that negative and positive residuals do not
cancel out. We then take the average of all these residuals.

Mean Absolute Error


•Take the absolute difference between Y and Ŷ for each of the available observations: |Yᵢ - Ŷᵢ|, where i ∈ [1, n] and n is the total number of points in the dataset.
•Sum the absolute differences to get a total error: Σ|Yᵢ - Ŷᵢ|
•Divide the sum by the total number of observations to get the mean error value: Σ|Yᵢ - Ŷᵢ| / n

MAE = Σ|Yᵢ - Ŷᵢ| / n
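A minimal MAE sketch using only NumPy; the Y and Ŷ values are made up for illustration:

```python
# MAE sketch (NumPy only; the y values are illustrative).
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # observed values Y
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # predicted values Y'

mae = np.mean(np.abs(y_true - y_pred))    # MAE = sum(|Yi - Y'i|) / n
print(mae)                                # 0.75 for these values
```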

Mean Squared Error

•Take the difference between Y and Ŷ for each of the available observations: Yᵢ - Ŷᵢ
•Square each difference: (Yᵢ - Ŷᵢ)²
•Sum the squared values: Σ(Yᵢ - Ŷᵢ)², where i ∈ [1, n] and n is the total number of points in the dataset.
•Divide by the total number of observations: MSE = Σ(Yᵢ - Ŷᵢ)² / n

Root Mean Squared Error

As the name suggests, RMSE is simply the square root of the mean squared error:
RMSE = √( Σ(Yᵢ - Ŷᵢ)² / n )

Mean Absolute Percentage Error

MAPE expresses the absolute error as a percentage of the actual value:
MAPE = (100% / n) Σ |Yᵢ - Ŷᵢ| / |Yᵢ|
Because the error is divided by the actual value, MAPE is asymmetric: it tends to be lower when the prediction is below the actual value than when the prediction is above it by a comparable amount.
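A minimal sketch computing MSE, RMSE, and MAPE with NumPy; the Y and Ŷ values are the same illustrative numbers used in the MAE sketch above:

```python
# MSE / RMSE / MAPE sketch (NumPy only; the y values are illustrative).
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # observed values Y
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # predicted values Y'

mse = np.mean((y_true - y_pred) ** 2)                            # sum of squared errors / n
rmse = np.sqrt(mse)                                              # square root of MSE
mape = np.mean(np.abs(y_true - y_pred) / np.abs(y_true)) * 100   # percentage error

print(mse, rmse, mape)
```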

Confusion matrix for 2-class problems

                      Predicted: Positive    Predicted: Negative
Actual: Positive      True Positive (TP)     False Negative (FN)
Actual: Negative      False Positive (FP)    True Negative (TN)

Confusion Matrix
Basic terminology (using a diabetes-prediction example):
•True Positives (TP): we correctly predicted that they do have diabetes.
•True Negatives (TN): we correctly predicted that they don't have diabetes.
•False Positives (FP): we incorrectly predicted that they do have diabetes.
•False Negatives (FN): we incorrectly predicted that they don't have diabetes.
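A minimal sketch of computing a confusion matrix, assuming scikit-learn; the labels are a made-up example, not data from the slides:

```python
# Confusion-matrix sketch (assumes scikit-learn; the labels are illustrative).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual classes (1 = has diabetes)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # predicted classes

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
```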


Other accuracy metrics

•Precision tells us how many of the cases predicted as positive actually turned out to be positive.
•Recall tells us how many of the actual positive cases we were able to predict correctly with our model.

Other accuracy metrics

Recall and sensitivity are the same measure.

Other measures of performance

Using the data in the confusion matrix of a classifier on a two-class dataset, several measures of performance have been defined:
Accuracy = (TP + TN)/(TP + TN + FP + FN)
Error rate = 1 − Accuracy
Sensitivity = TP/(TP + FN)
Precision = TP/(TP + FP)
Specificity = TN/(TN + FP)
F-measure = (2 × TP)/(2 × TP + FP + FN)

Suppose we had a classification dataset with 1000 data points. We fit a classifier on it and obtain the following confusion matrix counts:

True Positive (TP) = 560; meaning 560 positive class data points were
correctly classified by the model
True Negative (TN) = 330; meaning 330 negative class data points were
correctly classified by the model
False Positive (FP) = 60; meaning 60 negative class data points were
incorrectly classified as belonging to the positive class by the model
False Negative (FN) = 50; meaning 50 positive class data points were
incorrectly classified as belonging to the negative class by the model
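As a check, here is a short sketch applying the formulas from the previous slide to these counts; the numeric results in the comments are my own working, not values given on the slides:

```python
# Applying the confusion-matrix formulas to the counts above (plain Python).
TP, TN, FP, FN = 560, 330, 60, 50

accuracy = (TP + TN) / (TP + TN + FP + FN)    # 890 / 1000 = 0.89
precision = TP / (TP + FP)                    # 560 / 620 ≈ 0.903
recall = TP / (TP + FN)                       # 560 / 610 ≈ 0.918  (sensitivity)
specificity = TN / (TN + FP)                  # 330 / 390 ≈ 0.846
f_measure = 2 * TP / (2 * TP + FP + FN)       # 1120 / 1230 ≈ 0.911

print(accuracy, precision, recall, specificity, f_measure)
```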

Example

The total outcome values are:
TP = 30, TN = 930, FP = 30, FN = 10
So, the accuracy for our model turns out to be:
Accuracy = (TP + TN)/(TP + TN + FP + FN) = (30 + 930)/1000 = 0.96, i.e., 96%.

Precision vs. Recall


Precision tells us how many of the cases predicted as positive actually turned out to be positive.

Recall tells us how many of the actual positive cases we were able to predict correctly with our model.

Practice problem-1
Suppose a computer program for recognizing dogs in photographs
identifies eight dogs in a picture containing 12 dogs and some cats. Of the
eight dogs identified, five actually are dogs while the rest are cats.
Compute the precision and recall of the computer program.
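A short sketch of one way to map this problem onto confusion-matrix counts; the mapping and the computed values are my own working, not a solution given on the slides:

```python
# Dogs-in-photograph example: of the 8 identified, 5 are dogs (TP) and 3 are cats (FP);
# the 12 - 5 = 7 dogs that were not identified are false negatives (FN).
TP = 5
FP = 8 - 5
FN = 12 - 5

precision = TP / (TP + FP)   # 5/8  = 0.625
recall = TP / (TP + FN)      # 5/12 ≈ 0.417
print(precision, recall)
```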

Practice problem-2
Let there be 10 balls (6 white and 4 red) in a box, and suppose we are required to pick out the red balls. We pick up 7 balls as red balls, of which only 2 are actually red. What are the values of precision and recall in picking red balls?

Practice Problem-3
A database contains 80 records on a particular topic of which 55 are
relevant to a certain investigation. A search was conducted on that topic
and 50 records were retrieved. Of the 50 records retrieved, 40 were
relevant. Construct the confusion matrix for the search and calculate
the precision and recall scores for the search.
SOLUTION:
Each record may be assigned a class label "relevant" or "not relevant". All 80 records were tested for relevance. The test classified 50 records as "relevant", but only 40 of them were actually relevant. Hence we have the following confusion matrix for the search:

                                 Actually relevant    Actually not relevant
Retrieved (predicted relevant)   TP = 40              FP = 10
Not retrieved                    FN = 15              TN = 15

Precision = TP/(TP + FP) = 40/50 = 0.80
Recall = TP/(TP + FN) = 40/55 ≈ 0.73

Suppose we have a test dataset of 10 records with expected outcomes and a set of predictions from our classification algorithm. Compute the accuracy, precision, sensitivity and specificity of the data.

Sample problem
Suppose 10000 patients get tested for flu; out
of them, 9000 are actually healthy and 1000
are actually sick. For the sick people, a test
was positive for 620 and negative for 380. For
the healthy people, the same test was
positive for 180 and negative for 8820.
Construct a confusion matrix for the data and
compute the accuracy, precision and recall for
the data.
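A short sketch of one way to lay out this problem, treating "sick" as the positive class; the mapping and the computed values are my own working, not a solution given on the slides:

```python
# Flu-test example, treating "sick" as the positive class.
TP = 620     # sick and tested positive
FN = 380     # sick but tested negative
FP = 180     # healthy but tested positive
TN = 8820    # healthy and tested negative

accuracy = (TP + TN) / (TP + TN + FP + FN)   # 9440 / 10000 = 0.944
precision = TP / (TP + FP)                   # 620 / 800    = 0.775
recall = TP / (TP + FN)                      # 620 / 1000   = 0.620
print(accuracy, precision, recall)
```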

Receiver Operating Characteristic (ROC)


•The acronym ROC stands for Receiver Operating Characteristic, a terminology coming from signal detection theory.
•The ROC curve was first developed by electrical engineers and radar engineers during World War II for detecting enemy objects in battlefields.
•ROC curves are now increasingly used in machine learning and data mining research.

TPR and FPR


Let a binary classifier classify a collection of test data.

TP = Number of true positives
TN = Number of true negatives
FP = Number of false positives
FN = Number of false negatives

TPR = True Positive Rate = TP/(TP + FN) = Fraction of positive examples correctly classified = Sensitivity

FPR = False Positive Rate = FP/(FP + TN) = Fraction of negative examples incorrectly classified = 1 − Specificity

ROC space
•We plot the values of FPR along the horizontal axis (the x-axis) and the values of TPR along the vertical axis (the y-axis) in a plane.
•For each classifier, there is a unique point in this plane with coordinates (FPR, TPR).
•The ROC space is the part of the plane whose points correspond to (FPR, TPR).
•Each prediction result, or each instance of a confusion matrix, represents one point in the ROC space.

ROC space
The position of the point (FPR, TPR) in the ROC space gives an indication of the performance of the classifier.

For example, let us consider some special points in the space.

(When the curve is traced over ranked examples, the plot moves one step up for each positive example and one step to the right for each negative example.)

Special points in ROC space


•The left bottom corner point (0, 0):
•Always negative prediction
•A classifier which produces this point in the ROC
space never classifies an example as positive,
neither rightly nor wrongly, because for this point
TP = 0 and FP = 0.
•It always makes negative predictions.
•All positive instances are wrongly predicted and
all negative instances are correctly predicted.
•It commits no false positive errors.

Special points in ROC space


•The right top corner point (1, 1):
•Always positive prediction
•A classifier which produces this point in the ROC
space always classifies an example as positive
because for this point FN = 0 and TN = 0.
•All positive instances are correctly predicted and
all negative instances are wrongly predicted.
• It commits no false negative errors.

Special points in ROC space


•The left top corner point (0, 1):
•Perfect prediction
•A classifier which produces this point in the ROC
space may be thought as a perfect classifier.
•It produces no false positives and no false
negatives

Special points in ROC space


•Points along the diagonal:
•Random performance
•Consider a classifier where the class labels are randomly guessed, say by flipping a coin.
•Then, the corresponding points in the ROC space will lie very near the diagonal line joining the points (0, 0) and (1, 1).

ROC Space & some special points in the space

ROC curve
•In the case of certain classification algorithms, the classifier may depend on a parameter.
•Different values of the parameter will give different classifiers, and these in turn give different values of TPR and FPR.
•The ROC curve is the curve obtained by plotting, in the ROC space, the points (FPR, TPR) obtained by assigning all possible values to the parameter in the classifier.
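A minimal sketch of tracing an ROC curve and computing its AUC from predicted scores, assuming scikit-learn and matplotlib are available; the labels and scores are made up for illustration:

```python
# ROC / AUC sketch (assumes scikit-learn and matplotlib; data is illustrative).
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]                          # actual labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7, 0.6, 0.3]    # classifier scores

# The decision threshold on the score acts as the "parameter": sweeping it traces the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

plt.plot(fpr, tpr, marker="o", label=f"AUC = {auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guessing")
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend()
plt.show()
```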

ROC curve
•The closer the ROC curve is to the top left corner (0, 1) of the ROC space, the better the accuracy of the classifier.
•Among three classifiers A, B, C with the ROC curves shown, classifier C is closest to the top left corner of the ROC space.
•Hence, among the three, it gives the best accuracy in predictions.

Area under the ROC curve (AUC)


•The measure of the area under the ROC curve is denoted by the acronym AUC.
•The value of AUC is a measure of the performance of a classifier.
•For the perfect classifier, AUC = 1.0.

Sample problem on ROC & AUC


•The body mass index (BMI) of a person is defined as weight (kg) / height (m)².
•Researchers have established a link between BMI and the risk of breast cancer among women.
•The higher the BMI, the higher the risk of developing breast cancer.
•The critical threshold value of BMI may depend on several parameters like food habits, socio-cultural-economic background, life-style, etc.
•The table (next slide) gives real data from a breast cancer study with a sample of 100 patients and 200 normal persons.
•The table also shows the values of TPR and FPR for various cut-off values of BMI.

Data for various values of BMI



ROC curve of the data



Given the following data, construct the ROC curve of the data and compute the AUC.

Statistical Learning Theory


•Statistical learning theory is a framework for machine
learning, drawing from the fields of statistics and
functional analysis.
•Statistical learning theory deals with the problem of
finding a predictive function based on data.
•The goal of learning is prediction. Learning falls into
many categories:
-Supervised learning
-Unsupervised learning
-Semi-supervised learning
-Transfer Learning
-Online learning
-Reinforcement learning

Statistical Learning Theory


•Statistical learning theory was introduced in the late 1960s, but until the 1990s it was treated simply as a problem of function estimation from a given collection of data.
•In the middle of the 1990s, new types of learning algorithms (e.g., support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions.
•Statistical learning plays a key role in many areas of science, finance and industry.

Statistical modeling from the perspective of supervised learning

•In supervised learning, an algorithm is given samples that are labeled in some useful way. For example, the samples might be descriptions of apples, and the labels could be whether or not the apples are edible.
•Supervised learning involves learning from a training set of data. Every point in the training set is an input-output pair, where the input maps to an output. The learning problem consists of inferring the function that maps between the input and the output in a predictive fashion, such that the learned function can be used to predict the output for future input.

Machine learning Vs Statistics



Machine Learning Vs Statistical Modelling


•Machine Learning is … an algorithm that can learn from data without relying on rules-based programming.
•Statistical Modelling is … a formalization of relationships between variables in the form of mathematical equations.

Machine Learning Vs Statistical Modelling

•Machine Learning is … a subfield of computer science and artificial intelligence which deals with building systems that can learn from data, instead of following explicitly programmed instructions.
•Statistical Modelling is … a subfield of mathematics which deals with finding relationships between variables in order to predict an outcome.

Examples of the learning problems


•Predict whether a patient, hospitalized due to a heart attack, will have a
second heart attack. The prediction is to be based on demographic, diet
and clinical measurements for that patient.
•Predict the price of a stock in 6 months from now, on the basis of
company performance measures and economic data.
•Estimate the amount of glucose in the blood of a diabetic person, from
the infrared absorption spectrum of that person’s blood.
•Identify the risk factors for prostate cancer, based on clinical and
demographic variables.

Sample problem
•Consider a set of patients coming for treatment in a certain clinic.
•Let A denote the event that a “Patient has liver disease” and B the
event that a “Patient is an alcoholic.”
•It is known from experience that 10% of the patients entering the
clinic have liver disease and 5% of the patients are alcoholics.
•Also, among those patients diagnosed with liver disease, 7% are
alcoholics.
•Given that a patient is alcoholic, what is the probability that he will
have liver disease?

Using the notation of probability:

P(A) = 10% = 0.10
P(B) = 5% = 0.05
P(B|A) = 7% = 0.07

By Bayes' theorem:
P(A|B) = P(B|A) P(A) / P(B)
       = (0.07 × 0.10) / 0.05
       = 0.14

A good learner is one that accurately predicts such an outcome.

•In essence, a statistical learning problem is learning from the data.
•In a typical scenario, we have an outcome measurement, usually quantitative (such as a stock price) or categorical (such as heart attack/no heart attack), that we wish to predict based on a set of features (such as diet and clinical measurements).
•We have a Training Set which is used to observe the outcome and feature measurements for a set of objects.
•Using this data we build a Prediction Model, or a Statistical Learner, which enables us to predict the outcome for a set of new, unseen objects.

Statistics + Machine Learning = Statistical Learning

Questions & Answers


1) What are the different choices for performance evaluation?
2) What is k-fold cross-validation?
3) Differentiate between bias and variance.
4) Explain the terms underfitting and overfitting.
5) What are the different evaluation metrics for classification and regression?
6) Explain the terminology in a confusion matrix.
7) Explain ROC and AUC.
