Model Evaluation-I
MATRUSRI ENGINEERING COLLEGE
MACHINE LEARNING
COURSE OBJECTIVES:
•To explore the supervised learning paradigms of machine learning
•To explore the unsupervised learning paradigms of machine learning
•To evaluate various machine learning algorithms and techniques
•To explore deep learning techniques and various feature extraction strategies.
•To explore recent trends in machine learning methods for IoT applications.
COURSE OUTCOMES:
•Extract features and apply supervised learning paradigms.
•Illustrate several clustering algorithms on the given data set.
•Compare and contrast various machine learning algorithms and gain insight into when to apply a particular machine learning approach.
•Apply basic deep learning algorithms and feature extraction
strategies.
•Get familiarized with advanced topics of machine learning.
MODULE-I
OUTCOMES:
• Able to evaluate different machine learning algorithms and select a
model
•Advantage: a good way to validate
•Disadvantage: high computation time
Evaluating Predictions
Suppose we want to make a prediction of a value for a target feature
on example x:
Classification: with classification, you're trying to predict one of a small number of discrete-valued outputs. For example, the label may be binary (binary classification) or categorical (multiclass classification).
Regression: in regression, the goal of the learning problem is to predict a continuous-valued output.
Ranking: order items according to some criterion. Example: web search, returning web pages relevant to a search query. Many other similar ranking problems arise in the design of information extraction and natural language processing systems.
Unsupervised learning: given a data set, try to find tendencies in the data by using techniques like clustering.
Feature: a feature is an attribute that is used as input for the model to train. Other names include dimension or column.
Bias
Bias is the difference between our model's predictions and the actual values. It reflects the simplifying assumptions that our model makes about the data in order to be able to predict new data.
When bias is high, the assumptions made by our model are too simplistic and the model cannot capture the important features of our data. This means that our model has not captured the patterns in the training data and hence cannot perform well on the testing data either. In that case, the model cannot perform on new data and cannot be sent into production.
This situation, where the model cannot find patterns in the training set and hence fails on both seen and unseen data, is called underfitting.
Variance
We can define variance as the model's sensitivity to fluctuations in the data. A high-variance model may learn from noise, which causes it to treat trivial features as important.
For example, suppose our model has learned the training data extremely well and can identify cats. When given new data, such as a picture of a fox, the model still predicts a cat, because that is all it has learned. When variance is high, the model captures all the features of the data given to it, including the noise, tunes itself to that data and predicts it very well; but when given new data it cannot predict well, because it is too specific to the training data.
Hence, our model will perform really well on the training data and get high accuracy there, but will fail to perform on new, unseen data. New data may not have exactly the same features, and the model won't be able to predict it very well. This is called overfitting.
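To make the bias/variance trade-off concrete, here is a minimal scikit-learn sketch; the noisy cubic data set and the polynomial degrees are invented purely for illustration. A degree-1 model underfits (high bias: poor on both train and test), while a degree-15 model overfits (high variance: strong on train, weaker on test).

# Underfitting vs. overfitting on a noisy cubic curve -- toy data,
# chosen only to illustrate the bias/variance trade-off.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 3 - 2 * X[:, 0] + rng.normal(0, 2, size=200)  # noisy cubic

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 3, 15):  # high bias, about right, high variance
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(f"degree {degree:2d}: train R^2 = {model.score(X_tr, y_tr):.2f}, "
          f"test R^2 = {model.score(X_te, y_te):.2f}")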
Evaluation Metrics
REGRESSION:
•Mean Absolute Error (MAE)
•Mean Squared Error (MSE)
•Root Mean Squared Error (RMSE)
•Mean Absolute Percentage Error (MAPE)
CLASSIFICATION:
•Confusion matrix
MAE = (1/n) Σ |Yᵢ − Ŷᵢ|
MAPE divides each error by the actual value, so it is asymmetric: for non-negative predictions, the percentage error of a prediction that is lower than the actual can never exceed 100%, while the percentage error of a prediction that is higher than the actual is unbounded. For example, actual = 100 with prediction = 50 gives a percentage error of 50%, but actual = 50 with prediction = 100 gives 100%.
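For quick reference, here is a minimal NumPy sketch of the four regression metrics listed above; the toy arrays are invented purely for illustration.

# The four regression metrics from the list above, written out in NumPy.
import numpy as np

y_true = np.array([100.0, 50.0, 200.0, 150.0])  # toy actual values
y_pred = np.array([90.0, 60.0, 210.0, 120.0])   # toy predictions

err = y_true - y_pred
mae = np.mean(np.abs(err))                    # Mean Absolute Error
mse = np.mean(err ** 2)                       # Mean Squared Error
rmse = np.sqrt(mse)                           # Root Mean Squared Error
mape = np.mean(np.abs(err) / np.abs(y_true))  # MAPE, as a fraction

print(f"MAE={mae}, MSE={mse}, RMSE={rmse:.2f}, MAPE={mape:.3f}")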
Confusion Matrix
Basic terminology (using a diabetes test as the running example):
•True Positives (TP): we correctly predicted that they do have diabetes
•True Negatives (TN): we correctly predicted that they don't have
diabetes
•False Positives (FP): we incorrectly predicted that they do have diabetes
•False Negatives (FN): we incorrectly predicted that they don't have
diabetes
•Precision tells us how many of the cases predicted as positive actually turned out to be positive: Precision = TP / (TP + FP).
•Recall tells us how many of the actual positive cases we were able to predict correctly with our model: Recall = TP / (TP + FN).
Example
True Positive (TP) = 560; meaning 560 positive class data points were
correctly classified by the model
True Negative (TN) = 330; meaning 330 negative class data points were
correctly classified by the model
False Positive (FP) = 60; meaning 60 negative class data points were
incorrectly classified as belonging to the positive class by the model
False Negative (FN) = 50; meaning 50 positive class data points were
incorrectly classified as belonging to the negative class by the model
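Plugging these counts into the formulas above (560 + 330 + 60 + 50 = 1000 data points in total):
Accuracy = (TP + TN) / total = (560 + 330) / 1000 = 0.89
Precision = TP / (TP + FP) = 560 / 620 ≈ 0.90
Recall = TP / (TP + FN) = 560 / 610 ≈ 0.92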
Practice problem-1
Suppose a computer program for recognizing dogs in photographs
identifies eight dogs in a picture containing 12 dogs and some cats. Of the
eight dogs identified, five actually are dogs while the rest are cats.
Compute the precision and recall of the computer program.
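Solution: of the 8 animals identified as dogs, TP = 5 and FP = 3; the 12 − 5 = 7 dogs that were missed are FN. Hence Precision = TP / (TP + FP) = 5/8 = 0.625 and Recall = TP / (TP + FN) = 5/12 ≈ 0.417.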
Practice problem-2
Let there be 10 balls (6 white and 4 red balls) in a box and let it be
required to pick up the red balls from them. Suppose we pick up 7 balls as
the red balls of which only 2 are actually red balls. What are the values of
precision and recall in picking red ball?
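Solution: of the 7 balls picked, TP = 2 and FP = 5; the 4 − 2 = 2 red balls left in the box are FN. Hence Precision = 2/7 ≈ 0.286 and Recall = 2/4 = 0.5.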
Practice problem-3
A database contains 80 records on a particular topic of which 55 are
relevant to a certain investigation. A search was conducted on that topic
and 50 records were retrieved. Of the 50 records retrieved, 40 were
relevant. Construct the confusion matrix for the search and calculate
the precision and recall scores for the search.
SOLUTION:
Each record may be assigned a class label "relevant" or "not relevant". All the 80 records were tested for relevance. The test classified 50 records as "relevant", but only 40 of them were actually relevant. Hence we have the following confusion matrix for the search:
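                Retrieved    Not retrieved
Relevant        TP = 40      FN = 15
Not relevant    FP = 10      TN = 15

Precision = TP / (TP + FP) = 40/50 = 0.8
Recall = TP / (TP + FN) = 40/55 ≈ 0.727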
Sample problem
Suppose 10000 patients get tested for flu; out
of them, 9000 are actually healthy and 1000
are actually sick. For the sick people, a test
was positive for 620 and negative for 380. For
the healthy people, the same test was
positive for 180 and negative for 8820.
Construct a confusion matrix for the data and
compute the accuracy, precision and recall for
the data.
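A minimal Python sketch (plain arithmetic, no libraries) that organizes the counts and checks the answers:

# Confusion-matrix counts taken from the problem statement.
# Sick (1000): test positive for 620 (TP), negative for 380 (FN).
# Healthy (9000): test positive for 180 (FP), negative for 8820 (TN).
TP, FN = 620, 380
FP, TN = 180, 8820

total = TP + FN + FP + TN      # 10000 patients in all
accuracy = (TP + TN) / total   # fraction of all predictions that are correct
precision = TP / (TP + FP)     # of those flagged sick, how many are sick
recall = TP / (TP + FN)        # of the truly sick, how many were flagged

print(f"accuracy  = {accuracy:.3f}")   # 0.944
print(f"precision = {precision:.3f}")  # 0.775
print(f"recall    = {recall:.3f}")     # 0.620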
ROC space
•We plot the values of FPR (false positive rate = FP / (FP + TN)) along the horizontal axis (that is, the x-axis) and the values of TPR (true positive rate = TP / (TP + FN)) along the vertical axis (that is, the y-axis) of a plane.
The position of the point (FPR,TPR) in the ROC
space gives an indication of the performance
of the classifier.
ROC curve
•In the case of certain classification algorithms, the classifier depends on a parameter (such as a decision threshold); each value of the parameter yields a different (FPR, TPR) point, and plotting these points as the parameter varies traces out the ROC curve.
•The closer the ROC curve is to the top left corner (0,
1) of the ROC space, the better the accuracy of the
classifier.
•The higher the BMI, the higher the risk of developing breast cancer.
•Consider real data from a breast cancer study with a sample of 100 patients and 200 normal persons.
•For various cut-off values of BMI, the corresponding values of TPR and FPR can be computed; each cut-off gives one point of the ROC curve.
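To make the construction concrete, here is a minimal sketch using scikit-learn's roc_curve; the BMI values and labels below are invented for illustration, not the study's data.

import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy data: 1 = patient, 0 = normal person; BMI plays the role of the score.
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1])
bmi = np.array([19, 21, 22, 23, 24, 26, 27, 28, 30, 33])

# roc_curve sweeps every cut-off and returns the resulting (FPR, TPR) pairs.
fpr, tpr, thresholds = roc_curve(y_true, bmi)
print("cut-offs:", thresholds)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", auc(fpr, tpr))  # area under the curve; 1.0 is a perfect classifier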
Sample problem
•Consider a set of patients coming for treatment in a certain clinic.
•Let A denote the event that a “Patient has liver disease” and B the
event that a “Patient is an alcoholic.”
•It is known from experience that 10% of the patients entering the
clinic have liver disease and 5% of the patients are alcoholics.
•Also, among those patients diagnosed with liver disease, 7% are
alcoholics.
•Given that a patient is alcoholic, what is the probability that he will
have liver disease?
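Solution: this is an application of Bayes' theorem. With P(A) = 0.10, P(B) = 0.05 and P(B|A) = 0.07:
P(A|B) = P(B|A) P(A) / P(B) = (0.07 × 0.10) / 0.05 = 0.14.
So, given that a patient is an alcoholic, the probability that he has liver disease is 0.14 (14%).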