Model Evaluation
• Assessing the performance and effectiveness of models.
• It involves measuring the accuracy and reliability of predictions made
by the models.
Importance
• It helps determine the quality and reliability of a predictive model.
• By evaluating a model, data scientists can assess how well it
generalizes to unseen data and whether it meets the desired
performance standards.
• Model evaluation aids in the comparison of different models or
variations of the same model, allowing data scientists to select the
most suitable one for a given problem.
• It enables the identification of potential issues such as overfitting or
underfitting, which can be addressed to improve model performance.
Overfitting and Underfitting

• Overfitting and underfitting - common issues in machine learning models.
• Overfitting - occurs when a model performs exceptionally well on the training data but fails to generalize to unseen data.
• Underfitting - happens when a model is too simple to capture the underlying patterns in the data.
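A minimal sketch of how the two failure modes show up in practice, assuming scikit-learn is available (the dataset and model below are invented for illustration, not part of the notes): a large gap between training and test accuracy points to overfitting, while low scores on both point to underfitting.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained decision tree tends to memorize the training data
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))  # typically close to 1.0
print("test accuracy:", model.score(X_test, y_test))     # noticeably lower -> overfitting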
Evaluation Metrics
Accuracy
• Fundamental evaluation metric that measures the overall correctness
of predictions made by a model.

• It calculates the ratio of correctly predicted samples to the total number of samples in the dataset.
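An illustration of the ratio above (the label lists are made up for the example; scikit-learn is assumed to be installed):

from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 1]   # actual labels (example values)
y_pred = [1, 0, 0, 1, 0, 1]   # model predictions (example values)

# correctly predicted samples / total samples
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual, accuracy_score(y_true, y_pred))  # both ~0.833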
Precision
• A metric that quantifies the ability of a model to accurately identify
positive samples.

• It calculates the ratio of true positive predictions to the total number of positive predictions (true positives + false positives).
• Precision is useful when the cost of false positives is high.
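A small sketch of the precision formula using invented labels, with scikit-learn's precision_score shown as a cross-check:

from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0]   # actual labels (example values)
y_pred = [1, 1, 1, 0, 0, 1]   # model predictions (example values)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
print(tp / (tp + fp), precision_score(y_true, y_pred))  # precision = TP / (TP + FP) = 0.5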
Recall

• Also known as sensitivity or the true positive rate, recall measures the model’s ability to identify all positive samples correctly.
• It calculates the ratio of true positive predictions to the total number
of actual positive samples (true positive + false negative).
• Recall is crucial when the cost of false negatives is high.
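The same invented labels illustrate the recall formula (scikit-learn assumed):

from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
print(tp / (tp + fn), recall_score(y_true, y_pred))  # recall = TP / (TP + FN) = 2/3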
F1 Score
• The F1 score is the harmonic mean of precision and recall.
• It provides a single metric that combines both precision and recall,
giving a balanced measure of a model’s performance.
• The F1 score is especially useful when there is an uneven class
distribution in the dataset.
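Continuing the same example (precision 0.5 and recall 2/3 from the sketches above), the harmonic mean can be computed directly or with scikit-learn's f1_score:

from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

precision, recall = 0.5, 2 / 3                       # values from the earlier sketches
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
print(f1, f1_score(y_true, y_pred))                  # both ~0.571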
ROC Curve and AUC
• The ROC (Receiver Operating Characteristic) curve is a graphical
representation of a model’s performance across various classification
thresholds.
• It plots the true positive rate against the false positive rate, allowing
data scientists to evaluate the trade-off between sensitivity and
specificity.
• The area under the ROC curve is a scalar value that summarizes the
overall performance of a model.
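A minimal sketch of computing the ROC curve and AUC with scikit-learn, assuming the model outputs a probability or score for the positive class (the scores below are invented):

from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]  # predicted probabilities for the positive class

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
print("AUC:", roc_auc_score(y_true, y_score))      # area under the ROC curve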
Confusion Matrix
• Provides a comprehensive evaluation of a model’s performance by
summarizing the number of correct and incorrect predictions for each
class.
• It enables the calculation of various metrics such as accuracy,
precision, recall, and F1 score.
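A short sketch, again with invented labels, of how the confusion matrix yields the other metrics:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

# For binary labels, rows are actual classes and columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy:", (tp + tn) / (tp + tn + fp + fn))
print("precision:", tp / (tp + fp), "recall:", tp / (tp + fn))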
Mean Absolute Error (MAE)

• Mean Absolute Error is an evaluation metric commonly used for regression tasks.
• It measures the average absolute difference between the predicted
and actual values.
• MAE provides a straightforward interpretation of the model’s
performance.
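A minimal regression sketch with invented target values (scikit-learn assumed):

from sklearn.metrics import mean_absolute_error

y_true = [3.0, -0.5, 2.0, 7.0]   # actual values (example)
y_pred = [2.5, 0.0, 2.0, 8.0]    # predicted values (example)

# average absolute difference between predictions and actual values
manual = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual, mean_absolute_error(y_true, y_pred))  # both 0.5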
Mean Squared Error (MSE)

• Mean Squared Error is another regression evaluation metric that calculates the average squared difference between the predicted and actual values.
• MSE penalizes larger errors more significantly than MAE, making it
suitable for models where larger errors are considered more critical.
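Using the same invented values, MSE squares the residuals before averaging, which weights the single error of magnitude 1.0 more heavily than MAE does:

from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# average of the squared residuals
manual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual, mean_squared_error(y_true, y_pred))  # both 0.375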
Root Mean Squared Error (RMSE)

• Root Mean Squared Error is the square root of the MSE.
• It expresses the error in the same units as the target variable, making it easier to interpret than the MSE.
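A one-line sketch continuing the same example: RMSE is simply the square root of the MSE value computed above.

from math import sqrt
from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

rmse = sqrt(mean_squared_error(y_true, y_pred))  # square root of the MSE
print(rmse)  # ~0.612, expressed in the units of the target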
Summary
• Selecting the appropriate evaluation metrics depends on the nature
of the problem and the specific goals of the project.
• For classification tasks, metrics like accuracy, precision, recall, and
F1 score are commonly used.
• In regression tasks, metrics such as MAE, MSE, and RMSE are widely
employed to assess the model’s predictive performance.
Cross-Validation
• A technique used to evaluate the performance of a model on multiple
subsets of data.
• It helps assess the model’s ability to generalize well by providing a
more robust estimate of performance.
• When only a limited amount of data is available, k-fold cross-validation is used to obtain an unbiased estimate of model performance.
• In k-fold cross-validation, we divide the data into k subsets (folds) of equal size.
• We build models k times, each time leaving out one of the subsets from training and using it as the test set, as illustrated in the sketch below.
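A minimal sketch of k-fold cross-validation with scikit-learn (the dataset and model are invented stand-ins, and k = 5 is an arbitrary choice):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)  # stand-in dataset

# 5-fold CV: train on 4 folds, test on the held-out fold, repeat 5 times
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)          # one accuracy score per fold
print(scores.mean())   # averaged estimate of generalization performance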
