
MACHINE LEARNING TERMINOLOGY
TRAINING & TESTING SET
Training Dataset
1. The sample of data used to fit the model.
2. The model sees and learns from this data.
Test Dataset
1. The sample of data used to provide an unbiased evaluation of a final
model fit on the training dataset.
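
As a minimal sketch of this split, assuming scikit-learn is available and using its built-in Iris data purely for illustration:

```python
# Minimal sketch of a train/test split (scikit-learn assumed; Iris used only as example data).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples as the test set; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                            # learns only from the training set
print("Test accuracy:", model.score(X_test, y_test))   # unbiased evaluation on unseen data
```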
CROSS VALIDATION (K-FOLD)

Cross-validation is a technique in which we train our model on a subset of the dataset and then evaluate it on the complementary subset of the dataset.

The three steps involved in cross-validation are as follows (a minimal sketch appears after this list):
 Reserve a portion of the dataset.
 Train the model on the remaining data.
 Test the model on the reserved portion of the dataset.
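
The sketch below shows k-fold cross-validation with scikit-learn (an assumed library choice); each fold is held out once while the model trains on the rest:

```python
# Minimal sketch of k-fold cross-validation (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

kfold = KFold(n_splits=5, shuffle=True, random_state=42)   # k = 5 folds
scores = cross_val_score(model, X, y, cv=kfold)            # train on k-1 folds, test on the held-out fold

print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```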
CONFUSION MATRIX
A confusion matrix is a tool for determining the performance of a classifier. It contains information about actual and predicted classifications.

1. True Positive (TP): an actual positive example predicted as positive
2. False Negative (FN): an actual positive example predicted as negative
3. False Positive (FP): an actual negative example predicted as positive
4. True Negative (TN): an actual negative example predicted as negative
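
Below is a minimal sketch of building such a matrix with scikit-learn (an assumed library choice); the actual and predicted labels are hypothetical:

```python
# Minimal sketch of a confusion matrix from hypothetical labels (1 = positive, 0 = negative).
from sklearn.metrics import confusion_matrix

y_actual    = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes.
# With labels=[1, 0] the layout is [[TP, FN], [FP, TN]].
cm = confusion_matrix(y_actual, y_predicted, labels=[1, 0])
tp, fn, fp, tn = cm.ravel()
print(cm)
print("TP:", tp, "FN:", fn, "FP:", fp, "TN:", tn)
```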
RECALL (SENSITIVITY)
Recall is the measure of actual positive examples that the classifier labels as positive. It should be high.

Recall (Sensitivity) = TP / (TP + FN) = 45 / (45 + 20) = 69.23%
SPECIFICITY (TRUE NEGATIVE RATE)
Specificity is the measure of actual negative examples that the classifier labels as negative. It should be high.

Specificity = TN / (TN + FP) = 30 / (30 + 5) = 85.71%
PRECISION
Precision is the ratio of the number of correctly classified positive examples to the total number of examples predicted as positive.

Precision = TP / (TP + FP) = 45 / (45 + 5) = 90%
ACCURACY
Accuracy is the proportion of the total number of predictions that are correct.

Accuracy = (TP + TN) / (TP + TN + FP + FN) = (45 + 30) / (45 + 20 + 5 + 30) = 75%
F1 SCORE
The F1 score is the harmonic mean of recall (sensitivity) and precision, balancing the two in a single number.

F1 = 2 × (Precision × Recall) / (Precision + Recall) = 2 × (0.90 × 0.6923) / (0.90 + 0.6923) ≈ 78.26%
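
As a worked check of the metrics above, using the counts implied by the slides (TP = 45, FN = 20, FP = 5, TN = 30); plain Python, no libraries needed:

```python
# Reproduces the metric values quoted on the preceding slides.
TP, FN, FP, TN = 45, 20, 5, 30

recall      = TP / (TP + FN)            # sensitivity, true positive rate
specificity = TN / (TN + FP)            # true negative rate
precision   = TP / (TP + FP)
accuracy    = (TP + TN) / (TP + TN + FP + FN)
f1          = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"Recall:      {recall:.2%}")       # 69.23%
print(f"Specificity: {specificity:.2%}")  # 85.71%
print(f"Precision:   {precision:.2%}")    # 90.00%
print(f"Accuracy:    {accuracy:.2%}")     # 75.00%
print(f"F1 score:    {f1:.2%}")           # about 78.26%
```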
SUMMARY OF CONFUSION MATRIX
WHAT IS THE AUC-ROC CURVE?
The Receiver Operating Characteristic (ROC) curve is an evaluation metric for binary classification problems. It plots the true positive rate (TPR) against the false positive rate (FPR) at various classification threshold values.
The Area Under the Curve (AUC) measures the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve.
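
A minimal sketch of computing a ROC curve and its AUC with scikit-learn (the synthetic dataset and classifier choice are assumptions for illustration):

```python
# Minimal sketch of ROC / AUC on a synthetic binary classification problem.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]        # predicted probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)  # TPR vs FPR at each threshold
print("AUC:", roc_auc_score(y_test, scores))      # 1.0 = perfect, 0.5 = random guessing
```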
VARIANCE
•When a model performs well on the training data but not as well on new data, the model likely has high variance.
•Variance basically tells how scattered the predicted values are from the actual values.
BIAS-VARIANCE TRADE-OFF
Bias: error from overly simple model assumptions; it shows up as high error on the training data.

Variance: error from sensitivity to the particular training set; it shows up as the gap between training and test error.

The trade-off: increasing model complexity lowers bias but raises variance, so the goal is to balance the two.
OVERFITTING
A statistical model is said to be overfitted when it learns the training data too closely, capturing noise rather than the underlying pattern.
Training data accuracy is high and test data accuracy is low.
UNDERFITTING
Underfitting occurs when the model is too simple to capture the underlying pattern, for example when, in order to avoid overfitting, we stop the training at too early a stage.
Training data accuracy is low and test data accuracy is low (see the sketch below).
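
The following sketch illustrates how overfitting and underfitting show up in training versus test accuracy; the synthetic data and the use of decision-tree depth as a complexity knob are assumptions for illustration:

```python
# Minimal sketch: compare training and test accuracy at different model complexities.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, 5, None):   # very simple, moderate, unrestricted
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}")

# Low train and test accuracy suggests underfitting (high bias);
# high train but much lower test accuracy suggests overfitting (high variance).
```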
Thank you…
