Unit 4
Model Evaluation
Model evaluation is the process of using different evaluation metrics
to understand a machine learning model's performance, as well as its
strengths and weaknesses.
It is a crucial step in the development and deployment of machine
learning systems.
The primary goal of model evaluation is to determine how well the
model generalizes to unseen data and whether it meets the desired
objectives.
Model Evaluation techniques
Training Data:
Training data are collections of examples or samples that are
used to 'teach' or 'train' the machine learning model.
The model uses a training data set to understand the patterns
and relationships within the data, thereby learning to make
predictions or decisions without being explicitly programmed to
perform a specific task.
It is the set of data that is used to train and make the model
learn the hidden features/patterns present in the data.
Validation Data:
The validation data is a set of data that is used to check the
model's performance during training.
This data is held out of the training process itself and is used
to evaluate the model, and to tune it, while it is still being
developed.
After training a machine learning model using the training data,
the model's performance is evaluated using the validation data.
This evaluation typically involves measuring metrics such as
accuracy, precision, recall, F1 score, or other relevant
performance indicators, depending on the nature of the
problem being solved.
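The metrics named above can be computed directly from the counts of true/false positives and negatives. The following is a minimal sketch for binary labels (1 = positive); the function name is illustrative, not from a library:

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Precision and recall use different denominators (predicted positives vs. actual positives), which is why both are reported: a model can score high on one while doing poorly on the other.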
Testing Data:
The testing data is used to evaluate the accuracy of the trained
algorithm.
This data is held aside during the modelling process and used
only to evaluate the model after the modelling is complete.
Test data has the same variables as the training data: the same
set of independent variables and the same dependent variable.
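The three data sets above are usually produced by shuffling the data once and splitting it. A minimal sketch of such a split (the 70/15/15 proportions are an illustrative choice, not a fixed rule):

```python
import random

def split_dataset(data, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle a dataset and split it into train/validation/test subsets."""
    rng = random.Random(seed)               # fixed seed for a reproducible split
    indices = list(range(len(data)))
    rng.shuffle(indices)
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    train = [data[i] for i in indices[:n_train]]
    val = [data[i] for i in indices[n_train:n_train + n_val]]
    test = [data[i] for i in indices[n_train + n_val:]]   # remainder becomes test
    return train, val, test
```

Shuffling before splitting matters: if the data is ordered (e.g. by class or by date), a plain slice would give the model an unrepresentative training set.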
Overfitting:
Definition:
Overfitting occurs when a model learns the training data too well,
capturing noise or random fluctuations in the data as if they were
genuine patterns. Consequently, the model performs well on the
training data but fails to generalize to new, unseen data.
Characteristics:
Low bias: The model has low bias because it fits the training data
very closely. Bias is a model's inability to capture the true
relationship between the variables.
High variance: However, it has high variance because it fails to
generalize well to unseen data. In ML, the difference in fits between
data sets is called variance.
It will have excellent performance on training data but poor
performance on test data.
Causes:
Using an overly complex model or algorithm.
Having too many features relative to the amount of training data.
Insufficient regularization. Regularization refers to techniques that
constrain a machine learning model in order to minimize an
adjusted loss function and prevent overfitting or underfitting.
Using regularization, we can fit our machine learning model
appropriately to the training data and hence reduce its errors on
unseen data.
A loss function is a measurement of how good your model is in
terms of predicting the expected outcome.
In short, regularization in machine learning is a technique used to
prevent overfitting and improve the generalization ability of a model.
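As a concrete sketch of an "adjusted loss function": L2 (ridge) regularization adds a penalty proportional to the squared weight, so large weights, which often indicate a model contorting itself to fit noise, raise the loss. The simple one-weight model below is illustrative:

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the simple linear model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def ridge_loss(w, xs, ys, lam=0.1):
    """Adjusted loss: MSE plus an L2 penalty that discourages large weights."""
    return mse_loss(w, xs, ys) + lam * w ** 2
```

The hyperparameter `lam` controls the strength of the penalty: `lam = 0` recovers the unregularized loss, while larger values push the optimizer toward smaller weights.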
Remedies:
Simplify the model by reducing the number of features or
decreasing its complexity.
Use cross-validation to tune hyperparameters and prevent
overfitting.
Early stopping during training to prevent the model from
learning noise in the data.
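The early-stopping remedy can be sketched as a simple rule on the validation-loss curve: stop once the loss has not improved for a set number of epochs (the "patience"), and keep the parameters from the best epoch. The function below is an illustrative helper, not a library API:

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch whose model should be kept: training stops once the
    validation loss has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0   # new best: reset patience
        else:
            waited += 1
            if waited >= patience:                      # patience exhausted: stop
                return best_epoch
    return best_epoch
```

Monitoring validation loss (rather than training loss) is the point: training loss keeps falling even while the model starts memorizing noise.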
Underfitting
Definition: Underfitting occurs when a model is too simple to capture the underlying
structure of the data. In other words, the model fails to learn the patterns in the
training data, resulting in poor performance not only on the training data but also on
unseen data (test data).
Characteristics:
High bias: The model is biased toward a certain set of assumptions and fails to
capture the complexity of the data.
Poor performance: Both on training and test data, the model's performance is
poor.
Causes:
Using an overly simple model or algorithm.
Insufficient training data.
Insufficient training time.
Remedies:
Increase model complexity by adding more features or
increasing the model's capacity.
Use more advanced algorithms that can capture complex
patterns.
Gather more training data.
Train the model for longer periods.
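One common way to "add more features" is to expand each input into polynomial features, so a linear model can fit curved relationships it would otherwise underfit. A minimal sketch:

```python
def add_polynomial_features(x, degree=3):
    """Expand a scalar input x into [x, x**2, ..., x**degree].
    A linear model trained on these expanded features can represent
    curves that a plain linear model on x alone would underfit."""
    return [x ** d for d in range(1, degree + 1)]
```

Note the tradeoff with the previous section: raising the degree increases capacity and fights underfitting, but pushing it too high invites overfitting.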
How to overcome overfitting and underfitting
in a model?
Variance-bias tradeoff:
If the algorithm is too simple, it may have high bias and low variance
and thus be error-prone. If the algorithm is too complex, it may have
high variance and low bias.
The ideal model lies between these two extremes.
If you make the model more complex (to reduce bias), you risk increasing
variance.
If you simplify the model (to reduce variance), you risk increasing bias.
The challenge is to find a balance where the model is complex enough to
capture important patterns but simple enough to generalize well.
Cross-validation
Cross-validation is a technique used to evaluate the
performance of a machine learning model by splitting the data
into multiple parts. Instead of using just one training and one
test set, the data is divided into "folds," and the model is
trained and tested on different combinations of these folds.
How It Works:
1. Split the Data: The dataset is divided into k equal parts (folds).
2. Train and Test:
The model is trained on k-1 folds.
It is tested on the remaining 1 fold.
3. Repeat: This process is repeated k times, with each fold used as the
test set once.
4. Average Results: The final model performance is the average of the
results from all folds.
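Steps 1–4 above can be sketched directly. The helper below produces the k train/test index splits; averaging a model's score across them gives the cross-validated estimate (the function name is illustrative):

```python
def kfold_splits(n, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation:
    each of the k folds serves as the test set exactly once."""
    indices = list(range(n))
    fold_size = n // k
    for i in range(k):
        start = i * fold_size
        end = start + fold_size if i < k - 1 else n   # last fold takes any remainder
        test_idx = indices[start:end]
        train_idx = indices[:start] + indices[end:]   # everything outside the fold
        yield train_idx, test_idx
```

In practice the data should be shuffled before indexing into folds, so that no fold ends up with an unrepresentative slice of the data.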
Advantages:
Prevents overfitting: because every observation is used for both
training and testing, the performance estimate depends less on any
single train/test split.
Disadvantages :
1. Computationally Expensive: Involves multiple training and testing cycles,
increasing resource usage.
2. Time-Consuming: Can be slow for large datasets or complex models.
3. Not Always Necessary: May be excessive for small datasets or already well-
performing models.
4. Risk of Data Leakage: Improper splitting can introduce information from
test data into training.
Hyperparameter tuning:
Hyperparameters are external configurations that are not learned from
the data but set before training.
Examples:
• Learning Rate: Controls how much the model adjusts during training.
• Number of Trees: In decision trees or random forests.
• Batch Size: Number of samples processed before updating the model.
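A simple tuning strategy is grid search: try every combination of candidate hyperparameter values and keep the best-scoring one. The sketch below assumes a caller-supplied `evaluate` function (e.g. cross-validated accuracy); it is an illustration, not a library API:

```python
def grid_search(learning_rates, batch_sizes, evaluate):
    """Exhaustively try each (learning_rate, batch_size) pair and return
    the combination with the highest validation score."""
    best_params, best_score = None, float("-inf")
    for lr in learning_rates:
        for bs in batch_sizes:
            score = evaluate(lr, bs)        # e.g. mean cross-validation score
            if score > best_score:
                best_params, best_score = (lr, bs), score
    return best_params, best_score
```

Grid search is easy to reason about but its cost grows multiplicatively with each hyperparameter added, which is why random search is often preferred for large grids.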
Stochastic Gradient Descent (SGD):
Step 1: Initialize the model parameters randomly or with some starting values.
Step 2: Randomly select one data point (or a mini-batch of data points).
Step 3: Calculate the gradient of the loss function with respect to the model
parameters using that single data point.
Step 4: Update the model parameters by moving in the opposite direction of the
gradient (to minimize the loss). The update rule is typically:
θ=θ−η⋅∇L(θ)
where:
θ are the model parameters,
η is the learning rate (step size),
∇L(θ) is the gradient of the loss function with respect to the parameters.
Step 5: Repeat this process for a specified number of iterations (epochs), going
through the dataset multiple times.
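Steps 1–5 can be sketched for a one-feature linear model with squared-error loss; the update line is exactly θ = θ − η·∇L(θ) applied to the weight and bias:

```python
import random

def sgd_fit(xs, ys, lr=0.05, epochs=300, seed=0):
    """Fit y ≈ w*x + b by stochastic gradient descent, one point per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                      # Step 1: starting parameter values
    data = list(zip(xs, ys))
    for _ in range(epochs):              # Step 5: repeat over the dataset
        rng.shuffle(data)
        for x, y in data:                # Step 2: one data point at a time
            err = (w * x + b) - y        # Step 3: gradient of 0.5*(pred - y)**2
            w -= lr * err * x            # Step 4: theta = theta - lr * grad
            b -= lr * err
    return w, b
```

Because each update uses a single point, the loss jumps around from step to step, but on average the parameters move downhill, which is the "stochastic" in SGD.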
Advantages of SGD:
1. Faster Convergence:
Quicker updates: SGD updates parameters using a single data point (or
mini-batch) at a time, leading to faster parameter adjustments compared to
traditional gradient descent, which requires computing gradients over the
entire dataset.
Frequent updates: With each data point processed, the model receives
immediate feedback, speeding up convergence in the early stages.
Resource efficiency: SGD doesn't require loading the entire dataset into
memory, making it faster and more computationally efficient.