
UNIT - 3

▪ Overfitting Problem

▪ Regularization (Ridge, Lasso, Elastic)

▪ Dropout and Early Stopping

Dr. Tarandeep Singh — 21 February 2025


Overfitting Problem

Why does overfitting occur?


➢ The training data size is too small and does not contain enough data samples to accurately
represent all possible input data values.
➢ The training data contains large amounts of irrelevant information, called noisy data.
➢ The model trains for too long on a single sample set of data.
➢ The model complexity is high, so it learns the noise within the training data.



Overfitting Problem

Overfitting Examples
Consider a use case where a machine learning model has to analyze photos and identify the ones that contain dogs. If the model was trained on a data set in which most photos showed dogs outside in parks, it may learn to use grass as a feature for classification and may fail to recognize a dog inside a room.



Overfitting Problem

How to prevent overfitting?


You can prevent overfitting by diversifying and scaling your training data set or using some strategies like:

Early stopping
Early stopping pauses the training phase before the machine learning model learns the noise in the data. However, getting the timing right is important; otherwise, the model will still not give accurate results.
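As a rough illustration (not from the slides), the sketch below implements the patience idea with scikit-learn's SGDRegressor standing in for the model being trained; the data and thresholds are made up for the example.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Toy regression data; in practice, use your own train/validation split.
rng = np.random.RandomState(0)
X = rng.randn(500, 10)
y = X @ rng.randn(10) + 0.1 * rng.randn(500)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = SGDRegressor(random_state=0)
best_val, patience, wait = np.inf, 5, 0

for epoch in range(200):
    model.partial_fit(X_train, y_train)            # one more pass over the training data
    val_loss = mean_squared_error(y_val, model.predict(X_val))
    if val_loss < best_val - 1e-6:                 # validation loss still improving
        best_val, wait = val_loss, 0
    else:                                          # no improvement this epoch
        wait += 1
        if wait >= patience:                       # stop before the model starts fitting noise
            print(f"stopping at epoch {epoch}, best val MSE = {best_val:.4f}")
            break
```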

Pruning
You might identify several features or parameters that impact the final prediction when you build a model. Feature selection—or pruning—identifies the most important features within the training set and eliminates irrelevant ones. For example, to predict whether an image shows an animal or a human, you can look at various input parameters such as face shape, ear position, and body structure; you might prioritize face shape and ignore the shape of the eyes.
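One way to automate this kind of feature selection (an illustrative sketch, not part of the slides) is scikit-learn's SelectKBest, which keeps only the features with the strongest univariate relationship to the target:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 20 input features, only 5 of which are informative.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

selector = SelectKBest(score_func=f_classif, k=5)   # keep the 5 best-scoring features
X_pruned = selector.fit_transform(X, y)

print(X_pruned.shape)                        # (300, 5)
print(selector.get_support(indices=True))    # indices of the retained features
```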



Overfitting Problem

How to prevent overfitting?


You can prevent overfitting by diversifying and scaling your training data set or using some strategies like:

Regularization (Ridge/Lasso/Elastic/Dropout)
Regularization is a collection of training/optimization techniques that seek to reduce overfitting. These methods try to eliminate the influence of factors that do not impact the prediction outcome by grading features based on importance; for example, a penalty value is applied to the weights of features with minimal impact.

Ensembling
Ensembling combines predictions from several separate machine learning algorithms. Some models are called weak
learners because their results are often inaccurate. Ensemble methods combine all the weak learners to get more
accurate results. The two main ensemble methods are bagging and boosting. Boosting trains different machine learning
models one after another to get the final result, while bagging trains them in parallel.
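As a small, hedged sketch of the two ideas using scikit-learn (the dataset and settings are arbitrary): bagging trains its base learners independently on bootstrap samples, while boosting (here AdaBoost) trains them one after another.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: weak learners (decision trees by default) trained in parallel on bootstrap samples.
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: weak learners (decision stumps by default) trained sequentially,
# each one focusing on the examples the previous ones got wrong.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, round(scores.mean(), 3))
```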



Overfitting Problem

How to prevent overfitting?


You can prevent overfitting by diversifying and scaling your training data set or using some strategies like:
Lasso Regularization / L1 Regularization
➢ LASSO stands for Least Absolute Shrinkage and Selection Operator.
➢ Its penalty term contains only the absolute values of the weights.
➢ Since it takes absolute values, it can shrink a slope all the way to 0, whereas Ridge regression can only shrink it close to 0.

So, the L1 regularization technique assigns a zero weight to a less significant feature if it does not have a significant effect on the prediction of the target column.
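A minimal scikit-learn sketch of this behaviour (illustrative only; note that scikit-learn calls the λ of the slides `alpha`):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 20 features, but only 4 of them actually influence the target.
X, y = make_regression(n_samples=200, n_features=20, n_informative=4,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)       # alpha plays the role of lambda in the slides
lasso.fit(X, y)

# L1 shrinks the weights of uninformative features all the way to zero.
print("non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))
```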



Overfitting Problem

How to prevent overfitting?


You can prevent overfitting by diversifying and scaling your training data set or using some strategies like:
Ridge Regularization / L2 Regularization
➢ A Ridge regressor is basically a regularized version of a linear regressor: to the original cost function of the linear regressor, we add a regularization term that forces the learning algorithm to fit the data while keeping the weights as small as possible.
➢ It adds the “squared magnitude” of the coefficients (the square of the weights) as a penalty term to the loss function.

❖ If lambda is zero, we get back OLS (ordinary least squares) regression.

❖ If lambda is very large, the penalty carries too much weight and the model will under-fit.
“It’s important how lambda is chosen. This technique works very well to avoid the over-fitting issue.”
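A short, hedged scikit-learn comparison of OLS against Ridge (again, scikit-learn's `alpha` corresponds to the slides' λ; the data is synthetic):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=100, n_features=20, noise=10.0, random_state=0)

ols = LinearRegression().fit(X, y)      # equivalent to lambda = 0
ridge = Ridge(alpha=10.0).fit(X, y)     # larger alpha -> stronger shrinkage of the weights

print("largest |coefficient|, OLS  :", round(abs(ols.coef_).max(), 2))
print("largest |coefficient|, Ridge:", round(abs(ridge.coef_).max(), 2))
```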



L1 Regularisation



L2 Regularisation



Out of the Box Question

L1 and L2 Regularization. Why are they named so?


➢ The names "L1" and "L2" come from the norm used to calculate the regularization term.
➢ In L1 regularization, the norm used is the L1 norm, which is the sum of the absolute values of the elements.
➢ In L2 regularization, the norm used is the L2 norm, which is the square root of the sum of the squared values of the elements.
➢ These norms are commonly used in mathematics, and they determine the type of regularization applied to the model.
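For example, the two norms can be computed directly with NumPy (a small illustration, not from the slides):

```python
import numpy as np

w = np.array([3.0, -4.0, 0.0, 1.5])          # example weight vector

l1_norm = np.sum(np.abs(w))                   # |3| + |-4| + |0| + |1.5| = 8.5
l2_norm = np.sqrt(np.sum(w ** 2))             # sqrt(9 + 16 + 0 + 2.25) ≈ 5.22

print(l1_norm, l2_norm)
# np.linalg.norm gives the same values:
print(np.linalg.norm(w, ord=1), np.linalg.norm(w, ord=2))
```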


Overfitting Problem

How to prevent overfitting?


You can prevent overfitting by diversifying and scaling your training data set or using some strategies like:
Elastic Regularization (Combination of both Ridge and Lasso)
➢ Elastic regularization is a combination of both the L1 (Lasso) and L2 (Ridge) regularization techniques.
➢ Elastic Net adds both L1 and L2 penalty terms, weighted by hyperparameters such as α and λ, to the loss function during training, allowing for a more flexible and balanced approach to regularization.
➢ It addresses some of the limitations of each individual regularization method.

Elastic Net is particularly useful when dealing with high-dimensional datasets or datasets where many features may be correlated.
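As an illustrative scikit-learn sketch (scikit-learn parameterizes Elastic Net with an overall strength `alpha` and a mixing ratio `l1_ratio` rather than two separate penalties; the data here is synthetic):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

# Correlated, high-dimensional data is where Elastic Net tends to help.
X, y = make_regression(n_samples=100, n_features=50, n_informative=10,
                       noise=5.0, random_state=0)

# alpha sets the overall penalty strength; l1_ratio balances L1 vs. L2
# (l1_ratio=1.0 is pure Lasso, l1_ratio=0.0 is pure Ridge).
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)
print("non-zero coefficients:", int((enet.coef_ != 0).sum()))
```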



How the penalty is calculated in any cost function

➢ In L1 and L2 regularization, the penalty term (often denoted α or λ, depending on notation) is a hyperparameter that must be specified before training the model. It controls the strength of regularization.

➢ The larger the value of α or λ, the stronger the regularization effect.

We can typically choose and calculate the penalty term for L1 and L2 regularization as follows:

L1 Regularization (Lasso):

L1 regularization aims to minimize the following cost function:


Cost(L1) = Cost(Original) + λ * Σ|θ_i|

➢ λ is the regularization parameter for L1 regularization.


➢ You need to choose an appropriate value for λ through techniques like cross-validation or grid search.
➢ Common values to try for λ include a range of positive values, often in a logarithmic scale (e.g., 0.001, 0.01,
0.1, 1, 10).
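A hedged sketch of that search with scikit-learn's GridSearchCV (the grid and dataset are arbitrary; `alpha` again stands in for λ):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

# Logarithmic grid of candidate penalty strengths.
param_grid = {"alpha": [0.001, 0.01, 0.1, 1, 10]}

search = GridSearchCV(Lasso(max_iter=10_000), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print("best lambda:", search.best_params_["alpha"])
```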



How the penalty is calculated in any cost function
L2 Regularization (Ridge):
L2 regularization aims to minimize the following cost function:

Cost(L2) = Cost(Original) + λ * Σ(θ_i^2)

➢ λ is the regularization parameter for L2 regularization.


➢ We need to choose an appropriate value for λ through techniques like cross-validation or grid search.
➢ Common values for λ in L2 regularization also include a range of positive values, often in a logarithmic
scale.
Selection of λ:
Keep in mind that the appropriate value of λ can vary from one problem to another, so it's essential to experiment with different values to find the one that balances model complexity (the number and magnitude of coefficients) against model accuracy.

➢ The choice of λ depends on your specific problem and dataset.


➢ You typically start with a broad search over a range of λ values, and you can use techniques like k-fold
cross-validation to evaluate how well your model performs for each value of λ.
➢ The value of λ that results in the best model performance (e.g., lowest validation error) is typically
chosen.
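For Ridge specifically, scikit-learn's RidgeCV bundles this cross-validated search into one estimator (an illustrative sketch with arbitrary values):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=200, n_features=30, noise=10.0, random_state=0)

# Each candidate lambda (alpha) is evaluated with 5-fold cross-validation,
# and the best-performing one is kept.
ridge = RidgeCV(alphas=[0.001, 0.01, 0.1, 1, 10, 100], cv=5).fit(X, y)
print("chosen lambda:", ridge.alpha_)
```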





NOTES : Regularisation Techniques

Regularization is a set of techniques used in machine learning and statistical modeling to prevent overfitting and improve the
generalization performance of a model.
Regularization methods introduce a penalty term into the model's objective function, encouraging it to have simpler and more
stable patterns that generalize better to new data.

There are several common types of regularization techniques used in machine learning, including:

L1 Regularization (Lasso): L1 regularization adds a penalty term to the model's objective function that is proportional to the
absolute values of the model's coefficients. It encourages the model to have sparse feature weights, effectively selecting a subset
of the most important features while setting others to zero. L1 regularization is useful for feature selection.

L2 Regularization (Ridge): L2 regularization adds a penalty term to the objective function that is proportional to the squared
values of the model's coefficients. It encourages the model to have small, evenly distributed feature weights. L2 regularization
helps prevent large coefficients that might lead to overfitting.

Elastic Net Regularization: Elastic Net is a combination of L1 and L2 regularization. It adds both L1 and L2 penalty terms to
the objective function, allowing for feature selection and coefficient shrinkage. Elastic Net is useful when there are many
features, and some of them are highly correlated.



NOTES : Regularisation Techniques
Dropout (for Neural Networks): Dropout is a regularization technique specifically used in neural networks. During training, dropout randomly deactivates a fraction of neurons (units) in a neural network, effectively preventing the network from relying too heavily on any one neuron or feature. This encourages the network to learn more robust and generalizable representations.

Early Stopping: Early stopping is a regularization technique that stops the training process when the model's performance on a validation dataset starts to degrade. It prevents the model from continuing to learn the noise in the training data.

Data Augmentation: Data augmentation is a regularization technique used primarily in computer vision. It involves generating new training examples by applying various transformations (e.g., rotation, translation, cropping) to the existing data. This helps the model generalize better by exposing it to more diverse variations of the data.

Pruning (for Decision Trees): Pruning is a regularization technique for decision trees. It involves removing branches or nodes from a decision tree that do not significantly improve the model's performance. Pruning prevents the tree from becoming too deep and complex.

❑ The choice of regularization technique and the strength of regularization (controlled by hyperparameters like regularization strength or dropout rate) depend on the specific problem and the characteristics of the data.
❑ Regularization is a crucial tool for achieving better model performance and preventing overfitting in various machine learning algorithms.
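A minimal dropout sketch, assuming TensorFlow/Keras as the framework (the architecture and data are made up for illustration); it also combines dropout with the early-stopping idea described above:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy binary-classification data.
X = np.random.randn(1000, 20).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),                 # randomly deactivate 50% of these units each training step
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping: halt once validation loss stops improving.
stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                     restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=50, callbacks=[stop], verbose=0)
```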



