Feature selection
Overfitting
● A scenario where the machine learning model learns the details along with the noise in the data and tries to fit every data point on the curve is called Overfitting.
Underfitting
● A scenario where a machine learning model can neither learn the relationship between the variables in the training data nor predict or classify a new data point is called Underfitting.
Regularization
● The regularization approach prevents the model from overfitting by adding extra information (a penalty term) to the cost function of the model.
● It keeps all variables or features in the model but reduces the magnitude of their coefficients, giving better performance and generalization of the model.
● The primary process is regularizing, i.e., reducing the magnitude of, the feature coefficients without changing the number of features.
Working Principle
● Regularization works by adding a penalty or complexity term to the complex model.
○ Regularization = Loss + λ |w|
Where |w| = |w1| + |w2| + …. + |wn|
● The cost function with this added penalty term is called the regularized cost function; a squared (L2) penalty gives the Ridge Regression penalty, while an absolute-value (L1) penalty, as shown above, gives the Lasso penalty.
Ridge Regression (L2 regularization)
● The penalty term is calculated by multiplying the squared weight (coefficient) of each feature by lambda.
● The equation for the cost function:
Cost Function = Error(𝑦𝑖, 𝑦𝑖′) = J(𝜃) = (1/2M) Σᵢ₌₁..M (𝑦𝑖 − 𝑦𝑖′)² + λ Σⱼ₌₁..n 𝜃ⱼ²
● λ is the regularization parameter (λ ≥ 0), M = No. of samples, n = No. of features, and 𝜃0 is not penalized.
● The penalty term regularizes the coefficients of the model.
● If λ tends to zero, the equation reduces to the cost function of the linear regression model.
● Hence, for very small values of λ, the model resembles the plain linear regression model.
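A minimal sketch of this behaviour on made-up synthetic data (not from the slides); in scikit-learn's Ridge the parameter alpha plays the role of λ:

```python
# Minimal sketch on synthetic data; `alpha` acts as lambda in the formula above.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 0.5, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=100)

for alpha in (0.01, 1.0, 10.0, 100.0):
    ridge = Ridge(alpha=alpha).fit(X, y)
    # Small alpha: coefficients close to plain linear regression.
    # Large alpha: all coefficients shrink toward 0, but none becomes exactly 0.
    print(alpha, np.round(ridge.coef_, 3))
```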
Ridge Regression (L2 regularization) - Example
Cost Function = loss + λ × (slope of the curve)²
For the Linear Regression line, let’s consider two points that are on the line:
● Loss = 0 (considering the two points on the line)
● λ = 1
● The slope of the curve = 1.4
Cost function = 0 + 1 × (1.4)² = 1.96
For Ridge Regression, let’s assume:
● Loss = 0.3² + 0.2² = 0.13
● λ = 1
● The slope of the curve = 0.7
Then, Cost function = 0.13 + 1 × (0.7)² = 0.62
Ridge regression line fits the model more accurately than the linear regression line.
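The arithmetic above can be verified in a couple of lines of Python (the numbers are taken directly from the example; the helper function is only illustrative):

```python
# Quick check of the arithmetic above.
def ridge_cost(loss, lam, slope):
    return loss + lam * slope ** 2   # cost = loss + lambda * slope^2

print(ridge_cost(loss=0.0, lam=1.0, slope=1.4))              # 1.96 (linear regression line)
print(ridge_cost(loss=0.3**2 + 0.2**2, lam=1.0, slope=0.7))  # ~0.62 (ridge regression line)
```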
Lasso Regression (L1 regularization)
● Lasso regression is also a regularization technique that reduces the complexity of the
model.
● Lasso regression is the combination of linear regression and the L1 norm, i.e., the sum of the absolute values |𝜃j| of the coefficients.
● Because of the absolute values, it can shrink a coefficient (slope) exactly to 0, whereas Ridge Regression can only shrink it close to 0.
● The equation for the cost function of Lasso regression:
Cost Function = Error(𝑦𝑖, 𝑦𝑖′) = J(𝜃) = (1/2M) Σᵢ₌₁..M (𝑦𝑖 − 𝑦𝑖′)² + λ Σⱼ₌₁..n |𝜃ⱼ|
Lasso regression line fits the model more accurately than the linear regression line.
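To see why the absolute penalty can produce an exact zero while the squared penalty cannot, here is a tiny one-coefficient sketch; the slope 0.8 and λ = 2 are made-up numbers:

```python
# One-coefficient illustration with made-up numbers: minimize
# (w - w_ols)^2 + penalty(w) over a grid of candidate slopes.
import numpy as np

w_ols = 0.8                       # hypothetical unregularized (least-squares) slope
lam = 2.0                         # hypothetical regularization strength
w = np.arange(-200, 201) / 100.0  # candidate slopes from -2.00 to 2.00

w_l2 = w[np.argmin((w - w_ols) ** 2 + lam * w ** 2)]      # ridge-style (squared) penalty
w_l1 = w[np.argmin((w - w_ols) ** 2 + lam * np.abs(w))]   # lasso-style (absolute) penalty
print("L2 solution:", w_l2)   # shrunk toward 0 but still nonzero (0.27 on this grid)
print("L1 solution:", w_l1)   # exactly 0.0: the absolute penalty can zero the slope out
```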
Difference between Ridge Regression and Lasso
Regression
● Ridge regression is mostly used to reduce the overfitting in the model and includes all
the features present in the model.
● It reduces the complexity of the model by shrinking the coefficients.
● Lasso regression helps reduce overfitting and also performs feature selection, since it can shrink some coefficients exactly to zero; a comparison of the two is sketched below.
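A comparison sketch on assumed synthetic data (the coefficients and alpha values are arbitrary): Ridge keeps every feature with a small nonzero weight, while Lasso drops some entirely.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
true_w = np.array([4.0, 0.0, -3.0, 0.0, 0.0, 2.0, 0.0, 0.0])  # only 3 features matter
y = X @ true_w + rng.normal(scale=0.5, size=200)

ridge_coef = Ridge(alpha=1.0).fit(X, y).coef_
lasso_coef = Lasso(alpha=0.5).fit(X, y).coef_
print("features dropped by Ridge:", int(np.sum(ridge_coef == 0)))  # expected 0: all kept
print("features dropped by Lasso:", int(np.sum(lasso_coef == 0)))  # expected > 0: some removed
```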
Curse of Dimensionality
● The Curse of Dimensionality in Machine Learning arises when working with
high-dimensional data, leading to increased computational complexity, overfitting, and
spurious correlations.
● In high-dimensional spaces, data points become sparse, making it challenging to discern
meaningful patterns or relationships due to the vast amount of data required to adequately
sample the space.
● The Curse of Dimensionality significantly impacts machine learning algorithms in
various ways.
● It leads to increased computational complexity, longer training times, and higher resource
requirements.
● Moreover, it escalates the risk of overfitting and spurious correlations, hindering the
algorithms’ ability to generalize well to unseen data.
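A small illustration (not from the slides) of this sparsity effect: for uniformly random points, the nearest and farthest neighbour distances become almost indistinguishable as the number of dimensions grows.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((500, d))                       # 500 random points in [0, 1]^d
    dists = np.linalg.norm(X - X[0], axis=1)[1:]   # distances from the first point
    print(d, round(dists.min() / dists.max(), 3))  # ratio approaches 1 in high dimensions
```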
Wrapper based methods
● In the wrapper methodology, feature selection is treated as a search problem: different combinations of features are made, evaluated, and compared with one another. The learning algorithm is trained iteratively on these subsets of features (forward and backward selection are sketched after this list).
● Forward selection -
○ Forward selection is an iterative process, which begins with an empty set of features.
○ In each iteration, it adds one more feature and evaluates whether the performance of the model improves.
○ The process continues until the addition of a new variable/feature does not improve the
performance of the model.
● Backward elimination -
○ Backward elimination is also an iterative approach, but it is the opposite of forward selection.
○ This technique begins the process by considering all the features and removes the least
significant feature.
○ This elimination process continues until removing the features does not improve the
performance of the model.
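A hedged sketch of both strategies using scikit-learn's SequentialFeatureSelector; note that this implementation stops at a requested number of features rather than "when performance stops improving", and the dataset, estimator, and n_features_to_select=5 below are placeholder choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
estimator = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Forward selection: start empty, greedily add the feature that helps most.
forward = SequentialFeatureSelector(
    estimator, n_features_to_select=5, direction="forward", cv=3).fit(X, y)
# Backward elimination: start with all features, greedily remove the least useful one.
backward = SequentialFeatureSelector(
    estimator, n_features_to_select=5, direction="backward", cv=3).fit(X, y)
print("forward selection kept:", forward.get_support(indices=True))
print("backward elimination kept:", backward.get_support(indices=True))
```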
Wrapper based methods
● Exhaustive Feature Selection-
○ Exhaustive feature selection evaluates every possible feature set by brute force, which makes it one of the most thorough (and most computationally expensive) feature selection methods.
○ It tries each possible combination of features and returns the best-performing feature set.
● Recursive Feature Elimination -
○ Recursive feature elimination is a recursive greedy optimization approach, where features are selected by recursively considering smaller and smaller subsets of features.
○ An estimator is trained on each set of features, and the importance of each feature is determined through the coef_ attribute or the feature_importances_ attribute.
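A short sketch of recursive feature elimination with scikit-learn's RFE (the dataset, estimator, and target of 5 features are illustrative choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # scaling keeps the linear model well behaved

# Repeatedly drop the least important feature (smallest |coef_|) until 5 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=5, step=1).fit(X_scaled, y)
print("selected feature indices:", rfe.get_support(indices=True))
print("feature ranking (1 = selected):", rfe.ranking_)
```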
Subset selection
Filter based methods
● In the filter method, features are selected on the basis of statistical measures.
● This method does not depend on the learning algorithm; it filters out irrelevant and redundant features by ranking them with different statistical metrics.
● Information Gain:
○ Information gain measures the reduction in entropy obtained by splitting the dataset on a feature.
○ It can be used as a feature selection technique by calculating the information gain of each
variable with respect to the target variable.
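A sketch of this idea with scikit-learn, where mutual_info_classif estimates the information gain of each feature with respect to the target (the dataset and k = 5 are placeholder choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
# Score every feature by its estimated mutual information with the target, keep the top 5.
selector = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
print("information gain per feature:", selector.scores_.round(3))
print("top-5 feature indices:", selector.get_support(indices=True))
```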
Filter based methods
● Chi-square Test:
○ Chi-square test is a technique to determine the relationship between the categorical variables.
○ The chi-square value is calculated between each feature and the target variable, and the
desired number of features with the best chi-square value is selected.
● Fisher's score:
○ Fisher's score ranks the variables according to Fisher's criterion in descending order.
○ We can then select the variables with the largest Fisher's scores.
● Missing Value Ratio:
○ The missing value ratio of each feature can be evaluated against a chosen threshold value.
○ A variable whose missing value ratio is higher than the threshold can be dropped (the chi-square and missing-value-ratio filters are sketched after this list).
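A combined sketch of the chi-square and missing-value-ratio filters; the random data, the small DataFrame, and the 0.4 threshold are purely illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Chi-square: score each (non-negative) feature against the target, keep the best k.
rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(100, 6)))          # chi2 requires non-negative features
y = rng.integers(0, 2, size=100)
chi2_selector = SelectKBest(score_func=chi2, k=3).fit(X, y)
print("chi-square keeps feature indices:", chi2_selector.get_support(indices=True))

# Missing value ratio: drop any column whose fraction of missing values exceeds the threshold.
df = pd.DataFrame({"a": [1, None, 3, None], "b": [1, 2, 3, 4], "c": [None, None, None, 4]})
missing_ratio = df.isna().mean()
threshold = 0.4
print(df.drop(columns=missing_ratio[missing_ratio > threshold].index))
```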
Embedded methods
● Embedded methods combine the advantages of both filter and wrapper methods by considering the interaction of features while keeping the computational cost low. They are fast, like filter methods, but more accurate than filter methods.
● Regularization - Regularization adds a penalty term to the parameters of the machine learning model to avoid overfitting. Because the penalty is applied to the coefficients, an L1 penalty shrinks some coefficients exactly to zero, and the features with zero coefficients can be removed from the dataset. The regularization techniques used for this are L1 regularization (Lasso) and Elastic Net (combined L1 and L2 regularization); see the sketch after this list.
● Random Forest Importance - Tree-based methods of feature selection provide feature-importance scores that give a natural way of selecting features. Here, feature importance specifies which features matter most for model building or have the greatest impact on the target variable. Random Forest is such a tree-based method: a bagging algorithm that aggregates a number of decision trees. It automatically ranks the nodes by their performance, i.e., the decrease in impurity (Gini impurity) over all the trees. Nodes are arranged according to their impurity values, which allows pruning the trees below a particular node; the remaining nodes form a subset of the most important features.
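A hedged sketch of both embedded approaches via scikit-learn's SelectFromModel; the dataset, alpha = 0.05, and forest size are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# L1 regularization: features whose Lasso coefficient is shrunk to (near) zero are discarded.
l1_selector = SelectFromModel(Lasso(alpha=0.05)).fit(X_scaled, y)
print("kept by L1 regularization:", l1_selector.get_support(indices=True))

# Random forest importance: keep features whose impurity-based importance is above the mean.
rf_selector = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0)).fit(X, y)
print("kept by random forest importance:", rf_selector.get_support(indices=True))
```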