
Regularization – Ridge and Lasso

Regularization is a crucial technique in machine learning, especially when dealing with linear
models. It helps to prevent overfitting by adding a penalty to the model's complexity. Here’s a
detailed explanation of when to use regularization:

### When to Use Regularization

1. **High Variance (Overfitting)**:
- **Symptoms**: Your model performs exceptionally well on the training data but poorly on
the validation or test data.
- **Reason**: The model has learned not only the underlying patterns but also the noise in the
training data.
- **Solution**: Regularization techniques such as Ridge (L2) or Lasso (L1) regression can help by penalizing large coefficients, thus simplifying the model and improving generalization (the objective functions behind these penalties are sketched just after this list).

2. **High Dimensionality**:
- **Symptoms**: You have a large number of features compared to the number of
observations.
- **Reason**: High-dimensional datasets can lead to overfitting because the model has too
many parameters relative to the amount of data.
- **Solution**: Regularization can help by shrinking the coefficients of less important
features, making the model more robust.

3. **Multicollinearity**:
- **Symptoms**: Some of your features are highly correlated with each other.
- **Reason**: Multicollinearity can cause instability in the model coefficients, leading to
overfitting and poor generalization.
- **Solution**: Ridge regression, in particular, is effective at handling multicollinearity by
imposing a penalty on the coefficients.

4. **Feature Selection**:
- **Symptoms**: You suspect that some features in your dataset are irrelevant or redundant.
- **Reason**: Including irrelevant features can degrade the performance of the model.
- **Solution**: Lasso regression is useful for feature selection as it can shrink some
coefficients to exactly zero, effectively removing those features from the model.

5. **Model Complexity**:
- **Symptoms**: Your model is too complex relative to the simplicity of the data.
- **Reason**: Complex models can capture intricate patterns but also the noise, leading to
overfitting.
- **Solution**: Regularization helps by adding a constraint on the magnitude of the
coefficients, thus simplifying the model.
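
To make the penalty idea concrete, the standard objective functions are sketched below in their usual textbook form (a general sketch, not scikit-learn's exact internal scaling). Here beta denotes the coefficient vector, lambda >= 0 the regularization strength, n the number of observations, and p the number of features; the `alpha` parameter used in the code later in this document plays the role of lambda, up to library-specific scaling.

```latex
% Ordinary least squares (no penalty)
\min_{\beta}\; \sum_{i=1}^{n} \bigl( y_i - \mathbf{x}_i^{\top}\beta \bigr)^2

% Ridge (L2): shrinks all coefficients toward zero, but rarely exactly to zero
\min_{\beta}\; \sum_{i=1}^{n} \bigl( y_i - \mathbf{x}_i^{\top}\beta \bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^{2}

% Lasso (L1): can set some coefficients exactly to zero
\min_{\beta}\; \sum_{i=1}^{n} \bigl( y_i - \mathbf{x}_i^{\top}\beta \bigr)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```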

### Practical Examples

1. **Example of Overfitting**:
- Suppose a plain (unregularized) linear regression model fits the training data almost perfectly but performs poorly on new data. Adding regularization (Ridge or Lasso) can help reduce the model's effective complexity, leading to better performance on unseen data.

2. **Example of High Dimensionality**:
- In genomic data analysis, where the number of features (genes) is much larger than the
number of samples (patients), regularization is essential to build a robust model that generalizes
well.

3. **Example of Multicollinearity**:
- In econometrics, variables such as GDP, income, and spending can be highly correlated.
Using Ridge regression can help mitigate the effects of multicollinearity, leading to more stable
coefficient estimates.
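
To make the multicollinearity example above concrete, here is a minimal synthetic sketch (the data, noise levels, and `alpha` value are invented for illustration): two almost identical predictors make ordinary least-squares coefficients unstable, while Ridge keeps them small and stable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data: x2 is almost a copy of x1 (severe multicollinearity)
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print('OLS coefficients:  ', ols.coef_)    # typically large values with opposite signs
print('Ridge coefficients:', ridge.coef_)  # roughly equal, modest values summing to about 3
```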

### Choosing Between Ridge and Lasso

- **Ridge Regression**:
- Use when you believe all features are potentially useful but need to control for
multicollinearity and overfitting.
- Example: Ridge is often preferred in cases where the number of features is similar to or less
than the number of observations.

- **Lasso Regression**:
- Use when you suspect that some features are irrelevant and should be removed from the
model.
- Example: Lasso is effective in high-dimensional datasets for feature selection, reducing the
number of features.
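
The contrast can be seen directly by fitting both models on the same synthetic data (invented for illustration) in which only a few of many features are truly relevant; Ridge shrinks every coefficient but keeps them non-zero, while Lasso drives most of the irrelevant ones exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic data: only the first 3 of 20 features actually affect y
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))
true_coef = np.zeros(20)
true_coef[:3] = [4.0, -2.0, 3.0]
y = X @ true_coef + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print('Ridge non-zero coefficients:', np.sum(ridge.coef_ != 0))  # usually all 20 (shrunk, not removed)
print('Lasso non-zero coefficients:', np.sum(lasso.coef_ != 0))  # usually close to 3
```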

### Implementation in Python

Here’s how you can implement regularization in Python using Scikit-Learn:

**Ridge Regression**:
```python
from sklearn.linear_model import Ridge

# alpha controls the strength of the L2 penalty (larger alpha = stronger shrinkage)
ridge_reg = Ridge(alpha=1.0)
ridge_reg.fit(X_train, y_train)      # X_train, y_train from a prior train/test split
y_pred = ridge_reg.predict(X_test)
```

**Lasso Regression**:
```python
from sklearn.linear_model import Lasso

# alpha controls the strength of the L1 penalty; large enough values
# drive some coefficients exactly to zero (feature selection)
lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X_train, y_train)
y_pred = lasso_reg.predict(X_test)
```
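
In practice, the penalty strength `alpha` is usually chosen by cross-validation rather than set by hand, and Lasso in particular benefits from standardized features. A minimal sketch using scikit-learn's built-in RidgeCV and LassoCV (assuming the same X_train and y_train as above):

```python
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Ridge with alpha selected from a candidate grid by built-in cross-validation
ridge_cv = make_pipeline(StandardScaler(), RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0]))
ridge_cv.fit(X_train, y_train)
print('Chosen Ridge alpha:', ridge_cv.named_steps['ridgecv'].alpha_)

# Lasso with alpha selected over an automatically generated path of values
lasso_cv = make_pipeline(StandardScaler(), LassoCV(cv=5))
lasso_cv.fit(X_train, y_train)
print('Chosen Lasso alpha:', lasso_cv.named_steps['lassocv'].alpha_)
```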


Detecting overfitting in machine learning models is a crucial part of the model evaluation
process. Overfitting occurs when a model learns the training data too well, including the noise
and outliers, which negatively impacts its performance on new, unseen data. Here are practical methods and techniques for detecting overfitting:

### Methods to Detect Overfitting

1. **Train-Test Split**:
- **Procedure**: Split the dataset into a training set and a test set. Train the model on the
training set and evaluate its performance on both the training and test sets.
- **Indicator**: If the model performs significantly better on the training set than on the test
set, it is likely overfitting.
- **Example**:
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate the model
train_error = mean_squared_error(y_train, model.predict(X_train))
test_error = mean_squared_error(y_test, model.predict(X_test))

print(f'Training Error: {train_error}')
print(f'Test Error: {test_error}')
```

2. **Cross-Validation**:
- **Procedure**: Use k-fold cross-validation to train and evaluate the model. This involves
splitting the data into k subsets, training the model on k-1 subsets, and validating it on the
remaining subset. This process is repeated k times.
- **Indicator**: Overfitting is indicated by a low training error and a high cross-validation
error.
- **Example**:
```python
from sklearn.model_selection import cross_val_score

# Cross-validation (this scoring returns negative MSE, so values closer to zero are better)
scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
print(f'Cross-Validation Scores: {scores}')
print(f'Mean Cross-Validation Score: {scores.mean()}')
```

3. **Learning Curves**:
- **Procedure**: Plot learning curves, which show the model’s performance on the training
and validation sets as a function of the number of training samples.
- **Indicator**: If the training error is much lower than the validation error, the model is likely
overfitting.
- **Example**:
```python
from sklearn.model_selection import learning_curve
import matplotlib.pyplot as plt

train_sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, scoring='neg_mean_squared_error')
train_errors = -train_scores.mean(axis=1)
val_errors = -val_scores.mean(axis=1)

plt.plot(train_sizes, train_errors, label='Training Error')
plt.plot(train_sizes, val_errors, label='Validation Error')
plt.xlabel('Training Set Size')
plt.ylabel('Error')
plt.legend()
plt.show()
```

4. **Validation on an Unseen Dataset**:
- **Procedure**: After training the model on the training set, validate its performance on a
completely separate validation set that was not used during training or hyperparameter tuning.
- **Indicator**: Poor performance on the validation set compared to the training set indicates
overfitting.
- **Example**:
```python
# Split the data into training, validation, and test sets
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Train the model
model.fit(X_train, y_train)

# Evaluate the model
val_error = mean_squared_error(y_val, model.predict(X_val))
test_error = mean_squared_error(y_test, model.predict(X_test))

print(f'Validation Error: {val_error}')
print(f'Test Error: {test_error}')
```
