Regression

This document provides an introduction and overview of regression analysis techniques. It discusses using regression models to predict continuous target variables and explore relationships between variables. Key topics covered include simple and multiple linear regression, visualizing datasets, correlation analysis, evaluating model performance using measures like MSE and R2, addressing overfitting using regularization methods, and other regression algorithms like polynomial regression, decision tree regression, random forest regression and support vector regression.


Regression Analysis

Introduction
• In this chapter, we will take a dive into another subcategory
of supervised learning: regression analysis.
• Regression models are used to predict target variables on a
continuous scale.
• Regression addresses many questions in science as well as
applications in industry, such as
– understanding relationships between variables,
– evaluating trends, or making forecasts.
– One example would be predicting the sales of a company in
future months.
Introduction
• In this chapter, we will discuss the main concepts
of regression models and cover
the following topics:
– Exploring and visualizing datasets
– Looking at different approaches to implement linear
regression models
– Training regression models that are robust to outliers
– Evaluating regression models and diagnosing common
problems
– Fitting regression models to nonlinear data
Simple Linear Regression
Multiple linear regression
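Both variants can be sketched with scikit-learn's `LinearRegression`, which handles the single-feature (simple) and multi-feature (multiple) cases with the same API. The synthetic data below is an assumption for illustration, not from the slides:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: one feature, y = 3x + 2 plus noise (synthetic data)
rng = np.random.RandomState(0)
X = rng.rand(100, 1) * 10
y = 3.0 * X[:, 0] + 2.0 + rng.randn(100) * 0.5

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # close to the true slope 3 and intercept 2

# Multiple linear regression: same API, just more feature columns
X_multi = rng.rand(100, 3)
y_multi = X_multi @ np.array([1.5, -2.0, 0.5]) + 1.0  # noiseless, so recovered exactly
model_multi = LinearRegression().fit(X_multi, y_multi)
```

The fitted `coef_` and `intercept_` attributes hold the learned weights, which here recover the generating parameters.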
Visualizing the important
characteristics of a dataset
• Exploratory Data Analysis (EDA) is an important
and recommended first step prior
to the training of a machine learning model.
• We will create a scatter plot matrix that allows us
to visualize the pair-wise correlations between the
different features in this dataset in one place.
• To plot the scatterplot matrix, we will use the
pairplot function from the seaborn library.
Correlation Analysis
• The correlation matrix is a square matrix that
contains the Pearson product-moment
correlation coefficients (often abbreviated as
Pearson's r), which measure the linear
dependence between pairs of features.
• The correlation coefficients are bounded
to the range [-1, 1].
• Two features have a
– perfect positive correlation if r = 1,
– no correlation if r = 0,
– perfect negative correlation if r = -1.
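The three cases above can be sketched with NumPy's `corrcoef`, which computes the Pearson correlation matrix directly (the features here are synthetic, constructed to hit each case):

```python
import numpy as np

rng = np.random.RandomState(2)
x = rng.randn(500)
data = np.vstack([
    x,                              # feature 0: reference
    2 * x + 0.1 * rng.randn(500),   # near-perfect positive correlation (r close to 1)
    -x + 0.1 * rng.randn(500),      # near-perfect negative correlation (r close to -1)
    rng.randn(500),                 # independent, so r close to 0
])

R = np.corrcoef(data)  # square Pearson correlation matrix, values in [-1, 1]
print(np.round(R, 2))
```

The diagonal is always 1 (each feature correlates perfectly with itself), and the matrix is symmetric.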
Quantitative Measures
• Another useful quantitative measure of a
model's performance is the so-called
Mean Squared Error (MSE).
• Sometimes it may be more useful to report
the coefficient of determination ( R2 ), which
can be understood as a standardized version
of the MSE, for better interpretability of
the model performance.
• In other words, R2 is the fraction of response
variance that is captured by the model.
• The R2 value is defined as follows:

R2 = 1 - SSE/SST = 1 - MSE/Var(y)

where SSE is the sum of squared errors and SST is
the total sum of squares of the response.
• For the training dataset, R2 is bounded
between 0 and 1, but it can become negative
for the test set.
• If R2 = 1, the model fits the data perfectly,
with a corresponding MSE = 0.
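A short sketch of computing MSE and R2 with scikit-learn's metrics, on synthetic data (an assumption for illustration). It also verifies the "standardized MSE" reading of R2:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(3)
X = rng.rand(200, 1) * 5
y = 2 * X[:, 0] + 1 + rng.randn(200) * 0.3

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)
model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)

mse = mean_squared_error(y_te, pred)
r2 = r2_score(y_te, pred)

# R2 as a standardized MSE: 1 - MSE / Var(y), matching r2_score
r2_manual = 1 - mse / np.var(y_te)
```

Because both MSE and the (population) variance divide by the same n, `r2_manual` agrees with `r2_score` exactly.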
Using regularized methods for
regression
• Regularization is one approach to tackle the
problem of overfitting: by adding additional
information, we shrink the parameter values of the
model to induce a penalty against complexity.
• The most popular approaches to
regularized linear regression are the so-called
– Ridge Regression,
– Least Absolute Shrinkage and Selection Operator
(LASSO), and
– Elastic Net.
Regularized Regression
Polynomial Regression
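Polynomial regression can be sketched as a linear model fit on expanded features: `PolynomialFeatures` turns x into [1, x, x^2], and an ordinary `LinearRegression` is fit on those terms. The quadratic data below is a synthetic assumption:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(5)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.randn(100) * 0.2

# Expand x into [1, x, x^2], then fit a plain linear model on the expanded terms
quad = PolynomialFeatures(degree=2)
X_quad = quad.fit_transform(X)
model = LinearRegression().fit(X_quad, y)
```

The model is still linear in its parameters; only the feature space is nonlinear, which is why the fitted coefficients recover the quadratic's true terms.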
Decision Tree Regression
Other Regression Models
• Random Forest regression
• Support Vector regressor
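Both models listed above follow the same fit/predict API in scikit-learn. A brief sketch on synthetic sine data (the hyperparameter values are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.RandomState(7)
X = rng.rand(300, 1) * 6
y = np.sin(X[:, 0]) + rng.randn(300) * 0.1

# Random forest: an ensemble of decision trees, averaged to reduce variance
forest = RandomForestRegressor(n_estimators=100, random_state=7).fit(X, y)

# Support vector regression with an RBF kernel and an epsilon-insensitive loss
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)

print(forest.score(X, y), svr.score(X, y))  # R^2 on the training data
```

The forest averages many deep trees, while SVR ignores residuals smaller than epsilon; both capture the nonlinear sine shape that a plain linear model would miss.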
