
UNIT III

REGRESSION

CO-3: Compare different types of classification models and their relevant applications.


CONTENTS:
 Regression: Introduction, Univariate Regression – Least-
Square Method, Model Representation, Cost Functions: MSE,
MAE, R-Square, Performance Evaluation, Optimization of
Simple Linear Regression with Gradient Descent - Example.
Estimating the values of the regression coefficients
 Multivariate Regression: Model Representation

 Introduction to Polynomial Regression: Generalization-

Overfitting Vs. Underfitting, Bias Vs. Variance.


INTRODUCTION
 Regression is a supervised learning technique
 It falls under supervised learning wherein the algorithm is trained with
both input features and output labels
 Linear regression establishes the linear relationship between two variables based on a
line of best fit. Linear regression is thus graphically depicted using a straight line with the
slope defining how the change in one variable impacts a change in the other. The y-intercept
of a linear regression relationship represents the value of one variable when the value of the
other is zero.
 "Regression shows a line or curve that passes through all the datapoints on target-
predictor graph in such a way that the vertical distance between the datapoints and the
regression line is minimum." The distance between datapoints and line tells whether a model
has captured a strong relationship or not.

 Some examples of regression can be as:


 Prediction of rain using temperature and other factors
 Determining Market trends
 Prediction of road accidents due to rash driving.
LINEAR MODEL:
 Linear regression is a linear approach for modelling the relationship
between a scalar response and one or more explanatory variables (also
known as dependent and independent variables).
 The case of one explanatory variable is called
simple linear regression; for more than one, the process is
called multiple linear regression.
 Regression models a target prediction value based on
independent variables.
 It is mostly used for finding out the relationship between
variables and forecasting.
 Different regression models differ based on – the kind of
relationship between dependent and independent variables
they are considering, and the number of independent
variables getting used.
LINEAR MODEL:
 In simple linear regression analysis, each observation has two variables.
 Multiple linear regression analysis consists of two or more independent variables.
LINEAR MODEL:

The univariate linear model predicts ŷ = θ1 + θ2.x, where:
x: input training data (univariate – one input variable/parameter)
y: labels to data (supervised learning)
θ1: intercept (constant)
θ2: coefficient of x
LINEAR MODEL:
 Positive relationship

 Negative relationship
LINEAR MODEL:
TERMINOLOGIES RELATED TO THE REGRESSION
ANALYSIS:
 Dependent Variable: The main factor in regression analysis which we want to predict or understand is called the dependent variable. It is also called the target variable.

 Independent Variable: The factors which affect the dependent variable, or which are used to predict its values, are called independent variables, also known as predictors.

 Outliers: An outlier is an observation with a very low or very high value in comparison to the other observed values. An outlier may distort the result, so it should be handled carefully.
 Outliers are abnormal values in a dataset that do not follow the regular distribution and have the potential to significantly distort any regression model.

 Multicollinearity: If the independent variables are highly correlated with each other, the condition is called multicollinearity. It should not be present in the dataset because it creates problems when ranking the most influential variables.
WHY DO WE USE REGRESSION ANALYSIS?
 Regression estimates the relationship between the target and the independent
variable.
 It is used to find the trends in data.
 It helps to predict real/continuous values.
 By performing the regression, we can confidently determine the most important
factor, the least important factor, and how each factor is affecting the other
factors.

 Types of Regression
 Linear Regression
 Logistic Regression
 Polynomial Regression
 Support Vector Regression
 Decision Tree Regression
 Random Forest Regression
 Ridge Regression
 Lasso Regression
LINEAR REGRESSION

 Linear regression is one of the easiest and most popular Machine Learning algorithms.
 It is a statistical method that is used for predictive analysis.
 Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.
 The linear regression algorithm shows a linear relationship between a dependent (y) and one or more independent (x) variables, hence it is called linear regression.
 Since linear regression shows a linear relationship, it describes how the value of the dependent variable changes according to the value of the independent variable (a short fitting sketch follows below).
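As an illustration of the points above, the following is a minimal sketch, assuming scikit-learn and NumPy are installed; the experience/salary numbers are invented for demonstration and are not from these slides:

```python
# A minimal sketch, assuming scikit-learn and NumPy are installed.
# The experience/salary figures below are illustrative, not real data.
import numpy as np
from sklearn.linear_model import LinearRegression

experience = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # independent variable x
salary = np.array([30000, 35000, 41000, 44000, 50000])       # dependent variable y

model = LinearRegression()
model.fit(experience, salary)

print("intercept (a0):", model.intercept_)
print("slope (a1):", model.coef_[0])
print("predicted salary for 6 years:", model.predict([[6.0]])[0])
```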
LINEAR REGRESSION
LINEAR REGRESSION
 Mathematically, we can represent a linear regression as:
 y = a0 + a1.x + ε
 y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error
 The values of the x and y variables form the training dataset used for the Linear Regression model representation.


TYPES OF LINEAR REGRESSION

 Linear regression can be further divided into two types of algorithm:
 Simple Linear Regression:
If a single independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression
algorithm is called Simple Linear Regression.
 Multiple Linear regression:
If more than one independent variable is used to predict the
value of a numerical dependent variable, then such a Linear
Regression algorithm is called Multiple Linear Regression.
FINDING THE BEST FIT LINE

 Finding the best fit line:


 When working with linear regression, our main goal is to find the best fit line, which means the error between the predicted values and the actual values should be minimized. The best fit line will have the least error.
 Different values of the weights, or line coefficients (a0, a1), give different regression lines, so we need to calculate the best values for a0 and a1 to find the best fit line; to calculate this we use a cost function (see the sketch below).
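To make this concrete, here is a minimal sketch, assuming NumPy; the data points and the two candidate coefficient pairs are invented for illustration and show how different (a0, a1) values give different costs:

```python
# A minimal sketch (illustrative numbers, not from the slides) showing how the
# cost depends on the chosen coefficients a0 (intercept) and a1 (slope).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])

def mse(a0, a1):
    predictions = a0 + a1 * x
    return np.mean((y - predictions) ** 2)

print("cost for a0=0, a1=1:", mse(0.0, 1.0))   # a poor line, large error
print("cost for a0=1, a1=2:", mse(1.0, 2.0))   # close to the best fit, small error
```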
FINDING THE BEST FIT LINE
UNIVARIATE REGRESSION - LEAST SQUARE METHOD:

 Univariate linear regression focuses on determining the relationship between one independent (explanatory) variable and one dependent variable.
 Univariate data is the type of data in which the result depends only on one
variable.
UNIVARIATE REGRESSION:

 It is also called simple linear regression.

Features of best fit regression line


 Regression line results in minimum sum of errors.
 It does not need to go through all data points.
 It does not need the same number of data points above and below it.
MODEL REPRESENTATION:
COST FUNCTIONS:
 It is a mechanism used in supervised machine learning.
 The cost function returns the error between the predicted outcomes and the actual outcomes.
 In other words, it quantifies how costly the model's prediction errors are.
 A cost function is a measure of how wrong the model is in terms of its ability to estimate the relationship between x and y.
 Loss function: Used when we refer to the error for a single training
example.
 Cost function: Used to refer to an average of the loss functions over an
entire training dataset.
WHY ON EARTH DO WE NEED A COST FUNCTION? :
 Why on earth do we need a cost function? Consider a scenario where we
wish to classify data. Suppose we have the height & weight details of
some cats & dogs.
WHY ON EARTH DO WE NEED A COST FUNCTION? :
 Blue dots are cats & red dots are dogs. Following are some solutions
to the above classification problem.
 Essentially all three classifiers have very high accuracy but the third
solution is the best because it does not misclassify any point. The
reason why it classifies all the points perfectly is that the line is
almost exactly in between the two groups, and not closer to any one
of the groups. This is where the concept of cost function comes in.
Cost function helps us reach the optimal solution. The cost
function is the technique of evaluating “the performance of our
algorithm/model”.
REGRESSION COST FUNCTION :
 Regression models deal with predicting a continuous value
for example salary of an employee, price of a car, loan
prediction, etc. A cost function used in the regression
problem is called “Regression Cost Function”. They are
calculated on the distance-based error as follows:
 Error = y − y′
 Where,
 y – actual output
 y′ – predicted output

The most used regression cost functions are listed below:
 Mean Squared Error (MSE)
 Mean Absolute Error (MAE)
 Root Mean Squared Error (RMSE)
 R-Squared (R²) Error
MEAN SQUARED ERROR (MSE) :
 This improves the drawback we encountered in Mean Error above. Here a
square of the difference between the actual and predicted value is calculated
to avoid any possibility of negative error.
 It is measured as the average of the sum of squared differences between
predictions and actual observations.

 MSE = (sum of squared errors)/n


 It is also known as L2 loss.
 In MSE, since each error is squared, it helps to penalize even small
deviations in prediction when compared to MAE. But if our dataset has
outliers that contribute to larger prediction errors, then squaring this error
further will magnify the error many times more and also lead to higher MSE
error.
 Hence we can say that it is less robust to outliers.
MEAN ABSOLUTE ERROR (MAE):
 This cost function also addresses the shortcoming of mean
error differently. Here an absolute difference between the
actual and predicted value is calculated to avoid any
possibility of negative error.
 So in this cost function, MAE is measured as the average of
the sum of absolute differences between predictions and
actual observations.

 It is also known as L1 Loss.


 It is robust to outliers thus it will give better results even
when our dataset has noise or outliers.
Regression Metrics
EXAMPLE:
Suppose we have a regression model that predicts
house prices based on certain features. We collected
data on actual house price and the corresponding
predicted prices for a set of 5 houses:
Actual Prices(y): [200,300,400,500,600]
Predicted Prices(y^):[220,320,420,520,590]

Calculate MAE, MSE, and RMSE.
1. MEAN ABSOLUTE ERROR (MAE):

Calculate the absolute errors:


∣200−220∣=20
∣300−320∣=20
∣400−420∣=20
∣500−520∣=20
∣600−590∣=10

Sum of absolute errors:


20+20+20+20+10=90
Number of observations (n) is 5. Thus:
MAE=90/5 ​=18
2. MEAN SQUARED ERROR (MSE):
 Calculate the squared errors:
(200−220)² = 400
(300−320)² = 400
(400−420)² = 400
(500−520)² = 400
(600−590)² = 100
 Sum of squared errors: 400+400+400+400+100 = 1700
 MSE = 1700/5 = 340
 RMSE = √MSE = √340 ≈ 18.44

These metrics provide insight into the model's performance in terms of prediction accuracy and error magnitude.
Lower values of MAE and RMSE indicate better model performance in minimizing prediction error relative to the actual values (a short code sketch of these calculations follows below).
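The same numbers can be reproduced in a few lines; this is a minimal sketch, assuming NumPy, using the house-price figures from the example above:

```python
# A minimal sketch, assuming NumPy is installed; values are taken from the example above.
import numpy as np

actual = np.array([200, 300, 400, 500, 600])
predicted = np.array([220, 320, 420, 520, 590])

errors = actual - predicted
mae = np.mean(np.abs(errors))          # mean absolute error
mse = np.mean(errors ** 2)             # mean squared error
rmse = np.sqrt(mse)                    # root mean squared error

print(f"MAE  = {mae}")       # 18.0
print(f"MSE  = {mse}")       # 340.0
print(f"RMSE = {rmse:.2f}")  # 18.44
```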
GRADIENT DESCENT:
 Gradient Descent:
 Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
 A regression model uses gradient descent to update the coefficients of the line by reducing the cost function.
 This is done by starting from randomly selected coefficient values and then iteratively updating them to reach the minimum of the cost function, as sketched below.
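The following is a minimal sketch of this update loop for simple linear regression, assuming NumPy; the toy data, learning rate, and iteration count are illustrative choices, not values from these slides:

```python
# A minimal sketch of gradient descent for simple linear regression (y ≈ a0 + a1*x),
# assuming NumPy; the data and learning rate are illustrative, not from the slides.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])   # generated from y = 1 + 2x

a0, a1 = 0.0, 0.0          # initial guesses for the coefficients
learning_rate = 0.01
n = len(x)

for _ in range(5000):
    predictions = a0 + a1 * x
    error = predictions - y
    # Gradients of MSE = (1/n) * sum((a0 + a1*x - y)^2) with respect to a0 and a1
    grad_a0 = (2.0 / n) * np.sum(error)
    grad_a1 = (2.0 / n) * np.sum(error * x)
    a0 -= learning_rate * grad_a0
    a1 -= learning_rate * grad_a1

print(f"a0 ≈ {a0:.3f}, a1 ≈ {a1:.3f}")   # should approach a0 = 1, a1 = 2
```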

 Model Performance:
 The goodness of fit determines how well the regression line fits the set of observations. The process of finding the best model out of various models is called optimization. It can be achieved by the methods below:
MEAN SQUARED ERROR (MSE)
 Mean squared error (MSE) is the average of the sum of squared differences between the actual value and the predicted or estimated value. It is also termed mean squared deviation (MSD). Mathematically it is represented as:

 MSE = (1/n) Σᵢ (Yᵢ − Y′ᵢ)²

 The value of MSE is always positive or greater than zero. A value close to zero represents a better quality estimator/predictor (regression model). An MSE of zero (0) means that the predictor is a perfect predictor. Taking the square root of the MSE value gives the root mean squared error (RMSE). In the above equation, Y represents the actual value and Y′ the predicted value.
MEAN SQUARED ERROR (MSE)
MEAN ABSOLUTE ERROR (MAE)
 MAE is a very simple metric which calculates the absolute difference between actual and predicted values.
 To understand it better, suppose you have input data and output data and use linear regression, which draws a best-fit line.
 Now you have to find the MAE of your model, which is basically the mistake made by the model, known as the error. Find the difference between the actual value and the predicted value; that is the absolute error, but we have to find the mean absolute error over the complete dataset.
 So, sum all the absolute errors and divide them by the total number of observations; this is the MAE. We aim for a minimum MAE, because a lower MAE means the predictions are closer to the actual values.
MEAN ABSOLUTE ERROR (MAE)
R-SQUARED METHOD:
 R-squared is a statistical method that determines the
goodness of fit.
 It measures the strength of the relationship between the
dependent and independent variables on a scale of 0-100%.
 0% indicates that the model explains none of the variability
of the response data around its mean.
 100% indicates that the model explains all the variability of
the response data around its mean.
 A high value of R-square indicates a smaller difference between the predicted and actual values and hence represents a good model.
 It is also called the coefficient of determination, or the coefficient of multiple determination for multiple regression.
 It can be calculated from the formula below:

 R² = 1 − (SS_res / SS_tot) = 1 − Σᵢ(yᵢ − ŷᵢ)² / Σᵢ(yᵢ − ȳ)²

 where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean ȳ.
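A minimal sketch of this calculation, assuming NumPy and reusing the house-price numbers from the earlier example for illustration:

```python
# A minimal sketch of computing R-squared, assuming NumPy; reuses the
# house-price numbers from the earlier MAE/MSE example for illustration.
import numpy as np

actual = np.array([200, 300, 400, 500, 600])
predicted = np.array([220, 320, 420, 520, 590])

ss_res = np.sum((actual - predicted) ** 2)          # residual sum of squares
ss_tot = np.sum((actual - actual.mean()) ** 2)      # total sum of squares
r_squared = 1 - ss_res / ss_tot

print(f"R-squared = {r_squared:.4f}")   # close to 1, so the fit explains most of the variance
```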
LEAST-SQUARE METHOD
 The least-squares method is a form of mathematical
regression analysis used to determine the
line of best fit for a set of data, providing a visual
demonstration of the relationship between the data
points. Each point of data represents the
relationship between a known independent variable
and an unknown dependent variable.

 The best-fit slope and intercept are obtained from the means as:

 a1 = Σᵢ(xᵢ − x̄)(yᵢ − ȳ) / Σᵢ(xᵢ − x̄)²
 a0 = ȳ − a1.x̄

 Here x̄ is the mean of all the values in the input X and ȳ is the mean of all the values in the desired output Y. This is the least squares method.
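The following is a minimal sketch of these formulas, assuming NumPy; the x/y values are illustrative:

```python
# A minimal sketch of the least-squares formulas above, assuming NumPy;
# the x/y values are illustrative.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

x_mean, y_mean = x.mean(), y.mean()
a1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
a0 = y_mean - a1 * x_mean                                             # intercept

print(f"best-fit line: y = {a0:.3f} + {a1:.3f} x")
```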
SIMPLE LINEAR REGRESSION
 Simple Linear Regression is a type of Regression algorithms
that models the relationship between a dependent variable
and a single independent variable. The relationship shown
by a Simple Linear Regression model is linear or a sloped
straight line, hence it is called Simple Linear Regression.
 The key point in Simple Linear Regression is that
the dependent variable must be a continuous/real
value. However, the independent variable can be
measured on continuous or categorical values.
 Simple Linear regression algorithm has mainly two
objectives:
 Model the relationship between the two
variables. Such as the relationship between Income and
expenditure, experience and Salary, etc.
 Forecasting new observations. Such as Weather
forecasting according to temperature, Revenue of a
company according to the investments in a year, etc.
SIMPLE LINEAR REGRESSION MODEL:
 The Simple Linear Regression model can be represented
using the below equation:
 y = a0 + a1.x + ε
 Where,
 a0= It is the intercept of the Regression line (can be
obtained putting x=0)
a1= It is the slope of the regression line, which tells
whether the line is increasing or decreasing.
ε = The error term. (For a good model it will be
negligible)
MULTIVARIATE REGRESSION
 Multivariate Regression is a supervised machine learning algorithm
involving multiple data variables for analysis. A Multivariate regression
is an extension of multiple regression with one dependent variable and
multiple independent variables. Based on the number of independent
variables, we try to predict the output.
 Multivariate regression tries to find out a formula that can explain how
factors in variables respond simultaneously to changes in others.

 The equation for a model with two input variables can be written as:
 y = β0 + β1.x1 + β2.x2

 The equation for a model with three input variables can be written as:
 y = β0 + β1.x1 + β2.x2 + β3.x3

 Below is the generalized equation for the multivariate regression


model-
 y = β0 + β1.x1 + β2.x2 +….. + βn.xn
 where n represents the number of independent variables, β0…βn represent the coefficients, and x1…xn are the independent variables.
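As a concrete sketch of the two-variable case above, the following fits y = β0 + β1.x1 + β2.x2 by ordinary least squares, assuming NumPy; the data values are illustrative:

```python
# A minimal sketch of fitting y = b0 + b1*x1 + b2*x2 with ordinary least squares,
# assuming NumPy; the data below is illustrative (generated from y = 1 + 2*x1 + 1*x2).
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])            # two independent variables x1, x2
y = np.array([5.0, 6.0, 11.0, 12.0, 16.0])

# Add a column of ones so the first coefficient acts as the intercept b0
X_design = np.column_stack([np.ones(len(X)), X])
coefficients, *_ = np.linalg.lstsq(X_design, y, rcond=None)

b0, b1, b2 = coefficients
print(f"y ≈ {b0:.2f} + {b1:.2f}·x1 + {b2:.2f}·x2")
```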
COST FUNCTION
 In simple words, it is a function that assigns a cost to instances where the model deviates from the observed data. In this case, our cost is the sum of squared errors. The cost function for multiple linear regression is given by:

 J(β) = (1 / 2n) Σᵢ (ŷᵢ − yᵢ)²

 We can read this equation as the summation of the squared differences between our predicted values and the actual values, divided by twice the length of the dataset. A smaller mean squared error implies better performance. Generally, a cost function is used along with the gradient descent algorithm to find the best parameters.
 Cost functions are used to estimate how badly models are performing. Put
simply, a cost function is a measure of how wrong the model is in terms of its
ability to estimate the relationship between X and y. This is typically
expressed as a difference or distance between the predicted value and the
actual value.
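A minimal sketch of evaluating this cost for a given set of coefficients, assuming NumPy; the design matrix X (with a leading column of ones for the intercept), y, and the coefficient vector are illustrative:

```python
# A minimal sketch of the cost J(beta) = (1/2n) * sum((X·beta - y)^2) described above,
# assuming NumPy; X, y, and beta are illustrative values.
import numpy as np

def cost(beta, X, y):
    n = len(y)
    predictions = X @ beta            # X already includes a column of ones for the intercept
    return np.sum((predictions - y) ** 2) / (2 * n)

X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0]])
y = np.array([5.0, 6.0, 11.0])
beta = np.array([1.0, 2.0, 1.0])      # the exact coefficients, so the cost is 0

print(cost(beta, X, y))               # 0.0
print(cost(np.zeros(3), X, y))        # a worse parameter choice gives a larger cost
```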
POLYNOMIAL REGRESSION
 Polynomial Regression is a regression algorithm that models the
relationship between a dependent(y) and independent variable(x) as nth
degree polynomial. The Polynomial Regression equation is given below:
y = b0 + b1x1 + b2x1² + b3x1³ + … + bnx1ⁿ
 It is also called the special case of Multiple Linear Regression in ML.
Because we add some polynomial terms to the Multiple Linear
regression equation to convert it into Polynomial Regression.
 It is a linear model with some modification in order to increase the
accuracy.
 The dataset used in Polynomial regression for training is of non-linear
nature.
 It makes use of a linear regression model to fit the complicated and non-
linear functions and datasets.
 Hence, "In Polynomial regression, the original features are converted
into Polynomial features of required degree (2,3,..,n) and then modeled
using a linear model."
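A minimal sketch of exactly that idea, assuming NumPy and scikit-learn; the quadratic toy data and the degree choice are illustrative:

```python
# A minimal sketch of the idea quoted above: expand the original feature into
# polynomial features, then fit an ordinary linear model on them.
# Assumes NumPy and scikit-learn; the toy data is generated from y = x^2 + 1 (non-linear).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.0, 5.0, 10.0, 17.0, 26.0])

poly = PolynomialFeatures(degree=2)               # adds x^0, x^1, x^2 columns
x_poly = poly.fit_transform(x)

model = LinearRegression().fit(x_poly, y)
print(model.predict(poly.transform([[6.0]])))     # ≈ 37, matching 6^2 + 1
```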
NEED FOR POLYNOMIAL REGRESSION:
 The need of Polynomial Regression in ML can be
understood in the below points:
 If we apply a linear model to a linear dataset, it provides a good result, as we have seen in Simple Linear Regression; but if we apply the same model, without any modification, to a non-linear dataset, it produces poor results: the loss function will increase, the error rate will be high, and the accuracy will decrease.
 So for such cases, where the data points are arranged in a non-linear fashion, we need the Polynomial Regression model. We can understand this in a better way using a comparison diagram of a linear dataset and a non-linear dataset.
NEED FOR POLYNOMIAL REGRESSION:
 In the comparison, we take a dataset which is arranged non-linearly. If we try to cover it with a linear model, we can clearly see that it hardly covers any data point. On the other hand, a curve is suitable to cover most of the data points; that curve corresponds to the Polynomial model.
 Hence, if the datasets are arranged in a non-linear fashion, then

we should use the Polynomial Regression model instead of


Simple Linear Regression.
NEED FOR POLYNOMIAL REGRESSION:

 The three equations being compared are:
 Simple Linear Regression: y = b0 + b1x
 Multiple Linear Regression: y = b0 + b1x1 + b2x2 + … + bnxn
 Polynomial Regression: y = b0 + b1x + b2x² + … + bnxⁿ
 When we compare these three equations, we can clearly see that all three are polynomial equations but differ in the degree of the variables. The Simple and Multiple Linear equations are polynomial equations of degree one, and the Polynomial regression equation is a linear equation of degree n. So if we add a degree to our linear equations, they are converted into Polynomial Linear equations.
GENERALIZATION
 The main goal of each machine learning model
is to generalize well.
 Here, generalization defines the ability of an ML model to provide a suitable output by adapting to a given set of unknown inputs.
 It means that, after being trained on the dataset, the model can produce reliable and accurate output.
 Hence, underfitting and overfitting are the two terms that need to be checked to determine whether the model is generalizing well or not.
BIAS AND VARIANCE
 Bias: Bias is a prediction error that is introduced in
the model due to oversimplifying the machine
learning algorithms. Or it is the difference between
the predicted values and the actual values.

 Variance: If the machine learning model performs


well with the training dataset, but does not
perform well with the test dataset, then variance
occurs.
BIAS VS. VARIANCE
BIAS-VARIANCE TRADEOFF
 The two are complementary to each other. In other
words, if the bias of a model is decreased, the
variance of the model automatically increases. The
vice-versa is also true, that is if the variance of a
model decreases, bias starts to increase.
 Hence, it can be concluded that it is nearly

impossible to have a model with no bias or no


variance since decreasing one increases the other.
This phenomenon is known as the Bias-Variance Tradeoff.
BIAS-VARIANCE TRADEOFF
 Another way of looking at the Bias-Variance Tradeoff graphically is to
plot the graphical representation for error, bias, and variance versus
the complexity of the model. In the graph shown below, the green
dotted line represents variance, the blue dotted line represents bias
and the red solid line represents the error in the prediction of the
concerned model.
 Since bias is high for a simpler model and decreases with an increase
in model complexity, the line representing bias exponentially
decreases as the model complexity increases.
 Similarly, Variance is high for a more complex model and is low for
simpler models. Hence, the line representing variance increases
exponentially as the model complexity increases.
 Finally, it can be seen that on either side, the generalization
error is quite high. Both high bias and high variance lead to a
higher error rate.
 The most optimal complexity of the model is right in the middle,
where the bias and variance intersect. This part of the graph is shown
to produce the least error and is preferred.
 Also, as discussed earlier, the model underfits for high-bias
situations and overfits for high-variance situations.
OVERFITTING
 Overfitting occurs when our machine learning model tries to cover all the data points, or more than the required data points, present in the given dataset. Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and all these factors reduce the efficiency and accuracy of the model. The overfitted model has low bias and high variance.
 The chances of overfitting increase the more training we provide to our model: the more we train the model, the higher the chance of obtaining an overfitted model.
 Overfitting is the main problem that occurs in
supervised learning
UNDERFITTING
 Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data. To avoid overfitting, the feeding of training data can be stopped at an early stage, due to which the model may not learn enough from the training data. As a result, it may fail to find the best fit for the dominant trend in the data.
 In the case of underfitting, the model is not able to

learn enough from the training data, and hence it


reduces the accuracy and produces unreliable
predictions.
 An underfitted model has high bias and low

variance.
ASSIGNMENT NO:03
