Regression: UNIT - V Regression Model
Contents:
➢ Introduction,
➢ types of regression.
➢ Simple regression- Types, Making predictions, Cost function, Gradient descent, Training,
Model evaluation.
➢ Multivariable regression: Growing complexity, Normalization, Making predictions,
Initialize weights, Cost function, Gradient descent, Simplifying with matrices, Bias term,
Model evaluation
Regression
The term regression is used when we try to find the relationship between
variables.
Regression analysis helps in finding the correlation between variables and enables us to
predict the continuous output variable based on one or more predictor variables. It is mainly
used for prediction, forecasting, time-series modeling, and determining the cause-and-effect
relationship between variables.
In regression, we plot a graph between the variables that best fits the given
data points; using this plot, the machine learning model can make predictions about
the data. In simple words, "Regression shows a line or curve that passes through
all the data points on the target-predictor graph in such a way that the vertical
distance between the data points and the regression line is minimum." The
distance between the data points and the line tells whether the model has captured a
strong relationship or not.
o Regression estimates the relationship between the target and the independent
variable.
o It is used to find the trends in data.
o It helps to predict real/continuous values.
o By performing the regression, we can confidently determine the most important
factor, the least important factor, and how each factor is affecting the other
factors.
Types of Regression
There are various types of regressions which are used in data science and machine
learning. Each type has its own importance on different scenarios, but at the core, all
the regression methods analyze the effect of the independent variable on dependent
variables. Here we are discussing some important types of regression which are given
below:
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression
Linear Regression:
o Linear regression is a statistical regression method which is used for predictive analysis.
o It is one of the simplest and easiest algorithms; it works on regression problems and
shows the relationship between continuous variables.
o It is used for solving the regression problem in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-
axis) and the dependent variable (Y-axis), hence called linear regression.
o If there is only one input variable (x), then such linear regression is called simple linear
regression. And if there is more than one input variable, then such linear regression is
called multiple linear regression.
o The relationship between the variables in a linear regression model can be illustrated
by a scatter plot with a fitted line; for example, predicting the salary of an employee on the
basis of years of experience, as in the sketch below.
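As an illustration, below is a minimal sketch of simple linear regression in Python using
scikit-learn; the years-of-experience and salary figures are made-up values, used only for
demonstration.

# Simple linear regression: predict salary from years of experience.
# The data below is hypothetical, for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

years = np.array([[1], [2], [3], [4], [5], [6]])               # X: years of experience
salary = np.array([30000, 35000, 42000, 48000, 55000, 61000])  # Y: salary

model = LinearRegression().fit(years, salary)
print("intercept (b0):", model.intercept_)
print("slope (b1):", model.coef_[0])
print("predicted salary for 7 years:", model.predict([[7]])[0])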
Logistic Regression:
o Logistic regression is another supervised learning algorithm which is used to solve the
classification problems. In classification problems, we have dependent variables in a
binary or discrete format such as 0 or 1.
o Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or
No, True or False, Spam or not spam, etc.
o It is a predictive analysis algorithm which works on the concept of probability.
o Logistic regression is a type of regression, but it is different from the linear regression
algorithm in terms of how it is used.
o Logistic regression uses the sigmoid function (also called the logistic function) to map
predicted values to probabilities. This sigmoid function is used to model the data in
logistic regression. The function can be represented as:
f(x) = 1 / (1 + e^(-x))
When we provide the input values (data) to this function, it produces an S-shaped curve.
o It uses the concept of threshold levels: values above the threshold level are rounded
up to 1, and values below the threshold level are rounded down to 0.
There are three types of logistic regression:
o Binary (0/1, pass/fail)
o Multi (cats, dogs, lions)
o Ordinal (low, medium, high)
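The following small Python sketch shows the sigmoid function and the threshold rule; the
0.5 threshold is the usual default, assumed here.

# Sigmoid (logistic) function and thresholding, as used in logistic regression.
import numpy as np

def sigmoid(x):
    # Maps any real value into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probs = sigmoid(scores)                # points on the S-curve
labels = (probs >= 0.5).astype(int)    # above threshold -> 1, below -> 0
print(probs)
print(labels)                          # [0 0 1 1 1]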
Polynomial Regression:
o Polynomial Regression is a type of regression which models the non-linear
dataset using a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve between the value
of x and corresponding conditional values of y.
o Suppose there is a dataset whose data points are arranged in a non-linear fashion; in
such a case, linear regression will not fit those data points well. To cover such data
points, we need polynomial regression.
o In polynomial regression, the original features are transformed into polynomial
features of a given degree and then modeled using a linear model, which means
the data points are best fitted using a polynomial curve.
o The equation for polynomial regression is also derived from the linear regression
equation: the linear regression equation Y = b0 + b1x is transformed into the polynomial
regression equation Y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ.
o Here Y is the predicted/target output, b0, b1, ..., bn are the regression coefficients, and
x is our independent/input variable.
o The model is still linear because it is linear in the coefficients, even though the
features (x², x³, ...) are non-linear in x.
Note: This is different from multiple linear regression in that, in polynomial
regression, a single variable is raised to different degrees instead of multiple
variables appearing with the same degree.
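A minimal sketch, assuming scikit-learn, of fitting a quadratic dataset by transforming the
input into polynomial features and then using a linear model:

# Polynomial regression: polynomial features + a linear model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 2 + 1.5 * x.ravel() + 0.8 * x.ravel() ** 2   # exact quadratic relationship

x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)  # [x, x^2]
model = LinearRegression().fit(x_poly, y)        # still linear in the coefficients
print("b1, b2:", model.coef_)                    # approximately [1.5, 0.8]
print("b0:", model.intercept_)                   # approximately 2.0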
Support Vector Regression:
Support Vector Machine is a supervised learning algorithm which can be used for
regression as well as classification problems. So if we use it for regression problems,
then it is termed as Support Vector Regression.
In the SVR plot, the central line is called the hyperplane, and the two parallel lines on
either side of it are known as the boundary lines.
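A minimal SVR sketch with scikit-learn; the epsilon value is an assumed setting that
controls the width of the tube formed by the boundary lines around the hyperplane.

# Support Vector Regression with a linear kernel.
import numpy as np
from sklearn.svm import SVR

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 4, 2, 4, 5])

model = SVR(kernel="linear", epsilon=0.5).fit(X, y)  # epsilon = tube half-width
print(model.predict([[6]]))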
Random Forest Regression:
o Random forest is one of the most powerful supervised learning algorithms which is
capable of performing regression as well as classification tasks.
o The Random Forest regression is an ensemble learning method which combines
multiple decision trees and predicts the final output as the average of the individual
tree outputs. The combined decision trees are called base models, and the forest
prediction can be represented more formally as:
g(x) = (f1(x) + f2(x) + ... + fN(x)) / N, where fi(x) is the prediction of the i-th tree.
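A short sketch, assuming scikit-learn's RandomForestRegressor, confirming that the forest
prediction equals the average of the individual tree predictions:

# Random Forest regression: the ensemble averages its base decision trees.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 4, 2, 4, 5])

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
tree_preds = [tree.predict([[3.0]])[0] for tree in forest.estimators_]
print(forest.predict([[3.0]])[0], "==", np.mean(tree_preds))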
Ridge Regression:
o A general linear or polynomial regression will fail if there is high collinearity between
the independent variables; to solve such problems, Ridge regression can be used.
o Ridge regression is a regularization technique which is used to reduce the complexity
of the model. It is also called L2 regularization, because it adds a penalty equal to the
sum of the squared coefficients (λ Σ bj²) to the cost function.
o It also helps to solve problems where we have more parameters than samples.
Lasso Regression:
o Lasso regression is another regularization technique to reduce the complexity of the
model.
o It is similar to the Ridge Regression except that penalty term contains only the absolute
weights instead of a square of weights.
o Since it takes absolute values, hence, it can shrink the slope to 0, whereas Ridge
Regression can only shrink it near to 0.
o It is also called L1 regularization. The cost function for Lasso regression will be:
Cost = Σ(yi − ŷi)² + λ Σ |bj|
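The sketch below, assuming scikit-learn and synthetic nearly-collinear data, contrasts the
two penalties: Lasso can shrink a weight exactly to 0, while Ridge only shrinks weights
towards 0.

# Ridge (L2) vs Lasso (L1) on two nearly collinear features.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)      # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=100)

print("ridge:", Ridge(alpha=1.0).fit(X, y).coef_)  # weight shared, both non-zero
print("lasso:", Lasso(alpha=0.1).fit(X, y).coef_)  # typically one weight is exactly 0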
Linear Regression-
In Machine Learning,
• Linear Regression is a supervised machine learning algorithm.
• It tries to find out the best linear relationship that describes the data you have.
• It assumes that there exists a linear relationship between a dependent variable
and independent variable(s).
• The value of the dependent variable of a linear regression model is a continuous
value i.e. real numbers.
Based on the number of independent variables, there are two types of linear regression-
1. Simple Linear Regression
2. Multiple Linear Regression
In simple linear regression, the dependent variable depends on only one independent
variable, and the model takes the form Y = β0 + β1X.
Here,
• Y is a dependent variable.
• X is an independent variable.
• β0 and β1 are the regression coefficients.
• β0 is the intercept or the bias that fixes the offset to a line.
• β1 is the slope or weight that specifies the factor by which X has an impact on Y.
Case-01: β1 < 0, i.e. Y decreases as X increases (negative slope).
Case-02: β1 = 0, i.e. Y does not depend on X (horizontal line).
Case-03: β1 > 0, i.e. Y increases as X increases (positive slope).
In multiple linear regression, the dependent variable depends on more than one
independent variable, and the model takes the form Y = β0 + β1X1 + β2X2 + ... + βnXn.
Here,
• Y is a dependent variable.
• X1, X2, …., Xn are independent variables.
• β0, β1,…, βn are the regression coefficients.
• βj (1 <= j <= n) is the slope or weight that specifies the factor by which Xj has an
impact on Y.
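A minimal sketch, using numpy least squares on hypothetical data, of fitting a multiple
linear regression with two independent variables; the column of ones supplies the
intercept β0.

# Multiple linear regression via least squares.
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 + 1*x2.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([5.0, 6.0, 11.0, 12.0, 16.0])

X_b = np.column_stack([np.ones(len(X)), X])      # prepend the bias column
beta, *_ = np.linalg.lstsq(X_b, y, rcond=None)   # [beta0, beta1, beta2]
print("coefficients:", beta)                     # approximately [1, 2, 1]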
First of all, we need to have some data set to design the model.
x y
1 3
2 4
3 2
4 4
5 5
Based on the above data, the best-fitting line is found using least squares. The mean of x
is x(m) = (1+2+3+4+5)/5 = 3, and the mean of y is y(m) = (3+4+2+4+5)/5 = 3.6.
x − x(m) is the horizontal distance of each point from the line x = 3.
y − y(m) is the vertical distance of each point from the line y = 3.6.
The slope is m = Σ(x − x(m))(y − y(m)) / Σ(x − x(m))² = 4/10 = 0.4, and the intercept is
c = y(m) − m·x(m) = 3.6 − 0.4·3 = 2.4, giving the regression line y = 0.4x + 2.4.
For x = 1, y = 0.4(1) + 2.4 = 2.8
For x = 2, y = 0.4(2) + 2.4 = 3.2
For x = 3, y = 0.4(3) + 2.4 = 3.6
For x = 4, y = 0.4(4) + 2.4 = 4.0
For x = 5, y = 0.4(5) + 2.4 = 4.4
To test how well our model is performing, we use a method called the
R-squared method.
R-squared method
To check how good our model is, we compare the distance between the predicted values
and the mean against the distance between the actual values and the mean; this gives
the R² formula:
R² = Σ(yp − y(m))² / Σ(y − y(m))²
If the value of R² is far from 1, then the model is not very effective.
If the value of R² is 1, then all the actual data points lie exactly on the regression
line.
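The short Python sketch below reproduces this worked example, recomputing the slope,
intercept, and R² from the five data points given above.

# Worked example check: least-squares line and R-squared by hand.
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 2, 4, 5])

m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()
print("slope:", m, "intercept:", c)     # 0.4 and 2.4

y_pred = m * x + c                      # 2.8, 3.2, 3.6, 4.0, 4.4
r2 = np.sum((y_pred - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
print("R squared:", r2)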
Conclusion
We have covered all the topics related to Linear Regression, and we have also
measured the effectiveness of the model using the R-squared method. For
example, the R² value might come close to 1 if the data concerns a company's
sales, while it might be quite low if the data comes from psychology, since
different people have different characters. So the conclusion is: the closer
the R² value is to 1, the more accurate the predicted values are.