Regression in Machine Learning (ML)
What is Regression?
• Regression analysis is a statistical method for modeling the relationship
between a dependent (target) variable and one or more independent
(predictor) variables. More specifically, regression analysis helps us
understand how the value of the dependent variable changes with one
independent variable while the other independent variables are held
fixed. It predicts continuous (real) values such as temperature, age,
salary, and price.
What is Regression? (cont.)
• Suppose a marketing company A runs various advertisements
every year and records the resulting sales. The list below
shows the company's advertising spend over the last 5 years
and the corresponding sales.
• Now the company wants to spend $200 on advertising in
2019 and wants a prediction of the sales for that year.
Solving this type of prediction problem in machine
learning calls for regression analysis.
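As a rough sketch of how such a prediction could be made, the snippet below fits a straight line by ordinary least squares and evaluates it at an advertising spend of $200. The spend and sales figures are made-up illustrative values, not the company's actual list, and the helper name `fit_line` is an assumption of this example.

```python
# Illustrative (made-up) data: yearly advertising spend and sales, in dollars.
ad_spend = [90.0, 120.0, 150.0, 100.0, 130.0]
sales = [1000.0, 1300.0, 1800.0, 1200.0, 1380.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 200.0 + intercept
print(f"Predicted sales for $200 of advertising: {predicted_sales:.2f}")
```

Note that $200 lies outside the range of the illustrative data, so the result is an extrapolation and should be read with caution.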
What is Regression? (cont.)
• Regression is a supervised learning technique that helps find the
correlation between variables and enables us to predict a
continuous output variable from one or more predictor
variables. It is mainly used for prediction, forecasting, time series
modeling, and investigating cause-and-effect relationships between
variables.
• In regression, we fit a line or curve to the given datapoints on a
graph of the variables; the machine learning model then uses this fit
to make predictions. In simple words, regression finds a line or curve
through the datapoints on the target-predictor graph such that the
vertical distances between the datapoints and the regression line are
as small as possible. The line generally does not pass through every
datapoint. The distance between the datapoints and the line indicates
whether the model has captured a strong relationship.
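The idea of minimizing vertical distances can be sketched as follows, assuming the common least-squares criterion (the sum of squared vertical distances). The datapoints and candidate lines are made up for illustration, and `sum_squared_error` is a hypothetical helper name.

```python
def sum_squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Illustrative (made-up) datapoints.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

# Candidate lines as (slope, intercept). The first is the least-squares
# line for these points; the other two are arbitrary alternatives.
candidates = [(1.9, 0.0), (2.0, 0.0), (1.5, 1.0)]

errors = [sum_squared_error(xs, ys, a, b) for a, b in candidates]
for (a, b), err in zip(candidates, errors):
    print(f"y = {a}x + {b}: sum of squared errors = {err:.2f}")
```

The second candidate, y = 2x, passes exactly through three of the four points, yet its total squared error is larger than that of the least-squares line, which touches none of them.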
Terminologies Related to the Regression Analysis:
• Dependent Variable: The main factor in regression analysis that we want to
predict or understand is called the dependent variable. It is also called the
target variable.
• Independent Variable: The factors that affect the dependent variable, or that
are used to predict its values, are called independent variables, also known as
predictors.
• Outliers: An outlier is an observation with a very low or very high value
compared to the other observed values. Outliers can distort the result, so
they should be identified and handled with care.
• Multicollinearity: If the independent variables are highly correlated with each
other, the condition is called multicollinearity. It is undesirable in the
dataset because it makes it hard to rank the variables by how strongly they
affect the target.
• Underfitting and Overfitting: If our algorithm works well on the training
dataset but not on the test dataset, the problem is called overfitting.
If our algorithm does not perform well even on the training dataset, the
problem is called underfitting.
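One simple way to spot multicollinearity, among others such as variance inflation factors, is to compute the correlation between pairs of predictors. The sketch below uses the Pearson correlation coefficient; the predictor values and the names `pearson_r`, `years_experience`, `months_experience`, and `commute_km` are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative (made-up) predictors for a salary model.
years_experience = [1.0, 3.0, 5.0, 7.0, 9.0]
months_experience = [12.0, 36.0, 60.0, 84.0, 108.0]  # same information, other units
commute_km = [12.0, 3.0, 20.0, 8.0, 15.0]

r_redundant = pearson_r(years_experience, months_experience)
r_unrelated = pearson_r(years_experience, commute_km)
print(f"years vs. months of experience: r = {r_redundant:.3f}")
print(f"years of experience vs. commute: r = {r_unrelated:.3f}")
```

A correlation close to 1 or -1 between two predictors suggests they carry nearly the same information, so one of them is a candidate for removal.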
Types of Regression
Linear Regression:
• Linear regression is a statistical regression method which is used for
predictive analysis.
• It is one of the simplest regression algorithms and shows the
relationship between continuous variables.
• It is used for solving regression problems in machine learning.
• Linear regression models a linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), hence the name
linear regression.
• If there is only one input variable (x), the model is called simple
linear regression. If there is more than one input variable, it is
called multiple linear regression.
• The relationship between variables in the linear regression model can be
explained using the image below. Here we are predicting the salary of an
employee on the basis of years of experience.
Linear Regression: (cont.)
• Y = aX + b
• Here, Y = dependent variable (target variable)
• X = independent variable (predictor variable)
• a and b are the linear coefficients (a is the slope, b is the intercept)
• In linear regression, coefficients are the values that multiply the predictor values.
Suppose you have the following regression equation: Y = 3X + 5. In this equation, +3 is the
coefficient, X is the predictor, and +5 is the constant.
• Some popular applications of linear regression are:
• Analyzing trends and sales estimates
• Salary forecasting
• Real estate prediction
• Arriving at ETAs in traffic
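The equation Y = aX + b above can be evaluated directly. The following minimal sketch computes the example Y = 3X + 5 and extends it to multiple linear regression with one coefficient per predictor. The function names and the two-predictor model are hypothetical, chosen only for illustration.

```python
def predict_simple(x, a, b):
    """Simple linear regression: one predictor, Y = a*X + b."""
    return a * x + b


def predict_multiple(xs, coefficients, intercept):
    """Multiple linear regression: one coefficient per predictor."""
    return sum(c * x for c, x in zip(coefficients, xs)) + intercept


# The example equation Y = 3X + 5 evaluated at X = 2.
y_simple = predict_simple(2, a=3, b=5)
print(y_simple)  # 3*2 + 5 = 11

# A hypothetical two-predictor model: Y = 3*X1 + 2*X2 + 5 at X1 = 2, X2 = 4.
y_multiple = predict_multiple([2, 4], coefficients=[3, 2], intercept=5)
print(y_multiple)  # 3*2 + 2*4 + 5 = 19
```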
Logistic Regression:
• Logistic regression is another supervised learning algorithm, used
to solve classification problems. In classification problems, the
dependent variable is binary or discrete, such as 0 or 1.
• The logistic regression algorithm works with categorical variables such
as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
• It is a predictive analysis algorithm based on the concept of
probability.
• Logistic regression is a type of regression, but it differs from the
linear regression algorithm in how it is used.
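As a minimal sketch of the probability idea, the snippet below assumes the standard logistic (sigmoid) function to turn a linear score into a probability between 0 and 1, and then applies a threshold of 0.5 to get a class label. The weight and bias are hand-picked hypothetical values, not learned from data, and the function names are assumptions of this example.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(x, weight, bias):
    """Probability of class 1 for a single feature x."""
    return sigmoid(weight * x + bias)


def predict_label(x, weight, bias, threshold=0.5):
    """Class label: 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_probability(x, weight, bias) >= threshold else 0


# Hypothetical, hand-picked parameters (not learned from data).
weight, bias = 1.5, -3.0
for x in [0.0, 2.0, 4.0]:
    p = predict_probability(x, weight, bias)
    print(f"x = {x}: P(class 1) = {p:.3f}, label = {predict_label(x, weight, bias)}")
```

Unlike linear regression, the output here is a probability that is converted into a discrete class, which is why logistic regression is used for classification.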
Logistic Regression: (cont.)
• Logistic regression uses the sigmoid function (also called the logistic
function), which maps any real-valued input to a value between 0 and 1.
This sigmoid function is used to model the data in logistic regression.
The function can be represented as: