Linear Regression
What is a Regression?
Regression fits a line or curve through the data points on a target-predictor graph in
such a way that the vertical distance between the data points and the regression line is at a minimum. It is
used principally for prediction, forecasting, time-series modeling, and determining cause-and-effect
relationships between variables.
Linear Regression
Linear regression is a simple statistical regression method used for predictive analysis; it models
the relationship between continuous variables. Linear regression shows the linear relationship between
the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear
regression. If there is a single input variable (x), it is called simple linear regression.
And if there is more than one input variable, it is called multiple linear
regression. The linear regression model gives a sloped straight line describing the relationship between the
variables.
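In equation form (standard notation, using the same a0, a1 coefficient names that appear later in this
section), the two cases look like this:

Simple linear regression:   y = a0 + a1*x
Multiple linear regression: y = a0 + a1*x1 + a2*x2 + ... + an*xn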
Plotted on a graph, such data shows the linear relationship between the dependent and independent variables:
when the value of x (the independent variable) increases, the value of y (the dependent variable) increases
as well. The fitted line through the points is referred to as the best-fit straight line. Based on the given
data points, we try to plot the line that models the points best.
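As a minimal sketch of this idea (the data values here are made up for illustration), a best-fit line
can be computed with NumPy's least-squares polynomial fit:

import numpy as np

# Hypothetical sample data: y grows roughly linearly with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8])

# Fit a degree-1 polynomial (a straight line) by least squares.
# polyfit returns the coefficients highest power first: [slope, intercept].
a1, a0 = np.polyfit(x, y, deg=1)

print(f"best-fit line: y = {a0:.3f} + {a1:.3f}*x")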
To calculate the best-fit line, linear regression uses the traditional slope-intercept form:

y = a0 + a1*x

y = dependent variable
x = independent variable
a0 = intercept of the line
a1 = slope of the line (linear regression coefficient)
Using the mean squared error (MSE) as the cost function,

MSE = J(a0, a1) = (1/n) * sum over i of (yi - (a0 + a1*xi))^2

we change the values of a0 and a1 so that the MSE settles at its minimum. The model parameters a0
(intercept) and a1 (slope) can be manipulated to minimize the cost function, and they can be determined
using the gradient descent method so that the cost function value is at a minimum.
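As a small sketch of this cost function (reusing the hypothetical data from the earlier snippet):

import numpy as np

def mse(a0, a1, x, y):
    # Mean squared error of the line y_hat = a0 + a1*x against data (x, y).
    y_hat = a0 + a1 * x
    return np.mean((y - y_hat) ** 2)

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8])

# A better-fitting line yields a lower cost than a worse one.
print(mse(0.0, 1.00, x, y))  # rough guess: MSE = 0.028
print(mse(0.2, 0.95, x, y))  # near the least-squares fit: MSE = 0.0215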
Gradient descent
Gradient descent is a method of updating a0 and a1 to minimize the cost function (MSE). A regression
model uses gradient descent to update the coefficients of the line (a0 and a1): it starts from a random
selection of coefficient values and then iteratively updates them to reach the minimum of the cost
function.
Imagine a pit in the shape of a U. You are standing at the topmost point of the pit, and your objective is to
reach the bottom, where a treasure lies, and you can only take a discrete number of steps to reach
it. If you decide to take one footstep at a time, you will eventually get to the bottom of the pit,
but this will take a long time. If you choose to take longer steps each time, you may get there sooner, but
there is a chance that you overshoot the bottom of the pit and end up not near the bottom at all. In the
gradient descent algorithm, the size of the steps you take is the learning rate, and this decides how fast
the algorithm converges to the minimum.
To update a0 and a1, we take gradients from the cost function. To find these gradients, we take the partial
derivatives with respect to a0 and a1:

∂J/∂a0 = (2/n) * sum over i of (y_hat_i - yi)
∂J/∂a1 = (2/n) * sum over i of (y_hat_i - yi) * xi

where y_hat_i = a0 + a1*xi is the prediction for the i-th point. The partial derivatives are the gradients,
and they are used to update the values of a0 and a1:

a0 = a0 - alpha * ∂J/∂a0
a1 = a1 - alpha * ∂J/∂a1

Alpha is the learning rate.
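A minimal sketch of this update loop, assuming the same hypothetical data as above and an illustrative
learning rate:

import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8])
n = len(x)

a0, a1 = 0.0, 0.0  # start from arbitrary coefficient values
alpha = 0.05       # learning rate

for _ in range(1000):
    y_hat = a0 + a1 * x
    # Partial derivatives of the MSE with respect to a0 and a1.
    grad_a0 = (2.0 / n) * np.sum(y_hat - y)
    grad_a1 = (2.0 / n) * np.sum((y_hat - y) * x)
    # Step both coefficients a small amount against the gradient.
    a0 -= alpha * grad_a0
    a1 -= alpha * grad_a1

print(f"fitted line: y = {a0:.3f} + {a1:.3f}*x")  # approaches y = 0.190 + 0.950*x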
Impact of different values for learning rate
The blue line represents an optimal value of the learning rate: the cost function is minimized in
a few iterations. The green line represents a learning rate lower than the optimal value: the
number of iterations required to minimize the cost function is high. If the learning rate selected is very
high, the cost function can keep increasing with iterations and saturate at a value higher than the minimum,
as represented by the red and black lines.
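A quick way to see this behavior is to rerun the gradient descent sketch above with different learning
rates and compare the final cost (the specific alpha values are illustrative, chosen for this toy data set):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8])
n = len(x)

def final_mse(alpha, steps=200):
    # Run gradient descent with the given learning rate; return the final MSE.
    a0, a1 = 0.0, 0.0
    for _ in range(steps):
        y_hat = a0 + a1 * x
        a0 -= alpha * (2.0 / n) * np.sum(y_hat - y)
        a1 -= alpha * (2.0 / n) * np.sum((y_hat - y) * x)
    return np.mean((y - (a0 + a1 * x)) ** 2)

for alpha in (0.0001, 0.05, 0.1):
    print(f"alpha={alpha}: final MSE = {final_mse(alpha):.4g}")

# alpha=0.0001: too low, still far from the minimum after 200 steps
# alpha=0.05:   converges to a cost near the minimum (about 0.021)
# alpha=0.1:    too high, the updates overshoot and the cost explodes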
Advantages and Disadvantages of Linear Regression
Advantages: When the independent and dependent variables have a linear relationship, this algorithm is the
best one to use because of its lower complexity compared to other algorithms.

Disadvantages: Conversely, linear regression assumes a linear relationship between the dependent and
independent variables, that is, a straight-line relationship between them. It also assumes independence
between attributes.