Linear Regression
E.g., predicting the price of a house based on its area, number of floors, number of rooms, etc.
Advertising Data (Multivariate Regression)
Simple Linear Regression
It is a simple approach for predicting a quantitative response Y on the basis of a single
predictor variable X.
It assumes that there is approximately a linear relationship between X and Y.
Mathematically, it can be written as:
Y ≈ β0 + β1X
β0 and β1 are two unknown constants that represent the intercept and slope terms in the linear model.
Together, β0 and β1 are known as the model coefficients or parameters (to be learnt).
This is also referred to as regressing Y on X.
Simple Linear Regression
For this example, it can be written as
sales ≈ β0 + β1 · TV
Once the values of β0 and β1 have been estimated using the training data, we can predict future sales on the basis of the expenditure on TV advertising.
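As a minimal sketch of how the fitted model is then used (the coefficient values below are illustrative assumptions, not values estimated from the actual Advertising data):

```python
# Minimal sketch: predicting sales from a TV advertising budget once the
# coefficients have been estimated. beta_0 and beta_1 are assumed values
# for illustration only.
beta_0 = 7.0    # assumed intercept
beta_1 = 0.05   # assumed slope (sales per unit of TV spend)

def predict_sales(tv_budget):
    """Predicted sales = beta_0 + beta_1 * TV."""
    return beta_0 + beta_1 * tv_budget

print(predict_sales(100))  # 12.0
```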
Estimating the Coefficients
Let us say we have a dataset
(x1, y1), (x2, y2), …, (xn, yn)
Now, we want to find an intercept β0 and slope β1 such that the resulting straight
line is as close as possible to the n data points.
Let ŷi = β̂0 + β̂1xi denote the predicted response for the ith observation. Then ei = yi − ŷi represents the ith residual: the difference between the ith observed response value and the ith response value predicted by the linear model.
The residual sum of squares (RSS) is then RSS = e1² + e2² + … + en².
Example: comparing two candidate models on four observations.
x    y    ŷ (Model 1)   ŷ (Model 2)   e² (Model 1)   e² (Model 2)
85   95   88            81            49             196
80   70   72            66            4              16
70   65   64            72            1              49
60   70   69            64            1              36
                                 SUM: 55        SUM: 297
The RSS of Model 1 is lower than that of Model 2, so Model 1 is preferable to Model 2.
But the question is: how do we find Model 1 (or, in general, the best model)?
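The comparison can be verified with a few lines of code; a minimal sketch using the four rows of the table above:

```python
# RSS for the two candidate models from the table above.
y       = [95, 70, 65, 70]   # observed responses
y_hat_1 = [88, 72, 64, 69]   # Model 1 predictions
y_hat_2 = [81, 66, 72, 64]   # Model 2 predictions

def rss(y_true, y_pred):
    """Residual sum of squares: sum of (y_i - y_hat_i)^2."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))

print(rss(y, y_hat_1))  # 55  (Model 1)
print(rss(y, y_hat_2))  # 297 (Model 2)
```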
Derivation
We choose β̂0 and β̂1 to minimize RSS = Σi (yi − β̂0 − β̂1xi)². Setting the partial derivative of RSS with respect to β̂0 to zero gives
β̂0 = ȳ − β̂1x̄
Now solving for β̂1:
β̂1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²
Alternative Formulas
Expanding the sums gives the computationally convenient form
β̂1 = (Σi xiyi − n·x̄·ȳ) / (Σi xi² − n·x̄²)
Sample Question
Given n = 5 observations with x̄ = 78, ȳ = 77, Σxiyi = 30500 and Σxi² = 31150, fit the regression line and predict y when x = 80.
Working
β̂1 = (30500 − 5·78·77) / (31150 − 5·6084)
    = (30500 − 30030) / (31150 − 30420)
    = 470 / 730
    ≈ 0.6438
β̂0 = ȳ − β̂1x̄ = 77 − (0.6438)(78)
    = 77 − 50.2192
    = 26.7808
Prediction at x = 80:
ŷ = 26.7808 + (0.6438)(80)
  = 26.7808 + 51.504
  = 78.2848
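A short sketch verifying this working from the given summary statistics (carrying full precision in β̂1 shifts the prediction slightly from the rounded hand calculation):

```python
# Verifying the sample question from its summary statistics.
n      = 5
x_bar  = 78
y_bar  = 77
sum_xy = 30500   # sum of x_i * y_i
sum_x2 = 31150   # sum of x_i squared

beta_1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
beta_0 = y_bar - beta_1 * x_bar

print(round(beta_1, 4))                # 0.6438
print(round(beta_0, 4))                # 26.7808
print(round(beta_0 + beta_1 * 80, 4))  # 78.2877 (full-precision beta_1)
```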
Points to Ponder
The intercept β̂0 is chosen so that the regression line passes through the point of means (x̄, ȳ).
Outliers are points that lie too far away from the regression line (or from most of the data points), often because of measurement errors.
Multivariate Linear Regression
E.g., given marks in English and marks in Mathematics, predict the GATE score.
Here, the design matrix is an m×(n+1) matrix whose first column is all 1s and whose remaining columns are the n feature columns, and β̂ has the intercept β̂0 as its first entry and the regression coefficients as the remaining n entries. The model predictions are then ŷ = Xβ̂.
Normal Equation
The β̂ vector can be computed in closed form using the normal equation:
β̂ = (XᵀX)⁻¹ Xᵀ y
Characteristics of Normal Equation Method
The Normal Equation method needs no learning rate and no iterations, but computing (XᵀX)⁻¹ becomes expensive when the number of features is large, so it is used when the number of features is small.
Exercise: find β̂ for the following data using the normal equation.
x1   x2   y
1    9    14
2    1    7
3    2    12
4    3    16
5    4    20
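A minimal sketch of the normal equation applied to the table above (np.linalg.solve is used instead of explicitly inverting XᵀX, which is numerically safer):

```python
import numpy as np

# Normal equation on the exercise data: beta_hat = (X^T X)^(-1) X^T y.
# The first column of the design matrix is all 1s (intercept term).
X = np.array([[1, 1, 9],
              [1, 2, 1],
              [1, 3, 2],
              [1, 4, 3],
              [1, 5, 4]], dtype=float)
y = np.array([14, 7, 12, 16, 20], dtype=float)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # [beta_0, beta_1, beta_2] ~ [-0.078, 3.078, 1.222]
```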
Gradient Descent
Gradient Descent is an optimization algorithm that can be used to find a local minimum of a differentiable function (the global minimum when the function is convex).
It is an iterative algorithm.
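A minimal sketch on a one-dimensional function (f(x) = (x − 3)², chosen here purely for illustration) shows the iterative update:

```python
# Gradient descent on f(x) = (x - 3)^2, whose minimum is at x = 3.
# The gradient is f'(x) = 2 * (x - 3).
def gradient_descent(x0, alpha=0.1, n_iters=50):
    x = x0
    for _ in range(n_iters):
        grad = 2 * (x - 3)     # derivative at the current point
        x = x - alpha * grad   # step opposite to the gradient
    return x

print(gradient_descent(x0=0.0))  # converges towards 3.0
```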
Notations
m = number of training examples; (x(i), y(i)) denotes the ith training example. The hypothesis is hθ(x) = θ0 + θ1x, and the cost function is
J(θ0, θ1) = (1/2m) · Σi (hθ(x(i)) − y(i))²
Matrix Notation
Stacking the examples as the rows of X (with a leading column of 1s for x0) and the targets in a vector y gives predictions Xθ and cost J(θ) = (1/2m) · (Xθ − y)ᵀ(Xθ − y).
Exercise
x0   x1   y
1    1    1
1    2    2
1    3    3
Assuming θ0 = 0, find out J(θ1) for:
a) θ1 = 1
b) θ1 = 0.5
c) θ1 = 0
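A minimal sketch computing J(θ1) for the three options, using the x1 column of the table and the 1/(2m) convention defined above:

```python
# Cost J(theta_1) with theta_0 fixed at 0:
# J(theta_1) = (1 / (2m)) * sum((theta_1 * x_i - y_i)^2)
x = [1, 2, 3]
y = [1, 2, 3]
m = len(x)

def cost(theta_1):
    return sum((theta_1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

for theta_1 in (1, 0.5, 0):
    print(theta_1, cost(theta_1))
# theta_1 = 1   -> 0.0
# theta_1 = 0.5 -> 3.5 / 6 ~ 0.583
# theta_1 = 0   -> 14 / 6  ~ 2.333
```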
Understanding Cost Function
1. Initialize the weight and bias (i.e. the regression coefficients) randomly or with 0 (both work).
2. Make predictions with this initial weight and bias.
3. Compare these predicted values with the actual values and define the loss
function using both these predicted and actual values.
4. With the help of differentiation, calculate how the loss function changes with respect to the weight and bias terms.
5. Update the weight and bias term so as to minimize the loss function.
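A minimal sketch of these five steps for simple linear regression; the data and alpha = 0.01 match the worked exercise later in this section, while the iteration count is an assumption:

```python
# Gradient descent for simple linear regression, following steps 1-5.
def fit_linear_regression(x, y, alpha=0.01, n_iters=20000):
    m = len(x)
    theta_0, theta_1 = 0.0, 0.0                       # step 1: initialize with 0
    for _ in range(n_iters):
        preds  = [theta_0 + theta_1 * xi for xi in x]          # step 2: predict
        errors = [p - yi for p, yi in zip(preds, y)]           # step 3: compare
        grad_0 = sum(errors) / m                               # step 4: dJ/d(theta_0)
        grad_1 = sum(e * xi for e, xi in zip(errors, x)) / m   #         dJ/d(theta_1)
        theta_0 -= alpha * grad_0                              # step 5: update
        theta_1 -= alpha * grad_1
    return theta_0, theta_1

print(fit_linear_regression([1, 2, 3, 4], [0.85, 1.20, 1.55, 1.90]))
# approaches (0.5, 0.35): the line y = 0.5 + 0.35x fits this data exactly
```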
Too low a value of alpha makes the algorithm move very slowly; it is said to converge too slowly.
Too high a value of alpha can make the algorithm overshoot the minimum point and thus never reach it.
Source: Andrew Ng, Machine Learning Course, Coursera
Find the values of the regression coefficients for the next two iterations of Gradient Descent. Take the initial values of the coefficients as 0. Also find the cost at each iteration. Take alpha = 0.01.
x   y
1   0.85
2   1.20
3   1.55
4   1.90
Iteration 1
θ0 := θ0 − α·(1/m)·Σi (hθ(x(i)) − y(i)) = 0 − 0.01·(1/4)·(−5.5) = 0.01375
Iteration 1: updating θ1
θ1 := θ1 − α·(1/m)·Σi (hθ(x(i)) − y(i))·x(i) = 0 − 0.01·(1/4)·(−15.5) = 0.03875
Calculating Cost
J(0.01375, 0.03875) ≈ 0.86
Iteration 2
Using the updated predictions hθ(x(i)) = 0.01375 + 0.03875·x(i):
θ0 := 0.01375 − 0.01·(1/4)·(−5.0575) = 0.026394
θ1 := 0.03875 − 0.01·(1/4)·(−14.2) = 0.07425
Calculating Cost
J(0.026394, 0.07425) ≈ 0.72
Vectorised Notation for Gradient Descent
In vectorized form, all coefficients are updated at once:
θ := θ − (α/m) · Xᵀ(Xθ − y)
Solving previous example using vectorized notation
Initially:
θ = [0, 0]ᵀ, X = [[1, 1], [1, 2], [1, 3], [1, 4]], y = [0.85, 1.20, 1.55, 1.90]ᵀ
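A minimal vectorized sketch reproducing the two iterations above with NumPy (the cost is evaluated at the freshly updated θ, as in the hand calculation):

```python
import numpy as np

# Vectorized update: theta := theta - (alpha / m) * X^T (X theta - y)
X = np.array([[1, 1],
              [1, 2],
              [1, 3],
              [1, 4]], dtype=float)    # first column is x0 = 1
y = np.array([0.85, 1.20, 1.55, 1.90])
m = len(y)
alpha = 0.01
theta = np.zeros(2)                    # initially theta = [0, 0]

for i in (1, 2):
    theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
    errors = X @ theta - y
    cost = (errors @ errors) / (2 * m)
    print(i, theta, cost)
# iteration 1: theta ~ [0.01375, 0.03875], cost ~ 0.86
# iteration 2: theta ~ [0.02639, 0.07425], cost ~ 0.72
```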