
The way we do this is by taking the derivative of our cost function. The slope of the tangent line at a given point
is the derivative of the cost function at that point, and it gives us a direction to move in. We take steps down the
cost function in the direction of steepest descent, and the size of each step is determined by the parameter
α, which is called the learning rate.

The gradient descent algorithm is:

repeat until convergence:

θj := θj − α · (∂/∂θj) J(θ0, θ1)

where

j=0,1 represents the feature index number.

Intuitively, this could be thought of as:

repeat until convergence:

θj := θj − α · [slope of the tangent, i.e. the derivative of J in the θj direction]
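
To make the generic update rule concrete, here is a minimal Python sketch of a few gradient-descent steps; the cost function J, the starting point, the learning rate, and the numerical approximation of the partial derivatives are all illustrative assumptions, not part of the course material:

```python
# Minimal sketch of generic gradient descent on J(theta0, theta1).
# J, the starting point, alpha, and eps are illustrative assumptions;
# the partial derivatives are approximated numerically for simplicity.

def J(theta0, theta1):
    # Example cost surface (a simple bowl with its minimum at (3, -1)).
    return (theta0 - 3.0) ** 2 + (theta1 + 1.0) ** 2

def step(theta0, theta1, alpha=0.1, eps=1e-6):
    # Slopes of the tangent in each parameter direction (central differences).
    d0 = (J(theta0 + eps, theta1) - J(theta0 - eps, theta1)) / (2 * eps)
    d1 = (J(theta0, theta1 + eps) - J(theta0, theta1 - eps)) / (2 * eps)
    # Simultaneous update: both new values are computed from the old ones.
    return theta0 - alpha * d0, theta1 - alpha * d1

theta0, theta1 = 0.0, 0.0
for _ in range(200):           # stand-in for "repeat until convergence"
    theta0, theta1 = step(theta0, theta1)
print(theta0, theta1)          # approaches the minimum at (3, -1)
```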

Gradient Descent for Linear Regression

When specifically applied to the case of linear regression, a new form of the gradient descent equation can be
derived. We can substitute our actual cost function and our actual hypothesis function and modify the equation to
(the derivation of the formulas is out of the scope of this course, but a really great one can be found here):

repeat until convergence: {

    θ0 := θ0 − α · (1/m) · Σ_{i=1}^{m} (h_θ(xᵢ) − yᵢ)

    θ1 := θ1 − α · (1/m) · Σ_{i=1}^{m} ((h_θ(xᵢ) − yᵢ) · xᵢ)

}

where m is the size of the training set, θ0 is a constant that is updated simultaneously with θ1, and xᵢ, yᵢ are
values of the given training set (data).

Note that we have separated out the two cases for θj into separate equations for θ0 and θ1, and that for θ1 we
are multiplying by xᵢ at the end due to the derivative.

The point of all this is that if we start with a guess for our hypothesis and then repeatedly apply these gradient
descent equations, our hypothesis will become more and more accurate.
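
As a rough illustration of that loop, here is a short Python sketch of the two update equations above applied to a tiny made-up data set; the data, the learning rate, and the iteration count are assumptions chosen for the example rather than values from the course:

```python
# Batch gradient descent for linear regression with h_theta(x) = theta0 + theta1 * x.
# The data set, alpha, and num_iters are illustrative assumptions.

x = [1.0, 2.0, 3.0, 4.0]       # training inputs x_i
y = [3.0, 5.0, 7.0, 9.0]       # training targets y_i (generated from y = 1 + 2x)
m = len(x)                     # size of the training set

theta0, theta1 = 0.0, 0.0      # initial guess for the hypothesis
alpha = 0.05                   # learning rate
num_iters = 2000               # stand-in for "repeat until convergence"

for _ in range(num_iters):
    # Errors of the current hypothesis on every training example.
    errors = [(theta0 + theta1 * x[i]) - y[i] for i in range(m)]
    # Average error, and average error weighted by x_i (the two derivatives).
    grad0 = sum(errors) / m
    grad1 = sum(errors[i] * x[i] for i in range(m)) / m
    # Simultaneous update of theta0 and theta1.
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)          # converges toward theta0 ≈ 1.0, theta1 ≈ 2.0
```

Because both parameters are recomputed from the same set of errors before either is overwritten, the updates are simultaneous, exactly as the equations require.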

