
Gradient Descent

Fundamental Problem in Machine Learning

• In ML, the objective is to find parameters (θ_0, θ_1, …, θ_n) such that:

\min_{\theta} \sum_{i=1}^{n} \left( h(X_i) - Y_i \right)^2

where h(X_i) is the value of Y_i predicted by the algorithm.

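As a concrete illustration, here is a minimal Python sketch of this objective, assuming a simple linear hypothesis h(x) = θ_0 + θ_1·x and made-up data points (both are assumptions, not from the slides):

```python
# Minimal sketch of the sum-of-squared-errors objective.
# The linear hypothesis h and the data (X, Y) below are illustrative assumptions.

def h(theta0, theta1, x):
    """Hypothesis: predicted value of Y for input x."""
    return theta0 + theta1 * x

def sse(theta0, theta1, X, Y):
    """Sum over i of (h(X_i) - Y_i)^2."""
    return sum((h(theta0, theta1, xi) - yi) ** 2 for xi, yi in zip(X, Y))

X = [1.0, 2.0, 3.0, 4.0]
Y = [2.1, 3.9, 6.2, 8.1]
print(sse(0.0, 2.0, X, Y))  # loss for theta = (0, 2)
```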
What is a Gradient?

• The gradient is the derivative of a function of several variables, i.e., the vector of its partial derivatives.

• The gradient can be used to find an optimal solution of a function.

[Figure: plot of f(x) = x² + 5 for x between −3 and 4]
x      f(x) = x² + 5
6.0    41.00
5.9    39.81
5.8    38.64
5.7    37.49
5.6    36.36
5.5    35.25
5.4    34.16
Derivative

f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}

We start with a random value of x and change x to x + h such that the value of the function decreases.

That means f(x + h) − f(x), and hence h·f'(x), should be negative; for a positive step h, f'(x) should be negative.
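A small sketch of this idea, reusing f(x) = x² + 5 from the earlier table (the 0.1 step size is an arbitrary choice):

```python
# Sketch: approximate f'(x) with the difference quotient and step downhill.
# Uses the earlier example f(x) = x^2 + 5; the step size 0.1 is an assumption.

def f(x):
    return x ** 2 + 5

def derivative(f, x, h=1e-6):
    """Approximate f'(x) by (f(x + h) - f(x)) / h for small h."""
    return (f(x + h) - f(x)) / h

x = 6.0
slope = derivative(f, x)         # ~12, positive, so f decreases when x decreases
x_new = x - 0.1 * slope          # move x in the direction that decreases f
print(x, f(x), x_new, f(x_new))  # 6.0 41.0 -> 4.8 ~28.0
```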

Taylor Series Expansion

f(x + h) = f(x) + h f'(x) + \frac{h^2}{2!} f''(x) + \ldots + \frac{h^n}{n!} f^{(n)}(x)

Truncating after the second-order term:

f(x + h) \approx f(x) + h f'(x) + \frac{h^2}{2!} f''(x)
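A quick numerical check of the truncated expansion, using f(x) = eˣ as an assumed test function (any smooth function would do):

```python
# Sketch: compare f(x + h) with its second-order Taylor approximation.
# f(x) = exp(x) is an illustrative choice; here f'(x) = f''(x) = exp(x).
import math

x, h = 1.0, 0.1
exact = math.exp(x + h)
approx = math.exp(x) + h * math.exp(x) + (h ** 2 / 2) * math.exp(x)
print(exact, approx)  # ~3.00417 vs ~3.00370, close for small h
```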

Newton's Method

f(x + h) \approx f(x) + h f'(x) + \frac{h^2}{2!} f''(x)

The difference between f(x + h) and f(x) in this quadratic approximation is maximized (or minimized) when:

h^* = -\frac{f'(x)}{f''(x)}

In Newton's method the value of x_{k+1} is therefore chosen as:

x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}
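A minimal sketch of this update for a single variable, reusing f(x) = x² + 5 from earlier with its derivatives coded by hand:

```python
# Sketch: Newton's method in one dimension, x_{k+1} = x_k - f'(x_k) / f''(x_k).
# Reuses the earlier example f(x) = x^2 + 5.

def f(x):        return x ** 2 + 5
def f_prime(x):  return 2 * x
def f_second(x): return 2.0

x = 6.0
for k in range(3):
    x = x - f_prime(x) / f_second(x)
    print(k + 1, x, f(x))
# Because f is quadratic, the first step already lands on the minimizer x = 0.
```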
Gradient Method (Ascent or Descent)

• Step 1: Choose the direction of search d_k: the direction is −∇f(x_k) for descent and ∇f(x_k) for ascent.

• Step 2: Find a new point x_{k+1} such that:

x_{k+1} = x_k + \alpha_k d_k

where α_k is the step size and d_k is the direction.
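A minimal sketch of the descent version of this loop (the test function, starting point, step size, and iteration count are illustrative assumptions, not from the slides):

```python
# Sketch: generic fixed-step gradient descent, x_{k+1} = x_k - alpha * grad_f(x_k).

def gradient_descent(grad_f, x0, alpha=0.1, iterations=200):
    """Repeatedly move against the gradient starting from x0 (descent)."""
    x = list(x0)
    for _ in range(iterations):
        g = grad_f(x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]  # d_k = -grad f(x_k)
    return x

# Example: f(x, y) = (x - 3)^2 + (y + 1)^2 has gradient (2(x - 3), 2(y + 1)).
print(gradient_descent(lambda p: (2 * (p[0] - 3), 2 * (p[1] + 1)), (0.0, 0.0)))
# converges toward the minimizer (3, -1)
```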

Optimal Step Size

• The optimal step size α_k is calculated by optimizing f(x_k + α_k d_k) over α_k, that is, by setting \frac{d}{d\alpha_k} f(x_k + \alpha_k d_k) = 0.
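Besides setting this derivative to zero analytically, the one-dimensional minimization over α_k can also be approximated numerically; a coarse grid search is one simple alternative. The sketch below borrows the quadratic f from the worked example on the following slides, and the grid bounds are an assumption:

```python
# Sketch: choose alpha_k by a coarse grid search over f(x_k + alpha * d_k).
# f is the quadratic from the worked example below; d_k = -grad f(x_k) for descent.

def f(x1, x2):
    return 5 * x1**2 + x2**2 + 4 * x1 * x2 - 6 * x1 - 4 * x2 + 15

xk = (0.0, 10.0)
dk = (-34.0, -16.0)                        # -grad f at (0, 10)
grid = [i / 1000 for i in range(1, 200)]   # candidate alphas in (0, 0.2)
alpha_best = min(grid, key=lambda a: f(xk[0] + a * dk[0], xk[1] + a * dk[1]))
print(alpha_best)                          # ~0.086 for this f and starting point
```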

Gradient Descent Example

f(x_1, x_2) = 5x_1^2 + x_2^2 + 4x_1x_2 - 6x_1 - 4x_2 + 15

\nabla f(x_1, x_2) = (10x_1 + 4x_2 - 6,\ 2x_2 + 4x_1 - 4)

Starting point x_1^1 = 0, x_2^1 = 10, step size α_k = 0.1:

f(0, 10) = 75

\nabla f(0, 10) = (34, 16)

x_{k+1} = x_k + \alpha_k d_k, with d_k = -\nabla f(x_k)

x_1^2 = x_1^1 - 0.1 \cdot 34 = 0 - 3.4 = -3.4
x_2^2 = x_2^1 - 0.1 \cdot 16 = 10 - 1.6 = 8.4
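The same iteration can be checked with a few lines of Python (the intermediate values match the slide; the final f value is computed here, not given on the slide):

```python
# Sketch: reproduce the single descent step above numerically.

def f(x1, x2):
    return 5 * x1**2 + x2**2 + 4 * x1 * x2 - 6 * x1 - 4 * x2 + 15

def grad_f(x1, x2):
    return (10 * x1 + 4 * x2 - 6, 2 * x2 + 4 * x1 - 4)

x1, x2, alpha = 0.0, 10.0, 0.1
print(f(x1, x2))                           # 75
g1, g2 = grad_f(x1, x2)
print(g1, g2)                              # 34 16
x1, x2 = x1 - alpha * g1, x2 - alpha * g2
print(x1, x2, f(x1, x2))                   # -3.4 8.4, and f drops to ~15.92
```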
Optimal Step Size

Maximize

f(x_1, x_2) = 4x_1 + 6x_2 - 2x_1^2 - 2x_1x_2 - 2x_2^2

x_{k+1} = x_k + \alpha_k d_k, with d_k = \nabla f(x_k) for ascent

At (x_1, x_2) = (1, 1):

d_k = \nabla f(x_k) = (4 - 4x_1 - 2x_2,\ 6 - 2x_1 - 4x_2) = (-2, 0)

x_{k+1} = x_k + \alpha_k d_k = (1, 1) + \alpha_k(-2, 0) = (1 - 2\alpha_k,\ 1)

f(\alpha_k) = 4(1 - 2\alpha_k) + 6 - 2(1 - 2\alpha_k)^2 - 2(1 - 2\alpha_k) - 2
