Gradient Descent
2 4/3/2017
Fundamental Problem in Machine Learning
\[ \min_h \sum_{i=1}^{n} \left( h(X_i) - Y_i \right)^2 \]
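The minimization above can be sketched for a hypothetical one-parameter linear model; the model h(x) = w*x and the data are illustrative choices, not from the slides.

```python
# Sum-of-squared-errors objective  sum_i (h(X_i) - Y_i)^2  for the
# hypothetical model h(x) = w*x (illustrative, not from the slides).
def squared_error(w, X, Y):
    return sum((w * x - y) ** 2 for x, y in zip(X, Y))

X = [1.0, 2.0, 3.0]
Y = [2.0, 4.0, 6.0]              # generated by y = 2x, so w = 2 is optimal
print(squared_error(2.0, X, Y))  # 0.0 at the true weight
print(squared_error(1.0, X, Y))  # 14.0 away from it
```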
What is Gradient
[Figure: plot of f(x) versus x for -3 ≤ x ≤ 4]
x      f(x) = x² + 5
6.0    41.00
5.9    39.81
5.8    38.64
5.7    37.49
5.6    36.36
5.5    35.25
5.4    34.16
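The table's neighbouring rows give a finite-difference estimate of the slope, which is what the gradient measures; a quick check against f'(x) = 2x:

```python
# Finite-difference slope from the table of f(x) = x**2 + 5: the change
# in f between neighbouring rows approximates the derivative f'(x) = 2x.
f = lambda x: x**2 + 5

x, h = 6.0, -0.1                 # step down the table, row to row
slope = (f(x + h) - f(x)) / h    # (39.81 - 41) / -0.1
print(slope)                     # ~11.9, close to f'(6) = 12
```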
Derivative
Taylor Series Expansion
\[ f(x+h) = f(x) + h f'(x) + \frac{h^2}{2!} f''(x) + \cdots + \frac{h^n}{n!} f^{(n)}(x) \]

Truncating after the second-order term gives the quadratic approximation:

\[ f(x+h) \approx f(x) + h f'(x) + \frac{h^2}{2!} f''(x) \]
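The quality of the second-order truncation can be checked numerically; f(x) = exp(x) is a convenient illustrative choice because all its derivatives are exp(x).

```python
import math

# Check the second-order Taylor approximation
#   f(x+h) ~ f(x) + h*f'(x) + (h**2/2!)*f''(x)
# for f(x) = exp(x), whose derivatives are all exp(x).
x, h = 1.0, 0.1
exact  = math.exp(x + h)
approx = math.exp(x) * (1 + h + h**2 / 2)
print(exact - approx)   # small positive error, of order h**3
```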
Newton’s Method
\[ f(x+h) \approx f(x) + h f'(x) + \frac{h^2}{2!} f''(x) \]

The maximum difference between f(x+h) and f(x) occurs when the derivative of the right-hand side with respect to h is zero, i.e. f'(x) + h f''(x) = 0:

\[ h^* = -\frac{f'(x)}{f''(x)} \]

In Newton's Method the value of x_{k+1} is chosen such that:

\[ x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)} \]
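A quick sketch of the update applied to the earlier function f(x) = x² + 5, with f'(x) = 2x and f''(x) = 2:

```python
# Newton's method  x_{k+1} = x_k - f'(x_k)/f''(x_k)  applied to
# f(x) = x**2 + 5, where f'(x) = 2x and f''(x) = 2.
def newton_step(x):
    return x - (2 * x) / 2

x = 6.0
x = newton_step(x)
print(x)  # 0.0 -- for a quadratic, a single Newton step lands on the minimum
```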
Gradient Method (Ascent or Descent)
Step 1: Choose the direction of search: d_k = -∇f(x_k) for descent, d_k = ∇f(x_k) for ascent.

\[ x_{k+1} = x_k + \lambda_k d_k \]

λ_k is the step size and d_k is the direction.
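The update rule can be sketched as a fixed-step descent loop; the test function and step size here are illustrative choices, not from the slides.

```python
# Minimal fixed-step gradient descent: x_{k+1} = x_k + lam*d_k with
# d_k = -grad f(x_k). Test function and step size are illustrative.
def gradient_descent(grad, x0, lam=0.1, steps=100):
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lam * gi for xi, gi in zip(x, g)]
    return x

# f(x, y) = x**2 + y**2 has its minimum at the origin.
grad = lambda p: [2 * p[0], 2 * p[1]]
print(gradient_descent(grad, [3.0, -4.0]))  # approaches [0, 0]
```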
Optimal Step size
Gradient Descent Example
\[ f(x_1, x_2) = 5x_1^2 + x_2^2 + 4x_1 x_2 - 6x_1 - 4x_2 + 15 \]

Starting point X¹ = (0, 10), where f(0, 10) = 75. With ∇f = (10x_1 + 4x_2 - 6, 4x_1 + 2x_2 - 4) and step size 0.1:

\[ x_{k+1} = x_k + \lambda_k d_k \]

\[ X_1^2 = X_1^1 - 0.1 \frac{\partial f}{\partial x_1} = 0 - 0.1 \times 34 = -3.4 \]
\[ X_2^2 = X_2^1 - 0.1 \frac{\partial f}{\partial x_2} = 10 - 0.1 \times 16 = 8.4 \]
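The step can be reproduced directly; the signs of the linear terms are reconstructed so that f(0, 10) = 75 and the gradient components come out to 34 and 16, matching the slide's numbers.

```python
# Reproducing the slide's first step from X1 = (0, 10) with step 0.1.
# Signs reconstructed so f(0, 10) = 75 and grad = (34, 16).
def f(x1, x2):
    return 5*x1**2 + x2**2 + 4*x1*x2 - 6*x1 - 4*x2 + 15

def grad(x1, x2):
    return (10*x1 + 4*x2 - 6, 4*x1 + 2*x2 - 4)

x1, x2 = 0.0, 10.0
print(f(x1, x2))                   # 75.0
g1, g2 = grad(x1, x2)
print(g1, g2)                      # 34.0 16.0
print(x1 - 0.1*g1, x2 - 0.1*g2)    # -3.4  8.4 (up to floating point)
```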
Optimal Step Size
Maximize

\[ f(x_1, x_2) = 4x_1 + 6x_2 - 2x_1^2 - 2x_1 x_2 - 2x_2^2 \]

using

\[ x_{k+1} = x_k + \lambda_k d_k, \qquad d_k = \nabla f(x_k) \]
Starting at (x_1, x_2) = (1, 1):

\[ d_k = \nabla f(x_k) = (4 - 4x_1 - 2x_2,\; 6 - 2x_1 - 4x_2) = (-2, 0) \]
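The optimal step size comes from an exact line search: g(λ) = f(x + λd) is quadratic in λ, so its maximizer follows from g'(λ) = 0. The λ* = 0.25 worked out below is my derivation, not given on the slide.

```python
# Exact line search for the ascent example: maximize
#   f(x1, x2) = 4*x1 + 6*x2 - 2*x1**2 - 2*x1*x2 - 2*x2**2
# along d = grad f at (1, 1). Here g(lam) = f(1 - 2*lam, 1)
# = 4 + 4*lam - 8*lam**2, so g'(lam) = 4 - 16*lam = 0 gives lam = 0.25.
def f(x1, x2):
    return 4*x1 + 6*x2 - 2*x1**2 - 2*x1*x2 - 2*x2**2

x1, x2 = 1.0, 1.0
d1, d2 = 4 - 4*x1 - 2*x2, 6 - 2*x1 - 4*x2   # gradient direction (-2.0, 0.0)

lam = 0.25                                  # from g'(lam) = 0
print(x1 + lam*d1, x2 + lam*d2)             # new point (0.5, 1.0)
print(f(x1 + lam*d1, x2 + lam*d2))          # 4.5, up from f(1, 1) = 4.0
```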