Lecture 4: Linear Neural Network and Linear Regression: Part 2

Md. Shahriar Hussain
ECE Department, North South University (NSU)
Linear Regression Single Variable

Important Equations

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Parameters: $\theta_0, \theta_1$

Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
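As a quick sanity check, the cost function above can be computed directly. A minimal NumPy sketch, with hypothetical toy data (sizes in feet², prices in $1000's):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    # J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)
    m = len(y)
    errors = theta0 + theta1 * x - y   # h_theta(x^(i)) - y^(i) for all i
    return np.sum(errors ** 2) / (2 * m)

# Hypothetical toy data: sizes (feet^2) and prices ($1000's)
x = np.array([1000.0, 1500.0, 2000.0])
y = np.array([200.0, 300.0, 400.0])
print(cost(0.0, 0.2, x, y))  # 0.0 -- theta = (0, 0.2) fits this data exactly
print(cost(0.0, 0.0, x, y))  # ~48333.3 -- a poor fit costs much more
```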

Cost Function for two parameters

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

[Figure: left, the hypothesis h(x) plotted against the training data, Price ($) in 1000's vs. Size in feet² (x); right, the cost function J(θ0, θ1) plotted over the parameters.]

Cost Function for two parameters

• Previously we plotted our cost function by plotting
  – θ1 vs. J(θ1)
• Now we have two parameters
  – The plot becomes a bit more complicated
  – It generates a 3D surface plot, whose axes are
    • X = θ1
    • Z = θ0
    • Y = J(θ0, θ1)  (Y is the vertical axis, the height of the surface)
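A sketch of how such a surface can be generated with matplotlib; the toy data and parameter ranges here are illustrative, not taken from the lecture:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical toy data
x = np.array([1000.0, 1500.0, 2000.0])
y = np.array([200.0, 300.0, 400.0])

# Grid of parameter values (ranges chosen for illustration)
t0, t1 = np.meshgrid(np.linspace(-100, 100, 100), np.linspace(-0.2, 0.6, 100))

# Evaluate J(theta0, theta1) at every grid point
J = sum((t0 + t1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * len(x))

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(t0, t1, J, cmap="viridis")
ax.set_xlabel("theta0"); ax.set_ylabel("theta1"); ax.set_zlabel("J")
plt.show()
```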

Cost Function for two parameters

• The height (Y) of the surface indicates the value of the cost function
• We need to find the point where the cost is at a minimum

Gradient descent

• We want to get min J(θ0, θ1)

• Gradient descent
  – Used all over machine learning for minimization

• Outline:
  – Start with some initial θ0, θ1
  – Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum

Gradient descent

• Start with initial guesses
  – Start at (0, 0) (or any other value)
• Keep changing θ0 and θ1 a little bit to try to reduce J(θ0, θ1)
  – Each time you change the parameters, you follow the gradient direction that reduces J(θ0, θ1) the most
• Repeat until you converge to a local minimum
• Gradient descent has an interesting property:
  – Where you start can determine which minimum you end up in
  – One initialization point may lead to one local minimum; another may lead to a different one (see the next slide); an implementation sketch follows this list
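A minimal sketch of this procedure for single-variable linear regression, using the gradient formulas derived later in this lecture; the toy data is hypothetical:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=1000):
    theta0, theta1 = 0.0, 0.0                # start at (0, 0)
    m = len(y)
    for _ in range(iters):
        errors = theta0 + theta1 * x - y     # h_theta(x) - y
        grad0 = errors.sum() / m             # dJ/dtheta0
        grad1 = (errors * x).sum() / m       # dJ/dtheta1
        # Simultaneous update: both gradients used the old parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(gradient_descent(x, y))  # approaches (0.0, 2.0), i.e. y = 2x
```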

Gradient descent

• One initialization point led to one local minimum; the other led to a different one
Gradient Descent Algorithm
• Gradient descent is used to minimize the MSE by calculating the gradient of the cost function

• The update rule — repeat until convergence, for j = 0 and j = 1:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$

• Correct (simultaneous update):
  temp0 := θ0 − α ∂J(θ0, θ1)/∂θ0
  temp1 := θ1 − α ∂J(θ0, θ1)/∂θ1
  θ0 := temp0
  θ1 := temp1
• Incorrect (sequential update): assigning θ0 before computing temp1 means the θ1 update is evaluated with the new θ0 instead of the old one
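To make the difference concrete, a small sketch; the helpers grad0/grad1 and the toy data are illustrative, not lecture code:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0]); y = np.array([2.0, 4.0, 6.0])
theta0, theta1, alpha = 0.0, 0.0, 0.1

def grad0(t0, t1): return np.mean(t0 + t1 * x - y)        # dJ/dtheta0
def grad1(t0, t1): return np.mean((t0 + t1 * x - y) * x)  # dJ/dtheta1

# Correct: both gradients are evaluated at the same (old) parameters
temp0 = theta0 - alpha * grad0(theta0, theta1)
temp1 = theta1 - alpha * grad1(theta0, theta1)
theta0, theta1 = temp0, temp1

# Incorrect: if theta0 were overwritten first, grad1 would be
# evaluated at a mixed, partially-updated point:
#   theta0 = theta0 - alpha * grad0(theta0, theta1)
#   theta1 = theta1 - alpha * grad1(theta0, theta1)
```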
Learning Rate

• Here, α is the learning rate, a hyperparameter
• It controls how big a step we take on each update
• If α is small, we take tiny steps
• If α is big, we get an aggressive gradient descent

• If α is too small, gradient descent can be slow
  – Higher training time
• If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge
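The effect of α can be seen on a toy one-dimensional cost J(θ) = θ², whose gradient is 2θ; this example is illustrative, not from the lecture:

```python
def descend(alpha, steps=50):
    theta = 1.0
    for _ in range(steps):
        theta -= alpha * 2 * theta   # theta := theta - alpha * dJ/dtheta
    return theta

print(descend(0.001))  # too small: ~0.905 -- barely moved in 50 steps
print(descend(0.5))    # well chosen: 0.0 -- reaches the minimum
print(descend(1.1))    # too large: ~9100 -- each step overshoots; diverges
```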
Local Minima

• Local minimum: the value of the loss function is minimum at that point within a local region
• Global minimum: the value of the loss function is minimum globally across the entire domain of the loss function
• For linear regression with the MSE cost, J is convex (bowl-shaped), so its only local minimum is the global minimum
• At a local minimum the gradient is zero, so the update θj := θj − α · 0 leaves the parameters unchanged: gradient descent stops there

[Figure: a loss curve with a local minimum and the global minimum marked.]
Gradient Descent Calculation

• Plugging the cost function into the update rule and taking the partial derivatives gives, for single-variable linear regression:

$\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$

$\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
Linear Regression Single Variable

• Putting it together — repeat until convergence, updating θ0 and θ1 simultaneously:

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$

$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
Linear Regression Multiple Variable

• Suppose we now have d input features instead of one, with m training examples
  – x(i) = the input features of the i-th training example
  – xj(i) = the value of feature j in the i-th training example
• Previously: $h_\theta(x) = \theta_0 + \theta_1 x$
• Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_d x_d$
• With the convention $x_0 = 1$, this can be written compactly as $h_\theta(x) = \theta^T x$
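A quick illustration of the compact form h(x) = θᵀx evaluated for all examples at once; the data here is hypothetical:

```python
import numpy as np

# 3 examples (m = 3), 2 features (d = 2); each row is [x0=1, x1, x2]
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 1.0, 0.0],
              [1.0, 4.0, 5.0]])
theta = np.array([0.5, 1.0, 2.0])  # [theta0, theta1, theta2]

h = X @ theta   # theta^T x for every example in one product
print(h)        # [ 8.5  1.5 14.5]
```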
Gradient Descent for Multi Variables

• The update rule generalizes directly — repeat until convergence, updating all θj simultaneously for j = 0, 1, …, d:

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
Gradient Descent for Multi Variables Vector Format

• Vector format: $\theta := \theta - \alpha \nabla_\theta J(\theta)$
• Rather than updating one θj at a time, we can compute the whole gradient in matrix format
• The gradient vector:

$\nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y)$

where X is the design matrix containing all training examples
• Suppose we have d features and m training examples; each training example is one row of X (with x0 = 1):

$X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & x_2^{(1)} & \dots & x_d^{(1)} \\ x_0^{(2)} & x_1^{(2)} & x_2^{(2)} & \dots & x_d^{(2)} \\ x_0^{(3)} & x_1^{(3)} & x_2^{(3)} & \dots & x_d^{(3)} \\ \vdots & \vdots & \vdots & & \vdots \\ x_0^{(m)} & x_1^{(m)} & x_2^{(m)} & \dots & x_d^{(m)} \end{bmatrix}$
• Dimensionality matching, with d features and m training examples:
  – X is m × (d + 1), θ is (d + 1) × 1, and y is m × 1
  – Xθ is m × 1, so the residual Xθ − y is m × 1
  – X^T (Xθ − y) is (d + 1) × 1: the gradient has the same shape as θ, as required (see the shape check below)
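A quick NumPy check of these shapes; the sizes here are arbitrary:

```python
import numpy as np

m, d = 5, 3
X = np.hstack([np.ones((m, 1)), np.random.rand(m, d)])  # m x (d+1), x0 = 1
theta = np.zeros(d + 1)
y = np.random.rand(m)

residual = X @ theta - y        # shape (m,)
gradient = X.T @ residual / m   # shape (d+1,) -- matches theta
print(X.shape, residual.shape, gradient.shape)  # (5, 4) (5,) (4,)
```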
• The gradient descent update rule in matrix format:

$\theta := \theta - \alpha \frac{1}{m} X^T (X\theta - y)$
Batch Gradient Descent

• This formula involves calculations over the full training set X at each gradient descent step
• This is why the algorithm is called Batch Gradient Descent; a vectorized sketch follows
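A minimal vectorized sketch, assuming the 1/(2m) cost convention used earlier in the lecture; the function name and toy data are illustrative:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.1, iters=1000):
    # X: (m, d+1) design matrix, first column all ones; y: (m,) targets
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        gradient = X.T @ (X @ theta - y) / m   # (1/m) X^T (X theta - y)
        theta -= alpha * gradient              # uses the FULL batch each step
    return theta

# Hypothetical toy data generated by y = 1 + 2*x1
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(batch_gradient_descent(X, y))  # approaches [1. 2.]
```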
• Reference:
  – Andrew Ng, Lectures on Machine Learning, Stanford University
