Lecture 4: Linear Neural Network and Linear Regression: Part 2

Md. Shahriar Hussain
ECE Department, North South University (NSU)
Linear Regression Single Variable

Important Equations

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Parameters: $\theta_0, \theta_1$

Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
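As a quick sanity check, the cost function above can be computed directly. A minimal NumPy sketch, with hypothetical toy data (sizes in feet², prices in $1000's):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    # J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)
    m = len(y)
    errors = theta0 + theta1 * x - y   # h_theta(x^(i)) - y^(i) for all i
    return np.sum(errors ** 2) / (2 * m)

# Hypothetical toy data: sizes (feet^2) and prices ($1000's)
x = np.array([1000.0, 1500.0, 2000.0])
y = np.array([200.0, 300.0, 400.0])
print(cost(0.0, 0.2, x, y))  # 0.0 -- theta = (0, 0.2) fits this data exactly
print(cost(0.0, 0.0, x, y))  # ~48333.3 -- a poor fit costs much more
```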

Cost Function for two parameters

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

[Figure: left, the hypothesis h(x) plotted against the training data, Price ($) in 1000's vs. Size in feet² (x); right, the cost function J(θ0, θ1) plotted over the parameters.]

Cost Function for two parameters

• Previously we plotted our cost function by plotting
  – θ1 vs. J(θ1)
• Now we have two parameters
  – The plot becomes a bit more complicated
  – It generates a 3D surface plot, whose axes are
    • X = θ1
    • Z = θ0
    • Y = J(θ0, θ1)  (Y is the vertical axis, the height of the surface)
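A sketch of how such a surface can be generated with matplotlib; the toy data and parameter ranges here are illustrative, not taken from the lecture:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical toy data
x = np.array([1000.0, 1500.0, 2000.0])
y = np.array([200.0, 300.0, 400.0])

# Grid of parameter values (ranges chosen for illustration)
t0, t1 = np.meshgrid(np.linspace(-100, 100, 100), np.linspace(-0.2, 0.6, 100))

# Evaluate J(theta0, theta1) at every grid point
J = sum((t0 + t1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * len(x))

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(t0, t1, J, cmap="viridis")
ax.set_xlabel("theta0"); ax.set_ylabel("theta1"); ax.set_zlabel("J")
plt.show()
```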

Cost Function for two parameters

• The height (Y) of the surface indicates the value of the cost function
• We need to find the point where the cost is at a minimum

Gradient descent

• We want to get min J(θ0, θ1)

• Gradient descent
  – Used all over machine learning for minimization

• Outline:
  – Start with some initial θ0, θ1
  – Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum

Gradient descent

• Start with initial guesses
  – Start at (0, 0) (or any other value)
• Keep changing θ0 and θ1 a little bit to try to reduce J(θ0, θ1)
  – Each time you change the parameters, you follow the gradient direction that reduces J(θ0, θ1) the most
• Repeat until you converge to a local minimum
• Gradient descent has an interesting property:
  – Where you start can determine which minimum you end up in
  – One initialization point may lead to one local minimum; another may lead to a different one (see the next slide); an implementation sketch follows this list
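A minimal sketch of this procedure for single-variable linear regression, using the gradient formulas derived later in this lecture; the toy data is hypothetical:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=1000):
    theta0, theta1 = 0.0, 0.0                # start at (0, 0)
    m = len(y)
    for _ in range(iters):
        errors = theta0 + theta1 * x - y     # h_theta(x) - y
        grad0 = errors.sum() / m             # dJ/dtheta0
        grad1 = (errors * x).sum() / m       # dJ/dtheta1
        # Simultaneous update: both gradients used the old parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(gradient_descent(x, y))  # approaches (0.0, 2.0), i.e. y = 2x
```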

Gradient descent

• One initialization point led to one local minimum; the other led to a different one
Gradient Descent Algorithm
• Gradient descent is used to minimize the MSE by calculating the gradient of the cost function

• The update rule — repeat until convergence, for j = 0 and j = 1:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$

• Correct (simultaneous update):
  temp0 := θ0 − α ∂J(θ0, θ1)/∂θ0
  temp1 := θ1 − α ∂J(θ0, θ1)/∂θ1
  θ0 := temp0
  θ1 := temp1
• Incorrect (sequential update): assigning θ0 before computing temp1 means the θ1 update is evaluated with the new θ0 instead of the old one
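To make the difference concrete, a small sketch; the helpers grad0/grad1 and the toy data are illustrative, not lecture code:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0]); y = np.array([2.0, 4.0, 6.0])
theta0, theta1, alpha = 0.0, 0.0, 0.1

def grad0(t0, t1): return np.mean(t0 + t1 * x - y)        # dJ/dtheta0
def grad1(t0, t1): return np.mean((t0 + t1 * x - y) * x)  # dJ/dtheta1

# Correct: both gradients are evaluated at the same (old) parameters
temp0 = theta0 - alpha * grad0(theta0, theta1)
temp1 = theta1 - alpha * grad1(theta0, theta1)
theta0, theta1 = temp0, temp1

# Incorrect: if theta0 were overwritten first, grad1 would be
# evaluated at a mixed, partially-updated point:
#   theta0 = theta0 - alpha * grad0(theta0, theta1)
#   theta1 = theta1 - alpha * grad1(theta0, theta1)
```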
Learning Rate

• Here, α is the learning rate, a hyperparameter
• It controls how big a step we take on each update
• If α is small, we take tiny steps
• If α is big, we get an aggressive gradient descent

• If α is too small, gradient descent can be slow
  – Higher training time
• If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge
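The effect of α can be seen on a toy one-dimensional cost J(θ) = θ², whose gradient is 2θ; this example is illustrative, not from the lecture:

```python
def descend(alpha, steps=50):
    theta = 1.0
    for _ in range(steps):
        theta -= alpha * 2 * theta   # theta := theta - alpha * dJ/dtheta
    return theta

print(descend(0.001))  # too small: ~0.905 -- barely moved in 50 steps
print(descend(0.5))    # well chosen: 0.0 -- reaches the minimum
print(descend(1.1))    # too large: ~9100 -- each step overshoots; diverges
```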
Local Minima

• Local minimum: the value of the loss function is minimum at that point within a local region
• Global minimum: the value of the loss function is minimum globally across the entire domain of the loss function
• For linear regression with the MSE cost, J is convex (bowl-shaped), so its only local minimum is the global minimum
• At a local minimum the gradient is zero, so the update θj := θj − α · 0 leaves the parameters unchanged: gradient descent stops there

[Figure: a loss curve with a local minimum and the global minimum marked.]
Gradient Descent Calculation

• Plugging the cost function into the update rule and taking the partial derivatives gives, for single-variable linear regression:

$\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$

$\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
Linear Regression Single Variable

• Putting it together — repeat until convergence, updating θ0 and θ1 simultaneously:

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$

$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
Linear Regression Multiple Variable

• Suppose we now have d input features instead of one, with m training examples
  – x(i) = the input features of the i-th training example
  – xj(i) = the value of feature j in the i-th training example
• Previously: $h_\theta(x) = \theta_0 + \theta_1 x$
• Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_d x_d$
• With the convention $x_0 = 1$, this can be written compactly as $h_\theta(x) = \theta^T x$
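A quick illustration of the compact form h(x) = θᵀx evaluated for all examples at once; the data here is hypothetical:

```python
import numpy as np

# 3 examples (m = 3), 2 features (d = 2); each row is [x0=1, x1, x2]
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 1.0, 0.0],
              [1.0, 4.0, 5.0]])
theta = np.array([0.5, 1.0, 2.0])  # [theta0, theta1, theta2]

h = X @ theta   # theta^T x for every example in one product
print(h)        # [ 8.5  1.5 14.5]
```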
Gradient Descent for Multi Variables

• The update rule generalizes directly — repeat until convergence, updating all θj simultaneously for j = 0, 1, …, d:

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
Gradient Descent for Multi Variables Vector Format

• Vector format: $\theta := \theta - \alpha \nabla_\theta J(\theta)$
• Rather than updating one θj at a time, we can compute the whole gradient in matrix format
• The gradient vector:

$\nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y)$

where X is the design matrix containing all training examples
• Suppose we have d features and m training examples; each training example is one row of X (with x0 = 1):

$X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & x_2^{(1)} & \dots & x_d^{(1)} \\ x_0^{(2)} & x_1^{(2)} & x_2^{(2)} & \dots & x_d^{(2)} \\ x_0^{(3)} & x_1^{(3)} & x_2^{(3)} & \dots & x_d^{(3)} \\ \vdots & \vdots & \vdots & & \vdots \\ x_0^{(m)} & x_1^{(m)} & x_2^{(m)} & \dots & x_d^{(m)} \end{bmatrix}$
• Dimensionality matching, with d features and m training examples:
  – X is m × (d + 1), θ is (d + 1) × 1, and y is m × 1
  – Xθ is m × 1, so the residual Xθ − y is m × 1
  – X^T (Xθ − y) is (d + 1) × 1: the gradient has the same shape as θ, as required (see the shape check below)
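A quick NumPy check of these shapes; the sizes here are arbitrary:

```python
import numpy as np

m, d = 5, 3
X = np.hstack([np.ones((m, 1)), np.random.rand(m, d)])  # m x (d+1), x0 = 1
theta = np.zeros(d + 1)
y = np.random.rand(m)

residual = X @ theta - y        # shape (m,)
gradient = X.T @ residual / m   # shape (d+1,) -- matches theta
print(X.shape, residual.shape, gradient.shape)  # (5, 4) (5,) (4,)
```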
• The gradient descent update rule in matrix format:

$\theta := \theta - \alpha \frac{1}{m} X^T (X\theta - y)$
Batch Gradient Descent

• This formula involves calculations over the full training set X at each gradient descent step
• This is why the algorithm is called Batch Gradient Descent; a vectorized sketch follows
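A minimal vectorized sketch, assuming the 1/(2m) cost convention used earlier in the lecture; the function name and toy data are illustrative:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.1, iters=1000):
    # X: (m, d+1) design matrix, first column all ones; y: (m,) targets
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        gradient = X.T @ (X @ theta - y) / m   # (1/m) X^T (X theta - y)
        theta -= alpha * gradient              # uses the FULL batch each step
    return theta

# Hypothetical toy data generated by y = 1 + 2*x1
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(batch_gradient_descent(X, y))  # approaches [1. 2.]
```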
• Reference:
  – Andrew Ng, Lectures on Machine Learning, Stanford University
