Recitation 3: Stochastic Gradient Descent

Stochastic gradient descent (SGD) is an optimization algorithm for minimizing loss functions. SGD approximates the gradient of the expected loss using samples and takes steps proportional to the negative of this estimated gradient, which lets it handle very large datasets more efficiently than batch gradient descent. SGD converges by iteratively updating the weights with a single random sample or mini-batch at each step; the learning rate must be set appropriately for convergence, and backtracking line search can be used to select it adaptively at each step.

Stochastic Gradient Descent

10-701 Recitation 3

Mu Li

Computer Science Department


Carnegie Mellon University

February 5, 2013
The problem

- A typical machine learning problem has a penalty/regularizer + loss form:

  $$\min_w F(w) = g(w) + \frac{1}{n}\sum_{i=1}^n f(w; y_i, x_i)$$

  where $x_i, w \in \mathbb{R}^p$, $y_i \in \mathbb{R}$, and both $g$ and $f$ are convex
- Today we only consider differentiable $f$, and let $g = 0$ for simplicity
- For example, let $f(w; y_i, x_i) = -\log p(y_i \mid x_i, w)$; then we are maximizing the log likelihood (see the sketch after this list):

  $$\max_w \frac{1}{n}\sum_{i=1}^n \log p(y_i \mid x_i, w)$$
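As a concrete instance of this setup, here is a minimal sketch (my own illustration, not from the slides) of the logistic-regression loss $f$ and its gradient for labels $y \in \{-1, +1\}$; the later sketches reuse these helper names:

```python
import numpy as np

def f(w, y, x):
    """Negative log-likelihood of one sample under logistic regression:
    f(w; y, x) = log(1 + exp(-y <w, x>)), with y in {-1, +1}."""
    return np.logaddexp(0.0, -y * np.dot(w, x))

def grad_f(w, y, x):
    """Gradient of f with respect to w for a single sample."""
    return -y * x / (1.0 + np.exp(y * np.dot(w, x)))

def F(w, X, Y):
    """Empirical loss: average of f over all n samples (g = 0)."""
    return np.mean([f(w, y, x) for x, y in zip(X, Y)])

def grad_F(w, X, Y):
    """Average gradient over all n samples."""
    return np.mean([grad_f(w, y, x) for x, y in zip(X, Y)], axis=0)
```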
Gradient Descent

- Choose an initial $w^{(0)}$, then repeat (see the sketch after this slide)

  $$w^{(t+1)} = w^{(t)} - \eta_t \nabla F(w^{(t)})$$

  until stop
- $\eta_t$ is the learning rate, and

  $$\nabla F(w^{(t)}) = \frac{1}{n}\sum_i \nabla_w f(w^{(t)}; y_i, x_i)$$

- How to stop? $\|w^{(t+1)} - w^{(t)}\| \le \epsilon$ or $\|\nabla F(w^{(t)})\| \le \epsilon$

[Figure: a two-dimensional example of the gradient descent iterates]
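A minimal sketch of this loop, reusing the hypothetical F/grad_F helpers and numpy import from the earlier snippet (the default eta and eps values are assumptions, not from the slides):

```python
def gradient_descent(w0, X, Y, eta=0.1, eps=1e-6, max_iter=1000):
    """Plain gradient descent: step along the negative average gradient
    until the gradient norm (one of the two stopping rules) is small."""
    w = w0
    for t in range(max_iter):
        g = grad_F(w, X, Y)
        if np.linalg.norm(g) <= eps:   # stop: ||grad F(w)|| <= eps
            break
        w = w - eta * g                # w^(t+1) = w^(t) - eta_t * grad F(w^(t))
    return w
```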
Learning rate matters

[Figure: two panels. Left: $\eta_t$ too small, still far from the optimum after 100 iterations. Right: $\eta_t$ too big, the iterates overshoot.]
Backtracking line search

Adaptively choose the learning rate (a sketch follows after this list):

- choose a parameter $0 < \beta < 1$
- start with $\eta = 1$, and repeat for $t = 0, 1, \ldots$
- while

  $$F(w^{(t)} - \eta \nabla F(w^{(t)})) > F(w^{(t)}) - \frac{\eta}{2}\|\nabla F(w^{(t)})\|^2$$

  update $\eta = \beta \eta$
- $w^{(t+1)} = w^{(t)} - \eta \nabla F(w^{(t)})$
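A sketch of one step of this procedure, again reusing the assumed F/grad_F helpers ($\beta = 0.8$ follows the next slide):

```python
def backtracking_step(w, X, Y, beta=0.8):
    """Shrink eta by beta until the sufficient-decrease condition
    F(w - eta*g) <= F(w) - (eta/2)*||g||^2 holds, then take the step."""
    g = grad_F(w, X, Y)
    eta = 1.0
    while F(w - eta * g, X, Y) > F(w, X, Y) - 0.5 * eta * np.dot(g, g):
        eta = beta * eta
    return w - eta * g
```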
Backtracking line search

A typical choice is $\beta = 0.8$; converged after 13 iterations:

[Figure: convergence of gradient descent with backtracking line search]

Stochastic Gradient Descent

- We call $\frac{1}{n}\sum_i f(w; y_i, x_i)$ the empirical loss; the thing we actually hope to minimize is the expected loss

  $$f(w) = \mathbb{E}_{y_i, x_i}\, f(w; y_i, x_i)$$

- Suppose we receive an infinite stream of samples $(y_t, x_t)$ from the distribution; one way to optimize the objective is

  $$w^{(t+1)} = w^{(t)} - \eta_t \nabla_w f(w^{(t)}; y_t, x_t)$$

- In practice, we simulate the stream by randomly picking $(y_t, x_t)$ from the samples we have (see the sketch after this list)
- Compare with the average gradient of GD, $\frac{1}{n}\sum_i \nabla_w f(w^{(t)}; y_i, x_i)$: SGD replaces it with a single sample's gradient, an unbiased estimate
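A minimal SGD sketch that simulates the stream by sampling one $(y_t, x_t)$ per step; the decaying schedule $\eta_t = \eta_0 / (1 + t)$ is an assumption for illustration, not from the slides:

```python
def sgd(w0, X, Y, eta0=0.5, n_steps=10000, seed=0):
    """SGD: each step uses the gradient of a single random sample,
    an unbiased estimate of the full average gradient."""
    rng = np.random.default_rng(seed)
    w = w0
    for t in range(n_steps):
        i = rng.integers(len(Y))        # simulate the stream: pick a random sample
        eta = eta0 / (1.0 + t)          # decaying learning rate (assumed schedule)
        w = w - eta * grad_f(w, Y[i], X[i])
    return w
```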
More about SGD

- The objective does not always decrease at each step
- Compared to GD, SGD needs more steps, but each step is cheaper
- Mini-batches (say, pick 100 samples and average their gradients) may accelerate convergence; a sketch follows after this list
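A mini-batch variant averaging the gradient over 100 random samples per step, as the slide suggests (the batch size default and learning-rate schedule are assumptions):

```python
def minibatch_sgd(w0, X, Y, batch=100, eta0=0.5, n_steps=1000, seed=0):
    """Mini-batch SGD: averaging over a small batch reduces the variance
    of the gradient estimate, which may accelerate convergence."""
    rng = np.random.default_rng(seed)
    w = w0
    for t in range(n_steps):
        idx = rng.integers(len(Y), size=batch)       # draw a random mini-batch
        g = np.mean([grad_f(w, Y[i], X[i]) for i in idx], axis=0)
        w = w - (eta0 / (1.0 + t)) * g
    return w
```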
Relation to Perceptron

- Recall Perceptron: initialize $w$, then repeat

  $$w \leftarrow w + \begin{cases} y_i x_i & \text{if } y_i \langle w, x_i \rangle < 0 \\ 0 & \text{otherwise} \end{cases}$$

- Fix the learning rate $\eta = 1$ and let $f(w; y_i, x_i) = \max(0, -y_i \langle w, x_i \rangle)$; then

  $$\nabla_w f(w; y_i, x_i) = \begin{cases} -y_i x_i & \text{if } y_i \langle w, x_i \rangle < 0 \\ 0 & \text{otherwise} \end{cases}$$

  so the SGD update $w \leftarrow w - \eta \nabla_w f$ is exactly the update above: we derive Perceptron from SGD (see the sketch after this list)
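A sketch of this equivalence: running SGD with $\eta = 1$ on $\max(0, -y\langle w, x\rangle)$ reproduces the Perceptron update (the epoch loop over the dataset is my own illustration):

```python
def perceptron_grad(w, y, x):
    """(Sub)gradient of max(0, -y<w, x>): -y*x on a mistake, else 0."""
    return -y * x if y * np.dot(w, x) < 0 else np.zeros_like(w)

def perceptron(w0, X, Y, n_epochs=10):
    """SGD with eta = 1 on the loss above: exactly the Perceptron update
    w <- w + y_i x_i whenever sample i is misclassified."""
    w = w0
    for _ in range(n_epochs):
        for x, y in zip(X, Y):
            w = w - 1.0 * perceptron_grad(w, y, x)
    return w
```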
Questions?
