Gradient Descent 5 Part 2
Key Points
Cost function:
A cost function measures the performance of a model on given data: it quantifies the error between the
predicted values and the expected values and reports it as a single real number. After making a hypothesis
with initial parameters, we compute the cost function, and with the goal of reducing it, we modify the
parameters using the gradient descent algorithm over the given data.
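For concreteness, a common cost for regression is the mean squared error (MSE); the minimal sketch below
assumes that choice, and the sample values are illustrative, not from the slides.

# Mean squared error: averages the squared gaps between expected
# and predicted values into a single real number.
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([2, 4, 6, 8], [1, 2, 3, 4]))  # 7.5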
Steps of Gradient Descent
● The algorithm starts with an initial set of parameters and updates them in small steps to minimize the cost
function.
● In each iteration of the algorithm, the gradient of the cost function with respect to each parameter is computed.
● The gradient points in the direction of steepest ascent, so moving in the opposite direction moves us in
the direction of steepest descent.
● The size of the step is controlled by the learning rate, which determines how quickly the algorithm moves
towards the minimum.
● The process is repeated until the cost function converges to a minimum, indicating that the model has reached
the optimal set of parameters.
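To make these steps concrete, here is a minimal sketch for a one-parameter cost J(w) = (w - 3)^2; the cost,
starting point, and learning rate are illustrative assumptions, not taken from the slides.

# Gradient descent on J(w) = (w - 3)^2, whose gradient is dJ/dw = 2*(w - 3).
# The minimum is at w = 3; the starting point and learning rate are assumed.
def grad_J(w):
    return 2 * (w - 3)

w = 0.0    # initial parameter
lr = 0.1   # learning rate: controls the step size
for _ in range(100):
    w = w - lr * grad_J(w)  # step opposite to the gradient
print(round(w, 4))  # converges to ~3.0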
Example

x    y
1    2
2    4
3    6
4    8
x    y    w    y-pred    MSE (y − y-pred)²    Derivative of MSE
1    2
2    4
3    6
4    8
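A minimal sketch that fills in the remaining columns of the table for the model y-pred = w·x: it computes
the predictions, the MSE, and its derivative dMSE/dw = −(2/n)·Σ x(y − wx), then takes gradient steps. The
initial weight and learning rate are assumed values, not from the slides.

# Fill in the table columns for y_pred = w*x and take gradient steps.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]   # the data follows y = 2x, so the optimal weight is 2
w, lr = 1.0, 0.1    # assumed initial weight and learning rate

for step in range(20):
    y_pred = [w * x for x in xs]
    mse = sum((y - p) ** 2 for y, p in zip(ys, y_pred)) / len(xs)
    # dMSE/dw = -(2/n) * sum(x * (y - w*x))
    grad = -2 * sum(x * (y - w * x) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # move opposite to the derivative
print(round(w, 4))  # converges to ~2.0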
Online Learning
The entire sample is not given to the network at once; instead, instances arrive one by one, and we would
like the network to update its parameters after each instance, adapting itself slowly over time (a minimal
sketch follows the advantages below).
Advantages
1. It saves us the cost of storing the training sample in external memory and of storing intermediate
results during optimization.
2. The problem may be changing in time, which means that the sample distribution is not fixed,
and a training set cannot be chosen a priori. For example, we may be implementing a speech
recognition system that adapts itself to its user.
3. There may be physical changes in the system. For example, in a robotic system, the
components of the system may wear out, or sensors may degrade.
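A minimal sketch of such per-instance (online) updating for the same y-pred = w·x model: the weight is
adjusted after every single instance, so no training set has to be stored. The stream, initial weight, and
learning rate are assumed for illustration.

# Online update: adjust w after each incoming (x, y) instance.
w = 1.0    # assumed initial weight
lr = 0.01  # assumed learning rate

def update(w, x, y, lr):
    # Per-instance squared error (y - w*x)^2 has gradient -2*x*(y - w*x) wrt w.
    return w - lr * (-2 * x * (y - w * x))

# Instances arriving one at a time, e.g. repeated passes over a stream.
stream = [(1, 2), (2, 4), (3, 6), (4, 8)] * 25
for x, y in stream:
    w = update(w, x, y, lr)
print(round(w, 4))  # drifts toward 2.0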
Stochastic Gradient Descent
Ways of Updating Weights