Sheet 3 Sol 3


1. What is the purpose of gradient descent optimization in machine learning?

- It is used to minimize a loss function by iteratively moving towards the minimum of the function.
- It calculates the gradient (or slope) of the function at the current point and updates the parameters in the opposite direction to find the optimal values (see the sketch below).
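A minimal Python sketch of this loop on an assumed toy function f(w) = (w − 3)², whose minimum is at w = 3. The function, starting point, and learning rate are illustrative choices, not from the sheet:

def grad(w):
    # derivative of f(w) = (w - 3)**2
    return 2 * (w - 3)

w = 0.0        # initial guess (assumed)
alpha = 0.1    # learning rate (assumed)
for step in range(50):
    w = w - alpha * grad(w)   # step against the gradient

print(w)  # approaches 3.0, the minimizer of f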

2. What is the learning rate in gradient descent, and why is it important?

- The learning rate controls the size of the steps taken during the gradient descent updates.
- It is important because it determines how fast or slow the optimization algorithm converges:

 A small learning rate makes the convergence slow but precise.
 A large learning rate might speed up the process but can cause overshooting, making it harder to converge to the minimum. The sketch after this list illustrates both regimes.
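A small sketch on an assumed toy function f(w) = w², minimized at w = 0, with derivative f'(w) = 2w; the learning rates below are illustrative:

def run(alpha, steps=10, w=1.0):
    # repeatedly apply w <- w - alpha * f'(w), with f'(w) = 2w
    for _ in range(steps):
        w -= alpha * 2 * w
    return w

print(run(alpha=0.01))  # small rate: slow but steady progress toward 0
print(run(alpha=0.4))   # moderate rate: converges quickly
print(run(alpha=1.1))   # too large: each step overshoots and |w| grows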

3. What is the role of batch size in gradient descent optimization?

- The batch size determines how many training examples are used to compute the gradient.
- Small batch sizes give faster but noisier, less stable updates.
- Large batch sizes converge more slowly per update but give more accurate, stable gradient estimates.
- There are three common variants (compared in the sketch after this list):

 Batch Gradient Descent: uses the entire dataset at each step.
 Stochastic Gradient Descent (SGD): uses one training example at each step.
 Mini-batch Gradient Descent: uses a small subset (mini-batch) of the dataset at each step.
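A hedged sketch of the three variants on an assumed toy least-squares problem; the dataset, model, and hyperparameters are illustrative, and the point is that only batch_size changes between the variants:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # true slope is 3

def fit(batch_size, alpha=0.1, epochs=20):
    w = 0.0
    for _ in range(epochs):
        idx = rng.permutation(len(X))             # reshuffle each epoch
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            # gradient of the mean squared error on this batch
            g = 2 * np.mean((w * X[b, 0] - y[b]) * X[b, 0])
            w -= alpha * g
    return w

print(fit(batch_size=len(X)))  # batch gradient descent
print(fit(batch_size=1))       # stochastic gradient descent (SGD)
print(fit(batch_size=16))      # mini-batch gradient descent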

4. Consider the quadratic function f(x, y) = x² + 3y² + 2xy − 6x − 9y + 5. Use gradient descent to find the minimum value of the function. Start with an initial guess for (x, y) and update it iteratively using the gradient descent algorithm.

Given conditions:
- Initial guess: (1, 1)
- Learning rate: α = 0.1
- Number of iterations: 3
- Partial derivative of f(x, y) with respect to x: 2x + 2y − 6
- Partial derivative of f(x, y) with respect to y: 2x + 6y − 9

The gradient vector is ∇f(x, y) = (2x + 2y − 6, 2x + 6y − 9).

Gradient descent updates (x, y) ← (x, y) − α∇f(x, y), starting at (x, y) = (1, 1) with α = 0.1 for 3 iterations:

            Iteration 1            Iteration 2                Iteration 3
∇f(x, y)    ∇f(1, 1) = (−2, −1)    ∇f(1.2, 1.1) = (−1.4, 0)   ∇f(1.34, 1.1) = (−1.12, 0.28)
x_new       1 − 0.1(−2) = 1.2      1.2 − 0.1(−1.4) = 1.34     1.34 − 0.1(−1.12) = 1.452
y_new       1 − 0.1(−1) = 1.1      1.1 − 0.1(0) = 1.1         1.1 − 0.1(0.28) = 1.072

After three iterations, (x, y) = (1.452, 1.072), where ∇f(1.452, 1.072) = (−0.952, 0.336). The gradient is shrinking towards (0, 0): setting ∇f(x, y) = 0 gives the exact minimizer (x, y) = (2.25, 0.75), where f(2.25, 0.75) = −5.125, and further iterations would approach it.
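A short Python script to reproduce the three iterations above and the gradient at the final point; everything here comes from the problem statement:

def grad(x, y):
    # gradient of f(x, y) = x^2 + 3y^2 + 2xy - 6x - 9y + 5
    return (2 * x + 2 * y - 6, 2 * x + 6 * y - 9)

x, y = 1.0, 1.0   # initial guess
alpha = 0.1       # learning rate

for i in range(1, 4):
    gx, gy = grad(x, y)
    x, y = x - alpha * gx, y - alpha * gy
    print(f"iter {i}: grad = ({gx:.3f}, {gy:.3f}) -> (x, y) = ({x:.3f}, {y:.3f})")

gx, gy = grad(x, y)
print(f"final gradient: ({gx:.3f}, {gy:.3f})")  # shrinking toward (0, 0)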
