
Optimization Techniques: Comprehensive Notes
Introduction to Optimization
Optimization is the process of finding the best possible solution to a problem under given circumstances. In mathematical
terms, it involves finding the minimum or maximum of an objective function subject to constraints.

Types of Optimization Problems


1. Based on Nature of Variables
Continuous Optimization
Discrete Optimization (Integer Programming)
Mixed-Integer Optimization

2. Based on Nature of Objective Function


Linear Programming (LP)
Nonlinear Programming (NLP)
Quadratic Programming (QP)
Convex Optimization
Non-convex Optimization

3. Based on Constraints
Unconstrained Optimization
Constrained Optimization
Equality Constrained
Inequality Constrained

First-Order Optimization Methods


1. Gradient Descent
Basic Algorithm
θ(t+1) = θ(t) - η∇f(θ(t))
where:
- θ: Parameters
- η: Learning rate
- ∇f: Gradient of objective function

Variants

1. Batch Gradient Descent
Uses the entire dataset per update
More stable
Computationally expensive

2. Stochastic Gradient Descent (SGD)
Updates parameters using a single sample
Faster but noisier
Formula: θ = θ - η∇f(θ; x(i), y(i))

3. Mini-batch Gradient Descent
Compromise between batch and SGD
Typical batch size: 32-512
Better convergence stability
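
A minimal NumPy sketch of mini-batch gradient descent, assuming a user-supplied grad_fn(theta, X_batch, y_batch); the function and parameter names are illustrative, not a reference implementation.

# Mini-batch gradient descent sketch (grad_fn, theta, batch_size are
# illustrative assumptions, not a fixed API)
import numpy as np

def minibatch_gd(grad_fn, theta, X, y, lr=0.01, batch_size=64, epochs=10):
    n = len(y)
    for _ in range(epochs):
        idx = np.random.permutation(n)            # shuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            theta = theta - lr * grad_fn(theta, X[batch], y[batch])
    return theta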

2. Momentum-based Methods
Classical Momentum

v(t) = γv(t-1) + η∇f(θ(t))
θ(t+1) = θ(t) - v(t)
where:
- γ: Momentum coefficient
- v: Velocity vector

Nesterov Accelerated Gradient (NAG)

v(t) = γv(t-1) + η∇f(θ(t) - γv(t-1))
θ(t+1) = θ(t) - v(t)
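
The two updates differ only in where the gradient is evaluated. A sketch of a single step of each, with grad_fn and the default coefficients assumed for illustration:

# One parameter update with classical momentum; gamma is the momentum
# coefficient, lr the learning rate (names are illustrative)
import numpy as np

def momentum_step(grad_fn, theta, v, lr=0.01, gamma=0.9):
    v = gamma * v + lr * grad_fn(theta)      # update velocity
    theta = theta - v                        # move along the velocity
    return theta, v

# Nesterov variant: look ahead before computing the gradient
def nag_step(grad_fn, theta, v, lr=0.01, gamma=0.9):
    v = gamma * v + lr * grad_fn(theta - gamma * v)
    theta = theta - v
    return theta, v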

3. Adaptive Learning Rate Methods


AdaGrad
Adapts learning rate per parameter
Accumulates squared gradients

g(t) = ∇f(θ(t))
s(t) = s(t-1) + g(t)²
θ(t+1) = θ(t) - η/√(s(t) + ε) * g(t)

RMSprop
Exponentially decaying average of squared gradients

s(t) = βs(t-1) + (1-β)g(t)²
θ(t+1) = θ(t) - η/√(s(t) + ε) * g(t)

Adam (Adaptive Moment Estimation)
Combines momentum and RMSprop with bias correction

m(t) = β₁m(t-1) + (1-β₁)g(t)
v(t) = β₂v(t-1) + (1-β₂)g(t)²
m̂(t) = m(t)/(1-β₁ᵗ)
v̂(t) = v(t)/(1-β₂ᵗ)
θ(t+1) = θ(t) - η * m̂(t)/(√v̂(t) + ε)
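
A compact NumPy sketch of the Adam update above with the state (m, v, t) kept explicit; the hyperparameter defaults are the commonly cited values and grad_fn is an assumed callable.

# Adam update following the formulas above (beta1, beta2, eps defaults
# are the commonly used values; grad_fn and names are illustrative)
import numpy as np

def adam_step(grad_fn, theta, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    g = grad_fn(theta)
    m = beta1 * m + (1 - beta1) * g           # first moment
    v = beta2 * v + (1 - beta2) * g**2        # second moment
    m_hat = m / (1 - beta1**t)                # bias correction (t starts at 1)
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v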

Second-Order Optimization Methods


1. Newton's Method
Uses second derivatives (Hessian)

θ(t+1) = θ(t) - [H(θ(t))]⁻¹∇f(θ(t))
where H is the Hessian matrix
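
A sketch of one Newton step that solves the linear system H·d = ∇f instead of forming the inverse explicitly; grad_fn and hess_fn are assumed callables returning the gradient and Hessian.

# One Newton step: solve H d = grad rather than inverting the Hessian
# (grad_fn and hess_fn are assumed callables)
import numpy as np

def newton_step(grad_fn, hess_fn, theta):
    g = grad_fn(theta)
    H = hess_fn(theta)
    d = np.linalg.solve(H, g)     # Newton direction
    return theta - d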

2. Quasi-Newton Methods
BFGS (Broyden-Fletcher-Goldfarb-Shanno)
Approximates the Hessian (or its inverse) from successive gradients
Stores a dense matrix
Avoids computing exact second derivatives, unlike Newton's method

L-BFGS (Limited-memory BFGS)
Stores only a few vectors instead of the full matrix
More memory efficient than BFGS
Suitable for large-scale problems
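
In practice L-BFGS is usually called through a library rather than implemented by hand; a sketch using SciPy's minimize with the L-BFGS-B method on a toy quadratic objective (the objective is only a placeholder):

# Using SciPy's L-BFGS-B on a toy quadratic objective (placeholder f)
import numpy as np
from scipy.optimize import minimize

def f(x):
    return np.sum((x - 3.0)**2)

def grad_f(x):
    return 2.0 * (x - 3.0)

x0 = np.zeros(5)
res = minimize(f, x0, jac=grad_f, method='L-BFGS-B')
print(res.x)   # approximately [3, 3, 3, 3, 3]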

Constrained Optimization Techniques


1. Lagrange Multipliers
For equality constraints gᵢ(x) = 0

L(x,λ) = f(x) + Σᵢλᵢgᵢ(x)
where:
- f(x): Objective function
- gᵢ(x): Constraint functions
- λᵢ: Lagrange multipliers
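
A small worked example: minimize f(x, y) = x² + y² subject to x + y = 1. Setting the partial derivatives of the Lagrangian to zero gives x = y = 1/2 with λ = -1; the SymPy sketch below just carries out that algebra.

# Worked Lagrange-multiplier example: min x^2 + y^2  s.t.  x + y = 1
import sympy as sp

x, y, lam = sp.symbols('x y lam')
L = x**2 + y**2 + lam * (x + y - 1)           # Lagrangian

# Stationarity: all partial derivatives of L vanish
sol = sp.solve([sp.diff(L, v) for v in (x, y, lam)], (x, y, lam))
print(sol)   # {x: 1/2, y: 1/2, lam: -1}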

2. KKT Conditions
For inequality constraints gᵢ(x) ≤ 0

∇f(x*) + Σᵢλᵢ∇gᵢ(x*) = 0   (stationarity)
gᵢ(x*) ≤ 0   (primal feasibility)
λᵢ ≥ 0   (dual feasibility)
λᵢgᵢ(x*) = 0   (complementary slackness)

3. Penalty Methods
Convert constrained to unconstrained
Add penalty term for constraint violation

P(x) = f(x) + c*Σᵢmax(0,gᵢ(x))²
where c is the penalty parameter
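
A sketch of the penalty method for a single inequality constraint g(x) ≤ 0, re-minimizing the penalized objective while increasing c; the toy objective, constraint, and schedule for c are all illustrative.

# Quadratic penalty method sketch: min f(x)  s.t.  g(x) <= 0
# (f, g, and the schedule for c are illustrative choices)
import numpy as np
from scipy.optimize import minimize

def f(x):
    return (x[0] - 2.0)**2 + (x[1] - 2.0)**2

def g(x):
    return x[0] + x[1] - 1.0          # constraint: x0 + x1 <= 1

def penalized(x, c):
    return f(x) + c * max(0.0, g(x))**2

x = np.array([0.0, 0.0])
for c in [1.0, 10.0, 100.0, 1000.0]:  # gradually tighten the penalty
    x = minimize(penalized, x, args=(c,), method='Nelder-Mead').x
print(x)   # approaches [0.5, 0.5]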

Global Optimization Methods


1. Simulated Annealing
Inspired by annealing in metallurgy
Probabilistic technique
Steps:
1. Generate neighbor solution
2. Accept if better
3. Accept worse solutions with probability exp(-Δ/T), where T is the temperature
4. Decrease temperature
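
A compact sketch of this loop for a one-dimensional objective; the neighbor proposal, cooling schedule, and objective are illustrative choices.

# Simulated annealing sketch: random neighbor, Metropolis acceptance,
# geometric cooling (objective and schedule are illustrative)
import math, random

def simulated_annealing(f, x0, T=1.0, T_min=1e-4, alpha=0.95, steps=100):
    x, best = x0, x0
    while T > T_min:
        for _ in range(steps):
            x_new = x + random.uniform(-1, 1)           # neighbor
            delta = f(x_new) - f(x)
            if delta < 0 or random.random() < math.exp(-delta / T):
                x = x_new                               # accept
                if f(x) < f(best):
                    best = x
        T *= alpha                                      # cool down
    return best

print(simulated_annealing(lambda x: (x - 3)**2, x0=0.0))  # near 3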

2. Genetic Algorithms
Population-based search
Components:
1. Selection
2. Crossover
3. Mutation
4. Evaluation
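
A compact sketch of these components using truncation selection, arithmetic crossover, and Gaussian mutation; the operators and all parameters are simplified illustrative choices rather than a canonical GA.

# Minimal genetic algorithm sketch (selection, crossover, mutation,
# evaluation); population size, mutation scale, etc. are illustrative
import numpy as np

def genetic_algorithm(f, dim=2, pop_size=40, generations=100,
                      mutation_scale=0.1):
    pop = np.random.uniform(-5, 5, (pop_size, dim))
    for _ in range(generations):
        fitness = np.array([f(ind) for ind in pop])          # evaluation
        parents = pop[np.argsort(fitness)[:pop_size // 2]]   # selection
        # crossover: average random pairs of parents
        idx = np.random.randint(0, len(parents), (pop_size, 2))
        children = (parents[idx[:, 0]] + parents[idx[:, 1]]) / 2.0
        # mutation: add small Gaussian noise
        children += np.random.normal(0, mutation_scale, children.shape)
        pop = children
    fitness = np.array([f(ind) for ind in pop])
    return pop[np.argmin(fitness)]

print(genetic_algorithm(lambda p: np.sum(p**2)))   # near [0, 0]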

3. Particle Swarm Optimization


Population-based
Inspired by social behavior
Updates:
1. Position
2. Velocity
3. Personal best
4. Global best
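
A minimal sketch of these per-iteration updates for a swarm minimizing a given objective; the inertia and acceleration coefficients are typical textbook values, not tuned settings.

# Particle swarm optimization sketch: velocity, position, personal best,
# global best (w, c1, c2 are typical illustrative values)
import numpy as np

def pso(f, dim=2, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    x = np.random.uniform(-5, 5, (n_particles, dim))    # positions
    v = np.zeros((n_particles, dim))                    # velocities
    p_best = x.copy()
    p_val = np.array([f(p) for p in x])
    g_best = p_best[np.argmin(p_val)]
    for _ in range(iters):
        r1 = np.random.rand(n_particles, dim)
        r2 = np.random.rand(n_particles, dim)
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        x = x + v
        vals = np.array([f(p) for p in x])
        improved = vals < p_val
        p_best[improved], p_val[improved] = x[improved], vals[improved]
        g_best = p_best[np.argmin(p_val)]
    return g_best

print(pso(lambda p: np.sum(p**2)))   # near [0, 0]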

Special Optimization Techniques


1. Linear Programming
Simplex Method
Interior Point Methods
Dual Problems

2. Dynamic Programming
Principle of optimality
Overlapping subproblems
Memoization

3. Convex Optimization
Interior Point Methods
Cutting Plane Methods
Ellipsoid Method

Practical Considerations
1. Learning Rate Selection
Fixed learning rate
Learning rate schedules
Adaptive methods
Grid search
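
As a concrete example of a learning-rate schedule, a simple step decay that halves the rate every fixed number of epochs (the initial rate, factor, and step size are arbitrary example values):

# Step-decay learning-rate schedule: scale the rate by `factor`
# every `step` epochs (all values are arbitrary examples)
def step_decay(epoch, lr0=0.1, factor=0.5, step=10):
    return lr0 * (factor ** (epoch // step))

print([round(step_decay(e), 4) for e in (0, 9, 10, 25)])  # [0.1, 0.1, 0.05, 0.025]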

2. Batch Size Selection


Memory constraints
Computational efficiency
Convergence stability
Parallelization

3. Initialization
Xavier/Glorot initialization
He initialization
Random initialization
Zero initialization
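
For reference, a sketch of the Xavier/Glorot (uniform) and He (normal) initializers for a dense layer; the fan_in/fan_out naming follows common usage and the shapes are illustrative.

# Xavier/Glorot and He initialization sketches for a dense weight matrix
# with fan_in inputs and fan_out outputs (shapes are illustrative)
import numpy as np

def xavier_init(fan_in, fan_out):
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, (fan_in, fan_out))

def he_init(fan_in, fan_out):
    return np.random.normal(0.0, np.sqrt(2.0 / fan_in), (fan_in, fan_out))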

4. Regularization
L1 regularization
L2 regularization
Elastic net
Early stopping
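
As a small illustration of L2 regularization, the penalty (λ/2)‖w‖² adds a λ·w term to the gradient; grad_fn below stands for any unregularized gradient function and is an assumption.

# L2 (weight decay) regularization: the penalty (lam/2) * ||w||^2
# contributes lam * w to the gradient (grad_fn is assumed)
def l2_regularized_gradient(grad_fn, w, lam=0.01):
    return grad_fn(w) + lam * w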

Common Challenges and Solutions


1. Local Minima
Multiple restarts
Momentum methods
Stochastic methods
Population-based methods

2. Saddle Points
Second-order methods
Adding noise
Momentum methods

3. Ill-conditioning
Preconditioning
Adaptive methods
Quasi-Newton methods

4. Vanishing/Exploding Gradients
Gradient clipping
Layer normalization
Residual connections
Proper initialization
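
Gradient clipping, the first item above, rescales the gradient when its norm exceeds a threshold; a small NumPy sketch (the threshold is an arbitrary example value):

# Clip the gradient by global norm to curb exploding gradients
# (max_norm is an arbitrary example threshold)
import numpy as np

def clip_by_norm(grad, max_norm=5.0):
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad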

Implementation Tips
1. Code Optimization

# Example of efficient (vectorized) gradient computation
# for logistic regression
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_gradient(X, y, w):
    m = len(y)                                  # number of samples
    h = sigmoid(np.dot(X, w))                   # predicted probabilities
    gradient = (1 / m) * np.dot(X.T, (h - y))
    return gradient

2. Monitoring Convergence

# Example of convergence monitoring: stop when the change in loss
# between consecutive iterations drops below a tolerance
def check_convergence(loss_history, tol=1e-6):
    if len(loss_history) < 2:
        return False
    return abs(loss_history[-1] - loss_history[-2]) < tol

Advanced Topics
1. Multi-objective Optimization
Pareto optimality
Weighted sum method
ε-constraint method
Goal programming

2. Online Optimization
Online learning
Regret minimization
Bandit algorithms

3. Distributed Optimization
Parameter server
AllReduce
Asynchronous SGD
Model averaging

Best Practices
1. Problem Analysis

Understand problem structure
Identify constraints
Choose appropriate method

2. Implementation

Start simple
Monitor convergence
Use proper validation
Implement early stopping

3. Tuning

Grid/random search
Bayesian optimization
Cross-validation
Ensemble methods

Conclusion
Success in optimization requires:

1. Understanding of problem structure
2. Proper method selection
3. Careful implementation
4. Proper monitoring and tuning
5. Consideration of computational resources
