Lec ML-3: NN
Vanishing Gradients
Why?
• Activation functions like sigmoid and tanh squash values into small ranges.
• Their derivatives are also small (<= 0.25 for sigmoid), leading to small gradients.
• This shrinks the gradient exponentially as it moves backward.
Solutions:
• Use ReLU (Rectified Linear Unit) (ReLU has gradients of 1 for positive inputs).
• Use Batch Normalization to stabilize activations.
• Better weight initialization: Xavier for sigmoid/tanh (keeps the variance of outputs similar across layers), He for ReLU (accounts for ReLU's non-zero mean and asymmetry); a code sketch follows this list.
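A minimal sketch of these remedies, assuming PyTorch (the class name DeepMLP, layer sizes, and depth are illustrative, not from the slide): ReLU activations, BatchNorm after each linear layer, and He (Kaiming) initialization.

import torch
import torch.nn as nn

class DeepMLP(nn.Module):  # illustrative deep feed-forward network
    def __init__(self, in_dim=784, hidden=256, out_dim=10, depth=4):
        super().__init__()
        layers = []
        for i in range(depth):
            layers.append(nn.Linear(in_dim if i == 0 else hidden, hidden))
            layers.append(nn.BatchNorm1d(hidden))  # stabilizes activations layer to layer
            layers.append(nn.ReLU())               # gradient is 1 for positive inputs
        layers.append(nn.Linear(hidden, out_dim))
        self.net = nn.Sequential(*layers)
        # He (Kaiming) initialization, suited to ReLU layers
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
                nn.init.zeros_(m.bias)

    def forward(self, x):
        return self.net(x)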
Exploding Gradients
Why?
• Happens in very deep networks with large weight updates.
• Poor weight initialization or using large learning rates.
Solutions:
• Use gradient clipping (cap gradients at a threshold); see the sketch after this list.
• Xavier/He weight initialization.
• Reduce the learning rate.
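A minimal sketch of gradient clipping inside one training step, assuming PyTorch (the helper name train_step and the threshold max_norm=1.0 are illustrative):

import torch

def train_step(model, loss_fn, optimizer, x, y, max_norm=1.0):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Cap the global L2 norm of all gradients at max_norm before the update
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()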
Slow Convergence
What?
• Training takes too long due to small weight updates.
• Learning stagnates, especially in deep networks.
Solutions:
• Use optimizers like Adam, RMSprop, or Momentum.
• Use a learning-rate scheduler (reduce the learning rate when progress stalls); see the sketch after this list.
• Use Batch Normalization to speed up training.
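A minimal sketch, assuming PyTorch, of pairing Adam with a ReduceLROnPlateau learning-rate scheduler; the toy data, layer sizes, and hyperparameters are illustrative only.

import torch
import torch.nn as nn

x = torch.randn(512, 20)              # toy inputs
y = torch.randint(0, 2, (512,))       # toy labels

model = nn.Sequential(nn.Linear(20, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # adaptive optimizer
# Halve the learning rate if the monitored loss stops improving for 3 epochs
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=3)

for epoch in range(30):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())   # scheduler monitors the training loss here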
Local Minima
Solutions:
• Use optimizers like Adam, RMSprop, or SGD with momentum, which help escape shallow local minima.
• Increase network capacity (more neurons/layers), which tends to give a smoother loss surface.
• Train longer, or use learning-rate annealing (manually lowering the learning rate over time); a sketch follows this list.
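A minimal sketch, again assuming PyTorch, of SGD with momentum combined with cosine learning-rate annealing; the model, data, and hyperparameters are illustrative.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
# Momentum lets updates roll through small bumps and shallow local minima
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Cosine annealing gradually lowers the learning rate over T_max epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

x = torch.randn(256, 20)
y = torch.randint(0, 2, (256,))
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    scheduler.step()   # one annealing step per epoch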
Overfitting
Why?
• Too many parameters compared to the training data.
• No regularization used (e.g., dropout, L2 regularization).
• Too many training epochs.
Solutions:
• Use Dropout (randomly disable neurons during training).
• Use L2 regularization (weight decay) to prevent large weights; see the sketch after this list.
• Increase training data or use data augmentation.
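A minimal sketch, assuming PyTorch, of dropout plus an L2 penalty via the optimizer's weight_decay argument; the dropout rate and penalty strength are illustrative choices.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of hidden activations during training
    nn.Linear(128, 2),
)
# weight_decay adds an L2 penalty on the weights inside the optimizer update
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

model.train()   # dropout active during training
# ... training loop goes here ...
model.eval()    # dropout disabled for validation / test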
Underfitting
Why?
• The network is too shallow.
• Not enough neurons in hidden layers.
• Too much regularization, making the model overly restrictive.
Solutions:
• Increase the number of layers or neurons in hidden layers; see the sketch after this list.
• Reduce regularization strength (e.g., lower L2 penalty).
• Train for more epochs.
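A minimal sketch, assuming PyTorch, contrasting a low-capacity model with a higher-capacity one and a weaker L2 penalty; the layer sizes and weight_decay values are illustrative.

import torch
import torch.nn as nn

# Likely to underfit: shallow, few hidden units, strong L2 penalty
small_model = nn.Sequential(nn.Linear(20, 8), nn.ReLU(), nn.Linear(8, 2))
strict_opt = torch.optim.Adam(small_model.parameters(), lr=1e-3, weight_decay=1e-2)

# Higher capacity: more layers and neurons, and a much weaker L2 penalty
bigger_model = nn.Sequential(
    nn.Linear(20, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 2),
)
relaxed_opt = torch.optim.Adam(bigger_model.parameters(), lr=1e-3, weight_decay=1e-5)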