Backpropagation in Neural Network
Last Updated: 01 Jul, 2025
Back Propagation, also known as "Backward Propagation of Errors," is a method used to train neural networks. Its goal is to reduce the difference between the model's predicted output and the actual output by adjusting the weights and biases in the network.
It works iteratively to adjust weights and biases to minimize the cost function. In each epoch the model adapts these parameters, reducing the loss by following the error gradient, typically with an optimization algorithm such as gradient descent or stochastic gradient descent. The algorithm computes the gradient using the chain rule from calculus, which allows it to propagate the error efficiently through the layers of the network and minimize the cost function.
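Concretely, once the gradient of the loss L with respect to a weight w is known, a single gradient-descent step updates the weight as follows (the standard update rule, stated here for reference):
w \leftarrow w - \eta \frac{\partial L}{\partial w}
where \eta is the learning rate; stochastic gradient descent applies the same rule using gradients estimated from a mini-batch rather than the full dataset.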
Fig(a): A simple illustration of how backpropagation works by adjusting weights
Back Propagation plays a critical role in how neural networks improve over time. Here's why:
- Efficient Weight Update: It computes the gradient of the loss function with respect to each weight using the chain rule making it possible to update weights efficiently.
- Scalability: The Back Propagation algorithm scales well to networks with multiple layers and complex architectures making deep learning feasible.
- Automated Learning: With Back Propagation the learning process becomes automated and the model can adjust itself to optimize its performance.
Working of Back Propagation Algorithm
The Back Propagation algorithm involves two main steps: the Forward Pass and the Backward Pass.
1. Forward Pass Work
In the forward pass, the input data is fed into the input layer. These inputs, combined with their respective weights, are passed on to the hidden layers. For example, in a network with two hidden layers (h1 and h2), the output from h1 serves as the input to h2. Before the activation function is applied, a bias is added to the weighted inputs.
Each hidden layer computes the weighted sum (`a`) of its inputs and then applies an activation function such as ReLU (Rectified Linear Unit) to obtain the output (`o`). This output is passed to the next layer, where an activation function such as softmax converts the weighted outputs into probabilities for classification.
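A minimal sketch of this forward pass is shown below (illustrative only: the weight matrices W1, W2, W3, the biases and the input values are made-up placeholders, not the network used in the worked example later):
Python
import numpy as np

def relu(a):
    return np.maximum(0, a)

def softmax(a):
    e = np.exp(a - np.max(a))   # subtract the max for numerical stability
    return e / e.sum()

x = np.array([0.35, 0.7])                    # input layer values

# Hypothetical weights and biases for two hidden layers and an output layer
W1, b1 = np.random.randn(2, 3), np.zeros(3)
W2, b2 = np.random.randn(3, 3), np.zeros(3)
W3, b3 = np.random.randn(3, 2), np.zeros(2)

h1 = relu(x @ W1 + b1)         # weighted sum + ReLU at the first hidden layer
h2 = relu(h1 @ W2 + b2)        # output of h1 serves as the input to h2
probs = softmax(h2 @ W3 + b3)  # softmax converts the outputs into probabilities
print(probs)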
The forward pass using weights and biases
2. Backward Pass
In the backward pass the error (the difference between the predicted and actual output) is propagated back through the network to adjust the weights and biases. One common choice of loss is the Mean Squared Error (MSE), which for a single output reduces to:
\text{MSE} = (\text{Predicted Output} - \text{Actual Output})^2
Once the error is calculated, the network adjusts the weights using gradients, which are computed with the chain rule. These gradients indicate how much each weight and bias should be changed to reduce the error in the next iteration. The backward pass proceeds layer by layer, ensuring that the network learns and improves its performance. The derivative of the activation function plays a crucial role in computing these gradients during Back Propagation.
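To make the chain-rule step concrete, the sketch below (a minimal, hypothetical single-neuron example, not the network used in the next section) computes the gradient of a squared error with respect to one weight and applies a gradient-descent update:
Python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

# Hypothetical single neuron: o = sigmoid(w*x + b), loss = (o - target)**2
x, w, b, target, eta = 0.7, 0.2, 0.1, 0.5, 0.1

a = w * x + b
o = sigmoid(a)

# Chain rule: dL/dw = dL/do * do/da * da/dw
dL_do = 2 * (o - target)
do_da = o * (1 - o)        # derivative of the sigmoid w.r.t. its input
da_dw = x
grad_w = dL_do * do_da * da_dw

w = w - eta * grad_w       # gradient-descent update of the weight
print(w)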
Example of Back Propagation in Machine Learning
Let’s walk through an example of Back Propagation in machine learning. Assume the neurons use the sigmoid activation function for the forward and backward pass. The target output is 0.5 and the learning rate is 1.
Example (1) of a backpropagation sum
Forward Propagation
1. Initial Calculation
The weighted sum at each node is calculated using:
a_j = \sum_i (w_{i,j} * x_i)
Where,
- a_j is the weighted sum of all the inputs and weights at each node
- w_{i,j} represents the weights between the i^{th}input and the j^{th} neuron
- x_i represents the value of the i^{th} input
o_j (output): After applying the activation function to a_j, we get the output of the neuron:
o_j = F(a_j)
where F is the activation function.
2. Sigmoid Function
The sigmoid function returns a value between 0 and 1, introducing non-linearity into the model.
y_j = \frac{1}{1+e^{-a_j}}
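Its derivative can be written directly in terms of the output, which is what makes the error terms in the backward pass easy to compute:
\frac{\partial y_j}{\partial a_j} = y_j(1 - y_j)
This is the y(1 - y) factor that appears in the \delta formulas below.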
To find the outputs of y3, y4 and y5
3. Computing Outputs
At h1 node
\begin {aligned}a_1 &= (w_{1,1} x_1) + (w_{2,1} x_2) \\& = (0.2 * 0.35) + (0.2* 0.7)\\&= 0.21\end {aligned}
Once we have calculated the a_1 value, we can proceed to find the y_3 value:
y_j= F(a_j) = \frac 1 {1+e^{-a_1}}
y_3 = F(0.21) = \frac 1 {1+e^{-0.21}}
y_3 = 0.56
Similarly, we find the values of y_4 at h2 and y_5 at O3:
a_2 = (w_{1,2} * x_1) + (w_{2,2} * x_2) = (0.3*0.35)+(0.3*0.7)=0.315
y_4 = F(0.315) = \frac 1{1+e^{-0.315}}
a_3 = (w_{1,3} * y_3) + (w_{2,3} * y_4) = (0.3*0.57) + (0.9*0.59) = 0.702
y_5 = F(0.702) = \frac 1 {1+e^{-0.702} } = 0.67
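As a quick sanity check, the forward pass above can be replayed in a few lines of Python (a minimal sketch; because the hand calculation rounds intermediate values, the printed numbers may differ slightly in the last digit from those shown):
Python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

x1, x2 = 0.35, 0.7
w11, w21 = 0.2, 0.2    # weights into h1
w12, w22 = 0.3, 0.3    # weights into h2
w13, w23 = 0.3, 0.9    # weights into the output node O3

a1 = w11 * x1 + w21 * x2      # 0.21
y3 = sigmoid(a1)
a2 = w12 * x1 + w22 * x2      # 0.315
y4 = sigmoid(a2)
a3 = w13 * y3 + w23 * y4
y5 = sigmoid(a3)              # close to 0.67

print(y3, y4, y5)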
Values of y3, y4 and y5
4. Error Calculation
The target output is 0.5 but we obtained 0.67. To calculate the error we can use the formula below:
Error_j = y_{target} - y_5
= 0.5 - 0.67 = -0.17
Using this error value, we can now backpropagate.
Back Propagation
1. Calculating Gradients
The change in each weight is calculated as:
\Delta w_{i,j} = \eta \times \delta_j \times O_i
Where:
- \delta_j is the error term of the unit receiving the connection,
- O_i is the output of the unit sending the connection,
- \eta is the learning rate.
2. Output Unit Error
For O3:
\delta_5 = y_5(1-y_5) (y_{target} - y_5)
= 0.67(1-0.67)(-0.17) = -0.0376
3. Hidden Unit Error
For h1:
\delta_3 = y_3 (1-y_3)(w_{1,3} \times \delta_5)
= 0.56(1-0.56)(0.3 \times -0.0376) = -0.0027
For h2:
\delta_4 = y_4(1-y_4)(w_{2,3} \times \delta_5)
= 0.59(1 - 0.59)(0.9 \times -0.0376) = -0.0082
4. Weight Updates
For the weights from hidden to output layer:
\Delta w_{2,3} = 1 \times (-0.0376) \times 0.59 = -0.022184
New weight:
w_{2,3}(\text{new}) = -0.022184 + 0.9 = 0.877816
For weights from input to hidden layer:
\Delta w_{1,1} = 1 \times (-0.0027) \times 0.35 = -0.000945
New weight:
w_{1,1}(\text{new}) = -0.000945 + 0.2 = 0.199055
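The same update rule can be checked in code; this small sketch simply plugs in the values computed above:
Python
eta = 1.0

# Hidden -> output weight w_{2,3}: old value 0.9
delta5, y4 = -0.0376, 0.59
w23_new = 0.9 + eta * delta5 * y4      # 0.877816

# Input -> hidden weight w_{1,1}: old value 0.2
delta3, x1 = -0.0027, 0.35
w11_new = 0.2 + eta * delta3 * x1      # 0.199055

print(w23_new, w11_new)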
Similarly other weights are updated:
- w_{1,2}(\text{new}) = 0.273225
- w_{1,3}(\text{new}) = 0.086615
- w_{2,1}(\text{new}) = 0.269445
- w_{2,2}(\text{new}) = 0.18534
The updated weights are illustrated below
Through the backward pass the weights are updated
After updating the weights, the forward pass is repeated, yielding:
- y_3 = 0.57
- y_4 = 0.56
- y_5 = 0.61
Since y_5 = 0.61 is still not the target output, the error is calculated again:
\text{Error} = y_{target} - y_5 = 0.5 - 0.61 = -0.11
and another backward pass is performed. This cycle of forward pass, error calculation and backward pass continues until the network's output is sufficiently close to the target, which is how Back Propagation iteratively updates the weights to minimize the error.
Back Propagation Implementation in Python for XOR Problem
This code demonstrates how Back Propagation is used in a neural network to solve the XOR problem.
1. Defining Neural Network
We define a neural network with an input layer of 2 inputs, a hidden layer of 4 neurons and an output layer with 1 output neuron, and use the sigmoid function as the activation function.
- self.input_size = input_size: stores the size of the input layer
- self.hidden_size = hidden_size: stores the size of the hidden layer
- self.weights_input_hidden = np.random.randn(self.input_size, self.hidden_size): initializes weights for input to hidden layer
- self.weights_hidden_output = np.random.randn(self.hidden_size, self.output_size): initializes weights for hidden to output layer
- self.bias_hidden = np.zeros((1, self.hidden_size)): initializes bias for hidden layer
- self.bias_output = np.zeros((1, self.output_size)): initializes bias for output layer
Python
import numpy as np
class NeuralNetwork:
def __init__(self, input_size, hidden_size, output_size):
self.input_size = input_size
self.hidden_size = hidden_size
self.output_size = output_size
self.weights_input_hidden = np.random.randn(
self.input_size, self.hidden_size)
self.weights_hidden_output = np.random.randn(
self.hidden_size, self.output_size)
self.bias_hidden = np.zeros((1, self.hidden_size))
self.bias_output = np.zeros((1, self.output_size))
def sigmoid(self, x):
return 1 / (1 + np.exp(-x))
def sigmoid_derivative(self, x):
return x * (1 - x)
2. Defining Feed Forward Network
In the forward pass, the inputs are passed through the network, and the hidden and output layers are activated using the sigmoid function.
- self.hidden_activation = np.dot(X, self.weights_input_hidden) + self.bias_hidden: calculates activation for hidden layer
- self.hidden_output= self.sigmoid(self.hidden_activation): applies activation function to hidden layer
- self.output_activation= np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output: calculates activation for output layer
- self.predicted_output = self.sigmoid(self.output_activation): applies activation function to output layer
Python
def feedforward(self, X):
self.hidden_activation = np.dot(
X, self.weights_input_hidden) + self.bias_hidden
self.hidden_output = self.sigmoid(self.hidden_activation)
self.output_activation = np.dot(
self.hidden_output, self.weights_hidden_output) + self.bias_output
self.predicted_output = self.sigmoid(self.output_activation)
return self.predicted_output
3. Defining Backward Network
In the backward pass (Back Propagation), the errors between the predicted and actual outputs are computed. The gradients are calculated using the derivative of the sigmoid function, and the weights and biases are updated accordingly. Note that sigmoid_derivative is applied to values that have already passed through the sigmoid, which is why it computes x * (1 - x) rather than re-evaluating the sigmoid.
- output_error = y - self.predicted_output: calculates the error at the output layer
- output_delta = output_error * self.sigmoid_derivative(self.predicted_output): calculates the delta for the output layer
- hidden_error = np.dot(output_delta, self.weights_hidden_output.T): calculates the error at the hidden layer
- hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_output): calculates the delta for the hidden layer
- self.weights_hidden_output += np.dot(self.hidden_output.T, output_delta) * learning_rate: updates weights between hidden and output layers
- self.weights_input_hidden += np.dot(X.T, hidden_delta) * learning_rate: updates weights between input and hidden layers
Python
def backward(self, X, y, learning_rate):
output_error = y - self.predicted_output
output_delta = output_error * \
self.sigmoid_derivative(self.predicted_output)
hidden_error = np.dot(output_delta, self.weights_hidden_output.T)
hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_output)
self.weights_hidden_output += np.dot(self.hidden_output.T,
output_delta) * learning_rate
self.bias_output += np.sum(output_delta, axis=0,
keepdims=True) * learning_rate
self.weights_input_hidden += np.dot(X.T, hidden_delta) * learning_rate
self.bias_hidden += np.sum(hidden_delta, axis=0,
keepdims=True) * learning_rate
4. Training Network
The network is trained over 10,000 epochs using the Back Propagation algorithm with a learning rate of 0.1 progressively reducing the error.
- output = self.feedforward(X): computes the output for the current inputs
- self.backward(X, y, learning_rate): updates weights and biases using Back Propagation
- loss = np.mean(np.square(y - output)): calculates the mean squared error (MSE) loss
Python
def train(self, X, y, epochs, learning_rate):
for epoch in range(epochs):
output = self.feedforward(X)
self.backward(X, y, learning_rate)
if epoch % 4000 == 0:
loss = np.mean(np.square(y - output))
print(f"Epoch {epoch}, Loss:{loss}")
5. Testing Neural Network
- X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]): defines the input data
- y = np.array([[0], [1], [1], [0]]): defines the target values
- nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1): initializes the neural network
- nn.train(X, y, epochs=10000, learning_rate=0.1): trains the network
- output = nn.feedforward(X): gets the final predictions after training
Python
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.train(X, y, epochs=10000, learning_rate=0.1)
output = nn.feedforward(X)
print("Predictions after training:")
print(output)
Output:
- The output shows the training progress of the neural network over 10,000 epochs. Initially the loss was high (0.2713) but it gradually decreased as the network learned, reaching a low value of 0.0066 by epoch 8000.
- The final predictions are close to the expected XOR outputs: approximately 0 for [0, 0] and [1, 1] and approximately 1 for [0, 1] and [1, 0] indicating that the network successfully learned to approximate the XOR function.
Advantages of Back Propagation for Neural Network Training
The key benefits of using the Back Propagation algorithm are:
- Ease of Implementation: Back Propagation is beginner-friendly; the update rule only requires adjusting weights with error derivatives, which keeps the implementation straightforward.
- Simplicity and Flexibility: Its straightforward design suits a range of tasks from basic feedforward to complex convolutional or recurrent networks.
- Efficiency: Back Propagation computes all the gradients in a single backward sweep, which keeps weight updates fast even in deep networks.
- Generalization: It helps models generalize well to new data improving prediction accuracy on unseen examples.
- Scalability: The algorithm scales efficiently with larger datasets and more complex networks making it ideal for large-scale tasks.
Challenges with Back Propagation
While Back Propagation is useful it does face some challenges:
- Vanishing Gradient Problem: In deep networks the gradients can become very small during Back Propagation, making it difficult for the earlier layers to learn. This is common when using activation functions like sigmoid or tanh (see the sketch after this list).
- Exploding Gradients: The gradients can also become excessively large causing the network to diverge during training.
- Overfitting: If the network is too complex it might memorize the training data instead of learning general patterns.
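The vanishing-gradient problem can be illustrated with the sigmoid itself: its derivative never exceeds 0.25, so the product of many such factors (one per layer, as the chain rule requires) quickly shrinks toward zero. A tiny hypothetical sketch, not tied to any network above:
Python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

grad = 1.0
a = 0.5                        # an arbitrary pre-activation value
for layer in range(20):        # 20 stacked sigmoid layers
    y = sigmoid(a)
    grad *= y * (1 - y)        # each factor is at most 0.25

print(grad)                    # on the order of 1e-13 -- effectively zero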