
MODULE II

NEURAL NETWORKS

Biological Motivation for Neural Network – Neural Network Representation -


Perceptron – Feed Forward Neural Networks – Multilayer Networks and Back Propagation
Algorithms – Convergence and Local Minima – Hidden Layer Representation in Back
Propagation – Remarks on the Back Propagation Algorithm.

Neural Network

 A Neural Network is a system designed to operate like the human brain. Human information processing takes place through the interaction of many billions of neurons connected to each other and sending signals to other neurons.
 Similarly, a Neural Network is a network of artificial neurons, modelled on those found in the human brain, for solving artificial intelligence problems such as image identification. It may be a physical device or a mathematical construct.
 In other words, an Artificial Neural Network is a parallel computational system consisting of many simple processing elements connected together to perform a particular task.

Biological Motivation for Neural Network

 The motivation behind neural networks is the human brain. The human brain is regarded as the best processor, even though it works more slowly than other computers.

 Many researchers therefore thought to build a machine that would work on the principles of the human brain.

 The human brain contains billions of neurons, which are connected to many other neurons to form a network, so that if it sees any image, it recognizes the image and produces the output.
 A dendrite receives signals from other neurons.
 The cell body sums the incoming signals to generate the input.
 When the sum reaches a threshold value, the neuron fires and the signal travels down the axon to the other neurons.
 The amount of signal transmitted depends upon the strength of the connections.
 Connections can be inhibitory, i.e. decreasing in strength, or excitatory, i.e. increasing in strength.

In a similar manner, it was thought to make artificial interconnected neurons, like biological neurons, making up an Artificial Neural Network (ANN). Each such neuron is capable of taking a number of inputs and producing an output.

Neurons in the human brain are capable of making very complex decisions, which means they run many parallel processes for a particular task. One motivation for ANNs is to perform a particular task, such as identification, through many parallel processes.

Neural Network Representation

Artificial Neuron

Artificial neurons are also called perceptrons. A perceptron consists of the following basic components:

 Input
 Weight
 Bias
 Activation Function
 Output

How does a perceptron work?


A. Each of the inputs X1, X2, X3, …, Xn is multiplied by its respective weight.

B. All the multiplied values are added.

C. The sum of the values is applied to the activation function.

Weights and Bias

 Weights W1, W2, W3, …, Wn show the strength of the corresponding inputs to a neuron.


 Bias allows you to shift the curve of the activation function.
 By varying the bias b, you can change when the node activates. Without a bias, you cannot shift this activation point.
Input layer, Hidden layer and Output layer

Input Layer
The input layer contains the inputs and the weights. Example: X1, W1, etc.
Hidden Layer
In a neural network, there can be more than one hidden layer. The hidden layer contains the summation and the activation function.
Output Layer
The output layer consists of the set of results generated by the previous layer. It also contains the desired values, i.e. target values that are already known for the output, which are checked against the values generated by the previous layer and may also be used to improve the end results.
Let’s understand with an example.
Suppose you want to go to a food shop. Based on three factors you will decide whether to go out or not, i.e.

 Whether the weather is good or not, i.e. X1. Say X1 = 1 for good weather and X1 = 0 for bad weather.
 Whether you have a vehicle available or not, i.e. X2. Say X2 = 1 if a vehicle is available and X2 = 0 if not.
 Whether you have money or not, i.e. X3. Say X3 = 1 if you have money and X3 = 0 if not.

Based on the importance of each condition, you choose a weight for it, say W1 = 2 for weather, W2 = 2 for the vehicle and W3 = 6 for money, as money is the most important thing you must have, and say you have set the threshold to 5. For example, if all three inputs are 1, the weighted sum is 1·2 + 1·2 + 1·6 = 10, which exceeds the threshold, so you go; without money the sum is at most 4, so you stay.
In this way, the perceptron builds a decision-making model by calculating X1W1 + X2W2 + X3W3 and comparing this sum with the threshold (the desired outcome).

Activation Function

Activation functions are used for non-linear, complex functional mappings between the inputs and the required output. They introduce non-linear properties to our network.
They convert the input of an artificial neuron into its output. That output signal is then used as an input in the next layer.

Simply put, the activation function maps the input to output values in a required range such as (0, 1) or (-1, 1).
The activation function helps the network model complex non-linear relationships. Without an activation function, the output signal would just be a linear function of the input, and your neural network would not be able to learn complex data such as audio, images, speech, etc.
Some commonly used activation functions are:

 Sigmoid or Logistic
 Tanh — Hyperbolic tangent

Sigmoid Activation Function:

The Sigmoid Activation Function can be represented as:


f(x) = 1 / (1 + exp(-x))
 The range of the sigmoid function is between 0 and 1.
 It has some disadvantages such as slow convergence and the vanishing gradient problem (it can "kill" the gradient). The output of the sigmoid is not zero-centered, which makes its gradient updates go in different directions.

Tanh (Hyperbolic Tangent)

Tanh can be represented as:


f(x) = (1 - exp(-2x)) / (1 + exp(-2x))

It solves the zero-centering problem of the Sigmoid function: the output of Tanh is zero-centered because its range is between -1 and 1.
Optimization is easier compared with the Sigmoid function.
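To make this concrete, here is a minimal Python sketch (assuming NumPy is available) of the two activation functions described above; the values shown in the comments are approximate:

```python
import numpy as np

def sigmoid(x):
    # Range (0, 1); not zero-centered
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Range (-1, 1); zero-centered (equivalent to np.tanh(x))
    return (1.0 - np.exp(-2 * x)) / (1.0 + np.exp(-2 * x))

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # approx [0.119, 0.5, 0.881]
print(tanh(z))      # approx [-0.964, 0.0, 0.964]
```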
PERCEPTRON
A perceptron is a binary classification algorithm modeled after the functioning of the human brain—it was intended
to emulate the neuron. The perceptron, while it has a simple structure, has the ability to learn and solve very
complex problems. In Machine Learning, the Perceptron is considered a single-layer neural network that consists of four main components: input values, weights and bias, net sum, and an activation function. The perceptron model begins by multiplying all input values by their weights, then adds these values together to create the weighted sum. This weighted sum is then applied to the activation function 'f' to obtain the desired output. This activation function is also known as the step function and is represented by 'f'.

This step function or Activation function plays a vital role in ensuring that output is mapped between required
values (0,1) or (-1,1). It is important to note that the weight of input is indicative of the strength of a node. Similarly,
an input's bias value gives the ability to shift the activation function curve up or down.

Perceptron model works in two important steps as follows:

Step-1

In the first step, multiply all input values by their corresponding weight values and then add them to determine the weighted sum. Mathematically, we can calculate the weighted sum as follows:

∑ wi*xi = x1*w1 + x2*w2 + … + xn*wn

Add a special term called bias 'b' to this weighted sum to improve the model's performance.

∑wi*xi + b

Step-2

In the second step, an activation function is applied to the above-mentioned weighted sum, which gives us an output either in binary form or as a continuous value, as follows:

Y = f(∑wi*xi + b)
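As an illustration of these two steps, here is a minimal Python sketch of a perceptron's forward computation; the input, weight, and bias values are made up for the example:

```python
import numpy as np

def step(z):
    # Step (threshold) activation function 'f'
    return 1 if z > 0 else 0

def perceptron_output(x, w, b):
    # Step 1: weighted sum of inputs plus bias; Step 2: apply the activation function
    return step(np.dot(w, x) + b)

x = np.array([1, 0, 1])          # inputs x1, x2, x3
w = np.array([0.6, 0.2, 0.4])    # weights w1, w2, w3
b = -0.5                         # bias
print(perceptron_output(x, w, b))   # 1, because 0.6 + 0.4 - 0.5 = 0.5 > 0
```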

Types of Perceptron Models


Based on the layers, Perceptron models are divided into two types. These are as follows:

1. Single-layer Perceptron Model

2. Multi-layer Perceptron model

Single Layer Perceptron Model:


This is one of the simplest types of artificial neural network (ANN). A single-layer perceptron model consists of a feed-forward network and also includes a threshold transfer function inside the model. The main objective of the single-layer perceptron model is to analyze linearly separable objects with binary outcomes.

In a single-layer perceptron model, the algorithm has no previously recorded data, so it begins with randomly allocated values for the weight parameters. It then sums up the weighted inputs. If the total weighted sum is more than a pre-determined value (the threshold), the model is activated and shows the output value as +1.

If the outcome matches the pre-determined or threshold value, the performance of the model is considered satisfactory, and the weights are not changed. However, discrepancies arise when the weighted input values fed into the model do not produce the desired output. Hence, to reach the desired output and minimize errors, some changes must be made to the weights.

"Single-layer perceptron can learn only linearly separable patterns."

Multi-Layered Perceptron Model:


Like a single-layer perceptron model, a multi-layer perceptron model also has the same model structure but has a
greater number of hidden layers.

The multi-layer perceptron model is typically trained with the Backpropagation algorithm, which executes in two stages as follows:

o Forward Stage: Activations are computed starting from the input layer and proceeding forward, terminating at the output layer.

o Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement. In this stage, the error between the actual output and the desired output is propagated backward, starting at the output layer and ending at the input layer.

Hence, a multi-layered perceptron model can be considered as multiple layers of artificial neurons in which the activation function does not remain linear, unlike the single-layer perceptron model. Instead of a linear function, activation functions such as sigmoid, tanh, or ReLU are used.

A multi-layer perceptron model has greater processing power and can process linear and non-linear patterns.
Further, it can also implement logic gates such as AND, OR, XOR, NAND, NOT, XNOR, NOR.
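As a brief illustration of this extra power, the sketch below (using scikit-learn, which is not part of this unit and is assumed here only as a convenience) trains a small multi-layer perceptron on the XOR truth table, a pattern that is not linearly separable and therefore cannot be learned by a single-layer perceptron:

```python
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]                      # XOR truth table

# One hidden layer with a non-linear (tanh) activation
clf = MLPClassifier(hidden_layer_sizes=(4,), activation='tanh',
                    solver='lbfgs', max_iter=2000, random_state=1)
clf.fit(X, y)
print(clf.predict(X))                 # typically recovers [0, 1, 1, 0]
```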

Advantages of Multi-Layer Perceptron:


o A multi-layered perceptron model can be used to solve complex non-linear problems.

o It works well with both small and large input data.

o It helps us to obtain quick predictions after the training.

o It helps to obtain the same accuracy ratio with large as well as small data.

Disadvantages of Multi-Layer Perceptron:


o In Multi-layer perceptron, computations are difficult and time-consuming.

o In a multi-layer Perceptron, it is difficult to determine how much each independent variable affects the dependent variable.

o The model functioning depends on the quality of the training.

Perceptron Function
The Perceptron function f(x) is obtained by multiplying the input 'x' with the learned weight coefficient 'w', adding the bias 'b', and applying a threshold.
Mathematically, we can express it as follows:

f(x)=1; if w.x+b>0

otherwise, f(x)=0

o 'w' represents real-valued weights vector

o 'b' represents the bias

o 'x' represents a vector of input x values.

Characteristics of Perceptron
The perceptron model has the following characteristics.

1. Perceptron is a machine learning algorithm for supervised learning of binary classifiers.

2. In the Perceptron, the weight coefficient is automatically learned (a sketch of one such learning rule appears after this list).

3. Initially, weights are multiplied with input features, and the decision is made whether the neuron is fired or
not.

4. The activation function applies a step rule to check whether the weighted sum is greater than zero.

5. The linear decision boundary is drawn, enabling the distinction between the two linearly separable classes
+1 and -1.

6. If the added sum of all input values is more than the threshold value, it must have an output signal;
otherwise, no output will be shown.
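The sketch below shows one common way the weights and bias can be learned automatically, the classic perceptron update rule; the data and parameter values are made up for illustration:

```python
import numpy as np

def train_perceptron(X, y, learn_rate=0.1, epochs=20):
    # Learn weights and a bias with the classic perceptron update rule
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else 0   # step activation
            error = target - pred
            w += learn_rate * error * xi               # nudge weights toward the target
            b += learn_rate * error                    # nudge the bias as well
    return w, b

# The AND gate is linearly separable, so a single-layer perceptron can learn it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])   # [0, 0, 0, 1]
```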

Feed Forward Neural Network

A feedforward neural network is an artificial neural network in which nodes’ connections do not form a loop. Often
referred to as a multi-layered network of neurons, feedforward neural networks are so named because all
information flows forward only.

Data enters the input nodes, travels through the hidden layers, and exits the output nodes. The network has no links through which the information leaving the output nodes could be sent back into the network.

The purpose of feedforward neural networks is to approximate functions.

A classifier uses the formula y = f*(x).

This assigns the input x to the category y.

The feedforward network maps y = f(x; θ). It then learns the value of θ that most closely approximates the function f*.

Types of Neural Network’s Layers


The following are the components of a feedforward neural network:

Input layer

It contains the neurons that receive the input. The data is subsequently passed on to the next layer. The input layer’s total number of neurons equals the number of variables (features) in the dataset.
Hidden layer

This is the intermediate layer, which is concealed between the input and output layers. It has many neurons that
alter the inputs and then communicate with the output layer.

Output layer

It is the last layer and depends on the model’s construction. The output layer produces the predicted values, which are compared against the known desired outcome.

Neuron weights
Weights describe the strength of the connection between neurons. A weight’s value is typically a small number, often initialized between 0 and 1.

Cost Function in Feedforward Neural Network


The cost function is an important factor of a feedforward neural network. Generally, minor adjustments to weights
and biases have little effect on the categorized data points. Thus, a method for improving performance can be
determined by making minor adjustments to weights and biases using a smooth cost function.

The mean squared error cost function is defined as follows:

C(w, b) = (1/2n) Σx ‖y(x) − a‖²

Where,

w = weights collected in the network

b = biases

n = total number of training inputs

a = output vector produced by the network when x is the input

x = input

y(x) = desired output for input x

‖v‖ = usual length (norm) of vector v
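Under that definition, a minimal Python sketch of the cost computation (with made-up output and target vectors) could look like this:

```python
import numpy as np

def mse_cost(outputs, targets):
    # Quadratic cost: sum of ||y(x) - a||^2 over the n training inputs, divided by 2n
    n = len(outputs)
    return sum(np.linalg.norm(y - a) ** 2 for a, y in zip(outputs, targets)) / (2 * n)

# Two training examples, each with a 2-dimensional output vector
outputs = [np.array([0.8, 0.1]), np.array([0.3, 0.7])]   # network outputs a
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # desired outputs y(x)
print(mse_cost(outputs, targets))   # 0.0575; it shrinks as predictions approach the targets
```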

Loss Function in Feedforward Neural Network

The cross-entropy loss associated with multi-class categorization is as follows:

L = − Σi yi log(ŷi)

where yi is the true (one-hot) label for class i and ŷi is the probability the network predicts for class i.
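A minimal Python sketch of this loss for a single example (with made-up probabilities) is:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Multi-class cross-entropy: -sum_i y_i * log(yhat_i)
    y_pred = np.clip(y_pred, eps, 1.0)     # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0, 0, 1])               # one-hot true label (class 3)
y_pred = np.array([0.1, 0.2, 0.7])         # predicted class probabilities
print(cross_entropy(y_true, y_pred))       # about 0.357; smaller for confident, correct predictions
```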


Gradient Learning Algorithm
The Gradient Descent Algorithm repeatedly calculates the next point using the gradient at the current location, then scales it (by a learning rate) and subtracts the obtained value from the current position (makes a step). It subtracts the value since we want to decrease (minimize) the function; to maximize it, we would add instead. This procedure may be written as:

p(n+1) = p(n) − η ∇f(p(n))

where p(n) is the current position, η is the learning rate and ∇f is the gradient of the function being minimized.

There’s a crucial parameter η, which adjusts the gradient and hence affects the step size. In machine learning, it is
termed learning rate and substantially affects performance.

 The smaller the learning rate, the longer GD takes to converge, and it may reach the maximum number of iterations before finding the optimal point.

 If the learning rate is too large, the algorithm may not converge to the optimal point (it may jump around it) or may diverge altogether.

The Gradient Descent method’s steps are:

1. Pick a beginning point (initialization)

2. Compute the gradient at this spot

3. Take a scaled step in the opposite direction to the gradient (objective: minimize)

4. Repeat points 2 and 3 until one of the conditions is met:

 maximum number of repetitions reached

 step size is smaller than the tolerance.

The following is an example of how to construct the Gradient Descent algorithm (with steps tracking):

The function accepts the following five parameters:

1. Starting point: In our example, we specify it manually, but it is frequently determined randomly.

2. Gradient function – must be defined in advance

3. Learning rate – factor used to scale step sizes

4. Maximum iterations

5. Tolerance, used to stop the algorithm conditionally (in this case, a default value of 0.01 is used)
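Since the original listing is not reproduced here, the following is a minimal Python sketch of such a gradient_descent function, assuming the five parameters described above (the example function f(x) = x² is chosen only for illustration):

```python
import numpy as np

def gradient_descent(start, gradient, learn_rate, max_iter, tol=0.01):
    # Minimize a function given its gradient, tracking every visited point
    x = start
    steps = [start]
    for _ in range(max_iter):
        diff = learn_rate * gradient(x)   # gradient scaled by the learning rate
        if np.abs(diff) < tol:            # stop when the step is smaller than the tolerance
            break
        x = x - diff                      # step in the direction opposite to the gradient
        steps.append(x)
    return steps, x

# Example: minimize f(x) = x^2, whose gradient is 2x
history, x_min = gradient_descent(start=9.0, gradient=lambda x: 2 * x,
                                  learn_rate=0.1, max_iter=100)
print(x_min)   # approaches 0, the minimum of f
```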
Multilayer Networks and Back Propagation Algorithm

 In machine learning, backpropagation is an effective algorithm used to train artificial neural networks,
especially in feed-forward neural networks.

 Backpropagation is an iterative algorithm that helps to minimize the cost function by determining which weights and biases should be adjusted. During every epoch, the model learns by adapting the weights and biases to minimize the loss, moving down along the gradient of the error. It is therefore used together with one of the two most popular optimization algorithms, gradient descent or stochastic gradient descent.

 Computing the gradient in the backpropagation algorithm helps to minimize the cost function and it can be
implemented by using the mathematical rule called chain rule from calculus to navigate through complex
layers of the neural network.

fig(a): A simple illustration of how backpropagation works by adjusting the weights

Advantages of Using the Backpropagation Algorithm in Neural Networks

Backpropagation, a fundamental algorithm in training neural networks, offers several advantages that make it a
preferred choice for many machine learning tasks. Here, we discuss some key advantages of using the
backpropagation algorithm:

1. Ease of Implementation: Backpropagation does not require prior knowledge of neural networks, making it
accessible to beginners. Its straightforward nature simplifies the programming process, as it primarily
involves adjusting weights based on error derivatives.

2. Simplicity and Flexibility: The algorithm’s simplicity allows it to be applied to a wide range of problems and
network architectures. Its flexibility makes it suitable for various scenarios, from simple feedforward
networks to complex recurrent or convolutional neural networks.

3. Efficiency: Backpropagation accelerates the learning process by directly updating weights based on the
calculated error derivatives. This efficiency is particularly advantageous in training deep neural networks,
where learning features of a function can be time-consuming.
4. Generalization: Backpropagation enables neural networks to generalize well to unseen data by iteratively
adjusting weights during training. This generalization ability is crucial for developing models that can make
accurate predictions on new, unseen examples.

5. Scalability: Backpropagation scales well with the size of the dataset and the complexity of the network. This
scalability makes it suitable for large-scale machine learning tasks, where training data and network size are
significant factors.

Working of Backpropagation Algorithm


The Backpropagation algorithm works in two passes:

 Forward pass

 Backward pass

Working of Forward pass

 In the forward pass, the input is first fed into the input layer. Since the inputs are raw data, they can be used directly for training our neural network.

 The inputs and their corresponding weights are passed to the hidden layer. The hidden layer performs the computation on the data it receives. If there are two hidden layers in the neural network, for instance as in illustration fig(a), h1 and h2 are the two hidden layers, and the output of h1 is used as the input of h2. The bias is added before the activation function is applied.

 In the hidden layer, the activation function is applied to the weighted sum of inputs at each of its neurons. Finally, the weighted outputs from the last hidden layer are fed into the output layer to compute the final prediction.

The forward pass using weights and biases

Working of backward pass

 In the backward pass, the error is transmitted back through the network, which helps the network improve its performance by learning and adjusting the internal weights.

 To find the error generated through the forward pass, we can use one of the most commonly used methods, the mean squared error, which calculates the difference between the predicted output and the desired output. The formula for the mean squared error is:

Mean squared error = (predicted output − actual output)²


 Once we have done the calculation at the output layer, we then propagate the error backward through the
network, layer by layer.

 The key calculation during the backward pass is determining the gradients for each weight and bias in the
network. This gradient is responsible for telling us how much each weight/bias should be adjusted to
minimize the error in the next forward pass. The chain rule is used iteratively to calculate this gradient
efficiently.

 In addition to gradient calculation, the activation function also plays a crucial role in backpropagation: the gradients are computed with the help of the derivative of the activation function, as the sketch below shows.
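The following minimal NumPy sketch puts the forward and backward passes together for a tiny network (one hidden layer, one output neuron, one training example); all sizes and values are made up for illustration, and the sigmoid derivative a·(1 − a) supplies the activation-function gradients mentioned above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros((2, 1))   # hidden-layer weights and biases
W2, b2 = rng.normal(size=(1, 2)), np.zeros((1, 1))   # output-layer weights and biases
x = np.array([[0.5], [0.1]])                          # input
y = np.array([[1.0]])                                 # desired output
lr = 0.5                                              # learning rate

for _ in range(100):
    # Forward pass
    z1 = W1 @ x + b1;  a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

    # Backward pass: chain rule with the derivative of the sigmoid activation
    delta2 = (a2 - y) * a2 * (1 - a2)                 # error signal at the output layer
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)          # error propagated to the hidden layer

    # Gradient-descent updates of weights and biases
    W2 -= lr * delta2 @ a1.T;  b2 -= lr * delta2
    W1 -= lr * delta1 @ x.T;   b1 -= lr * delta1

print(a2)   # the prediction moves toward the target 1.0
```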
CONVERGENCE AND LOCAL MINIMA
Backpropagation is only guaranteed to converge to a local, and not a global, minimum. However, since each weight in a network essentially corresponds to a different dimension in the error space, a local minimum with respect to one weight may not be a local minimum with respect to the other weights. This can provide an "escape route" from becoming trapped in local minima. If the weights are initialized to values close to zero, the sigmoid threshold function is approximately linear, so the network produces nearly linear outputs. As the weights grow, though, the network is able to represent more complex functions that are not linear in nature. The hope is that by the time the weights are able to approximate the desired function, they will be close enough to the global minimum that even becoming stuck in a local minimum will be acceptable. Common heuristic methods to reduce the problem of local minima are:
• Add a momentum term to the weight-update rule (a sketch appears below)
• Use stochastic gradient descent rather than true gradient descent
• Train multiple networks using the same training data but initialize the networks with different random weights. If the different networks lead to different local minima, choose the network that performs best on a validation set of data, or keep all the networks and treat them as a committee whose output is the (possibly weighted) average of the individual network outputs.
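As referenced above, here is a minimal sketch of the momentum heuristic for the weight-update rule; the function and variable names are chosen only for illustration:

```python
import numpy as np

def momentum_update(w, grad, velocity, learn_rate=0.1, momentum=0.9):
    # Mix the current gradient step with a fraction of the previous step,
    # which can carry the weights through small local minima and flat regions
    velocity = momentum * velocity - learn_rate * grad
    return w + velocity, velocity

w = np.array([0.5, -0.3])            # current weights
velocity = np.zeros_like(w)          # previous update, initially zero
grad = np.array([0.2, -0.1])         # gradient of the error w.r.t. the weights
w, velocity = momentum_update(w, grad, velocity)
print(w)
```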

HIDDEN LAYER REPRESENTATION IN BACK PROPAGATION

The final values at the hidden neurons are computed using:

z^l — weighted inputs in layer l,


and

a^l— activations in layer l.

For layers 2 and 3 the equations are:

 l = 2: z² = W¹·x + b¹ and a² = f(z²)

 l = 3: z³ = W²·a² + b² and a³ = f(z³)

W¹ and W² are the weight matrices used to compute layers 2 and 3, while b¹ and b² are the biases used for those layers.

Activations a² and a³ are computed using an activation function f. Typically, this function f is non-

linear (e.g. sigmoid, tanh) and allows the network to learn complex patterns in data.

Let’s pick layer 2 and its parameters as an example. The same operations can be applied to any layer in the

network.

 W¹ is a weight matrix of shape (n, m) where n is the number of output neurons (neurons in the next

layer) and m is the number of input neurons (neurons in the previous layer). For us, n = 2 and m = 4.
W¹ = | (W_11)¹ (W_12)¹ (W_13)¹ (W_14)¹ |
     | (W_21)¹ (W_22)¹ (W_23)¹ (W_24)¹ |

NB: The first number in any weight’s subscript matches the index of the neuron in the next layer (in our

case this is the Hidden_2 layer) and the second number matches the index of the neuron in previous
layer (in our case this is the Input layer).

 x is the input vector of shape (m, 1) where m is the number of input neurons. For us, m = 4.

x = [x_1, x_2, x_3, x_4]^T

 b¹ is a bias vector of shape (n , 1) where n is the number of neurons in the current layer. For us, n = 2.

b¹ = [(b_1)¹, (b_2)¹]^T

Following the equation for z², we can use the above definitions of W¹, x and b¹ to derive “Equation for z²”:
z² = W¹·x + b¹
   = [ (W_11)¹x_1 + (W_12)¹x_2 + (W_13)¹x_3 + (W_14)¹x_4 + (b_1)¹ ,
       (W_21)¹x_1 + (W_22)¹x_2 + (W_23)¹x_3 + (W_24)¹x_4 + (b_2)¹ ]^T

Now carefully observe the neural network illustration from above.

Input and Hidden_1 layers

You will see that z² can be expressed using (z_1)² and (z_2)², where (z_1)² and (z_2)² are the sums of the products of every input x_i with the corresponding weight (W_ij)¹.

This leads to the same "Equation for z²" and proves that the matrix representations for z², a², z³ and a³ are correct.
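A minimal NumPy sketch of these layer computations, assuming 4 input neurons, 2 neurons in the hidden layer and 1 output neuron (the numeric values are illustrative only):

```python
import numpy as np

def f(z):
    return np.tanh(z)                          # non-linear activation function

x  = np.array([[1.0], [0.5], [0.2], [0.7]])    # input vector, shape (m, 1) = (4, 1)
W1 = np.random.randn(2, 4)                     # W^1: weights into layer 2, shape (n, m) = (2, 4)
b1 = np.random.randn(2, 1)                     # b^1: biases for layer 2
W2 = np.random.randn(1, 2)                     # W^2: weights into layer 3
b2 = np.random.randn(1, 1)                     # b^2: biases for layer 3

z2 = W1 @ x + b1                               # weighted inputs of layer 2
a2 = f(z2)                                     # activations of layer 2
z3 = W2 @ a2 + b2                              # weighted inputs of layer 3
a3 = f(z3)                                     # activations of layer 3 (network output)
print(a3)
```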

REMARKS ON THE BACK PROPAGATION ALGORITHM


Most prominent advantages of Backpropagation are:

 Backpropagation is fast, simple and easy to program


 It has no parameters to tune apart from the number of inputs
 It is a flexible method as it does not require prior knowledge about the network
 It is a standard method that generally works well
 It does not need any special mention of the features of the function to be learned.
(i) Convergence and local minima
Backpropagation is a multi-layer algorithm: in multi-layer neural networks, it can go back and change the weights.

All neurons are interconnected and their signals converge, so the information is passed on to every neuron in the network.

Using the backpropagation algorithm we minimize the error by modifying the weights. This minimization of error is guaranteed only locally, not globally.

The Representational Power of the Feed-Forward Networks


Representational power describes which functions a neural network can effectively represent. It depends on the depth and width of the network.

Feed-forward networks can represent boolean, continuous, and arbitrary functions, given sufficient hidden units and layers.
(ii) Hypothesis Space Search and Inductive Bias

(iii) Inductive Bias
(iv) Hypothesis Space Search

(v) Hidden Layer Representation


By using the backpropagation algorithm, one can define new features in the hidden layer that are not explicitly represented in the input.
