
ARTIFICIAL NEURAL NETWORK

21CSE326T
*HISTORY OF NEURAL NETWORK RESEARCH
1. Early Foundations (1940s–1950s)
 1943: Neurophysiologist Warren McCulloch and mathematician Walter Pitts published
a seminal paper introducing the concept of artificial neurons. They modeled neural
networks using simple logic gates, laying the groundwork for understanding how
networks of neurons could perform computations.
 1950: Alan Turing’s paper "Computing Machinery and Intelligence" proposed the idea
of machines that could simulate human intelligence, indirectly influencing the
development of neural networks.
2. The Birth of Neural Networks (1950s–1960s)
 1951: Marvin Minsky and Dean Edmonds created the first neural network hardware,
the SNARC (Stochastic Neural Analog Reinforcement Calculator), to simulate learning
processes.
 1958: Frank Rosenblatt invented the Perceptron, an early type of neural network
capable of binary classification. The Perceptron algorithm was one of the first models
that demonstrated the potential of neural networks for learning.
3. The First AI Winter (1960s–1970s)
 1969: Minsky and Papert’s book "Perceptrons" critically evaluated the limitations of
single-layer perceptrons, particularly their inability to solve non-linearly separable
problems like XOR. This criticism led to a decline in funding and interest in neural
network research, a period often referred to as the “AI Winter.”
4. Resurgence and Backpropagation (1980s)
 1986: David Rumelhart, Geoffrey Hinton, and Ronald Williams rediscovered and
popularized the backpropagation algorithm, which allowed multi-layer neural networks
(also known as multi-layer perceptrons) to be trained effectively. This breakthrough
marked a significant revival in neural network research.
 1980s: The introduction of algorithms like Hopfield networks and Kohonen’s Self-
Organizing Maps further expanded the capabilities and applications of neural networks.
5. Growth and Expansion (1990s)
 1990s: The development of Support Vector Machines and advancements in kernel
methods provided alternative approaches to classification and regression, leading to
broader research in machine learning and neural networks.
 1997: The successful application of neural networks in fields such as speech recognition
and computer vision helped establish their practical utility.
6. Deep Learning Revolution (2000s–2010s)
 2006: Geoffrey Hinton and his colleagues introduced the concept of deep learning
through unsupervised pre-training techniques. This method allowed for training of very
deep neural networks and set the stage for major advances.
 2012: AlexNet, a deep convolutional neural network developed by Alex Krizhevsky, Ilya
Sutskever, and Geoffrey Hinton, won the ImageNet competition by a significant margin,
demonstrating the power of deep learning in image recognition and sparking
widespread interest in neural networks.
7. Modern Era (2010s–Present)
 2010s-Present: Neural networks, especially deep learning models, have become central
to many AI applications, including natural language processing, computer vision, and
reinforcement learning. Technologies like GPT (Generative Pre-trained Transformer)
and various transformer models have revolutionized natural language understanding
and generation.
 Recent Advances: Innovations such as large language models (LLMs), generative
adversarial networks (GANs), and transformer architectures continue to push the
boundaries of what neural networks can achieve, driving forward research and practical
applications in diverse fields.
*BIOLOGICAL INSPIRATION IN ARTIFICIAL NEURAL NETWORKS
1. Neuronal Models
 Neurons: The basic unit of artificial neural networks is inspired by biological neurons.
Biological neurons receive inputs through dendrites, process information in the cell
body, and transmit outputs through axons. Similarly, artificial neurons (or nodes)
receive inputs, apply a weight to these inputs, sum them, and pass them through an
activation function to produce an output.
 Synapses: In biological systems, synapses are the connections between neurons that
transmit signals. In ANNs, weights serve a similar purpose, representing the strength of
connections between neurons. These weights are adjusted during training to optimize
the network’s performance.
2. Learning Mechanisms
 Hebbian Learning: Proposed by Donald Hebb in 1949, Hebbian learning is a principle
where connections between neurons are strengthened if they are activated
simultaneously. This idea influenced the development of algorithms like the perceptron
and later models, which adjust weights based on the correlation between input and
output.
 Backpropagation: While not a direct biological analog, backpropagation is inspired by
the general idea of error correction in biological systems. The method adjusts weights to
minimize the error between predicted and actual outcomes, akin to how biological
systems adapt based on feedback.
3. Neural Network Architectures
 Feedforward Networks: These mimic the unidirectional flow of information in biological
neural networks, where signals travel in one direction from input to output without
looping back.
 Recurrent Networks: Recurrent Neural Networks (RNNs) are inspired by the biological
concept of feedback loops, where outputs from neurons are fed back as inputs to the
same or other neurons. This allows RNNs to handle sequences and temporal patterns,
similar to how biological systems process temporal information.
4. Biological Learning and Plasticity
 Synaptic Plasticity: Biological neural networks exhibit plasticity, where the strength of
synaptic connections changes with experience. This concept is reflected in learning
algorithms where weights are adjusted based on the network’s performance and
experience.
 Spike-Timing-Dependent Plasticity (STDP): This form of plasticity adjusts synaptic
strength based on the relative timing of spikes from pre- and post-synaptic neurons.
Although not directly implemented in traditional ANNs, STDP principles influence the
development of spiking neural networks (SNNs), which attempt to more closely emulate
biological neural dynamics.
5. Inspiration for Advanced Models
 Convolutional Neural Networks (CNNs): Inspired by the visual processing in the human
brain, CNNs are designed to handle grid-like data (such as images) and automatically
learn spatial hierarchies of features. This architecture mimics the way the visual cortex
processes visual information.
 Generative Adversarial Networks (GANs): Although more abstract, GANs’ concept of
adversarial training, where two networks (generator and discriminator) compete with
each other, can be loosely compared to competitive neural interactions observed in
biological systems.
6. Neuroscientific Insights
 Hierarchical Processing: The hierarchical structure of CNNs reflects the hierarchical
processing seen in the brain’s visual system, where lower-level features (edges) are
combined to form higher-level representations (objects).
 Biologically Plausible Networks: Researchers are also exploring neural architectures
that more closely emulate the brain’s structure and function, including brain-inspired
algorithms and neuromorphic computing, which aim to create hardware that mimics
neural processes more closely.
7. Limitations and Future Directions
 Simplification: While inspired by biological processes, ANNs are highly simplified
models compared to the complexity of the human brain. Current models do not fully
capture the intricate dynamics of real neural networks, and ongoing research aims to
bridge this gap.
 Neuromorphic Engineering: This field focuses on creating hardware that mimics the
brain’s neural architecture and processing methods, potentially leading to more efficient
and brain-like artificial intelligence systems.
*NEURAL COMPUTATION BASICS
1. Neurons and Activation Functions
 Neurons (Nodes): The fundamental units in an ANN are neurons or nodes. Each neuron
receives input signals, processes them, and produces an output signal.
 Weights: Inputs to a neuron are multiplied by weights, which adjust the importance of
each input. Weights are crucial for learning and adapting the network.
 Bias: An additional parameter that allows the activation function to be shifted, helping
the model fit the data better.
 Activation Function: After computing the weighted sum of inputs plus bias, the result is
passed through an activation function, which introduces non-linearity into the model.
Common activation functions include:
o Sigmoid: σ(z) = 1 / (1 + e^(-z)), squashing values into the range (0, 1).
o ReLU (Rectified Linear Unit): f(z) = max(0, z).
o Tanh (Hyperbolic Tangent): f(z) = tanh(z), mapping values into the range (-1, 1).
o Softmax: Used in classification problems to convert logits into probabilities.
Feedforward Computation
o Forward Propagation: The process of moving data through the network from the input layer to the output layer. In a feedforward network, this involves computing the weighted sums, applying activation functions, and propagating the results through each layer until the output is produced.
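As a concrete illustration, here is a minimal forward-propagation sketch (NumPy assumed; the layer sizes and values are illustrative, not taken from the notes):

```python
# A minimal sketch of forward propagation through one hidden layer and
# an output layer. All shapes and values are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])        # input features
W1 = np.random.randn(4, 3) * 0.1      # hidden layer: 4 neurons, 3 inputs
b1 = np.zeros(4)
W2 = np.random.randn(1, 4) * 0.1      # output layer: 1 neuron
b2 = np.zeros(1)

h = sigmoid(W1 @ x + b1)              # weighted sum + activation
y = sigmoid(W2 @ h + b2)              # propagate to the output layer
print(y)
```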
2. Learning and Optimization

Loss Function
 Purpose: The loss function quantifies the difference between the predicted output and the actual target values. Common loss functions include:
o Mean Squared Error (MSE): Used for regression tasks.
o Cross-Entropy Loss: Used for classification tasks.
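A minimal sketch of both losses (NumPy assumed; the arrays are illustrative):

```python
# MSE for regression and cross-entropy for classification, as above.
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average of squared differences."""
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy; y_true is one-hot, y_pred holds probabilities."""
    return -np.sum(y_true * np.log(y_pred + eps))

print(mse(np.array([1.0, 2.0]), np.array([0.9, 2.2])))        # 0.025
print(cross_entropy(np.array([0, 1]), np.array([0.2, 0.8])))  # ~0.223
```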

Optimization Algorithms
 Gradient Descent: The primary algorithm for optimizing neural networks (a minimal sketch follows this list). It involves:
o Calculating Gradients: Computing the gradients of the loss function with respect to each weight and bias using backpropagation.
o Updating Weights: Adjusting the weights and biases in the direction that reduces the loss. The learning rate controls the size of these adjustments.
 Variants of Gradient Descent: To improve performance and convergence, several variants exist:
o Stochastic Gradient Descent (SGD): Updates weights using a single sample or a small batch of samples at a time.
o Mini-Batch Gradient Descent: Uses a small batch of samples to update weights, balancing between computational efficiency and convergence.
o Adam (Adaptive Moment Estimation): Combines the advantages of both AdaGrad and RMSprop by computing adaptive learning rates for each parameter.
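A minimal sketch of the basic update rule w ← w − η·∇L(w), applied to the toy loss L(w) = (w − 3)²; the loss and learning rate are illustrative assumptions:

```python
# Plain gradient descent on a one-parameter toy loss L(w) = (w - 3)^2.
learning_rate = 0.1
w = 0.0
for step in range(50):
    grad = 2 * (w - 3)          # dL/dw
    w -= learning_rate * grad   # move against the gradient
print(w)                        # approaches the minimum at w = 3
```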
Backpropagation
 Process: Backpropagation is the algorithm used to train ANNs. It involves:
o Forward Pass: Calculating the network's output and the loss.
o Backward Pass: Computing the gradient of the loss function with respect to each weight by applying the chain rule of calculus.
o Weight Update: Adjusting the weights and biases based on the computed gradients to minimize the loss.
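A minimal sketch of these three steps for a single sigmoid neuron trained with squared error (NumPy assumed; inputs and initial weights are illustrative assumptions):

```python
# One-neuron backpropagation: the chain rule gives
# dL/dw = dL/dy * dy/dz * dz/dw.
import numpy as np

x, target = np.array([1.0, 2.0]), 1.0
w, b, lr = np.array([0.1, -0.2]), 0.0, 0.5

for _ in range(20):
    # forward pass: output and loss
    z = w @ x + b
    y = 1.0 / (1.0 + np.exp(-z))
    loss = 0.5 * (y - target) ** 2
    # backward pass: chain rule
    dz = (y - target) * y * (1 - y)   # dL/dz
    dw, db = dz * x, dz               # dL/dw, dL/db
    # weight update
    w, b = w - lr * dw, b - lr * db
print(loss)
```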
3. Network Architectures

Feedforward Neural Networks (FNNs)
 Structure: Consists of an input layer, one or more hidden layers, and an output layer. Each layer is fully connected to the next layer.
Convolutional Neural Networks (CNNs)
 Purpose: Designed for processing grid-like data such as images. They use convolutional layers to automatically and adaptively learn spatial hierarchies of features.
Recurrent Neural Networks (RNNs)
 Purpose: Designed for sequential data. They use feedback connections, allowing them to maintain state or memory over time. Variants include Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs).
Generative Adversarial Networks (GANs)
 Purpose: Consist of two networks (a generator and a discriminator) that compete with each other. The generator creates data samples, and the discriminator evaluates them. GANs are used for generating realistic data and other creative applications.
*MODELS OF COMPUTATION

1. Feedforward Neural Networks (FNNs)
 Structure: Composed of an input layer, one or more hidden layers, and an output layer. Each layer is fully connected to the next one.
 Computation: Data passes in one direction from the input layer through the hidden layers to the output layer without any cycles.
 Activation Functions: Common functions include Sigmoid, ReLU, and Tanh, which introduce non-linearity into the model.
 Applications: Used in tasks such as image recognition, classification, and regression.
2. Convolutional Neural Networks (CNNs)
 Structure: Includes convolutional layers, pooling layers, and fully connected layers.
o Convolutional Layers: Apply convolutional filters to input data to capture spatial hierarchies and features. The filters slide over the input, creating feature maps.
o Pooling Layers: Reduce the dimensionality of feature maps and retain the most important features by performing operations like max pooling or average pooling.
o Fully Connected Layers: Integrate features from the convolutional and pooling layers for final classification or regression.
 Computation: Designed to handle grid-like data such as images by leveraging local connectivity and shared weights (a minimal sketch follows this list).
 Applications: Primarily used in image and video processing, including object detection and image classification.
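A minimal sketch of a single convolutional filter sliding over a 2D input to produce a feature map (NumPy assumed; the image and kernel are illustrative assumptions):

```python
# Naive valid-mode 2D convolution: the filter sees only a local patch
# at each position, and the same weights are shared everywhere.
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_filter = np.array([[1.0, -1.0], [1.0, -1.0]])  # illustrative kernel
print(conv2d(image, edge_filter))                    # 3x3 feature map
```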
3. Recurrent Neural Networks (RNNs)
 Structure: Designed to handle sequential data. Each neuron can maintain a state or memory of previous inputs through feedback connections.
 Computation: Utilizes loops to maintain context over sequences, allowing the network to process time-series or sequences of data.
 Variants:
o Long Short-Term Memory (LSTM) Networks: Address the vanishing gradient problem in standard RNNs by introducing gates (input, forget, and output gates) to control the flow of information.
o Gated Recurrent Units (GRUs): Simplified versions of LSTMs with fewer gates but similar capabilities.
 Applications: Used for tasks such as language modeling, speech recognition, and time-series forecasting.
4. Generative Adversarial Networks (GANs)
 Structure: Consists of two networks:
o Generator: Creates synthetic data samples to mimic real data.
o Discriminator: Evaluates and distinguishes between real and generated samples.
 Computation: The two networks are trained adversarially; the generator aims to fool the discriminator, while the discriminator tries to correctly identify the generator's outputs.
 Applications: Used in image generation, style transfer, and data augmentation.

5. Autoencoders
 Structure: Comprises an encoder and a decoder:
o Encoder: Compresses input data into a lower-dimensional representation (latent space).
o Decoder: Reconstructs the original data from the compressed representation.
 Computation: Trains to minimize the reconstruction error between the input and the output, effectively learning efficient representations.
 Applications: Utilized in dimensionality reduction, denoising, and anomaly detection.
6. Transformer Models
 Structure: Includes attention mechanisms and an encoder-decoder architecture:
o Attention Mechanism: Allows the model to weigh the importance of different parts of the input data dynamically (a minimal sketch follows this list).
o Encoder-Decoder Structure: In the original Transformer architecture, the encoder processes the input data, and the decoder generates the output sequence.
 Computation: Efficiently processes sequences in parallel rather than sequentially, making it suitable for handling long-range dependencies.
 Applications: Widely used in natural language processing tasks such as translation (e.g., BERT, GPT), text generation, and question answering.
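A minimal sketch of scaled dot-product attention, the core operation behind Transformers (NumPy assumed; the matrices are illustrative assumptions):

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of queries and keys
    weights = softmax(scores, axis=-1)   # dynamic importance weights
    return weights @ V                   # weighted sum of values

Q = np.random.randn(3, 4)   # 3 query positions, dimension 4
K = np.random.randn(5, 4)   # 5 key/value positions
V = np.random.randn(5, 4)
print(attention(Q, K, V).shape)          # (3, 4)
```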
7. Spiking Neural Networks (SNNs)
 Structure: Models neurons that communicate via discrete spikes or events, emulating the temporal dynamics of biological neural systems.
 Computation: The network uses spike-timing-dependent plasticity (STDP) and other biologically inspired learning rules.
 Applications: Research in neuromorphic computing and brain-like computation, with potential applications in sensory processing and robotics.
8. Neuro-Inspired Computation
 Neuromorphic Computing: Uses hardware designed to mimic the structure and function of the brain, often incorporating elements such as spiking neurons and analog computations.
 Brain-Inspired Algorithms: Focus on algorithms and architectures that draw from biological processes, such as reinforcement learning and self-organizing maps.
*ELEMENTS OF COMPUTING MODELS

1. Neurons (Nodes)
 Definition: Basic units of computation in an ANN, inspired by biological neurons.
 Function: Each neuron receives input signals, processes them through a weighted sum, applies an activation function, and produces an output.
 Components:
o Input Signals: The data or features fed into the neuron.
o Weights: Parameters that adjust the influence of each input signal.
o Bias: An additional parameter that allows shifting the activation function, helping the network learn more effectively.
2. Activation Functions
 Purpose: Introduce non-linearity into the network, enabling it to model complex relationships.
 Common Types:
o Sigmoid: σ(z) = 1 / (1 + e^(-z)), squashing inputs into the range (0, 1).
o ReLU (Rectified Linear Unit): Allows positive values to pass through while setting negative values to zero.
o Tanh (Hyperbolic Tangent): Maps inputs to a range between -1 and 1.
o Softmax: Converts logits into probabilities, often used in classification tasks.

3. Layers
 Input Layer: The first layer that receives and holds the raw input data.
 Hidden Layers: Intermediate layers where computations occur, transforming the input data into more abstract representations. Multiple hidden layers contribute to the depth of the network.
 Output Layer: The final layer that produces the network's prediction or output.
4. Weights and Biases
 Weights: Parameters that scale the input signals. They are adjusted during training to minimize the loss function.
 Biases: Parameters that are added to the weighted sum of inputs before applying the activation function. They help the model to fit the data better.

5. Loss Function
 Purpose: Measures the difference between the network's prediction and the actual target values.
 Common Loss Functions:
o Mean Squared Error (MSE): For regression tasks, calculates the average squared difference between predicted and actual values.
o Cross-Entropy Loss: For classification tasks, measures the performance of a classification model whose output is a probability value.
6. Optimization Algorithms
 Purpose: Adjust the weights and biases to minimize the loss function.
 Common Algorithms:
o Gradient Descent: Updates weights in the direction of the negative gradient of the loss function.
o Stochastic Gradient Descent (SGD): Uses a single training example or a small batch to update weights, which can be more computationally efficient.
o Adam (Adaptive Moment Estimation): Combines the advantages of both AdaGrad and RMSprop by computing adaptive learning rates for each parameter.

7. Forward Propagation
 Process: Involves passing input data through the network layer by layer, calculating the weighted sums, applying activation functions, and producing the output.
 Computation Flow: Ensures that data moves from the input layer through hidden layers to the output layer.
8. Backpropagation
 Purpose: The algorithm used to train neural networks by adjusting weights and biases based on the error gradient.
 Process:
o Forward Pass: Computes the network's output and loss.
o Backward Pass: Calculates the gradient of the loss function with respect to each weight using the chain rule.
o Weight Update: Adjusts the weights based on the gradients to minimize the loss function.
9. Regularization Techniques
 Purpose: To prevent overfitting and improve the model's ability to generalize to new data.
 Common Techniques (a minimal sketch of both follows this section):
o Dropout: Randomly drops neurons during training to prevent co-adaptation.
o L1/L2 Regularization: Adds a penalty to the loss function based on the magnitude of weights (L1 for sparsity, L2 for weight decay).

10. Batch Normalization
 Purpose: Normalizes the inputs of each layer to improve training speed and stability.
 Process: Scales and shifts the activations to have a mean of zero and a variance of one.
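A minimal sketch of inverted dropout and an L2 penalty term (NumPy assumed; the drop probability and lam are illustrative assumptions):

```python
# Inverted dropout rescales surviving activations so the expected
# value is unchanged; the L2 penalty is added to the loss.
import numpy as np

def dropout(a, p=0.5, training=True):
    """Randomly zero a fraction p of activations; rescale the rest."""
    if not training:
        return a
    mask = (np.random.rand(*a.shape) >= p) / (1.0 - p)
    return a * mask

def l2_penalty(weights, lam=0.01):
    """Penalty added to the loss: lam * sum of squared weights."""
    return lam * np.sum(weights ** 2)

a = np.ones(8)
print(dropout(a, p=0.25))
print(l2_penalty(np.array([0.5, -2.0])))   # 0.01 * 4.25 = 0.0425
```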
11. Hyperparameters
 Definition: Parameters that control the training process but are not learned from the data.
 Examples:
o Learning Rate: Controls the step size during weight updates.
o Batch Size: Number of samples processed before updating the model.
o Number of Epochs: Number of times the entire training dataset is passed through the network.
12. Architectures and Models
 Feedforward Neural Networks (FNNs): Basic structure with input, hidden, and output layers.
 Convolutional Neural Networks (CNNs): Specialized for grid-like data, incorporating convolutional and pooling layers.
 Recurrent Neural Networks (RNNs): Designed for sequential data with feedback connections.
 Generative Adversarial Networks (GANs): Consist of a generator and a discriminator that compete against each other.

13. Learning Rate Schedulers
 Purpose: Adjust the learning rate during training to improve convergence.
 Types:
o Step Decay: Reduces the learning rate by a factor at specific intervals.
o Exponential Decay: Decreases the learning rate exponentially over time.
o Adaptive Learning Rates: Adjusts the learning rate based on the performance of the model.
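A minimal sketch of step decay and exponential decay (the hyperparameters drop, every, and k are illustrative assumptions):

```python
# Two simple schedules: drop the rate in steps, or decay it smoothly.
import math

def step_decay(lr0, epoch, drop=0.5, every=10):
    """Multiply the learning rate by `drop` every `every` epochs."""
    return lr0 * (drop ** (epoch // every))

def exponential_decay(lr0, epoch, k=0.05):
    """Decrease the learning rate exponentially over time."""
    return lr0 * math.exp(-k * epoch)

for epoch in (0, 10, 20):
    print(epoch, step_decay(0.1, epoch), exponential_decay(0.1, epoch))
```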
**INFORMATION PROCESSING AT NEURONS AND SYNAPSES

1. Processing at Neurons

Basic Neuron Model

 Input Aggregation:
o Weighted Sum: Each neuron receives input signals, which are scaled by
corresponding weights. The weighted sum of these inputs is computed.
z = ∑ᵢ (wᵢ · xᵢ) + b

Activation Function:
 Purpose: Applies a non-linear transformation to the weighted sum to introduce non-linearity into the network. This allows the network to model complex relationships.
 Common Activation Functions:
o Sigmoid Function: σ(z) = 1 / (1 + e^(-z))
o ReLU Function: f(z) = max(0, z)
o Softmax: converts a vector of logits into a probability distribution
o Tanh: f(z) = tanh(z)

Training Neurons
 Backpropagation:
o Error Computation: Determines the error between the network's prediction and the actual target.
o Gradient Calculation: Computes the gradient of the error with respect to each weight using the chain rule.
o Weight Update: Adjusts the weights to minimize the error, typically using optimization algorithms like Gradient Descent or its variants.
2. Information Processing at Synapses

Weights and Connections
 Weights:
o Role: Represent the strength or importance of the connection between neurons. They scale the input signals and are adjusted during training to optimize network performance.
o Learning: Weights are updated based on the gradients computed during backpropagation to minimize the loss function.
 Synaptic Plasticity:
o Concept: In biological systems, synapses (connections between neurons) change their strength based on activity, a concept that is mimicked in ANNs by adjusting weights.
o Learning Rules: Various learning rules, such as Hebbian learning or gradient-based optimization, are used to adjust weights.
Synaptic Updates
 Gradient Descent:
o Basic Update Rule: Weights are updated by subtracting a fraction of the gradient of the loss function with respect to the weight: w_new = w_old − η · ∂L/∂w, where η is the learning rate.
 Regularization Techniques:
o L1 and L2 Regularization: Add penalties to the loss function based on the magnitude of the weights to prevent overfitting.
o L1 Regularization: Encourages sparsity by adding the sum of the absolute values of the weights.
o L2 Regularization: Penalizes large weights by adding the sum of the squared values of the weights.
 Adaptive Learning Rates:
o Algorithms: Techniques like Adam or RMSprop adjust the learning rate based on the magnitude of recent gradients to improve convergence.
3. Information Flow and Connectivity

Feedforward Networks
 Forward Propagation: Information flows through the network from the input layer to the output layer in one direction. Each neuron's output serves as input to the neurons in the subsequent layer.
Recurrent Networks
 Temporal Dependencies: Neurons can maintain state across time steps through feedback connections. This allows them to process sequences and temporal patterns in data.
Convolutional Networks
 Local Connectivity: Convolutional layers apply filters to local regions of the input, preserving spatial hierarchies and reducing the number of parameters.
Attention Mechanisms
 Dynamic Weighting: Attention mechanisms adjust the focus on different parts of the input based on their relevance, improving the network's ability to handle complex dependencies.
*NEURONS AS SELF-ORGANIZING SYSTEMS

1. Self-Organizing Maps (SOMs)
Self-Organizing Maps (SOMs) are a type of artificial neural network that exemplify self-organizing principles. They are used for unsupervised learning tasks such as clustering and dimensionality reduction.
 Structure:
o Grid of Neurons: Neurons are arranged in a grid or lattice structure.
o Topology Preservation: The SOM preserves the topological relationships between the input data points on the output grid.
 Learning Process (a minimal training-step sketch follows this list):
o Competitive Learning: Neurons compete to respond to input patterns. The neuron with the smallest distance to the input vector (the "best matching unit" or BMU) wins the competition.
o Neighborhood Function: The BMU and its neighboring neurons are updated to move closer to the input vector. The influence of this update decreases with distance from the BMU.
o Weight Adjustment: Weights are adjusted to reflect the input data, leading to a mapping that captures the underlying structure of the data.
 Applications: Data visualization, clustering, and feature extraction.
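A minimal sketch of one SOM training step on a 1D grid of neurons (NumPy assumed; the grid size, learning rate, and neighbourhood width are illustrative assumptions):

```python
# One SOM step: find the BMU, then pull it and its grid neighbours
# toward the input, with influence decaying by grid distance.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((10, 2))            # 10 grid neurons, 2D inputs
x = np.array([0.3, 0.7])                 # one input vector
lr, sigma = 0.5, 2.0                     # learning rate, neighbourhood width

# 1) BMU: neuron whose weights are closest to the input
bmu = np.argmin(np.linalg.norm(weights - x, axis=1))

# 2) neighbourhood function: influence decays with grid distance
grid_dist = np.abs(np.arange(10) - bmu)
h = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))

# 3) weight adjustment: move neurons toward the input
weights += lr * h[:, None] * (x - weights)
print(weights[bmu])
```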

2. Hebbian Learning
Hebbian learning is based on the principle that neurons that fire together wire together, which is a foundational idea for self-organization in neural systems.
 Rule:
o Hebbian Update Rule: If two neurons are activated simultaneously, the connection strength (weight) between them is increased: Δw_ij = η · x_i · y_j, where η is the learning rate, x_i the pre-synaptic activity, and y_j the post-synaptic activity.
 Applications: Used in various unsupervised learning algorithms and is the basis for some models of associative memory.
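A minimal sketch of this update rule (NumPy assumed; the activity vectors are illustrative assumptions):

```python
# Hebbian update: weights grow only where pre- and post-synaptic
# activities co-occur.
import numpy as np

eta = 0.1
x = np.array([1.0, 0.0, 1.0])   # pre-synaptic activities
y = np.array([1.0, 0.5])        # post-synaptic activities (illustrative)
W = np.zeros((2, 3))

W += eta * np.outer(y, x)       # strengthen co-active connections
print(W)
```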
3. Kohonen Networks
Kohonen networks are a type of self-organizing neural network, also known as SOMs, named after Teuvo Kohonen.
 Structure and Learning:
o Initial Random Weights: Neurons start with random weights.
o Input Presentation: Inputs are presented, and each neuron calculates the distance to the input vector.
o BMU Identification: The neuron closest to the input vector is selected as the BMU.
o Weight Adjustment: The BMU and its neighbors adjust their weights to become more similar to the input vector, driven by a learning rate and neighborhood function.
 Applications: Similar to SOMs, including clustering, dimensionality reduction, and feature mapping.
4. Adaptive Resonance Theory (ART)
Adaptive Resonance Theory (ART) is a family of neural network models designed for stable, plastic learning in dynamic environments.
 Types:
o ART1: For binary input patterns.
o ART2: For continuous-valued input patterns.
o ART3: For higher-dimensional and more complex data.
 Mechanism:
o Resonance: Neurons adapt to new patterns while preserving previously learned patterns. When a new input is similar to a previously learned pattern, the system "resonates" and reinforces the existing memory.
o Learning Rules: Includes mechanisms for maintaining stability and plasticity, preventing catastrophic forgetting.
 Applications: Pattern recognition, clustering, and adaptive learning systems.

5. Competitive Learning
Competitive learning is a type of unsupervised learning where neurons compete to represent different parts of the input space.
 Process:
o Winner-Takes-All: Neurons compete for activation based on their similarity to the input. Only the neuron with the highest activation (the winner) is updated.
o Weight Update: The winner's weights are adjusted to become more similar to the input vector, while other neurons' weights remain unchanged.
 Applications: Feature mapping and clustering, such as in Kohonen networks and SOMs.
6. Neurogenesis and Pruning
In some advanced neural network models, self-organization involves mechanisms analogous to neurogenesis and pruning observed in biological systems.
 Neurogenesis: The process of creating new neurons or units in the network to better capture the complexity of the data.
 Pruning: The process of removing neurons or connections that are less relevant or redundant, improving network efficiency and performance.
*NETWORK OF PRIMITIVE FUNCTIONS

In neural networks, especially deep learning models, the concept of "primitive functions" typically refers to the basic mathematical operations or functions that are used to build more complex models. These primitive functions form the foundation of the network's architecture and its ability to learn and generalize from data.
Here's a breakdown of how these primitive functions are utilized in neural networks:

1. Basic Mathematical Operations
 Addition and Subtraction: These are fundamental operations used in neurons to combine inputs.
 Multiplication and Division: These operations can be used for scaling and adjusting values within the network.
2. Activation Functions
These functions introduce non-linearity into the network, allowing it to model complex relationships:
 Sigmoid: Maps input values to a range between 0 and 1. It's often used in binary classification tasks.
 Tanh: Maps input values to a range between -1 and 1, providing outputs that are zero-centered.
 ReLU (Rectified Linear Unit): Outputs the input directly if it's positive; otherwise, it outputs zero. It's widely used due to its simplicity and efficiency.
 Leaky ReLU: A variant of ReLU that allows a small, non-zero gradient when the input is negative, helping to mitigate the "dying ReLU" problem.
 Softmax: Converts the output into a probability distribution, used mainly in multi-class classification problems.
3. Linear Transformations
 Weights and Biases: Weights are parameters that are learned and adjusted during training. Biases allow the activation function to be shifted to better fit the data.
 Dot Product: Computes the weighted sum of inputs, fundamental to the operation of neurons in fully connected layers.
4. Pooling Functions
These functions reduce the dimensionality of feature maps, helping to make the network more computationally efficient and less sensitive to minor variations in the input:
 Max Pooling: Takes the maximum value from a set of values in a region of the feature map.
 Average Pooling: Computes the average value from a set of values in a region of the feature map.
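A minimal sketch of 2×2 max and average pooling with stride 2 (NumPy assumed; the feature map is illustrative):

```python
# Pooling summarizes each 2x2 patch by its max or its mean.
import numpy as np

def pool2x2(fmap, mode="max"):
    h, w = fmap.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            patch = fmap[i:i+2, j:j+2]
            out[i // 2, j // 2] = patch.max() if mode == "max" else patch.mean()
    return out

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [0., 1., 9., 2.],
                 [2., 3., 4., 5.]])
print(pool2x2(fmap, "max"))   # [[4. 8.] [3. 9.]]
print(pool2x2(fmap, "avg"))   # [[2.5 6.5] [1.5 5. ]]
```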
5. Normalization Techniques
These functions help to stabilize and accelerate the training of neural networks:
 Batch Normalization: Normalizes the inputs of each layer to have a mean of zero and a variance of one, improving convergence and performance.
 Layer Normalization: Normalizes the activations of each layer for each individual example rather than across a batch.

6. Regularization Techniques
To prevent overfitting, several regularization functions are used:
 Dropout: Randomly drops a fraction of neurons during training to prevent over-reliance on specific neurons.
 L1/L2 Regularization: Adds a penalty based on the magnitude of weights to the loss function to discourage overly complex models.
7. Optimization Functions
These functions adjust the weights and biases to minimize the loss:
 Gradient Descent: Updates the weights based on the gradient of the loss function with respect to each weight.
 Adam (Adaptive Moment Estimation): An extension of gradient descent that adapts the learning rate based on the first and second moments of the gradients.
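A minimal sketch of the Adam update rule (NumPy assumed; the defaults shown are the commonly used hyperparameters, and the toy loss is an illustrative assumption):

```python
# Adam keeps running estimates of the first and second moments of the
# gradient and uses their bias-corrected values to scale each update.
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter vector w given gradient grad."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean)
    v = beta2 * v + (1 - beta2) * grad**2     # second moment (variance)
    m_hat = m / (1 - beta1**t)                # bias correction
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([0.5, -0.3])
m = np.zeros_like(w); v = np.zeros_like(w)
for t in range(1, 4):                         # three illustrative steps
    grad = 2 * w                              # gradient of L = ||w||^2
    w, m, v = adam_step(w, grad, m, v, t)
print(w)
```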
**SINGLE AND MULTIPLE INPUT NEURONS

In neural networks, neurons are the fundamental units that process inputs and produce
outputs. They can be categorized into single-input and multiple-input neurons based on
the number of inputs they receive.
Single-Input Neurons
A single-input neuron receives one input value and processes it to produce an output.
This is a simplified model, often used in educational contexts or in very basic neural
network architectures. The output of a single-input neuron can be calculated using a
simple activation function. For example:
 Input: x
 Weight: w
 Bias: b
 Activation Function: f
The output y is computed as:
y = f(wx + b)
In this case, wx + b is often referred to as the neuron's weighted sum, and f is the activation function.
Multiple-Input Neurons
A multiple-input neuron, as the name suggests, receives multiple input values. Each
input has an associated weight, and the neuron computes a weighted sum of all inputs,
adds a bias, and then applies an activation function. This setup is more common in
practical neural network models.
For a neuron with n inputs, the output is calculated as:
 Inputs: x₁, x₂, …, xₙ
 Weights: w₁, w₂, …, wₙ
 Bias: b
 Activation Function: f
The output y is given by:
y = f(∑ᵢ wᵢ·xᵢ + b)
Activation Functions
The activation function f determines the output of the neuron and introduces non-linearity into the model, enabling the network to learn complex patterns. Common activation functions include:
 Sigmoid: f(x) = 1 / (1 + e^(-x))
 ReLU (Rectified Linear Unit): f(x) = max(0, x)
 Tanh (Hyperbolic Tangent): f(x) = tanh(x)
 Softmax: Used in the output layer of classification networks to convert logits into probabilities.
Example in a Simple Neural Network
Consider a neuron in a basic feedforward neural network:
1. Input Layer: Takes input values (e.g., features from a dataset).
2. Hidden Layer: Each neuron in this layer takes inputs from the previous layer, computes
the weighted sum, adds bias, and applies an activation function.
3. Output Layer: Produces the final output of the network.
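A minimal sketch of such a multiple-input neuron (NumPy assumed; the inputs, weights, and bias are illustrative assumptions):

```python
# One neuron computing y = f(sum(w_i * x_i) + b) with a sigmoid f.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, f):
    return f(np.dot(w, x) + b)

x = np.array([0.5, 1.0, -1.5])   # three inputs
w = np.array([0.4, -0.2, 0.1])   # one weight per input
b = 0.05
print(neuron(x, w, b, sigmoid))  # single scalar output
```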
*TRANSFER FUNCTIONS
Key learnings:
 Transfer Function Definition: A transfer function is defined as the ratio of the Laplace
transform of a system’s output to the input, assuming initial conditions are zero.
 Utilization of Block Diagrams: Block diagrams simplify complex control systems into
manageable components, making it easier to analyze and derive transfer functions.
 Understanding Poles and Zeros: Poles and zeros critically influence a system’s behavior
by indicating points where the transfer function respectively becomes infinite or zero.
 Laplace Transform in Control Systems: Laplace transform is essential for representing
all types of signals in a uniform format, aiding in the mathematical analysis of control
systems.
 Impulse Response Insight: The output from an impulse input reveals the transfer
function, illustrating the direct relationship between a system’s input and output.
A transfer function represents the relationship between the output signal of a control
system and the input signal, for all possible input values. A block diagram is a
visualization of the control system which uses blocks to represent the transfer function,
and arrows which represent the various input and output signals.
Every control system has a reference input, often called excitation or cause, that works
through a transfer function to create a controlled output or response.
Thus the cause (the input) and the effect (the output) are related to each other through a transfer function.

In a Laplace transform, if the input is represented by R(s) and the output is represented by C(s), then the transfer function is:

G(s) = C(s) / R(s)

That is, the transfer function of the system multiplied by the input function gives the output function of the system: C(s) = G(s) · R(s).
What is a Transfer Function
Transfer Function Explained: It is defined as the ratio of the Laplace transform of the
output to the Laplace transform of the input, assuming zero initial conditions.

The procedure for determining the transfer function of a control system is as follows:
1. We form the equations for the system.
2. Now we take Laplace transform of the system equations, assuming initial conditions as
zero.
3. Specify system output and input.
4. Lastly we take the ratio of the Laplace transform of the output and the Laplace
transform of the input which is the required transfer function.
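A minimal sketch of this procedure for the first-order system dy/dt + a·y(t) = a·x(t), using SymPy (the system itself and the symbol names are assumed examples, not from the notes):

```python
# Derive G(s) = Y(s)/X(s) for dy/dt + a*y = a*x with y(0) = 0.
# With zero initial conditions, L{dy/dt} = s*Y(s), so
# (s + a) * Y(s) = a * X(s).
import sympy as sp

s, a = sp.symbols('s a', positive=True)
X = sp.Function('X')(s)     # Laplace transform of the input x(t)

Y = a * X / (s + a)         # solve the transformed equation for Y(s)
G = sp.simplify(Y / X)      # ratio of output to input transforms
print(G)                    # a/(a + s)
```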
Inputs and outputs in a control system may differ. For instance, electric motors take
electrical signals as inputs and produce mechanical outputs to rotate, while generators
take mechanical inputs to generate electrical outputs.
But for the mathematical analysis of a system, all kinds of signals should be represented in a similar form. This is done by transforming every signal into its Laplace form. The transfer function of the system is likewise represented in Laplace form, by dividing the Laplace transform of the output by the Laplace transform of the input. Hence a basic block diagram of a control system can be represented with an input r(t), a transfer function block, and an output c(t), where r(t) and c(t) are the time-domain functions of the input and output signals respectively.
Methods of Obtaining a Transfer Function
There are two major ways of obtaining a transfer function for a control system. The ways are:
 Block Diagram Method: It is not convenient to derive a complete transfer function for a
complex control system. Therefore the transfer function of each element of a control
system is represented by a block diagram. Block diagram reduction techniques are
applied to obtain the desired transfer function.
 Signal Flow Graphs: The modified form of a block diagram is a signal flow graph. Block
diagrams visually outline a control system, while signal flow graphs provide a more
compact representation.

What is a Recurrent Neural Network (RNN)?

A Recurrent Neural Network (RNN) is a type of neural network where the output from the previous step is fed as input to the current step. In traditional neural networks, all the inputs and outputs are independent of each other. But in cases where it is required to predict the next word of a sentence, the previous words are needed, and hence there is a need to remember them. Thus the RNN came into existence, solving this issue with the help of a hidden layer. The main and most important feature of an RNN is its hidden state, which remembers some information about a sequence. This state is also referred to as the memory state, since it remembers the previous input to the network. The RNN uses the same parameters for each input, as it performs the same task on all the inputs or hidden layers to produce the output. This reduces the number of parameters, unlike other neural networks.
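A minimal sketch of this recurrence (NumPy assumed; the sizes and inputs are illustrative assumptions), showing the hidden state being fed back and the same parameters reused at every step:

```python
# The RNN recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b).
import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((4, 3)) * 0.1   # input -> hidden
W_hh = rng.standard_normal((4, 4)) * 0.1   # hidden -> hidden (memory)
b = np.zeros(4)

h = np.zeros(4)                            # initial hidden (memory) state
for x_t in rng.standard_normal((5, 3)):    # a sequence of 5 inputs
    h = np.tanh(W_xh @ x_t + W_hh @ h + b) # previous state feeds back in
print(h)
```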

Types of RNN:
1. One-to-One RNN: This is the structure of a vanilla neural network. It is used to solve general machine learning problems that have only one input and one output.
2. One-to-Many RNN: A single input and several outputs describe a one-to-many Recurrent Neural Network.
3. Many-to-One RNN: This RNN creates a single output from a given series of inputs.
4. Many-to-Many RNN: This RNN receives a set of inputs and produces a set of outputs. Example: machine translation, in which the RNN scans English text and then converts it to French.
Advantages of RNN:
 An RNN can represent a sequence of data in such a way that each sample is assumed to depend on the previous ones.
 Recurrent Neural Networks can be combined with convolutional layers to extend the effective pixel neighbourhood.
Disadvantages of RNN:
 RNN training is a difficult process.
 With activation functions such as tanh or ReLU, an RNN cannot handle very long sequences.
 RNNs suffer from the vanishing or exploding gradient problem.

What is LSTM?
 Long Short-Term Memory (LSTM) is an improved version of the recurrent neural network, designed by Hochreiter & Schmidhuber.
 A traditional RNN has a single hidden state that is passed through time, which can make it difficult for the network to learn long-term dependencies. LSTM models address this problem by introducing a memory cell, which is a container that can hold information for an extended period.
 LSTM architectures are capable of learning long-term dependencies in sequential data, which makes them well-suited for tasks such as language translation, speech recognition, and time series forecasting.
 LSTMs can also be used in combination with other neural network architectures, such as Convolutional Neural Networks (CNNs) for image and video analysis.
LSTM Architecture
 The LSTM architecture involves a memory cell which is controlled by three gates: the input gate, the forget gate, and the output gate. These gates decide what information to add to, remove from, and output from the memory cell.
 The input gate controls what information is added to the memory cell.
 The forget gate controls what information is removed from the memory cell.
 The output gate controls what information is output from the memory cell.
 This allows LSTM networks to selectively retain or discard information as it flows through the network, which allows them to learn long-term dependencies.
 The LSTM maintains a hidden state, which acts as the short-term memory of the network. The hidden state is updated based on the input, the previous hidden state, and the memory cell's current state.
Bidirectional LSTM Model
 A Bidirectional LSTM (BiLSTM/BLSTM) is a recurrent neural network that is able to process sequential data in both forward and backward directions. This allows a BiLSTM to learn longer-range dependencies in sequential data than a traditional LSTM, which can only process sequential data in one direction.
 BiLSTMs are made up of two LSTM networks, one that processes the input sequence in the forward direction and one that processes it in the backward direction.
 The outputs of the two LSTM networks are then combined to produce the final output.
 LSTM models, including BiLSTMs, have demonstrated state-of-the-art performance across various tasks such as machine translation, speech recognition, and text summarization.
 LSTM layers can be stacked to create deep architectures, enabling the learning of even more complex patterns and hierarchies in sequential data. Each LSTM layer in a stacked configuration captures different levels of abstraction and temporal dependencies within the input data.
LSTM Working
 The LSTM architecture has a chain structure that contains four neural networks and different memory blocks called cells.

Forget Gate
 The information that is no longer useful in the cell state is removed with the forget gate. Two inputs, x_t (the input at the current time step) and h_{t-1} (the previous cell output), are fed to the gate and multiplied with weight matrices, followed by the addition of a bias. The result is passed through a sigmoid activation function, which gives an output between 0 and 1. If for a particular cell state the output is 0, the piece of information is forgotten, and for an output of 1, the information is retained for future use. The equation for the forget gate is:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

where:
 W_f represents the weight matrix associated with the forget gate.
 [h_{t-1}, x_t] denotes the concatenation of the current input and the previous hidden state.
 b_f is the bias of the forget gate.
 σ is the sigmoid activation function.
 -----

Input Gate
The addition of useful information to the cell state is done by the input gate. First, the information is regulated using the sigmoid function, which filters the values to be remembered, similar to the forget gate, using the inputs h_{t-1} and x_t. Then a vector of candidate values is created using the tanh function, which gives an output from -1 to +1, containing all the possible values from h_{t-1} and x_t. Finally, the candidate vector and the regulated values are multiplied to obtain the useful information. The equations for the input gate are:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
Ĉ_t = tanh(W_c · [h_{t-1}, x_t] + b_c)

To update the cell state, we multiply the previous state by f_t, disregarding the information we had previously chosen to ignore, and then add i_t ⊙ Ĉ_t, the updated candidate values scaled by the amount we chose to update each state value:

C_t = f_t ⊙ C_{t-1} + i_t ⊙ Ĉ_t

where ⊙ denotes element-wise multiplication and tanh is the hyperbolic tangent activation function.

Output Gate
The output gate decides what information from the cell state is exposed as the hidden state: the cell state is passed through tanh and multiplied by the sigmoid-regulated output:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t ⊙ tanh(C_t)
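A minimal sketch of one LSTM cell step implementing the gate equations above (NumPy assumed; the sizes and the way parameters are packed into dictionaries are illustrative assumptions):

```python
# One LSTM step: forget gate f, input gate i, candidate C_hat,
# cell update, output gate o, new hidden state h.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """W and b pack the four gate parameters: f, i, candidate c, o."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W['f'] @ z + b['f'])          # forget gate
    i = sigmoid(W['i'] @ z + b['i'])          # input gate
    C_hat = np.tanh(W['c'] @ z + b['c'])      # candidate values
    C = f * C_prev + i * C_hat                # cell state update
    o = sigmoid(W['o'] @ z + b['o'])          # output gate
    h = o * np.tanh(C)                        # new hidden state
    return h, C

n_h, n_x = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_h, n_h + n_x)) * 0.1 for k in 'fico'}
b = {k: np.zeros(n_h) for k in 'fico'}
h, C = lstm_step(rng.standard_normal(n_x), np.zeros(n_h), np.zeros(n_h), W, b)
print(h, C)
```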

Applications of LSTM
 Some of the famous applications of LSTM include:
 Language Modeling: LSTMs have been used for natural language processing tasks such
as language modeling, machine translation, and text summarization. They can be
trained to generate coherent and grammatically correct sentences by learning the
dependencies between words in a sentence.
 Speech Recognition: LSTMs have been used for speech recognition tasks such as
transcribing speech to text and recognizing spoken commands. They can be trained to
recognize patterns in speech and match them to the corresponding text.
 Time Series Forecasting: LSTMs have been used for time series forecasting tasks such as
predicting stock prices, weather, and energy consumption. They can learn patterns in
time series data and use them to make predictions about future events.
 Anomaly Detection: LSTMs have been used for anomaly detection tasks such as
detecting fraud and network intrusion. They can be trained to identify patterns in data
that deviate from the norm and flag them as potential anomalies.
 Recommender Systems: LSTMs have been used for recommendation tasks such as
recommending movies, music, and books. They can learn patterns in user behavior and
use them to make personalized recommendations.
 Video Analysis: LSTMs have been used for video analysis tasks such as object detection,
activity recognition, and action classification. They can be used in combination with
other neural network architectures, such as Convolutional Neural Networks (CNNs), to
analyze video data and extract useful information.
Comparison of LSTM and RNN:

Feature | LSTM (Long Short-Term Memory) | RNN (Recurrent Neural Network)
Memory | Has a special memory unit that allows it to learn long-term dependencies in sequential data | Does not have a memory unit
Directionality | Can be trained to process sequential data in both forward and backward directions | Can only be trained to process sequential data in one direction
Training | More difficult to train than RNN due to the complexity of the gates and memory unit | Easier to train than LSTM
Long-term dependency learning | Yes | Limited
Ability to learn sequential data | Yes | Yes
Applications | Machine translation, speech recognition, text summarization, natural language processing, time series forecasting | Natural language processing, machine translation, speech recognition, image processing, video processing

Problem with Long-Term Dependencies in RNN


Recurrent Neural Networks (RNNs) are designed to handle
sequential data by maintaining a hidden state that captures
information from previous time steps. However, they often face
challenges in learning long-term dependencies, where
information from distant time steps becomes crucial for making
accurate predictions. This problem is known as the vanishing
gradient or exploding gradient problem.
Few common issues are listed below:
Vanishing Gradient
During backpropagation through time, gradients can become
extremely small as they are multiplied through the chain of
recurrent connections, causing the model to have difficulty
learning dependencies that are separated by many time steps.
Exploding Gradient
Conversely, gradients can explode during backpropagation,
leading to numerical instability and making it difficult for the
model to converge.
Different Variants on Long Short-Term Memory
Over time, several variants and improvements to the original
LSTM architecture have been proposed.
Vanilla LSTM
This is the original LSTM architecture proposed by Hochreiter and
Schmidhuber. It includes memory cells with input, forget, and
output gates to control the flow of information. The key idea is to
allow the network to selectively update and forget information
from the memory cell.
Peephole Connections
In the peephole LSTM, the gates are allowed to look at the cell
state in addition to the hidden state. This allows the gates to
consider the cell state when making decisions, providing more
context information.
Gated Recurrent Unit (GRU)
GRU is an alternative to LSTM, designed to be simpler and
computationally more efficient. It combines the input and forget
gates into a single “update” gate and merges the cell state and
hidden state. While GRUs have fewer parameters than LSTMs,
they have been shown to perform similarly in practice.
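A minimal sketch of one GRU step (NumPy assumed; the sizes and parameter packing are illustrative assumptions). Note how the update gate z blends the old state with the candidate, replacing the LSTM's separate cell state:

```python
# One GRU step: update gate z, reset gate r, candidate h_hat, blend.
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x_t, h_prev, W, b):
    zc = np.concatenate([h_prev, x_t])
    z = sigmoid(W['z'] @ zc + b['z'])                 # update gate
    r = sigmoid(W['r'] @ zc + b['r'])                 # reset gate
    h_hat = np.tanh(W['h'] @ np.concatenate([r * h_prev, x_t]) + b['h'])
    return (1 - z) * h_prev + z * h_hat               # blended new state

n_h, n_x = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_h, n_h + n_x)) * 0.1 for k in 'zrh'}
b = {k: np.zeros(n_h) for k in 'zrh'}
h = gru_step(rng.standard_normal(n_x), np.zeros(n_h), W, b)
print(h)
```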
