ARTIFICIAL NEURAL NETWORK Notes
21CSE326T
HISTORY OF NEURAL NETWORK RESEARCH
1. Early Foundations (1940s–1950s)
1943: Neurophysiologist Warren McCulloch and mathematician Walter Pitts published
a seminal paper introducing the concept of artificial neurons. They modeled neural
networks using simple logic gates, laying the groundwork for understanding how
networks of neurons could perform computations.
1950: Alan Turing’s paper "Computing Machinery and Intelligence" proposed the idea
of machines that could simulate human intelligence, indirectly influencing the
development of neural networks.
2. The Birth of Neural Networks (1950s–1960s)
1951: Marvin Minsky and Dean Edmonds created the first neural network hardware,
the SNARC (Stochastic Neural Analog Reinforcement Calculator), to simulate learning
processes.
1958: Frank Rosenblatt invented the Perceptron, an early type of neural network
capable of binary classification. The Perceptron algorithm was one of the first models
that demonstrated the potential of neural networks for learning.
3. The First AI Winter (1960s–1970s)
1969: Minsky and Papert’s book "Perceptrons" critically evaluated the limitations of
single-layer perceptrons, particularly their inability to solve non-linearly separable
problems like XOR. This criticism led to a decline in funding and interest in neural
network research, a period often referred to as the “AI Winter.”
4. Resurgence and Backpropagation (1980s)
1986: David Rumelhart, Geoffrey Hinton, and Ronald Williams rediscovered and
popularized the backpropagation algorithm, which allowed multi-layer neural networks
(also known as multi-layer perceptrons) to be trained effectively. This breakthrough
marked a significant revival in neural network research.
1980s: The introduction of algorithms like Hopfield networks and Kohonen’s Self-
Organizing Maps further expanded the capabilities and applications of neural networks.
5. Growth and Expansion (1990s)
1990s: The development of Support Vector Machines and advancements in kernel
methods provided alternative approaches to classification and regression, leading to
broader research in machine learning and neural networks.
1997: Hochreiter and Schmidhuber introduced Long Short-Term Memory (LSTM) networks,
and successful applications of neural networks in fields such as speech recognition
and computer vision helped establish their practical utility.
6. Deep Learning Revolution (2000s–2010s)
2006: Geoffrey Hinton and his colleagues introduced the concept of deep learning
through unsupervised pre-training techniques. This method allowed for training of very
deep neural networks and set the stage for major advances.
2012: AlexNet, a deep convolutional neural network developed by Alex Krizhevsky, Ilya
Sutskever, and Geoffrey Hinton, won the ImageNet competition by a significant margin,
demonstrating the power of deep learning in image recognition and sparking
widespread interest in neural networks.
7. Modern Era (2010s–Present)
2010s-Present: Neural networks, especially deep learning models, have become central
to many AI applications, including natural language processing, computer vision, and
reinforcement learning. Technologies like GPT (Generative Pre-trained Transformer)
and various transformer models have revolutionized natural language understanding
and generation.
Recent Advances: Innovations such as large language models (LLMs), generative
adversarial networks (GANs), and transformer architectures continue to push the
boundaries of what neural networks can achieve, driving forward research and practical
applications in diverse fields.
BIOLOGICAL INSPIRATION IN ARTIFICIAL NEURAL NETWORKS
1. Neuronal Models
Neurons: The basic unit of artificial neural networks is inspired by biological neurons.
Biological neurons receive inputs through dendrites, process information in the cell
body, and transmit outputs through axons. Similarly, artificial neurons (or nodes)
receive inputs, apply a weight to these inputs, sum them, and pass them through an
activation function to produce an output.
Synapses: In biological systems, synapses are the connections between neurons that
transmit signals. In ANNs, weights serve a similar purpose, representing the strength of
connections between neurons. These weights are adjusted during training to optimize
the network’s performance.
2. Learning Mechanisms
Hebbian Learning: Proposed by Donald Hebb in 1949, Hebbian learning is a principle
where connections between neurons are strengthened if they are activated
simultaneously. This idea influenced the development of algorithms like the perceptron
and later models, which adjust weights based on the correlation between input and
output.
Backpropagation: While not a direct biological analog, backpropagation is inspired by
the general idea of error correction in biological systems. The method adjusts weights to
minimize the error between predicted and actual outcomes, akin to how biological
systems adapt based on feedback.
3. Neural Network Architectures
Feedforward Networks: These mimic the unidirectional flow of information in biological
neural networks, where signals travel in one direction from input to output without
looping back.
Recurrent Networks: Recurrent Neural Networks (RNNs) are inspired by the biological
concept of feedback loops, where outputs from neurons are fed back as inputs to the
same or other neurons. This allows RNNs to handle sequences and temporal patterns,
similar to how biological systems process temporal information.
4. Biological Learning and Plasticity
Synaptic Plasticity: Biological neural networks exhibit plasticity, where the strength of
synaptic connections changes with experience. This concept is reflected in learning
algorithms where weights are adjusted based on the network’s performance and
experience.
Spike-Timing-Dependent Plasticity (STDP): This form of plasticity adjusts synaptic
strength based on the relative timing of spikes from pre- and post-synaptic neurons.
Although not directly implemented in traditional ANNs, STDP principles influence the
development of spiking neural networks (SNNs), which attempt to more closely emulate
biological neural dynamics.
5. Inspiration for Advanced Models
Convolutional Neural Networks (CNNs): Inspired by the visual processing in the human
brain, CNNs are designed to handle grid-like data (such as images) and automatically
learn spatial hierarchies of features. This architecture mimics the way the visual cortex
processes visual information.
Generative Adversarial Networks (GANs): Although more abstract, GANs’ concept of
adversarial training, where two networks (generator and discriminator) compete with
each other, can be loosely compared to competitive neural interactions observed in
biological systems.
6. Neuroscientific Insights
Hierarchical Processing: The hierarchical structure of CNNs reflects the hierarchical
processing seen in the brain’s visual system, where lower-level features (edges) are
combined to form higher-level representations (objects).
Biologically Plausible Networks: Researchers are also exploring neural architectures
that more closely emulate the brain’s structure and function, including brain-inspired
algorithms and neuromorphic computing, which aim to create hardware that mimics
neural processes more closely.
7. Limitations and Future Directions
Simplification: While inspired by biological processes, ANNs are highly simplified
models compared to the complexity of the human brain. Current models do not fully
capture the intricate dynamics of real neural networks, and ongoing research aims to
bridge this gap.
Neuromorphic Engineering: This field focuses on creating hardware that mimics the
brain’s neural architecture and processing methods, potentially leading to more efficient
and brain-like artificial intelligence systems.
NEURAL COMPUTATION BASICS
1. Neurons and Activation Functions
Neurons (Nodes): The fundamental units in an ANN are neurons or nodes. Each neuron
receives input signals, processes them, and produces an output signal.
Weights: Inputs to a neuron are multiplied by weights, which adjust the importance of
each input. Weights are crucial for learning and adapting the network.
Bias: An additional parameter that allows the activation function to be shifted, helping
the model fit the data better.
Activation Function: After computing the weighted sum of inputs plus bias, the result is
passed through an activation function, which introduces non-linearity into the model.
Common activation functions include:
o Sigmoid: σ(x) = 1 / (1 + e^(-x)), which maps inputs to the range (0, 1).
o ReLU (Rectified Linear Unit): Allows positive values to pass through while setting negative values to zero.
o Tanh (Hyperbolic Tangent): Maps inputs to a range between -1 and 1.
Feedforward Computation
o Forward Propagation: The process of moving data through the network from the input layer to the output layer. In a feedforward network, this involves computing the weighted sums, applying activation functions, and propagating the results through each layer until the output is produced.
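To make forward propagation concrete, here is a minimal sketch in Python/NumPy (the network size, weights, and input values are illustrative, not taken from these notes):

import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes values into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: 3 inputs -> 2 hidden neurons -> 1 output neuron
x = np.array([0.5, -1.0, 2.0])           # input vector
W1 = np.array([[0.1, 0.4, -0.2],
               [-0.3, 0.2, 0.5]])         # hidden-layer weights (2 x 3)
b1 = np.array([0.1, -0.1])                # hidden-layer biases
W2 = np.array([[0.7, -0.6]])              # output-layer weights (1 x 2)
b2 = np.array([0.05])                     # output-layer bias

h = sigmoid(W1 @ x + b1)                  # weighted sum + activation, hidden layer
y = sigmoid(W2 @ h + b2)                  # weighted sum + activation, output layer
print(y)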
2. Learning and Optimization
Loss Function
o Purpose: The loss function quantifies the difference between the predicted output and the actual target values. Common loss functions include:
o Mean Squared Error (MSE): Used for regression tasks.
o Cross-Entropy Loss: Used for classification tasks.
Optimization Algorithms
o Backward Pass: Computing the gradient of the loss function with respect to each weight by applying the chain rule of calculus.
o Weight Update: Adjusting the weights and biases based on the computed gradients to minimize the loss.
3. Network Architectures
Feedforward Neural Networks (FNNs)
o Structure: Consists of an input layer, one or more hidden layers, and an output layer. Each layer is fully connected to the next layer.
Convolutional Neural Networks (CNNs)
o Purpose: Designed for processing grid-like data such as images. They use convolutional layers to automatically and adaptively learn spatial hierarchies of features.
Recurrent Neural Networks (RNNs)
o Purpose: Designed for sequential data. Each neuron can maintain a state or memory of previous inputs through feedback connections.
o Computation: Utilizes loops to maintain context over sequences, allowing the network to process time-series or sequences of data.
o Variants: Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs).
Generative Adversarial Networks (GANs)
o Structure: Consists of two networks, a generator and a discriminator, that compete against each other.
o Computation: The two networks are trained adversarially; the generator aims to fool the discriminator, while the discriminator tries to correctly identify the generator's outputs.
o Applications: Used in image generation, style transfer, and data augmentation.
Autoencoders
o Purpose: Learn compressed representations of input data by encoding them into a lower-dimensional code and reconstructing the input from it.
Key Components and Functions
1. Neurons (Nodes): The basic processing units of the network.
2. Activation Functions:
o Sigmoid: σ(x) = 1 / (1 + e^(-x))
o ReLU (Rectified Linear Unit): Allows positive values to pass through while setting negative values to zero.
o Tanh (Hyperbolic Tangent): Maps inputs to a range between -1 and 1.
3. Layers:
o Input Layer: The first layer that receives and holds the raw input data.
4. Weights and Biases:
o Weights: Parameters that scale the input signals. They are adjusted during training to minimize the loss function.
o Biases: Parameters that are added to the weighted sum of inputs before applying the activation function. They help the model to fit the data better.
5. Loss Function
o Purpose: Measures the difference between the network's prediction and the actual target values.
o Common Loss Functions:
o Mean Squared Error (MSE): For regression tasks, calculates the average squared difference between predicted and actual values.
o Cross-Entropy Loss: For classification tasks, measures the performance of a classification model whose output is a probability value.
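A small sketch of these two loss functions (the array values are illustrative):

import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared differences
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross-entropy for one-hot targets and predicted probabilities
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

print(mse(np.array([1.0, 2.0]), np.array([1.1, 1.8])))                 # regression
print(cross_entropy(np.array([0, 1, 0]), np.array([0.2, 0.7, 0.1])))   # classification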
6. Optimization Algorithms
o Purpose: Adjust the weights and biases to minimize the loss function.
o Common Algorithms:
o Gradient Descent: Updates weights in the direction of the negative gradient of the loss function.
o Stochastic Gradient Descent (SGD): Uses a single training example or a small batch to update weights, which can be more computationally efficient.
o Adam (Adaptive Moment Estimation): Combines the advantages of both AdaGrad and RMSprop by computing adaptive learning rates for each parameter.
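A minimal sketch of a plain gradient-descent update and a single Adam-style update (the function names and values are illustrative; the hyperparameters are the commonly used defaults):

import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Plain gradient descent: move against the gradient
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: adaptive step from first (m) and second (v) moment estimates
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)            # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)            # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w = np.array([0.5, -0.3]); grad = np.array([0.2, -0.1])
w = sgd_step(w, grad)
w, m, v = adam_step(w, grad, np.zeros(2), np.zeros(2), t=1)
print(w)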
7. Forward Propagation
o Process: Involves passing input data through the network layer by layer, calculating the weighted sums, applying activation functions, and producing the output.
o Computation Flow: Ensures that data moves from the input layer through hidden layers to the output layer.
8. Backpropagation
o Purpose: The algorithm used to train neural networks by adjusting weights and biases based on the error gradient.
o Process:
o Backward Pass: Calculates the gradient of the loss function with respect to each weight using the chain rule.
o Weight Update: Adjusts the weights based on the gradients to minimize the loss function.
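The chain rule can be seen directly in a one-neuron example. This sketch (illustrative values; sigmoid activation with squared-error loss) performs one forward pass, one backward pass, and one weight update:

import numpy as np

# One sigmoid neuron trained on a single example with squared-error loss
x = np.array([1.0, 2.0]); target = 1.0
w = np.array([0.1, -0.2]); b = 0.0
lr = 0.5

z = w @ x + b                        # weighted sum
y = 1.0 / (1.0 + np.exp(-z))         # sigmoid activation
loss = (y - target) ** 2             # squared error

# Backward pass (chain rule): dL/dw = dL/dy * dy/dz * dz/dw
dL_dy = 2 * (y - target)
dy_dz = y * (1 - y)                  # derivative of the sigmoid
grad_w = dL_dy * dy_dz * x
grad_b = dL_dy * dy_dz

w -= lr * grad_w                     # weight update
b -= lr * grad_b
print(w, b)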
9. Regularization Techniques
o Purpose: Reduce overfitting so that the network generalizes to unseen data (for example, dropout or weight decay).
10. Batch Normalization
o Purpose: Normalizes the inputs of each layer to improve training speed and stability.
o Process: Scales and shifts the activations to have a mean of zero and a variance of one.
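A minimal batch-normalization sketch (the batch values are illustrative; gamma and beta are the learnable scale and shift, here left at their default values):

import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize a batch of activations to zero mean and unit variance,
    # then scale (gamma) and shift (beta)
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])  # 3 examples, 2 features
print(batch_norm(batch))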
11. Hyperparameters
o Definition: Parameters that control the training process but are not learned from the data.
o Examples: Learning rate, batch size, number of epochs, number of hidden layers.
12. Network Architectures
o Feedforward Neural Networks (FNNs): Basic structure with input, hidden, and output layers.
o Convolutional Neural Networks (CNNs): Specialized for grid-like data, incorporating convolutional and pooling layers.
o Recurrent Neural Networks (RNNs): Designed for sequential data with feedback connections.
o Generative Adversarial Networks (GANs): Consist of a generator and a discriminator that compete against each other.
13. Learning Rate Schedulers
o Purpose: Vary the learning rate over the course of training to improve convergence.
o Types:
o Adaptive Learning Rates: Adjusts the learning rate based on the performance of the model.
INFORMATION PROCESSING AT NEURONS AND SYNAPSES
1. Processing at Neurons
Input Aggregation:
o Weighted Sum: Each neuron receives input signals, which are scaled by
corresponding weights. The weighted sum of these inputs is computed.
z = ∑_i (w_i · x_i) + b
Activation Function:
o The weighted sum z is passed through a non-linear activation function, such as Sigmoid, Tanh, ReLU, or Softmax, to produce the neuron's output.
Training Neurons
Backpropagation:
o Error Computation: Determines the error between the network’s prediction and
the actual target.
o Gradient Calculation: Computes the gradient of the error with respect to each
weight using the chain rule.
o Weight Update: Adjusts the weights to minimize the error, typically using
optimization algorithms like Gradient Descent or its variants.
2. Information Processing at Synapses
o Weights: Represent the strength of the connections between neurons, analogous to synaptic strength in biological systems; they are adjusted during training.
o Gradient Descent: Updates each weight in the direction of the negative gradient of the error, scaled by a learning rate.
o Algorithms: Techniques like Adam or RMSprop adjust the learning rate based on the magnitude of recent gradients to improve convergence.
3. Information Flow and Connectivity
Feedforward Networks
o Forward Propagation: Information flows through the network from the input layer to the output layer in one direction. Each neuron's output serves as input to the neurons in the subsequent layer.
Recurrent Networks
o Temporal Dependencies: Neurons can maintain state across time steps through feedback connections. This allows them to process sequences and temporal patterns in data (a minimal sketch follows below).
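A minimal vanilla RNN step, showing how the hidden state feeds back across time (all shapes and values are illustrative):

import numpy as np

# The hidden state h carries information across time steps
# through the hidden-to-hidden (feedback) weights W_h.
rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3)) * 0.1   # input-to-hidden weights
W_h = rng.normal(size=(4, 4)) * 0.1   # hidden-to-hidden (feedback) weights
b = np.zeros(4)

h = np.zeros(4)                       # initial hidden state
sequence = [rng.normal(size=3) for _ in range(5)]
for x_t in sequence:
    h = np.tanh(W_x @ x_t + W_h @ h + b)  # state depends on input AND past state
print(h)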
Convolutional Networks
o Local Connectivity: Neurons connect to local regions of the input and share weights across positions, allowing the network to learn spatial hierarchies of features in grid-like data such as images.
2. Hebbian Learning
o Hebbian learning is based on the principle that neurons that fire together wire together, which is a foundational idea for self-organization in neural systems.
o Rule: Δw_ij = η · x_i · y_j, i.e., the connection between two neurons is strengthened in proportion to the product of their activations, where η is a learning rate.
3. Self-Organizing Maps (SOMs)
o Process (a small sketch follows this list):
o Input Presentation: Inputs are presented, and each neuron calculates the distance to the input vector.
o BMU Identification: The neuron closest to the input vector is selected as the Best Matching Unit (BMU).
o Weight Adjustment: The BMU and its neighbors adjust their weights to become more similar to the input vector, driven by a learning rate and neighborhood function.
o Applications: Clustering, dimensionality reduction, and feature mapping.
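A toy sketch of the BMU steps above (a full SOM would also update neighbors through a neighborhood function, omitted here; all values are illustrative):

import numpy as np

# Find the Best Matching Unit (BMU) and pull it toward the input
rng = np.random.default_rng(1)
weights = rng.random((4, 2))          # 4 map neurons, 2-dimensional inputs
lr = 0.5                              # learning rate

x = np.array([0.9, 0.1])              # input presentation
dists = np.linalg.norm(weights - x, axis=1)   # distance of each neuron to input
bmu = np.argmin(dists)                        # BMU identification
weights[bmu] += lr * (x - weights[bmu])       # weight adjustment toward the input
print(bmu, weights[bmu])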
4. Adaptive Resonance Theory (ART)
o Mechanism: An input is compared against stored category prototypes; if the best match passes a vigilance test, that category is updated, otherwise a new category is created, balancing stability and plasticity.
5. Competitive Learning
o Competitive learning is a type of unsupervised learning where neurons compete to represent different parts of the input space.
o Process: The neuron whose weights best match the input wins the competition and moves its weights toward that input, while the other neurons remain unchanged (winner-take-all).
Key Functions in Neural Computation
1. Weights and Biases: Weights are parameters that are learned and adjusted during training. Biases allow the activation function to be shifted to better fit the data.
2. Dot Product: Computes the weighted sum of inputs, fundamental to the operation of neurons in fully connected layers.
4. Pooling Functions
o These functions reduce the dimensionality of feature maps, helping to make the network more computationally efficient and less sensitive to minor variations in the input:
o Max Pooling: Takes the maximum value from a set of values in a region of the feature map.
o Average Pooling: Computes the average value from a set of values in a region of the feature map.
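A small sketch of 2x2 max and average pooling over a feature map (values are illustrative):

import numpy as np

def pool2x2(feature_map, mode="max"):
    # Reduce a feature map by taking the max (or average) of each 2x2 region
    h, w = feature_map.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            patch = feature_map[i:i+2, j:j+2]
            out[i // 2, j // 2] = patch.max() if mode == "max" else patch.mean()
    return out

fm = np.array([[1., 3., 2., 4.],
               [5., 6., 7., 8.],
               [9., 2., 1., 0.],
               [3., 4., 5., 6.]])
print(pool2x2(fm, "max"))      # max pooling
print(pool2x2(fm, "average"))  # average pooling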
5. Normalization Techniques
o These functions help to stabilize and accelerate the training of neural networks:
o Batch Normalization: Normalizes the inputs of each layer to have a mean of zero and a variance of one, improving convergence and performance.
o Layer Normalization: Normalizes the activations of each layer for each individual example rather than across a batch.
6. Optimization Functions
o These functions adjust the weights and biases to minimize the loss:
o Gradient Descent: Updates the weights based on the gradient of the loss function with respect to each weight.
o Adam (Adaptive Moment Estimation): An extension of gradient descent that adapts the learning rate based on the first and second moments of the gradients.
SINGLE AND MULTIPLE INPUT NEURONS
In neural networks, neurons are the fundamental units that process inputs and produce
outputs. They can be categorized into single-input and multiple-input neurons based on
the number of inputs they receive.
Single-Input Neurons
A single-input neuron receives one input value and processes it to produce an output.
This is a simplified model, often used in educational contexts or in very basic neural
network architectures. The output of a single-input neuron can be calculated using a
simple activation function. For example:
Input: x
Weight: w
Bias: b
Activation Function: f
The output y is computed as:
y = f(wx + b)
In this case, wx + b is often referred to as the neuron's weighted sum, and f
is the activation function.
Multiple-Input Neurons
A multiple-input neuron, as the name suggests, receives multiple input values. Each
input has an associated weight, and the neuron computes a weighted sum of all inputs,
adds a bias, and then applies an activation function. This setup is more common in
practical neural network models.
For a neuron with n inputs, the output is calculated as:
Inputs: x_1, x_2, …, x_n
Weights: w_1, w_2, …, w_n
Bias: b
Activation Function: f
The output y is given by:
y = f(∑_{i=1}^{n} w_i x_i + b)
Activation Functions
The activation function f determines the output of the neuron and introduces non-
linearity into the model, enabling the network to learn complex patterns. Common
activation functions include:
Sigmoid: f(x) = 1 / (1 + e^(-x))
ReLU (Rectified Linear Unit): f(x) = max(0, x)
Tanh (Hyperbolic Tangent): f(x) = tanh(x)
Softmax: Used in the output layer of classification networks to convert logits into
probabilities.
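A small sketch implementing these activation functions and applying them to a multiple-input neuron (the values are illustrative; the softmax subtracts the maximum logit for numerical stability):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # range (0, 1)

def relu(x):
    return np.maximum(0.0, x)             # zero for negatives, identity otherwise

def softmax(logits):
    e = np.exp(logits - np.max(logits))   # stabilized exponentials
    return e / e.sum()                    # probabilities summing to 1

# A multiple-input neuron: weighted sum plus bias, then an activation
x = np.array([0.5, -1.5, 2.0]); w = np.array([0.4, 0.3, -0.2]); b = 0.1
z = w @ x + b
print(sigmoid(z), relu(z), np.tanh(z))
print(softmax(np.array([2.0, 1.0, 0.1])))  # converts logits to probabilities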
Example in a Simple Neural Network
Consider a neuron in a basic feedforward neural network:
1. Input Layer: Takes input values (e.g., features from a dataset).
2. Hidden Layer: Each neuron in this layer takes inputs from the previous layer, computes
the weighted sum, adds bias, and applies an activation function.
3. Output Layer: Produces the final output of the network.
TRANSFER FUNCTIONS
Key learnings:
Transfer Function Definition: A transfer function is defined as the ratio of the Laplace
transform of a system’s output to the input, assuming initial conditions are zero.
Utilization of Block Diagrams: Block diagrams simplify complex control systems into
manageable components, making it easier to analyze and derive transfer functions.
Understanding Poles and Zeros: Poles and zeros critically influence a system’s behavior
by indicating points where the transfer function respectively becomes infinite or zero.
Laplace Transform in Control Systems: Laplace transform is essential for representing
all types of signals in a uniform format, aiding in the mathematical analysis of control
systems.
Impulse Response Insight: The output from an impulse input reveals the transfer
function, illustrating the direct relationship between a system’s input and output.
A transfer function represents the relationship between the output signal of a control
system and the input signal, for all possible input values. A block diagram is a
visualization of the control system which uses blocks to represent the transfer function,
and arrows which represent the various input and output signals.
Every control system has a reference input, often called excitation or cause, that works
through a transfer function to create a controlled output or response.
Thus the cause-and-effect relationship between the input and the output is expressed
through the transfer function.
In a Laplace transform, if the input is represented by R(s) and the output is represented
by C(s), then the transfer function will be:
G(s) = C(s) / R(s)
That is, the transfer function of the system multiplied by the input function gives the
output function of the system: C(s) = G(s) · R(s).
What is a Transfer Function
Transfer Function Explained: It is defined as the ratio of the Laplace transform of the
output to the Laplace transform of the input, assuming zero initial conditions.
The procedure for determining the transfer function of a control system is as follows:
1. We form the equations for the system.
2. Now we take Laplace transform of the system equations, assuming initial conditions as
zero.
3. Specify system output and input.
4. Lastly we take the ratio of the Laplace transform of the output and the Laplace
transform of the input which is the required transfer function.
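As a short worked example (an assumed first-order system, not one given in these notes): consider a system governed by the differential equation
dc(t)/dt + a · c(t) = a · r(t)
Taking the Laplace transform of both sides with zero initial conditions gives
s · C(s) + a · C(s) = a · R(s)
so the required transfer function is
G(s) = C(s) / R(s) = a / (s + a)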
Inputs and outputs in a control system may differ. For instance, electric motors take
electrical signals as inputs and produce mechanical outputs to rotate, while generators
take mechanical inputs to generate electrical outputs.
But for mathematical analysis of a system, all kinds of signals should be represented in a
similar form. This is done by transforming all signals into their Laplace form. The
transfer function of a system is then obtained by dividing the Laplace transform of the
output by the Laplace transform of the input. Hence a basic block diagram of a control
system can be represented as a single block G(s) with input R(s) and output C(s),
where r(t) and c(t) are the time-domain input and output signals respectively.
Methods of Obtaining a Transfer Function
There are two major ways of obtaining a transfer function for a control system:
Block Diagram Method: It is not convenient to derive a complete transfer function for a
complex control system. Therefore the transfer function of each element of a control
system is represented by a block diagram. Block diagram reduction techniques are
applied to obtain the desired transfer function.
Signal Flow Graphs: The modified form of a block diagram is a signal flow graph. Block
diagrams visually outline a control system, while signal flow graphs provide a more
compact representation.
Types of RNN:
1. One-to-One RNN: The simplest structure, corresponding to a vanilla neural network. It is used to solve general machine learning problems that have only one input and one output.
2. One-to-Many RNN: A single input produces a sequence of outputs (for example, generating a caption for an image).
3. Many-to-One RNN: A sequence of inputs produces a single output (for example, classifying the sentiment of a sentence).
What is LSTM?
Long Short-Term Memory (LSTM) is an improved version of the recurrent neural network,
designed by Hochreiter and Schmidhuber in 1997.
A traditional RNN has a single hidden state that is passed through time, which can make
it difficult for the network to learn long-term dependencies. LSTMs address this
problem by introducing a memory cell, a container that can hold information for an
extended period.
LSTM architectures are capable of learning long-term dependencies in sequential data,
which makes them well-suited for tasks such as language translation, speech recognition,
and time series forecasting.
LSTMs can also be used in combination with other neural network architectures, such
as Convolutional Neural Networks (CNNs) for image and video analysis.
LSTM Architecture
The LSTM architecture involves a memory cell controlled by three gates: the input gate,
the forget gate, and the output gate. These gates decide what information to add to,
remove from, and output from the memory cell.
The input gate controls what information is added to the memory cell.
The forget gate controls what information is removed from the memory cell.
The output gate controls what information is output from the memory cell.
This allows LSTM networks to selectively retain or discard information as it flows
through the network, which allows them to learn long-term dependencies.
The LSTM maintains a hidden state, which acts as the short-term memory of the
network. The hidden state is updated based on the input, the previous hidden state, and
the memory cell’s current state.
Bidirectional LSTM Model
Bidirectional LSTM (Bi-LSTM/BLSTM) is a recurrent neural network (RNN) that is able
to process sequential data in both forward and backward directions. This allows Bi-LSTM
to learn longer-range dependencies in sequential data than traditional LSTMs, which can
only process sequential data in one direction.
Bi LSTMs are made up of two LSTM networks, one that processes the input sequence in
the forward direction and one that processes the input sequence in the backward
direction.
The outputs of the two LSTM networks are then combined to produce the final output.
LSTM models, including Bi LSTMs, have demonstrated state-of-the-art performance
across various tasks such as machine translation, speech recognition, and text
summarization.
Networks in LSTM architectures can be stacked to create deep architectures, enabling
the learning of even more complex patterns and hierarchies in sequential data. Each
LSTM layer in a stacked configuration captures different levels of abstraction and
temporal dependencies within the input data.
LSTM Working
LSTM architecture has a chain structure that contains four neural networks and
different memory blocks called cells.
Forget Gate
The information that is no longer useful in the cell state is removed with the forget gate.
Two inputs, x_t (the input at the current time step) and h_{t-1} (the previous hidden
state), are fed to the gate and multiplied with weight matrices, followed by the addition
of a bias. The result is passed through a sigmoid activation function, which gives an
output between 0 and 1. For a value close to 0, the piece of information is forgotten, and
for a value close to 1, the information is retained for future use. The equation for the
forget gate is:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
where:
W_f represents the weight matrix associated with the forget gate,
[h_{t-1}, x_t] denotes the concatenation of the previous hidden state and the current input,
b_f is the bias of the forget gate, and
σ is the sigmoid activation function.
Input Gate
The addition of useful information to the cell state is done by the input gate. First, the
information is regulated using the sigmoid function, which filters the values to be
remembered, similar to the forget gate, using the inputs h_{t-1} and x_t. Then, a vector
of candidate values is created using the tanh function, which gives outputs in the range
-1 to +1. Finally, the candidate values are scaled by the regulated values and added to the
gated previous cell state to obtain the updated cell state. The equations for the input
gate and cell-state update are:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
Ĉ_t = tanh(W_c · [h_{t-1}, x_t] + b_c)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ Ĉ_t
where ⊙ denotes element-wise multiplication and i_t controls the amount by which we
choose to update each state value.
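Putting the gates together, here is a minimal single-time-step LSTM sketch in Python/NumPy. The output gate and hidden-state update (o_t = σ(W_o · [h_{t-1}, x_t] + b_o) and h_t = o_t ⊙ tanh(C_t)) follow the standard formulation, which the notes above mention but do not spell out; all shapes and values are illustrative:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    # One LSTM time step following the gate equations above
    concat = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ concat + b_f)           # forget gate
    i_t = sigmoid(W_i @ concat + b_i)           # input gate
    c_hat = np.tanh(W_c @ concat + b_c)         # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat            # cell-state update
    o_t = sigmoid(W_o @ concat + b_o)           # output gate
    h_t = o_t * np.tanh(c_t)                    # new hidden (short-term) state
    return h_t, c_t

# Toy dimensions: 2 hidden units, 3 input features
rng = np.random.default_rng(0)
n_h, n_x = 2, 3
params = [rng.normal(size=(n_h, n_h + n_x)) * 0.1 for _ in range(4)]
biases = [np.zeros(n_h) for _ in range(4)]
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_x), h, c,
                 params[0], biases[0], params[1], biases[1],
                 params[2], biases[2], params[3], biases[3])
print(h, c)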
Applications of LSTM
Some of the famous applications of LSTM include:
Language Modeling: LSTMs have been used for natural language processing tasks such
as language modeling, machine translation, and text summarization. They can be
trained to generate coherent and grammatically correct sentences by learning the
dependencies between words in a sentence.
Speech Recognition: LSTMs have been used for speech recognition tasks such as
transcribing speech to text and recognizing spoken commands. They can be trained to
recognize patterns in speech and match them to the corresponding text.
Time Series Forecasting: LSTMs have been used for time series forecasting tasks such as
predicting stock prices, weather, and energy consumption. They can learn patterns in
time series data and use them to make predictions about future events.
Anomaly Detection: LSTMs have been used for anomaly detection tasks such as
detecting fraud and network intrusion. They can be trained to identify patterns in data
that deviate from the norm and flag them as potential anomalies.
Recommender Systems: LSTMs have been used for recommendation tasks such as
recommending movies, music, and books. They can learn patterns in user behavior and
use them to make personalized recommendations.
Video Analysis: LSTMs have been used for video analysis tasks such as object detection,
activity recognition, and action classification. They can be used in combination with
other neural network architectures, such as Convolutional Neural Networks (CNNs), to
analyze video data and extract useful information.
Feature | LSTM (Long Short-Term Memory) | RNN (Recurrent Neural Network)
Memory | Has a special memory unit that allows it to learn long-term dependencies in sequential data | Does not have a memory unit
Directionality | Can be trained to process sequential data in both forward and backward directions | Can only be trained to process sequential data in one direction
Long-term dependency learning | Yes | Limited
Ability to learn sequential data | Yes | Yes