Unit 4 notes
Unit 4 notes
Artificial Neural Networks (ANNs) are machine learning models inspired by the structure and
function of the human brain. They consist of layers of interconnected nodes (neurons), which
work together to process data and solve complex tasks.
1. Input Layer: The first layer that takes in raw data or features for processing.
2. Hidden Layers: Intermediate layers where computations happen. Each neuron in these
layers applies a weighted sum of its inputs, passes it through an activation function (like
ReLU, Sigmoid, or Tanh), and sends the output to the next layer. These layers extract
meaningful patterns and features.
3. Output Layer: The final layer that produces the output or prediction based on the learned
features.
Connections between neurons have weights, which are adjusted during training to minimize
error. This structure enables ANNs to learn from data and perform tasks like classification,
regression, and pattern recognition.
Perceptron
A perceptron is the simplest type of artificial neural network, consisting of a single neuron. It
takes multiple inputs, applies weights, sums them up, and passes the result through an activation
function (usually a step function) to produce an output.
Key Features:
o Used for binary classification problems.
o Can only solve linearly separable problems.
Multilayer Perceptron (MLP)
A multilayer perceptron is a more complex neural network with one or more hidden layers
between the input and output layers.
Key Features:
o Can solve non-linear problems by using activation functions like ReLU or Sigmoid.
o Each layer is fully connected to the next layer (fully connected network).
Structure:
o Input Layer: Takes the raw input.
o Hidden Layers: Perform computations to extract features.
o Output Layer: Produces the final result, such as class probabilities.
Learning:
MLP uses gradient descent and the backpropagation algorithm to adjust weights and
minimize error during training.
MLPs are the foundation of many advanced neural network architectures and are widely used in
applications like image recognition, speech processing, and time-series prediction.
Gradient Descent and Delta Rule in Neural Networks
1. Gradient Descent
Gradient Descent is a widely used optimization algorithm in machine learning and neural
networks to minimize the error between predicted and actual outputs by adjusting the weights
of the model. It helps the model learn from the data by minimizing the cost function (also called
the loss function), which measures how far off the model's predictions are from the actual results.
Objective: To find the optimal weights that minimize the error by adjusting them in the
direction of the steepest decrease in the cost function.
Working: The algorithm calculates the gradient (slope) of the cost function with respect
to each weight and adjusts the weights in the opposite direction of the gradient.
The Delta Rule is used to update the weights in a neural network, particularly in perceptrons and
single-layer networks. Its objective is to minimize the error by adjusting the weights based on the
difference between the predicted output and the actual output.
Goal: To reduce the error in predictions by adjusting the weights in proportion to the error.
Working: The weight adjustment is made based on the error between the actual output
and the predicted output, scaled by a learning rate.
Limitations: It is primarily effective for linear problems and cannot handle complex, non-
linear relationships well.
Multilayer Networks
Multilayer networks (also known as Multilayer Perceptrons or MLPs) are a type of artificial neural
network that consists of multiple layers of neurons. Unlike a single-layer perceptron, which has
only one layer of output neurons, multilayer networks have one or more hidden layers between
the input and output layers. These networks are capable of modeling complex, non-linear
relationships in data.
1. Input Layer:
o This layer receives the input features or raw data.
o Each neuron in the input layer represents one feature of the input data.
2. Hidden Layers:
o One or more layers that process the input data.
o These layers contain neurons that perform weighted summation of inputs, apply
an activation function (like ReLU or Sigmoid), and pass the result to the next
layer.
o The number of hidden layers and neurons per layer can vary, depending on the
complexity of the problem being solved.
3. Output Layer:
o The final layer that produces the network’s output.
o For classification tasks, this layer typically contains neurons equal to the number
of classes, with the activation function often being softmax or sigmoid. For
regression, a single neuron with a linear activation function is commonly used.
Non-linearity:
The hidden layers allow the network to model non-linear relationships, making it more
powerful than single-layer perceptrons. The combination of multiple hidden layers can
create a deep network that is capable of solving complex problems.
Activation Functions:
The neurons in hidden layers use activation functions like ReLU, Sigmoid, Tanh, etc., to
introduce non-linearity. This helps the network capture more complex patterns in data.
Backpropagation:
During training, backpropagation is used to update the weights of the network. This
involves calculating the gradient of the loss function with respect to each weight and
using it to adjust the weights to minimize the error.
Training Multilayer Networks
1. Forward Pass:
o Input data is passed through the network, layer by layer, to get the predicted
output.
2. Loss Calculation:
o The error (loss) between the predicted output and the actual target is calculated
using a loss function (e.g., Mean Squared Error for regression or Cross-Entropy
Loss for classification).
3. Backward Pass (Backpropagation):
o The loss is propagated backward through the network to update the weights
using gradient descent.
4. Weight Update:
o The weights are updated based on the calculated gradients and the learning rate,
with the goal of reducing the loss.
5. Advantages of Multilayer Networks
Applications
Backpropagation is the fundamental algorithm for training neural networks, particularly those
with multiple layers (multilayer perceptrons). It involves calculating the gradient (rate of change)
of the loss function with respect to the weights and biases in the network and using this
information to update the parameters to minimize the error.
1. Forward Propagation
The process starts with forward propagation, where the input data is passed through the
network, layer by layer, to compute the output. Each layer performs computations by taking
the input, applying weights, and using an activation function to produce an output. The final
output is the network's prediction.
The output is then compared with the target output (the true value).
The loss (or error) is calculated by measuring the difference between the predicted output and
the actual output.
2. Error Calculation
Once we have the predicted output and the actual output, the error or loss is calculated. This
error indicates how far off the prediction is from the true value. A loss function, such as Mean
Squared Error (MSE) for regression or Cross-Entropy for classification, is used to quantify this
difference.
The core of backpropagation involves adjusting the weights and biases to reduce the error. The
algorithm works by propagating the error backward through the network, layer by layer.
Starting from the output layer, the error is used to compute the gradient of the loss with
respect to each weight and bias in the network.
This is done using the chain rule of calculus, which helps in breaking down the gradient
calculation into manageable pieces.
The gradients tell us how much each weight and bias contributed to the error and by how much
they need to be adjusted.
Once the gradients are computed, they are used to update the weights and biases using an
optimization method like gradient descent. The idea is to move the weights in the direction that
reduces the error, iteratively improving the network's predictions.
The weights are updated by moving in the opposite direction of the gradient (hence minimizing
the error).
This update is controlled by a factor known as the learning rate, which determines how large a
step to take in the direction of the gradient.
5. Repetition
The process of forward propagation, error calculation, backpropagation, and weight updates is
repeated for several iterations (or epochs). With each iteration, the network gets better at
making predictions as the weights are gradually adjusted to minimize the error.
Generalization refers to a model's ability to make accurate predictions on new, unseen data
after being trained on a given dataset. It is the goal of any machine learning model to not only
perform well on the training data but also to generalize effectively to new data that it has not
seen during training. This concept is critical because a model that only performs well on its
training data but fails on new data is said to overfit.
Overfitting vs Underfitting:
Overfitting occurs when a model learns the details and noise in the training data to the extent
that it negatively impacts the performance on new data. This happens when the model
becomes too complex.
Underfitting happens when a model is too simple to capture the underlying patterns in the
data, leading to poor performance on both training and unseen data.
Bias-Variance Trade-off:
Bias refers to errors due to overly simplistic models that make strong assumptions about the
data (underfitting).
Variance refers to errors due to models that are too complex and sensitive to the fluctuations in
the training data (overfitting).
The bias-variance trade-off is the balance between the two, where you want a model that is
complex enough to capture the underlying patterns but not so complex that it overfits.
Regularization:
Regularization techniques like L1 (Lasso) and L2 (Ridge) regularization are used to prevent
overfitting by adding penalties to the model for having excessively large weights, encouraging
simpler models.
Dropout in neural networks is another technique that helps prevent overfitting by randomly
turning off some neurons during training, making the model less reliant on specific features.
Model Complexity:
A model with too many parameters (such as a deep neural network with many layers) might fit
the training data very well but struggle to generalize to unseen data. On the other hand, a
model that is too simple may not be able to capture the complexity of the data, leading to poor
generalization.
The quality and quantity of the training data directly affect generalization. A model trained on a
large, diverse, and representative dataset is more likely to generalize well to new data than a
model trained on a small or biased dataset.
Generalization ensures that the model can perform well not just on the data it was trained on
but also on new data, making it more reliable and robust in real-world applications.
It is a sign that the model has learned the underlying patterns in the data and not just
memorized the training examples.
Unsupervised learning refers to a type of machine learning where the model learns from data
that has not been labeled or categorized. In unsupervised learning, the algorithm tries to find
hidden patterns, structures, or relationships in the data without specific output labels or
predefined categories. One of the most popular unsupervised learning algorithms is the Self-
Organizing Map (SOM).
Self-Organizing Map (SOM) Algorithm
A Self-Organizing Map (SOM), also known as Kohonen Map, is a type of artificial neural network
used to perform unsupervised learning. SOMs are used for clustering, dimensionality reduction,
and visualization of high-dimensional data. They map high-dimensional input data into a lower-
dimensional (usually 2D) grid, preserving the topological relationships between data points.
The key idea behind SOM is to learn a mapping from high-dimensional input space to a 2D grid
of neurons (also called nodes) such that similar inputs are mapped to nearby neurons.
Initialization:
The SOM starts with a grid of neurons (usually 2D), where each neuron has a weight vector of
the same dimensionality as the input data.
The weight vectors are initialized randomly or using a small subset of the input data.
Competition:
For each input vector, the algorithm calculates the similarity between the input and all neurons
using a distance metric (usually Euclidean distance).
The neuron with the smallest distance to the input vector is called the Best Matching Unit
(BMU) or winning neuron.
Cooperation:
The BMU and its neighboring neurons are updated to move closer to the input vector. This update
is done by adjusting the weights of the neurons using a learning rule.
The size of the neighborhood around the BMU decreases over time, so the neurons that are
farther away from the BMU are updated less.
Adaptation:
The weights of the BMU and its neighbors are adjusted toward the input vector.
The learning rate and neighborhood size decrease over time, which allows the map to converge
and stabilize.
Iteration:
The process of competition, cooperation, and adaptation is repeated for each input vector
multiple times. As training progresses, the map becomes better at representing the data's
underlying structure.
Deep Learning
Deep Learning is a subset of machine learning that involves the use of artificial neural
networks to model and solve complex problems. Unlike traditional machine learning, deep
learning leverages multi-layered (deep) neural networks to automatically learn representations
of data at various levels of abstraction, enabling the system to perform tasks such as image
recognition, natural language processing, and more without manual feature engineering.
Deep learning models are particularly powerful for tasks involving large amounts of data and
complex patterns, such as speech recognition, image classification, and autonomous driving.
Multi-layered Architecture:
Deep learning networks consist of multiple layers of neurons, where each layer transforms the
data and passes it to the next layer. These layers are often categorized into three types: input
layers, hidden layers, and output layers.
The depth (i.e., the number of hidden layers) in deep learning models is what differentiates
them from traditional machine learning algorithms.
One of the most powerful aspects of deep learning is that it can automatically learn features
from raw data (e.g., pixels of an image) instead of relying on manually defined features. This is
particularly useful in tasks where feature extraction is complex or not well-defined.
End-to-End Learning:
Deep learning models learn directly from the data and can be trained in an end-to-end manner,
meaning that the model is trained from raw input to the final output without human
intervention at any stage.
Large Data Sets:
Deep learning models typically require large datasets to achieve optimal performance, as they
need significant amounts of data to learn complex patterns. The availability of large data sets and
powerful computational resources has made deep learning increasingly practical.
Training deep learning models requires substantial computational resources, especially for
complex models with many layers and parameters. This is typically achieved using Graphics
Processing Units (GPUs) or Tensor Processing Units (TPUs), which are specialized hardware for
handling the massive parallel computations required in deep learning.
A Convolutional Neural Network (CNN) is a type of deep learning model designed primarily for
tasks involving images and visual data. It's often used for image recognition, object detection,
and other computer vision tasks. The key advantage of CNNs is that they are very effective at
recognizing patterns and features in images by learning spatial relationships between pixels.
Convolution Layer:
This is the first layer in a CNN, where the model starts looking at the image.The layer uses small
filters (also called kernels) that slide across the image, one small piece at a time. These filters are
designed to detect simple patterns like edges, shapes, or textures in the image. Think of the filter
as a window that looks at a small portion of the image, identifies specific features, and moves to
the next area of the image to repeat the process.
Pooling Layer:
After the convolution layer detects features, the pooling layer comes into play.
This layer reduces the size of the image (or feature maps) to make the model faster and more
efficient. It keeps only the most important information.
Max pooling is a common technique where the highest value from a small region of the image
is chosen, helping preserve the strongest features. Another technique is average pooling, where
the average value from a region is selected.
Pooling helps in reducing the number of calculations needed and also prevents overfitting
(when the model becomes too focused on specific details of the training images).
After extracting features and reducing the image size, the fully connected layer connects all the
neurons together (just like a regular neural network).
This layer takes the features and uses them to classify the image. For example, it determines
whether the image is of a cat, dog, or car, etc.
Feature Extraction:
First, the image goes through the convolution layer where the filters scan for patterns like
edges or textures.
Data Reduction:
Then, the pooling layer reduces the image size while preserving the important features (like
edges and shapes).
Classification:
Finally, the reduced data is sent to the fully connected layer, which uses the extracted features
to classify the image into a category (such as "cat", "dog", etc.).
Advantages of CNN
CNNs are great at recognizing spatial relationships in images. They can detect edges, corners,
textures, and complex patterns, which are essential for understanding what the image contains.
Efficient Processing:
CNNs require less processing power compared to traditional image processing techniques
because pooling reduces the image size and the filters are reused across the entire image.
Object Recognition:
CNNs are very powerful at recognizing objects, even if they appear in different positions,
orientations, or sizes. This makes them ideal for image classification tasks.
1D convolution is primarily used when dealing with sequential data such as time-series
data, audio signals, or text data (such as sentences in natural language processing). It operates
over one-dimensional data like a sequence of numbers.
A 1D filter (or kernel) is applied to a sequence of data (e.g., an audio signal or time-series).
The filter slides over the data, calculating the dot product between the filter values and the
values of the data it covers.
The output is a feature map, which helps to detect patterns, trends, or features like peaks,
valleys, or changes over time.
2D Convolution
Use Case: Used when the data has a spatial structure, such as:
Working:
A 2D filter (kernel) slides over the input image in both width and height.
At each position, the filter performs element-wise multiplication with the corresponding pixels
of the image and sums the results.
Example:
For an image, a 2D convolution might detect edges, corners, or textures.
Output: A new 2D array (feature map) representing the extracted spatial features.
Training of a Neural Network
The process of training a neural network involves teaching the network to make accurate
predictions or classifications by adjusting its weights based on the data it is provided. The
objective is to minimize the error between the predicted output and the actual target output
(known as the loss function). The training process typically involves the following steps:
1. Forward Propagation
Input Data: The training process begins by feeding the input data into the neural network. In an
image classification task, for example, this input would be pixel values of the image.
Layer-by-Layer Processing: The data passes through each layer of the network. Each layer
performs a mathematical operation on the data, using the current weights and biases. The
result is passed to the next layer, continuing until the final output layer.
Output Generation: The final layer generates the network's prediction, such as a class label for
classification tasks or a value for regression tasks.
The loss function measures the difference between the network's predicted output and the
actual target output. It quantifies the error in the network's prediction.
Cross-Entropy Loss: Used for classification tasks (especially with multiple classes).
Example:
For a simple binary classification task, if the network predicts a value close to 1, but the true
value is 0, the loss will be high. A smaller difference (e.g., predicted value close to the true
value) will result in a lower loss.
Backpropagation is the algorithm used to minimize the error by adjusting the weights of the
network. The idea is to propagate the error backward from the output layer through the hidden
layers to the input layer.
Steps in backpropagation:
Calculate Gradient: For each layer, the gradient of the loss function with respect to the weights
is calculated. This gradient tells the network how to adjust its weights to reduce the error.
Update Weights: The weights are updated in the direction that minimizes the loss using an
optimization technique (like Gradient Descent).
Gradient Descent is the most commonly used optimization technique for training neural
networks. It helps in minimizing the loss function by updating the weights based on the
gradients.
Batch Gradient Descent: Uses the entire dataset to compute the gradient and update the
weights in one go.
Stochastic Gradient Descent (SGD): Updates the weights for each training example individually.
This can make the process faster but more noisy.
Mini-Batch Gradient Descent: A combination of batch and stochastic gradient descent. The
dataset is divided into smaller batches, and the weights are updated after each batch.
The learning rate is a critical hyperparameter in gradient descent. If the learning rate is too
high, the network may overshoot the optimal point. If it is too low, the network might take a
long time to converge.
5. Epochs and Iterations
Epochs: One epoch refers to one complete pass through the entire training dataset. During
training, the data is usually passed through the network multiple times (i.e., several epochs) to
fine-tune the weights.
Iterations: An iteration refers to one update of the weights, usually after processing a mini-
batch of data.
The model is trained over several epochs, and with each epoch, the weights are updated to
reduce the loss function. The goal is for the loss to decrease over time, improving the network's
performance.
Validation: During training, a validation dataset is often used to monitor the network's
performance on unseen data. This helps detect issues like overfitting (when the model learns
the training data too well but fails to generalize to new data).
Early Stopping: Stop training if the validation loss begins to increase, even though the training
loss is still decreasing.
Regularization: Techniques like L2 regularization (weight decay) or Dropout are used to prevent
overfitting by adding penalties to the loss function or randomly disabling certain neurons during
training.
After training, the model's performance is tested on a separate test dataset that it has never
seen before. This provides an unbiased evaluation of the model's ability to generalize to new,
unseen data.
Convolutional Neural Networks (CNNs) have been widely used in various real-world applications
due to their ability to process and analyze visual data effectively. Here, we'll look at some
prominent examples where CNNs are applied in fields like healthcare, speech recognition, and
autonomous driving:
1. Diabetic Retinopathy Detection Using CNN
Problem:
Diabetic Retinopathy (DR) is a condition that affects the eyes of diabetic patients and
can lead to blindness if not diagnosed and treated early. Detecting DR early can prevent
severe damage to vision.
Traditional methods of diagnosing DR involve manual inspection of retinal images by
experts, which can be time-consuming and prone to human error.
Image Classification: CNNs are well-suited for classifying medical images, such as retinal
scans, into different categories (e.g., normal, mild, moderate, and severe DR).
Process:
1. Data Input: Retinal images of patients are fed into the CNN model.
2. Convolutional Layers: The CNN extracts important features such as blood vessels,
exudates, and hemorrhages from the images.
3. Pooling Layers: Pooling layers reduce the size of the image while retaining critical
information, making the model more efficient.
4. Fully Connected Layers: After feature extraction, the fully connected layers
classify the images into various stages of diabetic retinopathy.
Outcome: By training the CNN on a large dataset of labeled retinal images, the model
can automatically detect diabetic retinopathy, helping doctors make faster and more
accurate diagnoses.
Advantages:
o Automation of image analysis reduces the workload of medical professionals.
o High accuracy in detecting DR, sometimes even surpassing human performance.
Problem:
A smart speaker is a voice-activated device that can play music, answer questions,
control smart home devices, and more. The challenge is to accurately recognize voice
commands in real-time, even in noisy environments.
Solution Using CNN:
While CNNs are primarily known for their success in image processing, they can also be
used in speech recognition tasks, especially in the preprocessing step of sound feature
extraction.
Process:
1. Sound Data Input: The input is a raw audio waveform or a spectrogram (which is
a visual representation of the frequency spectrum of sound).
2. Convolutional Layers: CNNs are used to automatically extract features from the
spectrograms (e.g., recognizing patterns such as different phonemes or words in
the audio).
3. Pooling and Fully Connected Layers: Pooling layers reduce the data’s
dimensionality, and fully connected layers classify the speech commands into
specific actions (like "play music", "set alarm", etc.).
Outcome: The CNN processes the audio input, helping the smart speaker accurately
recognize speech commands and perform the corresponding actions. In noisy
environments, CNNs can still extract the most relevant features, improving robustness.
Advantages:
o Improved speech recognition by focusing on meaningful features in the audio.
o Enhanced noise tolerance, making the system more reliable in real-world
scenarios.
Problem:
CNNs are integral to processing visual data in self-driving cars, particularly for tasks like
object detection, lane detection, and semantic segmentation.
Process:
1. Data Input: The cameras mounted on the car capture images or video frames of
the environment.
2. Convolutional Layers: The CNN extracts features from the images, such as road
signs, lanes, pedestrians, and other vehicles.
3. Pooling Layers: The pooling layers help reduce the complexity of the image while
preserving critical spatial features, such as the positions of obstacles.
4. Fully Connected Layers: After feature extraction, the model classifies the objects
in the image and helps the car understand the surrounding environment.
Outcome: The CNN can identify objects like pedestrians, vehicles, traffic signs, and road
conditions. The car's control system then uses this information to make decisions, such
as stopping for a pedestrian or turning at an intersection.
Advantages:
o Real-time object recognition that helps the car navigate and make safe driving
decisions.
o Improved safety with the ability to quickly detect and react to potential hazards.