
AlexNet - Introduction

The convolutional neural network (CNN) architecture known as AlexNet was created by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton.

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was launched in 2010.

Hinton and a few other researchers were proven correct two years later with the publication of the paper “ImageNet Classification with Deep Convolutional Neural Networks” by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. The study employed a CNN to obtain a Top-5 error rate of 15.3% (the percentage of images whose true label is not among the model’s top 5 guesses). The second-best outcome lagged far behind at 26.2%. After the dust settled, deep learning became popular once more.
Named after Alex Krizhevsky, the architecture used in the 2012 study is known as AlexNet.
AlexNet Architecture

AlexNet Architecture - Overview

AlexNet is a deep architecture; the authors introduced padding to prevent the size of the feature maps from shrinking drastically.
The input to this model is an image of size 227x227x3.
Each of the first two convolutional layers is followed by an overlapping max-pooling layer.
The third, fourth and fifth convolutional layers are connected to each other directly.
The fifth convolutional layer is followed by an overlapping max-pooling layer, which is then connected to the fully connected layers.
The fully connected layers have 4096 neurons each, and the second fully connected layer feeds into a softmax classifier with 1000 classes.

 This was the first architecture that used GPUs to boost training performance.
 AlexNet consists of 5 convolution layers, 3 max-pooling layers, 2 normalization layers, 2 fully connected layers and 1 softmax layer.
 Each convolution layer consists of convolution filters and a non-linear activation function called ReLU.
 The pooling layers perform max pooling, and the input size is fixed due to the presence of the fully connected layers.
 The input size is quoted in most places as 224x224x3, but because of the padding involved it works out to 227x227x3.
 AlexNet has over 60 million parameters.

Key Features:
 ReLU is used as the activation function rather than tanh.
 Batch size of 128.
 SGD with momentum is used as the learning algorithm.
 Data augmentation is carried out, e.g. flipping, jittering, cropping, colour normalization, etc.

AlexNet was trained on GTX 580 GPUs with only 3 GB of memory each, which could not fit the entire network. The network was therefore split across 2 GPUs, with half of the neurons (feature maps) on each GPU.

Convolution and Max-Pooling Layers

Convolution and max-pooling layers are fundamental building blocks of AlexNet. These layers extract features and reduce spatial dimensions, enabling efficient processing while retaining critical image information.
First Convolution Layer
 Filters: 96 filters, each of size 11×11.
 Stride: 4.
 Activation: ReLU.
 Output Feature Map: 55x55x96.
Note: To calculate the output size of a convolution layer, use the formula:
Output size = (Input size - Filter size + 2 x Padding) / Stride + 1
For example, the first convolution layer gives (227 - 11 + 0) / 4 + 1 = 55.
The number of filters becomes the number of channels in the output feature map.
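As a quick check, this formula can be wrapped in a small Python helper (an illustrative sketch, not part of the original material):

    def conv_output_size(input_size, filter_size, stride, padding=0):
        # Spatial output size of a convolution or pooling layer.
        return (input_size - filter_size + 2 * padding) // stride + 1

    # First convolution layer: 227x227 input, 11x11 filters, stride 4, no padding
    print(conv_output_size(227, 11, 4))   # 55
    # First max-pooling layer: 55x55 input, 3x3 window, stride 2
    print(conv_output_size(55, 3, 2))     # 27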
First Max-Pooling Layer
 Pool Size: 3×3.
 Stride: 2.
 Output Feature Map: 27x27x96.
Second Convolution Layer
 Filters: 256 filters, each of size 5×5.
 Stride: 1, with padding of 2.
 Activation: ReLU.
 Output Feature Map: 27x27x256.
Second Max-Pooling Layer
 Pool Size: 3×3.
 Stride: 2.
 Output Feature Map: 13x13x256.
Third Convolution Layer
 Filters: 384 filters, each of size 3×3.
 Stride: 1, with padding of 1.
 Activation: ReLU.
 Output Feature Map: 13x13x384.
Fourth Convolution Layer
 Filters: 384 filters, each of size 3×3.
 Stride and Padding: Both set to 1.
 Activation: ReLU.
 Output Feature Map: Remains 13x13x384.
Final Convolution Layer
 Filters: 256 filters, each of size 3×3.
 Stride and Padding: Both set to 1.
 Activation: ReLU.
 Output Feature Map: 13x13x256.
 Increasing Filters: The number of filters
increases as we go deeper, allowing for
more complex feature extraction.
 Decreasing Filter Size: The filter size shrinks with depth, from the large 11×11 filters at the beginning to 3×3 filters deeper in the architecture, while the pooling layers progressively reduce the feature map size.
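For readers who want to see this stack in code, here is a minimal Keras (TensorFlow) sketch of the convolutional part described above, assuming a 227x227x3 input; it omits the local response normalization layers and the 2-GPU split of the original network:

    from tensorflow import keras
    from tensorflow.keras import layers

    conv_stack = keras.Sequential([
        layers.Input(shape=(227, 227, 3)),
        layers.Conv2D(96, 11, strides=4, activation="relu"),        # -> 55x55x96
        layers.MaxPooling2D(pool_size=3, strides=2),                # -> 27x27x96
        layers.Conv2D(256, 5, padding="same", activation="relu"),   # -> 27x27x256
        layers.MaxPooling2D(pool_size=3, strides=2),                # -> 13x13x256
        layers.Conv2D(384, 3, padding="same", activation="relu"),   # -> 13x13x384
        layers.Conv2D(384, 3, padding="same", activation="relu"),   # -> 13x13x384
        layers.Conv2D(256, 3, padding="same", activation="relu"),   # -> 13x13x256
        layers.MaxPooling2D(pool_size=3, strides=2),                # -> 6x6x256
    ])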
Fully Connected and Dropout Layers
After this, we have the first dropout layer, with the dropout rate set to 0.5.
Then comes the first fully connected layer with a ReLU activation function. The size of its output is 4096.
Next comes another dropout layer, with the dropout rate again fixed at 0.5.
This is followed by a second fully connected layer with 4096 neurons and ReLU activation.
Finally, we have the last fully connected layer, or output layer, with 1000 neurons, since the dataset has 1000 classes. The activation function used at this layer is softmax.
This is the architecture of the AlexNet model. It has a total of about 62.3 million learnable parameters.
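Continuing the sketch above, the classifier head can be written as follows (again illustrative Keras code rather than the authors' original implementation):

    from tensorflow import keras
    from tensorflow.keras import layers

    classifier_head = keras.Sequential([
        layers.Flatten(),                           # 6x6x256 -> 9216
        layers.Dropout(0.5),
        layers.Dense(4096, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(4096, activation="relu"),
        layers.Dense(1000, activation="softmax"),   # 1000 ImageNet classes
    ])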

Max Pooling
The main idea behind a pooling layer is to “accumulate” features from the maps generated by convolving a filter over an image.
Its function is to progressively reduce the spatial size of the representation, which reduces the number of parameters and computations in the network. The most common form of pooling is max pooling.
Pooling also helps against over-fitting by providing an abstracted form of the representation.
Max pooling is done by applying a max filter to (usually) non-overlapping sub-regions of the initial representation.
AlexNet used pooling windows of size 3×3 with a stride of 2 between adjacent windows, so the windows overlap.
Due to this overlapping nature of the max pooling, the top-1 error rate was reduced by 0.4% and the top-5 error rate by 0.3%, compared to using non-overlapping pooling windows of size 2×2 with a stride of 2, which would give the same output dimensions.
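To verify that the two pooling configurations really produce the same output size, the output-size formula from earlier can be reused (illustrative Python, assuming the conv_output_size helper defined above):

    # Overlapping pooling used by AlexNet: 3x3 window, stride 2
    print(conv_output_size(55, 3, 2))   # 27
    # Non-overlapping pooling: 2x2 window, stride 2
    print(conv_output_size(55, 2, 2))   # 27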

ReLU Non-Linearity
AlexNet demonstrated that deep CNNs can be trained much more quickly with the non-saturating ReLU activation function than with saturating activation functions like tanh or sigmoid.
The figure in the original paper shows that, with the aid of ReLUs (solid curve), the network reaches a 25% training error rate six times faster than an equivalent network using tanh (dotted curve). This was evaluated on the CIFAR-10 dataset.
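For reference, ReLU is simply f(x) = max(0, x); a minimal NumPy illustration:

    import numpy as np

    def relu(x):
        # ReLU passes positive values through unchanged and clips negatives to zero,
        # so it does not saturate for positive inputs.
        return np.maximum(0, x)

    print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]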
Data Augmentation
Overfitting can be avoided by showing the neural network various iterations of the same image. This produces more data and compels the network to learn the main qualities of the image instead of memorising incidental details.
Augmentation by Mirroring

Consider that our training set contains a picture of a cat. A cat can also be seen as its mirror image. This means that by simply flipping the image about the vertical axis, we can double the size of the training dataset.

Data Augmentation by Mirroring

Augmentation by Random Cropping of Images

Randomly cropping the original image also produces additional data that is simply the original data shifted.
The creators of AlexNet used random crops of size 227x227 taken from within the 256x256 image boundary as the network's inputs. Using this technique, they multiplied the size of the dataset by a factor of 2048.

Data Augmentation by Random Cropping
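A minimal NumPy sketch of these two augmentations (illustrative only; the original work used its own data pipeline):

    import numpy as np

    def augment(image, crop_size=227):
        # Random crop plus random horizontal flip of an HxWxC image array.
        h, w, _ = image.shape
        top = np.random.randint(0, h - crop_size + 1)
        left = np.random.randint(0, w - crop_size + 1)
        crop = image[top:top + crop_size, left:left + crop_size, :]
        if np.random.rand() < 0.5:
            crop = crop[:, ::-1, :]   # mirror about the vertical axis
        return crop

    image = np.random.rand(256, 256, 3)   # stand-in for a 256x256 training image
    print(augment(image).shape)           # (227, 227, 3)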

Dropout
During dropout, each neuron is removed from the neural network with a probability of 0.5.
A dropped neuron does not make any contribution to either forward or backward propagation.
As a result, each input is effectively processed by a different neural network architecture.
The learned weight parameters are therefore more robust and less prone to overfitting.
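A toy NumPy illustration of the masking described above (training-time behaviour only; real frameworks also rescale the surviving activations):

    import numpy as np

    def dropout(activations, p_drop=0.5):
        # Each neuron is kept with probability (1 - p_drop);
        # dropped neurons contribute nothing to the forward pass.
        mask = np.random.rand(*activations.shape) >= p_drop
        return activations * mask

    print(dropout(np.ones(10)))   # roughly half of the entries become 0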

AlexNet Summary
Architecture Implementation

Import Libraries and Load the Dataset

For the implementation, we will take a part of the ImageNet dataset by scraping images from the internet using the Python library Beautiful Soup, and we will pass this dataset to our model to check how the AlexNet architecture performs.
Pre-processing
Once we have scraped the images, we will store them according to their data labels and pre-process the data.
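A minimal sketch of the pre-processing step, assuming the scraped images have been saved into one folder per class under a hypothetical data/ directory (the folder name is an assumption; the image size and batch size follow the architecture and key features described earlier):

    import tensorflow as tf

    # Assumes the images are stored as data/<class_name>/<image>.jpg
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "data",
        image_size=(227, 227),   # resize to AlexNet's input size
        batch_size=128,          # batch size used by AlexNet
    )
    # Scale pixel values to the [0, 1] range
    train_ds = train_ds.map(lambda x, y: (x / 255.0, y))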

Define the Model

We will be creating the AlexNet architecture from scratch. (Keras does not ship a pre-defined AlexNet the way it does for models such as ResNet50, but third-party implementations are widely available.)
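Stacking the two pieces sketched earlier gives the full model; this assumes the conv_stack and classifier_head snippets from above have been run:

    # Combine the convolutional feature extractor and the classifier head.
    model = keras.Sequential([conv_stack, classifier_head])
    model.build(input_shape=(None, 227, 227, 3))
    model.summary()   # about 62.3 million trainable parameters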
Initialize the training parameters
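A plausible configuration following the key features listed earlier (SGD with momentum); the exact learning rate here is an assumption, not taken from the text:

    model.compile(
        optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )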

Train the model
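Training then reduces to a single call on the pre-processed dataset (the number of epochs is illustrative):

    history = model.fit(train_ds, epochs=10)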

Prediction
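Finally, a sketch of running predictions on a batch from the dataset:

    # Predict class probabilities for one batch and report the top class per image.
    images, labels = next(iter(train_ds))
    probs = model.predict(images)
    print(probs.argmax(axis=1)[:5])   # predicted class indices for the first 5 images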
ResNet50

ResNet50 is a deep convolutional neural network (CNN) architecture that was developed by Microsoft Research in 2015.

It is a variant of the popular ResNet architecture, which stands for “Residual Network.”

The “50” in the name refers to the number of layers in the network: it is 50 layers deep.

ResNet50 is a powerful image classification model that can be trained on large datasets and achieve state-of-the-art results.

One of its key innovations is the use of residual connections, which allow the network to learn a set of residual functions that map the input to the desired output.
These residual connections enable the network to learn much deeper architectures than was previously possible, without suffering from the problem of vanishing gradients.

The architecture of ResNet50 is divided into four main parts: the convolutional layers, the identity block, the convolutional block, and the fully connected layers.

The convolutional layers are responsible for extracting features from the input image, while the identity block and convolutional block are responsible for processing and transforming these features.

Finally, the fully connected layers are used to make the final classification.

The convolutional layers in ResNet50 consist of several convolutional layers followed by batch normalization and ReLU activation.
These layers are responsible for extracting features from the input image, such as edges, textures, and shapes.

The convolutional layers are followed by max pooling layers, which reduce the spatial dimensions of the feature maps while preserving the most important features.

The identity block and the convolutional block are the key building blocks of ResNet50.

The identity block passes the input through a series of convolutional layers and adds the input back to the output.

This allows the network to learn residual functions that map the input to the desired output.

The convolutional block is similar to the identity block. Both blocks use a 1x1 convolution to reduce the number of filters before the 3x3 convolution, but the convolutional block additionally applies a 1x1 convolution to the shortcut path so that the input can be reshaped to match the dimensions of the block's output.
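A simplified Keras sketch of these two blocks (filter counts are illustrative, and batch normalization is omitted for brevity even though the real ResNet50 applies it after every convolution):

    from tensorflow import keras
    from tensorflow.keras import layers

    def identity_block(x, filters):
        # Bottleneck residual block whose shortcut is the unchanged input.
        f1, f2, f3 = filters
        shortcut = x
        y = layers.Conv2D(f1, 1, activation="relu")(x)        # 1x1: reduce filters
        y = layers.Conv2D(f2, 3, padding="same", activation="relu")(y)
        y = layers.Conv2D(f3, 1)(y)                           # 1x1: restore filters
        y = layers.Add()([y, shortcut])                       # skip connection
        return layers.Activation("relu")(y)

    def conv_block(x, filters, strides=2):
        # Like the identity block, but the shortcut is reshaped by a 1x1 convolution.
        f1, f2, f3 = filters
        shortcut = layers.Conv2D(f3, 1, strides=strides)(x)   # match output shape
        y = layers.Conv2D(f1, 1, strides=strides, activation="relu")(x)
        y = layers.Conv2D(f2, 3, padding="same", activation="relu")(y)
        y = layers.Conv2D(f3, 1)(y)
        y = layers.Add()([y, shortcut])
        return layers.Activation("relu")(y)

    inputs = keras.Input(shape=(56, 56, 256))
    outputs = identity_block(inputs, (64, 64, 256))
    print(keras.Model(inputs, outputs).output_shape)          # (None, 56, 56, 256)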

The final part of ResNet50 is the fully connected layers.

These layers are responsible for making the final classification.

The output of the final fully connected layer is fed into a softmax activation function to produce the final class probabilities.
How it solved the problem of vanishing gradients:

Skip Connections

Skip connections, also known as residual connections, are a key feature of the ResNet50 architecture. They allow the network to learn deeper architectures without suffering from the problem of vanishing gradients.

Vanishing gradients are a problem that occurs when training deep neural networks: the gradients become very small as they are propagated backward through many layers, so the earlier layers receive almost no useful learning signal and find it difficult to learn and improve. This problem becomes more pronounced as the network becomes deeper.

Skip connections address this problem by allowing information to flow directly from the input of a block to its output, bypassing one or more layers. This allows the network to learn residual functions that map the input to the desired output, rather than having to learn the entire mapping from scratch.

In ResNet50, skip connections are used in both the identity block and the convolutional block. The identity block adds its input directly back to the output of its convolutional layers, while the convolutional block first passes the input through a 1x1 convolution so that its shape matches the output before the addition.

The use of skip connections allows ResNet50 to learn deeper architectures while still training effectively and avoiding vanishing gradients.

Summary:
In summary, ResNet50 is a cutting-edge deep
convolutional neural network architecture that
was developed by Microsoft Research in 2015.
It is a variant of the popular ResNet architecture and comprises 50 layers, enabling it to
learn much deeper architectures than
previously possible without encountering the
problem of vanishing gradients. The
architecture of ResNet50 is divided into four
main parts: the convolutional layers, the
identity block, the convolutional block, and the
fully connected layers. The convolutional layers
are responsible for extracting features from the
input image, the identity block and
convolutional block process and transform
these features, and the fully connected layers
make the final classification. ResNet50 has been
trained on the large ImageNet dataset,
achieving an error rate on par with human
performance, making it a powerful model for
various image classification tasks such as object
detection, facial recognition and medical image
analysis. Additionally, it has been used as a feature extractor for other tasks, such as object detection and semantic segmentation.
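For completeness, a pre-trained ResNet50 is available directly in Keras; a brief sketch of loading it both as a classifier and as a feature extractor (assumes TensorFlow is installed and the ImageNet weights can be downloaded):

    from tensorflow import keras

    # Full 1000-class ImageNet classifier
    classifier = keras.applications.ResNet50(weights="imagenet")

    # Feature extractor: drop the fully connected head and average-pool the final feature maps
    feature_extractor = keras.applications.ResNet50(
        weights="imagenet", include_top=False, pooling="avg"
    )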

“ResNet50, with its deep residual networks, opened the door for the training of even deeper architectures and helped push the boundaries of what was possible in computer vision.”
— Yann LeCun, Director of AI Research at Facebook
