
AlexNet: The First CNN to Win ImageNet

By Great Learning Team

Table of contents

1. AlexNet: History
2. CNN Architecture
3. AlexNet Architecture
4. Key Features of AlexNet
5. Data Augmentation
6. Results

This article is an AlexNet tutorial focused on exploring AlexNet, which became one of
the most popular CNN architectures.
History of AlexNet

AlexNet was primarily designed by Alex Krizhevsky. It was published with Ilya
Sutskever and Krizhevsky’s doctoral advisor Geoffrey Hinton, and is a Convolutional
Neural Network or CNN. Learn more about it in this CNN Course.

After competing in the ImageNet Large Scale Visual Recognition Challenge, AlexNet shot to
fame. It achieved a top-5 error of 15.3%, which was 10.8 percentage points lower than that of
the runner-up. The primary result of the original paper was that the depth of the model was
essential for its high performance. This was computationally expensive but was made feasible
by using GPUs, or Graphics Processing Units, during training.

CNN Architectures

Before exploring AlexNet, it is essential to understand what a convolutional neural
network is. Convolutional neural networks are a variant of neural networks in which the
hidden layers consist of convolutional layers, pooling layers, fully connected layers, and
normalization layers.

Convolution is the process of applying a filter over an image or signal to modify it. Now
what is pooling? It is a sample-based discretization process. Its main purpose is to
reduce the dimensionality of the input, allowing assumptions to be made about the
features contained in the binned sub-regions.
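
To make convolution and pooling concrete, here is a minimal NumPy sketch (illustrative only,
not code from the original article): it slides a small filter over an image and then
max-pools the resulting feature map, reducing its dimensions.

import numpy as np

def conv2d(image, kernel):
    # slide the kernel over the image and take the elementwise product sum at each position
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2, stride=2):
    # keep only the maximum value in each pooling window
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

image = np.random.rand(8, 8)
vertical_edge_filter = np.array([[1.0, 0.0, -1.0],
                                 [1.0, 0.0, -1.0],
                                 [1.0, 0.0, -1.0]])
feature_map = conv2d(image, vertical_edge_filter)  # 6x6 feature map
pooled = max_pool(feature_map)                     # 3x3 after 2x2 pooling
print(feature_map.shape, pooled.shape)             # (6, 6) (3, 3)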

A detailed explanation of this can be found at Understanding Neural Networks.

A CNN architecture is a stack of distinct layers that transform the input volume into an
output volume (e.g., one holding the class scores) through differentiable functions.

In other words, one can understand a CNN architecture to be a specific arrangement of
the above-mentioned layers. Numerous variations of such arrangements have developed
over the years, resulting in several CNN architectures. The most common amongst them are:
1. LeNet-5 (1998)

2. AlexNet (2012)

3. ZFNet (2013)

4. GoogLeNet / Inception (2014)

5. VGGNet (2014)

6. ResNet (2015)

AlexNet Architecture
AlexNet was the first convolutional network that used GPUs to boost training performance.

1. The AlexNet architecture consists of 5 convolutional layers, 3 max-pooling layers, 2
normalization layers, 2 fully connected layers, and 1 softmax layer.

2. Each convolutional layer consists of convolutional filters and a nonlinear activation
function, ReLU.

3. The pooling layers are used to perform max pooling.


4. Input size is fixed due to the presence of fully connected layers.

5. The input size is mentioned in most places as 224x224x3, but due to the padding
involved it works out to 227x227x3.

6. AlexNet overall has 60 million parameters. A code sketch of this layer stack follows the list.
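
To make the layer counts above concrete, here is a hedged PyTorch sketch of the stack as
described. It is a single-GPU simplification that ignores the original two-GPU split; the
layer sizes follow the commonly cited AlexNet configuration.

import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),     # conv1
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),           # overlapping max pool
            nn.Conv2d(96, 256, kernel_size=5, padding=2),    # conv2
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),   # conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),   # conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),   # conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),   # fully connected 1
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),          # fully connected 2
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),   # feeds the softmax
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNet()
out = model(torch.randn(1, 3, 227, 227))  # 227x227x3 input, as noted above
print(out.shape)                          # torch.Size([1, 1000])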

Model Details

The model that won the competition was tuned with specific details:

1. ReLU as the activation function

2. Normalization layers (local response normalization), which are not common anymore

3. Batch size of 128


4. SGD with momentum as the learning algorithm (a minimal optimizer sketch follows this list)

5. Heavy data augmentation with flipping, jittering, cropping, color normalization, etc.

6. Ensembling of models to get the best results.
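
As a rough illustration of this training recipe, a minimal PyTorch optimizer setup might
look like the following. It reuses the AlexNet sketch from the previous section, and the
hyperparameter values are those reported for the original training run (batch size 128 would
be set on the data loader, not the optimizer).

import torch

model = AlexNet()  # the sketch from the previous section
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()  # softmax + negative log-likelihood

def train_step(images, labels):
    # one SGD-with-momentum update on a mini-batch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()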

AlexNet was trained on a GTX 580 GPU with only 3 GB of memory, which couldn't fit the
entire network. So the network was split across 2 GPUs, with half of the
neurons (feature maps) on each GPU.

This is the reason one can see a split in the architecture diagram.

Key Features

Overlapping Max Pooling


Max pooling is used to down-sample an image or a representation. It reduces the
dimensionality by allowing assumptions to be made about the features contained in the
binned sub-regions.

Overlapping max pool layers are similar to regular max pool layers, except that the adjacent
windows over which the max is calculated overlap each other. The authors of AlexNet
used pooling windows of size 3×3 with a stride of 2 between adjacent windows. Due
to this overlapping nature of max pooling, the top-1 error rate was reduced by 0.4% and the
top-5 error rate by 0.3%, compared to non-overlapping pooling windows of size 2×2 with a
stride of 2, which would give the same output dimensions.
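
A quick shape check (assuming PyTorch) confirms that, on AlexNet's 55×55 conv1 output, the
overlapping 3×3/stride-2 pooling and the non-overlapping 2×2/stride-2 alternative produce
the same output size:

import torch
import torch.nn as nn

x = torch.randn(1, 96, 55, 55)                        # e.g. AlexNet's conv1 output
overlapping = nn.MaxPool2d(kernel_size=3, stride=2)   # AlexNet's choice
non_overlapping = nn.MaxPool2d(kernel_size=2, stride=2)

print(overlapping(x).shape)      # torch.Size([1, 96, 27, 27])
print(non_overlapping(x).shape)  # torch.Size([1, 96, 27, 27]) -- same spatial size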

ReLU Nonlinearity
Using the ReLU non-linearity, AlexNet showed that deep CNNs can be trained much faster
than with saturating activation functions such as tanh or sigmoid. The figure in the
original paper shows that with ReLUs (solid curve), AlexNet could reach a 25% training
error rate six times faster than an equivalent network using tanh (dotted curve). This
was tested on the CIFAR-10 dataset.
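
A small NumPy illustration of why ReLU avoids saturation: its gradient stays at 1 for every
positive input, while tanh's gradient shrinks toward 0 for large inputs, which slows learning.

import numpy as np

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
relu = np.maximum(0.0, x)
tanh = np.tanh(x)

relu_grad = (x > 0).astype(float)  # 1 for positive inputs, so gradients do not vanish there
tanh_grad = 1.0 - tanh ** 2        # approaches 0 as |x| grows

print(relu_grad)           # [0. 0. 0. 1. 1.]
print(tanh_grad.round(3))  # approximately [0.01, 0.786, 1.0, 0.786, 0.01]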

Data Augmentation
Showing a neural network different variations of the same image helps prevent
overfitting. It also forces the network to learn the key features and effectively
generates additional training data.

Data Augmentation by Mirroring

Let's say we have an image of a cat in our training set. Its mirror image is also a valid
image of a cat. This means that we can double the size of the training dataset by simply
flipping the image about the vertical axis.

Source: https://www.learnopencv.com/wp-content/uploads/2018/05/AlexNet-Data-Augmentation-
Mirror-Image.jpg
Data Augmentation by Random Crops

Cropping the original image randomly also produces additional data that is just a
shifted version of the original.

The authors of AlexNet extracted random crops of size 227×227 from inside the 256×256
image boundary and used these as the network's inputs. Using this method, they
increased the size of the data by a factor of 2048.
Source: https://www.learnopencv.com/wp-content/uploads/2018/05/AlexNet-Data-Augmentation-
Random-Crops.jpg
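
Both augmentations can be sketched with torchvision transforms (an assumed modern
equivalent, not the authors' original pipeline):

from torchvision import transforms

augment = transforms.Compose([
    transforms.Resize(256),                  # scale the shorter side to 256
    transforms.RandomCrop(227),              # random 227x227 crop inside the 256x256 boundary
    transforms.RandomHorizontalFlip(p=0.5),  # mirror the image about the vertical axis
    transforms.ToTensor(),
])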

Dropout

During dropout, a neuron is dropped from the network with a probability of 0.5.
When a neuron is dropped, it does not contribute to forward propagation or backward
propagation. Every input therefore goes through a different network architecture. As a
result, the learned weight parameters are more robust and do not overfit easily.
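
A minimal PyTorch illustration of this behaviour: nn.Dropout zeroes activations with
probability 0.5 during training (scaling the survivors), and acts as an identity at
inference time.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # each activation is zeroed with probability 0.5
x = torch.ones(8)

drop.train()
print(drop(x))   # roughly half the entries become 0, the survivors are scaled by 2

drop.eval()
print(drop(x))   # identity at inference: tensor([1., 1., 1., 1., 1., 1., 1., 1.])
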
Results

In the 2010 version of the ImageNet challenge, AlexNet vastly outpaced the second-best
model, with 37.5% top-1 error vs 47.5% top-1 error, and 17.0% top-5 error vs 37.55% top-5
error. AlexNet was able to recognize off-center objects, and most of its top-5 classes
for each image were reasonable. AlexNet won the 2012 competition with a top-5 error
rate of 15.3%, compared to the second-place top-5 error rate of 26.2%.
Lecture 9 Stanford University: https://www.youtube.com/watch?v=DAOcjicFr1Y

The success of AlexNet is mostly attributed to its ability to leverage GPUs for training,
which made it feasible to train its huge number of parameters.

In the following years, there were multiple improvements over AlexNet, resulting in
models like VGG, GoogLeNet, and later ResNet.
