AlexNet
Key Features:
ReLU is used as the activation function rather than tanh
Batch size of 128
SGD with momentum is used as the learning algorithm
Data augmentation is carried out, e.g. flipping, jittering, cropping, colour normalization, etc.
Max Pooling
The main idea behind a pooling layer is to
“accumulate” features from maps generated by
convolving a filter over an image.
Its function is to progressively reduce the spatial size of the representation, which reduces the number of parameters and computations in the network. The most common form of pooling is max pooling.
Pooling also helps reduce over-fitting by providing an abstracted form of the representation.
Max pooling is done by applying a max filter
to (usually) non-overlapping sub-regions of
the initial representation.
AlexNet used pooling windows of size 3×3 with a stride of 2 between adjacent windows.
Because these windows overlap, the top-1 and top-5 error rates were reduced by 0.4% and 0.3% respectively, compared with non-overlapping 2×2 pooling windows with a stride of 2, which would give the same output dimensions.
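As a minimal sketch (assuming PyTorch), the overlapping 3×3/stride-2 pooling used by AlexNet can be written as follows; the non-overlapping 2×2/stride-2 variant is shown for comparison and gives the same output size:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 96, 55, 55)  # e.g. feature maps after AlexNet's first conv layer

overlapping = nn.MaxPool2d(kernel_size=3, stride=2)      # AlexNet's overlapping pooling
non_overlapping = nn.MaxPool2d(kernel_size=2, stride=2)  # conventional non-overlapping pooling

print(overlapping(x).shape)      # torch.Size([1, 96, 27, 27])
print(non_overlapping(x).shape)  # torch.Size([1, 96, 27, 27]) -- same spatial size
```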
ReLU Non-Linearity
AlexNet demonstrates that non-saturating activation functions like ReLU allow deep CNNs to be trained much more quickly than saturating ones like tanh or sigmoid.
As shown in the original AlexNet paper, a CNN using ReLUs (solid curve) reaches a 25% training error rate about six times faster than an equivalent network using tanh (dashed curve). This was evaluated on the CIFAR-10 dataset.
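A minimal sketch (assuming PyTorch) of swapping a saturating tanh activation for ReLU in a convolutional block:

```python
import torch.nn as nn

# Saturating activation: gradients shrink for large |x|, slowing training
tanh_block = nn.Sequential(nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.Tanh())

# Non-saturating activation: max(0, x) keeps gradients alive for positive inputs
relu_block = nn.Sequential(nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True))
```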
Data Augmentation
Overfitting can be reduced by showing the neural network many altered versions of the same image.
This effectively produces more data and compels the network to learn the essential features rather than memorise individual examples.
Augmentation by Mirroring
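A minimal augmentation pipeline (assuming torchvision) that mirrors, jitters, crops, and colour-normalizes training images, along the lines described above:

```python
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),                     # random cropping
    transforms.RandomHorizontalFlip(p=0.5),                # mirroring / flipping
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # colour jittering
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],       # colour normalization
                         std=[0.229, 0.224, 0.225]),       # (ImageNet statistics)
])
```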
Dropout
During dropout, each neuron is removed from the network with a probability of 0.5.
A dropped neuron does not contribute to either forward or backward propagation.
Each training example is therefore effectively processed by a different network architecture, since a different subset of neurons is dropped each time.
The learned weight parameters are consequently more robust and less prone to overfitting.
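A minimal sketch (assuming PyTorch) of dropout applied between fully connected layers, as in AlexNet's classifier:

```python
import torch.nn as nn

classifier = nn.Sequential(
    nn.Dropout(p=0.5),       # each neuron is dropped with probability 0.5 during training
    nn.Linear(9216, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),   # 1000 ImageNet classes
)
# classifier.eval() disables dropout at inference time; classifier.train() re-enables it.
```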
AlexNet Summary
Architecture Implementation
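A minimal sketch, assuming PyTorch, of the AlexNet layer layout (five convolutional layers followed by three fully connected layers); this follows the commonly used torchvision variant and is not the author's exact code:

```python
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),            # overlapping max pooling
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)   # convolutional feature extractor
        x = x.flatten(1)       # flatten to (batch, 9216)
        return self.classifier(x)
```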
Prediction
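A minimal prediction sketch, assuming torchvision's pretrained AlexNet; the image path "cat.jpg" is just a placeholder:

```python
import torch
from PIL import Image
from torchvision import models, transforms

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("cat.jpg")            # placeholder path
batch = preprocess(img).unsqueeze(0)   # add the batch dimension

with torch.no_grad():
    logits = model(batch)
    predicted_class = logits.argmax(dim=1).item()
print(predicted_class)                 # index into the 1000 ImageNet classes
```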
ResNet50
Skip Connections
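A skip connection adds a block's input directly to its output, so the stacked layers only have to learn a residual. A minimal sketch of a ResNet-style bottleneck block with an identity skip connection, assuming PyTorch:

```python
import torch.nn as nn
import torch.nn.functional as F

class BottleneckBlock(nn.Module):
    """1x1 -> 3x3 -> 1x1 convolutions with a skip (identity) connection."""
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(bottleneck)
        self.conv2 = nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(bottleneck)
        self.conv3 = nn.Conv2d(bottleneck, channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x                            # the skip connection
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return F.relu(out + identity)           # add the input back before the final ReLU
```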
Summary:
In summary, ResNet50 is a deep convolutional neural network architecture developed by Microsoft Research in 2015. It is a variant of the popular ResNet family and comprises 50 layers, which allows it to learn much deeper representations than was previously possible without running into the problem of vanishing gradients. The architecture of ResNet50 is divided into four main parts: the convolutional layers, the identity blocks, the convolutional blocks, and the fully connected layers. The convolutional layers extract features from the input image, the identity and convolutional blocks process and transform these features, and the fully connected layers make the final classification. ResNet50 has been trained on the large ImageNet dataset, achieving an error rate on par with human performance, which makes it a powerful model for image classification tasks such as object detection, facial recognition, and medical image analysis. It is also widely used as a feature extractor for other tasks, such as object detection and semantic segmentation.
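As a minimal sketch (assuming torchvision), ResNet50 can be loaded with ImageNet weights and reused as a feature extractor by dropping its final fully connected layer:

```python
import torch
import torch.nn as nn
from torchvision import models

resnet50 = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
resnet50.eval()

# Drop the final classification layer to use the network as a feature extractor
feature_extractor = nn.Sequential(*list(resnet50.children())[:-1])

x = torch.randn(1, 3, 224, 224)   # a dummy preprocessed image batch
with torch.no_grad():
    features = feature_extractor(x)
print(features.shape)             # torch.Size([1, 2048, 1, 1])
```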