3.4 Architecture of Inception-ResNetV2
In this process, feature extraction is performed using a modified CNN (Inception-ResNetV2). The initial layers of the Inception-ResNetV2 model are designed to extract low-level features such as dots, lines, and edges. The network's deeper layers then extract mid-level properties such as sharpness, texture, and shadowing in specific portions of the image. Finally, the deepest layers extract high-level features such as shape from the rice leaf image in order to detect the presence of disease.
Inception-ResNetV2 is a convolutional neural architecture that builds on the Inception family of architectures but incorporates residual connections (replacing the filter concatenation stage of the Inception architecture). It uses a new Inception module, called the Inception-ResNet module, which combines the benefits of both Inception and residual networks. These Inception-ResNet modules allow for a deeper network with fewer parameters and better performance. The architecture also applies a batch normalization layer after each convolutional layer, which improves the stability and performance of the network.
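The residual connection at the heart of the Inception-ResNet module can be sketched in a few lines of NumPy. The `branch` function below is only a placeholder for the module's parallel convolutions, and all shapes and values are illustrative assumptions, not the actual network configuration:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def inception_resnet_block(x, branch):
    # Residual connection: the branch output is ADDED to the input
    # (rather than concatenated, as in plain Inception modules),
    # then a nonlinearity is applied.
    return relu(x + branch(x))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 32))   # toy H x W x C feature map
branch = lambda t: 0.1 * t            # placeholder for the Inception sub-network
y = inception_resnet_block(x, branch)
print(y.shape)                        # (8, 8, 32): the residual sum preserves shape
```

Because the branch output is added to the input rather than concatenated, the block's output keeps the same channel count as its input, which is what allows these modules to be stacked very deep.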
3.4.1 Inception
Conventional convolutional neural networks typically use convolutional and pooling layers to extract features from the input data. However, such networks are limited in capturing local and global features simultaneously, as they typically focus on one or the other. An Inception module is a building block used in the Inception network architecture for CNNs, intended to solve the problem of learning a combination of local and global features from the input data. The idea behind the Inception module is to learn a variety of feature maps at different scales; these feature maps are then concatenated to form a more comprehensive representation of the input. This allows the network to capture a wide range of features, both low-level and high-level, which is useful for tasks such as image classification. Inception was also designed to be more efficient and faster to train than other deep convolutional neural networks.
The basic structure of an Inception module is a combination of multiple convolutional filters of different sizes applied in parallel to the input data. The output of each filter is concatenated with the others to form a single output feature map. The Inception module also includes a max pooling layer, which takes the maximum value from a set of non-overlapping regions of the input data; this reduces the spatial dimensionality of the data and provides a degree of translation invariance. The use of multiple parallel filters and max pooling allows the Inception module to extract features at different scales and resolutions, improving the network's ability to recognize patterns in the input data and thus the network's overall performance.
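The parallel-branch structure described above can be sketched in NumPy. Implementing real convolutions would obscure the point, so each branch below is a simple placeholder transform (an assumption for illustration); what matters is that every branch preserves the spatial size and the outputs are concatenated along the channel axis:

```python
import numpy as np

def inception_forward(x, branches):
    # Each branch maps the input to a feature map with the same spatial
    # size; the branch outputs are concatenated along the channel axis.
    outputs = [b(x) for b in branches]
    return np.concatenate(outputs, axis=-1)

x = np.ones((8, 8, 3))  # toy H x W x C input
# Placeholder branches standing in for the 1x1, 3x3, 5x5 convolution
# paths and the pooling path; each produces a different channel count.
branches = [
    lambda t: np.tile(t.mean(axis=-1, keepdims=True), (1, 1, 16)),  # "1x1" path
    lambda t: np.tile(t.max(axis=-1, keepdims=True), (1, 1, 24)),   # "3x3" path
    lambda t: np.tile(t.min(axis=-1, keepdims=True), (1, 1, 8)),    # "5x5"/pool path
]
y = inception_forward(x, branches)
print(y.shape)   # (8, 8, 48): the branch channel counts add up
```

The concatenation is why a plain Inception module's output channel count grows with the number of branches, in contrast to the residual addition used by Inception-ResNet modules.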
In the design of Inception-ResNetV2, feature extraction is performed using Inception structural designs. The main benefit of the Inception design is that it provides a significant quality gain at only a modest increase in computational requirements when compared to shallower, narrower networks. By employing effective factorization techniques, the Inception design reduces the computational complexity of the network.
The Inception module consists of convolutions of different sizes that allow the network to process features at different spatial scales. For dimensionality reduction, 1x1 convolutions are applied before the more expensive 3x3 and 5x5 convolutions, as shown in Fig. 2 below. In many problems, a deeper network is needed to process features at different spatial scales; such flexibility can be incorporated into convolutional neural networks by introducing Inception blocks.
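The savings from applying a 1x1 convolution before an expensive 5x5 convolution can be checked with a small weight-count calculation. The channel counts below are hypothetical values chosen for illustration, not taken from the actual Inception-ResNetV2 configuration:

```python
# Hypothetical channel counts, chosen only for illustration.
c_in, c_mid, c_out, k = 256, 32, 64, 5

direct  = k * k * c_in * c_out                           # 5x5 conv applied directly
reduced = 1 * 1 * c_in * c_mid + k * k * c_mid * c_out   # 1x1 reduction, then 5x5

print(direct, reduced)    # 409600 59392
print(direct / reduced)   # roughly 6.9x fewer weights
```

Even though the reduced path adds an extra layer, shrinking the channel count before the large spatial filter cuts the weight count by several times, which is what makes the wide parallel branches affordable.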
In the feature extractor portion, the convolutional layers use filters of different sizes: 1x1 filters are used for dimensionality reduction or for restoring the dimensions of the feature maps, and larger filters are factorized into smaller (2x(3x3)) and asymmetric (1x3, 3x1 and 1x7, 7x1) types. The distinctive Inception blocks are shown in Fig. 3(a), 3(b) and 3(c).
Moreover, convolutions of filter size nxn are factorized into a combination of 1xn and nx1 convolutions. For example, a 3x3 convolution can be replaced by first performing a 1x3 convolution and then performing a 3x1 convolution on its output. This method was found to be 33% cheaper than a single 3x3 convolution. The filter banks in the module were also expanded (made wider instead of deeper) to remove the representational bottleneck: if the module were made deeper instead, there would be an excessive reduction in dimensions, and hence a loss of information.
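The 33% figure follows directly from counting weights: a 3x3 filter has 9 spatial weights per channel pair, while a 1x3 filter followed by a 3x1 filter has 3 + 3 = 6. A quick check, assuming for simplicity the same channel count on every path:

```python
c = 64  # illustrative channel count, assumed equal on every path
k = 3

full       = k * k * c * c           # one 3x3 convolution
factorized = k * c * c + k * c * c   # 1x3 followed by 3x1

savings = 1 - factorized / full
print(savings)   # 0.333...: about one third fewer weights, as stated
```

The same counting argument gives even larger savings for bigger filters, which is why the 1x7/7x1 factorization is used in the deeper blocks.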
The above three principles were used to build three different types of Inception modules (call them modules A, B and C in the order they were introduced; these names are used for clarity and are not the official names).
3.4.4 Inception-ResNet V2