
3.4 Architecture of Inception-ResNetV2
In this process, feature extraction is performed using a modified CNN (Inception-ResNetV2). The initial layers of the Inception-ResNetV2 model are designed to extract low-level features such as dots, lines, and edges. The deeper layers of the network then extract mid-level features such as sharpness, texture, and shadowing in specific portions of the image. Finally, the deepest layers extract high-level features such as shape from the rice leaf image in order to detect the presence of disease.
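This low/mid/high-level hierarchy can be inspected directly. The sketch below (not the authors' code) uses PyTorch forward hooks on a tiny stand-in network; the layer indices and the 299×299 input size are illustrative assumptions:

```python
# Illustrative sketch: capturing feature maps at different depths of a CNN
# with PyTorch forward hooks. The tiny network below is a hypothetical
# stand-in for the early/middle/deep stages of Inception-ResNetV2.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2),    # early layer: low-level features (edges, dots)
    nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2),   # middle layer: mid-level features (texture)
    nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2),  # deep layer: high-level features (shape)
    nn.ReLU(),
)

features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

net[0].register_forward_hook(save_output("low"))
net[2].register_forward_hook(save_output("mid"))
net[4].register_forward_hook(save_output("high"))

x = torch.randn(1, 3, 299, 299)       # Inception-style 299x299 RGB input
net(x)
for name, fmap in features.items():
    print(name, tuple(fmap.shape))    # spatial size shrinks, depth grows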
Inception-ResNetV2 is a convolutional neural architecture that builds on the Inception family of architectures but incorporates residual connections (replacing the filter-concatenation stage of the Inception architecture). It uses a new Inception module, called the Inception-ResNet module, which combines the benefits of both Inception and residual networks. These Inception-ResNet modules allow for a deeper network with fewer parameters and better performance. The network also uses a batch normalization layer after each convolutional layer, which improves its stability and performance.
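A minimal sketch of this convolution-then-batch-normalization pattern, written here in PyTorch (the helper name and default arguments are our own, not from the paper):

```python
# Minimal conv -> batch norm -> ReLU unit, as described above.
import torch.nn as nn

def conv_bn(in_ch, out_ch, kernel_size, stride=1, padding=0):
    """Convolution followed by batch normalization and an activation."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride=stride,
                  padding=padding, bias=False),  # bias is redundant before BN
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```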

The Inception-ResNetV2 architecture combines two recent deep-learning ideas: residual connections and the Inception architecture. This hybrid deep learning model has the advantages of a residual network while retaining the unique characteristics of the multi-convolutional core of the Inception network. The Inception-ResNet network is thus a hybrid inspired both by Inception and by the performance of ResNet. The operations involved in the Inception-ResNetV2-based model are explained in the following subsections.

3.4.1 Inception
Conventional convolutional neural networks typically use convolutional and pooling layers to extract features from the input data. However, such networks are limited in capturing local and global features simultaneously, as they typically focus on one or the other. An Inception module is a building block used in the Inception network architecture for CNNs, intended to solve the problem of learning a combination of local and global features from the input data. The idea behind the Inception module is to learn a variety of feature maps at different scales and concatenate them into a more comprehensive representation of the input. This allows the network to capture a wide range of features, both low-level and high-level, which improves performance on tasks such as image classification. Inception was also designed to be more efficient and faster to train than other deep convolutional neural networks.
The basic structure of an Inception module is a set of convolutional filters of different sizes applied in parallel to the input data; the output of each filter is concatenated with the others to form a single output feature map. The module also includes a max-pooling layer, which takes the maximum value from each of a set of non-overlapping regions of the input. This reduces the spatial dimensionality of the data and provides a degree of translation invariance. The use of multiple parallel filters and max-pooling layers allows the Inception module to extract features at different scales and resolutions, improving the network's ability to recognize patterns in the input data and thus its overall performance.
In the design of Inception-ResNetV2, feature extraction is performed using Inception structural designs. The main benefit of the Inception design is that it provides a significant quality gain at only a modest increase in computational requirements compared to shallower and narrower networks. By employing effective factorization techniques, the Inception design reduces the computational complexity while easing this limitation.

The Inception module consists of convolutions of different sizes that allow the network to process features at different spatial scales. For dimensionality reduction, 1×1 convolutions are applied before the more expensive 3×3 and 5×5 convolutions, as shown in Fig. 2 below. In many problems, a deeper network is needed to process features at different spatial scales; this flexibility can be incorporated into convolutional neural networks by introducing Inception blocks.

Fig. 2 Dimensionality reduction using 1×1 convolutions
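The following PyTorch sketch shows such a module (the branch widths are illustrative assumptions, not the exact Inception-ResNetV2 values): 1×1 convolutions reduce channel depth before the costlier 3×3 and 5×5 branches, a pooling branch runs in parallel, and all branch outputs are concatenated along the channel axis:

```python
# Minimal Inception-style module with 1x1 dimensionality reduction.
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, 1)          # plain 1x1 branch
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 48, 1),                    # 1x1 reduction
            nn.Conv2d(48, 64, 3, padding=1),            # then 3x3
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, 1),                    # 1x1 reduction
            nn.Conv2d(16, 32, 5, padding=2),            # then 5x5
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),       # pooling branch
            nn.Conv2d(in_ch, 32, 1),
        )

    def forward(self, x):
        # Concatenate all branch outputs along the channel dimension.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

y = InceptionModule(192)(torch.randn(1, 192, 35, 35))
print(y.shape)  # (1, 192, 35, 35): 64 + 64 + 32 + 32 output channels
```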


Each Inception block is followed by a 1×1 convolution filter without activation, called the filter-expansion layer. Its purpose is to scale up (increase) the dimensionality of the filter bank to match the depth of the input to the next layer; that is, it compensates for the dimensionality reduction performed inside the Inception block. The pooling layers inside the Inception blocks were replaced by residual connections; pooling operations remain only in the reduction blocks. The residual connections add the output of the convolution operations of the Inception module to its input. For this residual addition to work, the input and the output after convolution must have the same dimensions; hence, 1×1 convolutions are used after the original convolutions to match the depth sizes (the depth is changed by the convolutions). Inception modules thus comprise a series of smaller convolutional and pooling layers, combined so that the network can learn spatial features at multiple scales from the input data.
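A sketch of this residual pattern in PyTorch (channel sizes and branch layout are illustrative assumptions): the branch outputs are concatenated, a linear 1×1 filter-expansion convolution (no activation) restores the input depth, and the result is added back to the block input before the final activation:

```python
# Minimal Inception-ResNet-style block with a 1x1 filter-expansion layer.
import torch
import torch.nn as nn

class InceptionResNetBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 32, 1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 32, 1),
            nn.Conv2d(32, 32, 3, padding=1),
        )
        # 1x1 expansion WITHOUT activation: maps the concatenated branches
        # (32 + 32 channels) back to in_ch so the residual add is valid.
        self.expand = nn.Conv2d(64, in_ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        branches = torch.cat([self.branch1(x), self.branch3(x)], dim=1)
        return self.act(x + self.expand(branches))   # residual connection

out = InceptionResNetBlock(256)(torch.randn(1, 256, 17, 17))
print(out.shape)  # (1, 256, 17, 17): depth unchanged, as the addition requires
```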

In the feature-extractor portion, the convolutional layers use filters of different sizes: 1×1 filters are used for dimensionality reduction or for restoring the dimensions of the feature maps, while larger filters are factorized into smaller (2×(3×3)) and asymmetric (1×3/3×1 and 1×7/7×1) types. The distinctive Inception blocks are shown in Fig. 3(a), 3(b) and 3(c).

3.4.2 Factorization into smaller convolutions


This section describes the various methods of factorizing convolutions in different contexts, with the purpose of improving computational efficiency. Because the Inception model is fully convolutional, each weight corresponds to one multiplication per activation; as a result, any reduction in computational cost also reduces the parameter count. With appropriate factorization, the parameters become more disentangled, which results in faster training. Furthermore, the computation and memory savings allow a larger filter bank while still permitting each model replica to be trained on a single machine.
The Inception-ResNet architecture incorporates three different Inception modules and two reduction blocks. Because this network is a hybrid between Inception v2 and ResNet, its key property is that the output of the Inception module is added to the input (i.e., the data from the previous layer). For this to work, the output of the Inception module and the input from the previous layer must have the same dimensions, so factorization becomes important for matching these dimensions. Inception v2 sought to further reduce computational cost by factorizing the filters. However, too much factorization can result in loss of information: if a filter cannot accommodate all the data from the previous layer, data lying outside the dimensions of the filter's convolution will be lost or left unfiltered. Hence, factorization methods were used to improve computational complexity while still producing efficient performance.
Factorization simply means reducing the convolution filters to smaller sizes to reduce computational cost. For example, a 5×5 convolution can be broken down into two stacked layers of 3×3 convolutions: a 5×5 convolution is 25/9 ≈ 2.78 times more expensive than a 3×3 convolution, so the two 3×3 layers together cost only 18/25 = 0.72 of the single 5×5, a saving of about 28%. This makes the network deeper, even more so for larger convolutions, and too deep a network will eventually result in loss of information, since more factorization is needed, as mentioned in the previous paragraph. Another factorization method replaces n×n convolutions with a 1×n convolution followed by an n×1 convolution; for example, a 5×5 convolution is replaced by a 1×5 convolution and a 5×1 convolution. This makes the network broader instead of longer (deeper) and hence reduces the chance of information loss across layers of factorization, while still reducing the computational cost in general.
Factorizing 5×5 convolutions into two 3×3 convolution operations improves computational speed. Although this may seem counterintuitive, stacking two 3×3 convolutions is in fact cheaper than a single 5×5 convolution, which leads to a boost in performance. This is illustrated in Fig. 4 given below.

Fig. 4 Factorization of a 5×5 convolution into two 3×3 convolution operations
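The saving can be verified with a quick cost count. Assuming, as a simplification, the same number of channels C on both paths, the cost per output position is proportional to the kernel area:

```latex
\frac{\mathrm{cost}(5\times5)}{\mathrm{cost}(3\times3)}
  = \frac{25\,C^{2}}{9\,C^{2}} = \frac{25}{9} \approx 2.78,
\qquad
\frac{2\times\mathrm{cost}(3\times3)}{\mathrm{cost}(5\times5)}
  = \frac{18}{25} = 0.72
```

That is, two stacked 3×3 convolutions cost about 72% of a single 5×5 convolution, the roughly 28% saving quoted above.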

Moreover, convolutions of filter size n×n are factorized into a combination of 1×n and n×1 convolutions. For example, a 3×3 convolution is equivalent to first performing a 1×3 convolution and then performing a 3×1 convolution on its output. This method was found to be about 33% cheaper than the single 3×3 convolution. The filter banks in the module were also expanded (made wider instead of deeper) to remove the representational bottleneck: if the module were made deeper instead, there would be an excessive reduction in dimensions, and hence loss of information.
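A minimal sketch of this asymmetric factorization in PyTorch (the helper name and the choice of intermediate width are our own illustrative assumptions):

```python
# Asymmetric factorization: replace one n x n convolution with a 1 x n
# convolution followed by an n x 1 convolution. Per output position the
# kernel cost drops from n*n to 2n multiplications (for n = 3: from 9 to 6,
# i.e. about 33% cheaper, matching the text).
import torch.nn as nn

def factorized_conv(in_ch, out_ch, n=3):
    """1 x n followed by n x 1, preserving the spatial size via padding."""
    mid = out_ch  # intermediate width; a design choice, not fixed by the paper
    return nn.Sequential(
        nn.Conv2d(in_ch, mid, kernel_size=(1, n), padding=(0, n // 2)),
        nn.Conv2d(mid, out_ch, kernel_size=(n, 1), padding=(n // 2, 0)),
    )
```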

The above three principles were used to build three different types of Inception modules (called modules A, B and C here, in the order they were introduced; these names are used for clarity and are not the official names).

3.4.4 Inception-ResNet V2

Fig. 5 shows the detailed architecture of our proposed network, Inception-ResNetV2.


The Inception-ResNetV2 architecture consists of three main blocks, namely A, B, and C, each containing a different number of stacked Inception blocks. To identify the optimal number of Inception modules in each block, we tried various combinations and found that a network with 3 Inception blocks in A, 5 blocks in B, and 2 blocks in C outperforms the other combinations. The first layer of the Inception-ResNetV2 architecture, referred to as the stem, is introduced before the Inception blocks A, B and C. Although Inception v2 sought to lengthen or broaden the network for more efficient performance, reduction blocks were introduced specifically for this purpose (to regulate the breadth and/or depth of the network).

Fig. 5 Detailed architecture of Inception-ResNetV2


For the Inception part of the network, there are 3 Inception blocks A at the 35×35 grid with 288 filters each. This is reduced to a 17×17 grid with 768 filters using the reduction technique, and is followed by 5 Inception blocks B of the factorized Inception modules. This is then reduced to an 8×8 grid with 1280 filters using the reduction technique. At the coarsest 8×8 level, there are 2 Inception blocks C with a concatenated output filter-bank size of 2048 for each tile.
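The overall stacking can be summarized as a skeleton (a PyTorch sketch under stated assumptions: the stem, block, and reduction modules are placeholders to be filled in with the Inception-ResNet modules; the grid sizes and filter counts in the comments are those quoted above):

```python
# High-level skeleton of the block stacking described in the text.
import torch.nn as nn

def build_inception_resnet(stem, block_a, reduction_a,
                           block_b, reduction_b, block_c):
    """Assemble stem -> 3xA -> reduction -> 5xB -> reduction -> 2xC."""
    return nn.Sequential(
        stem,                              # 299x299 input -> 35x35 feature grid
        *[block_a() for _ in range(3)],    # 3 x block A at 35x35, 288 filters
        reduction_a,                       # 35x35 -> 17x17, 768 filters
        *[block_b() for _ in range(5)],    # 5 x block B at 17x17
        reduction_b,                       # 17x17 -> 8x8, 1280 filters
        *[block_c() for _ in range(2)],    # 2 x block C at 8x8, 2048-filter output
    )
```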

Fig. 6 Block diagram of Inception-ResNetv2


A minor variation between the residual and non-residual Inception variants is that, in Inception-ResNetV2, batch normalization is used only on top of the traditional layers, not on top of the summations. It is reasonable to assume that a thorough use of batch normalization would be beneficial; however, every model replica needs to be trained on an individual graphics processing unit (GPU), and by avoiding batch normalization on the summations, the Inception block count can be increased substantially within the available memory. The Inception-ResNetV2 architecture has three basic structures: the convolutional layer, the activation layer, and the pooling layer.
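A short sketch of this batch-normalization placement (our own PyTorch rendering of the layout just described, with illustrative channel counts):

```python
# BN follows the convolution inside the block, but no BN is applied
# after the residual summation.
import torch.nn as nn
import torch.nn.functional as F

class BNPlacement(nn.Module):
    def __init__(self, ch=256):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(ch)   # BN on top of the convolutional layer

    def forward(self, x):
        y = self.bn(self.conv(x))      # batch-normalized branch
        return F.relu(x + y)           # residual summation: no BN afterwards
```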
