Light-Weight RetinaNet For Object Detection
Abstract
Object detection has made great progress driven by the development of deep learning. Compared with the widely studied classification task, object detection generally needs one to two orders of magnitude more FLOPs (floating-point operations) at inference time. To enable practical applications, it is essential to explore an effective runtime-accuracy trade-off scheme. Recently, a growing number of studies target object detection on resource-constrained devices, such as YOLOv1, YOLOv2, SSD and MobileNetv2-SSDLite [11, 14, 16], whose detection accuracy on COCO test-dev [8] falls around 22-25% mAP (mAP-20-tier). In contrast, very few studies discuss the computation-accuracy trade-off for mAP-30-tier detection networks. In this paper, we explain why RetinaNet gives an effective computation-accuracy trade-off for object detection and how to build a light-weight RetinaNet. We propose to reduce FLOPs only in the computationally intensive layers and keep the other layers unchanged. Compared with the most common approach for the FLOPs-accuracy trade-off, input image scaling, the proposed solution shows a consistently better FLOPs-mAP trade-off line. Quantitatively, the proposed method yields a 0.1% mAP improvement at 1.15x FLOPs reduction and a 0.3% mAP improvement at 1.8x FLOPs reduction.
1 Introduction
Object detection plays an important role in computer vision-based tasks [10, 16, 17]. It is the key module in face detection, object tracking, video surveillance, pedestrian detection, etc. [13, 19]. The recent development of deep learning has boosted the performance of object detection tasks. However, regarding computational complexity (in terms of FLOPs), a detection network can consume up to three orders of magnitude more FLOPs than a classification network, which makes it much more difficult to move towards low-latency inference.
Recently, a growing number of studies have investigated detection on resource-constrained devices, such as mobile platforms. As the main concern of resource-constrained devices is memory consumption, existing solutions such as YOLOv1, YOLOv2, SSD and MobileNetv2-SSDLite [11, 14, 16] have pushed hard to reduce memory consumption by trading off accuracy. Their detection accuracy on the large-scale COCO test-dev 2017 dataset [8] is around 22-25% mAP. Here, we use mAP as the indicator to categorize these solutions as mAP-20-tier. On the other side, in the mAP-30-tier, popular solutions include Faster R-CNN, RetinaNet, YOLOv3 [10, 15, 17] and their variants. As these solutions are commonly deployed on mid- or high-end GPUs or FPGAs, the memory resource is usually sufficient for preloading the weights. In addition, [7] also verifies the linear relation between the number of FLOPs and the inference runtime for the same kind of network. When Faster R-CNN, RetinaNet and YOLOv3 [10, 15, 17] are applied to the COCO detection task with input images of around 600x600 to 800x800, the mAP falls in the range of 33%-36%. However, the FLOPs count of Faster R-CNN [17] is around 850 GFLOPs (gigaFLOPs), which is at least 5x more than that of RetinaNet and YOLOv3 [15]. Clearly, Faster R-CNN is not competitive in computational efficiency. Interestingly, from YOLOv2 [14] to YOLOv3 [15], the authors aggressively increased the FLOPs count from 30 to 140 GFLOPs to gain an mAP improvement from 21% to 33%. Even so, its mAP is still 2.5% lower than that of RetinaNet at 150 GFLOPs. This observation inspires us to take RetinaNet as the baseline and explore a more light-weight version. (The source code is available at https://github.com/PSCLab-ASU/LW-RetinaNet.)
There are two common methods to reduce the FLOPs of a detection network. One is to switch to another backbone; the other is to reduce the input image size. The first results in a noticeable accuracy drop when switching from one of the ResNet backbones [5] to another, so it is typically not a good accuracy-FLOPs trade-off scheme for small adjustments. Reducing the input image size is an intuitive way to cut FLOPs, but the accuracy-FLOPs trade-off line degrades in an exponential trend [7]. There is an opportunity to find a more linear degradation trend for a better accuracy-FLOPs trade-off. We propose to replace only certain branches/layers of the detection network with light-weight architectures and keep the rest of the network unchanged. For RetinaNet, the heaviest branch is the set of layers succeeding the finest FPN level (P3 in Fig. 2), which takes up 48% of the total FLOPs. We propose different light-weight architecture variants for it. More importantly, the proposed method can also be applied to other blockwise-FLOPs-imbalanced detection networks.
The contributions of this paper can be summarized as follows: (1) We propose to reduce only the heaviest bottleneck layer for a light-weight RetinaNet with a better mAP-FLOPs trade-off. (2) The proposed solution shows a consistently better mAP-FLOPs trade-off line with a linear degradation trend, while the input image scaling method degrades in a more exponential trend. (3) Quantitatively, the proposed method yields a 0.1% mAP improvement at 1.15x FLOPs reduction and a 0.3% mAP improvement at 1.8x FLOPs reduction.
2 Related Works
2.1 High-end object detection networks (mAP-30-tier)
Faster R-CNN [17] is an advanced architecture that improves both the accuracy and runtime performance over R-CNN and Fast R-CNN [1, 2].
The main body of Faster R-CNN [17] is composed of three parts: the Feature Network, the Region Proposal Network (RPN) and the Detection Network. As Faster R-CNN [17] replaces the selective search used by Fast R-CNN with the RPN, it significantly reduces the runtime of generating region proposals. However, in the Faster R-CNN inference stage, around 256-1000 boxes are still fed into the detection network, and processing this large batch of data is expensive. For a Faster R-CNN processing the COCO detection task with an Inception-ResNetV2 [18] backbone, the total FLOPs count can come up to around 850 GFLOPs.
[Figure 1: mAP versus GFLOPs for representative detection networks on COCO test-dev (RetinaNet-400/500/600/700/800, YOLOv3-320/416/608, Faster R-CNN with InceptionResNetV2, SSD, SSDLite-MobileNet, SSDLite-MobileNetv2, YOLOv2-416), grouped into the mAP-20-tier and the mAP-30-tier.]
Compared with Faster R-CNN [17], RetinaNet [10] targets a simpler design for gaining speedup. A feature pyramid network (FPN) [9] is attached to its backbone to generate multi-scale pyramid features. Then, the pyramid features go into classification and regression branches, whose weights can be shared across different levels of the FPN. The focal loss is applied to compensate for the accuracy drop, which makes its accuracy comparable with that of Faster R-CNN.
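For concreteness, the sketch below shows the binary focal loss used in RetinaNet's classification branch. The default values alpha = 0.25 and gamma = 2 are the commonly reported setting and are an assumption here, not a detail stated in this paper.

    # Minimal sketch of the focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        """logits/targets share the same shape; targets are 1.0 for positive anchors, 0.0 otherwise."""
        p = torch.sigmoid(logits)
        ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")  # -log(p_t)
        p_t = p * targets + (1.0 - p) * (1.0 - targets)          # probability of the true class
        alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
        # RetinaNet typically normalizes this sum by the number of positive anchors (not shown here).
        return (alpha_t * (1.0 - p_t) ** gamma * ce).sum()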
[Figure 2: RetinaNet architecture overview: the input image is processed by the ResNet backbone blocks (Res1 at 1/2 scale, Res2 at 1/4 scale, ...), which feed the FPN and the detection backends.]
Therefore, we take RetinaNet as the baseline design to explore a better accuracy-FLOPs trade-off scheme for high-end detection tasks.
3 Light-weight RetinaNet
In Section 2, we have explained why the RetinaNet architecture has the potential to be tailored for a better accuracy-FLOPs trade-off. In this section, we first analyze the RetinaNet network with a focus on the distribution of floating-point operations (FLOPs) across different layers in Section 3.1. Then, Section 3.2 illustrates the scheme that helps RetinaNet lose weight.
Figure 3: The FLOPs and memory (parameter) distribution of RetinaNet across different blocks.
The classification and bounding box branches do not share weights with each other, while the weights of each branch are shared across the pyramid features (P3-P7).
The FLOPs distribution of RetinaNet across different blocks is shown in Fig. 3. Here, each block corresponds to the same block in Fig. 2. The detection backends D3-D7 refer to the layers succeeding P3-P7, respectively. As D3-D7 share the same weight parameters in the original design, we only show the average memory cost of D3-D7 in Fig. 3. The FLOPs count of the D3 block clearly dominates the total FLOPs count. This unbalanced FLOPs distribution is quite different from that of the ResNet architecture, which shows only small variation across blocks. The unbalanced FLOPs distribution gives us the chance to obtain a good overall FLOPs reduction by only reducing the cost of the heaviest layer. Quantitatively, for example, if we can reduce the FLOPs of D3 by half, the total FLOPs can be reduced by 24%. In the following subsections, we discuss the main insights of how to obtain a tiny back-end.
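As a quick sanity check of the 24% figure, the following back-of-the-envelope calculation uses the D3 share read off Fig. 3 (about 48% of the total FLOPs); the values are simply the rounded numbers quoted above.

    # Overall FLOPs reduction when only the D3 backend is lightened.
    d3_share = 0.48     # D3's share of the total FLOPs (from Fig. 3)
    d3_kept = 0.5       # fraction of D3's FLOPs that remain after halving it

    overall_reduction = d3_share * (1.0 - d3_kept)              # 0.48 * 0.5 = 0.24
    print(f"overall FLOPs reduction: {overall_reduction:.0%}")  # -> 24%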
Intuitively, we can reduce the filter size to obtain a FLOPs reduction. As shown in Fig. 4, we propose different block designs for the detection branches of RetinaNet. D-block-v1 applies the MobileNet [6] building block: a 3x3 depth-wise (dw) convolution followed by a 1x1 convolution substitutes each original layer. D-block-v2 alternately places 1x1 and 3x3 kernels; this design is inspired by YOLOv1 [16], which interleaves 1x1 and 3x3 kernels without introducing residual blocks. In our design, we make it even simpler by keeping the number of filters fixed across layers. D-block-v3 is more aggressive and replaces all the 3x3 convolutions with 1x1 convolutions.
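The following PyTorch-style sketches illustrate the three variants, assuming the standard RetinaNet head layout of four 256-channel layers with ReLU activations; the exact depth, width, and activation are assumptions where the text does not fix them.

    # Minimal sketches of the three light-weight D-block variants.
    import torch.nn as nn

    def d_block_v1(ch=256, depth=4):
        """v1: MobileNet-style - each 3x3 conv becomes a 3x3 depth-wise conv + 1x1 point-wise conv."""
        layers = []
        for _ in range(depth):
            layers += [nn.Conv2d(ch, ch, 3, padding=1, groups=ch),  # depth-wise 3x3
                       nn.Conv2d(ch, ch, 1),                        # point-wise 1x1
                       nn.ReLU(inplace=True)]
        return nn.Sequential(*layers)

    def d_block_v2(ch=256, depth=4):
        """v2: alternate 1x1 and 3x3 kernels, keeping the filter count fixed."""
        layers = []
        for i in range(depth):
            k = 1 if i % 2 == 0 else 3
            layers += [nn.Conv2d(ch, ch, k, padding=k // 2), nn.ReLU(inplace=True)]
        return nn.Sequential(*layers)

    def d_block_v3(ch=256, depth=4):
        """v3: the most aggressive variant - every 3x3 conv is replaced by a 1x1 conv."""
        layers = []
        for _ in range(depth):
            layers += [nn.Conv2d(ch, ch, 1), nn.ReLU(inplace=True)]
        return nn.Sequential(*layers)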
Naturally, the light-weight blocks cause a certain accuracy drop in exchange for lower computation cost. Therefore, we introduce a limited overhead to compensate for the accuracy drop, as stated in the next subsection.
[Figure 5: The detection backend. (a) The original fully shared scheme: a stack of 3x3 convolutions shared across D3-D7, producing W×H×KA classification outputs and W×H×4A bounding box outputs for each pyramid level P3-P7. (b) The proposed partially shared scheme: D4-D7 keep the original shared weights, while D3 uses the light-weight D-block-v1/2/3.]
As illustrated in Section 3.2.1, the light-weight detection blocks trade an accuracy drop for lower computational complexity. To compensate for the accuracy drop, we propose to replace the fully shared weight scheme of the original RetinaNet with a partially shared weight scheme. As shown in Fig. 5, P3-P7 are the multi-scale feature map outputs of the FPN, which feed into the detection backends D3-D7, respectively. Although D3-D7 share the weight parameters, they have unique input sizes (P3-P7) and are processed serially. Fig. 5(a) shows the original scheme, in which D3-D7 fully share the weights. In Fig. 5(b), only D4-D7 share the weights of the original configuration, while D3 is processed by the light-weight D-block-v1/v2/v3 proposed in Section 3.2.1.
The partially shared weight scheme has two main advantages. First, as D3 has its own independent weight parameters, it can learn features tailored to its branch, which compensates for the accuracy drop brought by the lower computational complexity. Second, it allows us to leave the rest of the network untouched and only modify the heaviest bottleneck block. Also, as the backbone (ResNet-50) dominates the memory consumption (as shown in Fig. 3), the memory overhead introduced here is negligible. Quantitatively, the weight parameter increment is less than 1% of the total weights.
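A minimal sketch of the partially shared classification head is given below, assuming a PyTorch-style interface: D4-D7 reuse one copy of the original 3x3 tower, while D3 gets its own light-weight tower (here, a D-block-v2-style stack). The names, the shared final K×A prediction layer, and the tower depth are illustrative choices rather than details fixed by the paper.

    import torch.nn as nn

    def conv_tower(ch=256, kernels=(3, 3, 3, 3)):
        """A stack of conv+ReLU layers with the given kernel sizes and a fixed channel width."""
        layers = []
        for k in kernels:
            layers += [nn.Conv2d(ch, ch, k, padding=k // 2), nn.ReLU(inplace=True)]
        return nn.Sequential(*layers)

    class PartiallySharedClsHead(nn.Module):
        def __init__(self, num_classes, num_anchors, ch=256):
            super().__init__()
            self.shared_tower = conv_tower(ch)                     # original head, reused for P4-P7
            self.d3_tower = conv_tower(ch, kernels=(1, 3, 1, 3))   # independent light-weight tower for P3
            self.predict = nn.Conv2d(ch, num_classes * num_anchors, 3, padding=1)

        def forward(self, pyramid):  # pyramid = [P3, P4, P5, P6, P7]
            outs = [self.predict(self.d3_tower(pyramid[0]))]
            outs += [self.predict(self.shared_tower(p)) for p in pyramid[1:]]
            return outs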
                     Light-weight block    Detection backend
                                           Classification    Bounding box
    LW-RetinaNet-v1  D-block-v2            √
    LW-RetinaNet-v2  D-block-v3            √
    LW-RetinaNet-v3  D-block-v3            √                 √
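Read as a configuration, the table can be summarized as follows; the dictionary and field names are purely illustrative.

    # The three LW-RetinaNet variants from the table above.
    LW_RETINANET_VARIANTS = {
        "LW-RetinaNet-v1": {"block": "D-block-v2", "branches": ["classification"]},
        "LW-RetinaNet-v2": {"block": "D-block-v3", "branches": ["classification"]},
        "LW-RetinaNet-v3": {"block": "D-block-v3", "branches": ["classification", "bounding box"]},
    }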
5 Conclusion
In this paper, We proposed only to reduce the FLOPs in the heaviest bottleneck layer for
a blockwise-FLOPs-imbalance RetinaNet to get its light-weight version. The proposed so-
lution shows a constantly better mAP-FLOPs trade-off line in a linear degradation trend,
while the input image scaling method degrades in a more exponentially trend. Quantita-
Figure 6: FLOPs and mAP trade-off for input image size scaling versus the proposed method.
Quantitatively, the proposed method yields a 0.1% mAP improvement at 1.15x FLOPs reduction and a 0.3% mAP improvement at 1.8x FLOPs reduction. The proposed method can potentially be applied to any FPN-based blockwise-FLOPs-imbalanced detection network.
References
[1] Ross Girshick. Fast R-CNN. In Proceedings of the IEEE international conference on
computer vision, pages 1440–1448, 2015.
[2] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Region-based con-
volutional networks for accurate object detection and segmentation. IEEE transactions
on pattern analysis and machine intelligence, 38(1):142–158, 2016.
[3] Ross Girshick, Ilija Radosavovic, Georgia Gkioxari, Piotr Dollár, and Kaiming He.
Detectron. https://github.com/facebookresearch/detectron, 2018.
[4] Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo
Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch
sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning
for image recognition. In Proceedings of the IEEE conference on computer vision and
pattern recognition, pages 770–778, 2016.
[6] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun
Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Effi-
cient convolutional neural networks for mobile vision applications. arXiv preprint
arXiv:1704.04861, 2017.
[7] Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara,
Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, et al.
Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings
of the IEEE conference on computer vision and pattern recognition, pages 7310–7311,
2017.
[8] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ra-
manan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in
context. In European conference on computer vision, pages 740–755. Springer, 2014.
[9] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge
Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017.
[10] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss
for dense object detection. In Proceedings of the IEEE international conference on
computer vision, pages 2980–2988, 2017.
[11] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-
Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In European
conference on computer vision, pages 21–37. Springer, 2016.
[12] Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. Rethinking
the value of network pruning. arXiv preprint arXiv:1810.05270, 2018.
[13] Omkar M Parkhi, Andrea Vedaldi, Andrew Zisserman, et al. Deep face recognition. In
BMVC, volume 1, page 6, 2015.
[14] Joseph Redmon and Ali Farhadi. YOLO9000: Better, faster, stronger. In Proceedings
of the IEEE conference on computer vision and pattern recognition, pages 7263–7271,
2017.
[15] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint
arXiv:1804.02767, 2018.
[16] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once:
Unified, real-time object detection. In Proceedings of the IEEE conference on computer
vision and pattern recognition, pages 779–788, 2016.
[17] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-
time object detection with region proposal networks. In Advances in neural information
processing systems, pages 91–99, 2015.
[18] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna.
Rethinking the inception architecture for computer vision. In Proceedings of the IEEE
conference on computer vision and pattern recognition, pages 2818–2826, 2016.
[19] Xiaogang Wang. Intelligent multi-camera video surveillance: A review. Pattern recog-
nition letters, 34(1):3–19, 2013.