Detection of Multiclass Objects in Optical Remote Sensing Images

Abstract— Object detection in complex optical remote sensing images is a challenging problem due to the wide variety of scales, densities, and shapes of object instances on the earth's surface. In this letter, we focus on the wide-scale variation problem of multiclass object detection and propose an effective object detection framework for remote sensing images based on YOLOv2. To make the model adaptable to multiscale object detection, we design a network that concatenates feature maps from layers of different depths and adopt a feature introducing strategy based on oriented response dilated convolution. Through this strategy, the performance for small-scale object detection is improved without losing the performance for large-scale object detection. Compared to YOLOv2, the performance of the proposed framework tested on the DOTA (a large-scale data set for object detection in aerial images) data set improves by 4.4% mean average precision without adding extra parameters. The proposed framework achieves real-time detection for 1024 × 1024 images using Titan Xp GPU acceleration.1

Index Terms— Feature introducing strategy, object detection, optical remote sensing image, oriented response (OR) dilated convolution.

I. INTRODUCTION

Object detection (e.g., of ships, airplanes, and vehicles) in optical remote sensing images is in high demand in extensive earth observation applications. However, complex scene conditions and massive data quantities make multiclass object detection a challenging problem. Furthermore, huge variations in the scale, orientation, and shape of object instances on the earth's surface, together with the imbalance among a wide variety of categories, further increase the complexity of object detection in optical remote sensing images, as reflected by existing annotated data sets [1]–[3].

Extensive studies have been devoted to object detection in optical remote sensing images. Drawing upon recent advances in computer vision, many researchers have pursued applying object detection methods originally developed for natural scenes to optical remote sensing images. Bai et al. [4] and Zhang et al. [5] proposed intuitive and effective methods for object detection based on structural feature selection and structural feature description, but these methods are difficult to use in complicated remote sensing scenes with a limited amount of data. Yang et al. [6] proposed an effective fully convolutional network-based airplane detection framework. Experiments show the high precision, recall, and location accuracy of this framework. However, it is difficult for this method to detect objects in complex scenes because it relies on Markov random field image segmentation. Liu et al. [7] proposed an arbitrary-oriented ship detection framework based on convolutional neural networks (CNNs). This framework performs well on small object detection. However, it has poor performance on extremely large-scale objects. Cheng et al. [3] proposed learning a rotation-invariant CNN for object detection in very high-resolution (VHR) optical remote sensing images. The experimental results demonstrate excellent performance on a publicly available 10-class VHR object detection data set. Long et al. [8] proposed a multiclass object detection method for remote sensing images based on CNNs. This method focuses on the accurate localization of detected objects and achieves good performance. However, the above two methods are complex, which may reduce their applicability to large-sized remote sensing object detection.

Overall, with the astonishing development of deep learning, state-of-the-art CNN-based object detection frameworks such as SSD [9], R-FCN [10], Faster R-CNN [11], and YOLOv2 [12], [13] perform excellently on the general object detection problem and, therefore, have been widely used to address the object detection problem for remote sensing images. Recently, Xia et al. [2] evaluated the performance of these methods on a large-scale remote sensing object detection data set (DOTA). The pixel sizes of categories, the aspect ratios and orientations of instances, the instance density of images, and the spatial resolution in DOTA all vary widely, which makes DOTA challenging. Therefore, it is not surprising that these object detectors perform poorly on it. Recently, the new version of YOLO (v3) [14] significantly improved the performance of object detection. However, this comes at the cost of increased model complexity.

In this letter, we focus on the huge-scale variation problem of multiclass object detection and propose a simple object detection method that slightly modifies YOLOv2 and robustly and effectively detects objects in large-scale challenging data sets.

The main contributions of this letter lie in the following two aspects.

1) We present a simple multiclass object detection architecture for wide-scale varied objects in optical remote sensing images.

Manuscript received July 10, 2018; revised October 12, 2018; accepted November 18, 2018. Date of publication December 12, 2018; date of current version April 22, 2019. This work was supported in part by the Chang Jiang Scholars Program under Grant T2012122 and in part by the Hundred Leading Talent Project of Beijing Science and Technology under Grant Z141101001514005. (Corresponding author: He Chen.)
W. Liu, J. Wang, and H. Chen are with the Beijing Key Laboratory of Embedded Real-Time Information Processing Technology, Beijing Institute of Technology, Beijing 100081, China (e-mail: [email protected]).
L. Ma is with the School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China.
Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LGRS.2018.2882778
1 https://github.com/WenchaoliuMUC/Detection-of-Multiclass-Objects-in-Optical-Remote-Sensing-Images
1545-598X © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
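Concatenating feature maps from layers of different depths, as described above, first requires matching their spatial sizes. One common way to do this (YOLOv2's "passthrough" layer works like this) is space-to-depth reorganization, which trades spatial resolution for channels. The NumPy sketch below only illustrates the shape bookkeeping under assumed channel and resolution values; it is not the exact architecture proposed in this letter.

```python
import numpy as np

def reorg(x, stride=2):
    """Space-to-depth: fold each stride x stride spatial block into the
    channel axis, so a shallow high-resolution map can be concatenated
    with a deeper, coarser one."""
    c, h, w = x.shape
    assert h % stride == 0 and w % stride == 0
    x = x.reshape(c, h // stride, stride, w // stride, stride)
    x = x.transpose(0, 2, 4, 1, 3)  # (c, stride, stride, h/s, w/s)
    return x.reshape(c * stride * stride, h // stride, w // stride)

# Assumed sizes: a 26 x 26 shallow map (64 channels) and a
# 13 x 13 deep map (1024 channels), as in YOLOv2-style backbones.
shallow = np.zeros((64, 26, 26))
deep = np.zeros((1024, 13, 13))
fused = np.concatenate([reorg(shallow), deep], axis=0)
print(fused.shape)  # (1280, 13, 13)
```

The reorganization itself adds no learned weights; only the convolution applied to the fused map carries parameters.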
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 16, NO. 5, MAY 2019
Fig. 5. Visual comparison results in typical scenes. (First column) Result of the method in [7]. (Second column) Result of the original YOLOv2.
(Third column) Result of the original YOLOv3. (Last column) Result of the proposed architecture.
The performance comparison metric, mean average precision (mAP), on the DOTA test set is obtained from the DOTA performance evaluation server. We train and test the model architecture in [7]. The model in [7] was originally designed for ship detection; for multiclass object detection, we modify its last layer to make it consistent with the last layer of the model proposed in this letter. For a fair comparison, this model is pretrained using the union of the VOC 2007 and VOC 2012 trainval data sets. As shown in Fig. 4, epoch 60 performs best on the validation set and is adopted as the final model. As shown in Table II, the experimental result tested on the DOTA test set is 64.3% mAP. This model performs well on small objects such as ships, vehicles, and storage tanks. However, its performance on the soccer field, baseball diamond, and ground track field classes, whose instances are large in size, is poor.

The original YOLOv2 and YOLOv3 are also pretrained using the union of the VOC 2007 and VOC 2012 trainval data sets. As shown in Fig. 4, epoch 55 performs best on the validation set and is adopted as the final model of YOLOv2. As shown in Table III, the experimental result tested on the DOTA test set is 65.7% mAP. Compared with the model in [7], the performance of YOLOv2 on small object detection is not satisfactory. For YOLOv3, epoch 52 performs best on the validation set and is adopted as the final model. The experimental result tested on the DOTA test set is 60.0% mAP. Compared to YOLOv2, YOLOv3 performs better on classes such as ships and bridges, whose instance sizes are small.

The proposed model with the OR convolution kernel (N = 4) is adopted as the comparison model in this letter. Epoch 57 is adopted as the final model. As shown in Table III, the experimental result tested on the DOTA test set is 70.1% mAP. The proposed model performs well on object detection for each class. Compared with the model in [7], the proposed model has advantages in extremely large object detection. In addition, compared with the original YOLOv2, its performance on small object detection is satisfactory.

The visual comparison results tested on the test set are shown in Fig. 5. The first row shows that YOLOv2 and the proposed model perform better than the others on large object detection. Unfortunately, neither model detected the ground track field, whose size is large. We can see from the second row that all models except YOLOv2 perform well for small object detection. The third row shows that all models except YOLOv3 are robust to object scale variety. Overall, these visual evaluation results demonstrate the robustness of the proposed method.
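The mAP figures discussed here come from the DOTA evaluation server, but the metric itself can be sketched briefly: per-class average precision (AP) is the area under an interpolated precision-recall curve, and mAP is the mean of the per-class APs. The snippet below assumes VOC-style all-point interpolation, which is commonly used; the server's exact protocol may differ in detail.

```python
import numpy as np

def average_precision(recall, precision):
    """All-point interpolated AP: area under the precision-recall curve
    after making precision monotonically non-increasing in recall."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(p) - 2, -1, -1):  # right-to-left precision envelope
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]   # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Toy precision-recall points for one class (illustrative values only)
ap = average_precision(np.array([0.2, 0.6, 1.0]),
                       np.array([1.0, 0.75, 0.5]))
# mAP is then the mean of such per-class AP values.
```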
IV. CONCLUSION

In this letter, we focused on the large-scale variation problem of multiclass object detection in optical remote sensing images. To solve this problem, we proposed a multiclass object detection framework for optical remote sensing images. We adopted a feature introducing strategy based on OR dilated convolution. Using this strategy, the performance for small-scale object detection by the network is enhanced without losing the performance for large-scale object detection. The experiments confirmed that the proposed framework is efficient and robust for multiscale objects in complex optical remote sensing scenes.
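The claim that the feature introducing strategy adds no extra parameters follows from how dilated convolution works: dilation spreads a kernel's existing taps over a larger window instead of adding new weights. The sketch below illustrates this for a plain dilated kernel; the oriented response (OR) component of [16], which additionally rotates each kernel to N orientations, is not modeled here.

```python
import numpy as np

def dilate_kernel(k, rate):
    """Insert (rate - 1) zeros between kernel taps: a 3 x 3 kernel at
    rate 2 covers a 5 x 5 receptive field with the same 9 weights."""
    n = k.shape[0]
    out = np.zeros(((n - 1) * rate + 1,) * 2, dtype=k.dtype)
    out[::rate, ::rate] = k
    return out

k = np.ones((3, 3))
d = dilate_kernel(k, rate=2)
print(d.shape, int(np.count_nonzero(d)))  # (5, 5) 9 -> larger window, same parameter count
```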
TABLE II
PERFORMANCE COMPARISON OF DIFFERENT METHODS (IN AP)

TABLE III
TIME CONSUMPTION OF DIFFERENT METHODS (IN MILLISECONDS)

REFERENCES

[1] H. Zhu, X. Chen, W. Dai, K. Fu, Q. Ye, and J. Jiao, "Orientation robust object detection in aerial images using deep convolutional neural network," in Proc. IEEE Int. Conf. Image Process., Sep. 2015, pp. 3735–3739.
[2] G.-S. Xia et al., "DOTA: A large-scale dataset for object detection in aerial images," in Proc. IEEE CVPR, Jun. 2018, pp. 3974–3983.
[3] G. Cheng, P. Zhou, and J. Han, "Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 12, pp. 7405–7415, Dec. 2016.
[4] X. Bai, H. Zhang, and J. Zhou, "VHR object detection based on structural feature extraction and query expansion," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 10, pp. 6508–6520, Oct. 2014.
[5] H. Zhang, X. Bai, J. Zhou, J. Cheng, and H. Zhao, "Object detection via structural feature selection and shape model," IEEE Trans. Image Process., vol. 22, no. 12, pp. 4984–4995, Dec. 2013.
[6] Y. Yang, Y. Zhuang, F. Bi, H. Shi, and Y. Xie, "M-FCN: Effective fully convolutional network-based airplane detection framework," IEEE Geosci. Remote Sens. Lett., vol. 14, no. 8, pp. 1293–1297, Aug. 2017.
[7] W. Liu, L. Ma, and H. Chen, "Arbitrary-oriented ship detection framework in optical remote-sensing images," IEEE Geosci. Remote Sens. Lett., vol. 15, no. 6, pp. 937–941, Jun. 2018.
[8] Y. Long, Y. Gong, Z. Xiao, and Q. Liu, "Accurate object localization in remote sensing images based on convolutional neural networks," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2486–2498, May 2017.
[9] W. Liu et al. (2015). "SSD: Single shot multibox detector." [Online]. Available: https://arxiv.org/abs/1512.02325
[10] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: Object detection via region-based fully convolutional networks," in Proc. Adv. NIPS, 2016, pp. 379–387.
[11] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[12] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE CVPR, Jun. 2016, pp. 779–788.
[13] J. Redmon and A. Farhadi. (2016). "YOLO9000: Better, faster, stronger." [Online]. Available: https://arxiv.org/abs/1612.08242
[14] J. Redmon and A. Farhadi. (2018). "YOLOv3: An incremental improvement." [Online]. Available: https://arxiv.org/abs/1804.02767
[15] J. Wang, W. Liu, L. Ma, H. Chen, and L. Chen, "IORN: An effective remote sensing image scene classification framework," IEEE Geosci. Remote Sens. Lett., vol. 15, no. 11, pp. 1695–1699, Nov. 2018.
[16] Y. Zhou, Q. Ye, Q. Qiu, and J. Jiao, "Oriented response networks," in Proc. IEEE CVPR, Jul. 2017, pp. 4961–4970.