
Convolutional Neural Network-based Real-time ROV Detection Using Forward-looking Sonar Image

Juhwan Kim and Son-Cheol Yu
Dept. of Creative IT Engineering
Pohang University of Science and Technology
Pohang, South Korea
[email protected]

Abstract—The agent system is a strategy to enhance underwater manipulation. Conventional manipulation generally uses a robot-arm configuration, which has singular points. The agent system, on the other hand, is an armless manipulation scheme in which an agent vehicle works as the end-effector. If the location of the agent can be measured, the end-effector can be placed at any position. To implement this system, a method for agent-vehicle localization is proposed. The method uses sonar images of the moving agent obtained by a forward-looking sonar. To detect the location of the agent in the sonar images, a convolutional neural network is applied. We applied a state-of-the-art object-detection algorithm to the agent vehicle system. This fast neural-network-based object-detection algorithm fulfils real-time detection and shows remarkable validity, which means the underwater robot can navigate under its own feedback. Through a field experiment, we confirm that the proposed method can detect and track the agent in successive sonar images.

Keywords—armless manipulation; agent vehicle; convolutional neural network; object detection; forward-looking sonar; sonar image processing.

I. INTRODUCTION

Exploring the deep sea is a fascinating field because of its unknown environments. Autonomous underwater vehicles (AUVs) and remotely operated vehicles (ROVs) were created for exploring deep seas that humans cannot reach directly, and this technology has developed rapidly over the past few decades. These robots collect a variety of sensor data that can be used to make underwater maps or to investigate particular underwater resources. Control theory, navigation methods and sensing techniques for underwater robots are continuously studied by many researchers. Underwater manipulation is also a rising field for physical operations.

Underwater robots have several ways of performing underwater manipulation. The robot arm is the most widely used tool [1]. AUVs or ROVs are equipped with a robotic arm and carry out underwater physical work; the arm can precisely control its joint angles. However, it limits mobility, because the joint structure that causes singular points takes up heavy volume and weight [2]. For this reason, armless manipulation methods have been developed.

The small ROV is a deployable and maneuverable end-effector robot that is strapped onto the main AUV [3]. With a long tether, the ROV is free from heavy batteries and can conduct many missions without particular embedded intelligence. The AUV has an extensive action radius because it is free of a tether, and it can carry a variety of instruments and sensors. The small ROV is beneficial for detailed operations: it can perform armless underwater manipulation or agent docking, and it can act as a manipulation hand to grip objects or carry out precise control work. We call this small ROV the "Agent vehicle" and its system the "Agent vehicle system" (Fig. 1). For this system, we should localize the agent using its position sensor and the main AUV's forward-looking sonar. Precise location data leads to accurate manipulation.

In this study, we propose neural-network-based real-time object detection for localization of the agent vehicle. The state-of-the-art, fast object-detection algorithm You Only Look Once (YOLO) shows high-speed and exact detection [4]. We applied this algorithm to our forward-looking sonar data. As a result, we found that it can be used as an input for feedback control.
II. BACKGROUND
A. Forward-looking Sonar
The forward-looking sonar obtains acoustic video images in real time [5]. It has a longer visual range than optical imaging, so it is a prospective solution for underwater object detection. However, the image quality of the forward-looking sonar is lower than that of optical images. Because of the intrinsic limitations of the acoustic beam, the forward-looking sonar delivers low-quality acoustic images in which only human eyes can reliably distinguish objects (Fig. 2). The images have low resolution and high noise. Moreover, a single image shows three parts: shadow, background and highlights. Its image topology changes shape with the viewing height and angle [6]. These characteristics make it difficult to extract valid information by image-processing techniques. Therefore, we cannot use ordinary algorithms for processing sonar images.

Fig. 1. The agent vehicle manipulation system.
Fig. 2. The forward-looking sonar images, taken by the AUV 'Cyclops' [17].

B. Convolutional Neural Network-based Image Classifier
The convolutional neural network is a supervised machine-learning algorithm that performs convolutions within a neural network and has locality and shared weights [7]. With the increasing computing power of the Graphics Processing Unit (GPU) and its parallel architecture, neural-network modeling is currently the hottest trend in image processing [8]. Researchers can now train and test massive neural-network models in a short time [9].

Ordinary image processing used feature matching to find areas of interest; the low-level features are particular fixed shapes or post-processing algorithms. In contrast, the convolutional neural network uses high-level features that are determined by training [10]. As the network grows deeper with heavier layers, the feature level increases and the model can no longer be analyzed logically [8] [22]. In the end, supervised machine learning generates an accurate but black-box classifier.

C. Object-detection
An image classifier can only report the probability of existence, so we cannot easily detect the location of the target object in the image. Above all, setting a proper Region of Interest (ROI) is important for object detection, and several algorithms can find ROI candidates. Scale Invariant Feature Transform (SIFT) and Histogram of Oriented Gradients (HOG) algorithms were used for object detection based on low-level features [11]. However, they had limited detection performance, and neural-network-based object-detection algorithms emerged.

The Regions with CNN features (R-CNN) algorithm increased detection validity to more than twice that of the previous best algorithm [12]. After that, the Spatial Pyramid Pooling in deep convolutional Networks for visual recognition (SPPNet) algorithm sped up detection by 24~104 times over R-CNN [13]. In addition, Fast R-CNN and Faster R-CNN improved both detection validity and speed [14] [15]. However, their detection speeds are still somewhat slow for integration into embedded computing systems: their models evaluate the CNN recurrently several times, so finding the ROIs of targets takes much time.
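To make the cost of such region-by-region evaluation concrete, the following minimal Python sketch runs a classifier on every sliding-window crop of a sonar-sized image; the window size, stride and the `classify_crop` stand-in are assumptions made for illustration, not values from the cited works.

```python
import numpy as np

def sliding_window_detect(image, classify_crop, win=64, stride=16, thresh=0.5):
    """Naive detector: run a classifier on every window crop.

    The classifier is re-evaluated once per window, which is what makes
    this approach (and per-ROI pipelines) slow compared to single-pass
    detectors such as YOLO.
    """
    h, w = image.shape[:2]
    detections = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            crop = image[y:y + win, x:x + win]
            score = classify_crop(crop)      # one full forward pass per crop
            if score > thresh:
                detections.append((x, y, win, win, score))
    return detections

# Example with a dummy classifier on a 512 x 96 sonar-sized image.
if __name__ == "__main__":
    dummy = lambda crop: float(crop.mean() > 0.8)
    img = np.random.rand(96, 512)
    print(len(sliding_window_detect(img, dummy)))
```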

Fig. 3. The YOLO algorithm structure applied to our custom data-set [4].
III. PROPOSED METHOD
We propose a real-time object-detection strategy for the underwater small ROV. However, we must perform object recognition before object detection; recognition gives the existence probability of the target in the sonar images. For this reason, we designed a classifier model to separate 'positive' images from 'negative' images. We set correctly cropped ROV images as 'positive', and mis-cropped images and background images as 'negative'.

After designing the classifier, we validated the model and revised the calculated weights. We therefore also trained on previously collected forward-looking sonar images that do not contain the target, labeled as 'negative'. This post-processing lowers the mis-detection of arbitrary background objects.

Once the model showed a high classification rate, we could apply object-detection algorithms, which find the ROI that bounds the target object. We tested a sliding-window algorithm and a neural-network-based algorithm.

A. Convolutional Neural Network-based Object-Classifier
We used a machine-learning algorithm that involves training on a large number of images. The model is the 'Darknet Reference Model' [16], a classical Convolutional Neural Network (CNN). It is a small but powerful model with seven convolution layers and six max-pooling layers.
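For orientation only, the following is a minimal PyTorch sketch of a classifier with the stated shape (seven convolution layers, six max-pooling layers, two output classes). The filter counts and input size are assumptions for the sketch; the actual Darknet Reference Model is defined in [16].

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, pool=True):
    layers = [nn.Conv2d(c_in, c_out, 3, padding=1), nn.LeakyReLU(0.1)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return layers

# Seven convolution layers and six max-pooling layers, with a binary
# output (ROV present / background), as described in the text.
classifier = nn.Sequential(
    *conv_block(1, 16),                  # sonar images treated as single-channel
    *conv_block(16, 32),
    *conv_block(32, 64),
    *conv_block(64, 128),
    *conv_block(128, 256),
    *conv_block(256, 512),
    *conv_block(512, 512, pool=False),   # 7th conv layer, no pooling
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(512, 2),
)

x = torch.randn(1, 1, 96, 512)           # one 512 x 96 sonar frame
print(classifier(x).shape)               # torch.Size([1, 2])
```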
The training data-set was gathered from a real-sea experiment. The hovering-type AUV 'Cyclops' took forward-looking sonar images at Jangil Bay, South Korea, in 2016 [17] [19]. The AUV imaged the small ROV when they were launched together [18]. We obtained roughly 2,000 images. We spent most of the time making the data-set, which includes ROIs and class numbers as label data. We manually cropped the images and coded the label data.

Fig. 4. The average loss function value of training.

Each image has two classes of ROIs, 'positive' and 'negative'. We manually dragged the mouse over the images to mark the precise small-ROV ROI, and also cropped random ROIs. With this data, we also made fake data for revising the model: we randomly cropped two ROIs in each of 1,000 random sonar images and labeled them.

The more fake-background images we gather, the more the classifier can lower its recognition error rate. Without revising the model, it detected 'positive' ROIs in the fake backgrounds; after retraining with the fake data, it no longer detected any other objects in the images. From this we found that the data-set construction becomes a strategy for exploring particular underwater regions.

The workflow is as follows. First, take sufficient images of the small ROV; we gathered 1,152 images of the small ROV and labeled them. Next, pre-scan the target region's backgrounds; we used 455 background images, which help train a robust model. Then, build the data-set and run the training on a powerful computation machine such as a desktop computer with a GPU [20]. After training finished, we saved the trained weights.
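The label form described above (class number plus ROI per image) corresponds to the plain-text label files commonly used with Darknet/YOLO. A minimal sketch of writing one such file is shown below; the file name, box values and single class id are illustrative assumptions, not values taken from the paper.

```python
def write_yolo_label(label_path, boxes, img_w, img_h, class_id=0):
    """Write Darknet/YOLO-style labels: one line per ROI with the
    class number and the box center/size normalized to [0, 1]."""
    with open(label_path, "w") as f:
        for (x, y, w, h) in boxes:       # (x, y) = top-left corner in pixels
            xc = (x + w / 2.0) / img_w
            yc = (y + h / 2.0) / img_h
            f.write(f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}\n")

# e.g. one hypothetical ROV bounding box in a 512 x 96 sonar frame
write_yolo_label("frame_0001.txt", [(200, 30, 60, 40)], img_w=512, img_h=96)
```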

Fig. 5. The result of object-detection in the forward-looking sonar images.

B. Object-detection
Recently, several state-of-the-art algorithms have focused on speeding up detection for real-time processing and control. We adopted the You Only Look Once (YOLO) algorithm, a new approach to object detection [4]. It uses a single CNN model and predicts both bounding boxes and class probabilities. It divides the image into an 11-by-11 grid at the first layer and connects it to a classifier model; at the end of the classifier model, it is fully connected to the divided ROIs and class probabilities (Fig. 3). We used the authors' open-source code with our custom data-set, whose format includes the class number and ROI. We retrained the YOLO model starting from the pre-trained classifier weights. After that, we tested 2,413 forward-looking sonar images and calculated ROIs and a trajectory.
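As a rough sketch of the decoding step described above, the following Python function converts a YOLO-style S x S output grid into pixel-space boxes. The per-cell layout (one box plus one class probability) and the confidence threshold are simplifying assumptions, not the exact output format of the retrained model.

```python
import numpy as np

def decode_yolo_grid(pred, img_w, img_h, conf_thresh=0.3):
    """pred: array of shape (S, S, 6) with per-cell values
    (x_cell, y_cell, w, h, objectness, class_prob), all in [0, 1].
    Returns pixel-space boxes (x_center, y_center, w, h, score)."""
    S = pred.shape[0]
    boxes = []
    for row in range(S):
        for col in range(S):
            x_cell, y_cell, w, h, obj, cls = pred[row, col]
            score = obj * cls
            if score < conf_thresh:
                continue
            xc = (col + x_cell) / S * img_w   # cell offset + in-cell position
            yc = (row + y_cell) / S * img_h
            boxes.append((xc, yc, w * img_w, h * img_h, float(score)))
    return boxes

# 11 x 11 grid as in the paper, on a 512 x 96 sonar frame
dummy = np.random.rand(11, 11, 6)
print(len(decode_yolo_grid(dummy, img_w=512, img_h=96)))
```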
IV. RESULT

A. Experimental Set-up
To verify the proposed method, we conducted a field experiment. We used the hovering-type AUV 'Cyclops' as the main AUV [17]. During the experiment, its position was fixed with the help of station-keeping control. The station-keeping error was a few centimeters, so we neglected it.

We used a dual-frequency identification sonar (DIDSON) mounted on the AUV to capture the sonar images of the agent [19]. The frame rate of the DIDSON was set to 5 frames per second, and the resolution of the sonar images was 512 x 96.

Under these settings, the ROV was located in the field of view of the forward-looking sonar [18], and its position was then changed manually. The sonar images obtained during the ROV operation were collected and tested by the proposed method. The number of sonar images is 1,000.

B. Data-set Training
We trained a total of 1,607 images with the YOLO model. We can check the training progress via the loss function (Fig. 4); the saturation of the curve indicates that the training proceeds in a positive way. It took about one hour to finish the training.

C. Real-time Object-detection
After the training finished, we tested a total of 1,000 images to find the agent's location. The YOLO neural-network model successfully detected the agent vehicle, and each detection was precisely bounded by an ROI box (Fig. 5). We also inserted images randomly picked from the forward-looking sonar image database; for those images, the model reported only negative ROIs and no positive ROIs.

In addition, we recorded the trajectory across the images, storing the x- and y-axis locations in the image plane. We removed the non-detected locations and connected the series of images. From this graph (Fig. 6), we can figure out the route of the agent vehicle.
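A minimal sketch of this trajectory bookkeeping is shown below; the data layout is an assumption made for illustration.

```python
def build_trajectory(per_frame_detections):
    """per_frame_detections: list over frames, each entry either None
    (no detection) or a box (x_center, y_center, w, h, score).
    Returns the connected (frame_index, x, y) trajectory."""
    trajectory = []
    for idx, det in enumerate(per_frame_detections):
        if det is None:                  # skip non-detected frames
            continue
        x, y = det[0], det[1]
        trajectory.append((idx, x, y))
    return trajectory

# e.g. three frames, the middle one without a detection
print(build_trajectory([(250.0, 40.0, 60, 40, 0.9), None, (260.0, 42.0, 60, 40, 0.8)]))
```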
The processing speed is greatly important because the method must be used for real-time underwater missions. The YOLO object-detection algorithm achieved 107.7 frames per second (FPS) in off-line processing on a GPU [20], whereas a simple sliding-window algorithm reached only 0.20 FPS, which is far too slow (Table I). At sea, we capture forward-looking sonar images at 5 FPS. Therefore, if the object-detection speed is above 5 FPS, it can be used for real-time control or missions.
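The real-time criterion above can be checked with a simple throughput measurement such as the following sketch; the placeholder detector is an assumption, and 5 FPS is the sonar frame rate quoted in the text.

```python
import time

def measure_fps(detect, frames):
    """Average end-to-end detection throughput over a list of frames."""
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

SONAR_FPS = 5.0                                  # DIDSON frame rate used in the experiment
fps = measure_fps(lambda frame: None, range(1000))   # placeholder detector
print(fps >= SONAR_FPS)                          # must be True for real-time use
```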
TABLE I. THE COMPARISON BETWEEN THE TWO ALGORITHMS IN FRAMES PER SECOND.

Algorithm                 Frames per second (FPS)
Sliding window            0.20
YOLO object-detection     107.7

D. Discussion and Future work


We proposed real-time object detection on forward-looking sonar images for localization of the agent vehicle. Even with a limited set of sonar images, the results show the feasibility of the agent vehicle system. If we gather a larger data-set, we can improve the reliability of the system and then conduct a real-sea trial.

If we use an embedded system rather than a powerful PC, the detection speed would drop. The solution is to use a specialized embedded system that includes a mobile GPU [21]. State-of-the-art embedded boards have enough capacity to process neural networks and can assure long operation times. With such a system, we will proceed with the agent vehicle system and validate its advantages.

Fig. 6. The trajectories of the agent vehicle on the forward-looking sonar images.
V. CONCLUSION
This study verified real-time object detection on forward-looking sonar images based on a CNN and YOLO. We generated a custom data-set and ran the object-detection algorithm, and thereby realized the localization of the small ROV. We found that the YOLO algorithm is highly effective for processing forward-looking sonar images. Finally, this shows that applying machine-learning algorithms to sonar image processing is very useful.

ACKNOWLEDGMENT
This research was supported by the Office of Naval Research Global, US Navy (Grant No. N62909-14-1-N290), the project titled "Gyeongbuk Sea Grant Program" funded by the Ministry of Oceans and Fisheries, Korea, and the Ministry of Science, ICT and Future Planning, Korea, under the "ICT Consilience Creative Program" (IITP-R0346-16-1007) supervised by the Institute for Information & communications Technology Promotion.

REFERENCES
[1] G. Marani, SK. Choi, and J. Yuh, "Underwater autonomous manipulation for intervention missions AUVs." Ocean Engineering 36.1 (2009): 15-23.
[2] P. Song, M. Yashima, and V. Kumar, "Dynamic simulation for grasping and whole arm manipulation." Robotics and Automation, 2000. Proceedings. ICRA'00. IEEE International Conference on. Vol. 2. IEEE, 2000.
[3] SC. Yu, "A Preliminary Test on Agent-based Docking System for Autonomous Underwater Vehicles." International Journal of Offshore and Polar Engineering 19.01 (2009).
[4] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection." arXiv preprint arXiv:1506.02640 (2015).
[5] CD. Loggins, "A comparison of forward-looking sonar design alternatives." OCEANS, 2001. MTS/IEEE Conference and Exhibition. Vol. 3. IEEE, 2001.
[6] H. Cho, J. Gu, H. Joe, A. Asada, and SC. Yu, "Acoustic beam profile-based rapid underwater object detection for an imaging sonar." Journal of Marine Science and Technology 20.1 (2015): 180-197.
[7] Y. Le Cun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Handwritten digit recognition with a back-propagation network." Advances in neural information processing systems. 1990.
[8] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning." Nature 521.7553 (2015): 436-444.
[9] A. Krizhevsky, I. Sutskever, and GE. Hinton, "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
[10] S. Lawrence, CL. Giles, and AC. Tsoi, "Face recognition: A convolutional neural-network approach." IEEE Transactions on Neural Networks 8.1 (1997): 98-113.
[11] DG. Lowe, "Object recognition from local scale-invariant features." Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on. Vol. 2. IEEE, 1999.
[12] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
[13] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition." European Conference on Computer Vision. Springer International Publishing, 2014.
[14] R. Girshick, "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.
[15] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.
[16] J. Redmon, "Darknet: Open source neural networks in C." http://pjreddie.com/darknet/, 2013–2016.
[17] J. Pyo, HG. Joe, JH. Kim, A. Elibol, and SC. Yu, "Development of hovering-type AUV 'Cyclops' for precision observation." 2013 OCEANS-San Diego. IEEE, 2013.
[18] VideoRay, LLC, The Global Leader In MicroROV Technology, http://www.videoray.com
[19] DIDSON sonar, Sound Metrics Corp., http://www.soundmetrics.com
[20] GTX 970, Nvidia Corp., http://www.nvidia.com
[21] Jetson TX1, Nvidia Corp., http://www.nvidia.com
[22] H. Noh, PH. Seo, and B. Han, "Image question answering using convolutional neural network with dynamic parameter prediction." arXiv preprint arXiv:1511.05756 (2015).
