Real-Time Marine Animals Images Classification by Embedded System Based on MobileNet and Transfer Learning
Li Ma
College of Information Science & Engineering
Ocean University of China
Qingdao, China
[email protected]

Qiaoqiao Sun
Institute Fresnel
Ecole Centrale de Marseille
Marseille, France
[email protected]
method of transferring weights from a trained network to another untrained network. It only needs a small amount of the target domain data. For MobileNet, the gain of replacing a standard convolution by a depthwise separable convolution can be expressed as the ratio of their computational costs:

\[ \frac{D_K \times D_K \times M \times D_F \times D_F + M \times N \times D_F \times D_F}{D_K \times D_K \times M \times N \times D_F \times D_F} = \frac{1}{N} + \frac{1}{D_K^2} \tag{1} \]

where D_K is the kernel size, D_F the spatial size of the feature map, M the number of input channels and N the number of output channels [15].
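As a concrete illustration (our example, not from the paper), evaluating both sides of Eq. (1) for a typical layer with D_K = 3, M = 32, N = 64 and D_F = 56 shows that the depthwise separable form needs only about 12.7% of the multiply-adds:

```python
# Worked example for Eq. (1): multiply-add cost of a standard convolution
# versus a depthwise separable convolution. Layer sizes are illustrative.
def conv_costs(dk, m, n, df):
    """Return (standard_cost, separable_cost) in multiply-adds."""
    standard = dk * dk * m * n * df * df
    separable = dk * dk * m * df * df + m * n * df * df
    return standard, separable

standard, separable = conv_costs(dk=3, m=32, n=64, df=56)
print(separable / standard)   # 0.1267...
print(1 / 64 + 1 / 3 ** 2)    # 0.1267..., i.e. 1/N + 1/DK^2
```

This roughly eight-fold reduction in operations is why MobileNet suits mobile and embedded hardware.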
Transfer learning methods can be divided into sample migration, feature migration and parameter migration. When the source domain and the target domain are very similar, sample migration fuses the source samples with the target samples and then adjusts the source-domain weights to obtain the target-domain weights. Feature migration finds feature associations between the source and target domains by reconstructing features, minimizing the difference between the two domains. Parameter migration shares the parameters between the source and the target domains and automatically adjusts the weights to obtain the optimal results [19].

Transfer learning can alleviate the problem of insufficient data, and it has therefore gradually become the preferred technique for artificial intelligence (AI) projects with insufficient data or computing power. As a branch of machine learning, transfer learning is increasingly combined with neural networks. For image recognition, transfer learning is a way to cope with scarce labeled sample data and the high cost of model training. A pre-trained model is a model that has already been trained on large data sets. We identify the network layers whose feature vectors can be reused, and transfer those layers and their parameters to train networks on smaller data sets [20]. Training costs are thereby reduced and resource utilization is improved.
III. IMAGE CLASSIFICATION BASED ON TRANSFER LEARNING

A. Data Enhancement

In this paper, we would like to (1) train a model on small-scale marine animal data sets, and (2) perform real-time image classification by deploying the model on embedded devices. Because the data set is small in scale, we first enhance it. Data enhancement is a method to improve the overall performance of the trained network when the original image data set is insufficient. The main data-enhancement methods are rotation, flipping, translation, zooming, noise addition, etc. In this paper, we combine rotation, flipping and translation to expand the sample space, as in the sketch below.
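A minimal sketch of this augmentation step with the Keras ImageDataGenerator; the directory name and the rotation/shift ranges are our own illustrative choices, not values reported in the paper:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Combine rotation, translation (width/height shift) and flipping,
# as described above. Ranges here are example values.
augmenter = ImageDataGenerator(
    rotation_range=30,        # random rotation in [-30, 30] degrees
    width_shift_range=0.1,    # horizontal translation up to 10%
    height_shift_range=0.1,   # vertical translation up to 10%
    horizontal_flip=True,     # random left-right flipping
    rescale=1.0 / 255,        # scale pixel values to [0, 1]
)

# Stream augmented batches from a directory with one folder per class.
train_gen = augmenter.flow_from_directory(
    "data/train", target_size=(224, 224),
    batch_size=16, class_mode="categorical",
)
```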
B. Transfer Parameter

To solve the problem of insufficient data sets, transfer learning is introduced for the classification task in this paper. Among convolutional neural networks, MobileNet is an efficient model for mobile and embedded devices. Therefore, we propose a transfer learning method based on the MobileNet model. In order to verify the effectiveness of the MobileNet model, the InceptionV3 model combined with transfer learning is also evaluated in the experiment for comparison.

Firstly, we enhance the data of the images collected by the underwater vehicle. Then, parameter transfer learning is applied to the small-scale marine animal data sets. We use the model parameters of InceptionV3, MobileNetV1 and MobileNetV2, which have been trained on large samples of source data, to train the network on the target data. In order to improve the training accuracy, we adopt a fine-tuning scheme in the parameter transfer: the parameters of the low-level convolution modules are kept frozen, while the high-level convolution modules close to the classifier are set as trainable, including the weight matrices, the bias terms and the other regularization coefficients. The model can then be applied to the target data, and the optimum is obtained by adjusting the parameters within a small range.

Fig. 2. Classification based on transfer learning.

Taking the MobileNetV1 model as an example, the specific transfer learning method is shown in Fig. 2. According to the number of data set categories, we replace the fully connected layer of the source model with a 7-class softmax classifier. According to the structure of each network, the weights of the high-level convolution modules are set to be trainable for adaptive adjustment, and the fully connected layer of the model is then modified. Through experiments, it is found that the InceptionV3 model reaches the highest validation accuracy when fine-tuning starts from layer 175, MobileNetV1 from layer 122 and MobileNetV2 from layer 130, as in the sketch below.
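The following Keras sketch shows one plausible implementation of this scheme for MobileNetV1, assuming the fine-tuning level 122 reported above (exact layer indices depend on the Keras version and on how the backbone is constructed); the classifier head is a reconstruction, not the paper's exact code:

```python
import tensorflow as tf

FINE_TUNE_AT = 122  # level reported in the text for MobileNetV1
                    # (130 for MobileNetV2, 175 for InceptionV3)

# MobileNetV1 pre-trained on ImageNet, without its original classifier.
base = tf.keras.applications.MobileNet(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)

# Keep (freeze) the parameters of the low-level convolution modules and
# set the high-level modules close to the classifier as trainable.
for layer in base.layers[:FINE_TUNE_AT]:
    layer.trainable = False
for layer in base.layers[FINE_TUNE_AT:]:
    layer.trainable = True

# Replace the fully connected layer with a 7-class softmax classifier.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(7, activation="softmax"),
])
```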
IV. EXPERIMENT

A. Data Set

Some of the data used in the experiment are obtained by underwater cameras shooting marine animals, and the others are collected from the Internet. The whole data set is divided into seven categories: fish, shrimp, scallop, crab, lobster, abalone and sea cucumber. Each category contains 1000 to 1400 images, for a total of 8455 images, of which 80% form the training set and 20% the validation set. We enhanced the training set data: from each original image, three deformed images are generated by the three processing methods of rotation, translation and flipping, so the training set is expanded to 27056 pictures (the 6764 original training images plus three augmented copies each, 6764 × 4 = 27056). In addition, for
the embedded devices, the testing data are seven types of images collected from the Internet; each type of image includes five sizes, and each size has 10 pictures, for a total of 350. When selecting the training data, a portion of images containing non-target samples is used to simulate random noise and improve the generalization ability of the model. Because the resolution of the images taken by the underwater camera differs from that of the images collected from the Internet, we normalize the original images: according to the requirements of the different models, they are reshaped to 224 × 224 or 299 × 299, as in the sketch below.
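A minimal sketch of this resizing step (the [0, 1] scaling is our assumption; the paper only states that the images are normalized):

```python
import tensorflow as tf

# Input sizes required by the models used in this paper.
INPUT_SIZE = {"InceptionV3": (299, 299), "MobileNetV1": (224, 224),
              "MobileNetV2": (224, 224)}

def preprocess(image, model_name):
    """Resize an image tensor to the model's expected input size
    and scale pixel values to [0, 1] (assumed normalization)."""
    size = INPUT_SIZE[model_name]
    image = tf.image.resize(image, size)
    return image / 255.0
```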
In this paper, the experimental environment is a computer running Ubuntu 16.04 and equipped with two GTX 1080Ti graphics cards. The experiments are completed in the TensorFlow + Keras framework. In addition, the embedded device is the Jetson TX2 produced by NVIDIA. The device has an NVIDIA Pascal GPU with 256 CUDA cores, a dual-core Denver 2 and a quad-core ARM Cortex-A57 processor. All these configurations are designed to better suit convolution operations and to improve the computing speed.

B. Experimental Process

Training the Convolutional Neural Network

We selected batch_size = 16 and epoch = 200 to train the network models. After the training, the values of the accuracy and the loss rate are recorded. The formulas for calculating the accuracy and the loss rate are as follows:
\[ \mathrm{acc} = \frac{n}{N} \tag{2} \]
where n represents the number of correctly classified images and N represents the total number of images. The loss is the categorical cross-entropy, with y_i being the predicted value and ŷ_i being the original target value:

\[ \mathrm{loss} = -\sum_i \hat{y}_i \log y_i \tag{3} \]
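With these settings, the training stage could look like the sketch below; the Adam optimizer and its learning rate are our assumptions (the paper does not state them), while the loss, batch size and epoch count follow the text. Here `model` is the fine-tuned network from Section III.B and `train_gen`/`val_gen` are directory iterators like the one sketched in Section III.A:

```python
# Categorical cross-entropy matches the 7-class softmax head; accuracy
# is recorded per epoch exactly as in Eq. (2).
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # assumed
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# batch_size = 16 is set on the generators; epoch = 200 as stated above.
history = model.fit(train_gen, validation_data=val_gen, epochs=200)

# history.history holds the accuracy/loss curves plotted in Fig. 3 and 4.
```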
The accuracy and loss rate of the training set and the accuracy and loss rate of the validation set are listed in Table I. The evolution of the validation accuracy and loss rate during training is shown in Fig. 3 and Fig. 4.
TABLE I. ACCURACY AND LOSS OF EACH MODEL ON THE TRAINING AND VALIDATION DATA

Model       | Accuracy of training data | Loss of training data | Accuracy of validation data | Loss of validation data
InceptionV3 | 99.73%                    | 0.0072                | 91.13%                      | 0.372
MobileNetV1 | 99.29%                    | 0.0199                | 89.92%                      | 0.5113
MobileNetV2 | 99.58%                    | 0.0085                | 92.89%                      | 0.3165

Fig. 3. Classification accuracy of testing data. Fig. 4. Classification loss of validation data.

From Fig. 3 and Fig. 4, it can be seen that the accuracy of the validation set increases with the number of iterations during network training, and the loss rate of the validation set gradually stabilizes within a certain interval as the iterations increase. According to Table I, the InceptionV3 model has the best training set accuracy, but this does not mean that the best validation set accuracy is also obtained. Under the same conditions, the MobileNetV2 model obtained the highest validation set accuracy: 92.89%.

C. Classification Time Test in the Embedded Device
TABLE II. CLASSIFICATION ACCURACY AND TIME OF THE MARINE ANIMAL IMAGES BY EACH MODEL IN THE EMBEDDED SYSTEM

Model       |          | Abalone | Crab  | Fish  | Lobster | Scallop | Sea cucumber | Shrimp | Average
InceptionV3 | Time (s) | 0.091   | 0.116 | 0.174 | 0.181   | 0.139   | 0.114        | 0.135  | 0.1361
            | Accuracy | 96.8%   | 95.6% | 100%  | 98.2%   | 92.2%   | 92.6%        | 91.6%  | 93.6%
MobileNetV1 | Time (s) | 0.046   | 0.055 | 0.077 | 0.046   | 0.032   | 0.089        | 0.051  | 0.0569
            | Accuracy | 96.2%   | 94.8% | 100%  | 93.6%   | 96.4%   | 89.6%        | 86.8%  | 92.6%
MobileNetV2 | Time (s) | 0.052   | 0.052 | 0.052 | 0.073   | 0.053   | 0.075        | 0.045  | 0.0578
            | Accuracy | 96.6%   | 96.4% | 99.2% | 96.8%   | 100%    | 89.6%        | 92.4%  | 95.0%
The trained model is downloaded into the embedded device Jetson TX2, and the test data are classified. Four classification results are shown in Fig. 5, and the average classification time of each model is given in Table II.

Fig. 5. Some real-time classification results. On the top of each image are the species, the accuracy and the classification time.

From Table II, the average classification times per image of the InceptionV3, MobileNetV1 and MobileNetV2 models are 0.136 s, 0.0569 s and 0.0578 s respectively. The MobileNetV1 model has the best classification runtime, and the speed of MobileNetV2 is close to that of MobileNetV1.
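The per-image time could be measured on the Jetson TX2 with a sketch like the following; the model file name is hypothetical:

```python
import time
import numpy as np
import tensorflow as tf

CLASSES = ["abalone", "crab", "fish", "lobster",
           "scallop", "sea_cucumber", "shrimp"]

# Hypothetical path to the trained model transferred to the device.
model = tf.keras.models.load_model("mobilenetv2_marine.h5")

def classify(image):
    """Classify one preprocessed image and time the forward pass."""
    batch = np.expand_dims(image, axis=0).astype("float32")
    start = time.perf_counter()
    probs = model.predict(batch, verbose=0)[0]
    elapsed = time.perf_counter() - start
    idx = int(np.argmax(probs))
    return CLASSES[idx], float(probs[idx]), elapsed
```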
D. Discussion of the Experimental Results

In this experiment, three models are selected for transfer learning, and their parameters are retrained on the marine animal images so that the images can be classified on the embedded device. The experimental results show that the MobileNetV2 model trained by transfer learning achieves the best validation set accuracy, 92.89%. In terms of classification speed, the MobileNetV1 model achieves the shortest average classification time, but MobileNetV2 is only about 0.001 s slower. Therefore, considering both the classification accuracy and the computing time, the MobileNetV2 model combined with transfer learning is a better choice for the real-time classification of marine animal images than the other two models considered. Furthermore, the size of the MobileNetV2 model is only about 40 MB, which makes it well suited to the embedded device.
REFERENCES

[1] Yajuan Wei, “Study of Zooplankton Automatic Recognition Method for Dark Field Image,” Ocean University of China, 2013.
[2] Xi Qiao, “Sea cucumber identification in real-time based on underwater machine vision technique,” China Agricultural University, 2017.
[3] Peng Wan, Hailong Pan, Changjiang Long, et al., “Design of the on-line identification device of freshwater fish species based on machine vision technology,” Food and Machinery, vol. 28, pp. 164-167, 2012.
[4] Yihao Hsiao, ChaurChin Chen, Sunin Lin and Fangpang Lin, “Real-world underwater fish recognition and identification, using sparse representation,” Ecological Informatics, vol. 23, pp. 13-21, 2014.
[5] Hongwei Qin, Xiu Li, Jian Liang, Yigang Peng and Changshui Zhang, “DeepFish: Accurate underwater live fish recognition with a deep architecture,” Neurocomputing, 2015.
[6] Christian Szegedy, Wei Liu, Yangqing Jia, et al., “Going deeper with convolutions,” Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015.
[7] A. I. Kukharenko and A. S. Konushin, “Simultaneous classification of several features of a person’s appearance using a deep convolutional neural network,” Pattern Recognition and Image Analysis, vol. 25, pp. 461-465, 2015.
[8] Siyue Xie and Haifeng Hu, “Facial expression recognition with FRR-CNN,” Electronics Letters, vol. 53, pp. 235-237, 2017.
[9] Yichao Wu, Fei Yin and Chenglin Liu, “Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models,” Pattern Recognition, vol. 65, pp. 251-264, 2017.
[10] Lele Xie, Tasweer Ahmad, Lianwen Jin, Yuliang Liu and Sheng Zhang, “A new CNN-based method for multi-directional car license plate detection,” IEEE Transactions on Intelligent Transportation Systems, vol. 19, pp. 507-517, 2018.
[11] Nima Tajbakhsh, Jae Y. Shin, Suryakanth R. Gurudu, et al., “Convolutional neural networks for medical image analysis: Full training or fine tuning?” IEEE Transactions on Medical Imaging, vol. 35, pp. 1299-1312, 2016.
[12] Keqing Zhu, Jie Tian and Haining Huang, “Underwater object images classification based on convolutional neural network,” 2018 IEEE 3rd International Conference on Signal and Image Processing (ICSIP), pp. 301-305, 2018.
[13] Christian Szegedy, Wei Liu, Yangqing Jia, et al., “Going deeper with convolutions,” Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, 2015.
[14] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, et al., “Rethinking the Inception architecture for computer vision,” Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818-2826, 2016.
[15] A. G. Howard, Menglong Zhu, Bo Chen, et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” http://arxiv.org/abs/1704.04861, 2017.
[16] M. Sandler, A. Howard, Menglong Zhu, et al., “MobileNetV2: Inverted residuals and linear bottlenecks,” http://arxiv.org/abs/1801.04381, 2018.
[17] Xin Sun, Junyu Shi, Lipeng Liu, et al., “Transferring deep knowledge for object recognition in low-quality underwater videos,” Neurocomputing, vol. 275, pp. 897-908, 2017.
[18] Hoo-Chang Shin, Holger R. Roth, Mingchen Gao and Ronald M. Summers, “Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning,” IEEE Transactions on Medical Imaging, vol. 35, pp. 1285-1298, 2016.
[19] Ling Shao, Fan Zhu and Xuelong Li, “Transfer learning for visual categorization: A survey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, pp. 1019-1034, 2015.
[20] Zhongling Huang, Zongxu Pan and Bin Lei, “Transfer learning with deep convolutional neural network for SAR target classification with limited labeled data,” Remote Sensing, vol. 9, pp. 1-21, 2017.