0% found this document useful (0 votes)
80 views10 pages

Multi-Traffic Scene Perception Based On Supervised Learning

This document summarizes a research paper that proposes a method for classifying multi-traffic scenes based on visual features and supervised learning algorithms. The method extracts color, texture, and edge features from images to form an eight-dimension feature matrix. Then it uses five supervised learning algorithms (BP neural network, support vector machine, probabilistic neural network, S_Kohonen network and extreme learning machine) to train classifiers to accurately classify images into different weather conditions. The goal is to help enhance driver assistance systems and visibility for drivers under complex weather conditions like fog, rain, or low light conditions.

Uploaded by

Amit Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
80 views10 pages

Multi-Traffic Scene Perception Based On Supervised Learning

This document summarizes a research paper that proposes a method for classifying multi-traffic scenes based on visual features and supervised learning algorithms. The method extracts color, texture, and edge features from images to form an eight-dimension feature matrix. Then it uses five supervised learning algorithms (BP neural network, support vector machine, probabilistic neural network, S_Kohonen network and extreme learning machine) to train classifiers to accurately classify images into different weather conditions. The goal is to help enhance driver assistance systems and visibility for drivers under complex weather conditions like fog, rain, or low light conditions.

Uploaded by

Amit Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

SPECIAL SECTION ON MULTIMEDIA ANALYSIS FOR INTERNET-OF-THINGS

Received December 1, 2017, accepted January 1, 2018, date of publication January 8, 2018, date of current version February 28, 2018.
Digital Object Identifier 10.1109/ACCESS.2018.2790407

Multi-Traffic Scene Perception Based on


Supervised Learning
LISHENG JIN1 , MEI CHEN 1, YUYING JIANG2 , AND HAIPENG XIA1
1 Transportation
College of Jilin University, Changchun 130022, China
2 China–Japan Union Hospital of Jilin University, Changchun 130033, China

Corresponding author: Yuying Jiang ([email protected])


This work was supported in part by the National Natural Science Foundation under Grant 51575229, in part by the National Key Research
and Development Project of China under Grant 2017YFB0102600, and in part by the Electric Intelligent Vehicle Innovation Team of the
Science and Technology Department of Jilin Province.

ABSTRACT Traffic accidents are particularly serious on a rainy day, a dark night, an overcast and/or rainy
night, a foggy day, and many other times with low visibility conditions. Present vision driver assistance
systems are designed to perform under good-natured weather conditions. Classification is a methodology to
identify the type of optical characteristics for vision enhancement algorithms to make them more efficient.
To improve machine vision in bad weather situations, a multi-class weather classification method is presented
based on multiple weather features and supervised learning. First, underlying visual features are extracted
from multi-traffic scene images, and then the feature was expressed as an eight-dimensions feature matrix.
Second, five supervised learning algorithms are used to train classifiers. The analysis shows that extracted
features can accurately describe the image semantics, and the classifiers have high recognition accuracy rate
and adaptive ability. The proposed method provides the basis for further enhancing the detection of anterior
vehicle detection during nighttime illumination changes, as well as enhancing the driver’s field of vision on
a foggy day.

INDEX TERMS Underlying visual features, supervised learning, intelligent vehicle, complex weather
conditions, classification.

I. INTRODUCTION in driver assistance systems [6]. Liu et al. propose a vision-


Highway traffic accidents bring huge losses to people’s based skyline detection algorithm under image brightness
lives and property. The advanced driver assistance sys- variations [7]. Fu et al. propose automatic traffic data col-
tems (ADAS) play a significant role in reducing traffic acci- lection under varying lighting conditions [8]. Fritsch et al.
dents. Multi-traffic scene perception of complex weather use classifiers for detecting road area under multi-traffic
condition is a piece of valuable information for assistance scene [9]. Wang et al. propose a multi-vehicle detection and
systems. Based on different weather category, specialized tracking system and it is evaluated by roadway video captured
approaches can be used to improve visibility. This will con- in a variety of illumination and weather conditions [10].
tribute to expand the application of ADAS. Satzoda and Trivedi propose a vehicle detection method on
Little work has been done on weather related issues for seven different datasets that captured varying road, traffic,
in-vehicle camera systems so far. Payne and Singh propose and weather conditions [11].
classifying indoor and outdoor images by edge intensity [1].
Lu et al. propose a sunny and cloudy weather classification II. PROBLEM STATEMENT
method for single outdoor image [2]. Lee and Kim propose A. IMPACT OF COMPLEX WEATHER ON DRIVER
intensity curves arranged to classfy four fog levels by a neural Low visibility conditions will bring the driver a sense of
network [3]. Zheng et al. present a novel framework for tension. Due to variations of human physiological and psy-
recognizing different weather conditions [4]. Milford et al. chological, driver’s reaction time is different with the dif-
present vision-based simultaneous localization and mapping ferent driver’s ages and individuals. The statistics show that
in changing outdoor environments [5]. Detecting critical driver’s reaction time in complex low visibility weather con-
changes of environments while driving is an important task ditions is significantly longer than on a clear day. In general,

2169-3536
2018 IEEE. Translations and content mining are permitted for academic research only.
VOLUME 6, 2018 Personal use is also permitted, but republication/redistribution requires IEEE permission. 4287
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
L. Jin et al.: Multi-Traffic Scene Perception Based on Supervised Learning

the driver’s reaction time is about 0.2s ∼ 1s. If the driver Yoo et al. present an image enhancement algorithm for low-
needs to make a choice in complex cases, driver’s reaction light scenes in an environment with insufficient illumina-
time is 1s ∼ 3s. If the driver needs to make complex judg- tion [17]. Jung propose a image fusion technique to improve
ment, the average reaction time is 3s ∼ 5s. imaging quality in low light shooting [18]. Zhou et al. present
The overall stopping distance can be defined as global and local contrast measurements method for single-
d = dR +db . It includes the distance dR = tR v0 that means the image defogging [19]. Liu et al. present single image dehaz-
driver’s reaction time and the stopping distance db = v20 /2a. ing by using of dark channel model [20]. Pouli and Reinhard
Hereby v0 describes the initial velocity, tR donate the reaction present a novel histogram reshaping technique to make color
time and a represent deceleration rate. image more intuitive [21]. Arbelot et al. present a framework
that uses the textural content of the images to guide the color
transfer and colorization [22]. In order to improve visilibity,
Xiang et al. propose a improved EM method to transfer selec-
tive colors from a set of source images to a target image [23].

FIGURE 1. Different braking distance caused by different reaction time at


different brake initial velocity.

As shown in Fig. 1, when the initial braking speed is


100km/h, if the driver’s reaction time is 1.5s, 3s, 5s respec-
tively, the braking distance is 93.11m, 134.77m, 190.33m
respectively [12]. This means that if driver’s response delay
one second, it may lead to serious traffic accidents. The data is
obtained on the dry road, the friction coefficient is 1.0, decel- FIGURE 2. Multi-traffic scene classification algorithm flow framework
eration rate is 7.5m/s2 . The mean deceleration originate from diagram.
an example of the Bavarian police, taken from their website
http://www.polizei.bayern.de/verkehr/studien/index.html/
31494 on 28th October 2011. C. FLOW FRAMEWORK
In this work, firstly, owing to classify multi-traffic scene road
B. ENHANCING THE DRIVER’s FIELD OF VISION images, underlying visual features (color features, texture
IN FOGGY DAY AND NIGHT features, edge features) are extracted from multi-traffic scene
Weather understanding plays a vital role in many real- images, and then the features expressed as eight-dimensions
world applications such as environment perception in self- feature matrix. The traffic scene classification problem is
driving cars. Automatic understanding weather conditions becoming the supervised learning problems. Secondly, BP
can enhance traffic safety. For instance, Xu et al. sum- neural network, support vector machine, probabilistic neural
mary image defogging algorithms and related studies on network, S_Kohonen network and extreme learning machine
image restoration and enhancement [13]. Gallen et al. pro- algorithms are used to train classifiers. In order to achieve
pose a nighttime visibility estimation method in the pres- weather images automatic classification, the main steps are
ence of dense fog [14]. Gangodkar et al. propose a vehicles shown in Fig. 2.
detection method under complex outdoor conditions [15]. This paper is organized as follows. An experimental image
Chen et al. propose night image enhancement method in set is constructed and global underlying visual features are
order to improve nighttime driving and reduce rear-end extracted in Section III. Five supervised learning classifica-
accident [12]. Kuang et al. present an effective nighttime tion algorithms are introduced in Section IV. Comparison and
vehicle detection system based on image enhancement [16]. analysis of the five supervised learning classification methods

4288 VOLUME 6, 2018


L. Jin et al.: Multi-Traffic Scene Perception Based on Supervised Learning

FIGURE 3. Sub-sample of 10 categories traffic road scene (from left to right, from top to bottom, the labels are 1-10).

are illustrated in Section V. Finally, we conclude the paper in propose a novel texture feature based multiple classifier tech-
Section VI. nique and applies it to roadside vegetation classification [30].
In this work, average gray xi1 , standard deviation xi2 , variance
III. CONSTRUCT AN EXPERIMENTAL IMAGE SET AND xi3 , average gradient xi4 , entropy xi5 , contrast xi6 , spatial
EXTRACT UNDERLYING VISUAL FEATURE frequency xi7 and edge intensity xi8 are extracted and they are
Image feature extraction is the premise step of supervised as shown in formula (1). The feature sample set with labels
learning. It is divided into global feature extraction and local can be expressed as C = [xi1 , xi2 , · · · xim ], i = 1, 2, 3 · · · N ,
feature extraction. In the work, we are interested in the entire m = 1, 2, 3 · · · 8, where, i represents the number of images
image, the global feature descriptions are suitable and con- in the image set. The process that extracted eight underlying
ducive to understand complex image. Therefore, multi-traffic visual features is simple, time saving and eight underlying
scene perception more concerned about global features, such visual can comprehensively describe the visual information
as color distribution, texture features. of the image. Table 1 represents eight features of ten cate-
Image feature extraction is the most important process in gories’ traffic scene images in Fig. 3. Table 2 is normalized
pattern recognition and it is the most efficient way to simplify data of eight underlying visual features in Table 1.
high-dimensional image data. Because it is hard to obtain  
x1,1 x1,2 · · · x1,m
some information from the M × N × 3 dimensional image  x2,1 x2,2 · · · x2,m 
matrix. Therefore, owing to perceive multi-traffic scene, xim =  .

.. .. 

(1)
the key information must be extracted from the image.  .. . . 
xi,1 xi,2 ··· xi,m
A. CREATE AN EXPERIMENTAL IMAGE SET
where, i = 1, 2, 3 · · · 1200, m = 1, 2, 3 · · · 8.
In the work, there are 1200 images are collected by use of
The underlying visual features are described as follows:
driving recorder and the image set D is established for training
and test. There are 10 categories traffic scene images are
1) AVERAGE GRAY
classified in the work, 120 images are chosen from each
The average gray can reflect the average brightness of image.
category at random. The camera system provides images with
According to the distribution of visual effect, average gray
a resolution of 856 ∗ 480 pixels. The images are obtained
value between 100 and 200 belongs to optimal visual. The
respectively under rainy day, night without street lamp, night
formula of average gray AG can be expressed as follows:
with street lamp, overcast, sunlight, rainy night , foggy day
and other low visibility road environment images, the clas- AG = k ∗ Pk (2)
sification labels is 1-10, sub-sample set is shown in Fig. 3.
The image set D = {(Img1 , y1 ), (Img2 , y2 ), · · · (ImgN , yN )}, where, Pk = w∗hNk
, k represents gray value of the input image,
N ∈ {1, 2 · · · 1200}, the category label can be expressed as k ∈ (0 . . . 255), Nk indicates the number of pixels with a
y = {y1 , y2 , · · · yN }, yi ∈ {1, 2, 3 · · · 10}i = 1, 2, 3 · · · N . gray value k, w represents image width, h represents image
height, Pk represents frequencies histogram of relative gray
B. UNDERLYING VISUAL FEATURES EXTRACTION value.
In order to train classifier, underlying visual features are
extracted that can describe color distribution and structure of 2) STANDARD DEVIATION
image. Such as, color features, texture features [24]–[26], and The image standard deviation denotes the discrete situation of
edge features. Han et al. propose a road detection method each pixel’s gray value relative to the average gray-value of
by extracting image features [27]. Zhou et al. propose a the images. In general, the larger the variance, the more abun-
automatic detection of road regions by extract distinct road dant the gray layer of the image, and the better the definition.
feature [28]. Bakhtiari et al. propose a semi automatic road According to the distribution of visual effect, the standard
extraction from digital images method [29]. Chowdhury et al. deviation value between 35 and 80 is the optimal visual.

VOLUME 6, 2018 4289


L. Jin et al.: Multi-Traffic Scene Perception Based on Supervised Learning

TABLE 1. Eight features of ten categories traffic scene images in Fig. 3

TABLE 2. Normalized data of eight underlying visual features in Table 1.

The formula to determine the standard deviation of the image 5) ENTROPY


can be expressed as follows: The entropy describes the gray value distribution. It is inde-
v pendent of the position of the pixels. It means that the posi-
u w P h
uP
(A − Aij )2 tion of pixels has no influence on the entropy within an
t i=1 j=1 ij
u
image. Information entropy of a clear image is greater than
SD = (3)
w∗h the information entropy of an unclear image. Furthermore,
where, Aij represents gray value of the image at the pixel point the color information entropy can distinguish the different
(i, j), Aij represents average gray value of the image. multi-traffic scene images. The calculating formula of the
image information entropy is as follows:
3) VARIANCE 255
X
The variance is the square of the standard deviation and it rep- EN = − Pk log2 (Pk ) (6)
resents the degree of discrete of the image pixels. If the stan- k=0
dard deviation results are not obvious, variance can enlarge
the distinction between features. The formula to determine 6) CONTRAST
the variance of the image can be expressed as follows: Contrast describes the variation of image values in image
space. In general, the better the image resolution, the larger
V = SD2 (4) the image contrast will be. The contrast of clear images
is usually larger than the contrast of unclear images. The
4) AVERAGE GRADIENT contrast of the narrow sense is the main factor to decide
The average gradient is an important technical characteristics different texture structure that can be used for image classifi-
indicator of the image structure. The average gradient of the cation and segmentation problems [31]. The contrast features
images can reflect the details and image definition. In general, are significant as a global textures description to distinguish
the larger images average gradient, the more abundant the multi-traffic scene image. Contrast varies widely depending
images marginal information in the images, and the clearer on the lighting conditions of the different scenes. So, contrast
the image will be. The average gradient formula for gray can be used as the typical feature to distinguish multi-traffic
images is as follows: scene image. Its formula is as follows:
w Ph q v
P (Aij −A(i+1)j )2 +(Aij −Aij+1 )2 u SD
2 C = us (7)
u
i=1 j=1 u 255
AG = (5) t P (k−AG)4 ∗Pk
w∗h k=1
where, Aij represents gray value of the image at the pixel point V

(i, j), A(i+1)j represent gray value of the image at the pixel where, SD represents image standard deviation, AG repre-
point (i + 1, j), Ai(j+1) represent gray value of the image at sents average gradient, V represent variance, Pk = wN∗kh , k
the pixel point (i, j + 1). represents gray value of the input image.

4290 VOLUME 6, 2018


L. Jin et al.: Multi-Traffic Scene Perception Based on Supervised Learning

7) SPATIAL FREQUENCY most widely used neural network model. Its learning rule is
Spatial frequency is a texture feature that reflects the overall constantly adjusting the whole network weight and threshold
activity of an image spatial domain and it describes the varia- through the reverse propagation in order to get the minimum
tion of image values in image space. Its formula is as follows: network error square sum. In our work, let m, M and n
v respectively stand for the number of the input layer nodes,
u w P h w P h
uP
hide layer nodes and output layer nodes. The process of multi-
(Aij − Ai(j−1) )2 (Aij − A(i−1)j )2
P
u
t i=1 j=1 i=1 j=1 traffic scene perception using BPNN is shown as follows:
SF = +
w·h w·h
(8)
where, Aij represents gray value of the image at the pixel point
(i, j), A(i+1)j represent gray value of the image at the pixel
point (i, j − 1), A(i−1)j represent gray value of the image at
the pixel point (i − 1, j).

8) EDGE INTENSITY
Edge intensity characterizes the edge of the image. It can
be known that the image edge intensity can be used as the
typical feature to distinguish multi-traffic scene image. The In our case, the classes correspond to weather situations
aim of extracting edge intensity is to identify points that which we divide into {clear weather, light rain, heavy rain,
the image brightness changes sharply or discontinuity in a night without street lamp, overcast, rainy night, foggy day}.
digital image. Edge feature extraction is a fundamental work Thus, the problem of classification can be thought of as
of image processing and feature detection in computer vision finding some function f that maps from descriptor space C
field. The formula is as follows: into the classes F.
q
ED = P2ij + Q2ij (9)

w P
h
1 P
where, Pij = 2 (Aij+1 − Aij + A(i+1)(j+1) − A(i+1)j )
i=1 j=1
represents the horizontal edge intensity at the pixel point
w P
h
(i, j), and Qij = 12
P
(Aij − A(i+1)j + Ai(j+1) − A(i+1)(j+1) )
i=1 j=1
represents the verticaledge intensity.

IV. INTRODUCTION OF SUPERVISED LEARNING


FIGURE 4. The result of the actual classification and predicted
CLASSIFICATION ALGORITHMS classification by BPNN (accuracy = 87.5%, elapsed time = 1.398 seconds).
In Section III, each image will be transformed into a learning
bag by extracting eight features. After extracted global fea-
tures, machine learning classification approaches come into In this section, BP network is used to train a
operation. In recent ten years, a variety of pattern recog- classifier. The sigmoidal function is chosen by test com-
nition methods have been proposed and proved is useful. mon kernels function. The number of iterations is 10000,
Maji et al. propose additive kernel svms for classifica- the learning rate is 0.1 and the target value is 0.00004.
tion [32]. A histogram intersection kernel and support The specific method include: Firstly, a total of 60 images
vector machine classifiers are presented for image classifica- are randomly selected from each category road environment
tion [33]. A deep neural networks image detection was pre- image. Secondly, in order to construct the training feature
sented in [34]. A review paper about fault and error tolerance set T, eight global underlying visual features are extracted
in neural network was presented in [35]. Another new related from 600 images. Thirdly, in order to construct the test
method was presented in [36]. A BP-NN and improved- feature set V, eight underlying visual features were extracted
adaboost algorithm was presented [37]. In this section, five from the remaining 600 images. Test result is shown in
supervised learning algorithms will be introduced to solve the Fig. 4. The X axis represents the 600 test images, and the
multi-traffic scene classify problem. Y axis represents the 10 categories traffic scene. BP network
recognition accuracy rate can reach 87.5% when the number
A. BACK PROPAGATION NEURAL NETWORK CLASSIFIER of hidden neurons at 240.
BP network was presented by Rumelhart and McCelland. The recognition rate represents the label of the actual
It is a multi-layer feed forward network that was trained by test image coincides with the label of predicted test image,
error back propagation algorithm. Currently, it is one of the it means the classification is correct. The accuracy rate is

VOLUME 6, 2018 4291


L. Jin et al.: Multi-Traffic Scene Perception Based on Supervised Learning

calculated as follows. solution under the bayesian criterion. Thirdly, PNN allow to
i=600
P increase or decrease training data and it is no need to re-train.
num(simlabeli − testlabeli = 0) Therefore, PNN is used for classifying image in this
i=1
Accuracy = × 100% section. Radical basis function is chosen as kernel function
num(testlabeli ) and we set the distribution density spread is 1.5. The specific
(10) method is as same with the BP network that described in
which, the testlabeli represents actual category label of the section IV-A. Test result is shown in Fig. 6.
test image and simlabeli represents the predicted category
label of test image after test process.

B. SUPPORT VECTOR MACHINE CLASSIFIER


The support vector machine was first proposed by Cottes
and Vapnik. The most classic DD-SVM [38] and MILES
algorithm [39] are proposed by Chen for image classifica-
tion. Based on statistical learning theory (SLT), SVM can
automatically find the support vector. The support vector
can distinguish different image category and it can maximize
the interval between class and class. As svm is simple, fast,
powerful and robust, we decided to use SVM as our learning FIGURE 6. The result of the actual classification and predicted
and classification method. classification by PNN (accuracy = 91.8%, elapsed time is 3.636 seconds).

FIGURE 5. The result of the actual classification and predicted


classification by SVM (accuracy = 89.833% (539/600), elapsed
time = 0.334 seconds).

FIGURE 7. The result of the actual classification and predicted


There are many toolboxes for implementing SVM, such classification by SKohonen (accuracy = 86.8% (521/600), elapsed time is
as LSSVM, SVMlight, Weka, SVMlin, SVM_SteveGunn 2.137 seconds).

LIBSVM-FarutoUltimate, , LS-SVMlab Toolbox, OSV SVM


Classifier Matlab Toolbox. LIBSVM package developed D. S_KOHONEN NETWORK CLASSIFIER
by Professor Lin Chih-Jen of Taiwan University in 2001. SKohonen neural network is a feed forward neural network.
Because LIBSVM package is a simple, fast and effective Let m, M and n respectively stand for the number of input
SVM toolbox, LIBSVM was used for classifying image in layer nodes, competitive layer nodes and output layer nodes.
this section. Radical basis function is chosen as kernel func- When SKohonen network is used for supervised learning,
tion. We set the scale factor g = 1 and the penalty factor radical basis function is used as kernel function.We set the
c = 2. The specific method is as same as the BP network in input node number m=8, competitive layer node M=8 and
Section IV-A. Test result is shown in Fig. 5. output layer node n=10. The maximum learning rate 0.01,
the learning radius is 1.5 and the number of iterations is
C. PROBABILISTIC NEURAL NETWORK CLASSIFIER 10000. The specific method is as same as the BPNN that
Probabilistic neural network (PNN) was first proposed by described in section IV-A. Test result is shown in Fig. 7.
Dr. D. F. Specht. In principle, although BP network as same
as PNN are calculated by neurons, the models are different. E. EXTREME LEARNING MACHINE CLASSIFIER
A newff function is used to create the BP network and a Extreme Learning Machine was first proposed in 2004 by
newpnn function is used to create a probabilistic neural net- Huang Guangbin in Nanyang Technological University.
work. There are some advantages for image classification by Huang et al. propose using extreme learning machine for
using PNN. Firstly, PNN is training fast. The training time is multiples classification [40]. ELM is a single hidden layer
only slightly larger than the time of read the data. Secondly, feed forward neural network learning algorithm. Let m,
no matter how complex the classification problem is, as long M and n respectively stand for the number of input layer
as there is enough training data, PNN can get the optimal nodes, hide layer nodes and output layer nodes. If g(x) is the

4292 VOLUME 6, 2018


L. Jin et al.: Multi-Traffic Scene Perception Based on Supervised Learning

TABLE 3. The correct classification number of ten categories traffic scene (60 images each class).

FIGURE 8. ELM network training model.

activate function of the hidden layer neurons. bi presents the


threshold. There are N different image samples and the fea-
ture set can be expressed as C = [xi1 , xi2 , · · · xim ]. Therefore FIGURE 9. The result of the actual classification and predicted
classification by ELM (elapsed time is 0.431 seconds. accuracy =
ELM network training model is shown in Fig. 8 [41], [42]. 100%(600/600)).
Mathematical formula of ELM network model can be
expressed as follow:
M
X
vi g(wi · xi + bi ) = oj , j = 1, 2, 3 · · · N (11)
i=1
which, wi = [w1i , w2i , · · · , wmi ] represent input weight vec-
tor that located between the network input layer node and the
hidden layer node. vi = [vi1 , vi2 , · · · , vin ] represent output
weight vector that located between the hidden layer node and
the network output node. oj = [oi1 , oi2 , · · · , oin ] represent
the network predicted output value
We propose ELM is used for classfying image. The
sigmoidal function is used as the kernel function. The number FIGURE 10. The influence of the number of hidden layer neurons on ELM
performance.
of hidden neurons is 200. The specific method is as same as
the BPNN that described in Section IV-A. The test result is
shown in Fig. 9.
The number of hidden layer neurons M is the only param- the experimental image database D, training feature set T and
eter of ELM. In order to verify the effect of the hidden layer test features set V are the same in the five supervised learn-
neurons M on the accuracy, there are 600 images are used ing frameworks. The feature extraction process is described
for training classifier, and the rest 600 images are used for in III-A and III-B. The experimental platform includes Intel
testing. The relationship between the number of hidden nodes Core i5 Processor, 8 GB RAM, Windows7 operating system,
and the accuracy is shown in Fig. 10. Accuracy of the ELM matlab 2010a test environment.
algorithm is increased with the increase of the hidden layer The test results are shown in Table 3. The conclusions are
neurons M. The prediction accuracy can reach 100% when as follows.
M is 200. When M more than 200, accuracy is not increased (1) Accuracy is the most important evaluation index for
with the number of hidden neurons. In short,when the number classification algorithms performance. In Table 3, the predic-
of hidden layer of neurons at 200, the classification result is tion accuracy rate of SVM classifier and BP neural network
the best. is similar (87.5%, 89.83%). However, compared to BPNN
classifier, SVM are relatively stable and faster.
V. COMPARISON AND ANALYSIS OF FIVE SUPERVISED (2) The predicted accuracy of ELM is slightly higher than
LEARNING METHODS other classifier that indicates ELM has better performance in
In order to verify the effectiveness of the classification result, classification.
BPNN, SVM, PNN, SKohonen and ELM are compared (3) The running time of ELM and SVM is respectively
by time and accuracy. Consider the comparison fairness, 0.431s and 0.334s, which indicate the running speed of ELM

VOLUME 6, 2018 4293


L. Jin et al.: Multi-Traffic Scene Perception Based on Supervised Learning

TABLE 4. Confusion matrix of ten categories traffic scene based on SVM supervised learning algorithm.

TABLE 5. Recall ratio and precision ratio of 10 categories traffic scene based on five supervised learning algorithm.

and SVM are almost the same, and the time is much higher
than PNN, SKohonen and BP neural network classifier.
In addition to the correct rate, error rate also can be used to
measure the performance of the algorithm in the traffic scene
perception. According to Table 3, the result of classification
error number of 10 categories traffic scenes is shown in
Fig.11.
We can conclude that in terms of image algorithms, PNN
and ELM are better than other classifiers in accuracy. In terms
of classification correct number, there are three categories
traffic scenes are below 50. It indicates the classification
effect of Skohonen and BP is poor. FIGURE 11. The error number of test samples.
The classification correct number of category 6 and cat-
egory 10 are below 50. This indicate that the extracted fea- A. CONFUSION MATRIX
tures cannot be described the images very well. In Fig. 3, Confusion matrix is a visualization tool in artificial intel-
the category 6 and 10 respectively represent overcast and ligence field. It is especially suitable for the analysis of
foggy image. The images are blurred and texture features supervised learning. The confusion matrix of the rest four
are not obvious. So the image enhancement algorithm can be supervised learning algorithm is similar with the SVM con-
considered to improve the visibility. fusion matrix. Therefore, SVM confusion matrix is discussed
The category 7 represents sunlight image, its classification here. As shown in Table 5. The diagonal data represent the
correct number are above 58 by five classifiers, it indicates classfy correct number of the corresponding category. The
that the 8 global underlying visual features can fully describe last row is SVM classifier prediction results of each category.
the images. In summary, ELM classifier has a stable recogni- The last column represents 60 images from each category
tion accuracy and performance. sample. The conclusions are as follows.

4294 VOLUME 6, 2018


L. Jin et al.: Multi-Traffic Scene Perception Based on Supervised Learning

B. ACCURACY, PRECISION AND RECALL on large datasets to verify the effectiveness of the proposed
The system measures the performance can depend on accu- method in Section V. It proved that the proposed eight fea-
racy, precision and recall. The accuracy represents the ratio tures not only can accurately describe image characteristics,
between the number of correctly classified image and the total but also have strong robustness and stability at the complex
number of images. The precision represents the ratio between weather environment and the ELM algorithm is superior to
the number of correctly classify an image and the number of other algorithms.
all marked positive. The recall is defined as the rate that the In the future, the proposed algorithms will need to be
number of a actual positive example is divided into a positive further verified by the larger image set. Integrated learning
example. is a new paradigm in machine learning field. It is worth to
be studied improve the generalization of a machine learning
TP + TN
Accuracy = (12) system. And visual image enhancement algorithms in fog and
TP + TN + FP + FN night time applied to general image are worth to be further
TP
Recall = (13) studied.
TP + FP
TP
Precision = (14) REFERENCES
TP + FN [1] A. Payne and S. Singh, ‘‘Indoor vs. outdoor scene classification in dig-
where, TP is the number that positive image is correctly ital photographs,’’ Pattern Recognit., vol. 38, no. 10, pp. 1533–1545,
Oct. 2005.
predicted as positive, FP is the number that positive image is [2] C. Lu, D. Lin, J. Jia, and C.-K. Tang, ‘‘Two-class weather classification,’’
predicted as negative, TN is the number that negative image is IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2510–2524,
truly predicted as negative, and FN presents the number that Dec. 2017.
[3] Y. Lee and G. Kim, ‘‘Fog level estimation using non-parametric inten-
positive image is predicted as negative sample. sity curves in road environments,’’ Electron. Lett., vol. 53, no. 21,
From Table 4, we can obtain that category4 and category6, pp. 1404–1406, Dec. 2017.
category4 and category8, category7 and category10 are easy [4] C. Zheng, F. Zhang, H. Hou, C. Bi, M. Zhang, and B. Zhang, ‘‘Active dis-
criminative dictionary learning for weather recognition,’’ Math. Problems
to be confused. Because confused category image sample set Eng., vol. 2016, Mar. 2016, Art. no. 8272859.
possess similar characteristics , as shown in Fig. 3. Recall [5] M. Milford, E. Vig, W. Scheirer, and D. Cox, ‘‘Vision-based simultaneous
ratio that classify category 6 and category 4 is 50 / (50+10) = localization and mapping in changing outdoor environments,’’ J. Field
Robot., vol. 31, no. 5, pp. 814–836, Sep./Oct. 2014.
83.3%. Recall ratio that classify category 8 and category 4 is [6] C.-Y. Fang, S.-W. Chen, and C.-S. Fuh, ‘‘Automatic change detection of
36/(36+24) = 60%. Recall ratio that classify category 10 and driving environments in a vision-based driver assistance system,’’ IEEE
category 7 is 48/(48 + 10) = 82.8%. In order to prove the Trans. Neural Netw., vol. 14, no. 3, pp. 646–657, May 2003.
[7] Y. J. Liu, C. C. Chiu, and J. H. Yang, ‘‘A robust vision-based skyline
effectiveness of the five supervised learning algorithm, recall detection algorithm under different weather conditions,’’ IEEE Access,
ratio and precision ratio are used to measure the performance vol. 5, pp. 22992–23009, 2017.
of the algorithm. The recall ratio and precision ratio of each [8] T. Fu, J. Stipancic, S. Zangenehpour, L. Miranda-Moreno, and N. Saunier,
‘‘Automatic traffic data collection under varying lighting and temper-
category are shown in Table 5. ature conditions in multimodal environments: Thermal versus visible
spectrum video-based systems,’’ J. Adv. Transp., vol. 2017, Jan. 2017,
VI. CONCLUSIONS Art. no. 5142732.
[9] J. Fritsch, T. Kuhnl, and F. Kummert, ‘‘Monocular road terrain detection
Weather recognition based on road images is a brand-new and by combining visual and spatial information,’’ IEEE Trans. Intell. Transp.
challenging subject, which is widely required in many fields. Syst., vol. 15, no. 4, pp. 1586–1596, Aug. 2014.
Hence, research of weather recognition based on images [10] K. Wang, Z. Huang, and Z. Zhong, ‘‘Simultaneous multi-vehicle detection
and tracking framework with pavement constraints based on machine
is in urgent demand, which can be used to recognize the learning and particle filter algorithm,’’ Chin. J. Mech. Eng., vol. 27, no. 6,
weather conditions for many vision systems. Classification pp. 1169–1177, Nov. 2014.
is a methodology to identify the type of optical characteris- [11] R. K. Satzoda and M. M. Trivedi, ‘‘Multipart vehicle detection using
symmetry-derived analysis and active learning,’’ IEEE Trans. Intell.
tics for vision enhancement algorithms to make them more
Transp. Syst., vol. 17, no. 4, pp. 926–937, Apr. 2016.
efficient. [12] M. Chen, L. Jin, Y. Jiang, L. Gao, F. Wang, and X. Xie, ‘‘Study on leading
In this paper, eight global underlying visual features are vehicle detection at night based on multisensor and image enhancement
method,’’ Math. Problems Eng., vol. 2016, Art. no. 5810910, Aug. 2016.
extracted and five supervised learning algorithms are used to
[13] Y. Xu, J. Wen, L. Fei, and Z. Zhang, ‘‘Review of video and image defogging
perceive multi-traffic road scene. Firstly, our method extracts algorithms and related studies on image restoration and enhancement,’’
colour features, texture features and boundary feature which IEEE Access, vol. 4, pp. 165–188, 2016.
are used to evaluate the image quality. Thus, the extracted [14] R. Gallen, A. Cord, N. Hautière, É. Dumont, and D. Aubert, ‘‘Nighttime
visibility analysis and estimation method in the presence of dense fog,’’
features are more comprehensive. Secondly, the ten cate- IEEE Trans. Intell. Transp. Syst., vol. 16, no. 1, pp. 310–320, Feb. 2015.
gories traffic scene image are marked as labels 1-10. Owing [15] D. Gangodkar, P. Kumar, and A. Mittal, ‘‘Robust segmentation of moving
to the category label represents the whole image, there is vehicles under complex outdoor conditions,’’ IEEE Trans. Intell. Transp.
Syst., vol. 13, no. 4, pp. 1738–1752, Dec. 2012.
no need to mark the specific area or key point of image. [16] H. Kuang, X. Zhang, Y. J. Li, L. L. H. Chan, and H. Yan, ‘‘Nighttime
Thirdly, by using of five supervised learning that mentioned vehicle detection based on bio-inspired image enhancement and weighted
in Section IV, we can greatly simplify the manual annotation score-level feature fusion,’’ IEEE Trans. Intell. Transp. Syst., vol. 18, no. 4,
pp. 927–936, Apr. 2017.
process of feature sample and improve the classifier effi- [17] Y. Yoo, J. Im, and J. Paik, ‘‘Low-light image enhancement using adaptive
ciency. At last, experiments and comparisons are performed digital pixel binning,’’ Sensors, vol. 15, no. 7, pp. 14917–14931, Jul. 2015.

VOLUME 6, 2018 4295


L. Jin et al.: Multi-Traffic Scene Perception Based on Supervised Learning

[18] Y. J. Jung, ‘‘Enhancement of low light level images using color-plus-mono LISHENG JIN received the B.S. degree in con-
dual camera,’’ Opt. Exp., vol. 25, no. 10, pp. 12029–12051, May 2017. struction machinery, the M.S. degree in mechan-
[19] L. Zhou, D.-Y. Bi, and L.-Y. He, ‘‘Variational contrast enhancement guided ical design and theory, and the Ph.D. degree in
by global and local contrast measurements for single-image defogging,’’ mechatronic engineering from Jilin University,
J. Appl. Remote Sens., vol. 9, Oct. 2015, Art. no. 095049. Changchun, China, in 1997, 2000, and 2003,
[20] Y. Liu, H. Li, and M. Wang, ‘‘Single image dehazing via large sky region respectively. He is currently a Professor with the
segmentation and multiscale opening dark channel model,’’ IEEE Access, Transportation College of Jilin University.
vol. 5, pp. 8890–8903, 2017.
His research interests include vehicle safety and
[21] T. Pouli and E. Reinhard, ‘‘Progressive color transfer for images of
intelligent vehicle navigation technology, vehicle
arbitrary dynamic range,’’ Comput. Graph., vol. 35, no. 1, pp. 67–80,
Feb. 2011. ergonomics, and driver behavior analysis. He has
[22] B. Arbelot, R. Vergne, T. Hurtut, and J. Thollot, ‘‘Local texture-based color authored over 100 papers in the above research areas. He serves as a Reviewer
transfer and colorization,’’ Comput. Graph., vol. 62, pp. 15–27, Feb. 2017. for many international journals, including Transportation Research Part D,
[23] Y. Xiang, B. Zou, and H. Li, ‘‘Selective color transfer with multi-source Transportation Research Part F, and Accident Analysis & Prevention.
images,’’ Pattern Recognit. Lett., vol. 30, no. 7, pp. 682–689, May 2009.
[24] R. M. Haralick, K. Shanmugam, and I. Dinstein, ‘‘Textural features for
image classification,’’ IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 6,
pp. 610–621, Nov. 1973.
[25] O. Regniers, L. Bombrun, V. Lafon, and C. Germain, ‘‘Supervised clas-
sification of very high resolution optical images using wavelet-based
textural features,’’ IEEE Trans. Geosci. Remote Sens., vol. 54, no. 6,
pp. 3722–3735, Jun. 2016. MEI CHEN received the B.S. degree in commu-
[26] G. Tian, H. Zhang, Y. Feng, D. Wang, Y. Peng, and H. Jia, ‘‘Green nications and transportation from the Shandong
decoration materials selection under interior environment characteristics: University of Technology, Zibo, China, in 2009.
A grey-correlation based hybrid MCDM method,’’ Renew. Sustain. Energy
She is currently pursuing the Ph.D. degree with
Rev., vol. 81, pp. 682–692, Jan. 2018.
the Transportation College of Jilin University.
[27] X. Han, H. Wang, J. Lu, and C. Zhao, ‘‘Road detection based on the fusion
of Lidar and image data,’’ Int. J. Adv. Robot. Syst., vol. 14, no. 6, p. 1, Her research interests include image enhance-
Nov. 2017. ment image, pattern recognition and vehicle safety
[28] H. Zhou, H. Kong, L. Wei, D. Creighton, and S. Nahavandi, ‘‘On detecting and intelligent vehicle navigation technology, deep
road regions in a single UAV image,’’ IEEE Trans. Intell. Transp. Syst., learning, and semantic comprehension.
vol. 18, no. 7, pp. 1713–1722, Jul. 2017.
[29] H. R. R. Bakhtiari, A. Abdollahi, and H. Rezaeian, ‘‘Semi automatic road
extraction from digital images,’’ Egyptian J. Remote Sens. Space Sci.,
vol. 20, no. 1, pp. 117–123, Jun. 2017.
[30] S. Chowdhury, B. Verma, and D. Stockwell, ‘‘A novel texture feature based
multiple classifier technique for roadside vegetation classification,’’ Expert
Syst. Appl., vol. 42, no. 12, pp. 5047–5055, Jul. 15 2015.
[31] H. Tamura, S. Mori, and T. Yamawaki, ‘‘Textural features corresponding
to visual perception,’’ IEEE Trans. Syst., Man, Cybern., vol. SMC-8, no. 6, YUYING JIANG received the B.S. degree in clin-
pp. 460–473, Jun. 1978.
ical medicine from Jilin University, Changchun,
[32] S. Maji, A. C. Berg, and J. Malik, ‘‘Efficient classification for additive
China, in 2002, the M.S. degree from the Depart-
kernel SVMs,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1,
pp. 66–77, Jan. 2013.
ment of Ophthalmology, China–Japan Union Hos-
[33] J. Wu, ‘‘Efficient HIK SVM learning for image classification,’’ IEEE pital of Jilin University, Changchun, in 2005,
Trans. Image Process., vol. 21, no. 10, pp. 4442–4453, Oct. 2012. and the Ph.D. degree in ophthalmology from the
[34] Z. Yin, B. Wan, F. Yuan, X. Xia, and J. Shi, ‘‘A deep normalization and Second Hospital of Jinlin University, Changchun,
convolutional neural network for image smoke detection,’’ IEEE Access, in 2013. She is currently an Associate Chief
vol. 5, pp. 18429–18438, 2017. Physician with the Department of Ophthalmology,
[35] C. Torres-Huitzil and B. Girau, ‘‘Fault and error tolerance in neural net- China–Japan Union Hospital of Jilin University.
works: A review,’’ IEEE Access, vol. 5, pp. 17322–17341, 2017. Her research focuses on ocular fundus disease and image processing. She
[36] G. Tian, M. Zhou, and P. Li, ‘‘Disassembly sequence planning consider- has authored over ten journal and conference proceedings papers in the above
ing fuzzy component quality and varying operational cost,’’ IEEE Trans. research areas.
Autom. Sci. Eng., to be published.
[37] K. Lu, W. Zhang, and B. Sun, ‘‘Multidimensional data-driven life pre-
diction method for white LEDs based on BP-NN and improved-adaboost
algorithm,’’ IEEE Access, vol. 5, pp. 21660–21668, 2017.
[38] Y. Chen and J. Z. Wang, ‘‘Image categorization by learning and rea-
soning with regions,’’ J. Mach. Learn. Res., vol. 5, pp. 913–939,
Aug. 2004.
[39] Y. Chen, J. Bi, and J. Z. Wang, ‘‘MILES: Multiple-instance learning via
HAIPENG XIA received the B.S. degree in
embedded instance selection,’’ IEEE Trans. Pattern Anal. Mach. Intell.,
automobile service engineering from Shandong
vol. 28, no. 12, pp. 1931–1947, Dec. 2006.
[40] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, ‘‘Extreme learning machine
Jiaotong University, Jinan, China, in 2015. He
for regression and multiclass classification,’’ IEEE Trans. Syst., Man, is currently pursuing the M.S. degree with Vehi-
Cybern. B, Cybern., vol. 42, no. 2, pp. 513–529, Apr. 2012. cle Operation Engineering, Jilin University. His
[41] X. Chen and M. Koskela, ‘‘Skeleton-based action recognition with extreme research interests include image processing and
learning machines,’’ Neurocomputing, vol. 149, pp. 387–396, Feb. 2015. computer vision and steering control of four-
[42] J. Xin, Z. Wang, L. Qu, and G. Wang, ‘‘Elastic extreme learning machine wheel independent steering, and unmanned path
for big data classification,’’ Neurocomputing, vol. 149, pp. 464–471, planning.
Feb. 2015.

4296 VOLUME 6, 2018

You might also like