
Received August 19, 2019, accepted August 30, 2019, date of publication September 11, 2019, date of current version September 19, 2019.


Digital Object Identifier 10.1109/ACCESS.2019.2939532

An Automatic Car Accident Detection Method Based


on Cooperative Vehicle Infrastructure Systems
DAXIN TIAN, (Senior Member, IEEE), CHUANG ZHANG, XUTING DUAN, AND XIXIAN WANG
Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems and Safety Control, Beijing Advanced Innovation Center for Big Data and Brain
Computing, School of Transportation Science and Engineering, Beihang University, Beijing 100191, China
Corresponding author: Xuting Duan ([email protected])
This work was supported in part by the National Natural Science Foundation of China under Grant 61672082 and Grant 61822101, and in
part by the Beijing Municipal Natural Science Foundation under Grant 4181002.

ABSTRACT Car accidents cause a large number of deaths and disabilities every day, a certain proportion of which result from untimely treatment and secondary accidents. To some extent, automatic car accident detection can shorten the response time of rescue agencies and of vehicles around an accident, improving rescue efficiency and traffic safety. In this paper, we propose an automatic car accident detection method based on Cooperative Vehicle Infrastructure Systems (CVIS) and machine vision. First, a novel image dataset, CAD-CVIS, is established to improve the accuracy of accident detection by intelligent roadside devices in CVIS. In particular, CAD-CVIS covers various accident types, weather conditions and accident locations, which improves the self-adaptability of accident detection methods across different traffic situations. Second, we develop a deep neural network model, YOLO-CA, based on CAD-CVIS and deep learning algorithms to detect accidents. In the model, we utilize Multi-Scale Feature Fusion (MSFF) and a loss function with dynamic weights to enhance the performance of detecting small objects. Finally, our experimental study evaluates the performance of YOLO-CA for detecting car accidents, and the results show that our proposed method can detect a car accident in 0.0461 seconds (21.6 FPS) with 90.02% average precision (AP). In addition, we compare YOLO-CA with other object detection models, and the results demonstrate a comprehensive performance improvement in accuracy and real-time performance over the other models.

INDEX TERMS Car accident detection, CVIS, machine vision, deep learning.

I. INTRODUCTION
According to the World Health Organization, there are about 1.35 million deaths and 20-50 million injuries as a result of car accidents globally every year [1]. Notably, a certain proportion of deaths and injuries are due to untimely treatment and secondary accidents [2], which result from the fact that rescue agencies and vehicles around an accident cannot obtain a quick response about the accident [3], [4]. Therefore, it is vitally important to develop an efficient accident detection method, which can significantly reduce both the number of deaths and injuries and the impact and severity of accidents [5]. Against this background, many fundamental projects and studies aimed at efficient detection methods have been launched for development and testing [6]–[10].

The traditional methods utilize vehicle motion parameters captured by vehicular GPS devices, such as acceleration and velocity, to detect car accidents. However, these methods based on a single type of feature cannot meet the performance needs of accident detection in terms of accuracy and real-time operation. With the development of computer and communication technologies, Cooperative Vehicle Infrastructure Systems and the Internet of Vehicles have developed rapidly in recent years [11]–[13]. Moreover, image recognition based on video captured by intelligent roadside devices in CVIS has become one of the research hotspots in the field of intelligent transportation systems [14], [15]. For traffic situation awareness, image recognition technology has the advantages of high efficiency, flexible installation and low maintenance costs. Therefore, image recognition has been applied successfully to detecting pedestrians, vehicles, traffic signs and so on [16]–[20]. In general, there are many distinctive image and video features in traffic accidents, such as vehicle collision, rollover and so on. To some extent, these features can be used to detect or predict car accidents. Accordingly, some researchers apply machine vision technology based on deep learning to car accident detection methods.

(The associate editor coordinating the review of this manuscript and approving it for publication was Xianye Ben.)


These methods extract and process complex image features instead of a single vehicle motion parameter, which improves the accuracy of detecting car accidents. However, the datasets used by these methods are mostly captured by car cameras or pedestrians' cell phones, which is not suitable for roadside devices in CVIS. Additionally, the reliability and real-time performance of these methods need to be improved to meet the requirements of car accident detection.

In this paper, we propose a data-driven car accident detection method based on CVIS, whose goal is to improve the efficiency and accuracy of car accident response. With this goal, we focus on a general application scenario in which, when there is an accident on the road, roadside intelligent devices recognize and locate it efficiently. First, we build a novel dataset, the Car Accident Detection for Cooperative Vehicle Infrastructure System dataset (CAD-CVIS), which is more suitable for car accident detection based on roadside intelligent devices in CVIS. Then, a deep learning model, YOLO-CA, based on CAD-CVIS is developed to detect car accidents. In particular, we optimize the network of the traditional deep learning model YOLO [21] to build the network of YOLO-CA, which is more accurate and faster in detecting car accidents. Additionally, considering the wide shooting scope of roadside cameras in CVIS, a multi-scale feature fusion method and a loss function with dynamic weights are utilized to improve the performance of detecting small objects.

The rest of this paper is organized as follows: Section 2 gives an overview of related work. We present the details of our proposed method in Section 3. The performance evaluation is discussed in Section 4. Finally, Section 5 concludes this paper.

II. RELATED WORK
Car accident detection and notification is a challenging issue and has attracted a lot of attention from researchers, who have proposed and applied various car accident detection methods. In general, car accident detection methods are mainly divided into the following two kinds: those based on vehicle running condition and those based on accident video features.

A. METHODS BASED ON VEHICLE RUNNING CONDITION
When an accident occurs, the motion state of the vehicle changes dramatically. Therefore, many researchers have proposed accident detection methods that monitor motion parameters, such as acceleration and velocity. Reference [22] used the On Board Diagnosis (OBD) system to monitor speed and engine status to detect a crash, and utilized a smartphone to report the accident over Wi-Fi or a cellular network. Reference [23] developed an accident detection and reporting system using GPS, GPRS, and GSM. The speed of the vehicle obtained from a High Sensitive GPS receiver is considered as the index for detecting accidents, and the GSM/GPRS modem is utilized to send the location of the accident. Reference [24] presented a prototype system called e-NOTIFY, which monitors the change of acceleration to detect accidents and utilizes V2X communication technologies to report them. To a certain extent, these methods can detect and report car accidents in a short time and improve the efficiency of car accident warning. However, the vehicle running condition before car accidents is complex and unpredictable, and the accuracy of accident detection based only on speed and acceleration may be low. In addition, they rely too heavily on vehicular monitoring and communication equipment, which may be unreliable or damaged in some extreme circumstances, such as heavy canopy, underground tunnels, and serious car accidents.

B. METHODS BASED ON VIDEO FEATURES
With the development of machine vision and artificial neural network technology, more and more applications based on video processing have been applied in the transportation and vehicle fields. Against this background, some researchers have utilized video features of car accidents to detect them. Reference [25] presented a Dynamic-Spatial-Attention Recurrent Neural Network (RNN) for anticipating accidents in dashcam videos, which can predict accidents about 2 seconds before they occur with 80% recall and 56.14% precision. Reference [26] proposed a car accident detection system based on first-person videos, which detected anomalies by predicting the future locations of car participants and then monitoring the prediction accuracy and consistency metrics. These methods also have some limitations because of the low penetration of vehicular intelligent devices and shielding effects between vehicles.

There are also other methods which use roadside devices instead of vehicular equipment to obtain and process video. Reference [27] proposed a novel accident detection system at intersections, which composed background images from image sequences and detected accidents by using a Hidden Markov Model. Reference [28] outlined a novel method for modeling the interaction among multiple moving objects, and used the Motion Interaction Field to detect and localize car accidents. Reference [29] proposed a novel approach for automatic road accident detection, which was based on detecting damaged vehicles from footage received from surveillance cameras installed on roads. In this method, Histogram of Oriented Gradients (HOG) and Gray Level Co-occurrence Matrix features were used to train support vector machines. Reference [30] presented a novel dataset for car accident analysis based on traffic Closed-Circuit Television (CCTV) footage, and combined Faster Regions-Convolutional Neural Network (R-CNN) and Context Mining to detect and predict car accidents. The method in [30] achieved 1.68 seconds in terms of the Time-To-Accident measure with an Average Precision of 47.25%. Reference [8] proposed a novel framework for automatic car accident detection, which learned feature representations from the spatio-temporal volumes of raw pixel intensity instead of traditional hand-crafted features. The experiments in [8] demonstrated it can detect on average 77.5% of accidents correctly with 22.5% false alarms.




Compared with the methods based on vehicle running condition, these methods improve the detection accuracy, and some of them can even predict accidents about 2 seconds before they occur. To some extent, these methods are significant in decreasing the accident rate and improving traffic safety. However, the detection accuracy of these methods is still low and the error rate is high, and wrong accident information will have a great impact on normal traffic flow. Concerning the core issue mentioned above, in order to avoid the drawbacks of vehicular cameras, our proposed method utilizes roadside intelligent edge devices to obtain traffic video and process images. Moreover, to improve the accuracy of accident detection based on intelligent roadside devices, we establish the CAD-CVIS dataset from video sharing websites, which consists of various accident types, weather conditions and accident locations. Furthermore, we develop the model YOLO-CA to improve reliability and real-time performance among different traffic conditions by combining deep learning algorithms and the MSFF method.

III. METHODS
A. METHOD OVERVIEW
FIGURE 1. The application scenario of the automatic car accident detection method based on CVIS.

Fig. 1 shows the application principle of our proposed car accident detection method based on CVIS. First, the car accident detection application program with the YOLO-CA model, which is developed based on CAD-CVIS and deep learning algorithms, is deployed on the edge server. Then the edge server receives and processes the real-time images captured by roadside cameras. Finally, the roadside communication unit broadcasts the accident emergency messages to the relevant vehicles and rescue agencies over DSRC and 5G networks. In the rest of this section, we present the details of the CAD-CVIS dataset and the YOLO-CA model.

B. CAD-CVIS
1) DATA COLLECTION AND ANNOTATION
There are two major challenges in collecting car accident data: (1) Access: access to roadside traffic camera data is often limited. In addition, accident data from transportation administrations is often not available for public use because of many legal reasons. (2) Abnormality: car accidents are rare on the road compared with normal traffic conditions. In this work, we draw support from video sharing websites to search for videos and images of car accidents, such as news reports and documentaries. In order to improve the applicability of our proposed method to roadside edge devices, we only pick out the videos and images captured from traffic CCTV footage.

FIGURE 2. Data collection and annotation for the CAD-CVIS dataset.

Through the above steps, we obtain 633 car accident scenes, 3255 accident key frames and 225206 normal frames. Moreover, the car accident scene only occupies a small part of each accident frame. We utilize LabelImg [31] to annotate the location of the accident in each frame in detail to enhance the accuracy of locating accidents. The high accuracy enables emergency messages to be sent more efficiently to the vehicles travelling in the same direction as the accident and decreases the impact on vehicles travelling in the opposite direction. The whole data collection and annotation process is shown in Fig. 2. The CAD-CVIS dataset is made available for research use through https://github.com/zzzzzzc/Car-accident-detection.
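As an illustration of how such annotations can be consumed, the following is a minimal sketch that reads bounding boxes from one PASCAL VOC XML file, the format LabelImg saves by default; the file name in the usage comment is a hypothetical example, not a file of the released dataset.

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    """Parse object bounding boxes from a LabelImg (PASCAL VOC) XML file.
    Returns a list of (label, xmin, ymin, xmax, ymax) tuples in pixels."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        boxes.append((obj.findtext("name"),
                      int(bndbox.findtext("xmin")), int(bndbox.findtext("ymin")),
                      int(bndbox.findtext("xmax")), int(bndbox.findtext("ymax"))))
    return boxes

# Hypothetical usage:
# print(read_voc_boxes("accident_frame_0001.xml"))
```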
2) STATISTICS OF THE CAD-CVIS
FIGURE 3. Number of accident frames in CAD-CVIS categorized by different indexes. (a) Accident type (b) Weather condition (c) Accident time (d) Accident location.

Statistics of the CAD-CVIS dataset can be found in Fig. 3. The CAD-CVIS dataset includes various types of car accidents, which can improve the adaptability of our method to different conditions. According to the number of vehicles involved in the accident, the CAD-CVIS dataset includes 323 Single Vehicle Accident frames, 2449 Double Vehicle Accident frames and 483 Multiple Vehicle Accident frames.




TABLE 1. Comparison between CAD-CVIS and related datasets.
TABLE 2. Composition of the YOLO-CA network.

Moreover, the CAD-CVIS dataset covers a variety of weather conditions, with 2769 accident frames under sunny conditions, 268 frames under foggy conditions, 52 accident frames under rainy conditions and 166 accident frames under snowy conditions. Besides, there are 2588 frames of accidents in the daytime and 667 accident frames at night. In addition, the CAD-CVIS dataset contains 2281 frames of accidents occurring at intersections, 596 frames on urban roads, 189 frames on expressways and 189 frames on highways.

A comparison between CAD-CVIS and related datasets can be found in Table 1. In Table 1, A indicates that the dataset contains annotations of car accidents, R indicates that the videos and frames are captured from roadside CCTV footage, and M indicates that the dataset covers multiple road conditions. Compared with CUHK Avenue [32], UCSD Ped2 [33] and DAD [25], CAD-CVIS contains more car accident scenes, which can improve the adaptability of models based on CAD-CVIS. Moreover, the frames of CAD-CVIS are all captured from roadside CCTV footage, which is more suitable for accident detection methods based on intelligent roadside devices in CVIS.

C. OUR PROPOSED DEEP NEURAL NETWORK MODEL
In the task of car accident detection, we must not only judge whether there is a car accident in the image, but also accurately locate the car accident, because the accurate location guarantees that the RSU can broadcast the emergency message to the vehicles affected by the accident. Classification and location algorithms can be divided into two kinds: (1) Two-stage models, such as R-CNN [34], Fast R-CNN [35], Faster R-CNN [36] and Faster R-CNN with FPN [37]. These algorithms utilize selective search or a Region Proposal Network (RPN) to select about 2000 proposal regions in the image, and then detect objects based on the features of these regions extracted by a CNN. These region-based models locate objects accurately, but extracting proposals takes a great deal of time. (2) One-stage models, such as YOLO (You Only Look Once) [21] and SSD (Single Shot MultiBox Detector) [38]. These algorithms implement location and classification with one CNN, which provides end-to-end detection. Because they eliminate the process of selecting proposal regions, these algorithms are very fast and still have guaranteed accuracy. Considering that accident detection requires high real-time performance, we design our deep neural network based on the one-stage model YOLO [21].

1) NETWORK DESIGN
YOLO utilizes its particular CNN to complete classification and location of multiple objects in an image at one time. In the training process of YOLO, each image is divided into S × S grids. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object [39]. This design can improve the detection speed dramatically, as well as the detection accuracy by referencing global features. However, it also causes serious detection errors when there is more than one object in one grid cell. Roadside cameras have a wide shooting scope, so the accident area may be small in the image. Inspired by the multi-scale feature fusion (MSFF) network, in order to improve the performance of the model in detecting small objects, we utilize 24 layers to achieve image upsampling and obtain two output tensors of different dimensions. This new car accident detection model is called YOLO-CA, and the network structure diagram of YOLO-CA is shown in Fig. 4.

As shown in Fig. 4, YOLO-CA is composed of 228 neural network layers, and the number of each kind of layer is shown in Table 2. These layers constitute several kinds of basic components of the YOLO-CA network, such as DBL and ResN. DBL is the minimum component of the YOLO-CA network, and is composed of a Convolution layer, a Batch Normalization layer and a Leaky ReLU layer. ResN consists of a Zero Padding layer, a DBL and N Resblock_units [40], and is designed to avoid the neural network degradation caused by increased depth. Ups in Fig. 4 is an upsampling layer, which is utilized to improve the performance of YOLO-CA in detecting small objects. Concat is a concatenate layer, which is used to concatenate a layer in Darknet-53 with an upsampling layer.




FIGURE 4. The network structure of YOLO-CA.

2) DETECTION PRINCIPLE
FIGURE 5. The detection principle of YOLO-CA.

Fig. 5 shows the detection principle of YOLO-CA, which includes extracting feature maps and predicting bounding boxes. As shown in Fig. 5, YOLO-CA divides the input image into a 13 × 13 grid and a 26 × 26 grid. The first grid is responsible for detecting large objects, whereas the second grid makes up for the inaccuracy of small-target detection in the first grid. The feature extraction networks corresponding to these two grids are different, but the detection models of the objects are similar. For ease of presentation, we take the first grid as an example to explain the training steps of YOLO-CA. The center of the car accident region falls into the grid cell (7, 5), so this cell is responsible for detecting this car accident in the whole training process. Then the cell (7, 5) predicts three bounding boxes, and each box includes six parameters: x, y, w, h, CS, p. Here (x, y) is the center point of the bounding box, and (w, h) is the ratio of the width and height of the bounding box to the whole image. CS is the confidence score of the bounding box, which represents how confident the model is that the bounding box contains an object and how accurate it thinks the predicted box is. Lastly, each bounding box predicts the class probability of a car accident, p.
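The assignment of an object to its responsible grid cell can be written in a couple of lines; the sketch below is ours, with a column-then-row convention chosen so that the example matches Fig. 5, and it assumes the object center is given in image-relative coordinates.

```python
def responsible_cell(cx, cy, S):
    """Return the (column, row) of the S x S grid cell that contains the
    object center (cx, cy), with coordinates normalized to [0, 1)."""
    return int(cx * S), int(cy * S)

# A car accident centered at roughly (0.58, 0.42) of the image falls into
# cell (7, 5) of the 13 x 13 grid, as in the example of Fig. 5.
print(responsible_cell(0.58, 0.42, 13))   # -> (7, 5)
```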
After the training of a batch of images, the loss of the model is calculated and utilized to adjust the parameter weights. In the calculation of the loss, let the ground truth of an object be x^*, y^*, w^*, h^*, CS^*, p^*. S × S is the number of cells in the grid, and B is the number of predicted bounding boxes of each grid cell. For each grid cell, Pr(Objects) equals 1 when the cell contains the center of an object, whereas it equals 0 when there is no object center in the cell. For each image, the loss of YOLO-CA is divided into the following four parts:

• Loss of (x, y), which is calculated by (1), where BCL is the binary cross-entropy loss function and areaTrue is defined as w^* * h^*. 

Loss_{xy} = \sum_{i=1}^{S \times S} \sum_{j=1}^{B} \Pr(\mathrm{Objects}) \, (2 - areaTrue_{ij}) * [\mathrm{BCL}(x_{ij}) + \mathrm{BCL}(y_{ij})]
\mathrm{BCL}(x_{ij}) = -[x^*_{ij} \log(x_{ij}) + (1 - x^*_{ij}) \log(1 - x_{ij})]
\mathrm{BCL}(y_{ij}) = -[y^*_{ij} \log(y_{ij}) + (1 - y^*_{ij}) \log(1 - y_{ij})]    (1)

• Loss of (w, h), which is calculated by (2), where SD is the square difference function. In particular, the factor (2 − areaTrue_{ij}) in (1) and (2) is utilized to increase the error punishment for small objects, because the same errors in x, y, w, h have a more serious impact on the detection of a small object than on that of a large object.

Loss_{wh} = \sum_{i=1}^{S \times S} \sum_{j=1}^{B} \Pr(\mathrm{Objects}) \, (2 - areaTrue_{ij}) * \frac{1}{2} [\mathrm{SD}(w_{ij}) + \mathrm{SD}(h_{ij})]
\mathrm{SD}(w_{ij}) = (w_{ij} - w^*_{ij})^2
\mathrm{SD}(h_{ij}) = (h_{ij} - h^*_{ij})^2    (2)

• Loss of CS, which is calculated by (3). The loss of CS can be divided into two parts: the confidence loss of the foreground and the confidence loss of the background.

Loss_{CS} = \sum_{i=1}^{S \times S} \sum_{j=1}^{B} [\Pr(\mathrm{Objects}) * \mathrm{BCL}(CS_{ij}) + (1 - \Pr(\mathrm{Objects})) * \mathrm{BCL}(CS_{ij})]
\mathrm{BCL}(CS_{ij}) = -[CS^*_{ij} \log(CS_{ij}) + (1 - CS^*_{ij}) \log(1 - CS_{ij})]    (3)

where CS^* is defined by (4). Additionally, IoU_{pt} is defined in Fig. 6; it equals the intersection over union (IoU) between the predicted bounding box and the ground truth.

CS^* = \Pr(\mathrm{Objects}) * IoU_{pt}    (4)

• Loss of p, which is calculated by (5).

Loss_{p} = \sum_{i=1}^{S \times S} \sum_{j=1}^{B} \Pr(\mathrm{Objects}) \, \mathrm{BCL}(p_{ij})
\mathrm{BCL}(p_{ij}) = -[p^*_{ij} \log(p_{ij}) + (1 - p^*_{ij}) \log(1 - p_{ij})]    (5)
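To make the four loss terms concrete, the following NumPy sketch evaluates Eqs. (1)–(5) for a single predicted box of a single grid cell; the dict-based interface and variable names are ours, and the tensorized, multi-scale version used in training is not shown.

```python
import numpy as np

def bcl(t, p, eps=1e-7):
    """Binary cross-entropy term BCL(.) from Eqs. (1), (3) and (5)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

def box_loss(pred, gt, pr_obj, iou_pt):
    """Loss contribution of one predicted box (Eqs. (1)-(5)).

    pred and gt are dicts with keys x, y, w, h, cs, p, all in [0, 1];
    pr_obj is Pr(Objects) for the cell (1 or 0); iou_pt is the IoU between
    the predicted box and the ground truth (Fig. 6), used in Eq. (4).
    """
    area_true = gt["w"] * gt["h"]
    scale = 2.0 - area_true                   # dynamic weight: small objects
                                              # receive a larger penalty
    loss_xy = pr_obj * scale * (bcl(gt["x"], pred["x"]) + bcl(gt["y"], pred["y"]))
    loss_wh = pr_obj * scale * 0.5 * ((pred["w"] - gt["w"]) ** 2 +
                                      (pred["h"] - gt["h"]) ** 2)
    cs_star = pr_obj * iou_pt                 # Eq. (4)
    loss_cs = bcl(cs_star, pred["cs"])        # foreground/background terms of Eq. (3)
    loss_p = pr_obj * bcl(gt["p"], pred["p"])
    return loss_xy + loss_wh + loss_cs + loss_p
```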




FIGURE 6. The definition of IoU.

For each image in the training set, the total loss is defined as (6). In particular, because multi-scale feature fusion is used in YOLO-CA, the loss is the sum of the terms under S = 13 and S = 26. Additionally, the loss of each batch of images is defined as (7).

Loss\_img = Loss_{xy} + Loss_{wh} + Loss_{CS} + Loss_{p}    (6)

Loss = \frac{1}{b} \sum_{k=1}^{b} Loss\_img_{k}    (7)

where b in (7) is the batch size.
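For completeness, here is a small sketch of the IoU used for IoU_pt in Eq. (4) and of the batch loss of Eq. (7); it assumes boxes are given as image-relative (cx, cy, w, h), matching the parameterization above.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (cx, cy, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def batch_loss(image_losses):
    """Eq. (7): average of the per-image losses Loss_img over a batch of size b."""
    return sum(image_losses) / len(image_losses)
```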

IV. EXPERIMENT
In this section, we evaluate our proposed model YOLO-CA on the CAD-CVIS dataset. First, we give the training results of YOLO-CA, which include the evolution of several performance indexes. Then, we show the results of comparative experiments between YOLO-CA and other detection models. Finally, visual results are demonstrated for various types and scales of car accident objects.

A. IMPLEMENTATION DETAILS
We implement our model in TensorFlow [41] under the Ubuntu 18.04 operating system and perform experiments on a system with an Nvidia Titan Xp GPU. We divide the CAD-CVIS dataset into three parts: (1) Training set (80%), which is used to train the parameter weights of the network. (2) Validation set (5%), which is utilized to adjust hyperparameters, such as the learning rate and dropout rate. (3) Test set (15%), which is used to evaluate the performance of different algorithms for detecting car accidents. Additionally, each part of the dataset contains all types of accidents in Fig. 3. The batch size is set to 64, and the models are trained for up to 30000 iterations. The initial learning rate is set to 0.001 and is updated by a factor of 0.1 every 10000 iterations. The SGD optimizer with a momentum of 0.9 is utilized to adjust the parameters of the network. Moreover, we use a weight decay of 0.0005 to prevent model overfitting.
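A minimal sketch of the training configuration described above, assuming a TensorFlow/Keras setup; the step-decay schedule reflects our reading of the reported learning-rate update (multiply by 0.1 every 10000 iterations), and the L2 regularizer is one common way to realize the reported weight decay, not necessarily the authors' exact implementation.

```python
import tensorflow as tf

BATCH_SIZE = 64
MAX_ITERATIONS = 30000

# Initial learning rate 0.001, multiplied by 0.1 every 10000 iterations.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=10000,
    decay_rate=0.1,
    staircase=True)

# SGD with momentum 0.9; weight decay 0.0005 approximated as L2 regularization
# on the convolutional kernels.
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)
kernel_regularizer = tf.keras.regularizers.l2(5e-4)
```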
B. RESULTS AND ANALYSIS
1) TRAINING RESULTS OF YOLO-CA
FIGURE 7. The training results of YOLO-CA. (a) Precision (b) Recall (c) IoU (d) Loss.
TABLE 3. Distribution map of prediction results.

Fig. 7 shows the training results of YOLO-CA, including the changes of precision, recall, IoU and loss of each batch during the iteration process. In the training process of YOLO-CA, we regard a prediction with IoU over 0.5 and the right classification as a true result, and all other predictions as false results. As shown in Table 3, the prediction results can be divided into four parts: (1) TP: True Positive. (2) FP: False Positive. (3) FN: False Negative. (4) TN: True Negative. Precision is defined as precision = TP/(TP + FP) and recall is defined as recall = TP/(TP + FN).

As shown in Fig. 7a, with increasing iterations, the precision of YOLO-CA increases gradually and converges above 90%. Moreover, recall eventually converges to more than 95%. In terms of the locating performance of YOLO-CA on the training set, IoU finally stabilizes above 0.8. Fig. 7d shows the decreasing loss of YOLO-CA defined in (7), and the final converged loss is less than 0.2.
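As a quick check of the definitions above, precision and recall can be computed directly from the prediction counts; the numbers in the example are made up for illustration.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Illustrative counts only:
print(precision_recall(tp=450, fp=50, fn=20))   # -> (0.9, ~0.957)
```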




2) COMPARATIVE EXPERIMENTS AND VISUAL RESULTS
The comparative experiments compare seven detection models: (1) One-stage models: SSD, our proposed YOLO-CA, traditional YOLOv3 and YOLOv3 without MSFF (Multi-Scale Feature Fusion). (2) Two-stage models: Fast R-CNN, Faster R-CNN and Faster R-CNN with FPN. In order to demonstrate the validity of YOLO-CA and confirm its strength in terms of comprehensive performance on accuracy and real-time operation, the following indexes are selected for comparison among the seven models:

• Average Precision (AP), defined as the average value of precision under different recalls, which can be changed by adjusting the threshold of the classification confidence. The AP index evaluates the accuracy performance of detection models. The average precision can be calculated by (8), where r is recall (see the sketch after this list).

AP = \int_{0}^{1} precision(r) \, dr    (8)

• Average Intersection over Union (Average IoU), which is utilized to evaluate the object locating performance of detection models. The Average IoU is the average value of the IoUs between every predicted bounding box and the corresponding ground truth.

• Frames Per Second (FPS). Inference time is defined as the average time cost of detecting a frame in the test set. FPS is the reciprocal of the inference time, i.e., the average number of frames that can be detected in one second. This index evaluates the real-time performance of detection models.
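A numerical sketch of Eq. (8): AP as the area under the precision-recall curve obtained by sweeping the confidence threshold. The endpoint padding (precision 1 at recall 0, precision 0 at recall 1) is our assumption for the illustration, not a detail taken from the paper.

```python
import numpy as np

def average_precision(recalls, precisions):
    """Approximate AP = integral of precision(r) dr over r in [0, 1]
    from sampled (recall, precision) pairs."""
    order = np.argsort(recalls)
    r = np.concatenate(([0.0], np.asarray(recalls, dtype=float)[order], [1.0]))
    p = np.concatenate(([1.0], np.asarray(precisions, dtype=float)[order], [0.0]))
    return float(np.trapz(p, r))

# Illustrative values only:
print(average_precision([0.2, 0.5, 0.8, 0.95], [0.99, 0.97, 0.9, 0.7]))
```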
FIGURE 8. The AP and IoU results of different models. (a) Precision-Recall curve (b) Average IoU.

Fig. 8a shows the Precision-Recall curves of the detection models on the test set. It can be found that Faster R-CNN with FPN and our proposed YOLO-CA have obvious advantages in accuracy over the other models. Additionally, our proposed YOLO-CA can achieve 90.02% AP, which is slightly lower than that of Faster R-CNN with FPN (90.66%) and higher than those of Fast R-CNN (77.66%), Faster R-CNN (82.02%), SSD (83.40%), YOLOv3 without MSFF (83.16%) and YOLOv3 (86.20%). Average IoU is a vitally important index for evaluating the locating performance of detection models. Moreover, accurate location is critical to car accident detection and notification, and higher locating performance can improve the safety of the vehicles around the accident. As shown in Fig. 8b, YOLO-CA achieves about 0.73 Average IoU, which is lower than that of Faster R-CNN with FPN (0.75) and higher than those of Fast R-CNN (0.58), Faster R-CNN (0.66), SSD (0.69), YOLOv3 without MSFF (0.65) and YOLOv3 (0.71).

In order to compare and analyze the performance of the models in detail, the objects of the test set are divided into three parts according to object scale: (1) Large: the area of the object is larger than one tenth of the image size. (2) Medium: the area of the object is within the interval [1/100, 1/10] of the image size. (3) Small: the area of the object is less than one hundredth of the image size.

Table 4 shows the AP and IoU results of the seven models for different object scales. We can intuitively see that the scale of objects significantly affects the accuracy and locating performance of detection models. It can be found that our proposed YOLO-CA has obvious advantages in AP and Average IoU over Fast R-CNN, Faster R-CNN and YOLOv3 without MSFF, especially for small objects. There is no MSFF process in these three models, so they detect objects relying only on top-level features. However, although there is rich semantic information in top-level features, the location information of objects is rough, which does not help locate the bounding box of objects correctly. On the contrary, there is little semantic information in low-level, high-resolution features, but the location information of objects is accurate. Small objects make up a small proportion of the whole frame, and their location information is easily lost through multiple convolution processes. YOLO-CA utilizes MSFF to combine top-level features and low-level features, and then makes a prediction in each fused feature layer. This process preserves the rich semantic information and the accurate location information simultaneously, so YOLO-CA has better performance in AP and Average IoU than Fast R-CNN, Faster R-CNN and YOLOv3 without MSFF. SSD uses a pyramidal feature hierarchy to obtain multi-scale feature maps, but to avoid using low-level features SSD foregoes reusing already computed layers and instead builds the pyramid starting from high up in the network and then adds several new layers. So SSD misses the opportunity to reuse the higher-resolution maps of the feature hierarchy, which are vitally important for detecting small objects [37]. Moreover, the backbone of YOLO-CA (Darknet-53) performs better than that of SSD (VGG-16) because it uses residual networks to avoid the degradation problem of deep neural networks.




TABLE 4. AP and IoU results of different models among different scales of object.

Therefore, YOLO-CA achieves better AP and Average IoU results than SSD. Compared with YOLOv3, YOLO-CA utilizes the loss function with dynamic weights to balance the influence of the location loss among different scales of objects. This process increases the error punishment for small objects, because the same errors in x, y, w, h have a more serious impact on the detection of small objects than on that of large objects. Consequently, YOLO-CA has obvious advantages over YOLOv3 in AP and Average IoU for small objects. The MSFF processes of Faster R-CNN with FPN and YOLO-CA are similar: feature pyramid networks are used to extract feature maps of different scales and fuse these maps to obtain features with high semantics and high resolution. Faster R-CNN utilizes the RPN to select about 20000 proposal regions, whereas there are only 13 ∗ 13 ∗ 3 + 26 ∗ 26 ∗ 3 = 2535 candidate bounding boxes in YOLO-CA. This difference gives Faster R-CNN a slight advantage in accuracy over YOLO-CA, but also causes serious disadvantages in real-time performance.

FIGURE 9. The FPS of different models.

Fig. 9 shows the FPS results of the different models on the test set. It can be found that the FPS of one-stage models is obviously higher than that of two-stage models. The low performance of the two-stage models results from the great deal of time spent selecting proposal regions. Fast R-CNN utilizes the time-consuming selective search algorithm to select proposal regions based on color and texture features, so Fast R-CNN only achieves 0.4 FPS. Faster R-CNN uses the RPN, which shares convolutional layers with state-of-the-art object detection networks, instead of selective search to generate proposals. Benefiting from the RPN, Faster R-CNN achieves about 3.5 FPS on the test set (Faster R-CNN: 3.5, Faster R-CNN with FPN: 3.6).

Although Faster R-CNN obtains a significant improvement in real-time performance compared with Fast R-CNN, there is still a big gap with the one-stage models. That is because one-stage models abandon the process of selecting proposal regions and utilize one CNN to implement both location and classification of objects. As shown in Fig. 9, SSD achieves 15.6 FPS on the test set. The other three models based on YOLO utilize the Darknet-53 backbone instead of the VGG-16 used in SSD, and the computation of the former network is significantly less than that of the latter because of the residual networks. Therefore, the real-time performance of SSD is lower than that of the YOLO-based models in our experiments. Additionally, our proposed YOLO-CA simplifies the MSFF networks of YOLOv3, so YOLO-CA achieves 21.7 FPS, which is higher than that of YOLOv3 (about 19.1). Because YOLOv3 without MSFF lacks the MSFF process, it has better real-time performance (about 23.6 FPS) than YOLO-CA, but this lack results in serious AP penalties.

Fig. 10 shows some visual results of the seven models for different scales of objects. It can be found that there is a false positive in the large-object detection results of Fast R-CNN, but the other six models all have high accuracy and locating performance for large objects in Fig. 10. However, the locating performance of Fast R-CNN, Faster R-CNN, SSD, and YOLOv3 without MSFF decreases significantly in medium object frame (1), and the predicted bounding box cannot fit the contour of the car accident. Moreover, Fast R-CNN, SSD, and YOLOv3 without MSFF cannot detect the car accident in small object frame (1). Additionally, except for Faster R-CNN with FPN and YOLO-CA, the other models have serious location errors in small object frame (3).

3) COMPARISON OF COMPREHENSIVE PERFORMANCE AND PRACTICALITY
As analyzed above, our proposed YOLO-CA has performance advantages over Fast R-CNN, Faster R-CNN, SSD, and YOLOv3 in detecting car accidents, in terms of accuracy, locating and real-time performance. For YOLOv3 without MSFF, its FPS (23.6) is higher than that of YOLO-CA (21.7), and this difference is acceptable in the practical application of detecting car accidents.




FIGURE 10. Some visual results of the seven models among different scales of objects.

However, the AP of YOLO-CA is significantly higher than that of YOLOv3 without MSFF, especially for small scales of objects (76.51% vs 58.89%). Compared with Faster R-CNN with FPN, YOLO-CA approaches its AP (90.66% vs 90.03%) with an obvious speed advantage. Faster R-CNN with FPN costs about 277 ms on average to detect one frame, whereas YOLO-CA only needs 46 ms, which illustrates that YOLO-CA is about 6× faster than Faster R-CNN with FPN. Car accident detection in CVIS requires high real-time performance because of the high dynamics of vehicles. To summarize, our proposed YOLO-CA has higher practicality and better comprehensive performance on accuracy and real-time operation.

4) COMPARISON WITH OTHER CAR ACCIDENT DETECTION METHODS
Other car accident detection methods utilize small private collections of data and do not make them public, so comparing with them may not be fair at this stage. Still, we list the performance achieved by these methods on their individual datasets. ARRS [3] achieves about 63% AP with 6% false alarms. The method of [27] achieves 89.50% AP. DSA-RNN [25] achieves about 80% recall and 56.14% AP. The method in [30] achieves about 47.25% AP. The method of [8] achieves 77.5% AP with 22.5% false alarms. Moreover, the number of accident scenes in the datasets utilized in these methods is limited, which will result in poor adaptability to new scenarios.

V. CONCLUSION
In this paper, we have proposed an automatic car accident detection method based on CVIS. First of all, we present the application principles of our proposed method in the CVIS.




Secondly, we build a novel image dataset, CAD-CVIS, which is more suitable for car accident detection methods based on intelligent roadside devices in CVIS. Then we develop the car accident detection model YOLO-CA based on CAD-CVIS and deep learning algorithms. In the model, we combine multi-scale feature fusion and a loss function with dynamic weights to improve the real-time performance and accuracy of YOLO-CA. Finally, we show the simulation experiment results of our method, which demonstrate that our proposed method can detect a car accident in 0.0461 seconds with 90.02% AP. Moreover, the comparative experiment results show that YOLO-CA has comprehensive performance advantages over other detection models in detecting car accidents, in terms of accuracy and real-time performance.

REFERENCES
[1] WHO. Global Status Report on Road Safety 2018. Accessed: Dec. 2018. [Online]. Available: https://www.who.int/violence_injury_prevention/road_safety_status/2018/en/
[2] H. L. Wang and M. A. Jia-Liang, "A design of smart car accident rescue system combined with WeChat platform," J. Transp. Eng., vol. 17, no. 2, pp. 48–52, Apr. 2017.
[3] Y. K. Ki and D. Y. Lee, "A traffic accident recording and reporting model at intersections," IEEE Trans. Intell. Transp. Syst., vol. 8, no. 2, pp. 188–194, Jun. 2007.
[4] W. Hao and J. Daniel, "Motor vehicle driver injury severity study under various traffic control at highway-rail grade crossings in the United States," J. Saf. Res., vol. 51, pp. 41–48, Dec. 2014.
[5] J. White, C. Thompson, H. Turner, H. Turner, and D. C. Schmidt, "WreckWatch: Automatic traffic accident detection and notification with smartphones," Mobile Netw. Appl., vol. 16, no. 3, pp. 285–303, Jun. 2011.
[6] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Real-time automatic traffic accident recognition using HFG," in Proc. 20th Int. Conf. Pattern Recognit., Aug. 2010, pp. 3348–3351.
[7] A. Shaik, N. Bowen, J. Bole, G. Kunzi, D. Bruce, A. Abdelgawad, and K. Yelamarthi, "Smart car: An IoT based accident detection system," in Proc. IEEE Global Conf. Internet Things (GCIoT), Dec. 2018, pp. 1–5.
[8] D. Singh and C. K. Mohan, "Deep spatio-temporal representation for detection of road accidents using stacked autoencoder," IEEE Trans. Intell. Transp. Syst., vol. 20, no. 3, pp. 879–887, Mar. 2019.
[9] M. Zheng, T. Li, R. Zhu, J. Chen, Z. Ma, M. Tang, Z. Cui, and Z. Wang, "Traffic accident's severity prediction: A deep-learning approach-based CNN network," IEEE Access, vol. 7, pp. 39897–39910, 2019.
[10] L. Zheng, Z. Peng, J. Yan, and W. Han, "An online learning and unsupervised traffic anomaly detection system," Adv. Sci. Lett., vol. 7, no. 1, pp. 449–455, 2012.
[11] Y. Fangchun, W. Shangguang, L. Jinglin, L. Zhihan, and S. Qibo, "An overview of Internet of vehicles," China Commun., vol. 11, no. 10, pp. 1–15, Oct. 2014.
[12] C. Ma, W. Hao, A. Wang, and H. Zhao, "Developing a coordinated signal control system for urban ring road under the vehicle-infrastructure connected environment," IEEE Access, vol. 6, pp. 52471–52478, 2018.
[13] S. Zhang, J. Chen, F. Lyu, N. Cheng, W. Shi, and X. Shen, "Vehicular communication networks in the automated driving era," IEEE Commun. Mag., vol. 56, no. 9, pp. 26–32, Sep. 2018.
[14] Y. Wang, D. Zhang, Y. Liu, B. Dai, and L. H. Lee, "Enhancing transportation systems via deep learning: A survey," Transp. Res. C, Emerg. Technol., 2018.
[15] G. Wu, F. Chen, X. Pan, M. Xu, and X. Zhu, "Using the visual intervention influence of pavement markings for rutting mitigation—Part I: Preliminary experiments and field tests," Int. J. Pavement Eng., vol. 20, no. 6, pp. 734–746, 2019.
[16] S. Ramos, S. Gehrig, P. Pinggera, U. Franke, and C. Rother, "Detecting unexpected obstacles for self-driving cars: Fusing deep learning and geometric modeling," in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2017, pp. 1025–1032.
[17] T. Qu, Q. Zhang, and S. Sun, "Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks," Multimedia Tools Appl., vol. 76, no. 20, pp. 21651–21663, 2017.
[18] D. Dooley, B. McGinley, C. Hughes, L. Kilmartin, E. Jones, and M. Glavin, "A blind-zone detection method using a rear-mounted fisheye camera with combination of vehicle detection methods," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 1, pp. 264–278, Jan. 2016.
[19] X. Changzhen, W. Cong, M. Weixin, and S. Yanmei, "A traffic sign detection algorithm based on deep convolutional neural network," in Proc. IEEE Int. Conf. Signal Image Process. (ICSIP), Aug. 2016, pp. 676–679.
[20] S. Zhang, C. Bauckhage, and A. B. Cremers, "Efficient pedestrian detection via rectangular features based on a statistical shape model," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 2, pp. 763–775, Apr. 2015.
[21] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," Aug. 2018, arXiv:1804.02767. [Online]. Available: https://arxiv.org/abs/1804.02767
[22] J. Zaldivar, C. T. Calafate, J. C. Cano, and P. Manzoni, "Providing accident detection in vehicular networks through OBD-II devices and Android-based smartphones," in Proc. IEEE 36th Conf. Local Comput. Netw. (LCN), Oct. 2011, pp. 813–819.
[23] M. S. Amin, J. Jalil, and M. B. I. Reaz, "Accident detection and reporting system using GPS, GPRS and GSM technology," in Proc. Int. Conf. Inform., Electron. Vis., 2012, pp. 640–643.
[24] M. Fogue, P. Garrido, F. J. Martinez, J.-C. Cano, C. T. Calafate, and P. Manzoni, "Automatic accident detection: Assistance through communication technologies and vehicles," IEEE Veh. Technol. Mag., vol. 7, no. 3, pp. 90–100, Sep. 2012.
[25] F.-H. Chan, Y.-T. Chen, Y. Xiang, and M. Sun, "Anticipating accidents in dashcam videos," in Computer Vision–ACCV 2016. Springer, 2017, pp. 136–153.
[26] Y. Yao, M. Xu, Y. Wang, D. J. Crandall, and E. M. Atkins, "Unsupervised traffic accident detection in first-person videos," Mar. 2019, arXiv:1903.00618. [Online]. Available: https://arxiv.org/abs/1903.00618
[27] S. Kamijo, Y. Matsushita, K. Ikeuchi, and M. Sakauchi, "Traffic monitoring and accident detection at intersections," IEEE Trans. Intell. Transp. Syst., vol. 1, no. 2, pp. 108–118, Jun. 2000.
[28] K. Yun, H. Jeong, K. M. Yi, S. W. Kim, and J. Y. Choi, "Motion interaction field for accident detection in traffic surveillance video," in Proc. 22nd Int. Conf. Pattern Recognit., Aug. 2014, pp. 3062–3067.
[29] V. Ravindran, L. Viswanathan, and S. Rangaswamy, "A novel approach to automatic road-accident detection using machine vision techniques," Int. J. Adv. Comput. Sci. Appl., vol. 7, no. 11, pp. 235–242, 2016.
[30] A. P. Shah, J.-B. Lamare, T. Nguyen-Anh, and A. Hauptmann, "CADP: A novel dataset for CCTV traffic camera based accident analysis," in Proc. 15th IEEE Int. Conf. Adv. Video Signal Based Surveill., Nov. 2018, pp. 1–9.
[31] Tzutalin. (2015). LabelImg. Git code. [Online]. Available: https://github.com/tzutalin/labelImg
[32] C. Lu, J. Shi, and J. Jia, "Abnormal event detection at 150 FPS in MATLAB," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 2720–2727.
[33] V. Mahadevan, W. Li, V. Bhalodia, and N. Vasconcelos, "Anomaly detection in crowded scenes," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 1975–1981. [Online]. Available: http://ieeexplore.ieee.org/document/5539872/
[34] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 580–587.
[35] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 1440–1448.
[36] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Proc. Int. Conf. Neural Inf. Process. Syst., 2015, pp. 91–99.
[37] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2016, pp. 936–944.
[38] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 21–37.
[39] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 779–788.
[40] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[41] M. Abadi et al., "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," Mar. 2015, arXiv:1603.04467. [Online]. Available: https://arxiv.org/abs/1603.04467




DAXIN TIAN is currently a Professor with the School of Transportation Science and Engineering, Beihang University, Beijing, China. His current research interests include mobile computing, intelligent transportation systems, vehicular ad hoc networks, and swarm intelligence.

CHUANG ZHANG is currently pursuing the master's degree with the School of Transportation Science and Engineering, Beihang University, Beijing, China. His current research interests include multimedia communications and processing and machine learning.

XUTING DUAN is currently a Lecturer with the School of Transportation Science and Engineering, Beihang University, Beijing, China. His current research interests include connected vehicles, vehicular ad hoc networks, and vehicular localization.

XIXIAN WANG is currently pursuing the master's degree with the School of Transportation Science and Engineering, Beihang University, Beijing, China. His current research interests include image processing and machine learning.

