6 authors, including:
All content following this page was uploaded by Yang Zheng on 17 July 2020.
Article info

Article history:
Received 5 November 2014
Received in revised form 11 May 2015
Accepted 3 July 2015

Keywords:
Naturalistic driving study
Driving risk
Near-crash
Classification and regression tree (CART)
K-mean cluster

Abstract

This paper considers a comprehensive naturalistic driving experiment to collect driving data under potential threats on actual Chinese roads. Using acquired real-world naturalistic driving data, a near-crash database is built, which contains vehicle status, potential crash objects, driving environment and road types, weather condition, and driver information and actions. The aims of this study are summarized into two aspects: (1) to cluster different driving-risk levels involved in near-crashes, and (2) to unveil the factors that greatly influence the driving-risk level. A novel method to quantify the driving-risk level of a near-crash scenario is proposed by clustering the braking process characteristics, namely maximum deceleration, average deceleration, and percentage reduction in vehicle kinetic energy. A classification and regression tree (CART) is employed to unveil the relationship among driving risk, driver/vehicle characteristics, and road environment. The results indicate that the velocity when braking, triggering factors, potential object type, and potential crash type exerted the greatest influence on the driving-risk levels in near-crashes.
© 2015 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.aap.2015.07.007
0001-4575/© 2015 Elsevier Ltd. All rights reserved.
J. Wang et al. / Accident Analysis and Prevention 84 (2015) 54–64 55
significantly associated with accident severity. These studies provided some insights into the factors that affect the likelihood of a vehicle accident. However, they were typically based on official traffic accident statistics, which have two major limitations: (1) lack of detailed driving data, and (2) difficulty of collection and acquisition (the data are usually collected by traffic police agencies). Hence, the aforementioned studies usually do not consider the relationship between the accident severity and detailed driving data (e.g., vehicle speed, acceleration, braking, and steering information).

Recent developments in vehicle instrumentation techniques have made monitoring naturalistic driving behavior and obtaining detailed driving data both technologically possible and economically feasible. For instance, NHTSA sponsored the project "100-Car Naturalistic Driving Study", which is a large-scale instrumented-vehicle study to collect naturalistic driving data in the United States (Dingus et al., 2006). A series of technology tests of safety equipment was conducted in Michigan using the naturalistic driving technique (UMTRI and GMRDC, 2005). Takeda et al. (2011) reported a comprehensive project involving collecting large amounts of driving data on the actual road to study driver behavior and accident causation mechanisms. With access to naturalistic driving data, traffic safety-related events could be observed and measured more precisely (Wu et al., 2014). Meanwhile, many researchers have proposed new methods and gained new insights into traffic safety (e.g., Malta et al., 2009; Aoude et al., 2012; Guo et al., 2010; Jovanis et al., 2011; Jonasson and Rootzén, 2014). For instance, Malta et al. (2009) proposed a method to improve the understanding of driver behavior under potential threats using a large real-world driving database. Guo et al. (2010) assessed the factors associated with individual driver risk using naturalistic driving data. For naturalistic driving data, crash surrogates have received extensive research attention (see Guo et al., 2010; Wu and Jovanis, 2012, 2013; Moreno and García, 2013, for examples), because the number of crashes observed with naturalistic driving is typically small. Near-crash is frequently used as a surrogate measure for assessing the safety impact. For instance, Guo et al. (2010) employed two metrics, namely, precision and bias of risk estimation, to assess near-crashes, and indicated that using near-crashes as a crash surrogate could provide definite benefit when data about a sufficient number of crashes are not available. Recently, Wu and Jovanis (2013) proposed a multi-stage modeling framework to search through naturalistic driving data and extract near-crash events. All of these studies have demonstrated that naturalistic driving data could provide more controllable laboratory data as a useful supplement for traffic safety studies, and have the potential to further our understanding of crash causality, as well as improve road safety. Naturalistic driving data could not only provide more detailed driving exposure data, but also offer the possibility of identifying risky driving events and the associated factors more plausibly.

1.2. Preview of the key results

This study focuses on the analysis of factors that influence driving risk using a naturalistic driving database. This database was obtained through designing a novel transcription protocol to code naturalistic driving data, which have two distinguishing features: (1) drivers drive in their normal states and (2) the instruments installed in vehicles can record drivers and road environments continuously during driving (Jovanis et al., 2011). The naturalistic database used herein contains only near-crash events because no actual crashes happened during the naturalistic experiments conducted on actual Chinese roads. Near-crashes refer to cases where drivers execute rapid evasive maneuvers (i.e., emergency braking and/or steering operation) when facing a potential driving risk or a potential threat; in the absence of such an action, a real crash may occur. In the experiments, near-crash events in naturalistic driving were identified by detecting unusual vehicle kinematics using accelerometers and gyroscopic sensors installed in the experimental vehicle (Wu and Jovanis, 2013; Wu et al., 2014).

Recently, a few studies have focused on the assessment of risk in the driving environment, for example, individual driving risk (Guo and Fang, 2013) and momentary risk perception of a driving situation (Lu et al., 2012; Charlton et al., 2014). These studies employed indicators such as driver attributes and vehicle kinetic parameters to represent the risk level. Besides, critical braking and speed profiles were proposed to characterize the near-crashes in Moreno and García (2013) and Bagdadi (2013). In the present paper, we propose a novel method to quantify the driving risk involved in a near-crash event. First, the driving-risk level is represented by the braking process characteristics, namely (1) maximum deceleration, (2) average deceleration, and (3) percentage reduction in vehicle kinetic energy. Then, the K-means cluster method is employed to classify near-crashes into different risk levels based on the three aforementioned braking process features. Finally, CART is employed for exploring the relationship among driving risk, driver/vehicle characteristics, and road environments. Identifying the factors associated with driving risk and further predicting high-risk driving scenarios will enable the adoption of proper safety countermeasures to reduce probable hazardous situations for high-risk groups, and thus improve overall driving comfort and safety. By analyzing driver characteristics, road conditions, and vehicle characteristics using the near-crash database, we obtained new insights into driving risk. The results indicate that the velocity when braking (V BRA), triggering factors (T FAC), potential object type (O TYP), and potential crash type (P CRA) had the greatest influence on the driving-risk level involved in near-crashes. These results can improve our understanding of the factors that affect driving risk, and help create policies and countermeasures to improve driving safety and comfort.

The remainder of this paper is organized as follows: Section 2 describes the near-crash database and presents some preparations, including experiment design, labeling protocol and driving-risk definition. The methodology employed in this study is presented in Section 3. Section 4 discusses the results, and some concluding remarks are given in Section 5.

2. Database and preparation

To build a firm foundation for the assessment of driving risk and enhancing driving safety, two components are essential: (1) real-driving data and (2) careful experimental design. Data collection is performed using naturalistic and low-intervention methods under actual traffic conditions. This section introduces the experimental equipment and experiment design, describes the near-crash database, and presents the definition and cluster analysis of driving risk.

2.1. Data-collection equipment and experiment design

2.1.1. Data-collection equipment

The naturalistic driving experiments were conducted using a Honda Crosstour, which was provided by Honda. The vehicle was equipped with instruments to collect driver, vehicular, and road data under real-world conditions. The data-collection system installed in the experimental vehicle included two driving recorders (DR) and four cameras (Fig. 1). The four cameras were used to record detailed video scenes including (1) forward view, (2) right-side forward view, (3) left-side forward view, and (4) driver's facial expression. One DR recorded data obtained by sensors, including GPS, brake signal, steering signal, three-axis
Table 1
Schedule of entire experiment.

Table 2
Road types in experiment.
Road type: 1, 2, 3, 4.

1) Vehicle status.
2) Potential crash objects.
3) Driving environment and road types.
4) Weather condition.
5) Driver information and driver actions.

Fig. 2. Example of recorded driving signals for typical near-crash case (braking signal, longitudinal acceleration, and lateral acceleration in m/s² plotted against time in s).

Table 3
Near-crashes on different road types.
Road type | 1 | 2 | 3 | 4
Number | 39 | 246 | 489 | 138
1, highway; 2, city ring road; 3, inner-city road; 4, rural road.
Table 4
Definition of transcription protocol.
Variable | Code | Type | Description

Vehicle status
Velocity when braking | V BRA | Continuous | Vehicle speed when driver triggers braking signal or turn point of the acceleration signal (m/s)
Maximum deceleration | D MAX | Continuous | Maximum deceleration during emergency braking (m/s²)
Time interval of braking | T IN | Continuous | Time interval between braking signal trigger and time point of maximum deceleration (s)
Velocity reduction | V RED | Continuous | Vehicle-speed reduction from braking signal trigger to time point of maximum deceleration (m/s)
Vehicle status before braking | V STA | Qualitative | 1: Deceleration, 2: acceleration, 3: constant speed
Vehicle maneuver | V MAN | Qualitative | 1: Straight motion, 2: right turn, 3: left turn, 4: lane change, 5: others

Potential crash object
Crash object type | O TYP | Qualitative | 1: Vehicle, 2: single-track vehicle (motorcycle and bicycle), 3: pedestrian, 4: others (e.g., barrier block)
Potential crash type | P CRA | Qualitative | 1: Rear end, 2: conflict in intersection, 3: pedestrian conflict, 4: opposite driving conflict, 5: cut-in conflict, 6: others
Triggering factors | T FAC | Qualitative | 0: Sudden change of object status, 1: traffic light, 2: lane reduction, 3: lane change, 4: active braking, 5: others

Driving environment and road type
Near-crash location | N LOC | Qualitative | 1: Intersection, 2: non-intersection
Road condition | R CON | Qualitative | 1: Structure road, 2: normal road, 3: hybrid road, 4: rural road
Parking vehicle along road side | P VEH | Qualitative | 0: No, 1: yes
Safety barriers for opposing vehicles | B OVE | Qualitative | 0: No, 1: yes
Safety barriers for vehicles and pedestrians | B VEH | Qualitative | 0: No, 1: yes

Weather condition
Weather | WEA | Qualitative | 1: Sunny, 2: cloudy, 3: others
Light condition | L CON | Qualitative | 1: Daylight, 2: dusk

Driver information and actions
Gender | GEN | Qualitative | 1: Male, 2: female
Age | AGE | Continuous | Driver age (years). Further categorized into five groups, 1: 0–30, 2: 31–40, 3: 41–50, 4: 51–60, 5: >60
Time span with driving license | T DIR | Continuous | Time period of possessing valid driving license (years)
Steering light | S LIG | Qualitative | 0: No, 1: yes
Vehicle horns | V HON | Qualitative | 0: No, 1: yes
Second task | S TASK | Qualitative | 0: No, 1: talking, 2: others
environment. Graduate students with driving license served as volunteer taggers to manually label the recorded 912 near-crashes according to the designed transcription protocol. Finally, we developed the near-crash database. The transcription protocol is defined in Table 4. It should be noted that the specific definitions of each item in Table 4 are based on the actual characteristics of near-crash events and may differ from the protocols for coding standard crash events, for example, that in Montella et al. (2013).

2.3. Definition and cluster of driving risks

The primary risk measure in vehicle safety evaluation is crash occurrence. Many studies have been conducted to identify the factors that significantly influence the injury severity of crashes using the logit-based model and some related data-mining techniques such as decision tree and SVM. However, research on naturalistic driving risk in the traffic and human-factor field has been limited.

characteristics. Intuitively, the driving risk is higher if the braking maneuver is performed with greater urgency in a near-crash. By clustering braking process characteristics, this paper proposes a novel method to quantify the driving risk involved in a near-crash event. Fig. 3 shows the key points for defining a typical deceleration curve during braking. The following three features are adopted to represent the driving-risk level of a typical near-crash case:

1) Maximum deceleration during braking process a_min.
2) Average deceleration a_average from the braking trigger point t_0 to the point of maximum deceleration t_1.

[Fig. 3: braking signal and longitudinal acceleration (m/s²) during braking.]

where v(t) and a(t) denote the vehicle's velocity and acceleration, respectively.

[Figure: near-crash clusters labeled low risk, moderate risk, and high risk.]

The percentage reduction in vehicle kinetic energy E is calculated as follows:
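The three braking features can be illustrated with a short Python sketch. The defining equations are not reproduced in this text, so the forms used here are assumptions: the average deceleration is taken as the mean rate of speed change between t_0 and t_1, and the energy term as the fraction of kinetic energy lost over the same interval (vehicle mass cancels). The function name `braking_features` and the sample signal are hypothetical.

```python
def braking_features(t, v, a, i0):
    """Return (a_min, a_average, energy_reduction) for one braking event.

    t, v, a: equally long lists of time (s), speed (m/s), and longitudinal
    acceleration (m/s^2, negative while braking); i0 indexes the braking trigger.
    """
    # Point of maximum deceleration: most negative acceleration from t0 onward.
    i1 = min(range(i0, len(a)), key=lambda i: a[i])
    a_min = a[i1]
    # Assumed form: mean rate of speed change over [t0, t1].
    a_average = (v[i1] - v[i0]) / (t[i1] - t[i0]) if i1 > i0 else a[i0]
    # Assumed form: share of kinetic energy lost between t0 and t1.
    energy_reduction = (v[i0] ** 2 - v[i1] ** 2) / v[i0] ** 2 if v[i0] > 0 else 0.0
    return a_min, a_average, energy_reduction


# Hypothetical braking event sampled every 0.5 s.
t = [0.0, 0.5, 1.0, 1.5, 2.0]
v = [15.0, 14.0, 12.0, 9.5, 8.0]
a = [-1.0, -3.0, -5.0, -4.0, -2.0]
a_min, a_average, energy_reduction = braking_features(t, v, a, 0)
print(a_min, a_average, energy_reduction)
```

In this sketch the same quantities could also be derived from the Table 4 variables, since the assumed average deceleration equals the velocity reduction (V RED) divided by the time interval of braking (T IN).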
Table 5
Characteristics of driving-risk groups.
Risk groups Number of near-crash cases Percentage Mean of braking process features
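As a rough illustration of how near-crashes could be grouped into three risk levels, the following Python sketch applies a plain K-means (Lloyd) iteration to feature triples of the form (|a_min|, |a_average|, energy reduction). The data points, the initial centroids, and the absence of feature scaling are all assumptions made for the example; the preprocessing and software settings actually used in the study are not stated in this text.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iteration; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical near-crashes: (|a_min| in m/s^2, |a_average| in m/s^2, energy reduction).
points = [
    (2.0, 1.0, 0.10), (2.2, 1.1, 0.12),
    (4.0, 2.5, 0.35), (4.2, 2.6, 0.40),
    (7.0, 4.5, 0.70), (6.8, 4.4, 0.65),
]
labels, centroids = kmeans(points, [points[0], points[2], points[4]])
print(labels)
```

With real data the features have different ranges, so standardizing them before clustering would usually be worth considering; the cluster with the largest mean decelerations would then be read as the high-risk group.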
where p(Y = i) is the proportion of observations in node dataset Y belonging to class i. If all observations in one node belong to one class, the Gini index of that node is zero, which means that the node is pure and has reached a homogeneous state.

The node-splitting criterion based on the Gini index aims to obtain the maximum decrease in the impurity of node dataset Y by finding the best partition x* of observations, and then partition node dataset Y into two child node subsets Y_l and Y_r, as follows:

\max_{x \in X} \Delta Gini(Y, x), \qquad \Delta Gini(Y, x) = Gini(Y) - p(Y_l)\,Gini(Y_l) - p(Y_r)\,Gini(Y_r) \qquad (6)

where \Delta Gini(Y, x) represents the decrease in impurity; x ∈ X denotes the set of splits generated by all features; Y_l and Y_r are, respectively, the left and right child nodes of node dataset Y; and p(Y_l) and p(Y_r) are the proportions of observations in node dataset Y belonging to the left and right child nodes, respectively.

Tree growing is arrested based on two criteria: (1) minimum decrease of impurity equals 0.001; and (2) maximum number of tree levels equals six. CART searches for the best split that maximizes (6). From this procedure, CART can be created recursively, which usually leads to saturation and overfitting of the training dataset. Saturated trees do not perform well when applied to a new case, which means that the tree structure overfits the information contained in the training data, including the useless noise information, and it cannot reveal the real pattern behind the data. Hence, the data are usually divided into two subsets: (1) learning (or training) set and (2) testing (or validation) set. The training set is used to construct the tree, and the testing set is used to validate the tree performance. The saturated tree should be pruned according to the cost-complexity algorithm that achieves a compromise between predictive accuracy and tree complexity. The main idea is to remove the branches and merge the nodes that contribute little to the predictive value of a tree. A more detailed description of the CART analysis and related applications can be found in Breiman et al. (1984). Analyses were performed using the SPSS software application.

3.2. Rule extraction

The CART structure can be transformed into decision rules of the 'IF–THEN' type to extract potentially useful information, which can be understood easily and intuitively by engineers and policymakers. Many researchers using DTs to analyze traffic accident severity have extracted useful rules for discovering behaviors that occur within a specified dataset (please see Montella et al., 2011, 2012; de Oña et al., 2013; Abellán et al., 2013 and the references therein).

The decision rules extracted from CART take a logic conditional structure 'X → C', where X denotes a set of states of several attribute variables and C is the only state of the target variable, which is driving-risk level in our case. In CART, rules (IF–THEN structure) begin with the tree-root node, and each variable used in the splitting criterion for node partition generates the IF of the rules, which ends in leaf nodes with a THEN status. The THEN status is the status of leaf nodes that take the largest number of observations, which, in our case, is the driving-risk level.

3.3. Variable importance

One of the outputs of the CART technique is variable importance, which characterizes a variable's ability to influence the model. The relative importance of variable x_j is calculated as follows:

Vim(x_j) = \sum_{t=1}^{T} \frac{n_t}{N}\,\Delta Gini(Y_t, x_j), \qquad (7)

where Vim(x_j) denotes the relative importance of variable x_j; \Delta Gini(Y_t, x_j) is the reduction in the Gini index obtained by splitting variable x_j at node t, according to (6); n_t is the total number of observations in dataset Y_t belonging to node t; N is the total number of observations; and T is the number of nodes in CART. The variable with the largest value according to (7) is regarded as the most important variable with respect to the others.

4. Results and discussion

4.1. Data distribution of driving-risk level

Nineteen predictor variables and one target variable (the driving-risk level) are used in the CART model to identify the important pattern that reflects the relationship among driving-risk level, driver/vehicle characteristics, and road environment. As can be inferred from Table 4, these 19 predictor variables include vehicle status (e.g., vehicle maneuver), potential crash object (e.g., crash-object type and triggering factors), driving environment and road types (e.g., near-crash locations), weather condition (e.g., weather and light condition), and driver information and driver actions (e.g., driver gender and age).

Table 6 lists the information on driving-risk level in terms of the predictor variables, which indicates that traffic light in the fifth predictor variable T FAC is an important factor affecting the driving-risk level because a relatively high proportion of near-crashes caused by sudden changes in traffic light status occurs in the moderate- and high-risk groups (55.3% and 35.0%, respectively). From the sixth predictor variable N LOC, we find similar statistical results, where the proportions of near-crashes at intersections are relatively higher in the moderate- and high-risk groups (44.5% and 12.6%, respectively) than those away from intersections (38.0% and 5.2%, respectively). Table 6 also shows that as the braking speed increases, the proportions of near-crash cases in the moderate- and high-risk groups increase. The proportions in the moderate- and high-risk groups are, respectively, 46.4% and 13.9%, when the speed at the braking point ranges from 10 to 20 m/s, whereas those when the speed at the braking point ranges from 0 to 10 m/s are, respectively, 34.7% and 2.8%, as shown under the 19th predictor variable V BRA.

The aforementioned preliminary statistical results are consistent with the analysis result obtained from CART, which is presented in the next section.

4.2. CART analysis

For the CART model, the 912 near-crashes are randomly divided into two subsets: one for learning and the other for testing.
Table 6
Distribution of driving-risk levels by predictor variable.
Num | Variable code | Description | Count | LR | MR | HR
Overall | | | | 52.0% | 40.2% | 7.8%
1 | V STA | Deceleration | 265 | 48.7% | 43.8% | 7.5%
 | | Acceleration | 531 | 55.4% | 37.5% | 7.2%
 | | Constant speed | 116 | 44.0% | 44.8% | 11.2%
2 | V MAN | Straight motion | 778 | 51.0% | 40.4% | 8.6%
 | | Right turn | 38 | 65.8% | 31.6% | 2.6%
 | | Left turn | 41 | 65.9% | 34.1% | 0.0%
 | | Others | 9 | 44.4% | 44.4% | 11.1%
3 | O TYP | Vehicle | 596 | 55.0% | 40.4% | 4.5%
 | | Single-track vehicle | 98 | 72.4% | 21.4% | 6.1%
 | | Pedestrian | 69 | 60.9% | 37.7% | 1.4%
 | | Others | 149 | 22.1% | 53.0% | 24.8%
4 | P CRA | Rear end | 349 | 51.3% | 45.0% | 3.7%
 | | Conflict during intersection | 70 | 61.4% | 32.9% | 5.7%
 | | Pedestrian conflict | 65 | 60.0% | 36.9% | 3.1%
 | | Others | 191 | 63.4% | 30.4% | 6.3%
5 | T FAC | Sudden change of object status | 723 | 57.7% | 37.6% | 4.7%
 | | Traffic light | 103 | 9.7% | 55.3% | 35.0%
 | | Lane reduction | 9 | 77.8% | 22.2% | 0.0%
 | | Lane change | 33 | 48.5% | 48.5% | 3.0%
 | | Active braking | 26 | 57.7% | 42.3% | 0.0%
6 | N LOC | Intersection | 317 | 42.9% | 44.5% | 12.6%
 | | Non-intersection | 595 | 56.8% | 38.0% | 5.2%
7 | R CON | Structured road | 285 | 46.0% | 43.5% | 10.5%
 | | Normal road | 238 | 46.2% | 43.3% | 10.5%
 | | Hybrid road | 251 | 62.5% | 31.9% | 5.6%
 | | Rural road | 138 | 55.1% | 43.5% | 1.4%
9 | B OVE | No | 372 | 58.9% | 37.4% | 3.8%
 | | Yes | 540 | 47.2% | 42.2% | 10.6%
10 | B VEH | No | 537 | 55.9% | 37.8% | 6.3%
 | | Yes | 375 | 46.4% | 43.7% | 9.9%
11 | WEA | Sunny | 727 | 51.2% | 41.3% | 7.6%
 | | Cloudy | 147 | 55.8% | 34.7% | 9.5%
12 | L CON | Daylight | 796 | 52.3% | 40.2% | 7.5%
 | | Dusk | 116 | 50.0% | 40.5% | 9.5%
13 | GEN | Male | 661 | 51.0% | 40.5% | 8.5%
 | | Female | 251 | 54.6% | 39.4% | 6.0%
14 | AGE | 0–30 | 145 | 50.3% | 41.4% | 8.3%
 | | 31–40 | 291 | 54.0% | 39.9% | 6.2%
 | | 41–50 | 232 | 48.7% | 40.9% | 10.3%
 | | 51–60 | 202 | 56.9% | 35.6% | 7.4%
15 | T DIR | 11–20 | 380 | 56.1% | 37.1% | 6.8%
 | | 21–30 | 157 | 46.5% | 44.6% | 8.9%
 | | >30 | 70 | 47.1% | 48.6% | 4.3%
16 | S LIG | No | 784 | 51.3% | 40.3% | 8.4%
 | | Yes | 128 | 56.3% | 39.8% | 3.9%
17 | V HON | No | 859 | 51.1% | 41.0% | 7.9%
18 | S TASK | No | 784 | 52.0% | 40.4% | 7.5%
 | | Talking | 125 | 51.2% | 39.2% | 9.6%
 | | Others | 3 | 66.7% | 33.3% | 0.0%
19 | V BRA | (0, 10] | 501 | 62.5% | 34.7% | 2.8%
 | | (10, 20] | 388 | 39.7% | 46.4% | 13.9%
 | | (20, +∞) | 23 | 30.4% | 56.5% | 13.0%
Note: Num denotes the index of predictor variables; LR, low-risk group; MR, moderate-risk group; HR, high-risk group.
Fig. 5 shows the classification tree generated by CART, where 70% of the entire observation set is applied for learning and the remaining observations (30%) are applied for testing, as in Montella et al. (2012) and de Oña et al. (2013). CART created a tree with 17 nodes and 9 terminal nodes. The decision rules extracted from CART are listed in Table 7. All probabilities of decision rules are higher than 52.0%, with 76.7% being the highest value (rule 1).
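The nine terminal nodes can be read as nested IF–THEN conditions. The Python sketch below encodes the splits as they are described in the text of this section (thresholds of 13.7, 12.03, and 6.67 m/s and a driver age of 30 years), using the Table 4 codings. It is an illustrative reading of that description, not the fitted SPSS model: the function name `predict_risk` is hypothetical, and the handling of values exactly on a threshold is an assumption.

```python
# Codings taken from Table 4.
TRAFFIC_LIGHT = 1                 # T FAC: traffic light
VEHICLE = 1                       # O TYP: vehicle
OPPOSITE, CUT_IN, OTHERS = 4, 5, 6  # P CRA categories


def predict_risk(t_fac, v_bra, o_typ, p_cra, age):
    """Return (risk level, leaf node) by walking the tree described in the text."""
    if t_fac == TRAFFIC_LIGHT:                 # node 1
        return ("HR", 4) if v_bra > 13.7 else ("MR", 3)
    if v_bra >= 12.03:                         # node 6 (boundary handling assumed)
        return ("LR", 10) if p_cra == CUT_IN else ("MR", 9)
    if o_typ != VEHICLE:                       # node 5 -> leaf 8
        return ("LR", 8)
    if p_cra in (OPPOSITE, CUT_IN, OTHERS):    # node 7 -> leaf 12
        return ("LR", 12)
    if v_bra > 6.67:                           # leaf 14
        return ("MR", 14)
    return ("MR", 15) if age < 30 else ("LR", 16)  # node 13


# Traffic-light trigger at 15 m/s: the high-risk leaf.
print(predict_risk(t_fac=1, v_bra=15.0, o_typ=1, p_cra=1, age=40))
```

Each return statement corresponds to one leaf, so the function has nine exits, matching the nine terminal nodes reported for the tree.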
Table 7
Description of rules obtained from CART.
Note: probability means percentage of observations in which the rule is accurate; LR, low-risk group; MR, moderate-risk group, HR, high-risk.
The root variable that generates the CART is T FAC (see Fig. 5), indicating that the single best variable that classifies the driving-risk level is the triggering factor that leads to the braking maneuver. CART directs the triggering factor that involves traffic light to the left, forming node 1, and directs the remaining triggering factors to the right, forming node 2. For node 1 and depending on the braking speed (V BRA), nodes 3 and 4 are obtained with different driving-risk levels. Near-crashes are high-risk (probability of 63.2%) if V BRA is greater than 13.7 m/s (rule 4) and moderate-risk (probability of 76.7%) if V BRA is less than 13.7 m/s (rule 3). This result shows the direct relationship between moderate- and high-risk near-crashes and sudden changes in traffic lights with high vehicle speeds. This result is consistent with the statistical results presented in the previous section.

The rest of the rules are attributed to the triggering factors other than traffic lights (node 2). After this node, the CART is split according to V BRA, and near-crashes with braking speeds of less than 12.03 m/s are sent to the left, forming node 5; the remaining cases are sent to the right, forming node 6. Based on the triggering factors and braking speed in node 6, nodes 9 and 10 are obtained depending on the potential crash type (P CRA). If P CRA denotes cut-in conflicts, the near-crashes are low-risk, with a probability of 52% (rule 10). However, if P CRA is of another type, the near-crashes are of moderate risk with a probability of 63.2% (rule 9). In node 5, the CART continues to grow according to the potential object type (O TYP). If O TYP is not a vehicle after node 5, the near-crash case is low-risk with a probability of 76.0% (node 8 and rule 8). When the O TYP is a vehicle (node 7), the CART is divided according to P CRA. From this point in the CART structure, rule interpretation is difficult because multiple variables are involved in near-crashes. However, from the CART structure shown in Fig. 5, the following results are highlighted: if P CRA is opposite driving conflict, cut-in conflict, or others, the near-crashes are low-risk with a probability of 69.2% (node 12 and rule 12). If P CRA is rear end conflict, conflict during intersection, or jump out, CART is divided by V BRA. At leaf node 14, if V BRA is higher than 6.67 m/s, the near-crashes are of moderate risk (rule 14). For node 13, CART continues to split based on the driver age into leaf nodes 15 and 16. At leaf node 15, if the driver age is less than 30 years, the driving-risk level is moderate with a probability of 57.1% (rule 15). For the rest of the driver characteristics, leaf node 16 predicts the driving-risk level involved in near-crashes as low-risk with a probability of 67.3%. From this splitting process, the driving-risk level in near-crashes can be predicted by proceeding down the CART branches until a leaf node is reached.

To further understand the performance of CART, comparisons of model predictions between the observed and predicted risk levels for the learning and testing data are summarized in Table 8. The overall model prediction accuracy for the learning data is approximately 66% and that for the testing data is approximately 62%, which is within a reasonable range compared with the other studies on traffic accident severity in which classification methods were applied. For instance, Abdelwahab and Abdel-Aty (2001) used a neural network method and achieved accuracies of 65.6% and 60.4% in the training and testing phases, respectively. de Oña et al. (2013) obtained 55% and 54% accuracy when they applied DT using different algorithms (C4.5 and CART, respectively). The prediction performance of CART demonstrates that the CART structure can reflect the pattern hidden behind naturalistic data to some extent.

The main objective is to identify the risk factors that affect the driving-risk level using CART in conjunction with the near-crash database. The statistical results listed in Table 6, CART structure shown in Fig. 5, and rules listed in Table 7 present some clues and relationships. The next section discusses in depth the risk factors that affect driving-risk level.

4.3. Risk factors affecting driving risk

The variable importance obtained from CART is used to quantify the influence of potential risk factors on driving-risk level. Table 9 lists the normalized importance of these variables. Sixteen variables influencing the driving-risk level are detected, with values varying from 100% to 0.1%. It is observed that four variables, namely, (1) velocity when braking (V BRA), (2) triggering factor (T FAC), (3) potential object type (O TYP) and (4) potential crash type (P CRA), have the largest influence on the driving-risk level. Meanwhile, the other variables such as driver age (AGE), vehicle maneuver (V MAN), second task (S TASK), barriers for opposing traffic flow (B OVE), and vehicles parked along the roadside (P VEH) are considered to have relatively less effect in our study case.

4.3.1. Velocity when braking

As shown in Table 9, V BRA is the most important variable affecting driving-risk level, which apparently does not agree with the results of previous traffic accident severity analyses. For example, lighting condition was considered to have the most important effect on the traffic accident severity (de Oña et al., 2013) and similar results were reported in Abdel-Aty (2003).
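The importance ranking rests on the Gini-based quantities of Sections 3.1 and 3.3. As a minimal Python sketch, the functions below compute a node's Gini index (assuming the standard definition, one minus the sum of squared class proportions, which matches the stated property that a pure node has index zero), the impurity decrease of Eq. (6), and the importance sum of Eq. (7). Scaling the largest importance to 100% is an assumption about how the normalized values in Table 9 are obtained, and the split records passed in are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 - sum of squared class proportions (assumed form)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease of a binary split, following Eq. (6)."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


def variable_importance(splits, n_total):
    """Eq. (7): sum of (n_t / N) * decrease over the nodes split on each variable.

    splits: iterable of (variable, n_t, decrease) records, one per split node.
    """
    vim = {}
    for variable, n_t, decrease in splits:
        vim[variable] = vim.get(variable, 0.0) + n_t / n_total * decrease
    return vim


def normalize(vim):
    """Express each importance as a percentage of the largest one (assumed scaling)."""
    top = max(vim.values())
    return {variable: 100.0 * value / top for variable, value in vim.items()}


parent = ["LR"] * 4 + ["HR"] * 4
print(gini(parent), gini_decrease(parent, parent[:4], parent[4:]))
print(normalize(variable_importance([("V_BRA", 8, 0.5), ("T_FAC", 4, 0.25)], 8)))
```

This sketch only counts the splits it is given; tree software may also credit variables through surrogate splits, which would explain importance values for variables that never appear as a primary splitter.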
Intuitively, the higher the vehicle speed, the higher is the kinetic energy of the lone-driver-vehicle system. If a potential threat is present or a sudden change in the object status occurs in the driving environment, the lone-driver-vehicle system becomes more unstable and risky, meaning that the driving-risk level involved in a near-crash case increases as the vehicle velocity increases. One direct explanation for these phenomena, in which vehicle velocity is usually not among the main factors that affect traffic accident severity, is that most traffic accident databases do not contain accurate speed information about traffic accidents (please see the database used in Chang and Chen, 2005; Chang and Wang, 2006; Li et al., 2008; Harb et al., 2009; Montella et al., 2012; Abellán et al., 2013).

On the other hand, many studies have indicated that driving speed is an important factor for road safety (Elvik et al., 2004; Wallén and Åberg, 2008). Elvik et al. (2004) pointed out that speed not only affects the severity of a crash but is also related to the risk of being involved in a crash. From this perspective, our finding that
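Table 8
Prediction result of CART model.
Risk group | Learning: observed | Learning: predicted | Learning: correctly predicted | Testing: observed | Testing: predicted | Testing: correctly predicted
Low-risk group | 319 | 371 | 253 (79.3%) | 155 | 167 | 113 (72.9%)
Moderate-risk group | 254 | 219 | 138 (54.3%) | 113 | 107 | 58 (51.3%)
High-risk group | 55 | 38 | 24 (43.6%) | 16 | 10 | 6 (37.5%)
The overall prediction accuracy is 66.1% for the learning data and 62.3% for the testing data.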
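The overall and per-group accuracies can be checked directly from the counts in Table 8. The short Python sketch below assumes that the first number in each block is the observed count and the third is the number of correctly predicted cases; the variable and function names are illustrative only.

```python
# (observed, correctly predicted) counts per risk group, copied from Table 8.
TABLE_8 = {
    "learning": {"LR": (319, 253), "MR": (254, 138), "HR": (55, 24)},
    "testing": {"LR": (155, 113), "MR": (113, 58), "HR": (16, 6)},
}


def overall_accuracy(groups):
    """Correct predictions over all observations in one data set."""
    observed = sum(obs for obs, _ in groups.values())
    correct = sum(hit for _, hit in groups.values())
    return correct / observed


def per_group_accuracy(groups):
    """Share of each observed risk group that was predicted correctly."""
    return {name: hit / obs for name, (obs, hit) in groups.items()}


for name, groups in TABLE_8.items():
    print(name, round(100 * overall_accuracy(groups), 1), per_group_accuracy(groups))
```

The per-group values make the imbalance visible: the low-risk group is recovered far more reliably than the high-risk group, which the single overall figure hides.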
V BRA is the most important variable affecting driving-risk level agrees with those of previous studies in road safety research.

Table 9
Importance of the predictor variable with CART (VIM).
Variables    Normalized importance
V BRA        100.0%
T FAC        96.7%
O TYP        82.9%
P CRA        75.9%
AGE          11.7%
V MAN        9.0%
S TASK       5.9%
B OVE        5.8%
P VEH        4.6%
N LOC        2.7%
WEA          2.3%
V HON        2.1%
B VEH        1.1%
R CON        0.5%
GEN          0.1%
V STA        0.1%

4.3.2. Triggering factors
Triggering factor (T FAC) is the second most important variable, with a normalized importance of 96.7% in the CART model (Table 9). Table 6 shows that traffic light, within the fifth predictor variable T FAC, has a significant effect on the driving-risk level in near-crashes because a relatively high proportion of near-crashes caused by sudden changes in the traffic lights occur in the moderate- or high-risk groups (55.3% and 35.0%, respectively). Rules 3 and 4 in Table 7 also support this finding. This result agrees with those of previous studies on vehicle crashes resulting from dilemma zones at signalized intersections (Rakha et al., 2007; Aoude et al., 2012).

4.3.3. Potential crash object and crash type
Crash object type (O TYP) and potential crash type (P CRA) have 82.9% and 75.9% normalized importance, respectively, in the CART model. Rules 9 and 10 in Table 7 demonstrate that at vehicle speeds greater than 12.03 m/s, the near-crashes caused by cut-in conflict (P CRA is equal to 5, see definition in Table 4) are likely associated with the moderate-risk level, whereas the near-crashes caused by other factors are likely associated with the low-risk level group. Cut-in conflict usually occurs during lane change maneuvers. Lane change is an important factor that affects driving-risk level, and a lane change can lead to a collision if the maneuver is not performed properly (Pande and Abdel-Aty, 2006).

4.3.4. Other factors
Other factors such as driver age (AGE), vehicle maneuvers (V MAN), second task (S TASK), barriers for opposing vehicles (B OVE), and vehicles parked along the roadside (P VEH) have relatively small effects on the driving-risk level in our naturalistic driving experiment conducted on Chinese roads. It should be noted that this conclusion applies to the driving environment in our naturalistic driving experiment. As the driving context changes, factors such as S TASK, B OVE, and P VEH may have significant influences on road safety.

5. Conclusions
The main objectives of this study are as follows: (1) cluster driving-risk level and (2) unveil the factors that influence the driving-risk level. Toward the first objective, we proposed a novel method to quantify the driving-risk levels in near-crash cases by clustering the braking process characteristics, namely, (1) maximum deceleration, (2) average deceleration, and (3) percentage reduction in vehicle kinetic energy. K-means cluster analysis was applied to classify the near-crashes based on driving-risk level. Toward the second objective, CART was employed to unveil the relationship among driving risk, driver/vehicle characteristics, and road environment using the near-crash database. CART provides an alternative and appropriate approach for analyzing driving-risk levels in near-crashes owing to its ability to identify hidden patterns in the data without pre-establishing a functional relationship among the variables.
Nine useful decision rules were obtained from the CART structure (Table 7). The overall model prediction accuracy for the learning data was approximately 66% and that for the testing data was approximately 62% (Table 8). These values are within the reasonable range compared with other studies on traffic accident severity. Furthermore, four variables, namely, (1) velocity when braking (V BRA), (2) triggering factors (T FAC), (3) potential object type (O TYP) and (4) potential crash type (P CRA), from CART were found to have the largest influences on the driving-risk level, which, to some extent, is in accordance with the results of some previous studies. These results validate the method proposed in this paper. It should be noted that there are some limitations of our current naturalistic driving experiment. First, it was only conducted in one city, i.e., Beijing, and we carefully designed the experiment to include all the types of roads. Because of the actual road conditions in Beijing, however, there are few curves in our experimental routes. Hence, the influence of curve alignment could not be quantified in our current database. A few previous studies, for example, Montella et al. (2012) and Montella and Liana (2015), have pointed out that the curve alignment in road types was an important factor affecting road safety. Second, the duration of the current experiment was not very long (it lasted for two months), and the weather conditions were sunny or cloudy for the most part. Intuitively, rainy weather would have a significant influence on traffic safety, as pointed out in previous studies, for instance, Abellán et al. (2013). In our current database, the influence of weather conditions on the driving risk was not fully addressed. Despite such limitations, however, it should be pointed out that in this paper, the authors proposed a novel method to quantify the driving risk in a near-crash event and to analyze the associated risk factors. The proposed method can be extrapolated to specific studies on other datasets (i.e., other infrastructure, roads, and countries).
Future research will consider individual driving risk because driving risk substantially varies among drivers and identifying factors associated with individual driver risk will further facilitate the identification of apt safety countermeasures (Guo and Fang, 2013). We would like to identify some factors such as age, gender, and driver characteristics that affect an individual driver's risk by using the naturalistic driving data. Furthermore, O TYP and P CRA are also found to have important influences on the driving-risk level in near-crashes. One question worthy of further study is whether the factors that affect the driving-risk level remain the same when sub-datasets related to vehicles or pedestrians are the focus.
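The clustering step summarized in the conclusions (grouping near-crashes by maximum deceleration, average deceleration, and percentage reduction in kinetic energy) can be illustrated with a small sketch. It is not the authors' implementation: the event values are hypothetical, the z-score standardization and the deterministic seeding are assumptions made for the example, and the kinetic-energy reduction is computed from assumed start and end speeds of the braking process, using the fact that vehicle mass cancels in the ratio.

```python
import math


def kinetic_energy_reduction(v_start, v_end):
    """Percentage drop in kinetic energy during braking (mass cancels out)."""
    return 100.0 * (v_start ** 2 - v_end ** 2) / v_start ** 2


def standardize(points):
    """Z-score each feature so that no single unit dominates the distance."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds)) for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; seeds are evenly spaced rows (an assumption for repeatability)."""
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centers[c])
        if updated == centers:
            break
        centers = updated
    return labels, centers


# Hypothetical near-crash events:
# (max deceleration m/s^2, average deceleration m/s^2, start speed m/s, end speed m/s)
raw = [
    (2.0, 1.0, 15.0, 14.0), (2.2, 1.1, 14.0, 13.0), (1.8, 0.9, 16.0, 15.0),
    (4.0, 2.5, 15.0, 11.0), (4.3, 2.7, 14.0, 10.0), (3.8, 2.4, 16.0, 12.0),
    (7.0, 5.0, 15.0, 5.0), (7.5, 5.2, 14.0, 4.0), (6.8, 4.8, 16.0, 6.0),
]
events = [(a_max, a_avg, kinetic_energy_reduction(v0, v1)) for a_max, a_avg, v0, v1 in raw]
labels, _ = kmeans(standardize(events), k=3)

# Name the clusters by their mean maximum deceleration, from mildest to harshest.
mean_peak = {c: sum(e[0] for e, lab in zip(events, labels) if lab == c) / labels.count(c)
             for c in set(labels)}
names = dict(zip(sorted(mean_peak, key=mean_peak.get), ["low", "moderate", "high"]))
risk = [names[lab] for lab in labels]
print(risk)
```

Standardizing first is a design choice for the sketch: the kinetic-energy term is expressed in percent and would otherwise outweigh the two deceleration features in the Euclidean distance.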
No. 51475254) for its support. The authors would also like to thank those who participated in the driving experiments.

References
Abdel-Aty, M., 2003. Analysis of driver injury severity levels at multiple locations using ordered probit models. J. Saf. Res. 34 (5), 597–603.
Abdelwahab, H.T., Abdel-Aty, M.A., 2001. Development of artificial neural network models to predict driver injury severity in traffic accidents at signalized intersections. Transp. Res. Rec.: J. Transp. Res. Board 1746 (1), 6–13.
Abellán, J., López, G., de Oña, J., 2013. Analysis of traffic accident severity using Decision Rules via Decision Trees. Expert Syst. Appl. 40 (15), 6047–6054.
Al-Ghamdi, A.S., 2002. Using logistic regression to estimate the influence of accident factors on accident severity. Accid. Anal. Prev. 34 (6), 729–741.
Aoude, G.S., Desaraju, V.R., Stephens, L.H., How, J.P., 2012. Driver behavior classification at intersections and validation on large naturalistic data set. Intell. Transp. Syst. IEEE Trans. 13 (2), 724–736.
Bagdadi, O., 2013. Assessing safety critical braking events in naturalistic driving studies. Transp. Res. F: Traffic Psychol. Behav. 16, 117–126.
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A., 1984. Classification and Regression Trees. CRC Press.
Chang, L.Y., Chen, W.C., 2005. Data mining of tree-based models to analyze freeway accident frequency. J. Saf. Res. 36 (4), 365–375.
Chang, L.Y., Wang, H.W., 2006. Analysis of traffic injury severity: an application of non-parametric classification tree techniques. Accid. Anal. Prev. 38 (5), 1019–1027.
Charlton, S.G., Starkey, N.J., Perrone, J.A., Isler, R.B., 2014. What’s the risk? A comparison of actual and perceived driving risk. Transp. Res. F: Traffic Psychol. Behav. 25, 50–64.
Chen, H., Cao, L., Logan, D.B., 2012. Analysis of risk factors affecting the severity of intersection crashes by logistic regression. Traffic Inj. Prev. 13 (3), 300–307.
de Oña, J., López, G., Abellán, J., 2013. Extracting decision rules from police accident reports through decision trees. Accid. Anal. Prev. 50, 1151–1160.
Dingus, T.A., Klauer, S.G., Neale, V.L., Petersen, A., Lee, S.E., Sudweeks, J.D., Knipling, R.R., 2006. The 100-Car Naturalistic Driving Study Phase II-Results of the 100-Car Field Experiment (No. HS-810 593).
Donmez, B., Boyle, L.N., Lee, J.D., 2009. Differences in off-road glances: effects on young drivers’ performance. J. Transp. Eng. 136 (5), 403–409.
DTM-China (Ministry of Public Security, Department of Traffic Management), 2010. Annual Report of Road Traffic Accidents Statistics in P.R. China. Scientific Research Institute of Traffic Management, Ministry of Public Security, Beijing (in Chinese).
Elvik, R., Christensen, P., Amundsen, A., 2004. Speed and road accidents. An evaluation of the Power Model. TØI Rep. 740, 2004.
Guo, F., Fang, Y., 2013. Individual driver risk assessment using naturalistic driving data. Accid. Anal. Prev. 61, 3–9.
Guo, F., Klauer, S.G., Hankey, J.M., Dingus, T.A., 2010. Near crashes as crash surrogate for naturalistic driving studies. Transp. Res. Rec.: J. Transp. Res. Board 2147 (1), 66–74.
Harb, R., Yan, X., Radwan, E., Su, X., 2009. Exploring precrash maneuvers using classification trees and random forests. Accid. Anal. Prev. 41 (1), 98–107.
Jarašūniene, A., Jakubauskas, G., 2007. Improvement of road safety using passive and active intelligent vehicle safety systems. Transport 22 (4), 284–289.
Jonasson, J.K., Rootzén, H., 2014. Internal validation of near-crashes in naturalistic driving studies: a continuous and multivariate approach. Accid. Anal. Prev. 62, 102–109.
Jovanis, P.P., Aguero-Valverde, J., Wu, K.F., Shankar, V., 2011. Analysis of naturalistic driving event data. Transp. Res. Rec.: J. Transp. Res. Board 2236 (1), 49–57.
Kaufman, L., Rousseeuw, P.J., 2009. Finding Groups in Data: An Introduction to Cluster Analysis, vol. 344. John Wiley & Sons.
Li, X., Lord, D., Zhang, Y., Xie, Y., 2008. Predicting motor vehicle crashes using support vector machine models. Accid. Anal. Prev. 40 (4), 1611–1618.
Lord, D., Mannering, F., 2010. The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transp. Res. A: Policy Pract. 44 (5), 291–305.
Lu, G., Cheng, B., Lin, Q., Wang, Y., 2012. Quantitative indicator of homeostatic risk perception in car following. Saf. Sci. 50 (9), 1898–1905.
Malta, L., Miyajima, C., Takeda, K., 2009. A study of driver behavior under potential threats in vehicle traffic. Intell. Transp. Syst. IEEE Trans. 10 (2), 201–210.
Montella, A.I., Liana, L., 2015. Safety performance functions incorporating design consistency variables. Accid. Anal. Prev. 74, 133–144.
Montella, A., Aria, M., D’Ambrosio, A., Mauriello, F., 2011. Data-mining techniques for exploratory analysis of pedestrian crashes. Transp. Res. Rec. 2237, 107–116.
Montella, A., Aria, M., D’Ambrosio, A., Mauriello, F., 2012. Analysis of powered two-wheeler crashes in Italy by classification trees and rules discovery. Accid. Anal. Prev. 49, 58–72.
Montella, et al., 2013. Crash databases in Australasia, the European Union, and the United States: review and prospects for improvement. Transp. Res. Rec. 2386, 128–136.
Moreno, A.T., García, A., 2013. Use of speed profile as surrogate measure: effect of traffic calming devices on crosstown road safety performance. Accid. Anal. Prev. 61, 23–32.
Pande, A., Abdel-Aty, M., 2006. Assessment of freeway traffic parameters leading to lane-change related collisions. Accid. Anal. Prev. 38 (5), 936–948.
Rakha, H., El-Shawarby, I., Setti, J.R., 2007. Characterizing driver behavior on signalized intersection approaches at the onset of a yellow-phase trigger. Intell. Transp. Syst. IEEE Trans. 8 (4), 630–640.
Sepulcre, M., Gozalvez, J., Hernandez, J., 2013. Cooperative vehicle-to-vehicle active safety testing under challenging conditions. Transp. Res. C: Emerg. Technol. 26, 233–255.
Takeda, K., Hansen, J.H., Boyraz, P., Malta, L., Miyajima, C., Abut, H., 2011. International large-scale vehicle corpora for research on driver behavior on the road. Intell. Transp. Syst. IEEE Trans. 12 (4), 1609–1623.
UMTRI and GMRDC, 2005. Automotive Collision Avoidance System Field Operational Test Report: Methodology And Results. Final research report. NHTSA, US Department of Transportation.
Wallén, Warner H., Åberg, L., 2008. Drivers’ beliefs about exceeding the speed limits. Transp. Res. F: Traffic Psychol. Behav. 11 (5), 376–389.
Wang, J.Q., Li, S.E., Zheng, Y., Lu, X.Y., 2015. Longitudinal collision mitigation via coordinated braking of multiple vehicles using model predictive control. Integr. Comput. Aided Eng. 22 (2), 171–185.
Wu, K.F., Jovanis, P.P., 2012. Crashes and crash-surrogate events: exploratory modeling with naturalistic driving data. Accid. Anal. Prev. 45, 507–516.
Wu, K.F., Jovanis, P.P., 2013. Screening naturalistic driving study data for safety-critical events. Transp. Res. Rec.: J. Transp. Res. Board 2386 (1), 137–146.
Wu, K.F., Aguero-Valverde, J., Jovanis, P.P., 2014. Using naturalistic driving data to explore the association between traffic safety-related events and crash risk at driver level. Accid. Anal. Prev. 47, 210–218.
Young, W., Sobhani, A., Lenné, M.G., Sarvi, M., 2014. Simulation of safety: a review of the state of the art in road safety simulation modelling. Accid. Anal. Prev. 66, 89–103.
Zheng, Y., Li, S., Wang, J., Wang, L., Li, K., 2014. Influence of information flow topology on closed-loop stability of vehicle platoon with rigid formation. In: 17th Intelligent Transportation System Conference, IEEE, pp. 2094–2100.