

Driving risk assessment using near-crash database through data mining of tree-based model

Article in Accident Analysis & Prevention · August 2015

DOI: 10.1016/j.aap.2015.07.007 · Source: PubMed


Accident Analysis and Prevention 84 (2015) 54–64

Contents lists available at ScienceDirect

Accident Analysis and Prevention

journal homepage: www.elsevier.com/locate/aap

Driving risk assessment using near-crash database through data mining of tree-based model

Jianqiang Wang a, Yang Zheng a, Xiaofei Li a, Chenfei Yu a, Kenji Kodaka b, Keqiang Li a,∗

a State Key Laboratory of Automotive Safety and Energy, Tsinghua University, Beijing 10084, China
b Honda R&D Co. Ltd., Automobile R&D Center, Tochigi 321-3393, Japan

Article info

Article history:
Received 5 November 2014
Received in revised form 11 May 2015
Accepted 3 July 2015

Keywords:
Naturalistic driving study
Driving risk
Near-crash
Classification and regression tree (CART)
K-mean cluster

Abstract

This paper considers a comprehensive naturalistic driving experiment to collect driving data under potential threats on actual Chinese roads. Using acquired real-world naturalistic driving data, a near-crash database is built, which contains vehicle status, potential crash objects, driving environment and road types, weather condition, and driver information and actions. The aims of this study are summarized into two aspects: (1) to cluster different driving-risk levels involved in near-crashes, and (2) to unveil the factors that greatly influence the driving-risk level. A novel method to quantify the driving-risk level of a near-crash scenario is proposed by clustering the braking process characteristics, namely maximum deceleration, average deceleration, and percentage reduction in vehicle kinetic energy. A classification and regression tree (CART) is employed to unveil the relationship among driving risk, driver/vehicle characteristics, and road environment. The results indicate that the velocity when braking, triggering factors, potential object type, and potential crash type exerted the greatest influence on the driving-risk levels in near-crashes.

© 2015 Elsevier Ltd. All rights reserved.

∗ Corresponding author.
E-mail address: [email protected] (K. Li).

1. Introduction

1.1. Background

In the past two decades, significant progress has been made in all aspects of vehicle safety systems, and experts from both academia and industry have conducted extensive research on vehicle safety (Young et al., 2014; Sepulcre et al., 2013; Takeda et al., 2011; Zheng et al., 2014). Efforts that aim to advance vehicle safety systems can mainly be divided into two areas (Jarašūniene and Jakubauskas, 2007): (1) active safety, which aims to avoid accidents, and (2) passive safety, which helps reduce injuries in an accident. The active safety approach forecasts future driving states based on vehicle dynamics, infrastructure, and driver awareness (Wang et al., 2015), whereas the passive safety approach mainly focuses on enhancing vehicular safety systems such as seat belts, airbags and strong body structures (Jarašūniene and Jakubauskas, 2007). Although many encouraging achievements have been made, the number of road fatalities still remains unacceptably high, and traffic accidents are considered a major public health problem (DTM-China, 2010).

As the responsibility for traffic accidents involves the vehicles, drivers, and roadways, we must not only improve the safety performance of vehicles, but also better understand the factors that influence driving risk and identify the factors that result in accidents to make road transportation much safer. Many studies have attempted to better understand the factors that affect the probability and injury severity of crashes (Lord and Mannering, 2010). From a methodological standpoint, logit-based models are some of the most practical tools used for analyzing accident severity (Chen et al., 2012; Al-Ghamdi, 2002). Recently, non-parametric methods and data-mining techniques have been widely used to identify the factors associated with accident severity (Chang and Chen, 2005; Chang and Wang, 2006; Montella et al., 2011, 2012; Li et al., 2008; Harb et al., 2009). For example, Chang and Chen (2005) and Chang and Wang (2006) proposed a classification and regression tree (CART) model to establish the relationship among injury severity, driver/vehicle characteristics, and accident variables, indicating that vehicle type is a very important variable associated with crash severity. Li et al. (2008) evaluated the application of a support vector machine (SVM) model for predicting motor vehicle crashes, and showed that SVM models performed better than traditional negative binomial models. Montella et al. (2012) employed a decision tree and association rules to analyze accidents involving powered two-wheelers, and demonstrated that the curve alignment, rural areas, run-off-the-road crashes, night time, and rainy weather were


significantly associated with accident severity. These studies provided some insights into the factors that affect the likelihood of a vehicle accident. However, they were typically based on official traffic accident statistics, which have two major limitations: (1) they lack detailed driving data, and (2) they are difficult to collect and acquire (usually collected by traffic police agencies). Hence, the aforementioned studies usually do not consider the relationship between the accident severity and detailed driving data (e.g., vehicle speed, acceleration, braking, and steering information).

Recent developments in vehicle instrumentation techniques have made monitoring naturalistic driving behavior and obtaining detailed driving data both technologically possible and economically feasible. For instance, NHTSA sponsored the project "100-Car Naturalistic Driving Study", a large-scale instrumented-vehicle study to collect naturalistic driving data in the United States (Dingus et al., 2006). A series of technology tests of safety equipment was conducted in Michigan using the naturalistic driving technique (UMTRI and GMRDC, 2005). Takeda et al. (2011) reported a comprehensive project involving collecting large amounts of driving data on the actual road to study driver behavior and the accident-causation mechanism. With access to naturalistic driving data, traffic safety-related events could be observed and measured more precisely (Wu et al., 2014). Meanwhile, many researchers have proposed new methods and gained new insights into traffic safety (e.g., Malta et al., 2009; Aoude et al., 2012; Guo et al., 2010; Jovanis et al., 2011; Jonasson and Rootzén, 2014). For instance, Malta et al. (2009) proposed a method to improve the understanding of driver behavior under potential threats using a large real-world driving database. Guo et al. (2010) assessed the factors associated with individual driver risk using naturalistic driving data. For naturalistic driving data, crash surrogates have received extensive research attention (see Guo et al., 2010; Wu and Jovanis, 2012, 2013; Moreno and García, 2013, for examples), because the number of crashes observed with naturalistic driving is typically small. Near-crash is frequently used as a surrogate measure for assessing the safety impact. For instance, Guo et al. (2010) employed two metrics, namely, precision and bias of risk estimation, to assess near-crashes, and indicated that using near-crashes as a crash surrogate could provide definite benefit when data about a sufficient number of crashes are not available. Recently, Wu and Jovanis (2013) proposed a multi-stage modeling framework to search through naturalistic driving data and extract near-crash events. All of these studies have demonstrated that naturalistic driving data could provide more controllable laboratory data as a useful supplement for traffic safety studies, and have the potential to further our understanding of crash causality, as well as improve road safety. Naturalistic driving data could not only provide more detailed driving exposure data, but also offer the possibility of identifying more plausibly risky driving events and the associated factors.

1.2. Preview of the key results

This study focuses on the analysis of factors that influence driving risk using a naturalistic driving database. This database was obtained through designing a novel transcription protocol to code naturalistic driving data, which have two distinguishing features: (1) drivers drive in their normal states and (2) the instruments installed in vehicles can record drivers and road environments continuously during driving (Jovanis et al., 2011). The naturalistic database used herein contains only near-crash events because no actual crashes happened during the naturalistic experiments conducted on actual Chinese roads. Near-crashes refer to cases where drivers execute rapid evasive maneuvers (i.e., emergency braking and/or steering operation) when facing a potential driving risk or a potential threat; in the absence of such an action, a real crash may occur. In the experiments, near-crash events in naturalistic driving were identified by detecting unusual vehicle kinematics using accelerometers and gyroscopic sensors installed in the experimental vehicle (Wu and Jovanis, 2013; Wu et al., 2014).

Recently, a few studies have focused on the assessment of risk in the driving environment, for example, individual driving risk (Guo and Fang, 2013) and momentary risk perception of a driving situation (Lu et al., 2012; Charlton et al., 2014). These studies employed indicators such as driver attributes and vehicle kinetic parameters to represent the risk level. In addition, critical braking and speed profiles were proposed to characterize near-crashes in Moreno and García (2013) and Bagdadi (2013). In the present paper, we propose a novel method to quantify the driving risk involved in a near-crash event. First, the driving-risk level is represented by the braking process characteristics, namely (1) maximum deceleration, (2) average deceleration, and (3) percentage reduction in vehicle kinetic energy. Then, the K-means cluster method is employed to classify near-crashes into different risk levels based on the three aforementioned braking process features. Finally, CART is employed for exploring the relationship among driving risk, driver/vehicle characteristics, and road environments. Identifying the factors associated with driving risk and further predicting high-risk driving scenarios will enable the adoption of proper safety countermeasures to reduce probable hazardous situations for high-risk groups, and thus improve overall driving comfort and safety. By analyzing driver characteristics, road conditions, and vehicle characteristics using the near-crash database, we obtained new insights into driving risk. The results indicate that the velocity when braking (V BRA), triggering factors (T FAC), potential object type (O TYP), and potential crash type (P CRA) had the greatest influence on the driving-risk level involved in near-crashes. These results can improve our understanding of the factors that affect driving risk, and help create policies and countermeasures to improve driving safety and comfort.

The remainder of this paper is organized as follows: Section 2 describes the near-crash database and presents some preparations, including experiment design, labeling protocol and driving-risk definition. The methodology employed in this study is presented in Section 3. Section 4 discusses the results, and some concluding remarks are given in Section 5.

2. Database and preparation

To build a firm foundation for the assessment of driving risk and enhancing driving safety, two components are essential: (1) real-driving data and (2) careful experimental design. Data collection is performed using naturalistic and low-intervention methods under actual traffic conditions. This section introduces the experimental equipment and experiment design, describes the near-crash database, and presents the definition and cluster analysis of driving risk.

2.1. Data-collection equipment and experiment design

2.1.1. Data-collection equipment

The naturalistic driving experiments were conducted using a Honda Crosstour, which was provided by Honda. The vehicle was equipped with instruments to collect driver, vehicular, and road data under real-world conditions. The data-collection system installed in the experimental vehicle included two driving recorders (DR) and four cameras (Fig. 1). The four cameras were used to record detailed video scenes including (1) forward view, (2) right-side forward view, (3) left-side forward view, and (4) driver's facial expression. One DR recorded data obtained by sensors, including GPS, brake signal, steering signal, three-axis

acceleration information, and detailed video collected by the facial-expression and forward-view cameras. The other DR recorded the video collected by both the left-side and the right-side forward-view cameras to ensure convenient coding of the incidents.

Fig. 1. Experimental vehicle and equipment.

In the present study, we focused on analyzing the driving risk of near-crash scenarios in the naturalistic driving experiments. Near-crash implies that the driver performs a rapid evasive maneuver (i.e., emergency braking and/or steering operation), failing which a real crash may occur. In the experiments, near-crash events in naturalistic driving were identified by detecting unusual vehicle kinematics. For the experimental data collection under the Chinese traffic environment, when the acceleration of the vehicle reached a threshold value (longitudinal: −1.5 m/s², lateral: −1 m/s²), the data-collection system recorded the vehicle state (i.e., speed, brake signal, steering signal, and three-axis acceleration), video sequence of the driver's expression, and video sequence of the events happening at the time. The recording time started approximately 10 s before the trigger point and lasted until 5 s after the trigger point. This means that each typical near-crash case has an approximately 15 s signal and video sequence. Fig. 2 shows a typical example of recorded driving signals. Note that the recorded video data must be reviewed manually to decide whether an event triggered by kinematic thresholds is actually safety-critical. If not, then such an event should not be defined as a near-crash and should be deleted from the dataset. For our experimental data, the recorded cases were checked mutually by Tsinghua and Honda.

Fig. 2. Example of recorded driving signals for typical near-crash case. (Plot of braking signal, longitudinal acceleration, and lateral acceleration (m/s²) against time (s).)

2.1.2. Experiment design

The naturalistic driving route contained all road types, i.e., inter-city highways (all structured road and usually low traffic volume), city ring road (mostly structured road and may have congestion), inner-city road (mixed traffic conditions and may be crowded with bicycles and motorcycles), and rural road (poor road structure and may be crowded with pedestrians). A total of 31 drivers, who signed the informed consent form, participated in the naturalistic driving experiments in their normal driving states. The experiment lasted 60 days, 6–7 h/day, resulting in naturalistic driving time and naturalistic driving range of approximately 400 h and over 8500 km, respectively. The schedule of the entire experiment plan is summarized in Table 1. Table 2 lists the naturalistic driving distance on the different road types considered herein.

Table 1
Schedule of entire experiment.

Time period   Morning   Afternoon   Night
Hours         140       220         50

Table 2
Road types in experiment.

Road type     1      2      3      4
Kilometers    1800   1210   4100   1650

1, highway; 2, city ring road; 3, inner-city road; 4, rural road.

Among the 31 drivers, 9 were female and 22 were male; all had regular driving licenses. The participants' average age was 43 years (ages ranging from 25 to 67 years) and they had held a driving license for a mean period of 16 years (ranging from 3 to 48 years).

2.2. Labeling of near-crash database

Altogether, 912 near-crash events were recorded throughout the aforementioned 60-day naturalistic driving experiment. The distribution of these near-crashes by road type is summarized in Table 3. Deciding the protocol for labeling the multi-modal information is critical for properly associating near-crash driving situations with recorded driving state signals and videos. Following previous studies (Wu et al., 2014; Montella et al., 2012; Takeda et al., 2011) and considering actual traffic situations on Chinese roads, a novel data-transcription protocol that considers a comprehensive cross section of the factors that could affect the drivers and their responses is proposed in this paper. The proposed protocol comprises the following five major categories:

1) Vehicle status.
2) Potential crash objects.
3) Driving environment and road types.
4) Weather condition.
5) Driver information and driver actions.

Table 3
Near-crashes on different road types.

Road type     1     2     3     4
Number        39    246   489   138

1, highway; 2, city ring road; 3, inner-city road; 4, rural road.

The designed transcription protocol is comprehensive and contains important attributes that describe the conditions contributing to driving risk, providing potential for analyzing the relationship among driving risk, driver/vehicle characteristics, and road

Table 4
Definition of transcription protocol.

Variable                                      Code    Type         Description

Vehicle status
Velocity when braking                         V BRA   Continuous   Vehicle speed when driver triggers braking signal or turn point of the acceleration signal (m/s)
Maximum deceleration                          D MAX   Continuous   Maximum deceleration during emergency braking (m/s²)
Time interval of braking                      T IN    Continuous   Time interval between braking signal trigger and time point of maximum deceleration (s)
Velocity reduction                            V RED   Continuous   Vehicle-speed reduction from braking signal trigger to time point of maximum deceleration (m/s)
Vehicle status before braking                 V STA   Qualitative  1: Deceleration, 2: acceleration, 3: constant speed
Vehicle maneuver                              V MAN   Qualitative  1: Straight motion, 2: right turn, 3: left turn, 4: lane change, 5: others

Potential crash object
Crash object type                             O TYP   Qualitative  1: Vehicle, 2: single-track vehicle (motorcycle and bicycle), 3: pedestrian, 4: others (e.g., barrier block)
Potential crash type                          P CRA   Qualitative  1: Rear end, 2: conflict in intersection, 3: pedestrian conflict, 4: opposite driving conflict, 5: cut-in conflict, 6: others
Triggering factors                            T FAC   Qualitative  0: Sudden change of object status, 1: traffic light, 2: lane reduction, 3: lane change, 4: active braking, 5: others

Driving environment and road type
Near-crash location                           N LOC   Qualitative  1: Intersection, 2: non-intersection
Road condition                                R CON   Qualitative  1: Structure road, 2: normal road, 3: hybrid road, 4: rural road
Parking vehicle along road side               P VEH   Qualitative  0: No, 1: yes
Safety barriers for opposing vehicles         B OVE   Qualitative  0: No, 1: yes
Safety barriers for vehicles and pedestrians  B VEH   Qualitative  0: No, 1: yes

Weather condition
Weather                                       WEA     Qualitative  1: Sunny, 2: cloudy, 3: others
Light condition                               L CON   Qualitative  1: Daylight, 2: dusk

Driver information and actions
Gender                                        GEN     Qualitative  1: Male, 2: female
Age                                           AGE     Continuous   Driver age (years). Further categorized into five groups, 1: 0–30, 2: 31–40, 3: 41–50, 4: 51–60, 5: >60
Time span with driving license                T DIR   Continuous   Time period of possessing valid driving license (years)
Steering light                                S LIG   Qualitative  0: No, 1: yes
Vehicle horns                                 V HON   Qualitative  0: No, 1: yes
Second task                                   S TASK  Qualitative  0: No, 1: talking, 2: others

environment. Graduate students with driving licenses served as volunteer taggers to manually label the recorded 912 near-crashes according to the designed transcription protocol. Finally, we developed the near-crash database. The transcription protocol is defined in Table 4. It should be noted that the specific definitions of each item in Table 4 are based on the actual characteristics of near-crash events and may differ from the protocols for coding standard crash events, for example, that in Montella et al. (2013).

2.3. Definition and cluster of driving risks

The primary risk measure in vehicle safety evaluation is crash occurrence. Many studies have been conducted to identify the factors that significantly influence the injury severity of crashes using the logit-based model and some related data-mining techniques such as decision tree and SVM. However, research on naturalistic driving risk in the traffic and human-factor field has been limited.

Fig. 3. Key features of driving-risk level. (Plot of braking signal and longitudinal acceleration (m/s²) against time (s), marking t0, t1, and amin.)

In the present paper, driving risk is defined as a potential threat that may cause vehicle crashes or other accidents. Usually, the consequence of driving risk for a driver in his/her normal state is mainly reflected by rapid evasive maneuvers (i.e., emergency braking and/or steering operation), which are employed by many studies on naturalistic driving to identify near-crashes, for example, Guo et al. (2010), Wu and Jovanis (2013), Wu et al. (2014), and Moreno and García (2013). In the naturalistic driving experiments conducted on Chinese roads, we found that nearly all the near-crashes had large longitudinal deceleration, implying that the drivers tended to adopt a rapid braking maneuver to avoid a potential crash. Hence, the driving-risk level was represented by the braking process characteristics. Intuitively, the driving risk is higher if the braking maneuver is performed with greater urgency in a near-crash. By clustering braking process characteristics, this paper proposes a novel method to quantify the driving risk involved in a near-crash event. Fig. 3 shows the key points for defining a typical deceleration curve during braking. The following three features are adopted to represent the driving-risk level of a typical near-crash case:

1) Maximum deceleration during braking process $a_{\min}$.
2) Average deceleration $a_{\mathrm{average}}$ from the braking trigger point $t_0$ to the point of maximum deceleration $t_1$.
3) Percentage reduction in vehicle kinetic energy $\eta_E$ from $t_0$ to $t_1$.
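As a rough illustration of the three features listed above, the following sketch extracts them from one sampled near-crash trace. It assumes time-stamped speed and acceleration samples, takes the average deceleration as the velocity change divided by the elapsed time between the trigger and the point of maximum deceleration, and takes the energy reduction as 1 − (v(t1)/v(t0))², in which the vehicle mass cancels. The names `BrakingFeatures` and `braking_features` and the sample trace are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BrakingFeatures:
    a_min: float      # most negative longitudinal acceleration (m/s^2)
    a_average: float  # mean acceleration from t0 to t1 (m/s^2)
    eta_e: float      # fractional kinetic-energy reduction from t0 to t1


def braking_features(times, speeds, accels, t0_index):
    """Compute the three braking-process features for one near-crash trace.

    times, speeds, accels are equal-length lists (s, m/s, m/s^2);
    t0_index is the sample at which the brake signal is triggered.
    """
    if not (len(times) == len(speeds) == len(accels)):
        raise ValueError("signals must have equal length")
    # t1 is the sample of maximum deceleration, i.e. minimum acceleration,
    # searched from the braking trigger onward.
    t1_index = min(range(t0_index, len(accels)), key=lambda i: accels[i])
    if t1_index == t0_index:
        raise ValueError("maximum deceleration coincides with the trigger")
    v0, v1 = speeds[t0_index], speeds[t1_index]
    if v0 <= 0:
        raise ValueError("speed at the trigger must be positive")
    a_average = (v1 - v0) / (times[t1_index] - times[t0_index])
    eta_e = 1.0 - (v1 / v0) ** 2  # vehicle mass cancels out of the ratio
    return BrakingFeatures(accels[t1_index], a_average, eta_e)


# Hypothetical trace sampled at 2 Hz; braking is triggered at index 1.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
speeds = [15.0, 15.0, 14.0, 12.0, 11.0]
accels = [0.0, -1.0, -3.0, -4.5, -2.0]
features = braking_features(times, speeds, accels, t0_index=1)
print(features)
```

In this made-up trace the maximum deceleration occurs 1 s after the trigger, while the speed drops from 15 to 12 m/s, so the average deceleration is −3 m/s² and about 36% of the kinetic energy is shed.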

The average deceleration $a_{\mathrm{average}}$ is calculated as follows:

\[ a_{\mathrm{average}} = \frac{1}{t_1 - t_0} \int_{t_0}^{t_1} a(t)\,dt = \frac{1}{t_1 - t_0} \left[ v(t_1) - v(t_0) \right], \tag{1} \]

where $v(t)$ and $a(t)$ denote the vehicle's velocity and acceleration, respectively.

The percentage reduction in vehicle kinetic energy $\eta_E$ is calculated as follows:

\[ \eta_E = \frac{\tfrac{1}{2} m v^2(t_0) - \tfrac{1}{2} m v^2(t_1)}{\tfrac{1}{2} m v^2(t_0)} = 1 - \left( \frac{v(t_1)}{v(t_0)} \right)^2, \tag{2} \]

where $m$ denotes the vehicle mass.

Hence, the main criterion in evaluating the driving-risk level, namely, the braking process features, is obtained for each near-crash incident:

\[ X = \left[ a_{\min},\; a_{\mathrm{average}},\; \eta_E \right]^{T}. \tag{3} \]
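Feature vectors of the form in Eq. (3) are the input to the clustering step that groups near-crashes by risk level. The following is a minimal K-means sketch (Lloyd's algorithm) in plain Python, offered only as an illustration: the sample feature values, the hand-picked initial centroids, and the function names are assumptions, and the features are clustered without any rescaling, which may or may not match the authors' procedure. The cluster whose centroid has the most negative a_min is then labeled as the high-risk group.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean_point(members)
    return labels, centroids


# Hypothetical feature vectors (a_min, a_average, eta_E), one per near-crash.
points = [
    (-1.8, -0.9, 0.28), (-2.1, -1.1, 0.33), (-1.9, -1.0, 0.31),
    (-3.2, -1.7, 0.55), (-3.4, -1.8, 0.58), (-3.1, -1.6, 0.57),
    (-5.2, -3.0, 0.65), (-5.6, -3.2, 0.68),
]
labels, centroids = kmeans(points, [points[0], points[3], points[6]])

# Rank clusters by centroid a_min: least negative is low risk,
# most negative (strongest braking) is high risk.
order = sorted(range(len(centroids)), key=lambda i: centroids[i][0], reverse=True)
risk_names = dict(zip(order, ("low", "moderate", "high")))
risk = [risk_names[lab] for lab in labels]
print(risk)
```

Because K-means depends on its starting centroids, a practical analysis would typically repeat the procedure from several initializations and keep the partition with the smallest within-cluster sum of squares.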

Cluster analysis is a valid approach for classifying driving risks involved in different near-crashes into different risk levels and has been used for assessing individual driver risk (Guo and Fang, 2013; Donmez et al., 2009). In this study, the K-means cluster method, which is used widely for cluster analysis in data mining, is employed to classify the driving risks involved in near-crashes into different risk groups based on the proposed feature X. Using a pre-determined number of clusters, the K-means cluster method partitions the observations into k clusters, where each observation belongs to the cluster whose mean is closest to its value (Kaufman and Rousseeuw, 2009). The K-means method minimizes the within-cluster sum of squares:

\[ \underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{X_j \in S_i} \lVert X_j - \mu_i \rVert^2, \tag{4} \]

where $X = [X_1, X_2, \ldots, X_n]$ is the set of observed data, in which each $X_i = [a_{\min}, a_{\mathrm{average}}, \eta_E]_i^{T}$ is the feature vector of one near-crash in the context of this paper; $S = [S_1, S_2, \ldots, S_k]$ represents the set of k clusters; and $\mu_i$ denotes the mean point of cluster set $S_i$.

The driving-risk level in each near-crash case is placed in one of the following three groups: (1) low-risk group, (2) moderate-risk group, and (3) high-risk group. Near-crashes in the cluster with the highest maximum deceleration are placed in the high driving-risk group. The output of cluster analysis is shown in Fig. 4. Table 5 summarizes the statistical characteristics of the three driving-risk groups. The distribution of near-crashes belonging to the different risk groups follows a pyramid structure, which means that the high-risk group has the fewest events, whereas the low-risk group has the most. We can observe that the maximum deceleration of the high-risk group is more than two times that of the low-risk group and the maximum deceleration of the moderate-risk group is much higher than that of the low-risk group, which makes the cluster result reasonable.

Fig. 4. Cluster result of driving risk. (Three-dimensional scatter of −a_min (m/s²), −a_average (m/s²), and η_E, with points marked as low, moderate, or high risk.)

3. Methodology

The aims of this study are as follows: (1) cluster near-crash cases by driving-risk level, and (2) assess the factors that influence the driving-risk level. Toward the first objective, feature extraction and K-means analysis were discussed in Section 2. For the second objective, classification and regression tree (CART) is employed to explore the relationship among driving risk, driver/vehicle characteristics, and road environment by using the naturalistic driving database obtained in Section 2. The details of the decision tree model and analysis techniques are discussed in this section.

3.1. Decision trees

Decision tree (DT) models are nonlinear and non-parametric data-mining tools, which can be used for supervised classification and regression problems. DTs are usually presented graphically as hierarchical structures, making them easy to understand. The main idea is to generate a DT using the known independent and target variables of a training dataset and then use the generated DT to predict the target variable of a new dataset. The DT structure can provide some insights into the relationship between the independent and target variables. Depending on the target variable, a classification tree (the target variable is discrete) or a regression tree (the target variable is continuous) is generated. This paper aims to model the driving risk involved in a near-crash event as discrete levels (low, moderate, and high), as discussed in Section 2. Hence, a classification tree is developed.

3.1.1. Decision tree structure

The main components of a DT include decision nodes, branches, and leaf nodes. Within a DT structure, each decision node represents a feature variable, and each branch stands for one of the states of this feature variable, which are based on the decision rules. The leaf node specifies the expected value of the target variable.

DTs are built recursively by partitioning a full dataset (denoted by the root node) into a few small subsets using split criteria. The split criteria usually maximize the "purity" of the node dataset. Each subset is split until a pure state in the subset is reached such that its "purity" cannot be improved or the "purity" has reached a desired value. Pure subsets have no branches and no successor nodes. Thus, these subsets are called terminal or leaf nodes. When a new case or instance occurs, we can make a decision or prediction about the state of the case using its features and the tree structure. This procedure shows that the DT can be used to classify new cases, and the model structure can help us better understand the pattern behind the raw data.

3.1.2. Gini index and pruning

Different splitting indexes are available, and they account for the main differences among DT-building procedures. One of the best-known splitting indexes is the Gini index, which is adopted in the CART system. Given node dataset Y, the Gini index is calculated as follows:

\[ \mathrm{Gini}(Y) = 1 - \sum_{i} \left[ p(Y = i) \right]^2, \tag{5} \]

Table 5
Characteristics of driving-risk groups.

Risk groups Number of near-crash cases Percentage Mean of braking process features

amin (m/s2 ) aaverage (m/s2 ) E

Low-risk 474 52.0% −1.931 −1.027 30.9%

Moderate-risk 367 40.2% −3.278 −1.717 56.6%
High-risk 71 7.8% −5.385 −3.125 66.1%

where p(Y = i) is the proportion of observations in node dataset Y belonging to class i. If all observations in one node belong to one class, the Gini index of that node is zero, which means that the node is pure and has reached a homogeneous state.
The node-splitting criterion based on the Gini index aims to obtain the maximum decrease in the impurity of node dataset Y by finding the best partition x* of observations, and then partitions node dataset Y into two child node subsets Y_l and Y_r, as follows:

$$\max_{x \in X} \Delta\mathrm{Gini}(Y, x), \qquad \Delta\mathrm{Gini}(Y, x) = \mathrm{Gini}(Y) - p(Y_l)\,\mathrm{Gini}(Y_l) - p(Y_r)\,\mathrm{Gini}(Y_r) \qquad (6)$$

where ΔGini(Y, x) represents the decrease in impurity; x ∈ X denotes the set of splits generated by all features; Y_l and Y_r are, respectively, the left and right child nodes of node dataset Y; and p(Y_l) and p(Y_r) are the proportions of observations in node dataset Y belonging to the left and right child nodes, respectively.
Tree growing is arrested based on two criteria: (1) minimum decrease of impurity equals 0.001; and (2) maximum number of tree levels equals six. CART searches for the best split that maximizes (6). From this procedure, CART can be created recursively, which usually leads to saturation and overfitting of the training dataset. Saturated trees do not perform well when applied to a new case, which means that the tree structure overfits the information contained in the training data, including the useless noise information, and it cannot reveal the real pattern behind the data. Hence, the data are usually divided into two subsets: (1) learning (or training) set and (2) testing (or validation) set. The training set is used to construct the tree, and the testing set is used to validate the tree performance. The saturated tree should be pruned according to the cost-complexity algorithm that achieves a compromise between predictive accuracy and tree complexity. The main idea is to remove the branches and merge the nodes that contribute little to the predictive value of a tree. A more detailed description of the CART analysis and related applications can be found in Breiman et al. (1984). Analyses were performed using the SPSS software application.

3.2. Rule extraction

The CART structure can be transformed into decision rules of the 'IF–THEN' type to extract potentially useful information, which can be understood easily and intuitively by engineers and policymakers. Many researchers using DTs to analyze traffic accident severity have extracted useful rules for discovering behaviors that occur within a specified dataset (please see Montella et al., 2011, 2012; de Oña et al., 2013; Abellán et al., 2013 and the references therein).
The decision rules extracted from CART take a logic conditional structure 'X → C', where X denotes a set of states of several attribute variables and C is the only state of the target variable, which is driving-risk level in our case. In CART, rules (IF–THEN structure) begin with the tree-root node, and each variable used in the splitting criterion for node partition generates the IF of the rules, which ends in leaf nodes with a THEN status. The THEN status is the status of leaf nodes that take the largest number of observations, which, in our case, is the driving-risk level.

3.3. Variable importance

One of the outputs of the CART technique is variable importance, which characterizes a variable's ability to influence the model. The relative importance of variable x_j is calculated as follows:

$$V_{im}(x_j) = \sum_{t=1}^{T} \frac{n_t}{N}\, \Delta\mathrm{Gini}(Y_t, x_j), \qquad (7)$$

where V_im(x_j) denotes the relative importance of variable x_j; ΔGini(Y_t, x_j) is the reduction in the Gini index obtained by splitting variable x_j at node t, according to (6); n_t is the total number of observations in dataset Y_t belonging to node t; N is the total number of observations; and T is the number of nodes in CART. The variable with the largest value according to (7) is regarded as the most important variable with respect to the others.

4. Results and discussion

4.1. Data distribution of driving-risk level

Nineteen predictor variables and one target variable (the driving-risk level) are used in the CART model to identify the important pattern that reflects the relationship among driving-risk level, driver/vehicle characteristics, and road environment. As can be inferred from Table 4, these 19 predictor variables include vehicle status (e.g., vehicle maneuver), potential crash object (e.g., crash-object type and triggering factors), driving environment and road types (e.g., near-crash locations), weather condition (e.g., weather and light condition), and driver information and driver actions (e.g., driver gender and age).
Table 6 lists the information on driving-risk level in terms of the predictor variables, which indicates that traffic light in the fifth predictor variable T FAC is an important factor affecting the driving-risk level because a relatively high proportion of near-crashes caused by sudden changes in traffic light status occurs in the moderate- and high-risk groups (55.3% and 35.0%, respectively). From the sixth predictor variable N LOC, we find similar statistical results, where the proportions of near-crashes at intersections are relatively higher in the moderate- and high-risk groups (44.5% and 12.6%, respectively) than those away from intersections (38.0% and 5.2%, respectively). Table 6 also shows that as the braking speed increases, the proportions of near-crash cases in the moderate- and high-risk groups increase. The proportions in the moderate- and high-risk groups are, respectively, 46.4% and 13.9%, when the speed at the braking point ranges from 10 to 20 m/s, whereas those when the speed at the braking point ranges from 0 to 10 m/s are, respectively, 34.7% and 2.8%, as shown under the 19th predictor variable V BRA.
The aforementioned preliminary statistical results are consistent with the analysis result obtained from CART, which is presented in the next section.

4.2. CART analysis

For the CART model, the 912 near-crashes are randomly divided into two subsets, one for learning and the other for testing.

Table 6
Distribution of driving-risk levels by predictor variable.

Num  Code     Description                     Count  LR      MR      HR
–    –        Overall                         912    52.0%   40.2%   7.8%
1    V STA    Deceleration                    265    48.7%   43.8%   7.5%
              Acceleration                    531    55.4%   37.5%   7.2%
              Constant speed                  116    44.0%   44.8%   11.2%
2    V MAN    Straight motion                 778    51.0%   40.4%   8.6%
              Right turn                      38     65.8%   31.6%   2.6%
              Left turn                       41     65.9%   34.1%   0.0%
              Lane change                     46     45.7%   50.0%   4.3%
              Others                          9      44.4%   44.4%   11.1%
3    O TYP    Vehicle                         596    55.0%   40.4%   4.5%
              Single-track vehicle            98     72.4%   21.4%   6.1%
              Pedestrian                      69     60.9%   37.7%   1.4%
              Others                          149    22.1%   53.0%   24.8%
4    P CRA    Rear end                        349    51.3%   45.0%   3.7%
              Conflict during intersection    70     61.4%   32.9%   5.7%
              Pedestrian conflict             65     60.0%   36.9%   3.1%
              Opposite driving conflict       46     67.4%   28.3%   4.3%
              Cut-in conflict                 191    63.4%   30.4%   6.3%
              Others                          191    63.4%   30.4%   6.3%
5    T FAC    Sudden change of object status  723    57.7%   37.6%   4.7%
              Traffic light                   103    9.7%    55.3%   35.0%
              Lane reduction                  9      77.8%   22.2%   0.0%
              Lane change                     33     48.5%   48.5%   3.0%
              Active braking                  26     57.7%   42.3%   0.0%
              Others                          18     57.7%   42.3%   0.0%
6    N LOC    Intersection                    317    42.9%   44.5%   12.6%
              Non-intersection                595    56.8%   38.0%   5.2%
7    R CON    Structured road                 285    46.0%   43.5%   10.5%
              Normal road                     238    46.2%   43.3%   10.5%
              Hybrid road                     251    62.5%   31.9%   5.6%
              Rural road                      138    55.1%   43.5%   1.4%
8    P VEH    No                              586    48.1%   42.8%   9.0%
              Yes                             326    58.9%   35.6%   5.5%
9    B OVE    No                              372    58.9%   37.4%   3.8%
              Yes                             540    47.2%   42.2%   10.6%
10   B VEH    No                              537    55.9%   37.8%   6.3%
              Yes                             375    46.4%   43.7%   9.9%
11   WEA      Sunny                           727    51.2%   41.3%   7.6%
              Cloudy                          147    55.8%   34.7%   9.5%
              Others                          38     52.6%   42.1%   5.3%
12   L CON    Daylight                        796    52.3%   40.2%   7.5%
              Dusk                            116    50.0%   40.5%   9.5%
13   GEN      Male                            661    51.0%   40.5%   8.5%
              Female                          251    54.6%   39.4%   6.0%
14   AGE      0–30                            145    50.3%   41.4%   8.3%
              31–40                           291    54.0%   39.9%   6.2%
              41–50                           232    48.7%   40.9%   10.3%
              51–60                           202    56.9%   35.6%   7.4%
              >60                             42     38.1%   57.1%   4.8%
15   T DIR    0–10                            305    50.8%   40.0%   9.2%
              11–20                           380    56.1%   37.1%   6.8%
              21–30                           157    46.5%   44.6%   8.9%
              >30                             70     47.1%   48.6%   4.3%
16   S LIG    No                              784    51.3%   40.3%   8.4%
              Yes                             128    56.3%   39.8%   3.9%
17   V HON    No                              859    51.1%   41.0%   7.9%
              Yes                             53     66.0%   28.3%   5.7%
18   S TASK   No                              784    52.0%   40.4%   7.5%
              Talking                         125    51.2%   39.2%   9.6%
              Others                          3      66.7%   33.3%   0.0%
19   V BRA    (0, 10] m/s                     501    62.5%   34.7%   2.8%
              (10, 20] m/s                    388    39.7%   46.4%   13.9%
              (20, +∞) m/s                    23     30.4%   56.5%   13.0%

Note: Num denotes the index of predictor variables, and LR, low-risk group; MR, moderate-risk group; HR, high-risk group.

Fig. 5 shows the classification tree generated by CART, where 70% of the entire observation set is applied for learning and the remaining observations (30%) are applied for testing, as in Montella et al. (2012) and de Oña et al. (2013). CART created a tree with 17 nodes and 9 terminal nodes. The decision rules extracted from CART are listed in Table 7. All probabilities of decision rules are at least 52.0%, with 76.7% being the highest value (rule 3).
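To make the split criterion in (6) and the importance measure in (7) concrete, the following is a minimal Python sketch, not the SPSS procedure used in the study. It assumes the standard Gini index, 1 minus the sum of squared class proportions, which is consistent with a pure node having an index of zero. The function names and the (variable, n_t, decrease) tuple format are hypothetical choices made for illustration. Whether the sum in (7) runs only over nodes actually split on a variable or also over surrogate splits is not spelled out in the text, so the sketch simply sums whatever tuples it is given.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 - sum_i p(Y = i)^2 (zero for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Decrease in impurity of Eq. (6) for one candidate split of `parent`."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


def variable_importance(splits, n_total):
    """Eq. (7): accumulate (n_t / N) * decrease for each variable.

    `splits` is a list of (variable, n_t, decrease) tuples; this input
    format is an assumption of the sketch, not something the paper defines.
    """
    vim = {}
    for variable, n_t, decrease in splits:
        vim[variable] = vim.get(variable, 0.0) + n_t / n_total * decrease
    return vim


def normalize(vim):
    """Scale importances so that the largest one equals 100 (percent)."""
    top = max(vim.values())
    return {variable: 100.0 * value / top for variable, value in vim.items()}


# Toy data (not from the near-crash database): a split that separates
# four low-risk from four high-risk cases removes all impurity.
parent = ["LR"] * 4 + ["HR"] * 4
perfect_split = gini_decrease(parent, ["LR"] * 4, ["HR"] * 4)
```

In this toy case the parent node has a Gini index of 0.5 and both children are pure, so the decrease equals the full 0.5; a split that left both children as mixed as the parent would give a decrease of zero.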

Table 7
Description of rules obtained from CART.

Node/rule  Rule (IF ... THEN ...)  Probability

3   IF (T FAC = 2) AND (V BRA <= 13.71) THEN MR  76.7%
4   IF (T FAC = 2) AND (V BRA > 13.71) THEN HR  63.2%
8   IF (T FAC ≠ 2) AND (V BRA <= 12.03) AND (O TYP ≠ 1) THEN LR  76.0%
9   IF (T FAC ≠ 2) AND (V BRA > 12.03) AND (P CRA ≠ 5) THEN MR  63.2%
10  IF (T FAC ≠ 2) AND (V BRA > 12.03) AND (P CRA = 5) THEN LR  52.0%
12  IF (T FAC ≠ 2) AND (V BRA <= 12.03) AND (O TYP = 1) AND (P CRA = 4 OR P CRA = 5 OR P CRA = 6) THEN LR  69.2%
14  IF (T FAC ≠ 2) AND (V BRA <= 12.03) AND (O TYP = 1) AND (P CRA = 1 OR P CRA = 2 OR P CRA = 3) AND (V BRA > 8.7) THEN MR  55.2%
15  IF (T FAC ≠ 2) AND (V BRA <= 12.03) AND (O TYP = 1) AND (P CRA = 1 OR P CRA = 2 OR P CRA = 3) AND (V BRA <= 8.7) AND (AGE <= 1.5) THEN MR  57.1%
16  IF (T FAC ≠ 2) AND (V BRA <= 12.03) AND (O TYP = 1) AND (P CRA = 1 OR P CRA = 2 OR P CRA = 3) AND (V BRA <= 8.7) AND (AGE > 1.5) THEN LR  67.3%

Note: probability means percentage of observations in which the rule is accurate; LR, low-risk group; MR, moderate-risk group, HR, high-risk.
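
As an illustration of how such IF–THEN rules can be applied to a new case, the sketch below encodes only the five shallowest rules of Table 7 (rules 3, 4, 8, 9 and 10) as a Python function. The function name and argument names are hypothetical; the category codes (T FAC = 2 for traffic light, O TYP = 1 for vehicle, P CRA = 5 for cut-in conflict) and the thresholds are taken from Table 7 and the accompanying text. The deeper branch (rules 12 and 14 to 16) is deliberately left out, so the function returns None there.

```python
def shallow_rule(t_fac, v_bra, o_typ, p_cra):
    """Return (risk level, rule probability) for rules 3, 4, 8, 9 and 10.

    Returns None when the case falls into the deeper branch of the tree
    (rules 12, 14, 15 and 16), which this sketch does not encode.
    """
    if t_fac == 2:                    # triggering factor is a traffic light
        if v_bra <= 13.71:            # braking speed in m/s
            return ("MR", 0.767)      # rule 3
        return ("HR", 0.632)          # rule 4
    if v_bra > 12.03:
        if p_cra == 5:                # potential crash type is cut-in conflict
            return ("LR", 0.520)      # rule 10
        return ("MR", 0.632)          # rule 9
    if o_typ != 1:                    # potential object is not a vehicle
        return ("LR", 0.760)          # rule 8
    return None                       # deeper branch, not encoded here


# A near-crash triggered by a traffic light while braking from 15 m/s.
example = shallow_rule(t_fac=2, v_bra=15.0, o_typ=1, p_cra=1)
```

Here `example` is `("HR", 0.632)`, that is, rule 4. Writing the rules as nested conditions mirrors the path from the root node to a leaf, which is why each case matches at most one rule.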

The root variable that generates the CART is T FAC (see Fig. 5), indicating that the single best variable that classifies the driving-risk level is the triggering factor that leads to the braking maneuver. CART directs the triggering factor that involves traffic light to the left, forming node 1, and directs the remaining triggering factors to the right, forming node 2. For node 1 and depending on the braking speed (V BRA), nodes 3 and 4 are obtained with different driving-risk levels. Near-crashes are high-risk (probability of 63.2%) if V BRA is greater than 13.7 m/s (rule 4) and moderate-risk (probability of 76.7%) if V BRA is less than 13.7 m/s (rule 3). This result shows the direct relationship between moderate- and high-risk near-crashes and sudden changes in traffic lights with high vehicle speeds. This result is consistent with the statistical results presented in the previous section.
The rest of the rules are attributed to the triggering factors other than traffic lights (node 2). After this node, the CART is split according to V BRA, and near-crashes with braking speeds of less than 12.03 m/s are sent to the left, forming node 5; the remaining cases are sent to the right, forming node 6. Based on the triggering factors and braking speed in node 6, nodes 9 and 10 are obtained depending on the potential crash type (P CRA). If P CRA denotes cut-in conflict, the near-crashes are low-risk, with a probability of 52% (rule 10). However, if P CRA is of another type, the near-crashes are of moderate risk with a probability of 63.2% (rule 9). In node 5, the CART continues to grow according to the potential object type (O TYP). If O TYP is not a vehicle after node 5, the near-crash case is low-risk with a probability of 76.0% (node 8 and rule 8). When the O TYP is a vehicle (node 7), the CART is divided according to P CRA. From this point in the CART structure, rule interpretation is difficult because multiple variables are involved in near-crashes. However, from the CART structure shown in Fig. 5, the following results are highlighted: if P CRA is opposite driving conflict, cut-in conflict, or others, the near-crashes are low-risk with a probability of 69.2% (node 12 and rule 12). If P CRA is rear end conflict, conflict during intersection, or jump out, CART is divided by V BRA. At leaf node 14, if V BRA is higher than 6.67 m/s, the near-crashes are of moderate risk (rule 14). For node 13, CART continues to split based on the driver age into leaf nodes 15 and 16. At leaf node 15, if the driver age is less than 30 years, the driving-risk level is moderate with a probability of 57.1% (rule 15). For the rest of the driver characteristics, leaf node 16 predicts the driving-risk level involved in near-crashes as low-risk with a probability of 67.3%. From this splitting process, the driving-risk level in near-crashes can be predicted by proceeding down the CART branches until a leaf node is reached.
To further understand the performance of CART, comparisons of model predictions between the observed and predicted risk levels for the learning and testing data are summarized in Table 8. The overall model prediction accuracy for the learning data is approximately 66% and that for the testing data is approximately 62%, which is within a reasonable range compared with the other studies on traffic accident severity in which classification methods were applied. For instance, Abdelwahab and Abdel-Aty (2001) used a neural network method and achieved accuracies of 65.6% and 60.4% in the training and testing phases, respectively. de Oña et al. (2013) obtained 55% and 54% accuracy when they applied DT using different algorithms (C4.5 and CART, respectively). The prediction performance of CART demonstrates that the CART structure can reflect the pattern hidden behind naturalistic data to some extent.
The main objective is to identify the risk factors that affect the driving-risk level using CART in conjunction with the near-crash database. The statistical results listed in Table 6, CART structure shown in Fig. 5, and rules listed in Table 7 present some clues and relationships. The next section discusses in depth the risk factors that affect driving-risk level.

4.3. Risk factors affecting driving risk

The variable importance obtained from CART is used to quantify the influence of potential risk factors on driving-risk level. Table 9 lists the normalized importance of these variables. Sixteen variables influencing the driving-risk level are detected, with values varying from 100% to 0.1%. It is observed that four variables, namely, (1) velocity when braking (V BRA), (2) triggering factor (T FAC), (3) potential object type (O TYP) and (4) potential crash type (P CRA), have the largest influence on the driving-risk level. Meanwhile, the other variables such as driver age (AGE), vehicle maneuver (V MAN), second task (S TASK), barriers for opposing traffic flow (B OVE), and vehicles parked along the roadside (P VEH) are considered to have relatively less effect in our study case.

4.3.1. Velocity when braking

As shown in Table 9, V BRA is the most important variable affecting driving-risk level, which apparently does not agree with the results of previous traffic accident severity analyses. For example, lighting condition was considered to have the most important effect on the traffic accident severity (de Oña et al., 2013) and similar results were reported in Abdel-Aty (2003).

Fig. 5. Output of CART model.

Intuitively, the higher the vehicle speed, the higher is the kinetic energy of the lone-driver-vehicle system. If a potential threat is present or a sudden change in the object status occurs in the driving environment, the lone-driver-vehicle system becomes more unstable and risky, meaning that the driving-risk level involved in a near-crash case increases as the vehicle velocity increases. One direct explanation for these phenomena, in which vehicle velocity is usually not among the main factors that affect traffic accident severity, is that most traffic accident databases do not contain accurate speed information about traffic accidents (please see the database used in Chang and Chen, 2005; Chang and Wang, 2006; Li et al., 2008; Harb et al., 2009; Montella et al., 2012; Abellán et al., 2013).
On the other side, many studies have indicated that driving speed is an important factor for road safety (Elvik et al., 2004; Wallén and Åberg, 2008). Elvik et al. (2004) pointed out that speed not only affects the severity of a crash but is also related to the risk of being involved in a crash. From this perspective, our finding that

Table 8
Prediction result of CART model.

                     Learning data (N = 628)                     Testing data (N = 284)
Risk level           Observed  Predicted  Correctly predicted   Observed  Predicted  Correctly predicted

Low-risk group       319       371        253 (79.3%)           155       167        113 (72.9%)
Moderate-risk group  254       219        138 (54.3%)           113       107        58 (51.3%)
High-risk group      55        38         24 (43.6%)            16        10         6 (37.5%)

The overall prediction accuracy is 66.1% for the learning data and 62.3% for the testing data.
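
The accuracy figures follow directly from the counts in Table 8, as the short Python sketch below shows. The counts are copied from the table; the function names are hypothetical. The percentages printed in the table are the share of observed cases in each group that were predicted correctly (per-class recall). The per-class precision, correct predictions divided by all predictions of that class, is not reported in the paper and is included here only as an additional view computed from the same counts.

```python
# (observed, predicted, correctly predicted) counts copied from Table 8
learning = {"LR": (319, 371, 253), "MR": (254, 219, 138), "HR": (55, 38, 24)}
testing = {"LR": (155, 167, 113), "MR": (113, 107, 58), "HR": (16, 10, 6)}


def overall_accuracy(table):
    """Correctly predicted cases divided by all observed cases."""
    correct = sum(c for _, _, c in table.values())
    total = sum(o for o, _, _ in table.values())
    return correct / total


def per_class_recall(table):
    """Share of observed cases in each class that were predicted correctly."""
    return {level: c / o for level, (o, _, c) in table.items()}


def per_class_precision(table):
    """Share of predictions of each class that were correct (not in Table 8)."""
    return {level: c / p for level, (_, p, c) in table.items()}


learning_accuracy = round(100 * overall_accuracy(learning), 1)
testing_accuracy = round(100 * overall_accuracy(testing), 1)
```

For the learning data this gives 415 correct out of 628 cases, and for the testing data 177 out of 284, which reproduces the reported 66.1% and 62.3%. The per-class values make visible that the overall figure is driven mainly by the low-risk group, while only 6 of the 16 high-risk testing cases are recovered.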

Table 9
Importance of the predictor variables with CART (VIM).

Variable  Normalized importance
V BRA     100.0%
T FAC     96.7%
O TYP     82.9%
P CRA     75.9%
AGE       11.7%
V MAN     9.0%
S TASK    5.9%
B OVE     5.8%
P VEH     4.6%
N LOC     2.7%
WEA       2.3%
V HON     2.1%
B VEH     1.1%
R CON     0.5%
GEN       0.1%
V STA     0.1%

V BRA is the most important variable affecting driving-risk level agrees with those of previous studies in road safety research.

4.3.2. Triggering factors

Triggering factor (T FAC) is the second most important variable with a normalized importance of 96.7% in the CART model (Table 9). Table 6 shows that traffic light in the fifth predictor variable T FAC has a significant effect on the driving-risk level in near-crashes because a relatively high proportion of near-crashes caused by sudden changes in the traffic lights occurs in the moderate- or high-risk groups (55.3% and 35.0%, respectively). Rules 3 and 4 in Table 7 also support this finding. This result agrees with those of previous studies on vehicle crashes resulting from dilemma zones at signalized intersections (Rakha et al., 2007; Aoude et al., 2012).

4.3.3. Potential crash object and crash type

Crash object type (O TYP) and potential crash type (P CRA) have 82.9% and 75.9% normalized importance, respectively, in the CART model. Rules 9 and 10 in Table 7 demonstrate that at vehicle speeds greater than 12.03 m/s, the near-crashes caused by cut-in conflict (P CRA is equal to 5, see definition in Table 4) are likely associated with the low-risk level (rule 10), whereas the near-crashes of other crash types are likely associated with the moderate-risk level (rule 9). Cut-in conflict usually occurs during lane change maneuvers. Lane change is an important factor that affects driving-risk level, and a lane change would lead to a collision if the maneuver is not proper (Pande and Abdel-Aty, 2006).

4.3.4. Other factors

Other factors such as driver age (AGE), vehicle maneuvers (V MAN), second task (S TASK), barriers for opposing vehicles (B OVE), and vehicles parked along the roadside (P VEH) have relatively small effects on the driving-risk level in our naturalistic driving experiment conducted on Chinese roads. It should be noted that this conclusion suits the driving environment in our naturalistic driving experiment. As the driving context changes, factors such as S TASK, B OVE, and P VEH may have significant influences on road safety.

5. Conclusions

We recorded 912 near-crashes over the course of a 60-day naturalistic driving experiment involving 31 drivers. In this paper, a comprehensive transcription protocol containing important attributes that describe the conditions contributing to driving risk was first designed to analyze the relationship among driving risk, driver/vehicle characteristics, and road environment.
The main objectives of this study are as follows: (1) cluster driving-risk level and (2) unveil the factors that influence the driving-risk level. Toward the first objective, we proposed a novel method to quantify the driving-risk levels in near-crash cases by clustering the braking process characteristics, namely, (1) maximum deceleration, (2) average deceleration, and (3) percentage reduction in vehicle kinetic energy. K-means cluster analysis was applied to classify the near-crashes based on driving-risk level. Toward the second objective, CART is employed for unveiling the relationship among driving risk, driver/vehicle characteristics, and road environment using the near-crash database. CART provides an alternative and appropriate approach for analyzing driving-risk levels in near-crashes owing to its ability to identify hidden patterns in the data without pre-establishing a functional relationship among the variables.
Nine useful decision rules were obtained from the CART structure (Table 7). The overall model prediction accuracy for the learning data was approximately 66% and that for the testing data was approximately 62% (Table 8). These values are within the reasonable range compared with other studies on traffic accident severity. Furthermore, four variables, namely, (1) velocity when braking (V BRA), (2) triggering factors (T FAC), (3) potential object type (O TYP) and (4) potential crash type (P CRA), from CART were found to have the largest influences on the driving-risk level, which, to some extent, is in accordance with the results of some previous studies. These results validate the method proposed in this paper. It should be noted that there are some limitations of our current naturalistic driving experiment. First, it was only conducted in one city, i.e., Beijing, and we carefully designed the experiment to include all the types of roads. Because of the actual road conditions in Beijing, however, there are few curves in our experimental routes. Hence, the influence of curve alignment could not be quantified in our current database. A few previous studies, for example, Montella et al. (2012) and Montella and Liana (2015), have pointed out that the curve alignment in road types was an important factor affecting road safety. Second, the duration of the current experiment was not very long (it lasted for two months), and the weather conditions were sunny or cloudy for the most part. Intuitively, rainy weather would have a significant influence on traffic safety, as pointed out in previous studies, for instance, Abellán et al. (2013). In our current database, the influence of weather conditions on the driving risk was not fully addressed. Despite such limitations, however, it should be pointed out that in this paper, the authors proposed a novel method to quantify the driving risk in a near-crash event and to analyze the associated risk factors. The proposed method can be extrapolated to specific studies on other datasets (i.e., other infrastructure, roads, and countries).
Future research will consider individual driving risk because driving risk substantially varies among drivers and identifying factors associated with individual driver risk will further facilitate the identification of apt safety countermeasures (Guo and Fang, 2013). We would like to identify some factors such as age, gender, and driver characteristics that affect an individual driver's risk by using the naturalistic driving data. Furthermore, O TYP and P CRA are also found to have important influences on the driving-risk level in near-crashes. One question worthy of further study is whether the factors that affect the driving-risk level remain the same when sub-datasets related to vehicles or pedestrians are the focus.

Acknowledgments

This collaborative research was supported by a joint project of Tsinghua and Honda. The authors would like to thank the National Natural Science Foundation of China (No. 51175290 and

No. 51475254) for its support. The authors would also like to thank those who participated in the driving experiments.

References

Abdel-Aty, M., 2003. Analysis of driver injury severity levels at multiple locations using ordered probit models. J. Saf. Res. 34 (5), 597–603.
Abdelwahab, H.T., Abdel-Aty, M.A., 2001. Development of artificial neural network models to predict driver injury severity in traffic accidents at signalized intersections. Transp. Res. Rec.: J. Transp. Res. Board 1746 (1), 6–13.
Abellán, J., López, G., de Oña, J., 2013. Analysis of traffic accident severity using Decision Rules via Decision Trees. Expert Syst. Appl. 40 (15), 6047–6054.
Al-Ghamdi, A.S., 2002. Using logistic regression to estimate the influence of accident factors on accident severity. Accid. Anal. Prev. 34 (6), 729–741.
Aoude, G.S., Desaraju, V.R., Stephens, L.H., How, J.P., 2012. Driver behavior classification at intersections and validation on large naturalistic data set. Intell. Transp. Syst. IEEE Trans. 13 (2), 724–736.
Bagdadi, O., 2013. Assessing safety critical braking events in naturalistic driving studies. Transp. Res. F: Traffic Psychol. Behav. 16, 117–126.
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A., 1984. Classification and Regression Trees. CRC Press.
Chang, L.Y., Chen, W.C., 2005. Data mining of tree-based models to analyze freeway accident frequency. J. Saf. Res. 36 (4), 365–375.
Chang, L.Y., Wang, H.W., 2006. Analysis of traffic injury severity: an application of non-parametric classification tree techniques. Accid. Anal. Prev. 38 (5), 1019–1027.
Charlton, S.G., Starkey, N.J., Perrone, J.A., Isler, R.B., 2014. What's the risk? A comparison of actual and perceived driving risk. Transp. Res. F: Traffic Psychol. Behav. 25, 50–64.
Chen, H., Cao, L., Logan, D.B., 2012. Analysis of risk factors affecting the severity of intersection crashes by logistic regression. Traffic Inj. Prev. 13 (3), 300–307.
de Oña, J., López, G., Abellán, J., 2013. Extracting decision rules from police accident reports through decision trees. Accid. Anal. Prev. 50, 1151–1160.
Dingus, T.A., Klauer, S.G., Neale, V.L., Petersen, A., Lee, S.E., Sudweeks, J.D., Knipling, R.R., 2006. The 100-Car Naturalistic Driving Study Phase II-Results of the 100-Car Field Experiment (No. HS-810 593).
Donmez, B., Boyle, L.N., Lee, J.D., 2009. Differences in off-road glances: effects on young drivers' performance. J. Transp. Eng. 136 (5), 403–409.
DTM-China (Ministry of Public Security, Department of Traffic Management), 2010. Annual Report of Road Traffic Accidents Statistics in P.R. China. Scientific Research Institute of Traffic Management, Ministry of Public Security, Beijing (in Chinese).
Elvik, R., Christensen, P., Amundsen, A., 2004. Speed and road accidents. An evaluation of the Power Model. TØI Rep. 740, 2004.
Guo, F., Fang, Y., 2013. Individual driver risk assessment using naturalistic driving data. Accid. Anal. Prev. 61, 3–9.
Guo, F., Klauer, S.G., Hankey, J.M., Dingus, T.A., 2010. Near crashes as crash surrogate for naturalistic driving studies. Transp. Res. Rec.: J. Transp. Res. Board 2147 (1), 66–74.
Harb, R., Yan, X., Radwan, E., Su, X., 2009. Exploring precrash maneuvers using classification trees and random forests. Accid. Anal. Prev. 41 (1), 98–107.
Jarašūniene, A., Jakubauskas, G., 2007. Improvement of road safety using passive and active intelligent vehicle safety systems. Transport 22 (4), 284–289.
Jonasson, J.K., Rootzén, H., 2014. Internal validation of near-crashes in naturalistic driving studies: a continuous and multivariate approach. Accid. Anal. Prev. 62, 102–109.
Jovanis, P.P., Aguero-Valverde, J., Wu, K.F., Shankar, V., 2011. Analysis of naturalistic driving event data. Transp. Res. Rec.: J. Transp. Res. Board 2236 (1), 49–57.
Kaufman, L., Rousseeuw, P.J., 2009. Finding Groups in Data: An Introduction to Cluster Analysis, vol. 344. John Wiley & Sons.
Li, X., Lord, D., Zhang, Y., Xie, Y., 2008. Predicting motor vehicle crashes using support vector machine models. Accid. Anal. Prev. 40 (4), 1611–1618.
Lord, D., Mannering, F., 2010. The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transp. Res. A: Policy Pract. 44 (5), 291–305.
Lu, G., Cheng, B., Lin, Q., Wang, Y., 2012. Quantitative indicator of homeostatic risk perception in car following. Saf. Sci. 50 (9), 1898–1905.
Malta, L., Miyajima, C., Takeda, K., 2009. A study of driver behavior under potential threats in vehicle traffic. Intell. Transp. Syst. IEEE Trans. 10 (2), 201–210.
Montella, A.I., Liana, L., 2015. Safety performance functions incorporating design consistency variables. Accid. Anal. Prev. 74, 133–144.
Montella, A., Aria, M., Dambrosio, A., Mauriello, F., 2011. Data-mining techniques for exploratory analysis of pedestrian crashes. Transp. Res. Rec. 2237, 107–116.
Montella, A., Aria, M., D'Ambrosio, A., Mauriello, F., 2012. Analysis of powered two-wheeler crashes in Italy by classification trees and rules discovery. Accid. Anal. Prev. 49, 58–72.
Montella, et al., 2013. Crash databases in Australasia, the European Union, and the United States: review and prospects for improvement. Transp. Res. Rec. 2386, 128–136.
Moreno, A.T., García, A., 2013. Use of speed profile as surrogate measure: effect of traffic calming devices on crosstown road safety performance. Accid. Anal. Prev. 61, 23–32.
Pande, A., Abdel-Aty, M., 2006. Assessment of freeway traffic parameters leading to lane-change related collisions. Accid. Anal. Prev. 38 (5), 936–948.
Rakha, H., El-Shawarby, I., Setti, J.R., 2007. Characterizing driver behavior on signalized intersection approaches at the onset of a yellow-phase trigger. Intell. Transp. Syst. IEEE Trans. 8 (4), 630–640.
Sepulcre, M., Gozalvez, J., Hernandez, J., 2013. Cooperative vehicle-to-vehicle active safety testing under challenging conditions. Transp. Res. C: Emerg. Technol. 26, 233–255.
Takeda, K., Hansen, J.H., Boyraz, P., Malta, L., Miyajima, C., Abut, H., 2011. International large-scale vehicle corpora for research on driver behavior on the road. Intell. Transp. Syst. IEEE Trans. 12 (4), 1609–1623.
UMTRI and GMRDC, 2005. Automotive Collision Avoidance System Field Operational Test Report: Methodology And Results. Final research report. NHTSA, US Department of Transportation.
Wallén, Warner H., Åberg, L., 2008. Drivers' beliefs about exceeding the speed limits. Transp. Res. F: Traffic Psychol. Behav. 11 (5), 376–389.
Wang, J.Q., Li, S.E., Zheng, Y., Lu, X.Y., 2015. Longitudinal collision mitigation via coordinated braking of multiple vehicles using model predictive control. Integr. Comput. Aided Eng. 22 (2), 171–185.
Wu, K.F., Jovanis, P.P., 2012. Crashes and crash-surrogate events: exploratory modeling with naturalistic driving data. Accid. Anal. Prev. 45, 507–516.
Wu, K.F., Jovanis, P.P., 2013. Screening naturalistic driving study data for safety-critical events. Transp. Res. Rec.: J. Transp. Res. Board 2386 (1), 137–146.
Wu, K.F., Aguero-Valverde, J., Jovanis, P.P., 2014. Using naturalistic driving data to explore the association between traffic safety-related events and crash risk at driver level. Accid. Anal. Prev. 47, 210–218.
Young, W., Sobhani, A., Lenné, M.G., Sarvi, M., 2014. Simulation of safety: a review of the state of the art in road safety simulation modelling. Accid. Anal. Prev. 66, 89–103.
Zheng, Y., Li, S., Wang, J., Wang, L., Li, K., 2014. Influence of information flow topology on closed-loop stability of vehicle platoon with rigid formation. In: 17th Intelligent Transportation System Conference, IEEE, pp. 2094–2100.

View publication stats



