Impri A Lou 2016
a
School of Civil and Building Engineering, Loughborough University, Loughborough LE11 3TU, United Kingdom
b
Zachry Department of Civil Engineering, Texas A&M University, College Station, TX 3136, United States
ARTICLE INFO

Article history:
Received 21 April 2015
Received in revised form 20 August 2015
Accepted 1 October 2015
Available online 11 November 2015

Keywords:
Traffic speed
Crash frequency
Crash severity
Pre-crash conditions
Multivariate Poisson lognormal regression
Multivariate spatial correlation

ABSTRACT

Although speed is considered to be one of the main crash contributory factors, research findings are inconsistent. Independent of the robustness of their statistical approaches, crash frequency models typically employ crash data that are aggregated using spatial criteria (e.g., crash counts by link termed as a link-based approach). In this approach, the variability in crashes between links is explained by highly aggregated average measures that may be inappropriate, especially for time-varying variables such as speed and volume. This paper re-examines crash–speed relationships by creating a new crash data aggregation approach that enables improved representation of the road conditions just before crash occurrences. Crashes are aggregated according to the similarity of their pre-crash traffic and geometric conditions, forming an alternative crash count dataset termed as a condition-based approach. Crash–speed relationships are separately developed and compared for both approaches by employing the annual crashes that occurred on the Strategic Road Network of England in 2012. The datasets are modelled by injury severity using multivariate Poisson lognormal regression, with multivariate spatial effects for the link-based model, using a full Bayesian inference approach. The results of the condition-based approach show that high speeds trigger crash frequency. The outcome of the link-based model is the opposite; suggesting that the speed–crash relationship is negative regardless of crash severity. The differences between the results imply that data aggregation is a crucial, yet so far overlooked, methodological element of crash data analyses that may have direct impact on the modelling outcomes.
© 2015 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
http://dx.doi.org/10.1016/j.aap.2015.10.001
0001-4575
174 M.-I.M. Imprialou et al. / Accident Analysis and Prevention 86 (2016) 173–185
2. Literature review

and Mannering, 2009; Chang, 2005; Milton and Mannering, 1998; Shankar et al., 1995) and horizontal curvature (i.e. frequent and sharp curves) (Abdel-Aty and Radwan, 2000; Anastasopoulos and Mannering, 2009; Ma et al., 2008; Milton and Mannering, 1998; Shankar et al., 1995). The number of lanes is also linked with lane changes that increase vehicle interactions and consequently the number of crashes (Chang, 2005; Milton and Mannering, 1998); nevertheless, Ma and Kockelman (2006) report that wider roads decreased the number of non-fatal crashes.

From a methodological perspective, during the last two decades count models such as Poisson and Negative Binomial (NB) regression (Lord and Mannering, 2010) as well as their various extensions are considered to have the most suitable statistical properties for modelling crash count data that are usually characterised by low mean values, over-dispersion and heteroscedasticity (see also Mannering and Bhat, 2014). The initial approaches employed fixed-parameters NB regression (Abdel-Aty and Radwan, 2000; Ivan et al., 2000; Lord et al., 2005b; Miaou and Lum, 1993; Milton and Mannering, 1998; Shankar et al., 1995). More recent studies are controlling for unobserved heterogeneity (such as spatial and temporal correlation) using random effects (Barua et al., 2014; Guo et al., 2010; Quddus, 2008), hierarchical (Kim et al., 2007) or random-parameter models (Anastasopoulos and Mannering, 2009). Multivariate Poisson (Ma and Kockelman, 2006) and Poisson lognormal models (Aguero-Valverde and Jovanis, 2009; El-Basyouny and Sayed, 2009; Ma et al., 2008; Park and Lord, 2007) are proposed for modelling simultaneously different crash types (e.g., by level of severity and frequency simultaneously) in order to control for the unobserved heterogeneity that arises from the correlations between them.

Crash counts in the majority of the studies are generated by dividing the examined network into homogeneous links or segments (i.e. link-based approach). This approach is logical and effective from a practical point of view as the traffic data are usually available at the link level. Nevertheless, it is a fact that both traffic and geometric conditions at the roadway may vary significantly even for adjacent parts of the same road (e.g. due to road topography and on-off ramps). Therefore, the assumption of homogeneity of the conditions within links that include up to several miles of roadway and sometimes both directions of traffic may not necessarily be true. Additionally, the characteristic values used for each of the examined factors that are usually measures of central tendency may not be representative of the actual conditions at the time and location of the crash. Studies focusing on proactive crash prediction highlight that crashes are related to suddenly developed and often extreme traffic conditions (e.g., high and low speeds) that cannot be captured from aggregated measures such as hourly or annual averages (Abdel-Aty and Pande, 2005; Hossain and Muromachi, 2013; Pande and Abdel-Aty, 2005). The use of these measures therefore leads to loss of information and under-representation of extreme conditions that may be crucial in explaining crash occurrences. These limitations of link-based crash modelling are likely to be reflected in the results of analyses, leading to possibly erroneous and inconsistent conclusions.

This paper attempts to address the above limitations using an alternative crash data aggregation method. Condition-based modelling enables a more accurate representation of the conditions just before crashes so as to shed more light on the relationship of traffic speed with crash frequency.

3. Data description and pre-processing

The generation of the crash datasets for both the link-based and the condition-based approaches requires the merger of crash, traffic and geometry data. Crash data were obtained from the National Road Accident Database of the United Kingdom (STATS 19) and include 10,520 crashes that occurred during 2012 on the Strategic Road Network (SRN) of England (Department for Transport, 2011). The SRN consists of the main motorways and A-roads of the country and the total length is 6920 km (4272 miles). STATS 19 reports record crashes that accounted for at least one slight injury, along with information related to the crash and the involved parties (i.e., drivers, casualties and vehicles). The variables that were used here were crash severity, date, time and location.

Location is a crucial factor for crash analyses because it is closely related with the identification of the traffic and geometric conditions that are related to a crash. When crash location data are not satisfactory in terms of accuracy, the application of crash mapping techniques has been shown to significantly change the results of crash analyses (Imprialou et al., 2015). The objective of crash mapping is to determine a set of coordinates that represent the crash location as precisely as possible. In STATS 19 reports, crash locations were less accurate than desired. Thus, crashes were reallocated to refined positions estimated by a fuzzy logic crash mapping algorithm that was developed for the study area using distance, vehicle direction, road name and type. This provides a 98.9% (±1.1%) accurate matching score (Imprialou et al., 2014).

Traffic data were extracted from the UK Highways Agency Journey Time Database (JTDB) that includes link-level traffic information obtained by inductive loop detectors for the entire SRN (2505 links¹) (Highways Agency, 2011). The measurement interval is 15 min, resulting in a dataset of approximately 88 million observations. The variables used for this analysis are average speed (km/h), volume (vehicles) and travel time (seconds) (Highways Agency, 2011). Road configuration was determined based on the UK Highways Agency Traffic Speed Condition Survey database (TRACS) (Highways Agency, 2008). TRACS contains measurements of geometric characteristics (i.e., radius and gradient) by survey vehicles for the entire SRN divided into 10-m segments.

¹ Average link length 5.23 (±4.76) km.

The data were processed separately in order to produce the datasets. Although the two datasets stem from exactly the same databases, they represent the relationship of crashes with road-related variables from entirely different perspectives and sampling frames. The sampling frame of the link-based dataset consists of road links that are actual spatial entities and is the conventional approach for safety models. The sampling frame of the condition-based dataset comprises all the possible combinations of traffic and geometric conditions; a set of abstract/non-physical attributes that can potentially co-exist at the time and the location of a crash.

3.1. Link-based dataset

A link-based dataset enlists the links that comprise a road network and the total number of crashes per link. The crashes occurred on the link at different time points during the study period. Each link contains information that represents the conditions on the road defined by descriptive statistics (e.g. mean, median, maximum, etc.). Based on this aggregation method, it is assumed that the triggering factors for crashes that occurred on the same link are similar, which of course might not be true for all the cases as shown in Figs. 1 and 2.

Based on the output of the crash mapping algorithm, each road link was assigned with a number of crashes (crash counts varied from 0 to 36 per link) and one characteristic value representing speed, volume, curvature, gradient and the number of lanes. Considering the dynamic nature of the traffic variables (i.e., speed and volume) as well as the fact that a road link typically covers a
Table 1
Definition of variables which are included in the link-based dataset and the condition-based dataset respectively.

Speed (a)
  Link-based dataset:
    Annual average of measured speeds on each link (averaged over 35,040 records)
  Condition-based dataset:
    S1. Speed up to 2nd percentile
    S2. Speed between the 3rd and the 4th percentile
    S3. Speed between the 5th and the 6th percentile
    ...
    S50. Speed between the 99th and the 100th percentile

Volume (a)
  Link-based dataset:
    Annual average daily traffic per link (AADT)
  Condition-based dataset (separately for each of the 50 speed scenarios):
    V1. Volume up to the 25th percentile
    V2. Volume between the 26th and the 50th percentile
    V3. Volume between the 51st and the 75th percentile
    V4. Volume over the 76th percentile

Curvature
  Link-based dataset:
    C1. Links with multiple and/or sharp curves (Curve)
    C2. Links that above 50% of their radius measurements are equal with 2000 m (Straight)
  Condition-based dataset:
    C1. Segments that above 50% of their radius measurements are lower than 2000 m (Curve)
    C2. Segments that above 50% of their radius measurements are equal with 2000 m (Straight)

Gradient
  Link-based dataset:
    G1. Links with median gradient above 0.5% (Uphill)
    G2. Links with median gradient below −0.5% (Downhill)
    G3. Links with median gradient between ±0.5% (Level)
  Condition-based dataset:
    G1. Segments that have more gradient measurements above 0.5% than below 0.5% (Uphill)
    G2. Segments that have more gradient measurements below −0.5% than above −0.5% (Downhill)
    G3. Segments that have more gradient measurements between ±0.5% than above −0.5% and below 0.5% (Level)

Lanes
  Link-based dataset:
    L1. Links that above 50% of their sections include more than two lanes (Lanes above 2)
    L2. Links that above 50% of their sections include up to two lanes (Lanes up to 2)
  Condition-based dataset:
    L1. Sections with more than two lanes (Lanes above 2)
    L2. Sections with up to two lanes (Lanes up to 2)

(a) Classification was based on the weighted speed and volume (Sw and Vw; see Eqs. (1) and (2)).
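The percentile classes of Table 1 (S1 to S50 for speed, and V1 to V4 for volume within each speed class) amount to nested equal-frequency binning. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the rank-based tie handling and the synthetic input values are all assumptions made for illustration.

```python
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Return a 1-based bin label per value so that bins hold (nearly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        # Rank-based assignment; ties are split by input order (an assumption).
        labels[idx] = rank * n_bins // len(values) + 1
    return labels


def classify(speeds, volumes, n_speed=50, n_volume=4):
    """Label each record with (speed class, volume class within that speed class)."""
    s_labels = equal_frequency_bins(speeds, n_speed)
    v_labels = [0] * len(volumes)
    for s in set(s_labels):
        members = [i for i, lab in enumerate(s_labels) if lab == s]
        sub = equal_frequency_bins([volumes[i] for i in members], n_volume)
        for i, lab in zip(members, sub):
            v_labels[i] = lab
    return list(zip(s_labels, v_labels))


# Synthetic, hypothetical measurements (1000 distinct speeds and volumes).
speeds = [30.0 + ((i * 7919) % 1000) / 10.0 for i in range(1000)]
volumes = [float((i * 37) % 1000) for i in range(1000)]
labels = classify(speeds, volumes)

speed_bin_sizes = Counter(s for s, _ in labels)   # records per speed class
scenario_sizes = Counter(labels)                  # records per (speed, volume) class
```

Because the volume classes are computed inside each speed class, every speed and volume combination ends up with the same number of records, which is the property the condition-based dataset relies on.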
considerable road length, it can be understood that both the traffic conditions and the geometric configuration of each link can only be partially represented by single measures per link. Traffic conditions were expressed by annual averages, while road geometry was represented by categorical variables. A more detailed description of the variables can be found in Table 1. After the exclusion of the links with missing traffic or geometry data the final link-based dataset included 2356 observations (i.e., links) that represent overall 9028 crashes. Crash counts were divided by severity into crashes with Killed or Serious injuries (henceforth: KS) and crashes with Slight injuries (henceforth: SL). The split between the two severity categories was 1268 and 7760 for KS and SL crashes, respectively.

3.2. Condition-based dataset

A pre-crash condition-based dataset (henceforth: condition-based dataset) consists of every possible combination/scenario of traffic and geometric conditions that could ever be present on the network just before a crash (limited to the examined variables and their specifications). Each scenario is matched with a number of crashes (from zero to, theoretically, all the crashes of the database) that were found to occur under this particular combination of traffic and geometry conditions. Condition-based modelling attempts to represent the actual crash-related traffic and geometry conditions. In contrast to the link-based approach, the crashes that belong to the same condition scenario are spatially and temporally independent. Instead, they are similar in the sense that when they occurred the external circumstances on the road were approximately the same. Assuming that some or all of these circumstances might be related with the crash occurrences, the concentration (or absence) of crashes in some particular combinations should provide useful insights about crash triggering factors.

The formation of the condition-based dataset is quite complex relative to the link-based dataset. Fig. 3 presents a simple flowchart describing the main processes to develop the condition-based dataset consisting of $N_{max}$ crashes. Each step is explained in detail below.

Fig. 3. Flow chart of the condition-based dataset development process.

3.2.1. Traffic conditions identification

The final condition-based dataset included all the possible combinations of pre-crash-condition scenarios and the crash counts per scenario. As the scope of the creation of the alternative dataset was the representation of the conditions on the roadway just before crashes, all the examined crashes were matched with a set of traffic and geometric conditions based on the geocoded crash locations.

The pre-crash traffic conditions on the crash location were identified based on the reported crash date and time. In order to have a comparable set of measurements for all crashes, each crash was matched with traffic data equivalent to 15 min of measurements. Therefore, the speed ($S_w$) and volume ($V_w$) were estimated using a weighted average of the 15-min interval that includes the time of the crash (second interval) and its precedent (first interval).

$$S_w = \frac{t}{15} S_{second} + \left(1 - \frac{t}{15}\right) S_{first} \quad (1)$$

$$V_w = \frac{t}{15} V_{second} + \left(1 - \frac{t}{15}\right) V_{first} \quad (2)$$

where $S_w$ and $V_w$: weighted average of speed (km/h) and volume (vehicles), $S_{first}$ and $V_{first}$: speed (km/h) and volume (vehicles) measurements of the first interval, $S_{second}$ and $V_{second}$: speed (km/h) and
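The weighting in Eqs. (1) and (2) can be illustrated with a short sketch. It assumes that t is the number of minutes elapsed within the second 15-min interval at the reported crash time; the function name and the input readings are hypothetical and only serve the example.

```python
def weighted_precrash(first: float, second: float, t: float,
                      interval: float = 15.0) -> float:
    """Blend two consecutive interval measurements as in Eqs. (1) and (2).

    Assumption: t is the time (minutes) elapsed in the second interval
    when the crash occurred, so 0 <= t <= interval.
    """
    if not 0.0 <= t <= interval:
        raise ValueError("t must lie within the measurement interval")
    weight_second = t / interval
    return weight_second * second + (1.0 - weight_second) * first


# Hypothetical readings: crash reported 5 min into the second interval.
s_w = weighted_precrash(first=100.0, second=70.0, t=5.0)   # speed, km/h
v_w = weighted_precrash(first=90.0, second=120.0, t=5.0)   # volume, vehicles
```

With t = 5, one third of the weight falls on the interval containing the crash and two thirds on the preceding one, so each crash is always described by the equivalent of 15 min of measurements.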
Fig. 5. Road length upstream and downstream of a crash location for defining the road geometry that is considered for each crash.
in Table 1 curvature and lanes are divided into two categories (i.e. $L_{curvature} = L_{lanes} = 2$) and gradient into three (i.e. $L_{gradient} = 3$). Using Eq. (4) the number of scenarios ($S$) was estimated to be:

$$S = K_{speed} \cdot K_{volume} \cdot L_{curvature} \cdot L_{gradient} \cdot L_{lanes} = 50 \cdot 4 \cdot 2 \cdot 3 \cdot 2 = 2400 \quad (5)$$

Overall, the spreadsheet contained the 2400 unique combinations of pre-crash scenarios (e.g. speed is between the 40th and the 42nd percentile with the median value of 93 km/h, the volume is between the 50th and the 75th percentile for these speed conditions with median 112 veh/lane, on a straight and downhill section with up to two lanes). The distinct values of each categorical or continuous variable had equal frequency with the other values of this variable (e.g., 800 scenarios were on uphill segments, 800 scenarios on downhill and 800 scenarios on level). Each crash was classified to one of these scenarios with respect to its traffic and geometric conditions and the severity of its consequences. The final output of this process was a dataset with 2400 observations that represent all crash counts by severity (i.e., KS and SL). Table 2 presents the summary statistics of the explanatory variables of both the datasets.

3.3. Exposure

In order to enable meaningful comparisons in terms of crash risk between the observations of crash models it is necessary to take into account one exposure variable. The use of an offset in a count model indirectly transforms the dependent variable from a number of events to a rate of events per the exposure measure. Exposure in link-based approaches attempts to express the total amount of travel on each link. The most appropriate measures of exposure for link-based modelling have been broadly discussed in the literature (e.g. Qin et al., 2004; Pei et al., 2012; Lord et al., 2005a) as there is a plurality of surrogate measures of exposure such as link length, average annual daily traffic, vehicle-miles travelled, vehicle-hours travelled, etc. Link length, which is one of the most commonly used exposure variables in crash analyses, was employed for the link-based model in this paper.

The way of expressing exposure in a condition-based approach is similar, however not identical. The condition-based dataset that is developed here divides traffic conditions based on the percentiles of their occurrence on the entire network (Table 1). In other words, in terms of the traffic conditions, all scenarios had equal occurrence frequency on the study network during the study period. The fact that all the scenarios are equally likely to occur, though, does not mean that they have equal crash probability, so the exposure cannot be considered as uniform among condition scenarios. The probability of crashes is proportional with the probability of crash prone interactions between vehicles on the network (e.g. Chipman et al., 1992; Navon, 2003). The number of vehicle encounters at a particular condition scenario increases as the number of vehicles and the duration of their stay under these conditions rise. In order to control for this effect, the offset variable for the condition-based dataset was set to be the average vehicle-hours per kilometre for each scenario. Vehicle hours per kilometre were estimated by multiplying the mean of all the travel time per kilometre measurements of a scenario with the corresponding average volume.

4. Methodology

Despite the difference in data generating mechanism, both the link-based and the condition-based are count datasets. Poisson regression and its extensions is the most suitable family of models for modelling crash counts, in terms of statistical properties (Lord and Mannering, 2010). One of the ways to control for over-dispersion (i.e., variance of the dependent variable is higher than its mean), which appears in most count datasets in practice due to heterogeneity, is to add a random effect to the Poisson regression model. When the Poisson parameter is lognormally distributed the regression model transforms to a Poisson lognormal (PLN). The PLN model was found to be adequate for the data at hand, since the maximum percentage of zeros was 65% and the skewness for all the datasets was below 3.0 (Vangala, 2015; Vangala et al., 2015).

The main objective of this paper is the examination of the relationship of speed with motorway crashes for two severity levels. Different crash types cannot be considered independent of each other and modelled as such because they are both subsets of the total crashes on a road network (Park and Lord, 2007). For simultaneous modelling of two or more crash categories multivariate Poisson lognormal (MVPLN) regression is proposed. MVPLN controls simultaneously for over-dispersion and the correlations between the categories (Aguero-Valverde and Jovanis, 2009; El-Basyouny and Sayed, 2009; Ma et al., 2008; Park and Lord, 2007).

The observations of the link-based dataset cannot be considered as spatially independent. Consequently, in the link-based model the effects of unobserved spatial relationships between adjacent segments should be taken into account by adding a random effect using a multivariate conditional autoregressive priors (CAR) model in a hierarchical Bayesian approach (Aguero-Valverde, 2013; Barua et al., 2014). As mentioned above, the observations of the condition-based dataset are not spatial entities and thus in this case unobserved spatial correlation does not need to be considered. The models below are presented including the random effect for spatial correlation that should be taken as zero for the condition-based dataset.

For a crash count dataset containing n observations (links or pre-crash scenarios) the number of crashes by severity is Poisson distributed:

$$y_{ik} \sim \mathrm{Poisson}(\lambda_{ik}), \quad i = 1, 2, \ldots, n, \quad k = 1, 2, \ldots, K \quad (6)$$
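The scenario count of Eq. (5) and the equal-frequency property described in Section 3.2 can be checked by enumerating the combinations directly. This is only an illustrative sketch; the class labels follow Table 1, while the variable names and the dictionary layout for the crash counts are assumptions.

```python
from collections import Counter
from itertools import product

# Class labels as listed in Table 1.
speed_classes = [f"S{i}" for i in range(1, 51)]      # 50 speed percentile classes
volume_classes = [f"V{i}" for i in range(1, 5)]      # 4 volume classes per speed class
curvature = ["Curve", "Straight"]
gradient = ["Uphill", "Downhill", "Level"]
lanes = ["Lanes above 2", "Lanes up to 2"]

# Every possible pre-crash scenario, i.e. one row of the condition-based dataset.
scenarios = list(product(speed_classes, volume_classes, curvature, gradient, lanes))

# Hypothetical container: crash counts by severity start at zero for each scenario.
crash_counts = {scenario: {"KS": 0, "SL": 0} for scenario in scenarios}

# Number of scenarios that share each gradient category.
per_gradient = Counter(scenario[3] for scenario in scenarios)
```

Each crash would then increment the KS or SL count of the single scenario that matches its pre-crash conditions, leaving many scenarios at zero.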
Table 2
Descriptive statistics of the variables of the link-based and the condition-based datasets.

                                        Link-based                           Condition-based
Variable                                Mean      SD       Min     Max       Mean      SD        Min      Max
Dependent variables
  All crashes                           3.83      4.34     0.00    36.00     3.88      6.23      0.00     77.00
  KS crashes                            0.54      0.94     0.00    7.00      0.55      1.07      0.00     10.00
  SL crashes                            3.29      3.88     0.00    36.00     3.33      5.54      0.00     72.00
Independent variables
  Speed (km/h)                          94.19     16.58    27.21   128.31    93.13     19.55     33.00    129.19
  AADT (in thousands)                   28.8      1.80     01.1    107.1     –         –         –        –
  Speed*AADT (km/h*AADT)                2856.08   1920.5   35.95   10,686    –         –         –        –
  Volume, 15-min period (vehicles/lane) –         –        –       –         114.36    95.53     6.07     304.23
  Speed*Volume (km/h*vehicles/lane)     –         –        –       –         10,920.3  9640.23   436.97   30,741.4
Curvature
  Curve                                 0.46      0.50     0.00    1.00      0.50      0.50      0.00     1.00
  Straight                              0.54      0.50     0.00    1.00      0.50      0.50      0.00     1.00
Gradient
  Uphill                                0.11      0.31     0.00    1.00      0.33      0.47      0.00     1.00
  Downhill                              0.48      0.50     0.00    1.00      0.33      0.47      0.00     1.00
  Even                                  0.41      0.49     0.00    1.00      0.33      0.47      0.00     1.00
Number of lanes
  Lanes above 2                         0.32      0.47     0.00    1.00      0.50      0.50      0.00     1.00
  Lanes up to 2                         0.68      0.47     0.00    1.00      0.50      0.50      0.00     1.00
where $i$: index of observation, $k$: index of severity type, $y_{ik}$: observed number of crashes of severity $k$ for the $i$th observation and $\lambda_{ik}$: the expected mean of crashes of severity $k$ for the $i$th observation. The expected mean $\lambda_{ik}$ is a function of the model's explanatory variables (link function):

$$\ln(\lambda_{ik}) = \beta_{k0} + \sum_{m=1}^{M} \beta_{km} X_{ikm} + \ln(e_i) + \varepsilon_{ik} + u_{ik} \quad (7)$$

where $\beta_{k0}$: intercept for severity $k$, $\beta_{km}$: coefficient of the $m$th explanatory variable for severity $k$, $X_{ikm}$: value of the $m$th explanatory variable for the $i$th observation and severity $k$, $e_i$: offset/exposure variable, $\varepsilon_{ik}$: unobserved heterogeneity for the $i$th observation and severity $k$ and $u_{ik}$: random effect for the spatial correlation between the $i$th observation and its neighbours for severity $k$. In order to take into account the correlations within the unobserved heterogeneity, $\varepsilon_i$ has a multivariate normal distribution:

$$\varepsilon_i \sim \mathrm{MVN}(0, \Sigma), \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1K} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{K1} & \sigma_{K2} & \cdots & \sigma_{KK} \end{pmatrix} \quad (8)$$

where $\Sigma$ is the variance–covariance matrix of the unobserved heterogeneity.

The $u_{ik}$ term as proposed by Besag (1974) is:

boundary), or $w_{ij} = 0$ otherwise, $\Omega$: variance–covariance matrix for the spatial correlation.

$$\Omega = \begin{pmatrix} s^2_{11} & s^2_{12} & \cdots & s^2_{1K} \\ s^2_{21} & s^2_{22} & \cdots & s^2_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ s^2_{K1} & s^2_{K2} & \cdots & s^2_{KK} \end{pmatrix} \quad (10)$$

As the direct computation of the marginal distribution of crash counts is not possible, because it requires the computation of a K-variate integral of the Poisson distribution with respect to $\varepsilon_{ik}$, the parameter estimation was done via Markov chain Monte Carlo (MCMC) in a Bayesian framework (Barua et al., 2014; Ma et al., 2008; Park and Lord, 2007). The prior distribution for $\beta$ is:

$$\beta \sim \mathrm{MVN}(\beta_0, R_{\beta_0}) \quad (11)$$

The conjugate prior distribution of the inverse of the variance-covariance matrix for the heterogeneity and the spatial correlation is usually Wishart (Aguero-Valverde and Jovanis, 2009; Aguero-Valverde, 2013; Barua et al., 2014; Ma et al., 2008; Park and Lord, 2007):

$$\Sigma^{-1} \sim \mathrm{Wishart}(R, d) \quad (12)$$

$$\Omega^{-1} \sim \mathrm{Wishart}(S, d) \quad (13)$$

where $\beta_0$, $R_{\beta_0}$, $R$ and $S$ are known non-informative hyperparameters and $d$ is equal to the degrees of freedom (number of the examined crash severity types: $d = 2$).

5. Estimation results
Table 3
Multivariate coefficient estimates for crashes with killed and serious injured (KS) and crashes with slightly injured (SL) for the link-based dataset.

burn-in sample. Convergence was visually detected from Markov chain history graphs of the models' coefficients. The multivariate models for both the link-based and the condition-based datasets showed improved statistical fit (based on the Deviance Information Criterion) compared to univariate models estimated by severity group.

As there is no clear evidence about the form of the relationship²

[Figure: predicted crashes per km (y-axis: Crashes/km) for AADT = 20,000, AADT = 40,000 and AADT = 60,000]

² (i) linear speed–linear volume, (ii) linear speed–logarithmic volume, (iii) linear speed–quadratic volume, (iv) logarithmic speed–linear volume, (v) logarithmic speed–logarithmic volume, (vi) logarithmic speed–quadratic volume, (vii) quadratic speed–linear volume, (viii) quadratic speed–logarithmic volume, (ix) quadratic speed–quadratic volume.

$$\frac{\text{SL crashes}}{\text{Link Length}} = \exp(-0.0290 \cdot \text{Speed} + 0.6848 \cdot \ln(\text{AADT}) + 0.6075) \quad (15)$$
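Eq. (15) can be evaluated directly to see how the link-based prediction behaves. The sketch below is illustrative only: the function name is hypothetical, and it assumes that AADT enters the equation in thousands of vehicles, as the variable is reported in Table 2.

```python
import math


def sl_crashes_per_km(speed_kmh: float, aadt_thousands: float) -> float:
    """Evaluate Eq. (15) for the reference categories of the geometry variables.

    Assumption: AADT is expressed in thousands of vehicles (as in Table 2).
    """
    return math.exp(-0.0290 * speed_kmh
                    + 0.6848 * math.log(aadt_thousands)
                    + 0.6075)


# Predicted SL crashes per km over a range of average speeds and three AADT levels.
curves = {
    aadt: [sl_crashes_per_km(speed, aadt) for speed in range(30, 130, 10)]
    for aadt in (20.0, 40.0, 60.0)
}
```

Since the speed coefficient is negative and the ln(AADT) coefficient is positive, each curve falls with increasing speed, and curves for higher AADT lie above those for lower AADT.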
Table 4
Multivariate coefficient estimates for crashes with killed and serious injured (KS) and crashes with slightly injured (SL) for the condition-based dataset.
[Figure: y-axis: Crashes/km; x-axis: Speed (km/h)]
Fig. 7. Predicted SL crashes per kilometre as a function of speed for links with average annual daily traffic: (a) 20,000, (b) 40,000 and (c) 60,000.
Fig. 8. 3D contour plot of the predicted KS crashes per kilometre as a function of speed and average annual daily traffic.
Fig. 9. 3D contour plot of the predicted SL crashes per kilometre as a function of speed and average annual daily traffic.
Fig. 10. Predicted KS crashes per vehicle-hours travelled as a function of speed for 15-min volume per lane: (a) 50 vehicles, (b) 100 vehicles and (c) 150 vehicles.

Fig. 11. Predicted SL crashes per vehicle-hours travelled as a function of speed for 15-min volume per lane: (a) 50 vehicles, (b) 100 vehicles and (c) 150 vehicles.

Overall, the results of the link-based model were hard to interpret and to a certain extent counterintuitive (see Figs. 6–9). Speed was found to be inversely proportional with all crashes. Although some other studies have presented similar findings (Baruya, 1998; Lave, 1985), none of the researchers has given a very good explanation of why higher average speeds are overall safer. Some of the main arguments to support this idea are the increased design standards of high speed motorways and the longer available distances between vehicles at high speed conditions. However, the vast majority of studies that examined the number of crashes before and after speed limit changes (consequently changes in average speed) suggest that higher speeds are related to more crashes (e.g., Elvik et al., 2004).

Higher AADT is related with more crashes; however, considering the estimated coefficients, AADT has stronger impact on SL crashes than on KS, a result that is in line with most of the existing studies. As for the geometrical features of the links, they mostly seem to be statistically insignificant apart from links with more than two lanes for all crashes and downhill links for SL crashes only. The use of dummy variables for geometry could possibly affect the estimated coefficients. However, the signs of the coefficients of the most important variables (i.e., speed) did not change even when the geometrical characteristics were represented by continuous variables, a result that is not presented for brevity. These results possibly indicate the inability of average measures of time-varying variables that are frequently used in the link-based approaches to accurately explain the variation in crashes and that this inefficiency might have a direct impact on the modelling results.

On the other hand, the outcomes of the condition-based models are quite different (see Figs. 10–13). Speed was found to be proportional with both crash frequencies (i.e., KS and SL crashes). The shape of the curves shows that the number of crashes increases proportionally with speed until a point (e.g. 85 km/h at a volume of 100 vehicles/lane) and then either it stabilises or decreases. This can be potentially explained by the decrease of crash prone reactions that increase while speed reaches very high values (Navon, 2003). Comparing the maxima of the curves between Figs. 10 and 11 it can be seen that, not surprisingly, crashes which occur under higher speed conditions tend to have more serious outcomes; a finding that is consistent with the literature (e.g., Kloeden et al., 1997; Pei et al., 2012). The KS and SL crash rates for the reference cases of categorical independent variables (i.e., Curve = 0, Uphill = Downhill = 0, Lanes above 2 = 0; see Table 4) are:

$$\frac{\text{KS crashes}}{\text{VehHours per km}} = \exp(0.0241 \cdot \text{Speed} - 0.00014 \cdot \text{Speed}^2 - 0.0204 \cdot \text{Volume} + 0.00004 \cdot \text{Volume}^2 + 0.00002 \cdot \text{Speed} \cdot \text{Volume} - 3.24) \quad (16)$$

$$\frac{\text{SL crashes}}{\text{VehHours per km}} = \exp(0.036 \cdot \text{Speed} - 0.0002 \cdot \text{Speed}^2 - 0.0076 \cdot \text{Volume} + 0.000025 \cdot \text{Volume}^2 - 0.00003 \cdot \text{Speed} \cdot \text{Volume} - 3.01) \quad (17)$$

An interesting finding of the condition-based model was that the frequency of crashes is higher at low volume conditions than at high volume conditions, ceteris paribus. More specifically, the relationship between crash rate and volume is described as an
Fig. 12. 3D contour plot of the predicted KS crashes per vehicle hours travelled as a function of speed and volume per lane.
Fig. 13. 3D contour plot of the predicted SL crashes per vehicle hours travelled as a function of speed and volume per lane.
approximate U-shaped curve, with the minimum crash rates found to be at 241 and 211 vehicles per lane for KS and SL crashes respectively at average speed conditions (see Figs. 12 and 13). This outcome is consistent with the results for speed, because high volume is usually associated with congested, low speed conditions when crashes are less likely to be severe and reported (Lord, 2002). Another explanation for this finding could be that low volumes can be related with higher speed variations (when traffic is building up) that may increase the probability of crashes (Garber and Ehrhart, 2000). This is because when the volume decreases drivers have more freedom to choose their own speed and so speed patterns on the roadway tend to be less uniform, leading to more encounters between vehicles (Elvik et al., 2004). Additionally, low volumes occur more often during off-peak periods, such as night time, that is related to insufficient light conditions and extreme driving behaviours (e.g. drinking and driving) that are also factors proved to trigger crash occurrence (Chang and Wang, 2006; Clarke et al., 2010; Jonah, 1986).

Curvature is not shown to have a statistically significant rela-

even on freeway segments (Park et al., 2010a). Another explanation could be that speeding and other risk-taking actions might be more unlikely on curved sections. Vertical alignment of the road section just before a crash is found to be associated with more crashes. The existence of both positive and negative slope seems to trigger crash occurrence although, based on the coefficient values, the latter has higher impact. This outcome is in line with findings of existing literature (e.g., Milton and Mannering, 1998). Finally, roads with more than two lanes are related to lower crash counts for all crash severities. This is similar to the findings of Ma and Kockelman (2006), who reported that the number of lanes decreases crash counts for non-fatal crashes, and the results by Bonneson and Pratt (2008) and Park et al. (2010b), who found that 6-lane freeways are less crash prone than 4- or 8-lane ones, but opposite to the majority of current literature (Chang, 2005; Milton and Mannering, 1998). A possible explanation for that could be that wider roads allow more manoeuvres for crash avoidance during a crash-prone encounter. Moreover, this result can also be explained by the inclusion of crashes that occurred on undivided (single) carriageways.
tionship with KS crashes but it increases the likelihood of SL crashes. Over half of the examined crashes occurred on A-roads that include
The finding for SL crashes is consistent with other studies on the some single carriageways which are related with hazardous vehi-
relationship of horizontal alignment with crashes (Ma et al., 2008; cle interactions that may lead to crashes with severe consequences
Milton and Mannering, 1998; Park et al., 2010a). However, the out- (e.g. head-on collisions).
come for KS crashes is not expected as literature suggests that Considering the variations between the results of the two mod-
curvature is associated with higher crash severity (Geedipally et al., els, it is clear that aggregation bias that occurs at link-based
2013; Ma and Kockelman, 2006). The high design standards of approaches might lead to significant errors, meaning that the data
the study area could be a possible explanation why curvature is aggregation concept plays a major role to the outcomes of safety
not statistically significant for KS crashes (i.e. small radius curves analyses. This subject has been disregarded by most researchers,
are relatively rare for motorways and major A-roads) although it who mainly focused their research on developing more advanced
has been suggested that curvature is linked with more crashes statistical models; however it seems that the way crash data are
prepared for the statistical analysis is important too. The link-based and the condition-based models cannot be directly compared to each other, either using goodness-of-fit statistics or based on the interpretability of their outcomes. However, it can be argued that the condition-based model gives a significantly more accurate representation of the crash-related conditions, and so its results, apart from being more reasonable, might also be more reliable.

7. Conclusions

Acknowledgements

The authors would like to gratefully thank Prof. Benjamin Heydecker and Prof. Mike Maher of University College London for their thoughts and comments during the development of this work. This research was partially funded by a grant from the UK Engineering and Physical Sciences Research Council (EPSRC) (Grant reference: EP/F018894/1). The authors take full responsibility for the content of the paper and any errors or omissions. Research data for this paper are available upon request from Maria-Ioanna Imprialou.
Geedipally, S., Bonneson, J., Pratt, M., Lord, D., 2013. Severity distribution functions for freeway segments. Transp. Res. Rec.: J. Transp. Res. Board, 19–27.
Guo, F., Wang, X., Abdel-Aty, M.A., 2010. Modeling signalized intersection safety with corridor-level spatial correlations. Accid. Anal. Prev. 42, 84–92.
Highways Agency, 2008. Pavement design and maintenance. In: Design Manual for Roads and Bridges. Department for Transport, UK.
Highways Agency, 2011. HATRIS JTDB Reference Manual.
Hossain, M., Muromachi, Y., 2013. Understanding crash mechanism on urban expressways using high-resolution traffic data. Accid. Anal. Prev. 57, 17–29.
Imprialou, M.-I., Quddus, M., Pitfield, D., 2015. Multilevel logistic regression modeling for crash mapping in metropolitan areas. In: Transportation Research Board 94th Annual Meeting, Washington, DC.
Imprialou, M.-I.M., Quddus, M., Pitfield, D.E., 2014. High accuracy crash mapping using fuzzy logic. Transp. Res. Part C: Emerg. Technol. 42, 107–120.
IRTAD – International Traffic Safety Analysis Group, 2014. Road Safety Annual Report 2014. International Transport Forum.
Ivan, J.N., Wang, C., Bernardo, N.R., 2000. Explaining two-lane highway crash rates using land use and hourly exposure. Accid. Anal. Prev. 32, 787–795.
Joksch, H.C., 1975. An empirical relation between fatal accident involvement per accident involvement and speed. Accid. Anal. Prev. 7, 129–132.
Jonah, B.A., 1986. Accident risk and risk-taking behaviour among young drivers. Accid. Anal. Prev. 18, 255–271.
Kim, D.G., Lee, Y., Washington, S., Choi, K., 2007. Modeling crash outcome probabilities at rural intersections: application of hierarchical binomial logistic models. Accid. Anal. Prev. 39, 125–134.
Kloeden, C.N., McLean, A., Glonek, G., 2002. Reanalysis of Travelling Speed and the Risk of Crash Involvement in Adelaide South Australia, No CR 207. NHMRC Road Accident Research Unit, The University of Adelaide.
Kloeden, C.N., McLean, A.J., Moore, V.M., Ponte, G., 1997. Travelling Speed and the Risk of Crash Involvement Volume 1 – Findings, No CR 172. NHMRC Road Accident Research Unit, The University of Adelaide, South Australia.
Kockelman, K.M., Ma, J., 2007. Freeway speeds and speed variations preceding crashes, within and across lanes. Transp. Res. Forum 46, 43–61.
Lave, C., 1985. Speeding, coordination, and the 55 MPH limit. Am. Econ. Assoc. 75, 1159–1164.
Lord, D., 2002. Issues related to the application of accident prediction models for computation of accident risk on transportation networks. Transp. Res. Rec. 1784, 17–26.
Lord, D., Manar, A., Vizioli, A., 2005a. Modeling crash-flow-density and crash-flow-V/C ratio relationships for rural and urban freeway segments. Accid. Anal. Prev. 37, 185–199.
Lord, D., Mannering, F., 2010. The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transp. Res. Part A: Policy Pract. 44, 291–305.
Lord, D., Washington, S.P., Ivan, J.N., 2005b. Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accid. Anal. Prev. 37, 35–46.
Ma, J., Kockelman, K.M., 2006. Bayesian multivariate Poisson regression for models of injury count, by severity. Transp. Res. Rec. 1950, 24–34.
Ma, J., Kockelman, K.M., Damien, P., 2008. A multivariate Poisson-lognormal regression model for prediction of crash counts by severity, using Bayesian methods. Accid. Anal. Prev. 40, 964–975.
Mannering, F.L., Bhat, C.R., 2014. Analytic methods in accident research: methodological frontier and future directions. Anal. Methods Accid. Res. 1, 1–22.
Miaou, S.P., Lum, H., 1993. Modeling vehicle accidents and highway geometric design relationships. Accid. Anal. Prev. 25, 689–709.
Milton, J., Mannering, F., 1998. The relationship among highway geometrics, traffic-related elements and motor-vehicle accident frequencies. Transportation 25, 395–413.
Munden, J.M., 1967. The Relation Between a Driver’s Speed and His Accident Rate. Road Research Laboratory, Ministry of Transport, Crowthorne, England.
Navon, D., 2003. The paradox of driving speed: two adverse effects on highway accident rate. Accid. Anal. Prev. 35, 361–367.
Pande, A., Abdel-Aty, M., 2005. A freeway safety strategy for advanced proactive traffic management. J. Intell. Transp. Syst.: Technol. Plann. Oper. 9, 145–158.
Park, B.-J., Fitzpatrick, K., Lord, D., 2010a. Evaluating the effects of freeway design elements on safety. Transp. Res. Rec.: J. Transp. Res. Board, 58–69.
Park, B.-J., Fitzpatrick, K., Lord, D., 2010b. Evaluating the effects of freeway design elements on safety. Transp. Res. Rec.: J. Transp. Res. Board 2195, 58–69.
Park, E.S., Lord, D., 2007. Multivariate Poisson-lognormal models for jointly modeling crash frequency by severity. Transp. Res. Rec.: J. Transp. Res. Board 2019, 1–6.
Pei, X., Wong, S.C., Sze, N.N., 2012. The roles of exposure and speed in road safety analysis. Accid. Anal. Prev. 48, 464–471.
Qin, X., Ivan, J.N., Ravishanker, N., 2004. Selecting exposure measures in crash rate prediction for two-lane highway segments. Accid. Anal. Prev. 36, 183–191.
Quddus, M., 2013. Exploring the relationship between average speed, speed variation, and accident rates using spatial statistical models and GIS. J. Transp. Saf. Secur. 5, 27–45.
Quddus, M.A., 2008. Modelling area-wide count outcomes with spatial correlation and heterogeneity: an analysis of London crash data. Accid. Anal. Prev. 40, 1486–1497.
Shankar, V., Mannering, F., Barfield, W., 1995. Effect of roadway geometrics and environmental factors on rural freeway accident frequencies. Accid. Anal. Prev. 27, 371–389.
Solomon, D., 1964. Accidents on Main Rural Highways Related to Speed, Driver and Vehicle. US Department of Commerce, Federal Bureau of Highways, Washington, DC.
Spiegelhalter, D., Thomas, A., Best, N., Lunn, D., 2003. WinBUGS User Manual.
Stuster, J., 2004. Aggressive Driving Enforcement: Evaluations of Two Demonstration Programs, Report DOT HS 809 707. Washington, DC.
Taylor, M.C., Lynam, D.A., Baruya, A., 2000. The Effects of Drivers’ Speed on the Frequency of Road Accidents. Transport Research Laboratory, Crowthorne, England.
Vangala, P., Master of Science Thesis 2015. Negative Binomial-Generalized Exponential Distribution: Generalized Linear Model and Its Applications. Zachry Department of Civil Engineering, Texas A&M University, College Station, TX.
Vangala, P., Lord, D., Geedipally, S.R., 2015. Exploring the application of the negative binomial – generalized exponential model for analyzing traffic crash data with excess zeros. Anal. Methods Accid. Res. 7, 29–36.
WHO, 2013. WHO Global Status Report on Road Safety 2013: Supporting a Decade of Action. World Health Organization.