
Accident Analysis and Prevention 86 (2016) 173–185

Contents lists available at ScienceDirect

Accident Analysis and Prevention


journal homepage: www.elsevier.com/locate/aap

Re-visiting crash–speed relationships: A new perspective in crash modelling

Maria-Ioanna M. Imprialou a,∗, Mohammed Quddus a, David E. Pitfield a, Dominique Lord b

a School of Civil and Building Engineering, Loughborough University, Loughborough LE11 3TU, United Kingdom
b Zachry Department of Civil Engineering, Texas A&M University, College Station, TX 3136, United States

Article info

Article history:
Received 21 April 2015
Received in revised form 20 August 2015
Accepted 1 October 2015
Available online 11 November 2015

Keywords:
Traffic speed
Crash frequency
Crash severity
Pre-crash conditions
Multivariate Poisson lognormal regression
Multivariate spatial correlation

Abstract

Although speed is considered to be one of the main crash contributory factors, research findings are inconsistent. Independent of the robustness of their statistical approaches, crash frequency models typically employ crash data that are aggregated using spatial criteria (e.g., crash counts by link termed as a link-based approach). In this approach, the variability in crashes between links is explained by highly aggregated average measures that may be inappropriate, especially for time-varying variables such as speed and volume. This paper re-examines crash–speed relationships by creating a new crash data aggregation approach that enables improved representation of the road conditions just before crash occurrences. Crashes are aggregated according to the similarity of their pre-crash traffic and geometric conditions, forming an alternative crash count dataset termed as a condition-based approach. Crash–speed relationships are separately developed and compared for both approaches by employing the annual crashes that occurred on the Strategic Road Network of England in 2012. The datasets are modelled by injury severity using multivariate Poisson lognormal regression, with multivariate spatial effects for the link-based model, using a full Bayesian inference approach. The results of the condition-based approach show that high speeds trigger crash frequency. The outcome of the link-based model is the opposite; suggesting that the speed–crash relationship is negative regardless of crash severity. The differences between the results imply that data aggregation is a crucial, yet so far overlooked, methodological element of crash data analyses that may have direct impact on the modelling outcomes.

© 2015 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

∗ Corresponding author.
E-mail address: [email protected] (M.-I.M. Imprialou).

http://dx.doi.org/10.1016/j.aap.2015.10.001
0001-4575/© 2015 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

The primary objective of developing a traffic crash model is to elucidate the association between crashes and their potential contributory factors so as to formulate efficient and targeted crash mitigating measures. The accuracy of the modelling outcomes is therefore critical if inappropriate decisions are to be avoided. Motorway crashes appear to have a decreasing trend, especially in western countries; however, the number of casualties is still anything but negligible (IRTAD, 2014; WHO, 2013). The question then arises: are the crash models we currently use accurate enough to develop appropriate preventive measures?

Each crash is the outcome of a unique sequence of events related to the involved driver(s), vehicle(s) and the road environment. The in-depth examination of individual crashes one by one, though, is rarely possible due to limited data availability. As a consequence, the crashes of a road network are usually analysed in a way that reduces their volume while keeping them informative (Lord and Mannering, 2010). The main crash aggregation method is based on topological and temporal criteria. In the so-called link-based (or segment-based) approach, the counts of crashes that occurred on pre-defined road links during a certain time period are modelled against explanatory variables that represent the average conditions on each link (e.g. speed, traffic flow, road geometry). The explanatory power of these approaches, in terms of statistical methodology, has evolved over the years, reaching high levels of sophistication and offering better understanding of traffic crashes (e.g. Abdel-Aty and Radwan, 2000; Lord and Mannering, 2010; Ma et al., 2008; Mannering and Bhat, 2014). Although the link-based approach is straightforward and simple, it is by default linked with aggregation problems, that is, with the information loss that arises when multiple values are represented by a single measure (Black et al., 2009; Clark and Avery, 1976; Davis, 2004). This limitation may affect the models’ explanatory potential, especially
for time-varying independent variables (e.g. speed, traffic volume) as their spatial and temporal variations within a link cannot be captured.

Speed is regarded as one of the main traffic related crash contributory factors (Abdel-Aty et al., 2005; Elvik et al., 2004), but research findings do not confirm this unanimously. The inconsistency between the results could be partially due to the inadequacy of annual average speed to represent the speeds at which crashes actually occurred. In fact, two crashes recorded on the same link may have occurred under entirely different traffic conditions, but in a link-based approach they will both be explained by the annual average speed on the link. This can be further explained by Figs. 1 and 2. Fig. 1 shows the frequency and the cumulative distribution of the ratio of the actual speed at the crash location to the annual average speed on the corresponding road link for all 2012 motorway crashes in England. Fig. 2 is the same for traffic volume. It is obvious that the ratios are considerably different from one for a high proportion of crashes (ratio = 1 means that crash speed or volume is equal to the respective annual average), confirming that the representation of time-varying measures by annual averages is rather inadequate in many cases.

Fig. 1. Frequency and cumulative distribution of the 15-min speed at the time and the location of the crash by the annual average of the speed on this link.

Fig. 2. Frequency and cumulative distribution of the 15-min volume at the time and the location of the crash by the annual average of the volume on this link.

This paper introduces a new crash data aggregation concept termed as condition-based approach that aims to represent in more detail the actual pre-crash conditions in order to explore the relationship between motorway crashes and their contributory factors such as speed, volume and geometric configuration. The grouping attribute of crashes in the proposed method is the similarity of pre-crash conditions rather than a link-level spatial relationship. In this way, crash counts are represented more precisely by explanatory variables that approximate the actual conditions enabling, possibly, improved relationships. The condition-based dataset can be modelled using multivariate Poisson lognormal regression. In order to compare the two methods with respect to their outcomes, the same data are also used to build a link-based spatial multivariate Poisson lognormal regression model.

2. Literature review

A considerable amount of literature has been published on the impact of various traffic and geometric road characteristics on link-based crash frequency. Among others, speed, traffic volume, number of lanes, gradient and horizontal curvature are widely studied. From a qualitative point of view, findings show that although crash severity is positively correlated with driving speed (Clarke et al., 2010; Joksch, 1975; Kloeden et al., 1997; Pei et al., 2012), the relationship between speed and crash frequency is not equally straightforward (Aarts and Van Schagen, 2006). The early study of Solomon (1964) was the first to suggest that speed and crash frequency are not proportional but their relationship can be described by a “U-shaped” curve; an idea that was supported by several other researchers (e.g., Munden, 1967; Cirillo, 1968). Solomon’s curve implies that only extremely low and high speed conditions trigger crashes. However, most of the subsequent studies find driving speeds to be linearly or exponentially related to crashes (Baruya and Finch, 1994; Fildes et al., 1991; Kloeden et al., 2002, 1997; Taylor et al., 2000). A few studies contradicted this view proposing that the speed–crash relationship is negative (Baruya, 1998; Stuster, 2004) and others reported that this relationship is statistically insignificant (Garber and Gadiraju, 1989; Lave, 1985). Some of the most recent papers that employed advanced statistical models did not find statistically significant relationships between speed and crashes (Kockelman and Ma, 2007; Quddus, 2013). Pei et al. (2012) attempted to explain the results’ inconsistencies suggesting that the crash–speed relationship that is estimated by models strongly depends on the selected measure of exposure; the relationship was shown to be negative for distance-based exposure (i.e., vehicle miles travelled) but positive for time-based exposure (i.e. vehicle hours travelled).

The relationship of speed with crashes cannot be defined without taking into account the simultaneous effect of other traffic characteristics such as traffic flow (Aarts and Van Schagen, 2006) and vehicle occupancy (Garber and Subramanyan, 2001; Lord et al., 2005a). High traffic flow (represented by AADT, hourly volume, etc.) is generally considered to increase the risk of crashes (Abdel-Aty and Radwan, 2000; Chang, 2005; Milton and Mannering, 1998). On the contrary, lower flows have also been correlated with higher speed variance that is also considered to be a significant crash precursor (e.g., Garber and Ehrhart, 2000; Elvik et al., 2004). The mechanism of its impact though is not explicitly explained because of the lack of individual vehicle-level second-by-second data that would lead to reliable estimations. Instead, current studies employed relatively highly aggregated data that lead to inconclusive results (Garber and Ehrhart, 2000; Kockelman and Ma, 2007; Quddus, 2013; Solomon, 1964). Although seldom researched, vehicle occupancy ratio was found to have a non-linear relationship with the number of crashes (Garber and Subramanyan, 2001) and was also dependent on the number of vehicles involved in the crash (i.e., single- versus multi-vehicle crashes) (Lord et al., 2005a).

Road geometric design is also believed to be related with crash frequency on the roadway (AASHTO, 2010). High crash frequency is associated with high vertical grades (Anastasopoulos
and Mannering, 2009; Chang, 2005; Milton and Mannering, 1998; Shankar et al., 1995) and horizontal curvature (i.e. frequent and sharp curves) (Abdel-Aty and Radwan, 2000; Anastasopoulos and Mannering, 2009; Ma et al., 2008; Milton and Mannering, 1998; Shankar et al., 1995). The number of lanes is also linked with lane changes that increase vehicle interactions and consequently the number of crashes (Chang, 2005; Milton and Mannering, 1998); nevertheless, Ma and Kockelman (2006) report that wider roads decreased the number of non-fatal crashes.

From a methodological perspective, during the last two decades count models such as Poisson and Negative Binomial (NB) regression (Lord and Mannering, 2010) as well as their various extensions are considered to have the most suitable statistical properties for modelling crash count data that are usually characterised by low mean values, over-dispersion and heteroscedasticity (see also Mannering and Bhat, 2014). The initial approaches employed fixed-parameters NB regression (Abdel-Aty and Radwan, 2000; Ivan et al., 2000; Lord et al., 2005b; Miaou and Lum, 1993; Milton and Mannering, 1998; Shankar et al., 1995). More recent studies control for unobserved heterogeneity (such as spatial and temporal correlation) using random effects (Barua et al., 2014; Guo et al., 2010; Quddus, 2008), hierarchical (Kim et al., 2007) or random-parameter models (Anastasopoulos and Mannering, 2009). Multivariate Poisson (Ma and Kockelman, 2006) and Poisson lognormal models (Aguero-Valverde and Jovanis, 2009; El-Basyouny and Sayed, 2009; Ma et al., 2008; Park and Lord, 2007) are proposed for modelling simultaneously different crash types (e.g., by level of severity and frequency simultaneously) in order to control for the unobserved heterogeneity that arises from the correlations between them.

Crash counts in the majority of the studies are generated by dividing the examined network into homogeneous links or segments (i.e. link-based approach). This approach is logical and effective from a practical point of view as the traffic data are usually available at the link level. Nevertheless, it is a fact that both traffic and geometric conditions at the roadway may vary significantly even for adjacent parts of the same road (e.g. due to road topography and on-off ramps). Therefore, the assumption of homogeneity of the conditions within links that include up to several miles of roadway and sometimes both directions of traffic may not necessarily be true. Additionally, the characteristic values used for each of the examined factors, which are usually measures of central tendency, may not be representative of the actual conditions at the time and location of the crash. Studies focusing on proactive crash prediction highlight that crashes are related to suddenly developed and often extreme traffic conditions (e.g., high and low speeds) that cannot be captured from aggregated measures such as hourly or annual averages (Abdel-Aty and Pande, 2005; Hossain and Muromachi, 2013; Pande and Abdel-Aty, 2005). The use of these measures therefore leads to loss of information and under-representation of extreme conditions that may be crucial in explaining crash occurrences. These limitations of link-based crash modelling are likely to be reflected in the results of analyses, leading to possibly erroneous and inconsistent conclusions.

This paper attempts to address the above limitations using an alternative crash data aggregation method. Condition-based modelling enables a more accurate representation of the conditions just before crashes so as to shed more light on the relationship of traffic speed with crash frequency.

3. Data description and pre-processing

The generation of the crash datasets for both the link-based and the condition-based approaches requires the merger of crash, traffic and geometry data. Crash data were obtained from the National Road Accident Database of the United Kingdom (STATS 19) and include 10,520 crashes that occurred during 2012 on the Strategic Road Network (SRN) of England (Department for Transport, 2011). The SRN consists of the main motorways and A-roads of the country and the total length is 6920 km (4272 miles). STATS 19 reports record crashes that accounted for at least one slight injury, along with information related to the crash and the involved parties (i.e., drivers, casualties and vehicles). The variables that were used here were crash severity, date, time and location.

Location is a crucial factor for crash analyses because it is closely related with the identification of the traffic and geometric conditions that are related to a crash. When crash location data are not satisfactory in terms of accuracy, the application of crash mapping techniques has been shown to significantly change the results of crash analyses (Imprialou et al., 2015). The objective of crash mapping is to determine a set of coordinates that represent the crash location as precisely as possible. In STATS 19 reports, crash locations were less accurate than desired. Thus, crashes were reallocated to refined positions estimated by a fuzzy logic crash mapping algorithm that was developed for the study area using distance, vehicle direction, road name and type. This provides a 98.9% (±1.1%) accurate matching score (Imprialou et al., 2014).

Traffic data were extracted from the UK Highways Agency Journey Time Database (JTDB) that includes link-level traffic information obtained by inductive loop detectors for the entire SRN (2505 links¹) (Highways Agency, 2011). The measurement interval is 15 min, resulting in a dataset of approximately 88 million observations. The variables used for this analysis are average speed (km/h), volume (vehicles) and travel time (seconds) (Highways Agency, 2011). Road configuration was determined based on the UK Highways Agency Traffic Speed Condition Survey database (TRACS) (Highways Agency, 2008). TRACS contains measurements of geometric characteristics (i.e., radius and gradient) by survey vehicles for the entire SRN divided into 10-m segments.

¹ Average link length 5.23 (±4.76) km.

The data were processed separately in order to produce the datasets. Although the two datasets stem from exactly the same databases, they represent the relationship of crashes with road-related variables from entirely different perspectives and sampling frames. The sampling frame of the link-based dataset consists of road links that are actual spatial entities and is the conventional approach for safety models. The sampling frame of the condition-based dataset comprises all the possible combinations of traffic and geometric conditions; a set of abstract/non-physical attributes that can potentially co-exist at the time and the location of a crash.

3.1. Link-based dataset

A link-based dataset lists the links that comprise a road network and the total number of crashes per link. The crashes occurred on the link at different time points during the study period. Each link contains information that represents the conditions on the road defined by descriptive statistics (e.g. mean, median, maximum, etc.). Based on this aggregation method, it is assumed that the triggering factors for crashes that occurred on the same link are similar, which of course might not be true for all the cases as shown in Figs. 1 and 2.

Based on the output of the crash mapping algorithm, each road link was assigned a number of crashes (crash counts varied from 0 to 36 per link) and one characteristic value representing speed, volume, curvature, gradient and the number of lanes. Considering the dynamic nature of the traffic variables (i.e., speed and volume) as well as the fact that a road link typically covers a
Table 1
Definition of variables which are included in the link-based dataset and the condition-based dataset respectively.

| Variable | Link-based dataset | Condition-based dataset |
|---|---|---|
| Speed (a) | Annual average of measured speeds on each link (averaged over 35,040 records) | S1. Speed up to 2nd percentile; S2. Speed between the 3rd and the 4th percentile; S3. Speed between the 5th and the 6th percentile; ...; S50. Speed between the 99th and the 100th percentile |
| Volume (a) | Annual average daily traffic per link (AADT) | Separately for each of the 50 speed scenarios: V1. Volume up to the 25th percentile; V2. Volume between the 26th and the 50th percentile; V3. Volume between the 51st and the 75th percentile; V4. Volume over the 76th percentile |
| Curvature | C1. Links with multiple and/or sharp curves (Curve); C2. Links that above 50% of their radius measurements are equal with 2000 m (Straight) | C1. Segments that above 50% of their radius measurements are lower than 2000 m (Curve); C2. Segments that above 50% of their radius measurements are equal with 2000 m (Straight) |
| Gradient | G1. Links with median gradient above 0.5% (Uphill); G2. Links with median gradient below −0.5% (Downhill); G3. Links with median gradient between ±0.5% (Level) | G1. Segments that have more gradient measurements above 0.5% than below 0.5% (Uphill); G2. Segments that have more gradient measurements below −0.5% than above −0.5% (Downhill); G3. Segments that have more gradient measurements between ±0.5% than above −0.5% and below 0.5% (Level) |
| Lanes | L1. Links that above 50% of their sections include more than two lanes (Lanes above 2); L2. Links that above 50% of their sections include up to two lanes (Lanes up to 2) | L1. Sections with more than two lanes (Lanes above 2); L2. Sections with up to two lanes (Lanes up to 2) |

(a) Classification was based on the weighted speed and volume (Sw and Vw; see Eqs. (1) and (2)).
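To make the traffic rows of Table 1 more concrete, the following Python sketch shows one possible way to compute the weighted speed or volume of Eqs. (1) and (2), assign a value to an equal-frequency percentile group, and count the scenarios implied by the table. It is only an illustration under stated assumptions and not the authors' implementation: the function names are hypothetical, the group boundaries are taken from `statistics.quantiles` with the inclusive method (the paper does not state how its percentiles were computed), and the numbers are made up.

```python
import math
import statistics
from bisect import bisect_left


def weighted_average(first, second, t_min, interval_min=15.0):
    """Weighted value in the spirit of Eqs. (1)-(2).

    first / second: measurements of the preceding interval and of the
    interval that contains the crash; t_min: minutes between the start of
    the second interval and the reported crash time.
    """
    if not 0.0 <= t_min <= interval_min:
        raise ValueError("t_min must lie within the measurement interval")
    w = t_min / interval_min
    return w * second + (1.0 - w) * first


def group_boundaries(values, n_groups):
    """Cut points that split `values` into n_groups equal-frequency groups.

    Assumption: boundaries come from statistics.quantiles (inclusive method).
    """
    return statistics.quantiles(values, n=n_groups, method="inclusive")


def group_index(value, boundaries):
    """1-based group number (e.g. S1..S50 for speed, V1..V4 for volume)."""
    return bisect_left(boundaries, value) + 1


def count_scenarios(groups_per_variable):
    """Number of scenarios: product of the group counts of all variables."""
    return math.prod(groups_per_variable)


# Toy example with made-up numbers (not data from the study).
crash_speed = weighted_average(first=100.0, second=70.0, t_min=5.0)
volume_cuts = group_boundaries(list(range(1, 101)), n_groups=4)
print(crash_speed)
print(group_index(60, volume_cuts))       # 3
print(count_scenarios([50, 4, 2, 3, 2]))  # 2400
```

Because Table 1 groups volume separately within each speed scenario, a fuller version would call `group_boundaries` once per speed group rather than once for the whole network.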

considerable road length, it can be understood that both the traffic conditions and the geometric configuration of each link can only be partially represented by single measures per link. Traffic conditions were expressed by annual averages, while road geometry was represented by categorical variables. A more detailed description of the variables can be found in Table 1. After the exclusion of the links with missing traffic or geometry data the final link-based dataset included 2356 observations (i.e., links) that represent overall 9028 crashes. Crash counts were divided by severity into crashes with Killed or Serious injuries (henceforth: KS) and crashes with Slight injuries (henceforth: SL). The split between the two severity categories was 1268 and 7760 for KS and SL crashes, respectively.

3.2. Condition-based dataset

A pre-crash condition-based dataset (henceforth: condition-based dataset) consists of every possible combination/scenario of traffic and geometric conditions that could ever be present on the network just before a crash (limited to the examined variables and their specifications). Each scenario is matched with a number of crashes (from zero to, theoretically, all the crashes of the database) that were found to occur under this particular combination of traffic and geometry conditions. Condition-based modelling attempts to represent the actual crash-related traffic and geometry conditions. In contrast to the link-based approach, the crashes that belong to the same condition scenario are spatially and temporally independent. Instead, they are similar in the sense that when they occurred the external circumstances on the road were approximately the same. Assuming that some or all of these circumstances might be related with the crash occurrences, the concentration (or absence) of crashes in some particular combinations should provide useful insights about crash triggering factors.

The formation of the condition-based dataset is quite complex relative to the link-based dataset. Fig. 3 presents a simple flowchart describing the main processes to develop the condition-based dataset consisting of Nmax crashes. Each step is explained in detail below.

Fig. 3. Flow chart of the condition-based dataset development process.

3.2.1. Traffic conditions identification

The final condition-based dataset included all the possible combinations of pre-crash-condition scenarios and the crash counts per scenario. As the scope of the creation of the alternative dataset was the representation of the conditions on the roadway just before crashes, all the examined crashes were matched with a set of traffic and geometric conditions based on the geocoded crash locations. The pre-crash traffic conditions on the crash location were identified based on the reported crash date and time. In order to have a comparable set of measurements for all crashes, each crash was matched with traffic data equivalent to 15 min of measurements. Therefore, the speed ($S_w$) and volume ($V_w$) were estimated using a weighted average of the 15-min interval that includes the time of the crash (second interval) and its precedent (first interval).

$$S_w = \frac{t}{15}\, S_{second} + \left(1 - \frac{t}{15}\right) S_{first} \qquad (1)$$

$$V_w = \frac{t}{15}\, V_{second} + \left(1 - \frac{t}{15}\right) V_{first} \qquad (2)$$

where $S_w$ and $V_w$: weighted average of speed (km/h) and volume (vehicles), $S_{first}$ and $V_{first}$: speed (km/h) and volume (vehicles) measurements of the first interval, $S_{second}$ and $V_{second}$: speed (km/h) and
volume (vehicles) measurements of the second interval, $t$: time difference between the start of the second interval and the reported crash time (min).

It is a fact that the resolution of the traffic data is not ideal for defining the exact traffic conditions just before the crashes; within 15 min traffic conditions can change on the roadway. Even so, the traffic characteristics used here are significantly more representative than annual averages that are typically used for link-based analyses. Moreover, it should be noted that the reported time of crashes in the examined database tends to be rounded; an issue that is also reported by Kockelman and Ma (2007). In STATS19, crash time is reported with an hours-minutes format (i.e. HH:MM). Fig. 4 presents the distribution of the second part of the reported time (i.e. MM from 00 to 59) of the examined crashes. It can be seen that the distribution is clustered at the nearest multiples of 5 min. This data limitation shows that even if more disaggregated traffic data were available (e.g., 1-min resolution), it would be necessary to consider a wider temporal interval per crash so as to capture the error of reporting crash time.

Fig. 4. Crash distribution per minutes of the reported crash time (the horizontal bar shows the expected percentage per minute group (1.67%) if the distribution of crashes was, as expected, uniform).

3.2.2. Geometrical conditions identification

The configuration of the roadway a few metres before the crash location is probably also related with the crash occurrence. That is why the length of the road that was considered for each crash was defined by the stopping distance upstream of the identified crash location on the link. Stopping distance was estimated based on the annual average speed of motorways and A-roads separately using the following equation (Elvik et al., 2004):

$$S_D = R_D + B_D = t_r v_0 + \frac{V_0^2}{2 f_k g} \qquad (3)$$

where $S_D$: stopping distance (m), $R_D$: reaction distance (m), $B_D$: braking distance (m), $t_r$: reaction time (here: 1.5 s), $v_0$: average speed (m/s), $V_0$: average speed (km/h), $f_k$: friction (here: 0.8, average tire on dry pavement), $g$: gravity acceleration (here: 9.8 m/s²).

Based on the above equation, the stopping distance was estimated at 97 and 75 metres for motorways and A-roads respectively. To correct for errors in the crash location, the final road segment for each crash included the length of the stopping distance upstream of the crash location and 20 m downstream (error distance). Fig. 5 is a schematic illustration of the road segment that is considered for obtaining the geometrical conditions of each crash. The final road segments included a number of successive radius and gradient measurements that were converted to categorical variables so as to keep the number of scenarios of the final dataset relatively low. Thus, crashes were considered to occur on curves if the majority (above 50%) of the radius measurements of the segment were less than 2000 m and on straight segments otherwise. Similarly, crashes that occurred on uphill segments were considered those that included more gradient measurements above 0.5% than below 0.5%, on downhill those that include more gradient measurements below −0.5% than above −0.5% and otherwise on level segments. The road width was represented with another dummy variable that separated road segments with more than two lanes from segments with up to two lanes.

3.2.3. Final condition-based dataset

After each crash was matched with a set of traffic and geometric pre-crash conditions, the initial 10,520 crashes of the database decreased to 9310 (1310 KS and 8000 SL crashes) due to missing or illogical values in one or more variables. Crashes left in the analysis were classified according to their prior conditions to a spreadsheet that included all the possible combinations of pre-crash conditions.

Apart from the crash data, to generate a condition-based dataset it is necessary to employ all data that describe the conditions on the network. The scenarios of a condition-based dataset should represent all the condition combinations that existed on the network regardless of whether these were associated with crashes or not. That is why before generating a condition-based dataset the range and the distribution of the measurements of the explanatory variables that will be used should be known. The process of the development of this dataset might not be the only way for building a condition-based dataset. However, the presentation and comparison of different data combination methods fall outside the scope of this paper.

To facilitate controlling for the exposure, all the scenarios of the condition-based dataset were chosen to have equal likelihood of occurrence during the examined study period. To do this, the continuous variables that were included in the dataset (i.e., speed and volume) were divided into equal frequency groups defined by percentile ranges with a constant step $n$ (e.g. from the $N$th percentile to the $(N + n)$th percentile, from the $(N + n)$th percentile to the $(N + 2n)$th percentile, from the $(N + 2n)$th percentile to the $(N + 3n)$th percentile, ...). Each group was represented in the dataset by a representative value (e.g., a central tendency measure). In this way, for every continuous variable $C_i$ there were a number of $K_i$ equally likely distinct groups of observations (where $K_i = 100/n$). Every discrete variable $D_j$ had by default a number of categories, $L_j$. To develop a dataset that includes every possible variable combination the number of scenarios ($S$) that should be generated is:

$$S = \prod_{i=1}^{I} K_i \prod_{j=1}^{J} L_j \qquad (4)$$

The number of scenarios of the dataset can be empirically adjusted so as to serve the needs of the analysis: since $K_i = 100/n$, selecting a larger step $n$ decreases the number of scenarios and vice versa.

Traffic characteristics were grouped into categories of equal frequency. The speed groups were defined by dividing the cumulative speed distribution of the entire network into 50 equal parts (i.e., $K_{speed} = 50$) with a 2-percentile step (i.e., $n_{speed} = 2$) (see Table 1). Following this, the volume, for each speed group separately, was split into the quartiles of its cumulative distribution (i.e., $K_{volume} = 4$ and $n_{volume} = 25$). The number of groups was decided to be higher for speed than for volume because this paper mainly focuses on the impact of speed on crashes. Some different combinations of numbers of groups for speed and volume that have been attempted (not presented here for brevity) did not seem to significantly change the modelling outcomes. Speed and volume per group were represented by their medians. Other measures were also tested, such as the mean and the 85th percentile, that did not exhibit any statistical difference in the modelling results. To keep the number of combinations relatively low, all the geometric variables were represented by categorical variables. As it can be seen
178 M.-I.M. Imprialou et al. / Accident Analysis and Prevention 86 (2016) 173–185

Fig. 5. Road length upstream and downstream of a crash location for defining the road geometry that is considered for each crash.

in Table 1 curvature and lanes are divided into two categories (i.e. $L_{curvature} = L_{lanes} = 2$) and gradient into three (i.e. $L_{gradient} = 3$). Using Eq. (4) the number of scenarios ($S$) was estimated to be:

$$S = K_{speed} \cdot K_{volume} \cdot L_{curvature} \cdot L_{gradient} \cdot L_{lanes} = 50 \cdot 4 \cdot 2 \cdot 3 \cdot 2 = 2400 \tag{5}$$

Overall, the spreadsheet contained the 2400 unique combinations of pre-crash scenarios (e.g. speed is between the 40th and the 42nd percentile with the median value of 93 km/h, the volume is between the 50th and the 75th percentile for these speed conditions with median 112 veh/lane, on a straight and downhill section with up to two lanes). The distinct values of each categorical or continuous variable had equal frequency with the other values of this variable (e.g., 800 scenarios were on uphill segments, 800 scenarios on downhill and 800 scenarios on level). Each crash was classified to one of these scenarios with respect to its traffic and geometric conditions and the severity of its consequences. The final output of this process was a dataset with 2400 observations that represent all crash counts by severity (i.e., KS and SL). Table 2 presents the summary statistics of the explanatory variables of both datasets.

3.3. Exposure

In order to enable meaningful comparisons in terms of crash risk between the observations of crash models it is necessary to take into account one exposure variable. The use of an offset in a count model indirectly transforms the dependent variable from a number of events to a rate of events per unit of the exposure measure. Exposure in link-based approaches attempts to express the total amount of travel on each link. The most appropriate measures of exposure for link-based modelling have been broadly discussed in the literature (e.g. Qin et al., 2004; Pei et al., 2012; Lord et al., 2005a) as there is a plurality of surrogate measures of exposure such as link length, average annual daily traffic, vehicle-miles travelled, vehicle-hours travelled, etc. Link length, which is one of the most commonly used exposure variables in crash analyses, was employed for the link-based model in this paper.

The way of expressing exposure in a condition-based approach is similar, however not identical. The condition-based dataset that is developed here divides traffic conditions based on the percentiles of their occurrence on the entire network (Table 1). In other words, in terms of the traffic conditions, all scenarios had equal occurrence frequency on the study network during the study period. The fact that all the scenarios are equally likely to occur, though, does not mean that they have equal crash probability, so the exposure cannot be considered as uniform among condition scenarios. The probability of crashes is proportional to the probability of crash-prone interactions between vehicles on the network (e.g. Chipman et al., 1992; Navon, 2003). The number of vehicle encounters at a particular condition scenario increases as the number of vehicles and the duration of their stay under these conditions rise. In order to control for this effect, the offset variable for the condition-based dataset was set to be the average vehicle-hours per kilometre for each scenario. Vehicle-hours per kilometre were estimated by multiplying the mean of all the travel time per kilometre measurements of a scenario by the corresponding average volume.

4. Methodology

Despite the difference in data generating mechanism, both the link-based and the condition-based are count datasets. Poisson regression and its extensions are the most suitable family of models for modelling crash counts, in terms of statistical properties (Lord and Mannering, 2010). One of the ways to control for over-dispersion (i.e., variance of the dependent variable is higher than its mean), which appears in practically all count datasets due to heterogeneity, is to add a random effect to the Poisson regression model. When the Poisson parameter is lognormally distributed the regression model transforms to a Poisson lognormal (PLN). The PLN model was found to be adequate for the data at hand, since the maximum percentage of zeros was 65% and the skewness for all the datasets was below 3.0 (Vangala, 2015; Vangala et al., 2015).

The main objective of this paper is the examination of the relationship of speed with motorway crashes for two severity levels. Different crash types cannot be considered independent of each other and modelled as such because they are both subsets of the total crashes on a road network (Park and Lord, 2007). For simultaneous modelling of two or more crash categories multivariate Poisson lognormal (MVPLN) regression is proposed. MVPLN controls simultaneously for over-dispersion and the correlations between the categories (Aguero-Valverde and Jovanis, 2009; El-Basyouny and Sayed, 2009; Ma et al., 2008; Park and Lord, 2007).

The observations of the link-based dataset cannot be considered as spatially independent. Consequently, in the link-based model the effects of unobserved spatial relationships between adjacent segments should be taken into account by adding a random effect using a multivariate conditional autoregressive (CAR) prior model in a hierarchical Bayesian approach (Aguero-Valverde, 2013; Barua et al., 2014). As mentioned above, the observations of the condition-based dataset are not spatial entities and thus in this case unobserved spatial correlation does not need to be considered. The models below are presented including the random effect for spatial correlation, which should be taken as zero for the condition-based dataset.

For a crash count dataset containing $n$ observations (links or pre-crash scenarios) the number of crashes by severity is Poisson distributed:

$$y_{ik} \sim \text{Poisson}(\lambda_{ik}), \quad i = 1, 2, \ldots, n, \quad k = 1, 2, \ldots, K \tag{6}$$
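To make the role of the lognormal random effect concrete, the following is a minimal simulation sketch of a bivariate Poisson lognormal process for two severity levels. It is not the authors' estimation code: the function names, the log-rate means, the standard deviations and the correlation are illustrative assumptions, and the Poisson sampler is a simple textbook method chosen to keep the example dependency-free.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson variate with mean lam (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    count, running_product = 0, rng.random()
    while running_product > threshold:
        count += 1
        running_product *= rng.random()
    return count


def simulate_bivariate_pln(n, mean_log_rates, sigmas, rho, seed=0):
    """Simulate n (KS, SL) count pairs with correlated lognormal heterogeneity.

    All parameter values passed in are hypothetical, not estimates from the paper.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Cholesky-style construction of two correlated normal errors.
        eps_ks = sigmas[0] * z1
        eps_sl = sigmas[1] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        lam_ks = math.exp(mean_log_rates[0] + eps_ks)
        lam_sl = math.exp(mean_log_rates[1] + eps_sl)
        pairs.append((sample_poisson(rng, lam_ks), sample_poisson(rng, lam_sl)))
    return pairs


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v


def covariance(xs, ys):
    """Unbiased sample covariance of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


if __name__ == "__main__":
    counts = simulate_bivariate_pln(5000, (-0.6, 1.2), (0.8, 0.8), rho=0.7)
    ks = [c[0] for c in counts]
    sl = [c[1] for c in counts]
    print("KS mean/variance:", mean_and_variance(ks))
    print("SL mean/variance:", mean_and_variance(sl))
    print("KS-SL covariance:", covariance(ks, sl))
```

In such a simulation the sample variance of each count exceeds its mean, and the two severities are positively correlated, which are the two features the multivariate specification is meant to capture.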

Table 2
Descriptive statistics of the variables of the link-based and the condition-based datasets.

Link-based Condition-based

Mean SD Min Max Mean SD Min Max

Dependent variables
All crashes 3.83 4.34 0.00 36.00 3.88 6.23 0.00 77.00
KS crashes 0.54 0.94 0.00 7.00 0.55 1.07 0.00 10.00
SL crashes 3.29 3.88 0.00 36.00 3.33 5.54 0.00 72.00
Independent variables
Speed (km/h) 94.19 16.58 27.21 128.31 93.13 19.55 33.00 129.19
AADT (in Thousands) 28.8 1.80 01.1 107.1 – – – –
Speed*AADT (km/h*AADT) 2856.08 1920.5 35.95 10,686 – – – –
Volume (15-min period) (vehicles/lane) – – – – 114.36 95.53 6.07 304.23
Speed*Volume (km/h*vehicles/lane) – – – – 10,920.3 9640.23 436.97 30,741.4
Curvature
Curve 0.46 0.50 0.00 1.00 0.50 0.50 0.00 1.00
Straight 0.54 0.50 0.00 1.00 0.50 0.50 0.00 1.00
Gradient
Uphill 0.11 0.31 0.00 1.00 0.33 0.47 0.00 1.00
Downhill 0.48 0.50 0.00 1.00 0.33 0.47 0.00 1.00
Even 0.41 0.49 0.00 1.00 0.33 0.47 0.00 1.00
Number of lanes
Lanes above 2 0.32 0.47 0.00 1.00 0.50 0.50 0.00 1.00
Lanes up to 2 0.68 0.47 0.00 1.00 0.50 0.50 0.00 1.00

where $i$: index of observation, $k$: index of severity type, $y_{ik}$: observed number of crashes of $k$ severity for the $i$th observation and $\lambda_{ik}$: the expected mean of crashes of $k$ severity for the $i$th observation. The expected mean $\lambda_{ik}$ is a function of the model's explanatory variables (link function):

$$\ln(\lambda_{ik}) = \beta_{k0} + \sum_{m=1}^{M} \beta_{km} X_{ikm} + \ln(e_i) + \varepsilon_{ik} + u_{ik} \tag{7}$$

where $\beta_{k0}$: intercept for severity $k$, $\beta_{km}$: coefficient of the $m$th explanatory variable for severity $k$, $X_{ikm}$: value of the $m$th explanatory variable for the $i$th observation and severity $k$, $e_i$: offset/exposure variable, $\varepsilon_{ik}$: unobserved heterogeneity for the $i$th observation and severity $k$ and $u_{ik}$: random effect for the spatial correlation between the $i$th observation and its neighbours for severity $k$. In order to take into account the correlations within the unobserved heterogeneity, $\varepsilon_i$ has a multivariate normal distribution:

$$\varepsilon_i \sim \text{MVN}(0, \Sigma), \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1K} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{K1} & \sigma_{K2} & \cdots & \sigma_{KK} \end{pmatrix} \tag{8}$$

where $\Sigma$ is the variance–covariance matrix of the unobserved heterogeneity.

The $u_{ik}$ term as proposed by Besag (1974) is:

$$u_{ik} \mid u_{jk} \sim \text{MVN}\left( \frac{\sum_{j} u_{jk} w_{ij}}{\sum_{j} w_{ij}}, \frac{\Omega}{\sum_{j} w_{ij}} \right), \quad i \neq j \tag{9}$$

where $w_{ij}$: adjacency weight matrix that denotes $w_{ij} = 1$ if the links $i$ and $j$ are first order neighbours (they share a common boundary), or $w_{ij} = 0$ otherwise, $\Omega$: variance–covariance matrix for the spatial correlation.

$$\Omega = \begin{pmatrix} s^2_{11} & s^2_{12} & \cdots & s^2_{1K} \\ s^2_{21} & s^2_{22} & \cdots & s^2_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ s^2_{K1} & s^2_{K2} & \cdots & s^2_{KK} \end{pmatrix} \tag{10}$$

As the direct computation of the marginal distribution of crash counts is not possible, because it requires the computation of a $K$-variate integral of the Poisson distribution with respect to $\varepsilon_{ik}$, the parameter estimation was done via Markov chain Monte Carlo (MCMC) in a Bayesian framework (Barua et al., 2014; Ma et al., 2008; Park and Lord, 2007). The prior distribution for $\beta$ is:

$$\beta \sim \text{MVN}(\beta_0, R_{\beta_0}) \tag{11}$$

The conjugate prior distribution of the inverse of the variance–covariance matrix for the heterogeneity and the spatial correlation is usually Wishart (Aguero-Valverde and Jovanis, 2009; Aguero-Valverde, 2013; Barua et al., 2014; Ma et al., 2008; Park and Lord, 2007):

$$\Sigma^{-1} \sim \text{Wishart}(R, d) \tag{12}$$

$$\Omega^{-1} \sim \text{Wishart}(S, d) \tag{13}$$

where $\beta_0$, $R_{\beta_0}$, $R$ and $S$ are known non-informative hyperparameters and $d$ is equal to the degrees of freedom (number of the examined crash severity types: $d = 2$).

5. Estimation results

The model presented in Eqs. (6)–(13) was fitted to both the link-based and the condition-based datasets using WinBUGS 1.4.3 (Spiegelhalter et al., 2003), an open-source software that is suitable for full Bayes model estimation using the Markov chain Monte Carlo (MCMC) method. The posterior distributions were obtained from 50,000 iterations of two Markov chains. The first 20,000 iterations were discarded from the final estimations as the

Table 3
Multivariate coefficient estimates for crashes with killed and serious injured (KS) and crashes with slightly injured (SL) for the link-based dataset.

KS crashes Mean S.D. MC Error 2.50% 5.00% Median 95% 97.50%

Speed −0.0231* 0.0026 0.0001 −0.0280 −0.0272 −0.0264 −0.0187 −0.0179


Ln(AADT) 0.1310* 0.0636 0.0024 0.0046 0.0243 0.0484 0.2355 0.2547
Curve −0.0740 0.0720 0.0014 −0.2164 −0.1933 −0.1655 0.0439 0.0655
Straight (reference) 0.0000 – – – – – – –
Uphill −0.0763 0.1278 0.0016 −0.3285 −0.2889 −0.2417 0.1316 0.1700
Downhill −0.0094 0.0680 0.0011 −0.1420 −0.1205 −0.0959 0.1028 0.1254
Even (reference) 0.0000 – – – – – – –
Lanes above 2 0.2005* 0.0913 0.0025 0.0226 0.0509 0.0841 0.3502 0.3799
Lanes up to 2 (reference) 0.0000 – – – – – – –
Intercept −0.0551 0.2548 0.0105 −0.5703 −0.4820 −0.3792 0.3640 0.4458
Ln (Exposure) 1 Total link length

SL crashes Mean S.D. MC Error 2.50% 5.00% Median 95% 97.50%

Speed −0.0290* 0.0015 0.0001 −0.0321 −0.0315 −0.0309 −0.0266 −0.0260


Ln(AADT) 0.6848* 0.0410 0.0019 0.6058 0.6201 0.6347 0.7553 0.7699
Curve −0.0271 0.0385 0.0008 −0.1022 −0.0903 −0.0764 0.0362 0.0481
Straight (reference) 0.0000 – – – – – – –
Uphill 0.0728 0.0644 0.0010 −0.0537 −0.0330 −0.0095 0.1785 0.1989
Downhill 0.0814* 0.0365 0.0006 0.0100 0.0209 0.0339 0.1412 0.1528
Even (reference) 0.0000 – – – – – – –
Lanes above 2 0.1396* 0.0518 0.0017 0.0371 0.0538 0.0730 0.2247 0.2405
Lanes up to 2 (reference) 0.0000 – – – – – – –
Intercept 0.6075* 0.6429 0.4831 0.2716 0.3272 0.3927 0.8899 0.9459
Ln (Exposure) 1 Total link length
*
Statistically significant coefficients at the 95% credible interval.

burn-in sample. Convergence was visually detected from Markov chain history graphs of the models' coefficients. The multivariate models for both the link-based and the condition-based datasets showed improved statistical fit (based on the Deviance Information Criterion) compared to univariate models estimated by severity group.

As there is no clear evidence about the form of the relationship between speed and crash occurrences, three different functional forms in the link function were tested for both datasets: (a) a linear (e.g. $\beta \cdot Speed$), (b) a logarithmic (e.g. $\beta \cdot \ln(Speed)$) and (c) a quadratic (e.g. $\beta_1 \cdot Speed + \beta_2 \cdot Speed^2$). The same strategy was applied to traffic volume. To control for a possible interaction between speed and volume on crash frequency, a multiplicative interaction term (i.e. speed*volume) was also investigated. This results in a total of nine different specifications² of the link function. The functional form with the best goodness-of-fit statistic (i.e. the functional form with the lowest Deviance Information Criterion (DIC) score) is considered as the most accurate representation of each dataset; thus, for brevity, only these models are presented here. The best fitting specification for the condition-based model was the quadratic speed and the quadratic volume along with their interaction term, and for the link-based model linear speed and logarithmic volume without the interaction term. Tables 3 and 4 present the posterior means, standard deviations, Monte Carlo error (MC error) and the 95% credible intervals of the estimated coefficients. The functional forms of the models are described in the next section.

² (i) linear speed–linear volume, (ii) linear speed–logarithmic volume, (iii) linear speed–quadratic volume, (iv) logarithmic speed–linear volume, (v) logarithmic speed–logarithmic volume, (vi) logarithmic speed–quadratic volume, (vii) quadratic speed–linear volume, (viii) quadratic speed–logarithmic volume, (ix) quadratic speed–quadratic volume.

Fig. 6. Predicted KS crashes per kilometre as a function of speed for links with average annual daily traffic: (a) 20,000, (b) 40,000 and (c) 60,000.

6. Discussion

From Tables 3 and 4, it can be seen that the results derived from the two models are significantly different. The estimated coefficients for some of the variables have different signs, indicating that the data aggregation concept has a considerable impact on the results of crash prediction models. The relationship between the traffic variables and crashes cannot, however, be interpreted solely based on the signs of their coefficients due to the variable transformations and the speed-volume interaction term. To facilitate the interpretation of the outcomes, Figs. 6, 7, 10 and 11 provide a graphical representation of the crash rate as a function of speed for three distinct volumes and the reference categories for geometry (i.e., for a straight (Curve = 0) and level segment (Downhill = Uphill = 0) with 2 or fewer lanes (Lanes above 2 = 0)). Figs. 8, 9, 12 and 13 illustrate the variations of crash rate as a function of the entire range of speed and volume. The corresponding KS and SL crash rates for the link-based approach can be shown as follows:

$$\frac{\text{KS crashes}}{\text{Link length}} = \exp(-0.0231 \cdot Speed + 0.1310 \cdot \ln(AADT) - 0.0551) \tag{14}$$

$$\frac{\text{SL crashes}}{\text{Link length}} = \exp(-0.0290 \cdot Speed + 0.6848 \cdot \ln(AADT) + 0.6075) \tag{15}$$
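Eqs. (14) and (15) can be evaluated directly to reproduce curves of the kind shown in Figs. 6 and 7. The sketch below is a minimal illustration using the posterior means from Table 3; the function and dictionary names are hypothetical. The unit of AADT is an assumption: Table 2 reports AADT in thousands, so the example passes values such as 20, 40 and 60, but the text does not state the scale used inside the logarithm.

```python
import math

# Posterior means taken from Table 3 / Eqs. (14) and (15).
LINK_COEFFS = {
    "KS": {"speed": -0.0231, "ln_aadt": 0.1310, "intercept": -0.0551},
    "SL": {"speed": -0.0290, "ln_aadt": 0.6848, "intercept": 0.6075},
}


def link_crash_rate(severity, speed_kmh, aadt):
    """Predicted crashes per km of link for the reference geometry.

    aadt must be given in the unit used when fitting the model
    (assumed here to be thousands of vehicles, as in Table 2).
    """
    c = LINK_COEFFS[severity]
    linear_predictor = (
        c["speed"] * speed_kmh + c["ln_aadt"] * math.log(aadt) + c["intercept"]
    )
    return math.exp(linear_predictor)


if __name__ == "__main__":
    for aadt in (20, 40, 60):
        rates = [link_crash_rate("KS", s, aadt) for s in (40, 80, 120)]
        print(aadt, [round(r, 3) for r in rates])
```

Because speed enters linearly with a negative coefficient, each additional 10 km/h multiplies the predicted KS rate by exp(−0.231) regardless of AADT, which is the monotonically decreasing shape discussed for the link-based model.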

Table 4
Multivariate coefficient estimates for crashes with killed and serious injured (KS) and crashes with slightly injured (SL) crashes for the condition-based dataset.

KS crashes Mean S.D. MC Error 2.50% 5.00% Median 95% 97.5%

Speed 0.02414* 0.00897 0.00047 0.00720 0.01043 0.02356 0.04045 0.04281


Speed squared −0.00014* 0.00005 0.00000 −0.00025 −0.00024 −0.00014 −0.00006 −0.00004
Volume −0.02037* 0.00201 0.00009 −0.02417 −0.02363 −0.02036 −0.01695 −0.01621
Volume squared 0.00004* 0.00000 0.0000002 0.00003 0.00003 0.00004 0.00005 0.00005
Speed · Volume 0.00002* 0.00002 0.000001 −0.00001 0.000001 0.00002 0.00005 0.00006
Curve 0.08056 0.06534 0.00099 −0.04698 −0.02665 0.08047 0.18860 0.20870
Straight (reference) 0.0000 – – – – – – –
Uphill 2.12500* 0.15890 0.00441 1.82500 1.87000 2.12200 2.39400 2.45100
Downhill 2.95200* 0.15460 0.00451 2.65900 2.70400 2.94900 3.21400 3.27000
Even (reference) 0.0000 – – – – – – –
Lanes above 2 −0.64770* 0.06954 0.00099 −0.78550 −0.76190 −0.64710 −0.53300 −0.51090
Lanes up to 2 (reference) 0.0000 – – – – – – –
Intercept −3.23900* 0.43100 0.02135 −4.15100 −4.01600 −3.20200 −2.58600 −2.47900
Ln (Exposure) 1.0 Average vehicle hours travelled per kilometre by condition scenario

SL crashes Mean S.D. MC Error 2.50% 5.00% Median 95% 97.5%


Speed 0.03647* 0.00607 0.00032 0.02309 0.02489 0.03656 0.04614 0.04772
Speed squared −0.00020* 0.00003 0.000002 −0.00027 −0.00026 −0.00020 −0.00014 −0.00013
Volume −0.00759* 0.00120 0.00006 −0.00986 −0.00959 −0.00758 −0.00552 −0.00518
Volume squared 0.00002* 0.000003 0.0000001 0.00002 0.00002 0.00002 0.00003 0.00003
Speed · Volume −0.00003* 0.00001 0.0000005 −0.00005 −0.00005 −0.00003 −0.00001 −0.00001
Curve 0.11740* 0.03773 0.00080 0.04327 0.05534 0.11770 0.17930 0.19080
Straight (reference) 0.0000 – – – – – – –
Uphill 2.25700* 0.07198 0.00222 2.11600 2.13800 2.25700 2.37500 2.39700
Downhill 2.91100* 0.07025 0.00225 2.77100 2.79300 2.91100 3.02500 3.04600
Even (reference) 0.0000 – – – – – – –
Lanes above 2 −0.32670* 0.03790 0.00076 −0.40100 −0.38910 −0.32680 −0.26460 −0.25260
Lanes up to 2 (reference) 0.0000 – – – – – – –
Intercept −3.00800* 0.28060 0.01458 −3.53300 −3.43900 −3.02900 −2.48500 −2.36200
Ln (Exposure) 1.0 Average vehicle hours travelled per kilometre by condition scenario
*
Statistically significant coefficients at the 95% credible interval.

Fig. 7. Predicted SL crashes per kilometre as a function of speed for links with average annual daily traffic: (a) 20,000, (b) 40,000 and (c) 60,000.

Fig. 8. 3D contour plot of the predicted KS crashes per kilometre as a function of speed and average annual daily traffic.

Fig. 9. 3D contour plot of the predicted SL crashes per kilometre as a function of speed and average annual daily traffic.

Fig. 10. Predicted KS crashes per vehicle-hours travelled as a function of speed for 15-min volume per lane: (a) 50 vehicles, (b) 100 vehicles and (c) 150 vehicles.

Overall, the results of the link-based model were hard to interpret and to a certain extent counterintuitive (see Figs. 6–9). Speed was found to be inversely proportional with all crashes. Although some other studies have presented similar findings (Baruya, 1998; Lave, 1985), none of the researchers has given a very good explanation of why higher average speeds are overall safer. Some of the main arguments to support this idea are the increased design standards of high speed motorways and the longer available distances between vehicles at high speed conditions. However, the vast majority of studies that examined the number of crashes before and after speed limit changes (consequently changes in average speed) suggest that higher speeds are related to more crashes (e.g., Elvik et al., 2004).

Higher AADT is related with more crashes; however, considering the estimated coefficients, AADT has a stronger impact on SL crashes than on KS, a result that is in line with most of the existing studies. As for the geometrical features of the links, they mostly seem to be statistically insignificant apart from links with more than two lanes for all crashes and downhill links for SL crashes only. The use of dummy variables for geometry could possibly affect the estimated coefficients. However, the signs of the coefficients of the most important variables (i.e., speed) did not change even when the geometrical characteristics were represented by continuous variables (not presented here for brevity). These results possibly indicate the inability of average measures of time-varying variables that are frequently used in the link-based approaches to accurately explain the variation in crashes and that this inefficiency might have a direct impact on the modelling results.

On the other hand, the outcomes of the condition-based models are quite different (see Figs. 10–13). Speed was found to be proportional with both crash frequencies (i.e., KS and SL crashes). The shape of the curves shows that the number of crashes increases proportionally with speed until a point (e.g. 85 km/h at a volume of 100 vehicles/lane) and then either it stabilises or decreases. This can be potentially explained by the decrease of crash prone reactions that increase while speed reaches very high values (Navon, 2003). Comparing the maxima of the curves between Figs. 10 and 11 it can be seen that, not surprisingly, crashes which occur under higher speed conditions tend to have more serious outcomes; a finding that is consistent with the literature (e.g., Kloeden et al., 1997; Pei et al., 2012). The KS and SL crash rates for the reference cases of the categorical independent variables (i.e., Curve = 0, Uphill = Downhill = 0, Lanes above 2 = 0; see Table 4) are:

$$\frac{\text{KS crashes}}{\text{VehHours per km}} = \exp(0.0241 \cdot Speed - 0.00014 \cdot Speed^2 - 0.0204 \cdot Volume + 0.00004 \cdot Volume^2 + 0.00002 \cdot Speed \cdot Volume - 3.24) \tag{16}$$

$$\frac{\text{SL crashes}}{\text{VehHours per km}} = \exp(0.036 \cdot Speed - 0.0002 \cdot Speed^2 - 0.0076 \cdot Volume + 0.000025 \cdot Volume^2 - 0.00003 \cdot Speed \cdot Volume - 3.01) \tag{17}$$
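Because Eqs. (16) and (17) are exponentials of a quadratic in speed, the speed at which the predicted rate peaks for a given volume can be found by setting the derivative of the exponent to zero. The sketch below evaluates both equations and that stationary point. The names are hypothetical, and the coefficients are the rounded values printed in Eqs. (16) and (17), so the resulting speeds may differ somewhat from values read off the figures, which were presumably drawn from unrounded estimates.

```python
import math

# Rounded coefficients as printed in Eqs. (16) and (17).
COND_COEFFS = {
    "KS": {"s": 0.0241, "s2": -0.00014, "v": -0.0204, "v2": 0.00004,
           "sv": 0.00002, "b0": -3.24},
    "SL": {"s": 0.036, "s2": -0.0002, "v": -0.0076, "v2": 0.000025,
           "sv": -0.00003, "b0": -3.01},
}


def condition_crash_rate(severity, speed, volume):
    """Predicted crashes per vehicle-hour per km for the reference geometry."""
    c = COND_COEFFS[severity]
    exponent = (
        c["s"] * speed
        + c["s2"] * speed ** 2
        + c["v"] * volume
        + c["v2"] * volume ** 2
        + c["sv"] * speed * volume
        + c["b0"]
    )
    return math.exp(exponent)


def peak_speed(severity, volume):
    """Speed maximising the rate at a fixed volume (derivative of exponent = 0)."""
    c = COND_COEFFS[severity]
    return -(c["s"] + c["sv"] * volume) / (2.0 * c["s2"])


if __name__ == "__main__":
    for severity in ("KS", "SL"):
        s_star = peak_speed(severity, volume=100.0)
        print(severity, round(s_star, 1),
              condition_crash_rate(severity, s_star, 100.0))
```

The same algebra applied to the volume terms, whose squared coefficient is positive, yields a minimum rather than a maximum.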
Fig. 11. Predicted SL crashes per vehicle-hours travelled as a function of speed for 15-min volume per lane: (a) 50 vehicles, (b) 100 vehicles and (c) 150 vehicles.

An interesting finding of the condition-based model was that the frequency of crashes is higher at low volume conditions than at high volume conditions, ceteris paribus. More specifically, the relationship between crash rate and volume is described as an

Fig. 12. 3D contour plot of the predicted KS crashes per vehicle hours travelled as a function of speed and volume per lane.

Fig. 13. 3D contour plot of the predicted SL crashes per vehicle hours travelled as a function of speed and volume per lane.

approximate U-shaped curve with the minimum crash rates found at 241 and 211 vehicles per lane for KS and SL crashes respectively at average speed conditions (see Figs. 12 and 13). This outcome is consistent with the results for speed, because high volume is usually associated with congested, low speed conditions when crashes are less likely to be severe and reported (Lord, 2002). Another explanation for this finding could be that low volumes can be related with higher speed variations (when traffic is building up) that may increase the probability of crashes (Garber and Ehrhart, 2000). This is because when the volume decreases drivers have more freedom to choose their own speed and so speed patterns on the roadway tend to be less uniform, leading to more encounters between vehicles (Elvik et al., 2004). Additionally, low volumes occur more often during off-peak periods, such as night time, which is related to insufficient light conditions and extreme driving behaviours (e.g. drinking and driving) that are also factors proved to trigger crash occurrence (Chang and Wang, 2006; Clarke et al., 2010; Jonah, 1986).

Curvature is not shown to have a statistically significant relationship with KS crashes but it increases the likelihood of SL crashes. The finding for SL crashes is consistent with other studies on the relationship of horizontal alignment with crashes (Ma et al., 2008; Milton and Mannering, 1998; Park et al., 2010a). However, the outcome for KS crashes is not expected as literature suggests that curvature is associated with higher crash severity (Geedipally et al., 2013; Ma and Kockelman, 2006). The high design standards of the study area could be a possible explanation why curvature is not statistically significant for KS crashes (i.e. small radius curves are relatively rare for motorways and major A-roads) although it has been suggested that curvature is linked with more crashes even on freeway segments (Park et al., 2010a). Another explanation could be that speeding and other risk-taking actions might be less likely on curved sections. Vertical alignment of the road section just before a crash is found to be associated with more crashes. The existence of both positive and negative slope seems to trigger crash occurrence although, based on the coefficient values, the latter has higher impact. This outcome is in line with findings of existing literature (e.g., Milton and Mannering, 1998). Finally, roads with more than two lanes are related to lower crash counts for all crash severities. This is similar to the findings of Ma and Kockelman (2006), who reported that the number of lanes decreases crash counts for non-fatal crashes, and the results by Bonneson and Pratt (2008) and Park et al. (2010b), who found that 6-lane freeways are less crash prone than 4 or 8-lanes, but opposite to the majority of current literature (Chang, 2005; Milton and Mannering, 1998). A possible explanation for that could be that wider roads allow more manoeuvres for crash avoidance during a crash-prone encounter. Moreover, this result can also be explained by the inclusion of crashes that occurred on undivided (single) carriageways. Over half of the examined crashes occurred on A-roads that include some single carriageways which are related with hazardous vehicle interactions that may lead to crashes with severe consequences (e.g. head-on collisions).

Considering the variations between the results of the two models, it is clear that aggregation bias that occurs in link-based approaches might lead to significant errors, meaning that the data aggregation concept plays a major role in the outcomes of safety analyses. This subject has been disregarded by most researchers, who mainly focused their research on developing more advanced statistical models; however it seems that the way crash data are

prepared for the statistical analysis is important too. The link-based and the condition-based models cannot be directly compared to each other either using goodness-of-fit statistics or based on the interpretability of their outcomes. However, it can be argued that the condition-based model gives a significantly more accurate representation of the crash-related conditions and so its results, apart from being more reasonable, might also be more reliable.

7. Conclusions

This paper presented a novel crash modelling approach in re-examining crash–speed relationships based on a new concept that overcomes some existing limitations of the conventional approach. The originality of the work lies in the development of an alternative data aggregation concept that defines the pre-crash traffic and geometric conditions as the crash aggregating factors. Compared to the approaches that assign crashes into groups based on their spatial relationship with road entities, the new method addresses the inherent problem of over aggregation of time-varying traffic variables and relevant information losses that may affect the modelling outcomes.

The new modelling approach was applied to all the crashes that occurred on the Strategic Road Network (SRN) of England during 2012. Pre-crash condition identification was based on geo-coded crash locations obtained by a crash mapping algorithm that was previously developed for the study area. In order to compare the traditional modelling approach with the proposed approach, link-based and condition-based datasets were developed based on identical crash, traffic and geometry data. Bayesian multivariate Poisson lognormal regression was employed for modelling both datasets by injury severity taking into account first order spatial correlation for the link-based model. The models explored the optimal variable specifications as well as potential interactions between speed and volume.

Speed has been found to be a significant contributory factor for the number and the consequences of crashes when the data are modelled with the condition-based approach. In contrast to that, according to the results of the link-based model speed has a negative relationship with crash occurrences for all severity types. From a methodological point of view, the difference in the results of these approaches reveals that the data aggregation method is an important decision before conducting a crash data statistical analysis. Given that the link-based approaches include observations that often lack details and tend to mask the crash contributory factors, link-based models are very likely to have limited explanatory potential. Condition-based approaches, on the other hand, focus on the crash time and location and can be considered as more representative of the actual circumstances. That is why they provide more explainable, logical and possibly more credible outcomes.

Acknowledgements

The authors would like to gratefully thank Prof. Benjamin Heydecker and Prof. Mike Maher of University College London for their thoughts and comments during the development of this work. This research was partially funded by a grant from the UK Engineering and Physical Sciences Research Council (EPSRC) (Grant reference: EP/F018894/1). The authors take full responsibility for the content of the paper and any errors or omissions. Research data for this paper are available upon request from Maria-Ioanna Imprialou.

References

Aarts, L., Van Schagen, I., 2006. Driving speed and the risk of road crashes: a review. Accid. Anal. Prev. 38, 215–224.
AASHTO, 2010. Highway Safety Manual. American Association of State Highway and Transportation Officials, Washington, DC.
Abdel-Aty, M., Pande, A., 2005. Identifying crash propensity using specific traffic speed conditions. J. Saf. Res. 36, 97–108.
Abdel-Aty, M., Radwan, A.E., 2000. Modeling traffic accident occurrence and involvement. Accid. Anal. Prev. 32, 633–642.
Abdel-Aty, M., Uddin, N., Pande, A., 2005. Split models for predicting multivehicle crashes during high-speed and low-speed. Transp. Res. Rec.: J. Transp. Res. Board 1908, 51–58.
Aguero-Valverde, J., 2013. Multivariate spatial models of excess crash frequency at area level: case of Costa Rica. Accid. Anal. Prev. 59, 365–373.
Aguero-Valverde, J., Jovanis, P.P., 2009. Bayesian multivariate Poisson lognormal models for crash severity modeling and site ranking. Transp. Res. Rec.: J. Transp. Res. Board 2136, 82–91.
Anastasopoulos, P.C., Mannering, F.L., 2009. A note on modeling vehicle accident frequencies with random-parameters count models. Accid. Anal. Prev. 41, 153–159.
Barua, S., El-Basyouny, K., Islam, M.T., 2014. A full Bayesian multivariate count data model of collision severity with spatial correlation. Anal. Methods Accid. Res. 3–4, 28–43.
Baruya, A., 1998. Speed-accident relationships on European roads. In: 9th International Conference Road Safety in Europe, Bergisch Gladbach, Germany.
Baruya, A., Finch, D.J., 1994. Investigation of traffic speeds and accidents on urban roads. In: Traffic Management and Road Safety. Proceedings of Seminar J Held at the PTRC European Transport Forum, September 12–16, Volume P381. Warwick, UK.
Besag, J., 1974. Spatial interaction and the statistical analysis of lattice systems. J. R. Stat. Soc. Ser. B (Methodol.), 192–236.
Black, J., Hashimzade, N., Myles, G., 2009. A Dictionary of Economics. Oxford University Press.
Bonneson, J.A., Pratt, M.P., 2008. Calibration Factors Handbook: Safety Prediction Models Calibrated with Texas Highway System Data.
Chang, L.-Y., 2005. Analysis of freeway accident frequencies: negative binomial regression versus artificial neural network. Saf. Sci. 43, 541–557.
Chang, L.-Y., Wang, H.-W., 2006. Analysis of traffic injury severity: an application of non-parametric classification tree techniques. Accid. Anal. Prev. 38, 1019–1027.
Chipman, M.L., MacGregor, C.G., Smiley, A.M., Lee-Gosselin, M., 1992. Time vs. distance as measures of exposure in driving surveys. Accid. Anal. Prev. 24, 679–684.
Cirillo, J.A., 1968. Interstate system accident research study II, interim report II. Public Roads, 35.
Clark, W., Avery, K., 1976. The effects of data aggregation in statistical analysis. Geogr. Anal. 8, 428–438.
Clarke, D.D., Ward, P., Bartle, C., Truman, W., 2010. Killer crashes: fatal road traffic accidents in the UK. Accid. Anal. Prev. 42, 764–770.
Davis, G.A., 2004. Possible aggregation biases in road safety research and a
Condition-based crash modelling, according to the results pre- mechanism approach to accident modeling. Accid. Anal. Prev. 36, 1119–1127.
sented on this paper, is a new and promising approach that can Department for Transport, 2011. STATS19 Road Accident Injury Statistics – Report
increase the insight about various crash triggering factors and by Form [WWW Document], https://www.gov.uk/government/uploads/system/
uploads/attachment data/file/230590/stats19.pdf.
indicating hazard-prone traffic conditions contribute to the assess- El-Basyouny, K., Sayed, T., 2009. Collision prediction models using multivariate
ment and the development of road safety measures. The method Poisson-lognormal regression. Accid. Anal. Prev. 41, 820–828.
is flexible and transferable to other study areas and can be imple- Elvik, R., Christensen, P., Amundsen, A., 2004. Speed and Road Accidents: An
Evaluation of the Power Model. TØI Report.
mented using different combinations of variables and, preferably, Fildes, B.N., Rumbold, G., Leening, A., 1991. Speed Behaviour and Drivers’ Attitude
higher resolution traffic data. Future work should include assess- to Speeding. Monash University Accident Research Centre, Report, Victoria,
ment of the two methods through comparison of the predicted Austalia.
Garber, N., Ehrhart, A., 2000. Effect of speed, flow, and geometric characteristics on
values between link-based and condition-based models. Instead crash frequency for two-lane highways. Transp. Res. Rec. 1717, 76–83.
of employing condition-based as a substitute of link-based meth- Garber, N., Subramanyan, S., 2001. Incorporating crash risk in selecting
ods, it would be also useful to research whether and how these congestion-mitigation strategies: hampton roads area (Virginia) case study.
Transp. Res. Rec. 1746, 1–5.
approaches could work complementary of each other towards the
Garber, N.J., Gadiraju, R., 1989. Factors affecting speed variance and its influence on
quantification of crash risk from different perspectives. accidents. Transp. Res. Rec.: J. Transp. Res. Board 1213, 64–71.
Geedipally, S., Bonneson, J., Pratt, M., Lord, D., 2013. Severity distribution functions for freeway segments. Transp. Res. Rec.: J. Transp. Res. Board, 19–27.
Guo, F., Wang, X., Abdel-Aty, M.A., 2010. Modeling signalized intersection safety with corridor-level spatial correlations. Accid. Anal. Prev. 42, 84–92.
Highways Agency, 2008. Pavement design and maintenance. In: Design Manual for Roads and Bridges. Department for Transport, UK.
Highways Agency, 2011. HATRIS JTDB Reference Manual.
Hossain, M., Muromachi, Y., 2013. Understanding crash mechanism on urban expressways using high-resolution traffic data. Accid. Anal. Prev. 57, 17–29.
Imprialou, M.-I., Quddus, M., Pitfield, D., 2015. Multilevel logistic regression modeling for crash mapping in metropolitan areas. In: Transportation Research Board 94th Annual Meeting, Washington, DC.
Imprialou, M.-I.M., Quddus, M., Pitfield, D.E., 2014. High accuracy crash mapping using fuzzy logic. Transp. Res. Part C: Emerg. Technol. 42, 107–120.
IRTAD – International Traffic Safety Analysis Group, 2014. Road Safety Annual Report 2014. International Transport Forum.
Ivan, J.N., Wang, C., Bernardo, N.R., 2000. Explaining two-lane highway crash rates using land use and hourly exposure. Accid. Anal. Prev. 32, 787–795.
Joksch, H.C., 1975. An empirical relation between fatal accident involvement per accident involvement and speed. Accid. Anal. Prev. 7, 129–132.
Jonah, B.A., 1986. Accident risk and risk-taking behaviour among young drivers. Accid. Anal. Prev. 18, 255–271.
Kim, D.G., Lee, Y., Washington, S., Choi, K., 2007. Modeling crash outcome probabilities at rural intersections: application of hierarchical binomial logistic models. Accid. Anal. Prev. 39, 125–134.
Kloeden, C.N., McLean, A., Glonek, G., 2002. Reanalysis of Travelling Speed and the Risk of Crash Involvement in Adelaide South Australia, No CR 207. NHMRC Road Accident Research Unit, The University of Adelaide.
Kloeden, C.N., McLean, A.J., Moore, V.M., Ponte, G., 1997. Travelling Speed and the Risk of Crash Involvement Volume 1 – Findings, No CR 172. NHMRC Road Accident Research Unit, The University of Adelaide, South Australia.
Kockelman, K.M., Ma, J., 2007. Freeway speeds and speed variations preceding crashes, within and across lanes. Transp. Res. Forum 46, 43–61.
Lave, C., 1985. Speeding, coordination, and the 55 MPH limit. Am. Econ. Assoc. 75, 1159–1164.
Lord, D., 2002. Issues related to the application of accident prediction models for computation of accident risk on transportation networks. Transp. Res. Rec. 1784, 17–26.
Lord, D., Manar, A., Vizioli, A., 2005a. Modeling crash-flow-density and crash-flow-V/C ratio relationships for rural and urban freeway segments. Accid. Anal. Prev. 37, 185–199.
Lord, D., Mannering, F., 2010. The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transp. Res. Part A: Policy Pract. 44, 291–305.
Lord, D., Washington, S.P., Ivan, J.N., 2005b. Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accid. Anal. Prev. 37, 35–46.
Ma, J., Kockelman, K.M., 2006. Bayesian multivariate Poisson regression for models of injury count, by severity. Transp. Res. Rec. 1950, 24–34.
Ma, J., Kockelman, K.M., Damien, P., 2008. A multivariate Poisson-lognormal regression model for prediction of crash counts by severity, using Bayesian methods. Accid. Anal. Prev. 40, 964–975.
Mannering, F.L., Bhat, C.R., 2014. Analytic methods in accident research: methodological frontier and future directions. Anal. Methods Accid. Res. 1, 1–22.
Miaou, S.P., Lum, H., 1993. Modeling vehicle accidents and highway geometric design relationships. Accid. Anal. Prev. 25, 689–709.
Milton, J., Mannering, F., 1998. The relationship among highway geometrics, traffic-related elements and motor-vehicle accident frequencies. Transportation 25, 395–413.
Munden, J.M., 1967. The Relation Between a Driver’s Speed and His Accident Rate. Road Research Laboratory, Ministry of Transport, Crowthorne, England.
Navon, D., 2003. The paradox of driving speed: two adverse effects on highway accident rate. Accid. Anal. Prev. 35, 361–367.
Pande, A., Abdel-Aty, M., 2005. A freeway safety strategy for advanced proactive traffic management. J. Intell. Transp. Syst.: Technol. Plann. Oper. 9, 145–158.
Park, B.-J., Fitzpatrick, K., Lord, D., 2010a. Evaluating the effects of freeway design elements on safety. Transp. Res. Rec.: J. Transp. Res. Board, 58–69.
Park, B.-J., Fitzpatrick, K., Lord, D., 2010b. Evaluating the effects of freeway design elements on safety. Transp. Res. Rec.: J. Transp. Res. Board 2195, 58–69.
Park, E.S., Lord, D., 2007. Multivariate Poisson-lognormal models for jointly modeling crash frequency by severity. Transp. Res. Rec.: J. Transp. Res. Board 2019, 1–6.
Pei, X., Wong, S.C., Sze, N.N., 2012. The roles of exposure and speed in road safety analysis. Accid. Anal. Prev. 48, 464–471.
Qin, X., Ivan, J.N., Ravishanker, N., 2004. Selecting exposure measures in crash rate prediction for two-lane highway segments. Accid. Anal. Prev. 36, 183–191.
Quddus, M., 2013. Exploring the relationship between average speed, speed variation, and accident rates using spatial statistical models and GIS. J. Transp. Saf. Secur. 5, 27–45.
Quddus, M.A., 2008. Modelling area-wide count outcomes with spatial correlation and heterogeneity: an analysis of London crash data. Accid. Anal. Prev. 40, 1486–1497.
Shankar, V., Mannering, F., Barfield, W., 1995. Effect of roadway geometrics and environmental factors on rural freeway accident frequencies. Accid. Anal. Prev. 27, 371–389.
Solomon, D., 1964. Accidents on Main Rural Highways Related to Speed, Driver and Vehicle. US Department of Commerce, Federal Bureau of Highways, Washington, DC.
Spiegelhalter, D., Thomas, A., Best, N., Lunn, D., 2003. WinBUGS User Manual.
Stuster, J., 2004. Aggressive Driving Enforcement: Evaluations of Two Demonstration Programs, Report DOT HS 809 707. Washington, DC.
Taylor, M.C., Lynam, D.A., Baruya, A., 2000. The Effects of Drivers’ Speed on the Frequency of Road Accidents. Transport Research Laboratory, Crowthorne, England.
Vangala, P., Master of Science Thesis 2015. Negative Binomial-Generalized Exponential Distribution: Generalized Linear Model and Its Applications. Zachry Department of Civil Engineering, Texas A&M University, College Station, TX.
Vangala, P., Lord, D., Geedipally, S.R., 2015. Exploring the application of the negative binomial – generalized exponential model for analyzing traffic crash data with excess zeros. Anal. Methods Accid. Res. 7, 29–36.
WHO, 2013. WHO Global Status Report on Road Safety 2013: Supporting a Decade of Action. World Health Organization.