A Crash-Prediction Model For Multilane Roads
Abstract
Considerable research has been carried out in recent years to establish relationships between crashes and traffic flow, geometric infrastructure
characteristics and environmental factors for two-lane rural roads. Crash-prediction models focused on multilane rural roads, however, have rarely
been investigated. In addition, most research has paid little attention to the safety effects of variables such as stopping sight distance and
pavement surface characteristics. Moreover, the statistical approaches have generally included Poisson and Negative Binomial regression models,
whilst the Negative Multinomial regression model has been used to a lesser extent. Finally, as far as the authors are aware, prediction models involving
all the above-mentioned factors have still not been developed in Italy for multilane roads, such as motorways. Thus, in this paper crash-prediction
models for a four-lane median-divided Italian motorway were set up on the basis of accident data observed during a 5-year monitoring period
extending between 1999 and 2003. The Poisson, Negative Binomial and Negative Multinomial regression models, applied separately to tangents
and curves, were used to model the frequency of accident occurrence. Model parameters were estimated by the Maximum Likelihood Method, and
the Generalized Likelihood Ratio Test was applied to detect the significant variables to be included in the model equation. Goodness-of-fit was
measured by means of both the explained fraction of total variation and the explained fraction of systematic variation. The Cumulative Residuals
Method was also used to test the adequacy of a regression model throughout the range of each variable. The candidate set of explanatory variables
was: length (L), curvature (1/R), annual average daily traffic (AADT), sight distance (SD), side friction coefficient (SFC), longitudinal slope (LS)
and the presence of a junction (J). Separate prediction models for total crashes and for fatal and injury crashes only were considered. For curves
it is shown that significant variables are L, 1/R and AADT, whereas for tangents they are L, AADT and junctions. The effect of rain precipitation
was analysed on the basis of hourly rainfall data and assumptions about drying time. It is shown that a wet pavement significantly increases the
number of crashes.
The models developed in this paper for Italian motorways appear to be useful for many applications such as the detection of critical factors,
the estimation of accident reduction due to infrastructure and pavement improvement, and the prediction of accident counts when comparing
different design options. Thus this research may represent a point of reference for engineers in adjusting or designing multilane roads.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Crash-prediction model; Multilane road; Negative Multinomial distribution; Traffic flow; Road geometry; Pavement friction; Weather
1. Introduction
Over the last few years numerous road-accident-prediction
models have been developed to investigate the effects that various variables may have on the value of a pre-selected crash
indicator. The most common crash indicators that have hitherto
been used are the number of crashes per year (crash frequency)
and the number of crashes per million vehicle-kilometres (crash
rate). The fact that accidents might not be a linear function of
traffic flow and section length generally induces one to use crash
Corresponding author. Tel.: +39 089 964140; fax: +39 089 964045.
E-mail address: [email protected] (C. Caliendo).
0001-4575/$ see front matter 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.aap.2006.10.012
3. Data description
3.1. Horizontal and vertical alignment
A 5-year monitoring period extending from 1999 to 2003
was carried out on a four-lane median-divided motorway. This
infrastructure was 46.6 km long, and the horizontal alignment
contained tangents and circular curves without any transition
curves. Vertical alignment consisted of gradients and circular
curves.
During the period of observation, crash data, traffic flow,
pavement surface conditions and rainfall data were collated.
Accident data were extracted from the official reports of the
Motorway Management Agency (MMA). For each accident a
variety of details was recorded, including date and location of
accident, horizontal alignment (tangent or curve), vertical alignment (upgrade or downgrade), weather and pavement surface
conditions (dry or wet), type and severity of accidents, number of vehicles and persons involved, and a short description of
the accident dynamics. Some 1916 accidents were considered
in this study, 21 of which were fatal and 594 were injury accidents. Since fatalities appear to be too few to be analysed alone,
fatal and injury crashes were considered collectively and are
Horizontal and vertical alignments of the monitored motorway were derived from a file containing a recent 3D aerial survey.
The Autodesk Inc. AutoCAD 2000 software was applied in
order to measure the geometric characteristics (e.g. length of
tangents and curves, horizontal and vertical curvature, longitudinal slope). Tangents with length ranging from 0.1 to 1.7 km
and horizontal curves with radii from 0.2 to 8.0 km were computed. Furthermore, gradients with longitudinal slope ranging
between −4.5% and +4.5% were estimated and vertical curves of
circular type were defined.
3.2. Sight distance
In order to establish the role played by restricted visibility
on accident occurrence, sight distances (SD) regarding the horizontal and vertical alignment were also determined.
On horizontal curves, a physical feature outside the travelled way such as the longitudinal safety barrier was con-
Table 1
Accident count data observed during the 5-year monitoring period

        South direction                       North direction                                    Total number of
Year    Tangents    Curves      Years total   Tangents    Curves      Years total   Total        vehicles travelling
1999    140 (47)    57 (25)     197 (72)      111 (35)    36 (9)      147 (44)      344 (116)    54.3 × 10^6
2000    115 (29)    32 (10)     147 (39)      110 (46)    40 (14)     150 (60)      297 (99)     56.4 × 10^6
2001    119 (39)    78 (27)     197 (66)      142 (36)    69 (21)     211 (57)      408 (123)    56.1 × 10^6
2002    156 (52)    77 (25)     233 (77)      131 (40)    69 (30)     200 (70)      433 (147)    55.7 × 10^6
2003    155 (43)    68 (17)     223 (60)      141 (38)    70 (25)     211 (63)      434 (123)    56.0 × 10^6
Total   685 (210)   312 (104)   997 (314)     635 (195)   284 (99)    919 (294)     1916 (608)
lanes and for each direction of travel. For estimating the SFC
value for each of 265 segments considered (147 for tangents
and 118 for curves) in this paper the data provided by MMA
was used and the following approximate procedure was applied.
The SFC value for each segment of length $l$ was calculated as
the weighted mean $SFC = \sum_{i=1}^{n} l_i \, SFC_i / l$, where $SFC_i$ is the
pavement friction value on the sub-segment of length $l_i$ ($SFC_i$
being the average value of the friction measured on the two
lanes of each travel direction). Using this procedure, SFC values
ranging from 0.26 to 0.74 were estimated.
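The length-weighted mean described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sub-segment values are hypothetical and are not taken from the MMA data.

```python
def weighted_mean_sfc(sub_segments):
    """Length-weighted mean SFC of one road segment.

    sub_segments: list of (length_km, sfc) pairs, one pair per sub-segment.
    """
    total_length = sum(length for length, _ in sub_segments)
    if total_length <= 0:
        raise ValueError("total segment length must be positive")
    # Each sub-segment contributes in proportion to its length.
    return sum(length * sfc for length, sfc in sub_segments) / total_length


# Hypothetical segment made of three sub-segments (lengths in km).
example_segment = [(0.10, 0.40), (0.20, 0.55), (0.05, 0.60)]
segment_sfc = weighted_mean_sfc(example_segment)
print(round(segment_sfc, 3))
```

Weighting by length keeps a short sub-segment with unusual friction from dominating the value assigned to the whole segment.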
Summary statistics of the above independent variables are
given in Table 2.
3.5. Rain
Rain precipitation data were derived from the Functional
Hydrogeological Centre of Campanian Region. They consist of
millimetres of rain per hour measured by eight weather stations
located along the route of the motorway.
Of all the 1916 accidents registered on the motorway in the
5-year monitoring period, 273 are reported by MMA as having
occurred on wet pavement. In order to evaluate a potential rain
effect on the number of crashes, the amount of time the pavement
is wet is estimated using the hourly rainfall data and assumptions
about drying time. For this purpose for each of the 265 road
segments (curve and tangent), the amount of time the pavement
was wet in a year was estimated by summing up the hours of
rainfall observed in that year as available from the records of the
nearest weather station. Subsequently, this amount of time in
hours was transformed into a time-equivalent number of days
with a wet pavement in each year of the monitoring period.
In other words, in this study "conventional" days are introduced,
each day being totally "dry" or "wet", with a number of "dry"
and "wet" days in a year proportional to the estimated amount
of time the pavement of a road segment was, respectively, dry
or wet during that year. Then, accidents that occurred when the
pavement surface was dry (wet) are associated with days
with a conventional surface status "dry" ("wet"). All remaining
days, whether "dry" or "wet", have zero accidents. Thus, a data
set resulted which consists of the daily number of accidents on
each road section in the 5-year monitoring period, along with
the conventional daily status ("dry" or "wet") of the pavement
surface.
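The conversion of wet hours into conventional days can be illustrated with the following Python sketch. It assumes that the yearly total of wet hours (rainfall hours plus any drying-time allowance) is already available for a segment, and it uses simple rounding to whole days; the text does not state the rounding rule, so that choice, like the function name and the input value, is an assumption.

```python
def conventional_days(wet_hours, days_in_year=365):
    """Split a year into conventional "wet" and "dry" days for one segment.

    wet_hours: estimated hours in the year during which the pavement was wet.
    Returns (wet_days, dry_days), which always sum to days_in_year.
    """
    if wet_hours < 0:
        raise ValueError("wet_hours must be non-negative")
    # Time-equivalent number of wet days (rounding rule is an assumption).
    wet_days = min(round(wet_hours / 24), days_in_year)
    return wet_days, days_in_year - wet_days


# Hypothetical segment with 600 wet hours in a non-leap year.
wet_days, dry_days = conventional_days(600)
print(wet_days, dry_days)
```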
Table 2
Summary statistics of independent variables: south direction carriageway

                     Length   Longitudinal   Sight distance   Curvature   Side friction   AADT/10,000
                     (km)     slope (%)      (km)             (km^-1)     coefficient
Mean                 0.350    0.05           0.583            2.105       0.473           2.748
Mode                 0.245    2.42           0.200            0.504       0.515           1.874
Standard deviation   0.298    2.09           0.477            1.463       0.060           0.972
Minimum              0.069    −4.37          0.100            0.126       0.286           1.764
Maximum              1.695    4.26           2.335            4.854       0.670           4.741

Note. SFC and AADT vary over time; the table was calculated on the basis of 665 observations.
Table 3
Summary statistics of rain data: day counts for all road segments from 1999 to 2003

                        Wet pavement   Dry pavement   Row total
With 0 crashes          32,073         449,919        481,992
With at least 1 crash   270            1,628          1,898
Column total            32,343         451,547        483,890
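As a rough reading of the day counts in Table 3, the share of segment-days with at least one crash can be compared between wet and dry pavement. The Python sketch below does only this crude, unadjusted comparison; it is not the regression model of the paper, and the variable names are illustrative.

```python
# Day counts for all road segments, 1999 to 2003 (Table 3).
wet = {"no_crash": 32_073, "crash": 270}
dry = {"no_crash": 449_919, "crash": 1_628}


def crash_day_share(counts):
    """Fraction of segment-days on which at least one crash was recorded."""
    return counts["crash"] / (counts["crash"] + counts["no_crash"])


wet_share = crash_day_share(wet)
dry_share = crash_day_share(dry)
ratio = wet_share / dry_share

print(f"wet: {wet_share:.4%}  dry: {dry_share:.4%}  ratio: {ratio:.2f}")
```

The ratio ignores segment length, traffic and geometry, so it only indicates the direction of the wet-pavement effect before any covariates are controlled for.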
N
i=1 {(Yi
be included in the regression model, a stepwise forward procedure based on the Generalized Likelihood Ratio Test (GLRT)
was used.
4.3. Measuring goodness-of-fit
To measure the overall goodness-of-fit (g.o.f.) in Linear
Regression Models the so-called coefficient of determination,
$R^2$, is often used.
In the case of Poisson and NB regression models, however, different measures of g.o.f. have also been suggested (see
Fridstrøm et al., 1995). In fact, for these models the ML estimation method is usually used. To the extent that one wants to use
the $R^2$ statistic as a basis for testing g.o.f., the way the model parameters are estimated becomes relevant, since $R^2$ is maximized by
ordinary Least Squares estimation but not by ML estimation.
Among g.o.f. indexes alternative to $R^2$, the most natural
one appears to be the likelihood ratio g.o.f. statistic $R_D^2$. Let $D_m$
be the scaled deviance of model m, that is $D_m = -2\ln(L_m/L_N)$,
where $L_m$ is the maximum of the likelihood function of model m and
$L_N$ is the maximum of the likelihood function of the model
in which there are as many parameters as there are observations
(the so-called full or saturated model). Now, let $D_0$ denote the
scaled deviance of the zero model, i.e. the model with only a
constant term and an overdispersion parameter. In Fridstrøm et
al. (1995) it is observed that $D_0$ can be taken as a measure of
the total variation present in the sample. Then, the $R_D^2$ statistic
defined by
$$R_D^2 = 1 - \frac{D_m/(N-m)}{D_0/(N-2)} \qquad (2)$$

$$P_D^2 = 1 - \frac{E\{D_m\}/(N-m-1)}{D_0/(N-2)} \qquad (3)$$
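The deviance-based statistic of Eq. (2) is straightforward to compute once the log-likelihoods are available. The Python sketch below is a minimal illustration, assuming that m in Eq. (2) counts the estimated parameters of model m; the function names and all numerical values are hypothetical.

```python
def scaled_deviance(loglik_model, loglik_saturated):
    """Scaled deviance D = -2 ln(L_model / L_saturated), from log-likelihoods."""
    return -2.0 * (loglik_model - loglik_saturated)


def r2_deviance(d_m, d_0, n_obs, m):
    """R_D^2 of Eq. (2): 1 - [D_m / (N - m)] / [D_0 / (N - 2)]."""
    return 1.0 - (d_m / (n_obs - m)) / (d_0 / (n_obs - 2))


# Hypothetical log-likelihoods: zero model, fitted model, saturated model.
d_0 = scaled_deviance(-520.0, -300.0)
d_m = scaled_deviance(-470.0, -300.0)
r2 = r2_deviance(d_m, d_0, n_obs=590, m=6)
print(round(r2, 3))
```

With the zero model itself (D_m = D_0 and m = 2) the statistic is zero, which matches its reading as the fraction of total variation explained.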
i ) Yi }
1/2
2
2 N
i=1 i
(1)
5. Estimation results
The data set for curves consists of the annual number of total and
severe accidents registered in n = 5 years from 1999 to 2003 on
N = 118 segments, 59 for each carriageway. The candidate set
of explanatory variables is: length (L), curvature (1/R), annual
average daily traffic (AADT), sight distance (SD), side friction
coefficient (SFC) and longitudinal slope (LS). Moreover, from
Table 1 it appears that the number of accidents registered on
curves during 1999, and especially during 2000, was much
smaller than during the remaining years. Since we were not able
to assign a specific cause to this trend, dummy variables yr99,
yr00, yr01 and yr02 are also considered to capture the potential
non-random year effect. The reference year for these dummy
variables is 2003. The log-linear regression model
$\lambda_i(x_i; \beta) = \exp\left(\sum_{j=0}^{m} \beta_j x_{ji}\right)$
is assumed for the expected number of counts on section i.

Table 4
Parameter estimates and goodness-of-fit measures for curves

All crashes
                                                            Poisson    NB         NBH        NM         NMH
Constant                                                    0.01243    0.03523    0.03459    0.09039    0.07130
Log of the section length (km)                              0.84796    0.86109    0.82431    0.88222    0.80311
Curvature (km^-1)                                           0.26772    0.26196    0.26640    0.25718    0.27017
AADT/10,000                                                 0.31986    0.32274    0.32674    0.31828    0.32660
Year 1999 (dummy 0, 1)                                      0.38248    0.37102    0.38406    0.38278    0.38151
Year 2000 (dummy 0, 1)                                      0.68724    0.69053    0.69335    0.68692    0.68686
Overdispersion parameter                                    –          3.623      14.491     4.146      18.254
Log likelihood                                              −490.93    −480.54    −479.57    −465.50    −463.60
R_D^2, explained fraction of total variation                0.123      0.146      0.148      0.181      0.185
R_D^2/P_D^2, explained fraction of systematic variation     0.458      0.542      0.550      0.671      0.687

Severe crashes
                                                            Poisson    NB         NBH        NM         NMH
Constant                                                    1.33450    1.33025    1.39837    1.32175    1.45703
Log of the section length (km)                              0.90626    0.91221    0.88783    0.92575    0.86881
Curvature (km^-1)                                           0.32951    0.32744    0.33336    0.32702    0.33793
AADT/10,000                                                 0.38973    0.39095    0.39903    0.39709    0.40863
Year 1999 (dummy 0, 1)                                      0.28842    0.26474    0.26655    0.28692    0.28510
Year 2000 (dummy 0, 1)                                      0.69605    0.68694    0.69321    0.69578    0.69566
Overdispersion parameter                                    –          2.625      9.548      3.443      13.384
Log likelihood                                              −379.40    −376.53    −375.43    −372.23    −370.15
R_D^2, explained fraction of total variation                0.098      0.106      0.110      0.121      0.129
R_D^2/P_D^2, explained fraction of systematic variation     0.616      0.671      0.696      0.767      0.814
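The log-linear form assumed for the expected counts can be illustrated with a short Python sketch. The coefficient and covariate values below are hypothetical placeholders chosen only to show the mechanics; they are not the estimates of Table 4, and the function name is illustrative.

```python
import math


def expected_count(beta, x):
    """Expected crash count exp(sum_j beta_j * x_j), with x[0] = 1 for the constant."""
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    return math.exp(sum(b * v for b, v in zip(beta, x)))


# Hypothetical coefficients: constant, log-length, curvature, AADT/10,000, yr99, yr00.
beta = [-0.1, 0.8, 0.3, 0.3, -0.4, -0.7]

# Same hypothetical curve (0.35 km long) in the reference year and in 2000.
reference_year = [1.0, math.log(0.35), 2.0, 2.7, 0.0, 0.0]
year_2000 = [1.0, math.log(0.35), 2.0, 2.7, 0.0, 1.0]

ratio = expected_count(beta, year_2000) / expected_count(beta, reference_year)
print(round(ratio, 3))
```

Because the model is multiplicative, switching a dummy variable from 0 to 1 scales the expected count by the exponential of its coefficient, whatever the values of the other covariates.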
Table 5
Stepwise procedure: sequence of models and parameters for NMH model (curves, all crashes)

Step   Constant   Log-length   Curvature   AADT      Yr2000    Yr1999    Overdispersion   Log-LKH   GLRT   P value
1      0.13625    –            –           –         –         –         9.382            −509.31   –      –
2      0.91513    0.59609      –           –         –         –         11.90            −500.61   17.4   3.03E−05
3      0.78127    0.73897      0.15967     –         –         –         14.12            −494.63   12.0   5.42E−04
4      0.30633    0.80636      0.27726     0.34486   –         –         18.20            −483.19   22.9   1.72E−06
5      0.23371    0.80742      0.27950     0.35244   0.60843   –         18.18            −469.56   27.3   1.78E−07
6      0.07130    0.80311      0.27017     0.32660   0.68686   0.38151   18.25            −463.60   11.9   5.54E−04
7      0.06550    0.81372      0.24766     0.33554   0.68671   0.37999   18.62            −462.76   1.67   1.96E−01
8      0.06432    0.80088      0.26885     0.32329   0.68690   0.38205   18.57            −463.24   0.71   3.98E−01
9      0.25101    0.80430      0.26663     0.31542   0.70106   0.35773   18.59            −463.22   0.75   3.87E−01
10     0.08520    0.80315      0.27027     0.32678   0.67360   0.36822   18.25            −463.52   0.15   7.00E−01
11     0.08198    0.80314      0.27025     0.32683   0.67695   0.37156   18.25            −463.55   0.08   7.75E−01

Note. Steps 7 to 11 each add one of the remaining candidate variables (Slope, Friction, Sight, Yr2001, Yr2002) to the step-6 model; the coefficients estimated for these added variables are 0.02615, 0.19534, 0.61921, 0.03938 and 0.02954.
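The GLRT statistic and P value of a single step in Table 5 can be reproduced with a short Python sketch. It assumes that the tabulated log-likelihoods are negative (the printed values are taken as −509.31 for step 1 and −500.61 for step 2) and that each step adds one parameter, so that the statistic is compared with a chi-square distribution with one degree of freedom; the function names are illustrative.

```python
import math


def glrt_statistic(loglik_restricted, loglik_extended):
    """Generalized likelihood ratio statistic: 2 * (ln L_extended - ln L_restricted)."""
    return 2.0 * (loglik_extended - loglik_restricted)


def p_value_one_df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 d.o.f., P(X > s) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Step 1 (constant only) versus step 2 (constant + log-length), signs assumed negative.
statistic = glrt_statistic(-509.31, -500.61)
p_value = p_value_one_df(statistic)
print(f"GLRT = {statistic:.1f}, P value = {p_value:.2e}")
```

A variable is retained in the forward procedure when this P value falls below the chosen significance level.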
Table 6
Parameter estimates and goodness-of-fit measures for tangents

All crashes
                                                            Poisson    NB         NBH        NM         NMH
Constant                                                    0.53501    0.53912    0.52609    0.50931    0.50347
Log of the section length (km)                              0.86883    0.83745    0.86044    0.81380    0.85729
AADT/10,000                                                 0.23198    0.23105    0.22096    0.22554    0.23960
AADT/10,000 × junctions (1 if present, 0 if absent)         0.23190    0.22172    0.22066    0.21715    0.22848
Year 2000                                                   0.21343    0.21404    0.22190    0.21327    0.21344
Overdispersion parameter                                    –          4.227      8.957      5.472      13.772
Log likelihood                                              −175.60    −151.116   −146.81    −142.04    −135.83
R_D^2, explained fraction of total variation                0.203      0.236      0.243      0.250      0.259
R_D^2/P_D^2, explained fraction of systematic variation     0.475      0.556      0.571      0.587      0.608

Severe crashes
                                                            Poisson    NB         NBH        NM         NMH
Constant                                                    1.37427    1.36559    1.35319    1.38939    1.40044
Log of the section length (km)                              0.79270    0.77679    0.74731    0.77791    0.76232
AADT/10,000                                                 0.42610    0.41867    0.40574    0.42685    0.42575
AADT/10,000 × junctions (1 if present, 0 if absent)         0.44149    0.44513    0.45428    0.47142    0.50628
Overdispersion parameter                                    –          6.339      8.290      8.957      13.366
Log likelihood                                              −513.37    −511.37    −510.01    −507.50    −506.09
R_D^2, explained fraction of total variation                0.205      0.207      0.210      0.216      0.219
R_D^2/P_D^2, explained fraction of systematic variation     0.762      0.778      0.789      0.811      0.823
Table 7
Stepwise procedure: sequence of models and parameters for NMH model (tangents, all crashes)

Step   Constant   Log-length   AADT      Junction   Yr2000    Overdispersion   Log-LKH   GLRT   P value
1      0.96813    –            –         –          –         3.457            −216.28   –      –
2      1.33758    0.90237      –         –          –         7.502            −169.20   94.2   2.92E−22
3      0.39950    0.86380      0.29497   –          –         11.09            −148.95   40.5   1.96E−10
4      0.47637    0.85591      0.23404   0.24459    –         13.78            −139.50   18.9   1.38E−05
5      0.50347    0.85729      0.23960   0.22848    0.21344   13.77            −135.83   7.34   6.74E−03
6      0.54376    0.85839      0.23813   0.21890    0.21464   13.81            −135.82   0.02   8.88E−01
7      0.50505    0.86022      0.23927   0.22952    0.21288   13.92            −135.40   0.86   3.54E−01
8      0.53827    0.85640      0.23508   0.22324    0.23603   13.78            −135.08   1.50   2.21E−01
9      0.52941    0.85584      0.23888   0.22918    0.24023   13.79            −134.81   2.04   1.53E−01
10     0.48707    0.85516      0.23588   0.24020    0.18960   13.78            −135.12   1.42   2.33E−01

Note. Steps 6 to 10 each add one of the remaining candidate variables (Friction, Slope, Yr1999, Yr2001, Yr2002) to the step-5 model; the coefficients estimated for these added variables are 0.08981, 0.02197, 0.07488, 0.09670 and 0.11240.

6.1. Curves
For multilane road sections containing curves, the accident-prediction models for each carriageway are:
total crashes:
Table 8
Parameter estimates for the regression model with a surface status indicator

                                                       All crashes             Severe crashes
                                                       Tangents    Curves      Tangents    Curves
Constant                                               5.47635     6.01594     7.33916     7.53979
Log of the section length (km)                         0.87122     0.84834     0.80626     0.90707
Curvature (km^-1)                                      –           0.26664     –           0.34003
AADT/10,000                                            0.24076     0.32652     0.42121     0.41839
Surface status (1 if wet, 0 if dry)                    0.84363     0.99436     1.03324     1.18186
AADT/10,000 × junctions (1 if present, 0 if absent)    0.23873     –           0.44289     –
Year 1999 (dummy 0, 1)                                 –           0.37727     –           –
Year 2000 (dummy 0, 1)                                 0.19606     0.66717     –           0.61426
Overdispersion parameter                               1.160       1.242       0.960       0.773
Hsiao, C.H., Lin, C.T., Cassidy, M., 1994. Application of fuzzy logic and neural networks to automatically detect freeway traffic incidents. J. Transp. Eng. 120, 753–773.
IMSL MATH/LIBRARY, 1989. Fortran Subroutines for Mathematical Applications. IMSL.
Keay, K., Simmonds, I., 2005. The association of rainfall and other weather variables with road traffic volume in Melbourne, Australia. Accid. Anal. Prev. 37, 109–124.
Keay, K., Simmonds, I., 2006. Road accident and rainfall in a large Australian city. Accid. Anal. Prev. 38, 445–454.
Knuiman, M.W., Council, F.M., Reinfurt, D.W., 1993. Association of median width and highway accident rates. Transp. Res. Rec., 1401.
Lord, D., Washington, S.P., Ivan, J.N., 2005. Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accid. Anal. Prev. 37, 35–46.
Martin, J.-L., 2002. Relationship crash rate and hourly traffic flow on interurban motorways. Accid. Anal. Prev. 34, 619–629.
Miaou, S.P., Song, J.J., 2005. Bayesian ranking of sites for engineering safety improvements: decision parameter, treatability concept, statistical criterion, and spatial dependence. Accid. Anal. Prev. 37, 699–720.
Mussone, L., Ferrari, A., Oneta, M., 1999. An analysis of urban collision using an artificial intelligence model. Accid. Anal. Prev. 31, 705–718.
Ozbay, K., Noyan, N., 2006. Estimation of incident clearance times using Bayesian Network approach. Accid. Anal. Prev. 38, 542–555.
Persaud, B., Dzbik, L., 1993. Accident prediction models for freeways. Transp. Res. Rec. 1401, 55–60.
Persaud, B., Lyon, C., Nguyen, T., 1999. Empirical Bayes procedure for ranking sites for safety investigation by potential for safety improvement. Transp. Res. Rec. 1665, 7–12.
Persaud, B., Retting, R.A., Lyon, C., 2000. Guidelines for the identification of Hazardous Highway Curves. Transp. Res. Rec. 1717, 14–18.
Sayed, T., Abdelwahab, W., Navin, F., 1995. Identifying accident-prone location using fuzzy pattern recognition. J. Transp. Eng. 121, 352–358.
Shankar, V., Mannering, F., Barfield, W., 1995. Effect of roadway geometrics and environmental factors on rural freeway accident frequencies. Accid. Anal. Prev. 27 (3), 542–555.
Wang, X., Abdel-Aty, M., Brady, P.A., 2006. Crash estimation at signalized intersections: significant factors and temporal effect. In: Proceedings of the TRB 2006 Annual Meeting, TRB 06-0009.