Toward Detecting Accidents with Already Available Passive Traffic Information

Robert W. Thomas and José M. Vidal


Department of Computer Science and Engineering
University of South Carolina
Columbia, SC, USA

Abstract: Traffic accidents occur every day, causing disruptions. The longer disruptions are in place, the more severe they may become as additional vehicles continue to enter the affected roadways. This paper looks at using passive data from a readily available source, smart phones, to detect traffic accidents automatically via machine learning algorithms and thereby allow additional alerts and actions to occur to minimize the disruption. Using simulated data, machine learning algorithms were scored for accuracy and the results were analyzed.

Keywords: Multi-Agent System, smart cities, trace data

I. INTRODUCTION

Traffic accidents occur every day. Each and every one is disruptive to the flow of traffic. How disruptive is a product of the number of vehicles in the area, severity of the accident, and how long it takes to clear the accident. Efforts to clear an accident cannot begin until the accident is detected by appropriate resources such as police or a towing company. Furthermore, if the accident is undetected by vehicles in the area, drivers may unwittingly use affected routes, believing any initial congestion is simply due to traffic volume rather than an obstruction. This only exacerbates the problem. There is an abundance of data that can be applied toward detecting traffic accidents and automatically disseminating disruption alerts. Today, smart phones and other GPS-enabled devices are nearly ubiquitous in developed areas. This paper looks at how location information from individual smart phones of car drivers might be used to collectively detect traffic accidents. Specifically, it looks at how machine learning techniques might be applied to detect the presence of a traffic accident from passive traffic information of vehicles not involved in the incident. Once an accident has been detected, steps can be taken to notify interested parties, such as emergency responders who can resolve the incident and nearby vehicles that may want to avoid the area.

II. BACKGROUND

Optimizing road traffic to reduce congestion has been a goal of many for some time. Research has gone into developing effective methods for relieving congestion, including how to improve routing of traffic based on available network information [1] and increasing overall throughput of existing intersections [2].

Early detection of incidents is one way to reduce potential congestion. Machine learning has been used to this end in the context of analyzing video feeds at fixed intersections with the goal of detecting incidents [3][4]. The necessary infrastructure for monitoring roadways and intersections in this manner can be expensive though. This has led some researchers to use probe vehicles to gather data for use in qualitative analysis [13][14]. Methods reliant on Vehicular Ad-Hoc Network (VANET) enabled cars to relay data for analysis have also been explored [15]. These approaches show promise, but they also suffer from low volume of sensor data due to the dependency on the presence of probe cars and VANET-equipped vehicles.

The presence of untapped data and how to make use of it is an area of consideration for Smart Cities [10]. There is a largely under-utilized data source that is already deployed. Smart phones are everywhere, including inside vehicles on the road. Their sensors could potentially feed data to traffic management systems.

Expanding upon Waze's approach of capturing latent traffic data from smart phones to determine routing options [12], this paper looks at using that latent traffic data as input to machine learning algorithms tasked with distinguishing between normal congestion patterns and congestion caused by a traffic incident. The approach leverages a Multi-Agent System (MAS) to simulate traffic by modeling behavior of independent vehicles represented by agents traversing a roadway. Machine learning algorithms trained to recognize normal congestion versus congestion due to an incident for any given time step of a simulation are used to determine the presence of an incident for the monitored stretch of road.

A. NetLogo MAS Simulator

NetLogo is a programmable MAS simulation engine. It allows researchers to rapidly instantiate models to observe behavior of both individual agents and system collectives. It provides an intuitive user interface where one can add buttons and control widgets to easily manipulate a model to view different scenarios. An added benefit of NetLogo is that it is Java based, so it can easily be moved across platforms. It is also open source with a number of tools and extensions available, including BehaviorSpace, which allows for data collection from repeatable simulation runs specified by the user. The experiment described in Section III below uses NetLogo version 5.2 [7].

978-1-5090-4228-9/17/$31.00 ©2017 IEEE

Fig. 1. Traffic 2x2 Lanes Model with Accident

B. Machine Learning Algorithm Descriptions


Machine learning algorithms are a broad category of
algorithms that come in all shapes and sizes. The commonality
that they all share is a means to refine their outputs based on
previous input or results. The algorithms used in this study are
supervised learning classifiers. For this category, training data
with correct classification labels are provided. Each algorithm
refines itself with the training data so that it can more
accurately classify future examples of test data [5].
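As a concrete illustration of a supervised classifier refining itself on labeled training data, the following minimal sketch fits logistic regression, one of the classifiers used in this study, with plain batch gradient descent. It is an illustration only: the two features (mean speed and minimum speed), the sample values, the learning rate, and the epoch count are all made-up assumptions, and gradient descent stands in for whatever solver scikit-learn actually uses.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(samples, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(samples)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y  # gradient of the log loss w.r.t. the score
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Each pass nudges the model toward the labeled examples.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (crash) if the predicted probability is at least 0.5, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Made-up training data: [mean speed, minimum speed]; label 1 = crash present.
X_train = [[0.90, 0.80], [0.80, 0.70], [0.85, 0.75],
           [0.30, 0.00], [0.40, 0.10], [0.35, 0.05]]
y_train = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(X_train, y_train)
```

After fitting, `predict(weights, bias, x)` classifies a previously unseen feature vector, which mirrors the train-then-test split described above.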
Three classifiers are used in this study: logistic regression, a bagging classifier of logistic regression, and an AdaBoost classifier, as instantiated by the scikit-learn libraries [6] included with Anaconda 2.3.0 [11]. Logistic regression is a linear model for classification which takes as input a feature vector [5]. The other two algorithms are ensemble methods which combine two or more sub-classifiers to form a committee that decides what class label input test data should receive.

Bagging classifiers augment a base classifier, in this case, logistic regression. Multiple versions of the logistic regression classifier are trained on different subsets of the training data to form a committee of sub-classifiers that ultimately judge the input test data. The idea behind bagging is that by varying the training data, errors spawned by over-correcting the model during training are marginalized, leading to a more accurate classifier [5][6].

Adaptive boosting, or AdaBoost, follows a similar theory, but with a decision tree as the base classifier. A committee of sub-classifiers is formed from decision trees. With each iteration of sub-classifier creation, training instances that were mislabeled are more heavily weighted to ensure those instances receive more attention as successive sub-classifiers are spawned [5][6].

III. METHODOLOGY

The machine learning algorithms were evaluated by experimentation on simulated data. The NetLogo modeling and simulation engine was used to generate the data sets, with training data and test data produced by the same model. These data sets were in turn fed into machine learning classification algorithms, both for training and testing purposes, to determine the accuracy with which the presence of a traffic accident could be detected.

The model for the experiment was adapted from the NetLogo Traffic 2 Lanes Model [8] to simulate a monitored stretch of road. The model was adapted for 2-way traffic with two lanes going each way and to allow insertion of a non-reporting vehicle with a speed of zero. This crash vehicle creates an obstruction that vehicles traveling the same direction would have to navigate around. Vehicles traveling the opposite direction are impacted too. They face a distraction, causing them to slow down when passing the crash car. Fig. 1 illustrates the model in action with a traffic accident represented by the red car. The model allows for setting parameters for vehicle speed-up and slow-down rates and look-ahead, which affects when a car begins speeding up or slowing down relative to cars in front of it. These settings work the same as in the original model. The crash-chance control sets the probability that an accident will spawn each time an agent moves. The model only allows one crash per simulation. Once a crash occurs during a simulation run, it remains for the duration of that run. A foundational assumption is that everyone today carries a smart phone. Because of this, all vehicles other than the crash vehicle are capable of reporting passive traffic information.

Using the built-in NetLogo tool, BehaviorSpace, experiment parameters were set for light, moderate, and heavy traffic based on the total number of vehicles traversing the road in a given direction. Light traffic consisted of 25 cars, moderate consisted of 100 cars, and heavy consisted of 126 cars. Individual simulations then generated training data sets for each combination of traffic levels and corresponding test data sets with and without accidents. Other than the number of cars traveling in each direction and crash-chance, variables remained constant for each simulation run. Look-ahead was set to one, speed-up was set to 38, and slow-down was set to 76. For training data, 20,200 training samples were generated by running 200 simulations of 101 steps each with crash-chance set to zero for half of them and 100% for the other half. For test data, 1,010 samples were produced from 10 simulations of 101 steps each with zero chance of a crash, and 1,000 samples were produced with 100% crash-chance. Ten samples were removed from the 100% crash-chance set due to a model limitation preventing a scenario from being initialized with a crash already in place at time step 0.

Test cases were broken into "no crash" and "all crash" sets to simplify analysis. Since the model assumes that the crash car cannot self-report, determining the presence of a crash relies on the behavior of the other cars in the system. A trivial solution would be to identify a crash scenario as one where some car reports a speed of zero because it is impeded by the crash car. A simple classifier is introduced below to test for this condition. In real life, this can be problematic as heavy congestion leading to stop-and-go traffic also results in speeds of zero for one or more cars. Incorrect classifications on the "no crash" set represent false positives for incident detection. Misclassifying samples from the "all crash" set reflects false negatives.

Data output from BehaviorSpace running simulations against the traffic model consisted of Comma Separated Value (CSV) spreadsheets. Data recorded was speed, acceleration, x-coordinate, and y-coordinate of each reporting car at every time step. No allowance for noise or inaccurate reporting was made for this initial experiment. All cars were reporting with the exception of the crash car when present. The presence of the crash car was also recorded for use in classification training. Any time step beginning with the crash car present was reported as having a crash. Once the data was generated, it needed to be converted into a Support Vector Machine (SVM Light) format that could be ingested by the scikit-learn machine learning implementation used for the testing [6]. This was accomplished via the convert.c script, which translates CSV data into SVM Light [9]. The first column of the SVM Light text file indicated crash classification and subsequent entries represented the four data fields reported for each car.

Once the converted data sets were ready, the classification algorithms needed to be instantiated and trained. Logistic regression is a standalone implementation, needing only to have the training data provided so it can fit its internal model to the data. The bagging classifier was instantiated with logistic regression as its base classifier. Ten sub-classifiers were used with the sample size and feature set parameters both set to 0.5, meaning sub-classifiers were trained with a random sample of half of the training data instances and half of the data points for each instance. The AdaBoost classifier was instantiated with 50 sub-classifiers, the default number, all based on decision tree classifiers.

Two additional classifiers were also instantiated. Voting classifiers were introduced by version 0.17 of the scikit-learn library. They allow for ensemble classifiers to be formed from a heterogeneous mix of base classifier types, as opposed to bagging or AdaBoost, which use multiples of the same basic classifier type [6]. A simple majority vote classifier was formed using logistic regression, bagging, and AdaBoost classifiers as the ensemble members. These sub-classifiers were instantiated with the same parameters as their stand-alone counterparts described above.

The final classifier instantiated was a trivial case evaluator that used no training and simply declared a crash present if one or more cars reported a speed of zero. Short of a crashed car declaring itself, this is the most obvious indicator of an incident and therefore sets the minimum acceptable performance for any classification algorithm.

All four classifiers were fit to the training data applicable to each scenario prior to each experiment. For example, they were trained with the heavy-heavy training set simulating heavy traffic traveling both directions prior to being fed the uniform test data sets, "all crash" and "no crash", for the heavy-heavy scenario. This process was repeated for all six traffic conditions to measure classifier results for each combination of traffic conditions. The classifiers were re-instantiated between scenarios so only the training data for the current scenario was considered for any given experiment.

IV. RESULTS

The results show promise. Overall, the most accurate classifier was AdaBoost, averaging over 85% accuracy across all traffic conditions compared with the trivial classifier performing at just under 68% accuracy. All of the machine learning classifiers outperformed the trivial classifier in terms of accuracy, though each also introduced false positives in doing so. While logistic regression barely performed better than the trivial case overall, the ensemble methods produced more accurate results by 10% or greater. The most influential factor affecting accuracy of predictions of the machine learning algorithms was the amount of traffic. The algorithms produced their most accurate results under the medium-light traffic scenario, followed by the light-light scenario. In general, the heavy traffic scenarios were the toughest; however, the variance between classifier accuracy was highest during the medium-medium with no crashes scenario. The ensemble methods, bagging and AdaBoost, performed relatively the same or better than logistic regression, as one would expect. The simple majority voting method was able to improve the results in some cases but usually reflected an average between the top two performers. The trivial case performed poorly at detecting crashes, heavily favoring a no crash verdict in all conditions. Though it is not yet ready for production use as a stand-alone decision solution, at over 80% accurate for four out of six traffic conditions with AdaBoost, the method of using passive traffic information to identify traffic accidents is promising and certainly merits more research. Fig. 2 and Fig. 3 show the results.

Fig. 2. Overall Classification Accuracy

Fig. 3. Classification Accuracy by Traffic Condition
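The CSV to SVM Light conversion described in Section III can be sketched as follows. This is not the convert.c script itself, only a minimal Python illustration of the target format, in which each line holds a class label followed by 1-based `index:value` pairs. The CSV column order (crash flag first, then speed, acceleration, x, y for each car) and the label encoding (1 = crash, 0 = no crash) are assumptions made for the example.

```python
import csv
import io

FIELDS_PER_CAR = 4  # speed, acceleration, x-coordinate, y-coordinate


def row_to_svmlight(label, values):
    """Format one sample as '<label> <index>:<value> ...' with 1-based indices.

    Zero-valued features are omitted, as the sparse format permits; readers
    of the format treat missing indices as zero.
    """
    pairs = [f"{i}:{v:g}" for i, v in enumerate(values, start=1) if v != 0.0]
    return " ".join([str(label)] + pairs)


def csv_to_svmlight(csv_text):
    """Convert CSV rows of (crash flag, then four fields per car) to SVM Light."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        label = int(row[0])
        values = [float(v) for v in row[1:]]
        if len(values) % FIELDS_PER_CAR != 0:
            raise ValueError("each car must contribute four fields")
        lines.append(row_to_svmlight(label, values))
    return "\n".join(lines)


# Made-up two-car samples: one time step with a crash, one without.
sample_csv = (
    "1,0.0,-0.5,3.0,1.0,0.4,0.0,7.0,1.0\n"
    "0,0.9,0.1,2.0,-1.0,0.8,0.0,9.0,-1.0\n"
)
svm_text = csv_to_svmlight(sample_csv)
```

For the first sample above, the output line is `1 2:-0.5 3:3 4:1 5:0.4 7:7 8:1`: the stopped car's zero speed (feature 1) and the second car's zero acceleration (feature 6) are simply left out.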
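The classifier configuration described in Section III might look roughly like the sketch below in scikit-learn. It assumes a current scikit-learn release rather than the version used for the experiments, and it assumes that the "sample size and feature set parameters" correspond to `max_samples` and `max_features`. The `random_state` values and the toy data are illustrative additions, not part of the original experiment.

```python
import random

from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression


def build_classifiers():
    """Return fresh, untrained classifiers so no state carries across scenarios."""

    def logistic():
        return LogisticRegression()

    def bagging():
        # 10 logistic regression sub-classifiers, each trained on half of the
        # training instances and half of the features.
        return BaggingClassifier(
            LogisticRegression(),
            n_estimators=10,
            max_samples=0.5,
            max_features=0.5,
            random_state=0,
        )

    def adaboost():
        # 50 sub-classifiers; the default base learner is a decision tree.
        return AdaBoostClassifier(n_estimators=50, random_state=0)

    voting = VotingClassifier(
        estimators=[("lr", logistic()), ("bag", bagging()), ("ada", adaboost())],
        voting="hard",  # simple majority vote over predicted labels
    )
    return {
        "logistic": logistic(),
        "bagging": bagging(),
        "adaboost": adaboost(),
        "voting": voting,
    }


# Toy stand-in for one scenario's training set: 8 features per sample
# (two cars, four fields each); crash samples get a near-zero first feature.
rng = random.Random(0)
X, y = [], []
for i in range(40):
    label = i % 2
    row = [rng.random() for _ in range(8)]
    if label == 1:
        row[0] *= 0.1
    X.append(row)
    y.append(label)
```

Calling `build_classifiers()` once per traffic scenario and fitting each returned estimator with `fit(X, y)` mirrors the re-instantiation between scenarios described above.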
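The trivial case evaluator and the way its errors are counted on the uniform test sets can be sketched as follows. The speed values are made up for illustration; only the decision rule (a crash is declared when any reporting car has a speed of zero) comes from the description in Section III.

```python
def trivial_classifier(speeds):
    """Declare a crash (1) if any reporting car has a speed of zero, else 0."""
    return 1 if any(speed == 0 for speed in speeds) else 0


def accuracy(samples, true_label):
    """Accuracy on a uniform test set whose samples all share true_label."""
    correct = sum(1 for speeds in samples if trivial_classifier(speeds) == true_label)
    return correct / len(samples)


# Made-up per-time-step speed reports from three cars.
no_crash_set = [
    [0.8, 0.9, 0.7],
    [0.5, 0.0, 0.6],  # stop-and-go traffic: a car is stopped without a crash
    [0.9, 0.9, 0.8],
    [0.4, 0.3, 0.6],
]
all_crash_set = [
    [0.0, 0.2, 0.9],
    [0.3, 0.4, 0.8],  # traffic slowed by the crash, but no car fully stopped
    [0.0, 0.0, 0.7],
    [0.6, 0.1, 0.5],
]

no_crash_accuracy = accuracy(no_crash_set, 0)    # every error is a false positive
all_crash_accuracy = accuracy(all_crash_set, 1)  # every error is a false negative
false_positive_rate = 1 - no_crash_accuracy
false_negative_rate = 1 - all_crash_accuracy
```

Because each test set contains only one true class, accuracy on the "no crash" set directly gives the false positive rate and accuracy on the "all crash" set directly gives the false negative rate, which is the simplification the split was chosen for.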
V. CONCLUSION AND FUTURE WORK

As the results demonstrate, machine learning techniques can identify accident conditions from passive traffic information more often than not. The data is already out there as nearly everyone today carries a smart phone. All that is needed is an application to consume it and produce actionable knowledge. The approach presented is promising. It demonstrates there is knowledge to be gained from available trace data and that ensemble methods are better suited than single classifiers when dealing with highly variable dynamic environments.

When compared to similar work, the approach performed well. The machine learning algorithms' performance was comparable to the results in [14] and they out-performed the method used by [13]. The higher percentage of vehicles able to report was likely a significant advantage. The approach, however, requires more research and refinement to be useful in real-world conditions. Further, historical real-world data needs to be used to show the approach can transition successfully from the lab to production.

Future work will look at refining the data analysis to improve prediction accuracy while using real-world data of both traffic conditions and matching incident data. While historical data sets of this granularity were not publicly available when these experiments were run, efforts are already underway to collect suitable data for use in future experiments. Individually, the classifying algorithms performed well, with the ensemble methods showing better results overall. Future efforts will look at expanding the ensemble approach to determine if a consensus between algorithms or approaches can improve accuracy.

REFERENCES

[1] C. Gawron, Simulation-Based Traffic Assignment, Ph.D. dissertation, Dept. Math, Köln Univ., Köln, Germany, 1998.
[2] Dresner, Kurt, and Peter Stone. "Multiagent traffic management: A reservation-based intersection control mechanism." Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 2. IEEE Computer Society, 2004.
[3] Ahmed, Tarem, Boris Oreshkin, and Mark Coates. "Machine learning approaches to network anomaly detection." Proceedings of the 2nd USENIX workshop on Tackling computer systems problems with machine learning techniques. USENIX Association, 2007.
[4] Kamijo, S., Matsushita, Y., Ikeuchi, K., & Sakauchi, M. "Traffic monitoring and accident detection at intersections." Intelligent Transportation Systems, IEEE Transactions on 1.2 (2000): 108-118.
[5] C. M. Bishop (2006). Pattern Recognition and Machine Learning. P. 3, 205, 655-657. Springer. ISBN 0-387-31073-8.
[6] (n.d.). Retrieved November 12, 2016, from scikit-learn: http://scikit-learn.org
[7] Wilensky, U. (1999). NetLogo. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL. http://ccl.northwestern.edu/netlogo/.
[8] Wilensky, U. (1998). NetLogo Traffic 2 Lanes model. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL. http://ccl.northwestern.edu/netlogo/models/Traffic2Lanes.
[9] Regents of the University of California. "convert.c." Retrieved September 05, 2015. http://www.csie.ntu.edu.tw/~cjlin/libsvm/faqfiles/convert.c.
[10] CITIES, SMART. "Trace analysis and mining for smart cities: issues, methods, and applications." IEEE Communications Magazine 121 (2013).
[11] Anaconda. (n.d.). Retrieved November 12, 2016, from Continuum Analytics: http://continuum.io
[12] Waze. (n.d.). Retrieved November 12, 2016, from https://www.waze.com
[13] Asakura, Y., Kusakabe, T., Long, N. X., & Ushiki, T. (2015). Incident detection methods using probe vehicles with on-board GPS equipment. Transportation Research Procedia, 6, 17-27.
[14] Kinoshita, A., Takasu, A., & Adachi, J. (2015). Real-time traffic incident detection using a probabilistic topic model. Information Systems, 54, 169-188.
[15] Baiocchi, A., Cuomo, F., De Felice, M., & Fusco, G. (2015). Vehicular ad-hoc networks sampling protocols for traffic monitoring and incident detection in Intelligent Transportation Systems. Transportation Research Part C: Emerging Technologies, 56, 177-194.
