Data Collection and Prediction of Urban Transport Flow Using Neural Networks
Abstract—Smart cities can use artificial neural networks to provide more accurate information about public transportation schedules and thus help the population plan their day-to-day activities. In this context, this paper describes the essential steps for acquiring and processing data and for creating a neural network model capable of predicting possible delays or early arrivals on bus lines in the city of Curitiba, Paraná. The neural network considers traffic data, weather, time, and the history of a public transport line. The article details all phases of collection and treatment, as well as how the information is fed into the network and what results were obtained.

Keywords—Neural networks, transport prediction, smart cities.

I. INTRODUCTION

Collective public transportation is of great importance to Brazilian cities. These transit systems provide invaluable access and mobility to some of the country's poorest citizens. Furthermore, collective public transportation vehicles help to reduce traffic congestion, mitigate emissions from individual automobiles, and contribute to an overall strategy to promote cleaner and more environmentally friendly cities.

Urban public transport plays an important role in the current configuration of urban mobility as a means of transport that interconnects the various regions of a city. It is an alternative for reducing serious problems found in cities, such as congestion, traffic accidents, and environmental impacts [1].

Forecasting public transport delays can provide a tool that drivers and passengers could use to plan their daily tasks. This prediction can be obtained by analyzing data directly or indirectly linked to the punctuality of a line. Data collection is an important aspect of urban computing and is a determining factor in building smart cities [2].

This set of information could be used to create an artificial neural network that analyzes all this data and tries to find possible connections within it, producing a model that predicts delays or early arrivals and becoming a tool to help data analysis professionals and even users of the bus network.

For bus lines in public transport, one known issue is compliance with the established schedules. Because the problem is in many cases caused by factors that cannot be controlled, it is not always possible to prevent it. Predicting these delays gives interested parties this information in advance so that they can decide how to work around the situation [3].

The challenges for government, business, and academia increase with population growth [4]. The analysis of data to create resources for smart cities has been the subject of several studies in both academic and business environments. Collecting and processing data in this way can be of great value to companies and users, who could benefit from a large amount of information for planning and improving their activities, and also to the government, which could benefit from the improvement in the service provided. Research shows that the greatest causes of dissatisfaction among the Brazilian population with public transportation are problems with capillarity and frequency, slowness, and frequent delays, which, according to [5], lead the population to use public transportation less.

According to [6], congestion concerns all individuals. Brazilian metropolitan areas live through a nightmare that is difficult to measure: urban congestion. The feeling of wasted time while stuck in heavy congestion is worrying, and few people know how to live with this reality naturally. In recent years, millions of people have lost money and time because of congestion [7], and the price of car trips increases considerably during congestion [8, 9].
Some data, such as the day of the week, the weather description, and the bus status, are in text format and must be converted to numbers, since the neural network model works with numeric input data. For the day of the week, the following mapping was used: Sunday became 1, Monday 2, Tuesday 3, and so on. For the weather description, all possible answers were listed and each one was assigned a number; for example, "Cloudy weather" became 1 and "Sunny" became 4. The same technique was used for the bus status, but with three numbers, each being 0 or 1 depending on the situation: delayed became 100, early 010, and on schedule 001.

An example of a collected record is shown in Table 1. It contains the following data: the day of the week, which in the example is 6, equivalent to a Friday; the day and month of collection, in this case August 24; the hour and minute of collection, here 12:11; the temperature in the city at the time of collection, 29 degrees Celsius in the example; the description of the current weather, here 4, which is "sunny"; the air humidity, equal to 40% in this record; the condition slug, 1, which corresponds to "clear day"; a code for the weather condition in question (generated by the web service itself); the number of events collected in the region; the holiday indicators; and, lastly, the current travel time between the start and end of the line, which at that moment was 29 minutes.

Data              Value
Weekday           6
Day               24
Month             8
Hour              12
Minute            11
Temp              29
Description       4
Humidity          40
Condition Slug    1
Holiday eve       0
Holiday           0
After Holiday     0
Time              29

Table 1: Input data collected on August 24 at 12:11.

The output consists of three values: the first is a bit indicating whether the bus is late, the second whether it is early, and the third whether it is on time. No two of these bits can have the value 1 at once, since the bus cannot, for example, be late and early at the same time. Table 2 shows an example of output data in which the condition is 100, that is, the bus was late at the time of collection.

Output Data       Value
Late              1
Early             0
On Time           0

Table 2: Output data collected on August 24 at 12:11.

When performing the first tests, it was observed that it would be better to convert the qualitative data to binary form as well, since the neural network treats numbers as magnitudes to be weighted. The quantitative data were kept in their decimal form. Left as integer codes, the qualitative data might suggest to the neural network that Monday is less than Saturday, for example, or that description 4 is larger than description 1, which is not true. The idea that should be passed to the network is different, something like "it's Monday", yes or no. The day-of-week, description, quick description, and condition code fields were therefore changed to a binary form with one yes-or-no value (1 or 0, respectively) for each possible case. The day of the week, for example, became 7 values, each corresponding to one day of the week: Monday, for example, became 1000000 and Tuesday 0100000. This formatting was used for all the fields cited.

Data acquisition ran for 3 months, initially only to verify operation before continuing the collection, and yielded 3000 records in that time. After this collection, the data were used to create the neural network.
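
The binary encoding of qualitative fields described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the names `one_hot`, `encode_record`, `WEEKDAYS`, and `STATUSES` are hypothetical, the weekday order beyond Monday and Tuesday is assumed to continue through Sunday, and the status order follows the late = 100, early = 010, on time = 001 convention.

```python
# Hypothetical names for illustration; not taken from the original implementation.
# Weekday order is an assumption extending the Monday = 1000000, Tuesday = 0100000 example.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
# Status order follows late = 100, early = 010, on time = 001.
STATUSES = ["late", "early", "on time"]


def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


def encode_record(weekday, temperature, humidity):
    """Join one-hot weekday bits with quantitative values kept in decimal form."""
    return one_hot(weekday, WEEKDAYS) + [temperature, humidity]


monday_bits = one_hot("Monday", WEEKDAYS)   # seven yes/no values for the weekday
late_bits = one_hot("late", STATUSES)       # three yes/no values for the bus status
record = encode_record("Friday", 29, 40)    # weekday bits followed by temp and humidity
```

Encoding each category as its own yes-or-no input avoids implying an order between categories, which is the concern raised about integer codes such as Monday = 2 and Saturday = 7.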
According to [25], neural networks, or artificial neural networks, find applications in very diverse fields by virtue of their ability to learn from input data, with or without a teacher, and because they represent a technology rooted in various disciplines (such as neuroscience, math, statistics, physics, computer science, and engineering). Some examples of these fields are modeling, time series analysis, pattern recognition, signal processing, and control. As stated in [26], artificial neural networks can be considered a methodology for solving problems characteristic of artificial intelligence.

Neural networks are massively parallel systems composed of simple processing units that compute certain mathematical functions [27]. Using a set of examples presented to them, the networks are able to generalize the assimilated knowledge to a set of unknown data. They also have the ability to extract non-explicit characteristics from a set of information provided to them as examples [28].

4.2 Experiment setup

Keras is described in its documentation [29] as an open source neural network library written in Python. It is able to work with tools like Google TensorFlow [30]. Designed to enable rapid experimentation with deep neural networks, it focuses on being easy to use, modular, and extensible. TensorFlow is an open source library for numerical computation and machine learning [31], and was used for the neural network of this work.

To make use of these tools, a program was developed in Python, with data input and output in a comma-separated values (CSV) file, a format that allows the creation of tables with data separated by commas. The number of training epochs was defined, a counter of correct and incorrect classifications was created, and an interface was built to show the system's response to an input (late, early, or on time). The best result was obtained without changing the optimizer and with 10 training epochs. The model uses

4.3 Results obtained

After training, the network reached 84.59% accuracy on the validation data and 92% on the training data so far; that is, the network used part of the collected data for training (90%) and the rest for verification. When the network's outputs were compared with the collected outcomes, the network gave the correct answer in 84.59% of the cases (the class with the highest percentage was the correct answer).

The outputs are presented as probabilities of occurrence. For example, a forecast was made for June 20 at 12:00 with rain, and the following results were obtained: a 23.74% chance that the bus is late, a 9.81% chance that it is early, and a 66.43% chance that it is on time. The final response from the network is therefore that the bus will probably be on time on June 20 at noon. This result can be verified on the day and time in question, provided the weather conditions are predicted correctly.

In view of the results presented here, the network gives a reasonable response, considering some field tests with positive results, and with a larger data collection it may give an even better response.

V. CONCLUSION

Population satisfaction with public services is of great importance for improving quality of life, facilitating day-to-day living, and raising the level of satisfaction with the government. Public transportation has a major problem with delays and requires methods that achieve good accuracy in their predictions. Considering this need, this work proposed an approach for the city of Curitiba focused on collecting information that may be directly related to delays. The proposed approach is based on collecting data from various moments and sources in a way that makes the use of neural networks for prediction possible.