Bda 2019 Optimization 0
Schedule
1 Introduction
Bus systems are the backbone of public transportation in the US, carrying over
47% of all public passenger trips and 19,380 million passenger miles [18]. Most
US cities lack the urban form or the budget to build expensive transit
infrastructure such as subways, so they rely on buses as their most important
transit system, since bus systems have the advantages
2 Basak, Sun, Sengupta and Dubey
of relatively low cost and large capacity. Nonetheless, the bus system is also one
of the most unpredictable transit modes. Our study found that the average on-time
performance across all routes of the Nashville bus system was only 57.79% (see
Section 6.1). The unpredictability of delay has been selected as the top reason
why people avoid bus systems in many cities [2].
Providing reliable transit service is a critical but difficult task for all metropolises
in the world. To evaluate service reliability, transit agencies have developed
various indicators to quantify public transit systems through several key perfor-
mance measurements from different perspectives [4]. In the past, a number of
technological and sociological solutions have helped to evaluate and reduce bus
delay. Common indicators of public transit system evaluation include schedule
adherence, on-time performance, total trip travel time, etc. In order to track
the transit service status, transit agencies have installed automatic vehicle
locator (AVL) devices on buses to track their real-time locations. However, the
accuracy of AVL in urban areas is quite limited due to the low sampling rate
(every minute) and the impact of tall buildings on GPS devices. To maintain
basic control during bus operation, public transit agencies often use time point
strategies: special timing stops (time points, which transit vehicles try to
reach at scheduled times) are placed in the middle of bus routes to provide
better synchronization of arrival and departure times.
An effective approach for improving bus on-time performance is creating
timetables that maximize the probability of on-time arrivals by examining the
actual delay patterns. When designing schedules for real-world transport sys-
tems (e.g. buses, trains, container ships or airlines), transport planners typi-
cally adopt a tactical-planning approach [10]. Conventionally, metro transit en-
gineers analyze the historical data and adjust the scheduled time from past
experience, which is time consuming and error prone. A number of studies have
been conducted to improve bus on-time performance by reliable and automatic
timetabling. Since the timetable scheduling problem is recognized to be an NP-hard
problem [28], many researchers have employed heuristic algorithms to solve
the problem. The most popular solutions include ad-hoc heuristic searching algo-
rithms (e.g. greedy algorithms), neighborhood search (e.g. simulated annealing
(SA) and tabu search (TS)), evolutionary search (e.g. genetic algorithm) and
hybrid search [24].
However, there are few stochastic optimization models that focus on optimizing
bus timetables with the objective of maximizing the probability that buses
arrive at timepoints with a delay within a desired on-time range (e.g. one minute
early to five minutes late), which is widely used as a key indicator of bus service
quality in the US [1]. A timepoint is a bus stop that is designed to accurately
record the timestamps when buses arrive and leave the stop. Bus drivers use
timepoints to synchronize with the scheduled time. For example, to quantify bus
on-time arrival performance, many regional transit agencies use the range of
[-1,+5] minutes compared to the scheduled bus stop time as the on-time standard
to evaluate bus performance using historical data [1]. The actual operation of
bus systems is vulnerable to many internal and external factors. The external
Data-Driven Optimization of Public Transit Schedule 3
Fig. 1. The proposed toolbox for bus on-time performance optimization. City planners
provide the bus schedule, historical trip information, the desired on-time range and
the layover time, and obtain an optimized timetable as well as the estimated on-time
performance.
factors include urban events (e.g., concerts, sporting events, etc.), severe weather
conditions, road construction, passenger and bicycle loading/offloading, etc. One
of the most common internal factors is the delay between two consecutive bus
trips, where the arrival delay of previous trips causes departure delay of the
next trip. Furthermore, there are monthly and seasonal variations in the actual
delay patterns, but most transit agencies publish a uniform timetable for the
next several months despite the variations. How to cluster the patterns and
optimize timetables separately remains an open problem. Moreover, heuristic
optimization techniques have attracted considerable attention, but finding the
optimal values of their hyper-parameters is difficult, since these depend on the
nature of the problem and on the specific implementation of the heuristic
algorithm, and are generally problem specific.
ing the on-time performance and execution time over several runs. The overall
workflow of the proposed optimization mechanisms is illustrated in Figure 1.
The rest of the paper is organized as follows: Section 2 compares our work
with related work on transit timetabling; Section 3 presents the problem for-
mulation; Section 4 presents the details of the transit data stores; Section 5
discusses the timetable optimization mechanisms used; Section 6 evaluates the
performance of the optimization mechanisms and presents sensitivity analysis
results; Section 7 presents concluding remarks and future work.
This section compares our system with related work on transit timetable schedul-
ing. A number of studies have been conducted to provide timetabling strategies
for various objectives: (1) minimizing average waiting time [27], (2) minimizing
transfer time and cost [7][12][24], (3) minimizing total travel time [17],
(4) maximizing the number of simultaneous bus arrivals [9][13], (5) minimizing the
cost of transit operation [26], and (6) minimizing a mix of costs (both the user's
and the operator's) [6].
The design of timetable with maximal synchronizations of bus routes without
bus bunching has been researched by Ibarra-Rojas et al. [13]. The bus synchro-
nization strategy has been discussed from the perspective of taking waiting time
into account in the transfer stops in the work of Eranki et al. [9]. An improved
GA in minimizing passenger transfer time considering traffic demands has been
explored by Yang et al. [12]. Traffic and commuter demand has also been considered
in the work by Wang et al. [27]. Besides optimization algorithms, several deep
learning techniques [22] have been applied to bus scheduling problems [15].
Nayeem et al. [17] set up the optimization problem over several criteria, such
as minimizing travel time and number of transfers and maximizing passenger
satisfaction. Szeto et al. [24] discuss route design and neighborhood search with
a genetic algorithm that minimizes the number of transfers. Zhong et al. [29] used
an improved Particle Swarm Optimization to optimize bus rapid transit routes so as
to serve the maximum number of passengers.
Table 1. The scheduled time and recorded actual arrival and departure time of two
sequential trips that use the same bus of route 4 on Aug. 8, 2016. The arrival delay at
the last timepoint of the first trip accumulates at the first timepoint of the second trip.
                                Timepoints
                                MCC4 14    SY19       PRGD       GRFSTATO
Trip 1  Scheduled Time          10:50 AM   11:02 AM   11:09 AM   11:18 AM
        Actual Arrival Time     10:36 AM   11:10 AM   11:18 AM   11:27 AM
        Actual Departure Time   10:50 AM   11:10 AM   11:18 AM   11:30 AM
Trip 2  Scheduled Time          11:57 AM   11:40 AM   11:25 AM   11:20 AM
        Actual Arrival Time     12:11 PM   11:51 AM   11:34 AM   11:27 AM
        Actual Departure Time   12:11 PM   11:51 AM   11:34 AM   11:30 AM
3 Problem Formulation
Typically, transit delays are affected not only by external factors (such as traffic,
weather, travel demand, etc.), but also by some internal factors. For example,
the delay accumulated on previous trips may cause a delay in consecutive trips
by affecting the initial departure time of the next trip. To illustrate the
problem context simply and without loss of generality, we take
two sequential bus trips of route 4 in Nashville as an example (the scheduled
time and the actual arrival and departure time recorded on Aug. 8, 2016 are
Fig. 2. (a) A route segment on bus route 3 leaving downtown; (b) The variance of
actual travel time and (c) the relative standard deviation of actual travel times on a
bus route segment in time period between Sept. 1, 2016 and Feb. 28, 2017.
For a given bus trip schedule $b$, let $H = \{h_1, h_2, \ldots, h_m\}$ be a set of $m$
historical trips, with each trip passing $n$ timepoints $\{s_1, s_2, \ldots, s_n\}$. The
on-time performance of the bus trip schedule $b$ can then be defined as the ratio of an
indicator function $I(h_i, s_j)$, summed over all timepoints for all historical trips, to
the product of the total number of historical trips and the total number of timepoints.
The indicator function $I(h_i, s_j)$ is 1 if $d_{i,j} \in [t_{early}, t_{late}]$ and 0
otherwise, where
$d_{i,j} = t^{arrival}_{h_i,s_j} - T^{arrival}_{h_i,s_j}$
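The ratio described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads $t^{arrival}$ as the recorded arrival time and $T^{arrival}$ as the scheduled arrival time, uses one schedule shared by all historical trips, expresses times in minutes, and takes the [-1,+5] minute window from the introduction as the default. The function name `on_time_performance` is hypothetical.

```python
def on_time_performance(actual, scheduled, t_early=-1.0, t_late=5.0):
    """Fraction of (trip, timepoint) arrivals whose delay lies in [t_early, t_late].

    actual:    list of historical trips, each a list of arrival times (minutes)
    scheduled: list of scheduled arrival times (minutes), one per timepoint
               (assumption: the same schedule applies to every historical trip)
    """
    on_time = 0
    total = 0
    for trip in actual:
        for t_arrival, t_sched in zip(trip, scheduled):
            delay = t_arrival - t_sched  # d_{i,j}: positive means late
            if t_early <= delay <= t_late:  # indicator I(h_i, s_j)
                on_time += 1
            total += 1
    # Ratio of on-time arrivals to (number of trips) x (number of timepoints)
    return on_time / total if total else 0.0


# Hypothetical data: two trips over three timepoints, times in minutes
history = [[0, 12, 26], [1, 10, 19]]
schedule = [0, 10, 20]
performance = on_time_performance(history, schedule)
```

In this made-up example, only the third arrival of the first trip (6 minutes late) falls outside the window, so five of the six arrivals count as on time.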
4 Data Store
4.1 Data Sources
We established a cloud data store and reliable transmission mechanisms to feed
our Nashville Metropolitan Transit Authority (MTA) updates the bus schedule
information every six months and provides the schedule to the public via GTFS
files. In order to coordinate and track the actual bus operations along routes,
MTA has deployed sensor devices at special bus stops (called timepoints) to
accurately record the arrival and departure times. In Nashville, there are over
2,700 bus stops all over the city and 573 of them are timepoint stops. City
planners and MTA engineers analyze the arrival and departure records regularly
to update the transit schedule. The details of the datasets are as follows:
– Static GTFS. This dataset defines the static information of bus schedule
and associated geographic information, such as routes, trips, stops, depar-
ture times, service days, etc. The dataset is provided in a standard transit
schedule format called General Transit Feed Specification (GTFS).
– GTFS-realtime. This dataset contains real-time transit information recorded
in GTFS-realtime format, which includes bus locations, trip updates and service
alerts. The GTFS-realtime feed is collected and stored at one-minute intervals.
– Timepoints. This dataset provides accurate and detailed historical arrival
and departure records at timepoint stops. The information includes route,
trip, timepoint, direction, vehicle ID, operator, actual arrival and departure
time, etc. The dataset is not available in real-time but collected manually
by Nashville MTA at the end of each month.
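As an illustration of how the static GTFS schedule might be read, the sketch below parses a `stop_times.txt` table and keeps the scheduled arrival times at timepoint stops. The column names (`trip_id`, `arrival_time`, `stop_id`, `stop_sequence`, `timepoint`) come from the GTFS specification; the sample rows, the function names and the choice to store times as seconds after midnight are assumptions made for this example and are not taken from the paper's data store.

```python
import csv
import io


def gtfs_time_to_seconds(value):
    """Convert a GTFS 'HH:MM:SS' string to seconds; hours may exceed 24."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def load_timepoint_schedule(stop_times_csv):
    """Map trip_id -> sorted list of (stop_sequence, stop_id, arrival_seconds).

    Rows with timepoint == "0" (approximate times) are skipped; a missing or
    empty timepoint field is treated as an exact time.
    """
    schedule = {}
    reader = csv.DictReader(io.StringIO(stop_times_csv))
    for row in reader:
        if row.get("timepoint", "1") == "0":
            continue
        entry = (
            int(row["stop_sequence"]),
            row["stop_id"],
            gtfs_time_to_seconds(row["arrival_time"]),
        )
        schedule.setdefault(row["trip_id"], []).append(entry)
    for stops in schedule.values():
        stops.sort()  # order by stop_sequence
    return schedule


# Hypothetical sample rows; trip and stop identifiers are made up.
SAMPLE = """trip_id,arrival_time,departure_time,stop_id,stop_sequence,timepoint
T1,10:50:00,10:50:00,A,1,1
T1,10:55:00,10:55:00,B,2,0
T1,25:02:00,25:02:00,C,3,1
"""

timepoint_schedule = load_timepoint_schedule(SAMPLE)
```

Keeping times as seconds after midnight avoids trouble with trips that run past midnight, which GTFS encodes with hour values of 24 or more.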
Even though the timepoint datasets are utilized in the study, the proposed
method is not limited to them and can use surrogate data sources:
(1) automatic passenger counter (APC) data: APC datasets record both passenger
counts and departure/arrival times at stops; (2) GTFS-realtime feed: the
real-time bus locations reported by automatic vehicle locators (AVL) installed
on buses. Compared with timepoint datasets, APC data also provides accurate
times at normal stops and is thus the most suitable alternative dataset.
GTFS-realtime, however, suffers from a low sampling rate and low accuracy in the
city and may reduce the performance of the proposed mechanism.
Fig. 3. The feature vectors [mean, standard deviation, median] of the travel time in 4
months of 2016 for a segment (WE23-MCC5 5) on a bus trip of route 5.
there at least for 4 minutes (the difference between the actual and scheduled
arrival time) irrespective of presence of passengers, the dwell time caused by
passengers is calculated as the additional time taken for departure after the
scheduled time (11:04 AM - 11:02 AM = 2 minutes).
– On the other hand, if the bus arrived later at 11:05 AM and departed at
11:06 AM, then the dwell time caused by passengers is calculated as the
additional time spent after the actual arrival time (11:06 AM - 11:05 AM =
1 minute).
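The two cases above can be captured by one rule: the passenger-caused dwell time is measured from whichever is later, the actual arrival or the scheduled time. The sketch below is a hedged illustration of that rule. The function name is hypothetical, the 10:58 AM early arrival is inferred from the "4 minutes" mentioned in the first case, and times on a single day (no midnight crossing) are assumed.

```python
from datetime import datetime

TIME_FORMAT = "%I:%M %p"  # e.g. "11:02 AM"


def passenger_dwell_minutes(scheduled, arrival, departure):
    """Dwell time attributed to passengers, in minutes.

    If the bus arrives early it must wait until the scheduled time anyway,
    so only the time after max(arrival, scheduled) is attributed to passengers.
    """
    sched, arr, dep = (
        datetime.strptime(value, TIME_FORMAT)
        for value in (scheduled, arrival, departure)
    )
    start = max(arr, sched)
    # Clamp at zero in case a departure is recorded before the reference time
    return max(0.0, (dep - start).total_seconds() / 60.0)


# Early arrival (assumed 10:58 AM), scheduled 11:02 AM, departed 11:04 AM
early_case = passenger_dwell_minutes("11:02 AM", "10:58 AM", "11:04 AM")
# Late arrival at 11:05 AM, departed 11:06 AM
late_case = passenger_dwell_minutes("11:02 AM", "11:05 AM", "11:06 AM")
```

The two calls reproduce the 2-minute and 1-minute values worked out in the text.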
Arrival Time Estimation. The arrival time of a bus at a stop is impacted by
two factors: (1) travel times on the segments before the stop, and (2) dwell times
at the previous stops. We assume that a bus will wait until the scheduled time if it
arrives earlier than the scheduled time, and that the historical travel time between
two timepoints remains the same in the simulation. To obtain an estimate
of the arrival time, the historical dwell time caused by commuters (which in
turn is representative of the historical travel demand) is taken into account
by adding it to the arrival time at any timepoint. If this sum is earlier than the
new scheduled time, the simulation stalls for an additional time until the new
scheduled time is reached. By taking into consideration the simulated departure
time $st^{depart}_{h,s_j}$ at the previous timepoint $s_j$, the actual travel time
$t^{arrive}_{s_{j+1}} - t^{depart}_{s_j}$ between $s_j$ and $s_{j+1}$, and the dwell
time $t^{dwell}_{s_{j+1}}$, the simulated departure time $st^{depart}_{h,s_{j+1}}$ at
timepoint $s_{j+1}$ can be found. The new schedule time $T^{depart}_{h,s_{j+1}}$
at $s_{j+1}$ is expressed as:
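The expression itself is not reproduced here, so the following is only one reading of the simulation rule stated in the paragraph above, not the authors' formula: the simulated departure at the next timepoint is the previous simulated departure plus the historical travel time plus the passenger dwell time, held back to the new scheduled time when that sum is earlier. All names are hypothetical and times are plain numbers in minutes.

```python
def simulate_trip(new_schedule, travel_times, dwell_times):
    """Simulate one historical trip against a candidate schedule.

    new_schedule: candidate scheduled times at the n timepoints
    travel_times: historical travel times for the n-1 segments
    dwell_times:  historical passenger dwell times at the n timepoints
                  (the entry for the first timepoint is not used here)
    Returns (arrivals at timepoints 2..n, departures at timepoints 1..n).
    """
    departures = [new_schedule[0]]  # assumption: the trip starts on schedule
    arrivals = []
    for j in range(len(new_schedule) - 1):
        arrive = departures[j] + travel_times[j]   # historical travel time kept
        ready = arrive + dwell_times[j + 1]        # add passenger dwell time
        # Stall until the new scheduled time if the bus is ready early
        departures.append(max(ready, new_schedule[j + 1]))
        arrivals.append(arrive)
    return arrivals, departures


# Hypothetical numbers, in minutes after the start of the trip
sim_arrivals, sim_departures = simulate_trip(
    new_schedule=[0, 10, 20], travel_times=[8, 12], dwell_times=[0, 1, 1]
)
```

In this example the bus is ready at minute 9 at the second timepoint and waits until minute 10, then reaches the third timepoint at minute 22 and leaves at minute 23, three minutes behind the candidate schedule.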
Then the CDF of $x$ in the range $[x + t_{early}, x + t_{late}]$ can be calculated using
the following equation:
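Since the equation is not shown here, the sketch below illustrates the idea with an empirical distribution, which is an assumption and not necessarily the distribution used in the paper: the probability mass of historical arrival times inside $[x + t_{early}, x + t_{late}]$ is the difference of the empirical CDF at the two ends of the window. The helper names and the sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def on_time_probability(samples, x, t_early=-1.0, t_late=5.0):
    """Empirical probability that an arrival falls in [x + t_early, x + t_late]."""
    ordered = sorted(samples)
    low = bisect_left(ordered, x + t_early)    # samples strictly below the window
    high = bisect_right(ordered, x + t_late)   # samples at or below the upper end
    return (high - low) / len(ordered)


def best_schedule_time(samples, candidates, t_early=-1.0, t_late=5.0):
    """Pick the candidate scheduled time with the largest on-time probability."""
    return max(
        candidates,
        key=lambda x: on_time_probability(samples, x, t_early, t_late),
    )


# Hypothetical historical arrival times at one timepoint (minutes)
arrival_samples = [8, 9, 10, 12, 16, 20]
probability_at_10 = on_time_probability(arrival_samples, 10)
chosen_time = best_schedule_time(arrival_samples, [9, 10, 11])
```

With these made-up samples, scheduling at minute 9 covers four of the six arrivals, compared with three for minutes 10 and 11, so the search picks 9.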
$v_{i,j}$ and $x_{i,j}$ represent the velocity and position of the $i$-th particle in the
$j$-th dimension. The cognition and social acceleration coefficients are denoted by
$c_1$ and $c_2$, whereas $r_1$ and $r_2$ are random numbers uniformly distributed between
0 and 1. $p_{i,j}$ represents a particle's personal best and $p_{g,j}$ represents the
global best of the population. $w$ acts as an inertia weight factor controlling the
exploration and exploitation of new positions in the search space, and $t$ denotes the
iteration number.
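The update equations that these symbols belong to are not reproduced in this part of the text. As a hedged illustration, the sketch below assumes the standard inertia-weight PSO update, $v_{i,j}(t+1) = w\,v_{i,j}(t) + c_1 r_1 (p_{i,j} - x_{i,j}(t)) + c_2 r_2 (p_{g,j} - x_{i,j}(t))$ followed by $x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1)$; the function name and argument layout are hypothetical.

```python
import random


def pso_step(x, v, p_best, g_best, w, c1, c2, rng=random):
    """One velocity and position update for a single particle.

    x, v:    current position and velocity (lists, one entry per dimension j)
    p_best:  the particle's personal best position
    g_best:  the global best position of the population
    w:       inertia weight; c1, c2: cognition and social coefficients
    """
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # uniform random numbers in [0, 1)
        velocity = (
            w * v[j]
            + c1 * r1 * (p_best[j] - x[j])   # pull towards the personal best
            + c2 * r2 * (g_best[j] - x[j])   # pull towards the global best
        )
        new_v.append(velocity)
        new_x.append(x[j] + velocity)
    return new_x, new_v


# A particle sitting on both bests only keeps its damped momentum
next_x, next_v = pso_step(
    x=[1.0, 2.0], v=[0.5, -1.0],
    p_best=[1.0, 2.0], g_best=[1.0, 2.0],
    w=0.5, c1=2.0, c2=2.0,
)
```

In the usage example both attraction terms vanish, so the new velocity is simply the old one scaled by $w$, which shows how the inertia weight alone carries the particle forward.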
The problem is formulated as a fitness maximization problem in order to find
optimal travel times that improve on-time performance. Hence the personal
best of a particle is updated as follows at the end of each iteration.
$$
p_{i,j}(t+1) =
\begin{cases}
p_{i,j}(t), & \text{if } fitness(x_{i,j}(t+1)) \le fitness(p_{i,j}(t)) \\
x_{i,j}(t+1), & \text{if } fitness(x_{i,j}(t+1)) > fitness(p_{i,j}(t))
\end{cases}
\qquad (10)
$$
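A minimal sketch of the personal-best rule in Eq. (10) follows. It assumes that fitness is evaluated on the whole position vector of a particle and that ties keep the previous personal best; the quadratic fitness used in the example is a made-up stand-in for on-time performance.

```python
def update_personal_best(p_best, x_new, fitness):
    """Return the personal best after an iteration (maximization, Eq. (10)).

    The new position replaces the personal best only when its fitness is
    strictly greater; otherwise the previous personal best is kept.
    """
    if fitness(x_new) > fitness(p_best):
        return list(x_new)
    return list(p_best)


def toy_fitness(position):
    """Hypothetical fitness with its maximum at (3, 3); not from the paper."""
    return -sum((value - 3) ** 2 for value in position)


improved = update_personal_best([0, 0], [2, 3], toy_fitness)   # better position
unchanged = update_personal_best([2, 3], [0, 0], toy_fitness)  # worse position
tie_kept = update_personal_best([2, 3], [4, 3], toy_fitness)   # equal fitness
```

Keeping the old position on ties matches the "less than or equal" branch of the equation.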
Termination. The termination condition set for PSO is a predefined maximum
number of iterations. Since the optimized on-time performance is different
for each trip, the termination condition is not set as any predefined upper limit
of the fitness value. With the other hyperparameters fixed, PSO can produce the
optimal solution in approximately 30 iterations for this problem.
The pseudocode for PSO is given in Algorithm 3. Historical timepoint
datasets are used to run the particle swarm optimization algorithm for this
problem. The input includes the on-time range, the maximum number of iterations,
the number of particles in the population, the inertia factor, the cognition and
social acceleration coefficients, the bus trip and the upper limit on the number of
month clusters.
Fig. 4. The chart shows the simulation results of on-time performance and execution
times for GA and PSO to run 10 times.
Fig. 5. The original on-time performance and the optimized on-time performance using
greedy algorithm, genetic algorithm with/without clustering analysis and PSO algo-
rithm.
timepoint stops and 5 segments between the 6 timepoint stops. The bus trips
with direction from Downtown are selected. The goal is to maximize the on-time
performance for these trips by optimizing the schedule time at the 6 timepoint
stops.
Figure 6(b) shows the simulation results for different population sizes.
Increasing the population size from 10 to 90 results in better on-time
performance; increasing the size even further, however, does not improve the
on-time performance. On the other hand, the total time increases linearly as
the population size grows. So a population size of around 90 is the best
choice.
Figure 6(c) illustrates the results of using different crossover rates. The optimized
on-time performance remains almost the same across the crossover range, but there
is a significant difference in the total execution time. The crossover rate
impacts the exploitation ability. A proper crossover rate in the middle of the
range speeds up convergence to an optimal point.
Figure 6(d) shows the simulation results when using different mutation rates.
The total execution time is small when the mutation rate is either very small or
very large. The mutation rate controls the exploration ability. During the
optimization, a small mutation rate ensures that the best individuals in a
population do not vary too much in the next iteration, so the search stabilizes
faster around the optimal points. We therefore suggest setting a very small
mutation rate when running the proposed algorithm.
Fig. 6. (a) Timepoints on bus route 5 in Nashville [23]; (b) the on-time performance
and overall execution time for different population sizes in GA; (c) the on-time
performance and overall execution time for different crossover rates, which control
the exploitation ability of the GA; (d) the on-time performance and overall execution
time for different mutation rates, which control the exploration ability of the GA;
(e) the on-time performance and overall execution time for different inertia weights,
which govern the exploration of new regions of the search space in PSO; (f) the
on-time performance and overall execution time for different cognition acceleration
coefficients c1 in PSO; (g) the on-time performance and overall execution time for
different social acceleration coefficients c2 in PSO; (h) the on-time performance
and overall execution time for different population sizes in PSO.
increase in execution time. So an optimal value to choose for c1 will be within
the range specified.
Figure 6(g) shows the simulation results for varying the social acceleration
coefficient c2. The particle has a velocity component towards the global best
position weighted by c2, hence the term social. The optimized on-time performance
improves, with less execution time, when c2 is equal to 5. A value of 4 also
produces good results, but with an increase in execution time. Overall, however,
c2 affects the on-time performance only within a range of two percent. With all
other hyperparameters fixed, PSO can sometimes produce optimal or near-optimal
performance regardless of one particular hyperparameter, which is the case
considered here. So an optimal value for c2 may be close to 5, maintaining a
ratio of approximately 1:1:1 among w, c1 and c2.
Figure 6(h) shows the simulation results for varying the number of particles,
i.e. the population size. The optimized on-time performance is maximized when the
number of particles reaches 30. The execution time increases with the number of
particles, so it is better to choose the number of particles that gives the best
trade-off between accuracy and execution time. The population size can therefore
be chosen as 30 in this case, as it yields equally good results with a relatively
small execution time.
Although this sensitivity analysis gives good insight into the choice of
hyperparameters, other hyperparameter values may produce better results on
specific routes.
7 Conclusion
Acknowledgments
This work is supported by the National Science Foundation under award
numbers CNS-1528799, CNS-1647015 and 1818901, and by a TIPS grant from
Vanderbilt University. We acknowledge the support provided by our partners
from the Nashville Metropolitan Transit Authority.
References
1. Arhin, S.A., Noel, E.C., Dairo, O.: Bus stop on-time arrival performance and cri-
teria in a dense urban area. International Journal of Traffic and Transportation
Engineering 3(6), 233–238 (2014)
2. Association, A.P.T.: Ridership report archives (2017)
3. Banks, A., Vincent, J., Anyakoha, C.: A review of particle swarm optimization.
part ii: hybridisation, combinatorial, multicriteria and constrained optimization,
and indicative applications. Natural Computing 7(1), 109–124 (Mar 2008).
https://doi.org/10.1007/s11047-007-9050-z
4. Benn, H.: Bus route evaluation standards, transit cooperative research program,
synthesis of transit practice 10. Transportation Research Board, Washington, DC
(1995)
5. Bouyer, A., Hatamlou, A.: An efficient hybrid clustering method
based on improved cuckoo optimization and modified particle
swarm optimization algorithms. Applied Soft Computing 67, 172–182 (2018).
https://doi.org/10.1016/j.asoc.2018.03.011,
http://www.sciencedirect.com/science/article/pii/S1568494618301273
6. Chakroborty, P.: Genetic algorithms for optimal urban transit network design.
Computer-Aided Civil and Infrastructure Engineering 18(3), 184–200 (2003)
7. Chakroborty, P., Deb, K., Subrahmanyam, P.: Optimal scheduling of urban transit
systems using genetic algorithms. Journal of transportation Engineering 121(6),
544–553 (1995)
8. Dhabal, S., Sengupta, S.: Efficient design of high pass fir filter using
quantum-behaved particle swarm optimization with weighted mean best position. In:
Proceedings of the 2015 Third International Conference on Computer,
Communication, Control and Information Technology (C3IT). pp. 1–6 (Feb 2015).
https://doi.org/10.1109/C3IT.2015.7060145
9. Eranki, A.: A model to create bus timetables to attain maximum synchronization
considering waiting times at transfer stops (2004)
10. Fan, W., Machemehl, R.B.: Optimal transit route network design problem with
variable transit demand: genetic algorithm approach. Journal of transportation
engineering 132(1), 40–51 (2006)
11. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learn-
ing. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edn.
(1989)
12. Hairong, Y., Dayong, L.: Optimal regional bus timetables using improved genetic
algorithm. In: Intelligent Computation Technology and Automation, 2009. ICI-
CTA’09. Second International Conference on. vol. 3, pp. 213–216. IEEE (2009)
13. Ibarra-Rojas, O.J., Rios-Solis, Y.A.: Synchronization of bus timetabling. Trans-
portation Research Part B: Methodological 46(5), 599–614 (2012)
14. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Neural Networks, 1995.
Proceedings., IEEE International Conference on. vol. 4, pp. 1942–1948 vol.4 (Nov
1995). https://doi.org/10.1109/ICNN.1995.488968
15. Khiari, J., Moreira-Matias, L., Cerqueira, V., Cats, O.: Automated setting of bus
schedule coverage using unsupervised machine learning. In: Bailey, J., Khan, L.,
Washio, T., Dobbie, G., Huang, J.Z., Wang, R. (eds.) Advances in Knowledge
Discovery and Data Mining. pp. 552–564. Springer International Publishing, Cham
(2016)
16. Kodinariya, T.M., Makwana, P.R.: Review on determining number of cluster in
k-means clustering. International Journal 1(6), 90–95 (2013)
17. Nayeem, M.A., Rahman, M.K., Rahman, M.S.: Transit network design by genetic
algorithm with elitism. Transportation Research Part C: Emerging Technologies
46, 30–45 (2014)
18. Neff, J., Dickens, M.: 2016 public transportation fact book (2017)
19. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation
of cluster analysis. Journal of computational and applied mathematics 20, 53–65
(1987)
20. Sengupta, S., Basak, S., Peters, R.A.: Data clustering using a hybrid of fuzzy
c-means and quantum-behaved particle swarm optimization. In: 2018 IEEE 8th
Annual Computing and Communication Workshop and Conference (CCWC). pp.
137–142 (Jan 2018). https://doi.org/10.1109/CCWC.2018.8301693
21. Sengupta, S., Basak, S., Peters, R.A.: Particle swarm optimization: A survey of
historical and recent developments with hybridization perspectives. Machine
Learning and Knowledge Extraction 1(1), 157–191 (2018),
http://www.mdpi.com/2504-4990/1/1/10
22. Sengupta, S., Basak, S., Saikia, P., Paul, S., Tsalavoutis, V., Atiah, F., Ravi,
V., Peters, R.A.: A review of deep learning with special emphasis on
architectures, applications and recent trends. CoRR abs/1905.13294 (2019),
http://arxiv.org/abs/1905.13294
23. Sun, F., Samal, C., White, J., Dubey, A.: Unsupervised mechanisms for optimiz-
ing on-time performance of fixed schedule transit vehicles. In: Smart Computing
(SMARTCOMP), 2017 IEEE International Conference on. pp. 1–8. IEEE (2017)
24. Szeto, W.Y., Wu, Y.: A simultaneous bus route design and frequency setting prob-
lem for tin shui wai, hong kong. European Journal of Operational Research 209(2),
141–155 (2011)
25. Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a
data set via the gap statistic. Journal of the Royal Statistical Society: Series B
(Statistical Methodology) 63(2), 411–423 (2001)
26. Ting, C.J., Schonfeld, P.: Schedule coordination in a multiple hub transit network.
Journal of urban planning and development 131(2), 112–124 (2005)
27. Wang, Y., Zhang, D., Hu, L., Yang, Y., Lee, L.H.: A data-driven and optimal bus
scheduling model with time-dependent traffic and demand. IEEE Transactions on
Intelligent Transportation Systems 18(9), 2443–2452 (2017)
28. Wu, Y., Yang, H., Tang, J., Yu, Y.: Multi-objective re-synchronizing of bus
timetable: Model, complexity and solution. Transportation Research Part C:
Emerging Technologies 67, 149–168 (2016)
29. Zhong, S., Zhou, L., Ma, S., Jia, N., Zhang, L., Yao, B.:
The optimization of bus rapid transit route based on an improved
particle swarm optimization. Transportation Letters 10(5),
257–268 (2018). https://doi.org/10.1080/19427867.2016.1258972