Equitable IAM paper
ARTICLE INFO

Keywords: Flood risk management; Deep reinforcement learning; Geographic information system; Social vulnerability; Equitable infrastructure management

ABSTRACT

Bridges play a critical role in transportation networks; however, they are vulnerable to deterioration, aging, and degradation, especially in the face of climate change and extreme weather events such as flooding. Furthermore, bridges can significantly affect social vulnerability; their damage or destruction can isolate communities, inhibit emergency responses, and disrupt essential services. Maintaining critical bridges in a cost-effective and sustainable manner is crucial to ensure their longevity and protect vulnerable communities. To address the maintenance optimization problem of bridge systems considering the effects of time deterioration, flood degradation, and social vulnerability, this study proposes a deep reinforcement learning algorithm to optimally allocate resources to bridges that are at risk of failure due to scour. The algorithm considers the effects of flood degradation with different return periods and is trained using a Markov Decision Process as the environment. The study conducts four flood simulation scenarios using Geographic Information System data. The findings suggest that the deep reinforcement learning algorithm proposes a sequence of repair actions that outperforms the status quo currently employed by bridge managers. The significance of this study lies in its valuable insights for cities worldwide on how to effectively optimize their limited resources for the maintenance and rehabilitation of critical infrastructure systems to decrease portfolio cost and increase social equity.
1. Introduction

Bridges are vital components of the transportation network, acting as essential conduits for societal commuting needs. Maintaining the safety and functionality of these bridge systems is vital for community well-being and interconnectedness (Zhang, Agbelie, & Labi, 2015). In the United States, the Federal Highway Administration (FHWA) has reported that over 46,000 bridges, approximately 7.5% of all US bridges, are structurally deficient, necessitating urgent repair or replacement (FHWA, 2020). The situation of aged infrastructure is exacerbated by increasing threats from climate change and environmental extremes, heightening bridges' vulnerability to deterioration (Habibzadeh-Bigdarvish, Yu, Lei, Li, & Puppala, 2019; Ilbeigi & Meimand, 2020; Liu & El-Gohary, 2022; Manafpour et al., 2018; Sinha, Labi, & Agbelie, 2017) and potential failure (Mohammadi & El-Diraby, 2021; Xiong, Cai, Zhang, Shi, & Xu, 2023; Yang & Frangopol, 2018; Zhang, Liu, Liu, Lan, & Yang, 2022).

Current bridge management practices typically focus on individual bridges rather than considering the entire bridge portfolio in a region with a systematic approach. This can lead to suboptimal resource allocation, increased maintenance costs, and neglect of system-wide needs and vital social objectives. While several research efforts (Tao, Lin, & Wang, 2021; Zhang & Alipour, 2020; Zhang & Wang, 2017) have ventured into developing systematic and optimized bridge maintenance strategies, notable gaps remain, particularly in addressing flood-induced damage and integrating social equity considerations into maintenance prioritization.

Disruptions in bridge functionality and the associated costs of failure significantly impact surrounding communities, particularly disadvantaged ones. These communities often rely heavily on public transportation and local infrastructure, including bridges, to access essential services due to limited mobility options, financial constraints, and geographic segregation (Koks, Jongman, Husby, & Botzen, 2015; Tuccillo & Spielman, 2022; Xu & Guo, 2022). As such, incorporating social equity into bridge maintenance prioritization is crucial to mitigate the disproportionate effects of bridge failures on vulnerable populations
* Corresponding author at: W171, Kingsbury Hall, 33 Academic way, Durham, New Hampshire 03824, USA.
E-mail address: [email protected] (F. Han).
https://doi.org/10.1016/j.scs.2024.105792
Received 22 March 2024; Received in revised form 12 July 2024; Accepted 29 August 2024
Available online 30 August 2024
2210-6707/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
A. Taherkhani et al. Sustainable Cities and Society 114 (2024) 105792
(Anderson, Kiddle, & Logan, 2022; Plough et al., 2013; Yang, Ng, Xu, & Skitmore, 2018). Despite its importance, few studies have incorporated social vulnerability as a metric in their decision-making frameworks. Karakoc, Barker, Zobel, and Almoghathawi (2020) pioneered the integration of seven social vulnerability metrics into the repair planning of disrupted interdependent infrastructure systems during a single earthquake event. In the realm of flood impact on socially vulnerable communities, Chang et al. (2021) developed an interconnected social-ecological-technological framework to assess urban flood vulnerabilities, using five social vulnerability metrics alongside ecological and technological metrics to create vulnerability maps for six major U.S. cities. Additionally, Chen et al. (2021) conceptualized social vulnerability to flooding through three dimensions: exposure, sensitivity, and adaptability. Their study found that affordable housing communities are less vulnerable to flooding's adverse impacts. Overall, while some studies have focused on integrating social vulnerability and infrastructure maintenance planning, a significant gap remains for further quantitative research in this area. Addressing these challenges requires a holistic and systematic maintenance strategy that accounts for infrastructure aging, natural hazards, cost optimization, and social equity over long-term planning horizons (Abdelkader, Moselhi, Marzouk, & Zayed, 2022; Bibri & Krogstie, 2017; Jaafaru & Agbelie, 2022).

Recent advancements in deep reinforcement learning (DRL) offer promising avenues for addressing these complex challenges. DRL algorithms excel at solving complex Markov Decision Processes (MDPs), effectively simulating real-world scenarios. As such, reinforcement learning has been employed to solve optimization problems in many sectors such as urban energy management (Kang, Jung, Jeoung, Hong, & Hong, 2023; Li & Zhou, 2023; Sepehrzad et al., 2024; Yadollahi, Gharibi, Dashti, & Jahromi, 2024), cost management (Deng et al., 2023), and autonomous driving (Khalid et al., 2023). In the context of bridge system asset management, MDPs model the dynamic progression of bridge conditions (States) influenced by decision-making (Actions, such as the timing of repairs or replacements) and external factors (environmental impacts like aging and flood damages). This framework allows for a comprehensive understanding and optimization of maintenance strategies. Table 1 summarizes the latest progress in the development of optimized maintenance strategies using variations of DRL, both for individual bridges and bridge systems. These works provide stakeholders and decision makers with powerful tools for improving bridge conditions while reducing maintenance expenditures.

Despite the significant advancement in these pioneering works, two notable gaps remain to be resolved: 1) the spatial consideration of flood-induced damages to a bridge system in maintenance strategy development; and 2) the integration of social equity considerations to ensure the development of infrastructure solutions that are just and equitable for all communities. This paper proposes a DRL-based optimization framework tailored to bridge systems in flood-prone areas, integrating considerations of aging, flood damage, economic costs, and social equity. Through this research, we aim to provide stakeholders with a robust tool for enhancing bridge resilience while ensuring fair and equitable infrastructure development for all communities.

Novelty of this work: 1) Flood Damage Consideration: the work specifically addresses the gap in current bridge management strategies by considering spatial analysis of flood-induced damages. This includes modeling and anticipating the effects of flooding on bridge integrity and determining proactive maintenance strategies to mitigate these risks. 2) Social Equity Integration: another novel aspect is the integration of social equity considerations into the maintenance prioritization process. This ensures that the developed strategies not only enhance the resilience of bridge infrastructure but also do so in a manner that is just and equitable, minimizing the disproportionate impact of bridge load limits, bridge closures, and bridge failures on vulnerable communities. 3) Comprehensive Framework: the proposed DRL-based framework systematically considers the dynamic interactions between bridge management decisions and decision outcomes, including economic costs, bridge conditions due to aging and flood damages, and social impact. The framework is adaptable to future management scenarios and climate conditions.

2. Testbed description

In this study, we use Suffolk County (Boston area), MA, as the testbed area to develop the DRL-based decision support framework for bridge maintenance. Suffolk County is located in Massachusetts with a population of 800,000. The demography of Suffolk County is as follows: 38.8% of the people are White, 25% are Black, 22.9% are Hispanic, 9.1% are Asian, and the rest have two or more races. 24.4% of the households have children under 18 years old, 10.2% of residents are older than 65, and 20.6% of the people are below the poverty line (CDC/ATSDR, 2022). The county has experienced an increase in both stormwater and coastal flooding, largely attributed to climate change.
Table 1
Recent advancements in optimized bridge maintenance strategy development using RL.

- Wei, Bao, and Li (2020). Problem: maintenance for a single bridge deck deterioration; Algorithm: DQN; State-space: 7 structural components and age; Action-space: Do-Nothing, minor repair, major repair, and replacement; Reward: action cost and expected cost of failure.
- Cheng and Frangopol (2021). Problem: optimizing load rating planning of a single bridge girder; Algorithm: DQN; State-space: the number of load rating factor intervals; Action-space: 30 interval durations over 30 years; Reward: action cost and expected life-cycle cost.
- Zhou, Yuan, Yang, and Zhang (2022). Problem: maintenance for a single bridge deck deterioration; Algorithm: multi-agent DQN; State-space: 7 structural components and age; Action-space: Do-Nothing, preventive scenario, corrective scenario, and rebuild; Reward: action cost.
- Yang (2022). Problem: maintenance for a bridge network deterioration and expected cost of failure; Algorithm: PPO; State-space: 5 overall quality conditions for bridges of deterioration; Action-space: Do-Nothing, maintenance, repair, rehabilitation, and replacement; Reward: action cost and expected cost of failure.
- Yang (2022a). Problem: maintenance for a single steel girder bridge; Algorithm: DCMA2C; State-space: 1 reliability index; Action-space: Do-Nothing, inspect, and maintenance; Reward: action cost.
- Du and Ghavidel (2022). Problem: maintenance for a bridge system; Algorithm: DQN; State-space: 3 structural elements with 7 indexes based on NBI; Action-space: Do-Nothing, minor repair, major repair, and replacement; Reward: action cost and closure cost due to actions.
- Xu and Guo (2022). Problem: maintenance of a single bridge; Algorithm: Q-learning; State-space: 1 condition state; Action-space: Do-Nothing, repair, and replacement; Reward: carbon emission.
- Asghari, Biglari, and Hsu (2023). Problem: maintenance for a single bridge; Algorithm: DQN and A2C; State-space: 9 states including structural and managerial features; Action-space: Do-Nothing, maintenance, rehabilitation, and reconstruction; Reward: a function of utility, agency cost, and user cost.
- Hamida and Goulet (2023). Problem: maintenance of a bridge system; Algorithm: A2C and DQN; State-space: variable structural components; Action-space: Do-Nothing, routine maintenance, preventive maintenance, repair, and replace; Reward: action cost.

Note: DQN = Deep Q-Network; PPO = Proximal Policy Optimization; DCMA2C = Deep Centralized Multi-Agent Actor-Critic; A2C = Advantage Actor-Critic.
This increase is driven by more frequent and intense wet weather events, as well as rising sea levels affecting the coastal community (Dahl, Fitzpatrick, & Spanger-Siegfried, 2017; Kirshen, Knee, & Ruth, 2008; Ray & Foster, 2016). The National Bridge Inventory (NBI) database (NBI, 2020) provides detailed physical and functional attributes of the bridges in the study area, from which we identified 451 highway bridges in the initial analysis. From there, we further identified the bridges that are most vulnerable to flooding, using the HAZUS tool, to be the focus of our study. HAZUS is a tool developed by the Federal Emergency Management Agency (FEMA) to assist in the assessment of potential losses from natural disasters such as hurricanes, earthquakes, and floods (Scawthorn et al., 2006). HAZUS has undergone extensive validation and testing to ensure its reliability and accuracy in simulating flood scenarios, and numerous studies have successfully employed and validated HAZUS for flood modeling and risk assessment (Ding, White, Ullman, & Fashokun, 2008; Nastev & Todorov, 2013; Ajorlou, Mousavi, Ghayoomi, & Dave, 2024; Tate, Muñoz, & Suchan, 2015). However, it is crucial to note that the accuracy of the HAZUS framework heavily depends on the quality and availability of local data to precisely model recent changes in the built environment (Scawthorn et al., 2006), such as up-to-date digital elevation models (DEMs). In our study, we utilized the 2020 DEM of the state of Massachusetts. Four riverine flooding scenarios were modeled with return periods of 10, 50, 100, and 500 years. From these simulations through HAZUS, we can estimate the water level near each bridge during flooding and further calculate the flood depth. Fig. 1 shows the results of the simulations for the four riverine flooding scenarios, revealing that stronger floods lead to wider inundation that impacts more bridges.

Based on the flooding simulation results, there are 19 bridges that experience inundation in all four flooding scenarios, and these bridges

Fig. 1. Flood level estimated from HAZUS flooding simulation for Suffolk County, MA: (a) water level (measured from the ground level) during flood with 10-year return period; (b) water level during flood with 50-year return period; (c) water level during flood with 100-year return period; (d) water level during flood with 500-year return period.
3. Methods

Table 2
Basic information and scour vulnerabilities for critical bridges in Suffolk County, MA. Columns: ID; bridge name; average daily traffic (ADT); number of spans; bridge type; current scour rating; bridge scour ratings after flooding with return periods of 10, 50, 100, and 500 years; normalized social impact index (SII).
Note: Data used in this table are from CDC/ATSDR (2022), NBI (2020), and Scawthorn et al. (2006).
Elevation Model (DEM) of the area and flood simulation results obtained from HAZUS. The Manning coefficient n in this study was assumed to be 0.05, which is typical for a riverbed with light brush and weed vegetation (Arcement & Schneider, 1989). The scour vulnerability for each bridge corresponding to different flood scenarios is summarized in Table 2.

In addition to flood-induced degradation, bridges naturally deteriorate over time. We used the deterioration function proposed by Agrawal, Kawaguchi, and Chen (2010) to capture the time deterioration of scour ratings r over time T:

r = 9 − 0.0464T − 0.0002T^2   (Eq. 4)

3.2. Social Vulnerability Impact Analysis

where I is the set of social vulnerability factors (I = {aged 65 or older, below 150% poverty, unemployed, …}), and |I| is the size of set I.

3. Finally, we calculate the Social Impact Index SII_b for each bridge b. SII_b is defined as the weighted average of the normalized social vulnerabilities SV_j^n for each census tract j, with the weights determined by the distance d_{b,j} between bridge b and the centroid of census tract j to account for the vicinity of the bridge to each influenced census tract:

SII_b = (1 / |J|) · Σ_{j∈J} [ SV_j^n / e^{d_{b,j} / r_eq} ]   (Eq. 7)

where r_eq is the equivalent radius of the study area (r_eq = √(Total Area / π) = 6.39 km), J is the set of all census tracts, and |J| is the size of set J.

The Social Impact Index SII_b captures the impact of bridge b on its surrounding socially vulnerable communities. For numerical stability, we normalize SII_b to the 0–1 range:

SII_b^n = SII_b / max_{b∈B}(SII_b)   (Eq. 8)

CF_{b,mod} = (SII_b^n)^α · CF_b^{1−α}   (Eq. 10)

where CF_{b,mod} is the modified cost of failure for bridge b, and α is the vulnerability influence factor that varies between 0 and 1. When α is set to 0, CF_{b,mod} reverts to the original cost of failure CF_b, disregarding the social impact of potential bridge failure. On the other hand, setting α = 1 represents full consideration of the bridge's social impact. We consider five values of α (α = 0, 0.25, 0.50, 0.75, and 1) in this study.
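As a sketch, the deterioration model (Eq. 4) and the social impact computation (Eqs. 7, 8, and 10) can be expressed in a few lines of Python. The census-tract vulnerabilities, distances, and cost of failure below are hypothetical values for illustration only, not the paper's dataset:

```python
import math

R_EQ = 6.39  # equivalent radius of the study area in km: sqrt(Total Area / pi)

def scour_rating_after(T):
    """Time deterioration of the scour rating over T years (Eq. 4)."""
    return 9 - 0.0464 * T - 0.0002 * T ** 2

def social_impact_index(sv_n, dist_km, r_eq=R_EQ):
    """Distance-weighted average of normalized tract vulnerabilities (Eq. 7)."""
    assert len(sv_n) == len(dist_km)
    return sum(sv / math.exp(d / r_eq) for sv, d in zip(sv_n, dist_km)) / len(sv_n)

def normalize(values):
    """Scale SII values to the 0-1 range (Eq. 8)."""
    m = max(values)
    return [v / m for v in values]

def modified_cost_of_failure(sii_n, cf, alpha):
    """Blend the normalized social impact into the cost of failure (Eq. 10)."""
    return (sii_n ** alpha) * (cf ** (1 - alpha))

# Illustrative example: three census tracts, two bridges (all values hypothetical)
sv_n = [0.8, 0.4, 0.6]                       # normalized vulnerabilities SV_j^n
sii = [social_impact_index(sv_n, d)
       for d in ([1.0, 3.5, 7.2], [0.5, 2.0, 9.0])]
sii_n = normalize(sii)                       # Eq. 8
cf_mod = [modified_cost_of_failure(s, cf=2.0e7, alpha=0.5) for s in sii_n]
```

Note that, as in the paper, setting alpha=0 returns the unmodified cost of failure, so the purely economic optimization is recovered as a special case.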
Fig. 3. Bridge locations, SII, cost of failure, and social vulnerability of census tracts in Suffolk County, MA: a) bridges' SII; b) physical cost of failure of bridges.
In any given year, different flood scenarios occur with corresponding probabilities (e.g., a 50-year return period flood has a probability of occurrence of 1/50 = 2%), leading to potential bridge failure. The expected cost of failure C_f for the entire system of bridges ($) for that year is:

C_f = Σ_{b∈B} Σ_{fld∈F} CF_{b,mod} · P(failure | fld, r_b) · P(fld)   (Eq. 11)

where B is the set of all bridges being considered; CF_{b,mod} is the modified cost of failure for bridge b, obtained from Eq. 10; P(failure | fld, r_b) is the conditional probability of failure for bridge b given a specific flood scenario fld and the scour rating r_b of bridge b, obtained from Table 3 (adapted from Pearson, Stein, and Jones (2002)); F is the set of flood scenarios; and P(fld) is the probability of occurrence of a flood with a specific return period. For example, P(fld) = 0.02 for a flood with a return period of 50 years. It is important to note that as the bridges' ratings are affected by both time deterioration and flood-induced degradation, the expected cost of failure C_f changes over time.

In addition to the expected cost of failure, we also considered the maintenance cost associated with bridge repair, including two main components: (1) the direct cost of the bridge maintenance, and (2) the potential reduction in the expected cost of failure due to the improved bridge condition and extended bridge lifespan (Eq. 4). The direct cost of maintenance C_rb is calculated based on the bridge's scour rating before the maintenance. This cost represents the immediate investment to restore the bridge's condition. Precise data regarding the scour repair costs for bridges are notably absent in existing literature. To estimate the repair costs in developing the optimization framework, we assume the repair cost C_rb for bridge b to be a fraction of the cost of failure CF_b (Eq. 9), where the fraction η depends on the current bridge scour rating, as summarized in Table 4. The reduction in the expected future cost of failure is captured through the bridge's probability of failure, which decreases with increased bridge condition resulting from bridge
Table 3
Annual probability of failure for bridges under different flood scenarios depending on the NBI scour rating (Mclemore et al., 2010).

Scour rating | 500-year return period | 100-year return period | 50-year return period | 10-year return period
0 | 1 | 1 | 1 | 1
1 | 0.01 | 0.01 | 0.01 | 0.01
2 | 0.005 | 0.006 | 0.008 | 0.009
3 | 0.0011 | 0.0013 | 0.0016 | 0.002
4 | 0.0004 | 0.0005 | 0.0006 | 0.0007
5 | 0.000007 | 0.000008 | 0.00004 | 0.00007
6 | 0.00018 | 0.00025 | 0.0004 | 0.0005
7 | 0.00018 | 0.00025 | 0.0004 | 0.0005
8 | 0.000004 | 0.000005 | 0.00002 | 0.00004
9 | 0.0000025 | 0.000003 | 0.000004 | 0.000007

Table 4
Fraction of cost of failure (η) for repair actions depending on the bridge scour rating.

Scour rating | η
0 | 0.000444
1 | 0.000250
2 | 0.000160
3 | 0.000111
4 | 0.000082
5 | 0.000062
6 | 0.000049
7 | 0
8 | 0
9 | 0
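Eqs. 11 and 12 can be evaluated directly from the lookup values in Tables 3 and 4, with P(fld) = 1/T for a return period of T years. The following sketch encodes the two tables and the resulting cost computations; the bridge ratings and failure costs used in any example call are hypothetical:

```python
# Annual probability of failure P(failure | fld, r) from Table 3,
# keyed by flood return period T (years), indexed by scour rating 0-9.
P_FAIL = {
    500: [1, 0.01, 0.005, 0.0011, 0.0004, 0.000007, 0.00018, 0.00018, 0.000004, 0.0000025],
    100: [1, 0.01, 0.006, 0.0013, 0.0005, 0.000008, 0.00025, 0.00025, 0.000005, 0.000003],
    50:  [1, 0.01, 0.008, 0.0016, 0.0006, 0.00004, 0.0004, 0.0004, 0.00002, 0.000004],
    10:  [1, 0.01, 0.009, 0.002, 0.0007, 0.00007, 0.0005, 0.0005, 0.00004, 0.000007],
}

# Repair cost fraction eta from Table 4, indexed by scour rating 0-9.
ETA = [0.000444, 0.000250, 0.000160, 0.000111, 0.000082,
       0.000062, 0.000049, 0.0, 0.0, 0.0]

def expected_cost_of_failure(ratings, cf_mod):
    """Expected annual system cost of failure (Eq. 11): sum over bridges and
    flood scenarios of CF_mod * P(failure | fld, r) * P(fld), with P(fld) = 1/T."""
    return sum(
        cf * P_FAIL[T][r] * (1.0 / T)
        for r, cf in zip(ratings, cf_mod)
        for T in P_FAIL
    )

def repair_cost(rating, cf):
    """Direct maintenance cost as a fraction eta of the cost of failure (Eq. 12)."""
    return ETA[rating] * cf
```

Consistent with Table 4, repairing a bridge already rated 7 or higher carries zero modeled repair cost, which is why the reward design in Section 3.5 must explicitly penalize such repairs.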
maintenance.

C_rb = η × CF_b   (Eq. 12)

Moreover, the reduction in the expected cost of failure is captured through the bridge's probability of failure, where the probability of failure of a maintained bridge is reduced according to Table 3.

A larger value of P_SII-weighted suggests a management outcome in which bridges with greater SII (greater social impact) end up with a greater probability of failure. On the other hand, a lower P_SII-weighted suggests that priority has been given to bridges with larger social impact (i.e., larger SII). By associating the probability of failure with the social impact index (SII), this metric provides insights into the potential societal impacts of bridge portfolio management.

3.5. Deep reinforcement learning (DRL) modeling

Framework: We use Deep Reinforcement Learning (DRL) to develop a maintenance strategy aimed at minimizing the economic costs associated with potential bridge failure and bridge maintenance, while promoting social equity within neighboring communities, over a 20-year span. Fig. 4 shows the structure and workflow of the DRL model framework, in which the artificial intelligence (AI) agent interacts with its Environment, which is modeled as a Markov Decision Process (MDP), as illustrated in Fig. 5. At each time step (we consider a one-year time increment), the AI agent can choose to repair one bridge, two bridges, or Do-Nothing (i.e., not repairing any bridge). Based on the maintenance decision, the state of the environment (i.e., bridge ratings) will be updated (e.g., improvement by maintenance, flood-induced degradation, time deterioration), and the AI agent will receive a feedback reward from the Environment according to the objective function defined by Eq. 16. Based on the updated Environment state and the reward received from the Environment, the AI agent makes a new maintenance decision for the next time step. This feedback loop iterates until the end of the 20-year time horizon, concluding one learning cycle. This learning cycle is repeated, as the AI agent continuously refines its decision strategy with the objective of maximizing the cumulative rewards over 20 years, until the model converges.

Subject to:

r_{b,t} = r_{b,t−1} − det(r_{b,t−1}) − deg(r_{b,t−1}, fld)   if x_{b,t−1} = 0
r_{b,t} = 7   if x_{b,t−1} = 1   (Eq. 17)

Σ_{b∈B} x_{b,t} ≤ 2   (Eq. 18)

∀b ∈ B = {1, …, 19}, t ∈ T = {1, …, 20}, ∀f ∈ F = {10, 50, 100, 500}, x_{b,t} ∈ {0, 1}, r_{b,t} ∈ {0, 1, …, 9}

where γ is the discount factor, 0.99; CF_{b,mod} is the modified cost of failure for bridge b (Eq. 10); C_rb is the cost associated with repairing bridge b with condition rating r and CF_{b,mod}; r_{b,t} is an integer variable representing the scour rating of bridge b in year t; x_{b,t} is a binary decision variable that takes value 1 if bridge b is repaired in year t, and 0 otherwise; C_f(fld, r_{b,t}) is the expected cost of flooding for bridge b with scour rating r in flood scenario fld at time t; det(r_{b,t}) is the deterioration rate of bridge b with scour rating r at time t; and deg(r_{b,t}, fld) is the expected flood-related degradation for bridge b at time t in flood scenario fld.

In a reinforcement learning (RL) problem, the environment is the system in which an agent operates and learns to make optimized decisions. The emulator of the environment is typically modeled as a Markov Decision Process (MDP). An MDP is a framework used to model decision-making problems in situations where outcomes depend on both the actions taken by a decision-maker and the state of the agent in its environment. The outcome of a decision at each time step is determined by the current state and the chosen action. An MDP is formally defined by a tuple M = (S, A, T, R), consisting of four main components:

States: S = {s_1, s_2, …, s_t}, where s_t denotes the state of the system at time t. In this study, s_t is represented by a vector of 19 elements, each corresponding to the rating of one of the 19 bridges.
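To make the MDP concrete, the sketch below implements a toy version of this environment: the state is the vector of 19 scour ratings, at most two bridges may be repaired per year (Eq. 18), repair restores a rating of 7 (Eq. 17), and the reward applies the large penalty for repairing a bridge already rated 7 or above (Eq. 20, defined in the next section). The degradation probabilities and cost functions are illustrative stand-ins, not the paper's calibrated values:

```python
import random

class BridgeMaintenanceEnv:
    """Toy sketch of the bridge-maintenance MDP (Fig. 5). Degradation
    probabilities and cost functions here are illustrative assumptions."""

    PENALTY = -1_000_000     # large negative reward, second branch of Eq. 20
    RESTORED_RATING = 7      # maintenance restores the scour rating to 7
    HORIZON = 20             # planning horizon in years

    def __init__(self, n_bridges=19, seed=0):
        self.n = n_bridges
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.t = 0
        self.ratings = [9] * self.n  # assume all bridges start at rating 9
        return list(self.ratings)

    def _expected_failure_cost(self):
        # Stand-in for Eq. 11: worse ratings imply a higher expected cost.
        return float(sum((9 - r) ** 2 for r in self.ratings))

    def step(self, repairs):
        """repairs: list of bridge indices to repair this year (at most two)."""
        assert len(repairs) <= 2  # annual repair budget (Eq. 18)
        if any(self.ratings[b] >= self.RESTORED_RATING for b in repairs):
            reward = self.PENALTY  # discourage repairing bridges rated >= 7
        else:
            repair_cost = 10.0 * len(repairs)  # stand-in for Eq. 12
            reward = -self._expected_failure_cost() - repair_cost
        for b in range(self.n):  # transition (Eq. 17)
            if b in repairs:
                self.ratings[b] = self.RESTORED_RATING
            else:
                det = 1 if self.rng.random() < 0.3 else 0  # time deterioration
                deg = 1 if self.rng.random() < 0.1 else 0  # flood degradation
                self.ratings[b] = max(0, self.ratings[b] - det - deg)
        self.t += 1
        done = self.t >= self.HORIZON
        return list(self.ratings), reward, done
```

An off-the-shelf PPO implementation (e.g., from a library such as Stable-Baselines3, wrapped in the Gymnasium interface) could then be trained against this kind of environment, matching the PPO agent used in the study.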
Fig. 5. Markov Decision Process (MDP) for sequential decision-making problem of bridge system maintenance.
Actions: A = {a_1, a_2, …, a_t}, where a_t denotes the action taken at time t, which is represented as a vector of 2 elements, each corresponding to one action. Under each state s, there are 20 options for each action: pick one of the 19 bridges to maintain, or Do-Nothing.

Transition function: The transition function T defines the dynamics of the system. When the agent takes action a_t under state s_t at time t, the new state s_{t+1} of the system at time t+1 can be obtained using the transition function T:

s_{t+1} = T(a_t, s_t)   (Eq. 19)

In this study, the transition function is obtained from the flood-induced degradation, time deterioration, and the maintenance decision.

Reward function: After the agent takes an action at time t, it receives a reward R_{t+1} from the environment as feedback, depending on the current system state s_t and the action a_t. The reward in an RL framework must be designed to reflect the objective function of the underlying optimization problem. The reward mechanism guides the agent towards learning an optimal policy by incentivizing actions that lead to desirable outcomes. In this study, R_{t+1} is defined as the negative sum of the expected cost of failure C_f(s_t) evaluated under state s_t (Eq. 11) and the repairing cost C_r(a_t, s_t) (Eq. 12):

R_{t+1} = −C_f(s_t) − C_r(a_t, s_t)   if the chosen bridge has rating < 7
R_{t+1} = −10^6   otherwise   (Eq. 20)

The second line of the equation is introduced to discourage the agent from repairing a bridge with a rating ≥ 7 by penalizing the agent with a large negative reward (−10^6). This is done because when a bridge undergoes maintenance, its scour rating is restored to a value of 7, as per the National Bridge Inventory standards. This implies that the bridge's robustness to scour-related failures is improved, thereby extending its expected lifespan and reducing its expected failure probability. Thus, there is no benefit to maintaining a bridge with a rating ≥ 7. Fig. 5 shows the MDP developed for this study.

4. Results

4.1. Optimization considering only economic cost

In RL, the learning curve graphically illustrates the evolution of the RL agent's performance, showcasing how its decision-making strategy progressively improves with respect to the cumulative rewards (or costs) through iterations. Fig. 6 shows the learning curve obtained from Proximal Policy Optimization (PPO). We only consider the economic cost in the objective function in this section (α = 0 in Eq. 10), and the results with incorporation of social impact will be presented in Section 0 and Section 5.2. Initially, there is a fast decrease in cost, suggesting that the agent has swiftly learned to avoid repairing bridges with a rating of 7 or higher to avoid the substantial penalties (Eq. 20). After that, the cost continues decreasing at a slower rate as the agent continues to explore and refine more effective decision-making strategies. We selected the strategy learned by the RL agent at the end of 80,000 episodes.

The DRL model's decision to prioritize certain bridges for maintenance over others is driven by several key factors. Firstly, bridges with high initial scour ratings (e.g., above 7) are less likely to be selected for repair, as they are in good condition and have a lower risk of failure. Secondly, the potential economic consequences of a bridge's failure play a significant role in the model's decision-making process. Bridges with higher costs of failure are given greater priority for maintenance to mitigate the financial impact of potential failures. Lastly, the rate of flood-related degradation and time-related deterioration influences the model's choices. Bridges that deteriorate more rapidly are more likely to be selected for maintenance to prevent their condition from reaching critical levels. For instance, bridge 4, which has an initial scour rating of 9, experiences a slow time-related deterioration rate due to its high initial scour rating (according to the quadratic behavior of the time deterioration function in Eq. 4), and its flood-related degradation is not significant (Table 2). Thus, bridge 4 is never selected by the RL agent over the 20-year period. In contrast, bridge 15 is chosen for maintenance
Fig. 6. PPO learning curve for learning with no consideration of social equity (α=0 in Eq. 10). The dark blue curve shows a moving average with a window size
of 200.
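The smoothed curve in a plot like Fig. 6 can be produced with a simple moving average over the per-episode costs; a pure-Python sketch with the caption's window size of 200 (the episode cost series itself would come from training logs):

```python
def moving_average(series, window=200):
    """Running mean over a sliding window, used to smooth a noisy
    per-episode learning curve for plotting."""
    if window > len(series):
        raise ValueError("window larger than series")
    out = []
    running = sum(series[:window])
    out.append(running / window)
    for i in range(window, len(series)):
        # Slide the window: add the new value, drop the oldest one.
        running += series[i] - series[i - window]
        out.append(running / window)
    return out
```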
twice due to its high rate of degradation (Fig. 7) and the substantial potential economic cost if failure ever occurs, thus receiving greater priority from the RL agent in the optimization.

Fig. 7 illustrates the history of the scour rating for four representative bridges as the output of the RL optimization considering only economic cost in the objective function. The RL agent's attitude towards bridges with a lower rate of degradation (i.e., bridge 2 and bridge 4) is significantly different than towards the more vulnerable bridges that are located in highly flood-prone areas (i.e., bridge 15). The high flood degradation rate necessitates repairs to avert potential failures. This observation suggests that the agent's primary objective is to sustain acceptable scour ratings for all bridges throughout the entire horizon, motivated by the understanding that even a single severely deteriorated bridge can significantly elevate the risk profile of the entire portfolio. Additionally, it is important to acknowledge that the agent's decision-making process is influenced not only by the flood degradation rate but also by other critical factors, including initial scour ratings (Table 2). These factors collectively shape the agent's behavior in prioritizing and executing repair actions across the bridge portfolio.

4.2. Model validation with baselines

To assess the performance of reinforcement learning algorithms, it is essential to establish baseline benchmarks. For this purpose, we compare the optimized sequence of maintenance actions generated by PPO with four baseline scenarios: "Do-Nothing," random policy, expert policy, and a policy without consideration of social vulnerability analysis (i.e., no cost adjustment applied).

Baseline 1 - Do-Nothing policy: The Do-Nothing baseline is often used as a fundamental reference for RL model validation. This baseline corresponds to a policy where the agent takes no action regardless of its current state. If the learned policy fails to outperform the Do-Nothing policy, it indicates that the RL algorithm is not effectively learning from the environment. Although in certain real-world scenarios choosing the Do-Nothing strategy may not necessarily be a poor strategy, comparing the RL algorithm against this baseline helps the modeler understand how much value is being gained through active decision-making.

Baseline 2 - random policy: Another commonly used baseline is the random action policy, in which actions are chosen randomly from the available action space. It represents a simple and naïve strategy that lacks any deliberate decision-making or learning mechanism. Despite its simplicity, the random action policy serves as a valuable

The ultimate objective of the expert policy is to maintain the system's overall functionality rather than minimizing the total cost over the planning horizon.

Table 5 shows the costs associated with the proposed PPO policy and the three baselines described above. For baseline 1, the 197% improvement of the proposed PPO policy over the Do-Nothing policy demonstrates the significant effectiveness of the proposed algorithm. Furthermore, a difference of 386% for the proposed PPO over the random policy shows that the algorithm has successfully learned from the rewards and experiences, and that its outcomes are not random. For baseline 3, a 52% improvement highlights that the proposed PPO has discovered more efficient actions than the status quo, by achieving an optimal equilibrium between action cost and the expected cost of failure. Given these findings, it is clear that the proposed algorithm's performance is significant, as it outperforms the established baselines.

Hypothesis test: To further validate the statistical significance of our DRL model's performance, we employed a bootstrapping method to compare it against a large sample of random maintenance policies. By comparing our DRL model's performance against this extensive set of random policies, we can determine, with a high level of confidence, whether our model significantly outperforms a random approach. The hypothesis test for this comparison is as follows:

H0: The proposed DRL model's performance is not significantly better than the performance of random maintenance policies.
H1: The proposed DRL model's performance is significantly better than the performance of random maintenance policies.

Test statistic: Total cost is used as the performance metric for comparing the DRL model against the random policies. Let C_DRL be the total cost achieved by the DRL model and C_Random be the total cost achieved by a random policy.

Significance level: We set the significance level at α = 0.01 (99% confidence level).

Bootstrapping procedure:

1. Generate 10,000 random maintenance policies without the penalty for repairing bridges with a scour rating of 7 or higher.
2. Calculate the total cost (C_Random) for each random policy.
3. Compare the total cost of the DRL model (C_DRL) against the distribution of total costs of the random policies.
4. Calculate the p-value as the proportion of random policies that achieve a lower total cost than the DRL model.
baseline for evaluating the performance of RL algorithms.
Baseline 3- expert policy: The expert policy, also known as the Result: After conducting the bootstrapping procedure with 10,000
status quo, represents the existing policy traditionally employed by random policies, we found that none of the random policies achieved a
the bridge decision-makers. Under this policy, two bridges with the lower total cost than the DRL model. This results in a p-value of 0 (<
minimum ratings at each timestep are selected for maintenance. This 0.01). Thus, we reject the null hypothesis at the 99% confidence level
approach tends to minimize the number of non-functional bridges. and conclude that the proposed DRL model’s performance is signifi
cantly better than the random policy baseline in terms of minimizing the
total cost of maintenance and expected failure costs.
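The bootstrapping procedure described above can be sketched in a few lines. The cost distribution below is a synthetic placeholder, not the study's data; only the p-value logic follows the text:

```python
import numpy as np

def bootstrap_p_value(c_drl, random_costs):
    """p-value = proportion of random policies with a lower total cost than the DRL model."""
    return float(np.mean(np.asarray(random_costs) < c_drl))

rng = np.random.default_rng(0)
# Hypothetical total costs for 10,000 random maintenance policies
random_costs = rng.normal(loc=5.0e6, scale=0.8e6, size=10_000)
c_drl = 2.0e6  # hypothetical DRL total cost, well below the random distribution

p = bootstrap_p_value(c_drl, random_costs)
alpha = 0.01
reject_h0 = p < alpha  # reject H0 at the 99% confidence level
```

With these placeholder numbers the DRL cost sits far in the left tail of the random-policy distribution, so H0 is rejected, mirroring the paper's result.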
Table 5
Model evaluation results on cumulative costs.

Policy | Action cost ($) | Expected cost of failure ($) | Total cost ($) | Cost differences (%)
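The expert (status-quo) baseline of Section 4.2, which repairs the two bridges with the minimum ratings at each timestep, can be sketched as follows. The rating values are illustrative, not the study's data:

```python
def expert_policy(scour_ratings, n_repairs=2):
    """Return the indices of the n_repairs bridges with the lowest scour ratings."""
    order = sorted(range(len(scour_ratings)), key=lambda i: scour_ratings[i])
    return order[:n_repairs]

# Hypothetical scour ratings for five bridges (lower = worse condition)
ratings = [7, 4, 8, 3, 6]
to_repair = expert_policy(ratings)  # bridges with ratings 3 and 4 are selected
```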
A. Taherkhani et al. Sustainable Cities and Society 114 (2024) 105792
Social vulnerability (SV) consideration: Our analysis further demonstrates the superior performance of the proposed DRL algorithm when it incorporates SV considerations (α = 1 in Eq. 10), in comparison to the established baselines. As illustrated in Table 6, the SII-weighted average probability of failure (Eq. 15) reveals a significant improvement with the application of our proposed approach over the do-nothing policy, random policy, and expert policy. According to Table 5 and Table 6, the model's practicality lies not merely in portfolio cost optimization but extends crucially to the enhancement of social vulnerability outcomes. Such results underscore the significance of integrating SV considerations into the algorithmic framework, thereby yielding not only economic but also societal benefits.

Table 6
Model evaluation results on social vulnerability.

Policy | PSII-weighted (Eq. 15) | PSII-weighted differences (%)
PPO (with social vulnerability, i.e., α = 1 in Eq. 10) | 0.0054% | -
Baseline 1: Do-Nothing | 0.0800% | 1391.72%
Baseline 2: random policy | 0.0505% | 841.01%
Baseline 3: expert policy | 0.0167% | 211.42%

5. Discussion

5.1. The effect of SV on the algorithm's behavior

To demonstrate the significance of social vulnerability (SV) on the algorithm's behavior and to enhance the interpretability of the model, we analyze two key factors: the normalized modified cost of failure and the frequency of repairs for each bridge. The modified cost of failure for each bridge is normalized by dividing it by the average modified cost of failure across all bridges, resulting in a mean value of 1. Consequently, a bridge with a normalized modified cost of failure above 1 is considered more critical in terms of failure costs compared to the average bridge. In Fig. 8, we explore the variation in normalized modified costs of failure for Bridges 5, 12, and 15 across different α values (Fig. 8a), alongside the corresponding number of repairs dictated by the algorithm for each bridge at each α value (Fig. 8b). The first column of the heatmaps represents scenarios devoid of SV consideration, highlighting Bridge 15 with a normalized modified cost of failure of 2.23, a notably high value within the portfolio. Bridges 5 and 12 exhibit lower normalized modified costs of failure, leading to no repair actions for these bridges, while Bridge 15 is repaired twice (as shown in Fig. 8b). Conversely, when examining the scenario with α equal to 1, Bridge 15, despite its higher cost of failure, registers a lower SII, diminishing its priority when solely considering social vulnerability. Under this scenario, Bridges 5 and 12, characterized by higher SII values, display increased modified costs of failure, prompting the algorithm to schedule Bridge 5 for two repairs and Bridge 12 for one repair, leaving Bridge 15 without any repair due to its reduced SII. The stark contrast between the extreme scenarios of α = 0 and α = 1, as depicted in Eq. 10, underscores the algorithm's sensitivity to the incorporation of SV in the analysis. The variations in repair frequency and prioritization between these scenarios conclusively demonstrate that SV plays a crucial role in shaping maintenance strategies for bridges. This analysis also highlights the algorithm's adaptability to different prioritization metrics concerning the contribution of SV to bridge maintenance planning problems.

From an interpretability perspective, this analysis offers valuable insights into the decision-making process of the DRL model, highlighting how the model prioritizes based on both bridges' physical costs and social vulnerabilities. Furthermore, the adjustable α parameter in Eq. 10 serves as an interpretable knob that decision-makers can use to control the balance between economic and social considerations in the maintenance planning process, based on their preferences and priorities.

Fig. 8 also provides a clear and intuitive representation of how the model's recommendation changes based on the weight of the SV, which allows decision-makers to compare repair frequencies and prioritization for bridge maintenance under various trade-offs between economic cost and social vulnerability.

5.2. Understanding the tradeoff between the economic cost and social equity

Fig. 9 illustrates the tradeoff between the total physical cost (including the repair costs Cr and the expected cost of failure CF) and PSII-weighted (Eq. 15) associated with different α values. As expected, PSII-weighted decreases as the contribution of SV increases in the calculation of the modified cost of failure (Eq. 10). This suggests that by factoring in SV, bridges that are more critical to society are identified and prioritized for maintenance. Accordingly, the incorporation of SV in maintenance planning results in an increased total cost for the bridge portfolio, because accounting for SV leads to the selection of more expensive maintenance strategies, which are necessary to mitigate risks on socially critical bridges. Thus, the inclusion of social vulnerability in bridge maintenance planning does not come without added costs, but it can lead to a socially optimized portfolio of bridges with a lower risk of failure. This cost is justifiable from a societal welfare perspective to balance financial investment and societal equity. Fig. 9 provides bridge owners with valuable managerial insights regarding the implications of SV consideration by showing the financial impact of integrating SV into maintenance planning.

5.3. Stability and transferability of the DRL algorithm

The proposed methodology is designed to be generalizable to other geographic regions, allowing for the integration of local bridge characteristics, flood hazards, and social vulnerability profiles. When applying the algorithm to new study areas, retraining the model using local data is recommended to ensure optimal performance and account for regional variations. Finally, the stability of the DRL algorithm can be influenced by changes in the underlying data sources. Regular updates to the bridge inventory, flood hazard maps, and social vulnerability indices should be incorporated through periodic retraining to maintain the algorithm's performance and relevance over time.

5.4. On long-term sustainability and environmental impact decisions

Our study demonstrates that the DRL model can effectively optimize maintenance strategies, reducing the expected cost of failure by 81.7% compared to the Do-Nothing scenario and 21.4% compared to the business-as-usual (expert policy) (see Table 5). This reduction in failure risk not only minimizes the direct economic costs but also mitigates the potential environmental consequences associated with bridge failures, such as the release of debris and pollutants into waterways and damage to surrounding ecosystems (Padgett & Tapia, 2013). Moreover, the integration of social vulnerability into the optimization process ensures that the DRL model prioritizes maintenance interventions on bridges that are most critical for socially vulnerable communities. Our results show that by setting the vulnerability influence factor (α) to 1, the DRL model reduces the SII-weighted average probability of failure by 93.3% compared to the Do-Nothing scenario and 67.7% compared to the business-as-usual (expert policy) (see Table 6). This targeted approach promotes a more equitable distribution of resources and long-term sustainability of communities by minimizing the disproportionate impacts of bridge failures on socially vulnerable populations.

6. Conclusions

This study has introduced a sequential decision-making framework designed to reduce the overall costs associated with bridge maintenance
Fig. 8. (a) Normalized modified cost of failure for Bridges 5, 12, and 15 across various α values in Eq. 10; (b) Number of repairs for Bridges 5, 12, and 15 in the optimized episode concerning various α values in Eq. 10.
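The normalization described in Section 5.1, which divides each bridge's modified cost of failure by the portfolio average so the normalized values have a mean of 1, can be sketched as follows. The cost values are hypothetical:

```python
import numpy as np

def normalize_modified_costs(costs):
    """Divide each modified cost of failure by the portfolio mean; the result has mean 1."""
    costs = np.asarray(costs, dtype=float)
    return costs / costs.mean()

# Hypothetical modified costs of failure for a three-bridge portfolio
costs = np.array([1.0e5, 3.0e5, 8.0e5])
norm = normalize_modified_costs(costs)
# Values above 1 flag bridges that are more critical than the portfolio average
```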
Table A1.7, Eq. A1.21, and Table A1.8 determine the correction factors K1, K2, and K3 in Eq. 1, respectively. These values are obtained from Liang et al. (2017).
Table A1.7
Correction factor K1 in Eq. 1.

Pier nose shape | K1
Square | 1.1
Round | 1.0
Cylinder | 1.0
Group of cylinders | 1.0
Sharp | 0.9
K2 = (cos θ + sin θ × (L/D))^0.65 (Eq. A1.21)
Where θ is the angle of attack of water, L is the length of the pier, and D is the diameter of the pier.
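Eq. A1.21 can be evaluated directly as a quick numerical check. This sketch assumes θ is supplied in degrees; the pier dimensions are illustrative:

```python
import math

def k2_angle_of_attack(theta_deg, pier_length, pier_diameter):
    """Correction factor K2 for the angle of attack of the flow (Eq. A1.21)."""
    theta = math.radians(theta_deg)
    return (math.cos(theta) + math.sin(theta) * pier_length / pier_diameter) ** 0.65

# When the flow is aligned with the pier (theta = 0), K2 reduces to 1
k2_aligned = k2_angle_of_attack(0.0, pier_length=12.0, pier_diameter=1.5)
# A skewed attack angle on a long pier increases the correction factor
k2_skewed = k2_angle_of_attack(15.0, pier_length=12.0, pier_diameter=1.5)
```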
Table A1.8
Correction factor K3 in Eq. 1.

Bed condition | K3
RL encompasses two primary types of approaches: model-based and model-free methods. In model-based RL, the agent learns a model of the
environment and utilizes it to predict future states and rewards. Model-free RL directly learns optimal policies from the interaction with the environment without explicitly learning a model of the environment. Within model-free RL, there are two main subcategories: value-based and policy-based methods. Value-based methods, such as Deep Q-Network (Mnih et al., 2015), aim to estimate the value of each state or state-action pair and
select actions that maximize these value estimates. They often employ iterative algorithms to update value estimates based on observed rewards. In
contrast, policy-based methods, like Proximal Policy Optimization (PPO) (Schulman, Wolski, Dhariwal, Radford, & Klimov, 2017), optimize the policy
(in PPO, the deep neural network) parameters directly to maximize expected rewards. These methods typically rely on gradient-based optimization (e.g., gradient ascent) techniques and learn from the agent's interactions with the environment.
In policy-based RL, policy refers to the decision-making strategy that an agent employs to maximize the cumulative reward. A policy function is
often parameterized using an artificial neural network. It determines the actions to be taken by the agent based on the state of the agent-environment
system. For discrete action spaces, the policy is represented using a softmax function over the action probabilities:
π (a|s; θ) = softmax(f(s; θ)) (Eq. A2.22)
where a represents action, s represents the state, θ represents the policy parameters (in this case, the weights and biases of the deep neural network),
and f(s; θ) is the output of the neural network.
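Eq. A2.22 can be sketched with a single linear layer standing in for the policy network f(s; θ). The layer sizes and weights below are illustrative, not the paper's architecture:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over action logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def policy(state, weights, bias):
    """pi(a|s; theta) = softmax(f(s; theta)) for a single linear layer f (Eq. A2.22)."""
    logits = weights @ state + bias  # f(s; theta)
    return softmax(logits)

# Hypothetical 3-feature state and 4 discrete maintenance actions
rng = np.random.default_rng(1)
W, b = rng.normal(size=(4, 3)), np.zeros(4)
probs = policy(rng.normal(size=3), W, b)
# probs is a valid distribution: each entry is the selection probability of one action
```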
PPO is a popular and effective policy-based algorithm in RL. PPO addresses the challenge of finding a balance between exploration and exploitation
by iteratively updating the policy in a way that ensures stable and efficient learning (Schulman et al., 2017). Unlike traditional policy-based methods,
PPO employs a proximal (surrogate) objective function (Eq. A2.23) that incorporates a constraint on the size of the policy update (gradient clipping).
rt(θ) = πθ(at|st; θ) / πθold(at|st; θold)

LCLIP(θ) = Êt[min(rt(θ) Ât(s, a), clip(rt(θ), 1 − ε, 1 + ε) Ât(s, a))] (Eq. A2.23)

Ât(s, a) = Q(s, a) − V(s) (Eq. A2.24)
where LCLIP(θ) is the objective function of the PPO algorithm. πθ(at|st; θ) represents the current policy for taking action a given state s and parameter θ,
πθ_old(at|st; θold) represents the old policy for taking action a given state s and old parameter θold, ε is the hyperparameter that controls the clipping
extent. The state-action value function Q(s, a) estimates the expected cumulative reward when taking action a in state s and following a specific policy
thereafter (Eq. A2.25). It represents the quality (value) of taking the specific action a in the state s. The state value function V(s) estimates the expected
cumulative reward when starting in state s and following a specific policy thereafter, representing the overall quality (value) of being in that state (Eq. A2.26). Â(s, a) is the advantage function (Eq. A2.24) for state s and action a. It is called the "advantage function" because, by subtracting V(s) from Q(s, a), the difference represents the advantage or disadvantage of taking the specific action a in state s compared to the average performance of all possible actions in state s.
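The ratio, clipping, and advantage computations can be sketched in numpy as follows. The probabilities and value estimates are illustrative placeholders:

```python
import numpy as np

def clipped_surrogate(new_probs, old_probs, advantages, eps=0.2):
    """Per-sample PPO clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    ratio = new_probs / old_probs                    # r_t(theta)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)   # constrain the policy update
    return float(np.minimum(ratio * advantages, clipped * advantages).mean())

# Advantage A(s, a) = Q(s, a) - V(s) (Eq. A2.24), with hypothetical values
q, v = np.array([1.5, 0.2]), np.array([1.0, 0.5])
adv = q - v                                          # [0.5, -0.3]
L = clipped_surrogate(np.array([0.5, 0.1]), np.array([0.4, 0.2]), adv)
```

For the first sample the ratio 1.25 is clipped to 1.2 before multiplying the positive advantage; for the second the unclipped term is kept by the minimum, which is exactly the pessimistic bound the prose describes.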
LCLIP(θ) is the empirical average of the minimum of two terms. These terms are two competing functions. The first term aims at encouraging the
policy update, which increases the selection probability of actions with higher advantage. On the other hand, the second term introduces a constraint
on the policy update by clipping the ratio of the new and old policy to prevent significant deviations from the old policy, ensuring more stable training.
The feature of clipping contributes to the success of PPO in achieving robust, stable, and reliable policy optimization, making it a popular choice in
various reinforcement learning applications. Fig. A2.10 illustrates the algorithm of PPO.
Entropy coefficient is a hyperparameter of the PPO algorithm that controls the contribution of the actor's entropy (randomness) to the overall objective function during training. In the context of PPO, the entropy coefficient encourages exploration by discouraging the policy from becoming overly deterministic and from prematurely favoring high-probability actions. The entropy of the actor enters the PPO objective function through (Eq. A2.27) and (Eq. A2.28).
LPPO(θ) = LCLIP(θ) + c LS(πθ) (Eq. A2.27)

LS(πθ) = − Σa πθ(a|s) log(πθ(a|s)) (Eq. A2.28)
where LPPO(θ) is PPO objective function to be maximized, LCLIP(θ) is the clipped surrogate objective (Eq. A2.23), c is the entropy coefficient, LS(πθ) is
the actor’s entropy term, and πθ(a|s) is the probability of selecting action a given state s according to the policy network π with parameters θ. By
including the entropy term in the objective function and adjusting the entropy coefficient c, better control of balance between exploration and
exploitation during training is possible. A higher entropy coefficient encourages more exploration, while a lower coefficient promotes exploitation and
a more deterministic policy.
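A minimal numpy sketch of the entropy term and its contribution to the objective, written with the entropy bonus added so that a larger coefficient c rewards exploration (the probability vectors are illustrative):

```python
import numpy as np

def policy_entropy(action_probs):
    """Actor entropy L^S(pi) = -sum_a pi(a|s) * log pi(a|s) (Eq. A2.28)."""
    p = np.asarray(action_probs, dtype=float)
    return float(-np.sum(p * np.log(p)))

def ppo_objective(l_clip, action_probs, entropy_coef=0.01):
    """Clipped surrogate plus the entropy bonus scaled by the coefficient c (Eq. A2.27)."""
    return l_clip + entropy_coef * policy_entropy(action_probs)

uniform = np.full(4, 0.25)                   # maximally exploratory policy
peaked = np.array([0.97, 0.01, 0.01, 0.01])  # nearly deterministic policy
# The uniform policy earns the larger entropy bonus, so a higher c steers
# training toward exploration, as the text describes.
```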
The model is developed using Python 3.8 and PyTorch 2, and it runs on a machine equipped with an Intel Core i9-10920X CPU, 128 GB of memory, and an NVIDIA RTX A5000 GPU. The PPO algorithm utilizes the Adam optimizer with a step size of 0.0005 for gradient ascent. The neural network architecture comprises two hidden layers with 64 nodes each, the same for both the actor and critic networks. The training process requires approximately 36 hours to complete. Once training is completed, the inference process for determining the optimized maintenance scenario for the whole bridge portfolio takes a fraction of a second.
Hyperparameter tuning is a crucial step in optimizing the performance of the Proximal Policy Optimization (PPO) algorithm. To find the optimal hyperparameter configuration for our DRL model, we conducted a comprehensive grid search. We defined a range of values for each hyperparameter and systematically evaluated the model's performance for each combination. The ranges explored for each hyperparameter are as follows: learning rate α from 0.0001 to 0.003, minibatch size from 16 to 64, discount factor γ from 0.80 to 0.99, entropy coefficient from 0.0 to 0.01, and the number of epochs from 100 to 500. The best-performing hyperparameter configuration was selected for the final model: learning rate α = 0.001, minibatch size = 64, discount factor γ = 0.99, number of epochs = 500. To encourage more exploration in the initial episodes and more exploitative behavior as learning progresses, the entropy coefficient is designed to gradually decay from 0.05 to 0.0005.
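The grid search described above can be sketched with itertools.product. The grid values are drawn from the ranges in the text, while the evaluation function is a toy stand-in for training and scoring the DRL model:

```python
from itertools import product

# Hypothetical grid over the hyperparameters discussed above
grid = {
    "learning_rate": [0.0001, 0.001, 0.003],
    "minibatch_size": [16, 32, 64],
    "gamma": [0.80, 0.90, 0.99],
    "entropy_coef": [0.0, 0.005, 0.01],
}

def grid_search(evaluate):
    """Evaluate every hyperparameter combination and keep the best (lowest score)."""
    best_cfg, best_score = None, float("inf")
    keys = list(grid)
    for values in product(*grid.values()):
        cfg = dict(zip(keys, values))
        score = evaluate(cfg)  # e.g., total portfolio cost after training
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy evaluator: prefers learning rate 0.001 and the largest discount factor
cfg, _ = grid_search(lambda c: abs(c["learning_rate"] - 0.001) + (1 - c["gamma"]))
```

In practice each evaluation would train the PPO agent and record its total cost, which is what makes the 81-combination sweep expensive.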
References

Abdelkader, Eslam Mohammed, Moselhi, Osama, Marzouk, Mohamed, & Zayed, Tarek (2022). An exponential chaotic differential evolution algorithm for optimizing bridge maintenance plans. Automation in Construction, 134, Article 104107.
Agrawal, Anil K., Kawaguchi, Akira, & Chen, Zheng (2010). Deterioration rates of typical bridge elements in New York. Journal of Bridge Engineering, 15(4), 419–429.
Ajorlou, Elham, Mousavi, Sayedmasoud, Ghayoomi, Majid, & Dave, Eshan V. (2024). Performance of flooded flexible pavements: A data-driven sensitivity analysis considering soil moisture fluctuations. Transportation Geotechnics, Article 101202.
Anderson, M. J., Kiddle, D. A. F., & Logan, T. M. (2022). The underestimated role of the transportation network: Improving disaster & community resilience. Transportation Research Part D: Transport and Environment, 106, Article 103218.
Arcement, George J., & Schneider, Verne R. (1989). Guide for selecting Manning's roughness coefficients for natural channels and flood plains (Vol. 2339). Washington, DC: US Government Printing Office.
Arneson, L. A., Zevenbergen, L. W., Lagasse, P. F., & Clopper, P. E. (2012). Evaluating scour at bridges. National Highway Institute (US).
Asghari, Vahid, Biglari, Ava Jahan, & Hsu, Shu-Chien (2023). Multiagent reinforcement learning for project-level intervention planning under multiple uncertainties. Journal of Management in Engineering, 39(2), Article 04022075.
Bibri, Simon Elias, & Krogstie, John (2017). On the social shaping dimensions of smart sustainable cities: A study in science, technology, and society. Sustainable Cities and Society, 29, 219–246.
CDC/ATSDR. (2022). Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry/Geospatial Research, Analysis, and Services Program. CDC/ATSDR Social Vulnerability Index 2020 database MA.
Chang, Heejun, Pallathadka, Arun, Sauer, Jason, Grimm, Nancy B., Zimmerman, Rae, Cheng, Chingwen, Iwaniec, David M., Kim, Yeowon, Lloyd, Robert, & McPhearson, Timon (2021). Assessment of urban flood vulnerability using the social-ecological-technological systems framework in six US cities. Sustainable Cities and Society, 68, Article 102786.
Chanson, Hubert (2004). Hydraulics of open channel flow. Elsevier.
Chen, Yi, Liu, Tao, Ge, Yi, Xia, Song, Yuan, Yu, Li, Wanrong, & Xu, Haoyuan (2021). Examining social vulnerability to flood of affordable housing communities in Nanjing, China: Building long-term disaster resilience of low-income communities. Sustainable Cities and Society, 71, Article 102939.
Cheng, Minghui, & Frangopol, Dan M. (2021). A decision-making framework for load rating planning of aging bridges using deep reinforcement learning. Journal of Computing in Civil Engineering, 35(6), Article 04021024.
Dahl, Kristina A., Fitzpatrick, Melanie F., & Spanger-Siegfried, Erika (2017). Sea level rise drives increased tidal flooding frequency at tide gauges along the US east and gulf coasts: Projections for 2030 and 2045. PloS One, 12(2), Article e0170949.
Deng, Jifei, Eklund, Miro, Sierla, Seppo, Savolainen, Jouni, Niemistö, Hannu, Karhela, Tommi, & Vyatkin, Valeriy (2023). Deep reinforcement learning for fuel cost optimization in district heating. Sustainable Cities and Society, 99, Article 104955.
Ding, Aiju, White, James F., Ullman, Paul W., & Fashokun, Adebola O. (2008). Evaluation of HAZUS-MH flood model with local data and other program. Natural Hazards Review, 9(1), 20–28. https://doi.org/10.1061/(ASCE)1527-6988(2008)9:1(20)
Du, Ao, & Ghavidel, Alireza (2022). Parameterized deep reinforcement learning-enabled maintenance decision-support and life-cycle risk assessment for highway bridge portfolios. Structural Safety, 97. https://doi.org/10.1016/j.strusafe.2022.102221
Federal Highway Administration. Bridges & structures: Bridge condition by highway system 2020. Retrieved June 2, 2023. https://www.fhwa.dot.gov/bridge/nbi/no10/condition19.cfm.
Habibzadeh-Bigdarvish, Omid, Yu, Xinbao, Lei, Gang, Li, Teng, & Puppala, Anand J. (2019). Life-cycle cost-benefit analysis of bridge deck de-icing using geothermal heat pump system: A case study of North Texas. Sustainable Cities and Society, 47, Article 101492. https://doi.org/10.1016/j.scs.2019.101492
Hamida, Zachary, & Goulet, James-A. (2023). Hierarchical reinforcement learning for transportation infrastructure maintenance planning. Reliability Engineering & System Safety, 235, Article 109214.
Hung, Chung-Chan, & Yau, Wen-Gi (2017). Vulnerability evaluation of scoured bridges under floods. Engineering Structures, 132, 288–299.
Ilbeigi, M., & Meimand, M. Ebrahimi (2020). Statistical forecasting of bridge deterioration conditions. Journal of Performance of Constructed Facilities, 34(1).
Jaafaru, Hussaini, & Agbelie, Bismark (2022). Bridge maintenance planning framework using machine learning, multi-attribute utility theory and evolutionary optimization models. Automation in Construction, 141, Article 104460.
Kang, Hyuna, Jung, Seunghoon, Jeoung, Jaewon, Hong, Juwon, & Hong, Taehoon (2023). A bi-level reinforcement learning model for optimal scheduling and planning of battery energy storage considering uncertainty in the energy-sharing community. Sustainable Cities and Society, 94, Article 104538.
Karakoc, D. B., Barker, K., Zobel, C. W., & Almoghathawi, Y. (2020). Social vulnerability and equity perspectives on interdependent infrastructure network component importance. Sustainable Cities and Society, 57, Article 102072.
Khalid, Muhammad, Wang, Liang, Wang, Kezhi, Aslam, Nauman, Pan, Cunhua, & Cao, Yue (2023). Deep reinforcement learning-based long-range autonomous valet parking for smart cities. Sustainable Cities and Society, 89, Article 104311.
Kirshen, Paul, Knee, Kelly, & Ruth, Matthias (2008). Climate change and coastal flooding in metro Boston: Impacts and adaptation strategies. Climatic Change, 90(4), 453–473.
Koks, Elco E., Jongman, Brenden, Husby, Trond G., & Botzen, Wouter J. W. (2015). Combining hazard, exposure and social vulnerability to provide lessons for flood risk management. Environmental Science & Policy, 47, 42–52.
Li, Jiawen, & Zhou, Tao (2023). Multiagent deep meta reinforcement learning for sea computing-based energy management of interconnected grids considering renewable energy sources in sustainable cities. Sustainable Cities and Society, 99, Article 104917.
Liang, Fayun, Wang, Chen, & Yu, Xiong (2019). Performance of existing methods for estimation and mitigation of local scour around bridges: Case studies. Journal of Performance of Constructed Facilities, 33(6), Article 4019060.
Liang, Fayun, Wang, Chen, Huang, Maosong, & Wang, Yu (2017). Experimental observations and evaluations of formulae for local scour at pile groups in steady currents. Marine Georesources & Geotechnology, 35(2), 245–255.
Liu, Kaijian, & El-Gohary, Nora (2022). Bridge deterioration knowledge ontology for supporting bridge document analytics. Journal of Construction Engineering and Management, 148(6).
Manafpour, Amir, Guler, Ilgin, Radlinska, Aleksandra, Rajabipour, Farshad, & Warn, Gordon (2018). Stochastic analysis and time-based modeling of concrete bridge deck deterioration. Journal of Bridge Engineering, 23(9).
Mclemore, S., Zendegui, S., Whiteside, J., Sheppard, M., Gosselin, M., Demir, H., Passe, P., & Hayden, M. (2010). Unknown foundation bridges pilot study. Federal Highway Administration & Florida Department of Transportation.
Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A., Veness, Joel, Bellemare, Marc G., Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K., & Ostrovski, Georg (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.
Mohammadi, Alireza, & El-Diraby, Tamer (2021). Toward user-oriented asset management for urban railway systems. Sustainable Cities and Society, 70, Article 102903.
Nastev, Miroslav, & Todorov, Nikolay (2013). Hazus: A standardized methodology for flood risk assessment in Canada. Canadian Water Resources Journal / Revue Canadienne Des Ressources Hydriques, 38(3), 223–231. https://doi.org/10.1080/07011784.2013.801599
National Bridge Inventory Database. National bridge inventory 2020. Retrieved June 2, 2023. https://www.fhwa.dot.gov/bridge/nbi.cfm.
NCHRP report. (2006). Risk-based management guidelines for scour at bridges with unknown foundations. Washington, DC: Transportation Research Board of the National Academies.
Padgett, J. E., & Tapia, C. (2013). Sustainability of natural hazard risk mitigation: Life cycle analysis of environmental indicators for bridge infrastructure. Journal of Infrastructure Systems, 19(4), 395–408.
Pearson, Dave, Stein, Stuart, & Jones, J. Sterling (2002). HYRISK methodology and user guide. US Department of Transportation, Federal Highway Administration.
Pizarro, Alonso, Manfreda, Salvatore, & Tubaldi, Enrico (2020). The science behind scour at bridge foundations: A review. Water, 12(2), 374.
Plough, Alonzo, Fielding, Jonathan E., Chandra, Anita, Williams, Malcolm, Eisenman, David, Wells, Kenneth B., Law, Grace Y., Fogleman, Stella, & Magaña, Aizita (2013). Building community disaster resilience: Perspectives from a large urban county department of public health. American Journal of Public Health, 103(7), 1190–1197.
Ray, Richard D., & Foster, Grant (2016). Future nuisance flooding at Boston caused by astronomical tides alone. Earth's Future, 4(12), 578–587.
Richardson, E. V. (1987). FHWA technical advisory - scour at bridges (draft report). US Department of Transportation, Federal Highway Administration.
Scawthorn, Charles, Flores, Paul, Blais, Neil, Seligson, Hope, Tate, Eric, Chang, Stephanie, Mifflin, Edward, Thomas, Will, Murphy, James, & Jones, Christopher (2006). HAZUS-MH flood loss estimation methodology. II. Damage and loss assessment. Natural Hazards Review, 7(2), 72–81.
Schulman, John, Wolski, Filip, Dhariwal, Prafulla, Radford, Alec, & Klimov, Oleg (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Sepehrzad, Reza, Godazi Langeroudi, Amir Saman, Khodadadi, Amin, Adinehpour, Sara, Al-Durra, Ahmed, & Anvari-Moghaddam, Amjad (2024). An applied deep reinforcement learning approach to control active networked microgrids in smart cities with multi-level participation of battery energy storage system and electric vehicles. Sustainable Cities and Society, 107, Article 105352.
Sinha, Kumares C., Labi, Samuel, & Agbelie, Bismark R. D. K. (2017). Transportation infrastructure asset management in the new millennium: Continuing issues, and emerging challenges and opportunities. Transportmetrica A: Transport Science, 13(7), 591–606.
Stein, Stuart, & Sedmera, Karsten (2006). Risk-based management guidelines for scour at bridges with unknown foundations. Washington, DC: Transportation Research Board of the National Academies.
Sturm, Terry W. (2001). Open channel hydraulics (Vol. 1). New York: McGraw-Hill.
Tao, Weifeng, Lin, Peihui, & Wang, Naiyu (2021). Optimum life-cycle maintenance strategies of deteriorating highway bridges subject to seismic hazard by a hybrid Markov decision process model. Structural Safety, 89, Article 102042.
Tate, Eric, Muñoz, Cristina, & Suchan, Jared (2015). Uncertainty and sensitivity analysis of the HAZUS-MH flood model. Natural Hazards Review, 16(3), Article 04014030. https://doi.org/10.1061/(ASCE)NH.1527-6996.0000167
Tuccillo, Joseph V., & Spielman, Seth E. (2022). A method for measuring coupled individual and social vulnerability to environmental hazards. Annals of the American Association of Geographers, 112(6), 1702–1725.
Wei, Shiyin, Bao, Yuequan, & Li, Hui (2020). Optimal policy for structure maintenance: A deep reinforcement learning framework. Structural Safety, 83, Article 101906.
Xiong, Wen, Cai, C. S., Zhang, Rongzhao, Shi, Huiduo, & Xu, Chang (2023). Review of hydraulic bridge failures: Historical statistic analysis, failure modes, and prediction methods. Journal of Bridge Engineering, 28(4), Article 03123001.
Xu, Gaowei, & Guo, Fengdi (2022). Sustainability-oriented maintenance management of highway bridge networks based on Q-learning. Sustainable Cities and Society, 81, Article 103855.
Yadollahi, Zahra, Gharibi, Reza, Dashti, Rahman, & Jahromi, Amin Torabi (2024). Optimal energy management of energy hub: A reinforcement learning approach. Sustainable Cities and Society, 102, Article 105179.
Yang, David Y., & Frangopol, Dan M. (2018). Risk-informed bridge ranking at project and network levels. Journal of Infrastructure Systems, 24(3), Article 04018018.
Yang, David Y. (2022a). Adaptive risk-based life-cycle management for large-scale structures using deep reinforcement learning and surrogate modeling. Journal of Engineering Mechanics, 148(1), Article 04021126.
Yang, David Y. (2022b). Deep reinforcement learning–enabled bridge management considering asset and network risks. Journal of Infrastructure Systems, 28(3), Article 04022023.
Yang, Yifan, Ng, S. Thomas, Xu, Frank J., & Skitmore, Martin (2018). Towards sustainable and resilient high density cities through better integration of infrastructure networks. Sustainable Cities and Society, 42, 407–422.
Zhang, Ning, & Alipour, Alice (2020). Two-stage model for optimized mitigation and recovery of bridge network with final goal of resilience. Transportation Research Record, 2674(10), 114–123. https://doi.org/10.1177/0361198120935450
Zhang, Weili, & Wang, Naiyu (2017). Bridge network maintenance prioritization under budget constraint. Structural Safety, 67, 96–104.
Zhang, Guojing, Liu, Yongjian, Liu, Jiang, Lan, Shiyong, & Yang, Jian (2022). Causes and statistical characteristics of bridge failures: A review. Journal of Traffic and Transportation Engineering (English Edition), 9(3), 388–406. https://doi.org/10.1016/j.jtte.2021.12.003
Zhang, Zhibo, Agbelie, Bismark R., & Labi, Samuel (2015). Efficiency measurement of bridge management with data envelopment analysis. Transportation Research Record, 2481(1), 1–9.
Zhou, Qi-Neng, Yuan, Ye, Yang, Dong, & Zhang, Jing (2022). An advanced multi-agent reinforcement learning framework of bridge maintenance policy formulation. Sustainability, 14(16), 10050.