A Survey of Reinforcement Learning for Optimization in Automation

Ahmad Farooq*1 and Kamran Iqbal2

arXiv:2502.09417v1 [cs.LG] 13 Feb 2025

Abstract— Reinforcement Learning (RL) has become a critical tool for optimization challenges within automation, leading to significant advancements in several areas. This review article examines the current landscape of RL within automation, with a particular focus on its roles in manufacturing, energy systems, and robotics. It discusses state-of-the-art methods, major challenges, and upcoming avenues of research within each sector, highlighting RL's capacity to solve intricate optimization challenges. The paper reviews the advantages and constraints of RL-driven optimization methods in automation. It points out prevalent challenges encountered in RL optimization, including issues related to sample efficiency and scalability; safety and robustness; interpretability and trustworthiness; transfer learning and meta-learning; and real-world deployment and integration. It further explores prospective strategies and future research pathways to navigate these challenges. Additionally, the survey includes a comprehensive list of relevant research papers, making it an indispensable guide for scholars and practitioners keen on exploring this domain.

Index terms: Reinforcement Learning, Automation, Manufacturing, Energy Systems, Robotics

I. INTRODUCTION

A. Motivation

Reinforcement learning (RL) has emerged as an effective framework for sequential decision-making problems, enabling agents to learn optimal policies through interaction with the environment [1], [2]. In recent years, RL has achieved remarkable success in various domains, including manufacturing [3], energy systems [4], and robotics [5]. The key advantage of RL lies in its ability to learn from trial-and-error experience without requiring explicit supervision or a predefined model.

Simultaneously, optimization problems are ubiquitous in automation, spanning diverse areas such as production scheduling [6], process control [7], and inventory management [8]. These problems often involve complex decision-making under uncertainty, large-scale combinatorial search spaces, and dynamic environments. Traditional optimization approaches, such as mathematical programming and metaheuristics, have been extensively studied and applied to automation problems [9]. However, they often struggle with scalability, adaptability, and the need for domain-specific knowledge.

The intersection of RL and optimization in automation presents a promising avenue for addressing these challenges. By leveraging the power of RL to learn from experience and adapt to changing conditions, we can develop more efficient, flexible, and robust optimization algorithms for automation tasks [10], [11]. This has led to a growing body of research on RL-based optimization in various automation domains, which is the focus of this survey.
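To make the interaction loop concrete, the following minimal sketch runs tabular Q-learning on a small Gymnasium benchmark; the environment choice (FrozenLake-v1), the hyperparameters, and the episode budget are illustrative assumptions rather than settings drawn from the surveyed works.

    import numpy as np
    import gymnasium as gym

    # Toy task: the agent learns purely from trial-and-error rewards.
    env = gym.make("FrozenLake-v1", is_slippery=False)
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, epsilon = 0.1, 0.99, 0.1  # assumed hyperparameters

    for episode in range(2000):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore occasionally, otherwise act greedily.
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Temporal-difference update toward the bootstrapped target.
            target = reward + gamma * np.max(Q[next_state]) * (not terminated)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state

After training, a greedy policy is read off as np.argmax(Q, axis=1); no supervision or explicit environment model is required, only the reward signal.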
B. Scope and Contributions

This survey paper aims to provide a comprehensive overview of RL techniques for optimization in automation. We focus on three key application domains: manufacturing, energy systems, and robotics. In each domain, we review representative works that demonstrate the effectiveness of RL in solving optimization problems and discuss the unique challenges and opportunities.

The main contributions of this survey are as follows:
1. We provide a systematic categorization of RL-based optimization approaches in automation, highlighting their strengths and limitations.
2. We discuss the state-of-the-art RL algorithms used for optimization in each application domain.
3. We identify common challenges faced by RL-based optimization in automation, including sample efficiency and scalability; safety and robustness; interpretability and trustworthiness; transfer learning and meta-learning; and real-world deployment and integration, and discuss potential solutions and future research directions.
4. We present a comprehensive bibliography of relevant research papers, serving as a valuable resource for researchers and practitioners interested in this field.

To the best of our knowledge, this is the first survey paper that specifically focuses on RL for optimization in automation, covering a wide range of application domains and providing insights into the current state and future prospects of this rapidly growing field.

C. Organization of the Paper

The remainder of this survey is organized as follows: Section II focuses on the applications of RL-based optimization in three major domains: manufacturing, energy systems, and robotics. For each domain, we provide a comparative analysis of the selected papers, highlighting their key findings, methodologies, and contributions. We also discuss the domain-specific challenges and opportunities. Section III discusses the common challenges faced by RL-based optimization in automation. We present an overview of the potential solutions and future research directions to address these challenges. Finally, Section IV concludes the survey, summarizing the key takeaways.

This work is not supported by any organization. This work is a preprint version of the paper published in the 2024 IEEE 20th International Conference on Automation Science and Engineering (CASE) held from August 28 to September 1, 2024, in Bari, Italy. The final version is available at IEEE Xplore under the conference proceedings.
*Corresponding Author: Ahmad Farooq

II. APPLICATION DOMAINS

Reinforcement Learning (RL) has revolutionized automation in Manufacturing, Energy Systems, and Robotics.
Fig. 1: Taxonomy of Application Domains of RL for Optimization in Automation

Figure 1 shows these major domains and their sub-domains that we will discuss in this section.

A. Manufacturing

RL is revolutionizing manufacturing through advancements in production scheduling, inventory management, maintenance planning, and process control, showcasing its potential to tackle complex optimization challenges within this sector. In production scheduling, RL methods surpass traditional models by adeptly handling uncertainties, thereby enhancing profitability and customer service [6], [12]–[15]. For inventory management, RL techniques, particularly Deep Reinforcement Learning (DRL) and Multi-agent Reinforcement Learning (MARL), offer innovative solutions for managing stochastic demands and complex supply chains, leading to improved sales and reduced wastage [8], [16]–[19]. Maintenance planning benefits from RL's dynamic optimization capabilities, utilizing real-time data for maintenance schedules, thus improving system reliability and reducing downtimes [20]–[24]. In process control, RL's adaptability ensures product quality and operational efficiency, with methodologies like Explainable RL and DRL enhancing process understanding and control strategies [7], [25]–[28]. Future directions point towards developing risk-sensitive formulations, leveraging real-world data, and integrating smart systems to further enhance manufacturing efficiency. Table I encapsulates these insights by outlining key objectives, challenges addressed, RL approaches, outcomes, and future directions, alongside representative studies that underscore RL's transformative impact on manufacturing.
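As a concrete illustration of how such scheduling problems can be exposed to an RL agent, the sketch below casts a toy single-machine dispatching task as a Gymnasium environment with a tardiness-based reward; the state encoding, job statistics, and penalty values are simplifying assumptions for illustration, not the formulations used in [6], [12]–[15].

    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces

    class ToySchedulingEnv(gym.Env):
        """Single-machine dispatching: at each step, pick which remaining job to run next."""

        def __init__(self, n_jobs=5, seed=0):
            self.n_jobs = n_jobs
            self.rng = np.random.default_rng(seed)
            # Observation: per-job [remaining flag, processing time, due date] plus the current time.
            self.observation_space = spaces.Box(0.0, np.inf, shape=(3 * n_jobs + 1,), dtype=np.float32)
            self.action_space = spaces.Discrete(n_jobs)

        def _obs(self):
            return np.concatenate([self.remaining, self.proc, self.due, [self.t]]).astype(np.float32)

        def reset(self, *, seed=None, options=None):
            self.proc = self.rng.integers(1, 10, self.n_jobs).astype(float)
            self.due = self.rng.integers(5, 40, self.n_jobs).astype(float)
            self.remaining = np.ones(self.n_jobs)
            self.t = 0.0
            return self._obs(), {}

        def step(self, action):
            if self.remaining[action] == 0:           # dispatching an already finished job is penalized
                return self._obs(), -10.0, False, False, {}
            self.t += self.proc[action]
            self.remaining[action] = 0
            tardiness = max(0.0, self.t - self.due[action])
            done = bool(self.remaining.sum() == 0)
            return self._obs(), -tardiness, done, False, {}

A value-based agent can then be trained directly against this interface, for example with stable-baselines3 as DQN("MlpPolicy", ToySchedulingEnv()).learn(50_000).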
B. Energy Systems

RL and DRL are transforming energy systems, offering innovative solutions across demand response, microgrid management, renewable energy integration, and Heating, Ventilation, and Air Conditioning (HVAC) control to optimize and enhance grid stability, sustainability, and energy efficiency. Demand response strategies benefit from DRL and MARL to dynamically adjust energy usage in response to utility signals, achieving up to 22% energy savings and more efficient electricity management [29]–[34]. In microgrid management, DRL and MARL approaches enhance grid resilience by optimizing energy distribution and usage, resulting in improved cost efficiency and increased system reliability [35]–[40]. For renewable energy integration, RL's capability to handle the variability of renewable sources leads to more effective energy dispatch strategies, ensuring grid stability and maximizing the use of renewable resources [4], [31], [41]–[44]. HVAC systems, as major energy consumers, see optimizations through DRL and batch RL methods, achieving significant reductions in energy consumption while maintaining occupant comfort [29], [45]–[49]. Looking ahead, the future promises advancements in adaptive strategies and bridging the simulation-experiment gap for demand response, enhanced learning efficiency for microgrid management, scalability and adaptability improvements in renewable energy integration, and wider applicability of pre-trained models for HVAC control. This narrative is encapsulated in Table II, which outlines the key objectives, challenges addressed, RL methodologies, and outcomes for each subdomain, alongside future research directions and representative studies illustrating RL's significant role in advancing energy systems.
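To make the energy-comfort trade-off that these HVAC and demand-response studies optimize more tangible, the sketch below shows one possible shape of a reward function; the comfort band, weighting, and price signal are assumed values for illustration and do not correspond to any specific cited formulation.

    def hvac_reward(power_kw, indoor_temp_c, price_per_kwh, dt_hours=0.25,
                    comfort_low=21.0, comfort_high=24.0, comfort_weight=2.0):
        """Reward = -(energy cost) - (comfort penalty); all coefficients are illustrative."""
        energy_cost = power_kw * dt_hours * price_per_kwh
        # Quadratic penalty only when the indoor temperature leaves the comfort band.
        if indoor_temp_c < comfort_low:
            discomfort = (comfort_low - indoor_temp_c) ** 2
        elif indoor_temp_c > comfort_high:
            discomfort = (indoor_temp_c - comfort_high) ** 2
        else:
            discomfort = 0.0
        return -(energy_cost + comfort_weight * discomfort)

    # Example: a 15-minute step drawing 5 kW at 0.20 $/kWh with the room at 25.5 °C.
    r = hvac_reward(power_kw=5.0, indoor_temp_c=25.5, price_per_kwh=0.20)  # r == -4.75

An agent maximizing this return is pushed to shift or reduce consumption exactly when prices are high, but only as far as the comfort term allows.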
TABLE I: Comparison of RL Approaches for Optimization in Manufacturing

Production Scheduling
  Key Objectives: Optimize allocation of tasks to resources over time.
  Challenges Addressed: Handling complexities and uncertainties in scheduling tasks.
  RL Approaches: DQN [6], Distributional RL [15], DRL [13], A2C [12].
  Methodology Highlights: Superiority over traditional mixed integer linear programming models, competitive performance against heuristic methods.
  Outcomes: Increased profitability, reduced inventory levels, improved customer service.
  Future Directions: Development of risk-sensitive formulations, leveraging real-world data.
  Representative Studies: Hubbs et al. [12], Shi et al. [13], Guo et al. [14], Esteso et al. [6], Mowbray et al. [15].

Inventory Management
  Key Objectives: Balance supply with demand, minimize costs, and ensure timely product availability.
  Challenges Addressed: Stochastic demand, perishable goods, multi-echelon supply chains.
  RL Approaches: DQN [17], PPO [19], A2C [16], DRL [8], Cooperative MARL [18].
  Methodology Highlights: Comprehensive roadmap for DRL deployment, novel frameworks for multi-agent hierarchical inventory management.
  Outcomes: Maximized sales, minimized perishable product wastage, optimized supply chain needs.
  Future Directions: Advanced cooperative strategies among agents, leveraging custom GPU-parallelized environments.
  Representative Studies: Boute et al. [8], Sultana et al. [16], De Moor et al. [17], Khirwar et al. [18], Leluc et al. [19].

Maintenance Planning
  Key Objectives: Minimize downtime, extend asset life, ensure safety.
  Challenges Addressed: Dynamic maintenance planning under system degradation.
  RL Approaches: Multi-Agent Actor Critic [24], Deep Q-learning [22], [23], Q-learning [21].
  Methodology Highlights: Analysis of RL/DRL applications, dynamic maintenance policies using Q-learning.
  Outcomes: Reduced maintenance activities, enhanced fleet availability, adapted maintenance policies.
  Future Directions: Integration with smart factory systems, leveraging condition monitoring data.
  Representative Studies: Ogunfowora and Najjaran [20], Yousefi et al. [21], Yousefi et al. [22], Andrade et al. [23], Thomas et al. [24].

Process Control
  Key Objectives: Ensure product quality, operational efficiency, and safety.
  Challenges Addressed: Controlling complex manufacturing processes.
  RL Approaches: TRPO [26], DDPG [7], Dynamic Q-table [25].
  Methodology Highlights: Adaptation of RL for Statistical Process Control (SPC), integration of domain expertise, apprenticeship learning.
  Outcomes: Enhanced SPC adaptability, improved control policies, handling nonlinearities in manufacturing.
  Future Directions: Utilization of real-world data for training, improving RL training efficiency.
  Representative Studies: Viharos and Jakab [25], Nian, Liu, and Huang [7], Kuhnle et al. [26], Mowbray et al. [27], and Li, Du, Jiang [28].

TABLE II: Comparison of RL Approaches for Optimization in Energy Systems

Demand Response
  Key Objectives: Optimize energy usage and cost in response to utility signals, enhancing grid stability.
  Challenges Addressed: Adapting to dynamic pricing and demand, improving energy consumption efficiency.
  RL Approaches: PPO [29], [30], MARL [31], [34], DQN [32], MADDPG [33].
  Methodology Highlights: Meta-learning for simulation-experiment gap, multi-agent systems for residential energy management.
  Outcomes: Up to 22% energy savings, efficient electricity usage management.
  Future Directions: Advanced cooperative strategies, bridging the simulation-experiment gap.
  Representative Studies: Azuatalam et al. [29], Jang et al. [30], Ahrarinouri et al. [31], Lu et al. [32], Lu et al. [33], Zhang et al. [34].

Microgrid Management
  Key Objectives: Enhance grid resilience and efficiency, optimizing energy distribution and usage.
  Challenges Addressed: Managing diverse energy sources, ensuring reliable and efficient operation.
  RL Approaches: DQN [39], PPO [37], A3C [35], [31].
  Methodology Highlights: Expert knowledge integration, operational flexibility with proximal policy optimization.
  Outcomes: Improved energy distribution and cost efficiency, increased resilience.
  Future Directions: Enhanced learning efficiency, integration with smart grid technologies.
  Representative Studies: Nakabi and Toivanen [35], Hu and Kwasinski [36], Zhang et al. [37], Zhang et al. [38], Shojaeighadikolaei et al. [39], Du and Li [40].

Renewable Energy Integration
  Key Objectives: Seamlessly integrate renewable energy into power systems, maximizing utilization while ensuring grid stability.
  Challenges Addressed: Addressing variability and unpredictability of renewable sources.
  RL Approaches: MA-DRL [42], Q-learning.
  Methodology Highlights: Analysis on RL's role, multi-task learning for system-wide optimization.
  Outcomes: Enhanced management of complex energy flows, significant performance improvements.
  Future Directions: Scalability and adaptability of RL methods, robustness against environmental changes.
  Representative Studies: Yang et al. [41], Cao et al. [42], Chen et al. [43], Sivamayil et al. [44], Perera and Kamalaruban [4], Ahrarinouri et al. [31].

HVAC Control
  Key Objectives: Optimize HVAC systems for energy efficiency without compromising occupant comfort.
  Challenges Addressed: Balancing energy savings with thermal comfort requirements.
  RL Approaches: PPO [29], Batch Constrained Munchausen Deep Q-learning [47], Q-learning [48], A3C [45].
  Methodology Highlights: Architecture optimization for demand response, safe control strategies, and energy consumption reduction.
  Outcomes: Reduction in HVAC energy consumption, improved operational efficiency.
  Future Directions: Adaptability to diverse buildings, pre-training models, transfer learning applications.
  Representative Studies: Azuatalam et al. [29], Zhong et al. [45], Sierla et al. [46], Liu et al. [47], Yuan et al. [48], Biemann et al. [49].


TABLE III: Comparison of RL Approaches for Optimization in Robotics

Motion Planning
  Key Objectives: Enable robots to navigate and perform tasks in dynamic environments.
  Challenges Addressed: Navigating complex and dynamic environments, learning from interaction.
  RL Approaches: PPO [50], Q-learning [51], Soft Actor-Critic (SAC) [52].
  Methodology Highlights: EfficientLPT [52] for space robots, curriculum learning for robotic arms.
  Outcomes: Improved planning accuracy, learning from human demonstrations.
  Future Directions: Integration with sensory feedback, real-time adaptation.
  Representative Studies: Wang et al. [63], Cao et al. [52], Zhou et al. [50], Yu and Chang [51].

Manipulation
  Key Objectives: Enhance robotic interaction with objects and environments.
  Challenges Addressed: Adapting to diverse objects, leveraging complex sensor inputs.
  RL Approaches: DDPG [53], Double DQN [54].
  Methodology Highlights: Visuo-motor feedback, dexterous grasping in sparse environments.
  Outcomes: Significant outperformance in grasping tasks, adaptability to grippers.
  Future Directions: Incorporation of more complex sensory modalities, tactile feedback.
  Representative Studies: Joshi et al. [54], Schuck et al. [53], Han et al. [64], Rivera et al. [65], Beigomi and Zhu [66].

Multi-robot Coordination
  Key Objectives: Optimize collaborative actions of multiple robots for a common goal.
  Challenges Addressed: Resource competition, obstacle avoidance in cooperative tasks.
  RL Approaches: Multi-Robot Coordination with Deep Reinforcement Learning (MRCDRL) [55], Multi Agent Deep Reinforcement Learning [56].
  Methodology Highlights: MRCDRL [55] for cooperative action, MARL for pick-and-place optimization.
  Outcomes: Effective resource allocation and dynamic obstacle avoidance, applicability in smart manufacturing.
  Future Directions: Scalable coordination strategies for larger teams, integration with smart environments.
  Representative Studies: Wang and Deng [55], Lan et al. [56], Yang [67], Lan et al. [68], Sadhu and Konar [69], Khamassi [70].

Human-robot Collaboration
  Key Objectives: Facilitate effective interaction and cooperation between humans and robots.
  Challenges Addressed: Adaptation to human behaviors, ensuring safety and making intelligent decisions.
  RL Approaches: DQN [57], [58], SAC [59], [60], DDPG [61], Double DQN [62].
  Methodology Highlights: Human-centered DRL, explainable RL for interaction quality enhancement.
  Outcomes: Enhanced coordination in packaging tasks, adaptability to user habits during collaboration.
  Future Directions: Personalized adaptation to human habits, enhancing safety and interpretability.
  Representative Studies: Ghadirzadeh et al. [57], Iucci et al. [58], Shafti et al. [59], Cai et al. [62], Thumm et al. [60], El-Shamouty et al. [61].

C. Robotics

RL is revolutionizing robotics, making significant strides across motion planning, grasping and manipulation, multi-robot coordination, and human-robot collaboration, thereby addressing intricate challenges inherent in the field. In motion planning, RL, particularly DRL and innovative methodologies like curriculum learning, empowers robots to adeptly navigate and execute tasks in dynamic environments, enhancing adaptability and task performance [50]–[52], [63]. Grasping and manipulation benefit from DRL's ability to process complex sensor inputs, enabling robots to interact with diverse objects and environments with unprecedented flexibility and efficiency [53], [54], [64]–[66]. Multi-robot coordination leverages DRL and MARL to facilitate sophisticated collaborative strategies among robots, optimizing collective actions to achieve common goals in complex and dynamic tasks [55], [56], [67]–[70]. Human-robot collaboration (HRC) sees advancements through DRL's capacity for learning from interactions and adapting to human behaviors, significantly improving cooperation in tasks ranging from manufacturing to daily assistance [57]–[62]. Future research directions emphasize the integration of sensory feedback for real-time adaptation in motion planning, enhancing grasping tasks with complex sensory and tactile feedback, developing scalable coordination strategies for larger robot teams, and personalizing HRC to adapt to human habits while enhancing safety and interpretability. Table III succinctly encapsulates these domains by detailing key objectives, challenges addressed, RL approaches, methodology highlights, outcomes, and future directions, alongside representative studies demonstrating RL's transformative impact on robotics.
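For continuous-control problems like the manipulation and motion-planning tasks above, off-the-shelf implementations of the cited algorithm families are commonly used. The sketch below trains SAC on a small standard benchmark as a stand-in for a real robotic task; the environment choice (Pendulum-v1) and the step budget are illustrative assumptions.

    import gymnasium as gym
    from stable_baselines3 import SAC

    # Pendulum-v1 is a compact continuous-control stand-in for a manipulation task.
    env = gym.make("Pendulum-v1")
    model = SAC("MlpPolicy", env, verbose=0)   # entropy-regularized off-policy actor-critic
    model.learn(total_timesteps=20_000)

    # Roll out the learned policy deterministically for one episode.
    obs, _ = env.reset()
    done = False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated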
III. CHALLENGES, STATE OF THE ART, AND FUTURE DIRECTIONS

There has been significant progress in the field of RL for optimization in automation; however, there are still challenges to be addressed. Table IV gives a comparison of these challenges, along with the state of the art in the field and future directions that we will discuss in this section.

A. Sample Efficiency and Scalability

Sample efficiency and scalability are vital in RL to minimize training data and ensure solutions scale with task complexity. These challenges are particularly important in real-world applications where data collection is expensive or time-consuming [71], [72].

Current efforts to enhance sample efficiency and scalability include making past samples more reflective of the current model [71], [72], using evolution strategies and efficient memory in experience replay [73], [74], incorporating offline data for online learning [75], [76], and leveraging adaptive learning techniques [77], [78].

Future research should aim at algorithms with adaptive learning rates, domain-specific knowledge integration, efficient computational resource use, and cross-domain transfer learning to further improve sample efficiency and scalability in RL applications.
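One concrete instance of the replay-based ideas above is proportional prioritized experience replay, also listed in Table IV; the minimal sketch below (buffer size, exponent, and epsilon are assumed values) replays transitions with large temporal-difference errors more often, so each environment interaction contributes more to learning.

    import numpy as np
    from collections import deque

    class PrioritizedReplayBuffer:
        """Minimal proportional prioritization (importance-sampling correction omitted)."""

        def __init__(self, capacity=100_000, alpha=0.6, eps=1e-3):
            self.data = deque(maxlen=capacity)
            self.priorities = deque(maxlen=capacity)
            self.alpha, self.eps = alpha, eps

        def add(self, transition, td_error=1.0):
            # New transitions receive a priority derived from their (initially optimistic) TD error.
            self.data.append(transition)
            self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

        def sample(self, batch_size):
            probs = np.array(self.priorities)
            probs = probs / probs.sum()
            idx = np.random.choice(len(self.data), size=batch_size, p=probs)
            return [self.data[i] for i in idx], idx

        def update_priorities(self, idx, td_errors):
            for i, err in zip(idx, td_errors):
                self.priorities[i] = (abs(err) + self.eps) ** self.alpha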
B. Safety and Robustness

Ensuring safety and robustness in RL is crucial, especially for applications in critical domains like autonomous driving and healthcare. Safe RL algorithms aim to learn policies that satisfy safety constraints during both training and deployment [107].

Current strategies for ensuring safety include developing concepts of safety robustness [79], frameworks for robust policies [80], tackling observational adversarial attacks [81], integrating robust-control-barrier-function layers [82], managing safety requirements with robust action governor [83], enforcing safety via robust Model Predictive Control (MPC) [84], offering robustness guarantees [85], [86], improving policy robustness through falsification-based adversarial learning [87], and inducing a safety curriculum [108].
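A simple version of the intervention idea behind several of these approaches (action governors, control-barrier-function layers, robust MPC filters) is to check each proposed action against a known constraint and fall back to a conservative controller when it fails; the sketch below is a generic shield with hand-written checks, given purely as an assumed illustration rather than any specific cited method.

    import numpy as np

    class SafetyShield:
        """Wraps a learned policy and overrides actions that violate a known constraint."""

        def __init__(self, policy_fn, is_safe_fn, fallback_fn):
            self.policy_fn = policy_fn        # maps state -> proposed action
            self.is_safe_fn = is_safe_fn      # maps (state, action) -> bool
            self.fallback_fn = fallback_fn    # conservative backup controller

        def act(self, state):
            action = self.policy_fn(state)
            if self.is_safe_fn(state, action):
                return action
            return self.fallback_fn(state)    # intervene only when the proposed action is unsafe

    # Toy example: keep a 1-D position inside [-1, 1] when the action is a velocity command.
    shield = SafetyShield(
        policy_fn=lambda s: float(np.clip(np.random.normal(), -1, 1)),  # stand-in for a learned policy
        is_safe_fn=lambda s, a: abs(s + 0.1 * a) <= 1.0,                # one-step constraint check
        fallback_fn=lambda s: -0.5 * np.sign(s),                        # push back toward the safe set
    )
    safe_action = shield.act(0.98)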
TABLE IV: Current Challenges, State of the Art, and Future Directions for RL for Optimization in Automation

Sample Efficiency and Scalability
  Description: Reducing the data needed for learning and ensuring scalability.
  RL Approaches: PPO, SAC, Model-based Policy Optimization (MBPO), Dreamer, IMPALA, Acme.
  State of the Art: Model-based RL with planning algorithms, off-policy learning with prioritized experience replay, and large-scale distributed RL systems.
  Future Directions: Focus on algorithms with adaptive learning rates and cross-domain transfer learning.
  Related Studies: Tianyue Cao [71], Florian E. Dorner [72], Suri et al. [73], Yang et al. [74], Ball et al. [75], Li et al. [76], Ly et al. [77], Wang et al. [78].

Safety and Robustness
  Description: Ensuring RL policies perform safely under uncertain conditions.
  RL Approaches: Constrained Policy Optimization (CPO), Lyapunov-based approaches, State-wise Safe RL, Probabilistic constraint methods.
  State of the Art: Formal methods for policy verification, robust adversarial training, and safe exploration techniques.
  Future Directions: Integrating formal verification methods and enhancing human-RL interaction.
  Related Studies: Hao Xiong and Xiumin Diao [79], Li et al. [80], Liu et al. [81], Emam et al. [82], Li et al. [83], [84], Queeney et al. [85], Md Asifur Rahman and Sarra M. Alqahtani [86], Wang et al. [87].

Interpretability and Trustworthiness
  Description: Developing RL models whose actions are transparent and understandable.
  RL Approaches: MARL, Q-learning, Deep RL, DQN, PPO, TD3, SAC.
  State of the Art: Feature attribution, policy distillation, and interpretable models like decision trees and attention mechanisms.
  Future Directions: Improving the foundation of interpretable models and applying self-supervised learning for interpretable representations.
  Related Studies: Glanois et al. [88], Duo Xu and Faramarz Fekri [89], Mansour et al. [90], Eckstein et al. [91], Alharin et al. [92], Shi et al. [93], Dao et al. [94].

Transfer Learning and Meta-learning
  Description: Enabling RL systems to rapidly adapt to new tasks using knowledge from past experiences.
  RL Approaches: A3C, Meta-RL, Meta-RL with Context-conditioned Action Translator (MCAT), TD3.
  State of the Art: Context-based meta-learning frameworks, multi-task learning techniques, and fine-tuning pre-trained models.
  Future Directions: Developing algorithms that generalize across a wider range of tasks and enhance transfer learning capabilities.
  Related Studies: Hospedales et al. [95], Guo et al. [96], Narvekar et al. [97], Varma et al. [98], Sasso et al. [99], Ren et al. [100].

Real-world Deployment and Integration
  Description: Bridging the gap between theoretical advancements and practical utility in RL deployment.
  RL Approaches: Behavior-Regularized Model-ENsemble (BREMEN), Distributional Maximum a Posteriori Policy Optimization (DMPO), Distributed Distributional Deterministic Policy Gradient (D4PG).
  State of the Art: Scalable RL architectures, robust policy deployment strategies, and open-source benchmarks and toolkits.
  Future Directions: Prioritizing deployment efficiency, enhancing human-RL interaction, and fostering academia-industry collaboration.
  Related Studies: Dulac-Arnold et al. [101], Matsushima et al. [102], Yahmed et al. [103], Li et al. [104], Garau-Luis et al. [105], Kanso and Patra [106].

Future research directions should focus on developing scalable safe RL algorithms for high-dimensional continuous control tasks, integrating formal verification methods with RL, and improving the adaptability of safe RL algorithms to dynamic environments.

C. Interpretability and Trustworthiness

Ensuring RL models are interpretable and trustworthy is essential for applications in healthcare, autonomous systems, and finance, requiring transparent, understandable, and reliable decision-making processes.

Current research to improve interpretability includes distinguishing between interpretability and explainability [88], integrating symbolic logic with deep RL for transparency [89], achieving policy interpretability in structured environments [90], interpreting RL modeling in cognitive sciences [91], discovering interpretable features in vision-based RL [93], and introducing sparse evidence collection for human interpretation [94].

Advancements will focus on foundational improvements to make models intrinsically understandable, incorporating human feedback, advancing feature discovery techniques, and applying self-supervised learning for natural interpretability, aiming for a deeper human understanding of RL behaviors.
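One widely used route to such interpretable surrogates (decision trees are listed under this challenge in Table IV) is to distill a trained policy into a shallow tree by imitating its actions on visited states. The sketch below assumes a trained policy exposed as a predict function and a set of logged states; both names are placeholders, not artifacts of any cited work.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    def distill_policy_to_tree(policy_predict, states, max_depth=4):
        """Fit a shallow decision tree that imitates the policy's discrete actions."""
        states = np.asarray(states)                      # shape (n_samples, n_features)
        actions = np.array([policy_predict(s) for s in states])
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(states, actions)
        return tree

    # Hypothetical usage with a trained agent `model` and logged observations `collected_states`:
    # tree = distill_policy_to_tree(lambda s: int(model.predict(s, deterministic=True)[0]),
    #                               collected_states)
    # print(export_text(tree))   # human-readable if/else rules approximating the policy

The tree depth then gives a direct handle on the fidelity-versus-readability trade-off.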
D. Transfer Learning and Meta-learning

Transfer learning and meta-learning address the need for RL systems to efficiently adapt to new tasks using knowledge from past experiences, aiming to improve learning efficiency and generalization across various environments.

Hospedales et al. [95] highlight meta-learning's role in adaptability across tasks. Guo et al. [96] develop an action translator for meta-RL to enhance exploration and efficiency. Narvekar et al. [97] present a curriculum learning framework that uses task sequencing for improved learning in complex scenarios. Varma et al. [98] demonstrate the benefits of using pre-trained models like ResNet50 to boost RL performance. Sasso et al. [99] and Ren et al. [100] investigate multi-source transfer learning and meta-RL for fast adaptation based on human preferences.

Future efforts will focus on algorithms that better generalize across diverse tasks, with a push towards unsupervised and self-supervised learning to advance transfer learning capabilities. There's also a growing interest in models that autonomously leverage past knowledge.
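A basic form of the transfer idea above is to warm-start training on a new task from a policy learned on a related source task instead of starting from scratch. The sketch below does this with stable-baselines3; the environment names, file path, and step budgets are illustrative assumptions (a real transfer would use a genuinely different target task).

    import gymnasium as gym
    from stable_baselines3 import PPO

    # Pre-train on a source task and save the resulting policy.
    source_env = gym.make("CartPole-v1")
    model = PPO("MlpPolicy", source_env, verbose=0)
    model.learn(total_timesteps=50_000)
    model.save("source_policy")            # illustrative path

    # Warm-start on the target task from the saved weights and fine-tune briefly.
    target_env = gym.make("CartPole-v1")   # stand-in for a related but different task
    model = PPO.load("source_policy", env=target_env)
    model.learn(total_timesteps=10_000, reset_num_timesteps=False)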
E. Real-world Deployment and Integration

Real-world RL model deployment involves overcoming the divide between theoretical research and practical application, ensuring model robustness, and aligning simulated training environments with real-world conditions.

Dulac-Arnold et al. [101] highlight real-world RL deployment challenges, introducing benchmarks for complexity. Matsushima et al. [102] focus on efficient deployment with minimal data. Yahmed et al. [103] outline deployment challenges and emphasize the need for solutions. Li et al. [104] advocate for incorporating human feedback during deployment for safety. Garau-Luis et al. [105] discuss DRL deployment advancements, while Kanso and Patra [106] discuss engineering solutions for RL scalability.

Future efforts will center on algorithms and frameworks enhancing deployment efficiency and real-world relevance, generalization from simulations to reality, improving human-RL interactions, and robust, scalable deployment platforms. Domain-specific challenges and academia-industry collaboration are pivotal for RL's real-world success.

IV. CONCLUSION

Reinforcement Learning (RL) has showcased its vast capabilities in sectors such as manufacturing, energy systems, and robotics, driven by deep learning innovations that tackle complex challenges. Despite these advancements, real-world deployment introduces challenges requiring extensive research for practical RL implementation. This review emphasizes the need for improved sample efficiency, model safety, interpretability, and real-world integration strategies. To meet these requirements, a comprehensive approach is necessary, integrating algorithmic advancements, domain-specific insights, robust benchmarks, and understanding the balance between theory and practice. Moreover, integrating human feedback and ethical considerations is crucial for the responsible deployment of RL. Ultimately, RL's transition from theory to a key AI component marks significant progress, with ongoing efforts expected to overcome current obstacles, leveraging RL's full potential in intelligent decision-making and system optimization.

References

[1] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 2018.
[2] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[3] C. Li, P. Zheng, Y. Yin, B. Wang, and L. Wang, "Deep reinforcement learning in smart manufacturing: A review and prospects," CIRP Journal of Manufacturing Science and Technology, vol. 40, pp. 75–101, 2023.
[4] A. Perera and P. Kamalaruban, "Applications of reinforcement learning in energy systems," Renewable and Sustainable Energy Reviews, vol. 137, p. 110618, 2021.
[5] J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
[6] A. Esteso, D. Peidro, J. Mula, and M. Díaz-Madroñero, "Reinforcement learning applied to production planning and control," International Journal of Production Research, vol. 61, no. 16, pp. 5772–5789, 2023.
[7] R. Nian, J. Liu, and B. Huang, "A review on reinforcement learning: Introduction and applications in industrial process control," Computers & Chemical Engineering, vol. 139, p. 106886, 2020.
[8] R. N. Boute, J. Gijsbrechts, W. Van Jaarsveld, and N. Vanvuchelen, "Deep reinforcement learning for inventory control: A roadmap," European Journal of Operational Research, vol. 298, no. 2, pp. 401–412, 2022.
[9] C. Blum and A. Roli, "Metaheuristics in combinatorial optimization: Overview and conceptual comparison," ACM Computing Surveys (CSUR), vol. 35, no. 3, pp. 268–308, 2003.
[10] Y. Li, "Deep reinforcement learning: An overview," arXiv preprint arXiv:1701.07274, 2017.
[11] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, "Deep reinforcement learning: A brief survey," IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 26–38, 2017.
[12] C. D. Hubbs, C. Li, N. V. Sahinidis, I. E. Grossmann, and J. M. Wassick, "A deep reinforcement learning approach for chemical production scheduling," Computers & Chemical Engineering, vol. 141, p. 106982, 2020.
[13] D. Shi, W. Fan, Y. Xiao, T. Lin, and C. Xing, "Intelligent scheduling of discrete automated production line via deep reinforcement learning," International Journal of Production Research, vol. 58, no. 11, pp. 3362–3380, 2020.
[14] F. Guo, Y. Li, A. Liu, and Z. Liu, "A reinforcement learning method to scheduling problem of steel production process," in Journal of Physics: Conference Series, vol. 1486, no. 7. IOP Publishing, 2020, p. 072035.
[15] M. Mowbray, D. Zhang, and E. A. D. R. Chanona, "Distributional reinforcement learning for scheduling of chemical production processes," arXiv preprint arXiv:2203.00636, 2022.
[16] N. N. Sultana, H. Meisheri, V. Baniwal, S. Nath, B. Ravindran, and H. Khadilkar, "Reinforcement learning for multi-product multi-node inventory management in supply chains," arXiv preprint arXiv:2006.04037, 2020.
[17] B. J. De Moor, J. Gijsbrechts, and R. N. Boute, "Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management," European Journal of Operational Research, vol. 301, no. 2, pp. 535–545, 2022.
[18] M. Khirwar, K. S. Gurumoorthy, A. A. Jain, and S. Manchenahally, "Cooperative multi-agent reinforcement learning for inventory management," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2023, pp. 619–634.
[19] R. Leluc, E. Kadoche, A. Bertoncello, and S. Gourvénec, "Marlim: Multi-agent reinforcement learning for inventory management," arXiv preprint arXiv:2308.01649, 2023.
[20] O. Ogunfowora and H. Najjaran, "Reinforcement and deep reinforcement learning-based solutions for machine maintenance planning, scheduling policies, and optimization," Journal of Manufacturing Systems, vol. 70, pp. 244–263, 2023.
[21] N. Yousefi, S. Tsianikas, and D. W. Coit, "Reinforcement learning for dynamic condition-based maintenance of a system with individually repairable components," Quality Engineering, vol. 32, no. 3, pp. 388–408, 2020.
[22] ——, "Dynamic maintenance model for a repairable multi-component system using deep reinforcement learning," Quality Engineering, vol. 34, no. 1, pp. 16–35, 2022.
[23] P. Andrade, C. Silva, B. Ribeiro, and B. F. Santos, "Aircraft maintenance check scheduling using reinforcement learning," Aerospace, vol. 8, no. 4, p. 113, 2021.
[24] J. Thomas, M. P. Hernández, A. K. Parlikad, and R. Piechocki, "Network maintenance planning via multi-agent reinforcement learning," in 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2021, pp. 2289–2295.
[25] Z. J. Viharos and R. Jakab, "Reinforcement learning for statistical process control in manufacturing," Measurement, vol. 182, p. 109616, 2021.
[26] A. Kuhnle, M. C. May, L. Schäfer, and G. Lanza, "Explainable reinforcement learning in production control of job shop manufacturing system," International Journal of Production Research, vol. 60, no. 19, pp. 5812–5834, 2022.
[27] M. Mowbray, R. Smith, E. A. Del Rio-Chanona, and D. Zhang, "Using process data to generate an optimal control policy via apprenticeship and reinforcement learning," AIChE Journal, vol. 67, no. 9, p. e17306, 2021.
[28] Y. Li, J. Du, and W. Jiang, "Reinforcement learning for process control with application in semiconductor manufacturing," IISE Transactions, pp. 1–15, 2023.
[29] D. Azuatalam, W.-L. Lee, F. de Nijs, and A. Liebman, "Reinforcement learning for whole-building hvac control and demand response," Energy and AI, vol. 2, p. 100020, 2020.
[30] D. Jang, L. Spangher, M. Khattar, U. Agwan, and C. Spanos, "Using meta reinforcement learning to bridge the gap between simulation and experiment in energy demand response," in Proceedings of the Twelfth ACM International Conference on Future Energy Systems, 2021, pp. 483–487.
[31] M. Ahrarinouri, M. Rastegar, and A. R. Seifi, "Multiagent reinforcement learning for energy management in residential buildings," IEEE Transactions on Industrial Informatics, vol. 17, no. 1, pp. 659–666, 2020.
[32] R. Lu, R. Bai, Z. Luo, J. Jiang, M. Sun, and H.-T. Zhang, "Deep reinforcement learning-based demand response for smart facilities energy management," IEEE Transactions on Industrial Electronics, vol. 69, no. 8, pp. 8554–8565, 2021.
[33] R. Lu, Y.-C. Li, Y. Li, J. Jiang, and Y. Ding, "Multi-agent deep reinforcement learning based demand response for discrete manufacturing systems energy management," Applied Energy, vol. 276, p. 115473, 2020.
[34] X. Zhang, R. Lu, J. Jiang, S. H. Hong, and W. S. Song, "Testbed implementation of reinforcement learning-based demand response energy management system," Applied Energy, vol. 297, p. 117131, 2021.
[35] T. A. Nakabi and P. Toivanen, "Deep reinforcement learning for energy management in a microgrid with flexible demand," Sustainable Energy, Grids and Networks, vol. 25, p. 100413, 2021.
[36] R. Hu and A. Kwasinski, "Energy management for microgrids using a reinforcement learning algorithm," in 2021 IEEE Green Energy and Smart Systems Conference (IGESSC). IEEE, 2021, pp. 1–6.
[37] B. Zhang, Z. Chen, and A. M. Ghias, "Deep reinforcement learning-based energy management strategy for a microgrid with flexible loads," in 2023 International Conference on Power Energy Systems and Applications (ICoPESA). IEEE, 2023, pp. 187–191.
[38] W. Zhang, H. Qiao, X. Xu, J. Chen, J. Xiao, K. Zhang, Y. Long, and Y. Zuo, "Energy management in microgrid based on deep reinforcement learning with expert knowledge," in International Workshop on Automation, Control, and Communication Engineering (IWACCE 2022), vol. 12492. SPIE, 2022, pp. 275–284.
[39] A. Shojaeighadikolaei, A. Ghasemi, A. G. Bardas, R. Ahmadi, and M. Hashemi, "Weather-aware data-driven microgrid energy management using deep reinforcement learning," in 2021 North American Power Symposium (NAPS). IEEE, 2021, pp. 1–6.
[40] Y. Du and F. Li, "Intelligent multi-microgrid energy management based on deep neural network and model-free reinforcement learning," IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1066–1076, 2019.
[41] T. Yang, L. Zhao, W. Li, and A. Y. Zomaya, "Reinforcement learning in sustainable energy and electric systems: A survey," Annual Reviews in Control, vol. 49, pp. 145–163, 2020.
[42] D. Cao, W. Hu, J. Zhao, G. Zhang, B. Zhang, Z. Liu, Z. Chen, and F. Blaabjerg, "Reinforcement learning and its applications in modern power and energy systems: A review," Journal of Modern Power Systems and Clean Energy, vol. 8, no. 6, pp. 1029–1042, 2020.
[43] X. Chen, G. Qu, Y. Tang, S. Low, and N. Li, "Reinforcement learning for selective key applications in power systems: Recent advances and future challenges," IEEE Transactions on Smart Grid, vol. 13, no. 4, pp. 2935–2958, 2022.
[44] K. Sivamayil, E. Rajasekar, B. Aljafari, S. Nikolovski, S. Vairavasundaram, and I. Vairavasundaram, "A systematic study on reinforcement learning based applications," Energies, vol. 16, no. 3, p. 1512, 2023.
[45] X. Zhong, Z. Zhang, R. Zhang, and C. Zhang, "End-to-end deep reinforcement learning control for hvac systems in office buildings," Designs, vol. 6, no. 3, p. 52, 2022.
[46] S. Sierla, H. Ihasalo, and V. Vyatkin, "A review of reinforcement learning applications to control of heating, ventilation and air conditioning systems," Energies, vol. 15, no. 10, p. 3526, 2022.
[47] H.-Y. Liu, B. Balaji, S. Gao, R. Gupta, and D. Hong, "Safe hvac control via batch reinforcement learning," in 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS). IEEE, 2022, pp. 181–192.
[48] X. Yuan, Y. Pan, J. Yang, W. Wang, and Z. Huang, "Study on the application of reinforcement learning in the operation optimization of hvac system," in Building Simulation, vol. 14. Springer, 2021, pp. 75–87.
[49] M. Biemann, F. Scheller, X. Liu, and L. Huang, "Experimental evaluation of model-free reinforcement learning algorithms for continuous hvac control," Applied Energy, vol. 298, p. 117164, 2021.
[50] D. Zhou, R. Jia, and H. Yao, "Robotic arm motion planning based on curriculum reinforcement learning," in 2021 6th International Conference on Control and Robotics Engineering (ICCRE). IEEE, 2021, pp. 44–49.
[51] T. Yu and Q. Chang, "Reinforcement learning based user-guided motion planning for human-robot collaboration," arXiv preprint arXiv:2207.00492, 2022.
[52] Y. Cao, S. Wang, X. Zheng, W. Ma, X. Xie, and L. Liu, "Reinforcement learning with prior policy guidance for motion planning of dual-arm free-floating space robot," Aerospace Science and Technology, vol. 136, p. 108098, 2023.
[53] M. Schuck, J. Brüdigam, A. Capone, S. Sosnowski, and S. Hirche, "Dext-gen: Dexterous grasping in sparse reward environments with full orientation control," arXiv preprint arXiv:2206.13966, 2022.
[54] S. Joshi, S. Kumra, and F. Sahin, "Robotic grasping using deep reinforcement learning," in 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE). IEEE, 2020, pp. 1461–1466.
[55] D. Wang, H. Deng, and Z. Pan, "Mrcdrl: Multi-robot coordination with deep reinforcement learning," Neurocomputing, vol. 406, pp. 68–76, 2020.
[56] X. Lan, Y. Qiao, and B. Lee, "Towards pick and place multi robot coordination using multi-agent deep reinforcement learning," in 2021 7th International Conference on Automation, Robotics and Applications (ICARA). IEEE, 2021, pp. 85–89.
[57] A. Ghadirzadeh, X. Chen, W. Yin, Z. Yi, M. Björkman, and D. Kragic, "Human-centered collaborative robots with deep reinforcement learning," IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 566–571, 2020.
[58] A. Iucci, A. Hata, A. Terra, R. Inam, and I. Leite, "Explainable reinforcement learning for human-robot collaboration," in 2021 20th International Conference on Advanced Robotics (ICAR). IEEE, 2021, pp. 927–934.
[59] A. Shafti, J. Tjomsland, W. Dudley, and A. A. Faisal, "Real-world human-robot collaborative reinforcement learning," in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 11161–11166.
[60] J. Thumm, F. Trost, and M. Althoff, "Human-robot gym: Benchmarking reinforcement learning in human-robot collaboration," arXiv preprint arXiv:2310.06208, 2023.
[61] M. El-Shamouty, X. Wu, S. Yang, M. Albus, and M. F. Huber, "Towards safe human-robot collaboration using deep reinforcement learning," in 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 4899–4905.
[62] Z. Cai, Z. Feng, L. Zhou, C. Ai, H. Shao, X. Yang et al., "A framework and algorithm for human-robot collaboration based on multimodal reinforcement learning," Computational Intelligence and Neuroscience, vol. 2022, 2022.
[63] J. Wang, T. Zhang, N. Ma, Z. Li, H. Ma, F. Meng, and M. Q.-H. Meng, "A survey of learning-based robot motion planning," IET Cyber-Systems and Robotics, vol. 3, no. 4, pp. 302–314, 2021.
[64] D. Han, B. Mulyana, V. Stankovic, and S. Cheng, "A survey on deep reinforcement learning algorithms for robotic manipulation," Sensors, vol. 23, no. 7, p. 3762, 2023.
[65] P. Rivera, J. Oh, E. Valarezo, G. Ryu, H. Jung, J. H. Lee, J. G. Jeong, and T.-S. Kim, "Reward shaping to learn natural object manipulation with an anthropomorphic robotic hand and hand pose priors via on-policy reinforcement learning," in 2021 International Conference on Information and Communication Technology Convergence (ICTC). IEEE, 2021, pp. 167–171.
[66] B. Beigomi and Z. H. Zhu, "Enhancing robotic grasping of free-floating targets with soft actor-critic algorithm and tactile sensors: a focus on the pre-grasp stage," in AIAA SCITECH 2024 Forum, 2024, p. 2419.
[67] X. Yang, "Reinforcement learning for multi-robot system: A review," in 2021 2nd International Conference on Computing and Data Science (CDS). IEEE, 2021, pp. 203–213.
[68] X. Lan, Y. Qiao, and B. Lee, "Coordination of a multi robot system for pick and place using reinforcement learning," in 2022 2nd International Conference on Computers and Automation (CompAuto). IEEE, 2022, pp. 87–92.
[69] A. K. Sadhu and A. Konar, Multi-agent coordination: A reinforcement learning approach. John Wiley & Sons, 2020.
[70] M. Khamassi, "Adaptive coordination of multiple learning strategies in brains and robots," in Theory and Practice of Natural Computing: 9th International Conference, TPNC 2020, Taoyuan, Taiwan, December 7–9, 2020, Proceedings 9. Springer, 2020, pp. 3–22.
[71] T. Cao, "Study of sample efficiency improvements for reinforcement learning algorithms," in 2020 IEEE Integrated STEM Education Conference (ISEC). IEEE, 2020, pp. 1–1.
[72] F. E. Dorner, "Measuring progress in deep reinforcement learning sample efficiency," arXiv preprint arXiv:2102.04881, 2021.
[73] K. Suri, X. Q. Shi, K. N. Plataniotis, and Y. A. Lawryshyn, "Maximum mutation reinforcement learning for scalable control," arXiv preprint arXiv:2007.13690, 2020.
[74] D. Yang, X. Qin, X. Xu, C. Li, and G. Wei, "Sample efficient reinforcement learning method via high efficient episodic memory," IEEE Access, vol. 8, pp. 129274–129284, 2020.
[75] P. J. Ball, L. Smith, I. Kostrikov, and S. Levine, "Efficient online reinforcement learning with offline data," in International Conference on Machine Learning. PMLR, 2023, pp. 1577–1594.
[76] G. Li, Y. Wei, Y. Chi, Y. Gu, and Y. Chen, "Breaking the sample size barrier in model-based reinforcement learning with a generative model," Advances in Neural Information Processing Systems, vol. 33, pp. 12861–12872, 2020.
[77] A. Ly, R. Dazeley, P. Vamplew, F. Cruz, and S. Aryal, "Elastic step ddpg: Multi-step reinforcement learning for improved sample efficiency," in 2023 International Joint Conference on Neural Networks (IJCNN). IEEE, 2023, pp. 1–6.
[78] Z. Wang, J. Wang, Q. Zhou, B. Li, and H. Li, "Sample-efficient reinforcement learning via conservative model-based actor-critic," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 8, 2022, pp. 8612–8620.
[79] H. Xiong and X. Diao, "Safety robustness of reinforcement learning policies: A view from robust control," Neurocomputing, vol. 422, pp. 12–21, 2021.
[80] Z. Li, C. Hu, Y. Wang, Y. Yang, and S. E. Li, "Safe reinforcement learning with dual robustness," arXiv preprint arXiv:2309.06835, 2023.
[81] Z. Liu, Z. Guo, Z. Cen, H. Zhang, J. Tan, B. Li, and D. Zhao, "On the robustness of safe reinforcement learning under observational perturbations," arXiv preprint arXiv:2205.14691, 2022.
[82] Y. Emam, G. Notomista, P. Glotfelter, Z. Kira, and M. Egerstedt, "Safe reinforcement learning using robust control barrier functions," IEEE Robotics and Automation Letters, 2022.
[83] Y. Li, N. Li, H. E. Tseng, A. Girard, D. Filev, and I. Kolmanovsky, "Safe reinforcement learning using robust action governor," in Learning for Dynamics and Control. PMLR, 2021, pp. 1093–1104.
[84] M. Zanon and S. Gros, "Safe reinforcement learning using robust mpc," IEEE Transactions on Automatic Control, vol. 66, no. 8, pp. 3638–3652, 2020.
[85] J. Queeney, E. C. Ozcan, I. C. Paschalidis, and C. G. Cassandras, "Optimal transport perturbations for safe reinforcement learning with robustness guarantees," arXiv preprint arXiv:2301.13375, 2023.
[86] M. A. Rahman and S. Alqahtani, "Task-agnostic safety for reinforcement learning," in Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023, pp. 139–148.
[87] X. Wang, S. Nair, and M. Althoff, "Falsification-based robust adversarial reinforcement learning," in 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2020, pp. 205–212.
[88] C. Glanois, P. Weng, M. Zimmer, D. Li, T. Yang, J. Hao, and W. Liu, "A survey on interpretable reinforcement learning," arXiv preprint arXiv:2112.13112, 2021.
[89] D. Xu and F. Fekri, "Interpretable model-based hierarchical reinforcement learning using inductive logic programming," arXiv preprint arXiv:2106.11417, 2021.
[90] Y. Mansour, M. Moshkovitz, and C. Rudin, "There is no accuracy-interpretability tradeoff in reinforcement learning for mazes," arXiv preprint arXiv:2206.04266, 2022.
[91] M. K. Eckstein, L. Wilbrecht, and A. G. Collins, "What do reinforcement learning models measure? interpreting model parameters in cognition and neuroscience," Current Opinion in Behavioral Sciences, vol. 41, pp. 128–137, 2021.
[92] A. Alharin, T.-N. Doan, and M. Sartipi, "Reinforcement learning interpretation methods: A survey," IEEE Access, vol. 8, pp. 171058–171077, 2020.
[93] W. Shi, G. Huang, S. Song, Z. Wang, T. Lin, and C. Wu, "Self-supervised discovering of interpretable features for reinforcement learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 5, pp. 2712–2724, 2020.
[94] G. Dao, W. H. Huff, and M. Lee, "Learning sparse evidence-driven interpretation to understand deep reinforcement learning agents," in 2021 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2021, pp. 1–7.
[95] T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, "Meta-learning in neural networks: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5149–5169, 2021.
[96] Y. Guo, Q. Wu, and H. Lee, "Learning action translator for meta reinforcement learning on sparse-reward tasks," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 6, 2022, pp. 6792–6800.
[97] S. Narvekar, B. Peng, M. Leonetti, J. Sinapov, M. E. Taylor, and P. Stone, "Curriculum learning for reinforcement learning domains: A framework and survey," Journal of Machine Learning Research, vol. 21, no. 181, pp. 1–50, 2020.
[98] N. S. Varma, V. Sinha et al., "Effective reinforcement learning using transfer learning," in 2022 IEEE International Conference on Data Science and Information System (ICDSIS). IEEE, 2022, pp. 1–6.
[99] R. Sasso, "Multi-source transfer learning for deep model-based reinforcement learning," Ph.D. dissertation, 2021.
[100] Z. Ren, A. Liu, Y. Liang, J. Peng, and J. Ma, "Efficient meta reinforcement learning for preference-based fast adaptation," Advances in Neural Information Processing Systems, vol. 35, pp. 15502–15515, 2022.
[101] G. Dulac-Arnold, N. Levine, D. J. Mankowitz, J. Li, C. Paduraru, S. Gowal, and T. Hester, "An empirical investigation of the challenges of real-world reinforcement learning," arXiv preprint arXiv:2003.11881, 2020.
[102] T. Matsushima, H. Furuta, Y. Matsuo, O. Nachum, and S. Gu, "Deployment-efficient reinforcement learning via model-based offline optimization," arXiv preprint arXiv:2006.03647, 2020.
[103] A. H. Yahmed, A. A. Abbassi, A. Nikanjam, H. Li, and F. Khomh, "Deploying deep reinforcement learning systems: A taxonomy of challenges," in 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2023, pp. 26–38.
[104] Z. Li, K. Xu, L. Liu, L. Li, D. Ye, and P. Zhao, "Deploying offline reinforcement learning with human feedback," arXiv preprint arXiv:2303.07046, 2023.
[105] J. J. Garau-Luis, E. Crawley, and B. Cameron, "Evaluating the progress of deep reinforcement learning in the real world: aligning domain-agnostic and domain-specific research," arXiv preprint arXiv:2107.03015, 2021.
[106] A. Kanso and K. Patra, "Engineering a platform for reinforcement learning workloads," in Proceedings of the 1st International Conference on AI Engineering: Software Engineering for AI, 2022, pp. 88–89.
[107] J. García and F. Fernández, "A comprehensive survey on safe reinforcement learning," Journal of Machine Learning Research, vol. 16, no. 1, pp. 1437–1480, 2015.
[108] M. Turchetta, A. Kolobov, S. Shah, A. Krause, and A. Agarwal, "Safe reinforcement learning via curriculum induction," Advances in Neural Information Processing Systems, vol. 33, pp. 12151–12162, 2020.
