A Survey of Reinforcement Learning for Optimization in Automation

Ahmad Farooq*1 and Kamran Iqbal2

arXiv:2502.09417v1 [cs.LG] 13 Feb 2025

Abstract— Reinforcement Learning (RL) has become a critical tool for optimization challenges within automation, leading to significant advancements in several areas. This review article examines the current landscape of RL within automation, with a particular focus on its roles in manufacturing, energy systems, and robotics. It discusses state-of-the-art methods, major challenges, and upcoming avenues of research within each sector, highlighting RL's capacity to solve intricate optimization challenges. The paper reviews the advantages and constraints of RL-driven optimization methods in automation. It points out prevalent challenges encountered in RL optimization, including issues related to sample efficiency and scalability; safety and robustness; interpretability and trustworthiness; transfer learning and meta-learning; and real-world deployment and integration. It further explores prospective strategies and future research pathways to navigate these challenges. Additionally, the survey includes a comprehensive list of relevant research papers, making it an indispensable guide for scholars and practitioners keen on exploring this domain.

Index terms: Reinforcement Learning, Automation, Manufacturing, Energy Systems, Robotics

I. INTRODUCTION

A. Motivation

Reinforcement learning (RL) has emerged as an effective framework for sequential decision-making problems, enabling agents to learn optimal policies through interaction with the environment [1], [2]. In recent years, RL has achieved remarkable success in various domains, including manufacturing [3], energy systems [4], and robotics [5]. The key advantage of RL lies in its ability to learn from trial-and-error experience without requiring explicit supervision or a predefined model.

Simultaneously, optimization problems are ubiquitous in automation, spanning diverse areas such as production scheduling [6], process control [7], and inventory management [8]. These problems often involve complex decision-making under uncertainty, large-scale combinatorial search spaces, and dynamic environments. Traditional optimization approaches, such as mathematical programming and metaheuristics, have been extensively studied and applied to automation problems [9]. However, they often struggle with scalability, adaptability, and the need for domain-specific knowledge.

The intersection of RL and optimization in automation presents a promising avenue for addressing these challenges. By leveraging the power of RL to learn from experience and adapt to changing conditions, we can develop more efficient, flexible, and robust optimization algorithms for automation tasks [10], [11]. This has led to a growing body of research on RL-based optimization in various automation domains, which is the focus of this survey.
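To make the interaction loop concrete, the following minimal sketch runs tabular Q-learning on a small Gymnasium benchmark; the environment choice (FrozenLake-v1), the hyperparameters, and the episode budget are illustrative assumptions rather than settings drawn from the surveyed works.

    import numpy as np
    import gymnasium as gym

    # Toy task: the agent learns purely from trial-and-error rewards.
    env = gym.make("FrozenLake-v1", is_slippery=False)
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, epsilon = 0.1, 0.99, 0.1  # assumed hyperparameters

    for episode in range(2000):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore occasionally, otherwise act greedily.
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Temporal-difference update toward the bootstrapped target.
            target = reward + gamma * np.max(Q[next_state]) * (not terminated)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state

After training, a greedy policy is read off as np.argmax(Q, axis=1); no supervision or explicit environment model is required, only the reward signal.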
B. Scope and Contributions

This survey paper aims to provide a comprehensive overview of RL techniques for optimization in automation. We focus on three key application domains: manufacturing, energy systems, and robotics. In each domain, we review representative works that demonstrate the effectiveness of RL in solving optimization problems and discuss the unique challenges and opportunities.

The main contributions of this survey are as follows:
1. We provide a systematic categorization of RL-based optimization approaches in automation, highlighting their strengths and limitations.
2. We discuss the state-of-the-art RL algorithms used for optimization in each application domain.
3. We identify common challenges faced by RL-based optimization in automation, including sample efficiency and scalability; safety and robustness; interpretability and trustworthiness; transfer learning and meta-learning; and real-world deployment and integration, and discuss potential solutions and future research directions.
4. We present a comprehensive bibliography of relevant research papers, serving as a valuable resource for researchers and practitioners interested in this field.

To the best of our knowledge, this is the first survey paper that specifically focuses on RL for optimization in automation, covering a wide range of application domains and providing insights into the current state and future prospects of this rapidly growing field.

C. Organization of the Paper

The remainder of this survey is organized as follows: Section II focuses on the applications of RL-based optimization in three major domains: manufacturing, energy systems, and robotics. For each domain, we provide a comparative analysis of the selected papers, highlighting their key findings, methodologies, and contributions. We also discuss the domain-specific challenges and opportunities. Section III discusses the common challenges faced by RL-based optimization in automation. We present an overview of the potential solutions and future research directions to address these challenges. Finally, Section IV concludes the survey, summarizing the key takeaways.

This work is not supported by any organization. This work is a preprint version of the paper published in the 2024 IEEE 20th International Conference on Automation Science and Engineering (CASE) held from August 28 to September 1, 2024, in Bari, Italy. The final version is available at IEEE Xplore under the conference proceedings.
*Corresponding Author: Ahmad Farooq

II. APPLICATION DOMAINS

Reinforcement Learning (RL) has revolutionized automation in Manufacturing, Energy Systems, and Robotics.
Fig. 1: Taxonomy of Application Domains of RL for Optimization in Automation

Figure 1 shows these major domains and their sub-domains that we will discuss in this section.

A. Manufacturing

RL is revolutionizing manufacturing through advancements in production scheduling, inventory management, maintenance planning, and process control, showcasing its potential to tackle complex optimization challenges within this sector. In production scheduling, RL methods surpass traditional models by adeptly handling uncertainties, thereby enhancing profitability and customer service [6], [12]–[15]. For inventory management, RL techniques, particularly Deep Reinforcement Learning (DRL) and Multi-agent Reinforcement Learning (MARL), offer innovative solutions for managing stochastic demands and complex supply chains, leading to improved sales and reduced wastage [8], [16]–[19]. Maintenance planning benefits from RL's dynamic optimization capabilities, utilizing real-time data for maintenance schedules, thus improving system reliability and reducing downtimes [20]–[24]. In process control, RL's adaptability ensures product quality and operational efficiency, with methodologies like Explainable RL and DRL enhancing process understanding and control strategies [7], [25]–[28]. Future directions point towards developing risk-sensitive formulations, leveraging real-world data, and integrating smart systems to further enhance manufacturing efficiency. Table I encapsulates these insights by outlining key objectives, challenges addressed, RL approaches, outcomes, and future directions, alongside representative studies that underscore RL's transformative impact on manufacturing.
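As a concrete illustration of how such scheduling problems can be exposed to an RL agent, the sketch below casts a toy single-machine dispatching task as a Gymnasium environment with a tardiness-based reward; the state encoding, job statistics, and penalty values are simplifying assumptions for illustration, not the formulations used in [6], [12]–[15].

    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces

    class ToySchedulingEnv(gym.Env):
        """Single-machine dispatching: at each step, pick which remaining job to run next."""

        def __init__(self, n_jobs=5, seed=0):
            self.n_jobs = n_jobs
            self.rng = np.random.default_rng(seed)
            # Observation: per-job [remaining flag, processing time, due date] plus the current time.
            self.observation_space = spaces.Box(0.0, np.inf, shape=(3 * n_jobs + 1,), dtype=np.float32)
            self.action_space = spaces.Discrete(n_jobs)

        def _obs(self):
            return np.concatenate([self.remaining, self.proc, self.due, [self.t]]).astype(np.float32)

        def reset(self, *, seed=None, options=None):
            self.proc = self.rng.integers(1, 10, self.n_jobs).astype(float)
            self.due = self.rng.integers(5, 40, self.n_jobs).astype(float)
            self.remaining = np.ones(self.n_jobs)
            self.t = 0.0
            return self._obs(), {}

        def step(self, action):
            if self.remaining[action] == 0:           # dispatching an already finished job is penalized
                return self._obs(), -10.0, False, False, {}
            self.t += self.proc[action]
            self.remaining[action] = 0
            tardiness = max(0.0, self.t - self.due[action])
            done = bool(self.remaining.sum() == 0)
            return self._obs(), -tardiness, done, False, {}

A value-based agent can then be trained directly against this interface, for example with stable-baselines3 as DQN("MlpPolicy", ToySchedulingEnv()).learn(50_000).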
B. Energy Systems

RL and DRL are transforming energy systems, offering innovative solutions across demand response, microgrid management, renewable energy integration, and Heating, Ventilation, and Air Conditioning (HVAC) control to optimize and enhance grid stability, sustainability, and energy efficiency. Demand response strategies benefit from DRL and MARL to dynamically adjust energy usage in response to utility signals, achieving up to 22% energy savings and more efficient electricity management [29]–[34]. In microgrid management, DRL and MARL approaches enhance grid resilience by optimizing energy distribution and usage, resulting in improved cost efficiency and increased system reliability [35]–[40]. For renewable energy integration, RL's capability to handle the variability of renewable sources leads to more effective energy dispatch strategies, ensuring grid stability and maximizing the use of renewable resources [4], [31], [41]–[44]. HVAC systems, as major energy consumers, see optimizations through DRL and batch RL methods, achieving significant reductions in energy consumption while maintaining occupant comfort [29], [45]–[49]. Looking ahead, the future promises advancements in adaptive strategies and bridging the simulation-experiment gap for demand response, enhanced learning efficiency for microgrid management, scalability and adaptability improvements in renewable energy integration, and wider applicability of pre-trained models for HVAC control. This narrative is encapsulated in Table II, which outlines the key objectives, challenges addressed, RL methodologies, and outcomes for each subdomain, alongside future research directions and representative studies illustrating RL's significant role in advancing energy systems.
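To make the energy-comfort trade-off that these HVAC and demand-response studies optimize more tangible, the sketch below shows one possible shape of a reward function; the comfort band, weighting, and price signal are assumed values for illustration and do not correspond to any specific cited formulation.

    def hvac_reward(power_kw, indoor_temp_c, price_per_kwh, dt_hours=0.25,
                    comfort_low=21.0, comfort_high=24.0, comfort_weight=2.0):
        """Reward = -(energy cost) - (comfort penalty); all coefficients are illustrative."""
        energy_cost = power_kw * dt_hours * price_per_kwh
        # Quadratic penalty only when the indoor temperature leaves the comfort band.
        if indoor_temp_c < comfort_low:
            discomfort = (comfort_low - indoor_temp_c) ** 2
        elif indoor_temp_c > comfort_high:
            discomfort = (indoor_temp_c - comfort_high) ** 2
        else:
            discomfort = 0.0
        return -(energy_cost + comfort_weight * discomfort)

    # Example: a 15-minute step drawing 5 kW at 0.20 $/kWh with the room at 25.5 °C.
    r = hvac_reward(power_kw=5.0, indoor_temp_c=25.5, price_per_kwh=0.20)  # r == -4.75

An agent maximizing this return is pushed to shift or reduce consumption exactly when prices are high, but only as far as the comfort term allows.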
TABLE I: Comparison of RL Approaches for Optimization in Manufacturing

Production Scheduling
  Key Objectives: Optimize allocation of tasks to resources over time.
  Challenges Addressed: Handling complexities and uncertainties in scheduling tasks.
  RL Approaches: DQN [6], Distributional RL [15], DRL [13], A2C [12].
  Methodology Highlights: Superiority over traditional mixed integer linear programming models, competitive performance against heuristic methods.
  Outcomes: Increased profitability, reduced inventory levels, improved customer service.
  Future Directions: Development of risk-sensitive formulations, leveraging real-world data.
  Representative Studies: Hubbs et al. [12], Shi et al. [13], Guo et al. [14], Esteso et al. [6], Mowbray et al. [15].

Inventory Management
  Key Objectives: Balance supply with demand, minimize costs, and ensure timely product availability.
  Challenges Addressed: Stochastic demand, perishable goods, multi-echelon supply chains.
  RL Approaches: DQN [17], PPO [19], A2C [16], DRL [8], Cooperative MARL [18].
  Methodology Highlights: Comprehensive roadmap for DRL deployment, novel frameworks for multi-agent hierarchical inventory management.
  Outcomes: Maximized sales, minimized perishable product wastage, optimized supply chain needs.
  Future Directions: Advanced cooperative strategies among agents, leveraging custom GPU-parallelized environments.
  Representative Studies: Boute et al. [8], Sultana et al. [16], De Moor et al. [17], Khirwar et al. [18], Leluc et al. [19].

Maintenance Planning
  Key Objectives: Minimize downtime, extend asset life, ensure safety.
  Challenges Addressed: Dynamic maintenance planning under system degradation.
  RL Approaches: Multi-Agent Actor Critic [24], Deep Q-learning [22], [23], Q-learning [21].
  Methodology Highlights: Analysis of RL/DRL applications, dynamic maintenance policies using Q-learning.
  Outcomes: Reduced maintenance activities, enhanced fleet availability, adapted maintenance policies.
  Future Directions: Integration with smart factory systems, leveraging condition monitoring data.
  Representative Studies: Ogunfowora and Najjaran [20], Yousefi et al. [21], Yousefi et al. [22], Andrade et al. [23], Thomas et al. [24].

Process Control
  Key Objectives: Ensure product quality, operational efficiency, and safety.
  Challenges Addressed: Controlling complex manufacturing processes.
  RL Approaches: TRPO [26], DDPG [7], Dynamic Q-table [25].
  Methodology Highlights: Adaptation of RL for Statistical Process Control (SPC), integration of domain expertise, apprenticeship learning.
  Outcomes: Enhanced SPC adaptability, improved control policies, handling nonlinearities in manufacturing.
  Future Directions: Utilization of real-world data for training, improving RL training efficiency.
  Representative Studies: Viharos and Jakab [25], Nian, Liu, and Huang [7], Kuhnle et al. [26], Mowbray et al. [27], and Li, Du, Jiang [28].

TABLE II: Comparison of RL Approaches for Optimization in Energy Systems

Demand Response
  Key Objectives: Optimize energy usage and cost in response to utility signals, enhancing grid stability.
  Challenges Addressed: Adapting to dynamic pricing and demand, improving energy consumption efficiency.
  RL Approaches: PPO [29], [30], MARL [31], [34], DQN [32], MADDPG [33].
  Methodology Highlights: Meta-learning for simulation-experiment gap, multi-agent systems for residential energy management.
  Outcomes: Up to 22% energy savings, efficient electricity usage management.
  Future Directions: Advanced cooperative strategies, bridging the simulation-experiment gap.
  Representative Studies: Azuatalam et al. [29], Jang et al. [30], Ahrarinouri et al. [31], Lu et al. [32], Lu et al. [33], Zhang et al. [34].

Microgrid Management
  Key Objectives: Enhance grid resilience and efficiency, optimizing energy distribution and usage.
  Challenges Addressed: Managing diverse energy sources, ensuring reliable and efficient operation.
  RL Approaches: DQN [39], PPO [37], A3C [35], [31].
  Methodology Highlights: Expert knowledge integration, operational flexibility with proximal policy optimization.
  Outcomes: Improved energy distribution and cost efficiency, increased resilience.
  Future Directions: Enhanced learning efficiency, integration with smart grid technologies.
  Representative Studies: Nakabi and Toivanen [35], Hu and Kwasinski [36], Zhang et al. [37], Zhang et al. [38], Shojaeighadikolaei et al. [39], Du and Li [40].

Renewable Energy Integration
  Key Objectives: Seamlessly integrate renewable energy into power systems, maximizing utilization while ensuring grid stability.
  Challenges Addressed: Addressing variability and unpredictability of renewable sources.
  RL Approaches: MA-DRL [42], Q-learning.
  Methodology Highlights: Analysis on RL's role, multi-task learning for system-wide optimization.
  Outcomes: Enhanced management of complex energy flows, significant performance improvements.
  Future Directions: Scalability and adaptability of RL methods, robustness against environmental changes.
  Representative Studies: Yang et al. [41], Cao et al. [42], Chen et al. [43], Sivamayil et al. [44], Perera and Kamalaruban [4], Ahrarinouri et al. [31].

HVAC Control
  Key Objectives: Optimize HVAC systems for energy efficiency without compromising occupant comfort.
  Challenges Addressed: Balancing energy savings with thermal comfort requirements.
  RL Approaches: PPO [29], Batch Constrained Munchausen Deep Q-learning [47], Q-learning [48], A3C [45].
  Methodology Highlights: Architecture optimization for demand response, safe control strategies, and energy consumption reduction.
  Outcomes: Reduction in HVAC energy consumption, improved operational efficiency.
  Future Directions: Adaptability to diverse buildings, pre-training models, transfer learning applications.
  Representative Studies: Azuatalam et al. [29], Zhong et al. [45], Sierla et al. [46], Liu et al. [47], Yuan et al. [48], Biemann et al. [49].


TABLE III: Comparison of RL Approaches for Optimization in Robotics

Motion Planning
  Key Objectives: Enable robots to navigate and perform tasks in dynamic environments.
  Challenges Addressed: Navigating complex and dynamic environments, learning from interaction.
  RL Approaches: PPO [50], Q-learning [51], Soft Actor-Critic (SAC) [52].
  Methodology Highlights: EfficientLPT [52] for space robots, curriculum learning for robotic arms.
  Outcomes: Improved planning accuracy, learning from human demonstrations.
  Future Directions: Integration with sensory feedback, real-time adaptation.
  Representative Studies: Wang et al. [63], Cao et al. [52], Zhou et al. [50], Yu and Chang [51].

Manipulation
  Key Objectives: Enhance robotic interaction with objects and environments.
  Challenges Addressed: Adapting to diverse objects, leveraging complex sensor inputs.
  RL Approaches: DDPG [53], Double DQN [54].
  Methodology Highlights: Visuo-motor feedback, dexterous grasping in sparse environments.
  Outcomes: Significant outperformance in grasping tasks, adaptability to grippers.
  Future Directions: Incorporation of more complex sensory modalities, tactile feedback.
  Representative Studies: Joshi et al. [54], Schuck et al. [53], Han et al. [64], Rivera et al. [65], Beigomi and Zhu [66].

Multi-robot Coordination
  Key Objectives: Optimize collaborative actions of multiple robots for a common goal.
  Challenges Addressed: Resource competition, obstacle avoidance in cooperative tasks.
  RL Approaches: Multi-Robot Coordination with Deep Reinforcement Learning (MRCDRL) [55], Multi Agent Deep Reinforcement Learning [56].
  Methodology Highlights: MRCDRL [55] for cooperative action, MARL for pick-and-place optimization.
  Outcomes: Effective resource allocation and dynamic obstacle avoidance, applicability in smart manufacturing.
  Future Directions: Scalable coordination strategies for larger teams, integration with smart environments.
  Representative Studies: Wang and Deng [55], Lan et al. [56], Yang [67], Lan et al. [68], Sadhu and Konar [69], Khamassi [70].

Human-robot Collaboration
  Key Objectives: Facilitate effective interaction and cooperation between humans and robots.
  Challenges Addressed: Adaptation to human behaviors, ensuring safety and making intelligent decisions.
  RL Approaches: DQN [57], [58], SAC [59], [60], DDPG [61], Double DQN [62].
  Methodology Highlights: Human-centered DRL, explainable RL for interaction quality enhancement.
  Outcomes: Enhanced coordination in packaging tasks, adaptability to user habits during collaboration.
  Future Directions: Personalized adaptation to human habits, enhancing safety and interpretability.
  Representative Studies: Ghadirzadeh et al. [57], Iucci et al. [58], Shafti et al. [59], Cai et al. [62], Thumm et al. [60], El-Shamouty et al. [61].

C. Robotics

RL is revolutionizing robotics, making significant strides across motion planning, grasping and manipulation, multi-robot coordination, and human-robot collaboration, thereby addressing intricate challenges inherent in the field. In motion planning, RL, particularly DRL and innovative methodologies like curriculum learning, empowers robots to adeptly navigate and execute tasks in dynamic environments, enhancing adaptability and task performance [50]–[52], [63]. Grasping and manipulation benefit from DRL's ability to process complex sensor inputs, enabling robots to interact with diverse objects and environments with unprecedented flexibility and efficiency [53], [54], [64]–[66]. Multi-robot coordination leverages DRL and MARL to facilitate sophisticated collaborative strategies among robots, optimizing collective actions to achieve common goals in complex and dynamic tasks [55], [56], [67]–[70]. Human-robot collaboration (HRC) sees advancements through DRL's capacity for learning from interactions and adapting to human behaviors, significantly improving cooperation in tasks ranging from manufacturing to daily assistance [57]–[62]. Future research directions emphasize the integration of sensory feedback for real-time adaptation in motion planning, enhancing grasping tasks with complex sensory and tactile feedback, developing scalable coordination strategies for larger robot teams, and personalizing HRC to adapt to human habits while enhancing safety and interpretability. Table III succinctly encapsulates these domains by detailing key objectives, challenges addressed, RL approaches, methodology highlights, outcomes, and future directions, alongside representative studies demonstrating RL's transformative impact on robotics.
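For continuous-control problems like the manipulation and motion-planning tasks above, off-the-shelf implementations of the cited algorithm families are commonly used. The sketch below trains SAC on a small standard benchmark as a stand-in for a real robotic task; the environment choice (Pendulum-v1) and the step budget are illustrative assumptions.

    import gymnasium as gym
    from stable_baselines3 import SAC

    # Pendulum-v1 is a compact continuous-control stand-in for a manipulation task.
    env = gym.make("Pendulum-v1")
    model = SAC("MlpPolicy", env, verbose=0)   # entropy-regularized off-policy actor-critic
    model.learn(total_timesteps=20_000)

    # Roll out the learned policy deterministically for one episode.
    obs, _ = env.reset()
    done = False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated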
III. CHALLENGES, STATE OF THE ART, AND FUTURE DIRECTIONS

There has been significant progress in the field of RL for optimization in automation; however, there are still challenges to be addressed. Table IV gives a comparison of these challenges, along with the state of the art in the field and future directions that we will discuss in this section.

A. Sample Efficiency and Scalability

Sample efficiency and scalability are vital in RL to minimize training data and ensure solutions scale with task complexity. These challenges are particularly important in real-world applications where data collection is expensive or time-consuming [71], [72].

Current efforts to enhance sample efficiency and scalability include making past samples more reflective of the current model [71], [72], using evolution strategies and efficient memory in experience replay [73], [74], incorporating offline data for online learning [75], [76], and leveraging adaptive learning techniques [77], [78].

Future research should aim at algorithms with adaptive learning rates, domain-specific knowledge integration, efficient computational resource use, and cross-domain transfer learning to further improve sample efficiency and scalability in RL applications.
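One concrete instance of the replay-based ideas above is proportional prioritized experience replay, also listed in Table IV; the minimal sketch below (buffer size, exponent, and epsilon are assumed values) replays transitions with large temporal-difference errors more often, so each environment interaction contributes more to learning.

    import numpy as np
    from collections import deque

    class PrioritizedReplayBuffer:
        """Minimal proportional prioritization (importance-sampling correction omitted)."""

        def __init__(self, capacity=100_000, alpha=0.6, eps=1e-3):
            self.data = deque(maxlen=capacity)
            self.priorities = deque(maxlen=capacity)
            self.alpha, self.eps = alpha, eps

        def add(self, transition, td_error=1.0):
            # New transitions receive a priority derived from their (initially optimistic) TD error.
            self.data.append(transition)
            self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

        def sample(self, batch_size):
            probs = np.array(self.priorities)
            probs = probs / probs.sum()
            idx = np.random.choice(len(self.data), size=batch_size, p=probs)
            return [self.data[i] for i in idx], idx

        def update_priorities(self, idx, td_errors):
            for i, err in zip(idx, td_errors):
                self.priorities[i] = (abs(err) + self.eps) ** self.alpha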
B. Safety and Robustness

Ensuring safety and robustness in RL is crucial, especially for applications in critical domains like autonomous driving and healthcare. Safe RL algorithms aim to learn policies that satisfy safety constraints during both training and deployment [107].

Current strategies for ensuring safety include developing concepts of safety robustness [79], frameworks for robust policies [80], tackling observational adversarial attacks [81], integrating robust-control-barrier-function layers [82], managing safety requirements with robust action governor [83], enforcing safety via robust Model Predictive Control (MPC) [84], offering robustness guarantees [85], [86], improving policy robustness through falsification-based adversarial learning [87], and inducing a safety curriculum [108].
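A simple version of the intervention idea behind several of these approaches (action governors, control-barrier-function layers, robust MPC filters) is to check each proposed action against a known constraint and fall back to a conservative controller when it fails; the sketch below is a generic shield with hand-written checks, given purely as an assumed illustration rather than any specific cited method.

    import numpy as np

    class SafetyShield:
        """Wraps a learned policy and overrides actions that violate a known constraint."""

        def __init__(self, policy_fn, is_safe_fn, fallback_fn):
            self.policy_fn = policy_fn        # maps state -> proposed action
            self.is_safe_fn = is_safe_fn      # maps (state, action) -> bool
            self.fallback_fn = fallback_fn    # conservative backup controller

        def act(self, state):
            action = self.policy_fn(state)
            if self.is_safe_fn(state, action):
                return action
            return self.fallback_fn(state)    # intervene only when the proposed action is unsafe

    # Toy example: keep a 1-D position inside [-1, 1] when the action is a velocity command.
    shield = SafetyShield(
        policy_fn=lambda s: float(np.clip(np.random.normal(), -1, 1)),  # stand-in for a learned policy
        is_safe_fn=lambda s, a: abs(s + 0.1 * a) <= 1.0,                # one-step constraint check
        fallback_fn=lambda s: -0.5 * np.sign(s),                        # push back toward the safe set
    )
    safe_action = shield.act(0.98)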
TABLE IV: Current Challenges, State of the Art, and Future Directions for RL for Optimization in Automation

Sample Efficiency and Scalability
  Description: Reducing the data needed for learning and ensuring scalability.
  RL Approaches: PPO, SAC, Model-based Policy Optimization (MBPO), Dreamer, IMPALA, Acme.
  State of the Art: Model-based RL with planning algorithms, off-policy learning with prioritized experience replay, and large-scale distributed RL systems.
  Future Directions: Focus on algorithms with adaptive learning rates and cross-domain transfer learning.
  Related Studies: Tianyue Cao [71], Florian E. Dorner [72], Suri et al. [73], Yang et al. [74], Ball et al. [75], Li et al. [76], Ly et al. [77], Wang et al. [78].

Safety and Robustness
  Description: Ensuring RL policies perform safely under uncertain conditions.
  RL Approaches: Constrained Policy Optimization (CPO), Lyapunov-based approaches, State-wise Safe RL, Probabilistic constraint methods.
  State of the Art: Formal methods for policy verification, robust adversarial training, and safe exploration techniques.
  Future Directions: Integrating formal verification methods and enhancing human-RL interaction.
  Related Studies: Hao Xiong and Xiumin Diao [79], Li et al. [80], Liu et al. [81], Emam et al. [82], Li et al. [83], [84], Queeney et al. [85], Md Asifur Rahman and Sarra M. Alqahtani [86], Wang et al. [87].

Interpretability and Trustworthiness
  Description: Developing RL models whose actions are transparent and understandable.
  RL Approaches: MARL, Q-learning, Deep RL, DQN, PPO, TD3, SAC.
  State of the Art: Feature attribution, policy distillation, and interpretable models like decision trees and attention mechanisms.
  Future Directions: Improving the foundation of interpretable models and applying self-supervised learning for interpretable representations.
  Related Studies: Glanois et al. [88], Duo Xu and Faramarz Fekri [89], Mansour et al. [90], Eckstein et al. [91], Alharin et al. [92], Shi et al. [93], Dao et al. [94].

Transfer Learning and Meta-learning
  Description: Enabling RL systems to rapidly adapt to new tasks using knowledge from past experiences.
  RL Approaches: A3C, Meta-RL, Meta-RL with Context-conditioned Action Translator (MCAT), TD3.
  State of the Art: Context-based meta-learning frameworks, multi-task learning techniques, and fine-tuning pre-trained models.
  Future Directions: Developing algorithms that generalize across a wider range of tasks and enhance transfer learning capabilities.
  Related Studies: Hospedales et al. [95], Guo et al. [96], Narvekar et al. [97], Varma et al. [98], Sasso et al. [99], Ren et al. [100].

Real-world Deployment and Integration
  Description: Bridging the gap between theoretical advancements and practical utility in RL deployment.
  RL Approaches: Behavior-Regularized Model-ENsemble (BREMEN), Distributional Maximum a Posteriori Policy Optimization (DMPO), Distributed Distributional Deterministic Policy Gradient (D4PG).
  State of the Art: Scalable RL architectures, robust policy deployment strategies, and open-source benchmarks and toolkits.
  Future Directions: Prioritizing deployment efficiency, enhancing human-RL interaction, and fostering academia-industry collaboration.
  Related Studies: Dulac-Arnold et al. [101], Matsushima et al. [102], Yahmed et al. [103], Li et al. [104], Garau-Luis et al. [105], Kanso and Patra [106].

Future research directions should focus on developing scalable safe RL algorithms for high-dimensional continuous control tasks, integrating formal verification methods with RL, and improving the adaptability of safe RL algorithms to dynamic environments.

C. Interpretability and Trustworthiness

Ensuring RL models are interpretable and trustworthy is essential for applications in healthcare, autonomous systems, and finance, requiring transparent, understandable, and reliable decision-making processes.

Current research to improve interpretability includes distinguishing between interpretability and explainability [88], integrating symbolic logic with deep RL for transparency [89], achieving policy interpretability in structured environments [90], interpreting RL modeling in cognitive sciences [91], discovering interpretable features in vision-based RL [93], and introducing sparse evidence collection for human interpretation [94].

Advancements will focus on foundational improvements to make models intrinsically understandable, incorporating human feedback, advancing feature discovery techniques, and applying self-supervised learning for natural interpretability, aiming for a deeper human understanding of RL behaviors.
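One widely used route to such interpretable surrogates (decision trees are listed under this challenge in Table IV) is to distill a trained policy into a shallow tree by imitating its actions on visited states. The sketch below assumes a trained policy exposed as a predict function and a set of logged states; both names are placeholders, not artifacts of any cited work.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    def distill_policy_to_tree(policy_predict, states, max_depth=4):
        """Fit a shallow decision tree that imitates the policy's discrete actions."""
        states = np.asarray(states)                      # shape (n_samples, n_features)
        actions = np.array([policy_predict(s) for s in states])
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(states, actions)
        return tree

    # Hypothetical usage with a trained agent `model` and logged observations `collected_states`:
    # tree = distill_policy_to_tree(lambda s: int(model.predict(s, deterministic=True)[0]),
    #                               collected_states)
    # print(export_text(tree))   # human-readable if/else rules approximating the policy

The tree depth then gives a direct handle on the fidelity-versus-readability trade-off.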
D. Transfer Learning and Meta-learning

Transfer learning and meta-learning address the need for RL systems to efficiently adapt to new tasks using knowledge from past experiences, aiming to improve learning efficiency and generalization across various environments.

Hospedales et al. [95] highlight meta-learning's role in adaptability across tasks. Guo et al. [96] develop an action translator for meta-RL to enhance exploration and efficiency. Narvekar et al. [97] present a curriculum learning framework that uses task sequencing for improved learning in complex scenarios. Varma et al. [98] demonstrate the benefits of using pre-trained models like ResNet50 to boost RL performance. Sasso et al. [99] and Ren et al. [100] investigate multi-source transfer learning and meta-RL for fast adaptation based on human preferences.

Future efforts will focus on algorithms that better generalize across diverse tasks, with a push towards unsupervised and self-supervised learning to advance transfer learning capabilities. There's also a growing interest in models that autonomously leverage past knowledge.
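A basic form of the transfer idea above is to warm-start training on a new task from a policy learned on a related source task instead of starting from scratch. The sketch below does this with stable-baselines3; the environment names, file path, and step budgets are illustrative assumptions (a real transfer would use a genuinely different target task).

    import gymnasium as gym
    from stable_baselines3 import PPO

    # Pre-train on a source task and save the resulting policy.
    source_env = gym.make("CartPole-v1")
    model = PPO("MlpPolicy", source_env, verbose=0)
    model.learn(total_timesteps=50_000)
    model.save("source_policy")            # illustrative path

    # Warm-start on the target task from the saved weights and fine-tune briefly.
    target_env = gym.make("CartPole-v1")   # stand-in for a related but different task
    model = PPO.load("source_policy", env=target_env)
    model.learn(total_timesteps=10_000, reset_num_timesteps=False)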
E. Real-world Deployment and Integration

Real-world RL model deployment involves overcoming the divide between theoretical research and practical application, ensuring model robustness, and aligning simulated training environments with real-world conditions.

Dulac-Arnold et al. [101] highlight real-world RL deployment challenges, introducing benchmarks for complexity. Matsushima et al. [102] focus on efficient deployment with minimal data. Yahmed et al. [103] outline deployment challenges and emphasize the need for solutions. Li et al. [104] advocate for incorporating human feedback during deployment for safety. Garau-Luis et al. [105] discuss DRL deployment advancements, while Kanso and Patra [106] discuss engineering solutions for RL scalability.

Future efforts will center on algorithms and frameworks enhancing deployment efficiency and real-world relevance, generalization from simulations to reality, improving human-RL interactions, and robust, scalable deployment platforms. Domain-specific challenges and academia-industry collaboration are pivotal for RL's real-world success.

IV. CONCLUSION

Reinforcement Learning (RL) has showcased its vast capabilities in sectors such as manufacturing, energy systems, and robotics, driven by deep learning innovations that tackle complex challenges. Despite these advancements, real-world deployment introduces challenges requiring extensive research for practical RL implementation. This review emphasizes the need for improved sample efficiency, model safety, interpretability, and real-world integration strategies. To meet these requirements, a comprehensive approach is necessary, integrating algorithmic advancements, domain-specific insights, robust benchmarks, and understanding the balance between theory and practice. Moreover, integrating human feedback and ethical considerations is crucial for the responsible deployment of RL. Ultimately, RL's transition from theory to a key AI component marks significant progress, with ongoing efforts expected to overcome current obstacles, leveraging RL's full potential in intelligent decision-making and system optimization.

References

[1] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 2018.
[2] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[3] C. Li, P. Zheng, Y. Yin, B. Wang, and L. Wang, "Deep reinforcement learning in smart manufacturing: A review and prospects," CIRP Journal of Manufacturing Science and Technology, vol. 40, pp. 75–101, 2023.
[4] A. Perera and P. Kamalaruban, "Applications of reinforcement learning in energy systems," Renewable and Sustainable Energy Reviews, vol. 137, p. 110618, 2021.
[5] J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
[6] A. Esteso, D. Peidro, J. Mula, and M. Díaz-Madroñero, "Reinforcement learning applied to production planning and control," International Journal of Production Research, vol. 61, no. 16, pp. 5772–5789, 2023.
[7] R. Nian, J. Liu, and B. Huang, "A review on reinforcement learning: Introduction and applications in industrial process control," Computers & Chemical Engineering, vol. 139, p. 106886, 2020.
[8] R. N. Boute, J. Gijsbrechts, W. Van Jaarsveld, and N. Vanvuchelen, "Deep reinforcement learning for inventory control: A roadmap," European Journal of Operational Research, vol. 298, no. 2, pp. 401–412, 2022.
[9] C. Blum and A. Roli, "Metaheuristics in combinatorial optimization: Overview and conceptual comparison," ACM Computing Surveys (CSUR), vol. 35, no. 3, pp. 268–308, 2003.
[10] Y. Li, "Deep reinforcement learning: An overview," arXiv preprint arXiv:1701.07274, 2017.
[11] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, "Deep reinforcement learning: A brief survey," IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 26–38, 2017.
[12] C. D. Hubbs, C. Li, N. V. Sahinidis, I. E. Grossmann, and J. M. Wassick, "A deep reinforcement learning approach for chemical production scheduling," Computers & Chemical Engineering, vol. 141, p. 106982, 2020.
[13] D. Shi, W. Fan, Y. Xiao, T. Lin, and C. Xing, "Intelligent scheduling of discrete automated production line via deep reinforcement learning," International Journal of Production Research, vol. 58, no. 11, pp. 3362–3380, 2020.
[14] F. Guo, Y. Li, A. Liu, and Z. Liu, "A reinforcement learning method to scheduling problem of steel production process," in Journal of Physics: Conference Series, vol. 1486, no. 7. IOP Publishing, 2020, p. 072035.
[15] M. Mowbray, D. Zhang, and E. A. D. R. Chanona, "Distributional reinforcement learning for scheduling of chemical production processes," arXiv preprint arXiv:2203.00636, 2022.
[16] N. N. Sultana, H. Meisheri, V. Baniwal, S. Nath, B. Ravindran, and H. Khadilkar, "Reinforcement learning for multi-product multi-node inventory management in supply chains," arXiv preprint arXiv:2006.04037, 2020.
[17] B. J. De Moor, J. Gijsbrechts, and R. N. Boute, "Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management," European Journal of Operational Research, vol. 301, no. 2, pp. 535–545, 2022.
[18] M. Khirwar, K. S. Gurumoorthy, A. A. Jain, and S. Manchenahally, "Cooperative multi-agent reinforcement learning for inventory management," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2023, pp. 619–634.
[19] R. Leluc, E. Kadoche, A. Bertoncello, and S. Gourvénec, "Marlim: Multi-agent reinforcement learning for inventory management," arXiv preprint arXiv:2308.01649, 2023.
[20] O. Ogunfowora and H. Najjaran, "Reinforcement and deep reinforcement learning-based solutions for machine maintenance planning, scheduling policies, and optimization," Journal of Manufacturing Systems, vol. 70, pp. 244–263, 2023.
[21] N. Yousefi, S. Tsianikas, and D. W. Coit, "Reinforcement learning for dynamic condition-based maintenance of a system with individually repairable components," Quality Engineering, vol. 32, no. 3, pp. 388–408, 2020.
[22] ——, "Dynamic maintenance model for a repairable multi-component system using deep reinforcement learning," Quality Engineering, vol. 34, no. 1, pp. 16–35, 2022.
[23] P. Andrade, C. Silva, B. Ribeiro, and B. F. Santos, "Aircraft maintenance check scheduling using reinforcement learning," Aerospace, vol. 8, no. 4, p. 113, 2021.
[24] J. Thomas, M. P. Hernández, A. K. Parlikad, and R. Piechocki, "Network maintenance planning via multi-agent reinforcement learning," in 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2021, pp. 2289–2295.
[25] Z. J. Viharos and R. Jakab, "Reinforcement learning for statistical process control in manufacturing," Measurement, vol. 182, p. 109616, 2021.
[26] A. Kuhnle, M. C. May, L. Schäfer, and G. Lanza, "Explainable reinforcement learning in production control of job shop manufacturing system," International Journal of Production Research, vol. 60, no. 19, pp. 5812–5834, 2022.
[27] M. Mowbray, R. Smith, E. A. Del Rio-Chanona, and D. Zhang, "Using process data to generate an optimal control policy via apprenticeship and reinforcement learning," AIChE Journal, vol. 67, no. 9, p. e17306, 2021.
[28] Y. Li, J. Du, and W. Jiang, "Reinforcement learning for process control with application in semiconductor manufacturing," IISE Transactions, pp. 1–15, 2023.
[29] D. Azuatalam, W.-L. Lee, F. de Nijs, and A. Liebman, "Reinforcement learning for whole-building hvac control and demand response," Energy and AI, vol. 2, p. 100020, 2020.
[30] D. Jang, L. Spangher, M. Khattar, U. Agwan, and C. Spanos, "Using meta reinforcement learning to bridge the gap between simulation and experiment in energy demand response," in Proceedings of the Twelfth ACM International Conference on Future Energy Systems, 2021, pp. 483–487.
[31] M. Ahrarinouri, M. Rastegar, and A. R. Seifi, "Multiagent reinforcement learning for energy management in residential buildings," IEEE Transactions on Industrial Informatics, vol. 17, no. 1, pp. 659–666, 2020.
[32] R. Lu, R. Bai, Z. Luo, J. Jiang, M. Sun, and H.-T. Zhang, "Deep reinforcement learning-based demand response for smart facilities energy management," IEEE Transactions on Industrial Electronics, vol. 69, no. 8, pp. 8554–8565, 2021.
[33] R. Lu, Y.-C. Li, Y. Li, J. Jiang, and Y. Ding, "Multi-agent deep reinforcement learning based demand response for discrete manufacturing systems energy management," Applied Energy, vol. 276, p. 115473, 2020.
[34] X. Zhang, R. Lu, J. Jiang, S. H. Hong, and W. S. Song, "Testbed implementation of reinforcement learning-based demand response energy management system," Applied Energy, vol. 297, p. 117131, 2021.
[35] T. A. Nakabi and P. Toivanen, "Deep reinforcement learning for energy management in a microgrid with flexible demand," Sustainable Energy, Grids and Networks, vol. 25, p. 100413, 2021.
[36] R. Hu and A. Kwasinski, "Energy management for microgrids using a reinforcement learning algorithm," in 2021 IEEE Green Energy and Smart Systems Conference (IGESSC). IEEE, 2021, pp. 1–6.
[37] B. Zhang, Z. Chen, and A. M. Ghias, "Deep reinforcement learning-based energy management strategy for a microgrid with flexible loads," in 2023 International Conference on Power Energy Systems and Applications (ICoPESA). IEEE, 2023, pp. 187–191.
[38] W. Zhang, H. Qiao, X. Xu, J. Chen, J. Xiao, K. Zhang, Y. Long, and Y. Zuo, "Energy management in microgrid based on deep reinforcement learning with expert knowledge," in International Workshop on Automation, Control, and Communication Engineering (IWACCE 2022), vol. 12492. SPIE, 2022, pp. 275–284.
[39] A. Shojaeighadikolaei, A. Ghasemi, A. G. Bardas, R. Ahmadi, and M. Hashemi, "Weather-aware data-driven microgrid energy management using deep reinforcement learning," in 2021 North American Power Symposium (NAPS). IEEE, 2021, pp. 1–6.
[40] Y. Du and F. Li, "Intelligent multi-microgrid energy management based on deep neural network and model-free reinforcement learning," IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1066–1076, 2019.
[41] T. Yang, L. Zhao, W. Li, and A. Y. Zomaya, "Reinforcement learning in sustainable energy and electric systems: A survey," Annual Reviews in Control, vol. 49, pp. 145–163, 2020.
[42] D. Cao, W. Hu, J. Zhao, G. Zhang, B. Zhang, Z. Liu, Z. Chen, and F. Blaabjerg, "Reinforcement learning and its applications in modern power and energy systems: A review," Journal of Modern Power Systems and Clean Energy, vol. 8, no. 6, pp. 1029–1042, 2020.
[43] X. Chen, G. Qu, Y. Tang, S. Low, and N. Li, "Reinforcement learning for selective key applications in power systems: Recent advances and future challenges," IEEE Transactions on Smart Grid, vol. 13, no. 4, pp. 2935–2958, 2022.
[44] K. Sivamayil, E. Rajasekar, B. Aljafari, S. Nikolovski, S. Vairavasundaram, and I. Vairavasundaram, "A systematic study on reinforcement learning based applications," Energies, vol. 16, no. 3, p. 1512, 2023.
[45] X. Zhong, Z. Zhang, R. Zhang, and C. Zhang, "End-to-end deep reinforcement learning control for hvac systems in office buildings," Designs, vol. 6, no. 3, p. 52, 2022.
[46] S. Sierla, H. Ihasalo, and V. Vyatkin, "A review of reinforcement learning applications to control of heating, ventilation and air conditioning systems," Energies, vol. 15, no. 10, p. 3526, 2022.
[47] H.-Y. Liu, B. Balaji, S. Gao, R. Gupta, and D. Hong, "Safe hvac control via batch reinforcement learning," in 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS). IEEE, 2022, pp. 181–192.
[48] X. Yuan, Y. Pan, J. Yang, W. Wang, and Z. Huang, "Study on the application of reinforcement learning in the operation optimization of hvac system," in Building Simulation, vol. 14. Springer, 2021, pp. 75–87.
[49] M. Biemann, F. Scheller, X. Liu, and L. Huang, "Experimental evaluation of model-free reinforcement learning algorithms for continuous hvac control," Applied Energy, vol. 298, p. 117164, 2021.
[50] D. Zhou, R. Jia, and H. Yao, "Robotic arm motion planning based on curriculum reinforcement learning," in 2021 6th International Conference on Control and Robotics Engineering (ICCRE). IEEE, 2021, pp. 44–49.
[51] T. Yu and Q. Chang, "Reinforcement learning based user-guided motion planning for human-robot collaboration," arXiv preprint arXiv:2207.00492, 2022.
[52] Y. Cao, S. Wang, X. Zheng, W. Ma, X. Xie, and L. Liu, "Reinforcement learning with prior policy guidance for motion planning of dual-arm free-floating space robot," Aerospace Science and Technology, vol. 136, p. 108098, 2023.
[53] M. Schuck, J. Brüdigam, A. Capone, S. Sosnowski, and S. Hirche, "Dext-gen: Dexterous grasping in sparse reward environments with full orientation control," arXiv preprint arXiv:2206.13966, 2022.
[54] S. Joshi, S. Kumra, and F. Sahin, "Robotic grasping using deep reinforcement learning," in 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE). IEEE, 2020, pp. 1461–1466.
[55] D. Wang, H. Deng, and Z. Pan, "Mrcdrl: Multi-robot coordination with deep reinforcement learning," Neurocomputing, vol. 406, pp. 68–76, 2020.
[56] X. Lan, Y. Qiao, and B. Lee, "Towards pick and place multi robot coordination using multi-agent deep reinforcement learning," in 2021 7th International Conference on Automation, Robotics and Applications (ICARA). IEEE, 2021, pp. 85–89.
[57] A. Ghadirzadeh, X. Chen, W. Yin, Z. Yi, M. Björkman, and D. Kragic, "Human-centered collaborative robots with deep reinforcement learning," IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 566–571, 2020.
[58] A. Iucci, A. Hata, A. Terra, R. Inam, and I. Leite, "Explainable reinforcement learning for human-robot collaboration," in 2021 20th International Conference on Advanced Robotics (ICAR). IEEE, 2021, pp. 927–934.
[59] A. Shafti, J. Tjomsland, W. Dudley, and A. A. Faisal, "Real-world human-robot collaborative reinforcement learning," in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 11161–11166.
[60] J. Thumm, F. Trost, and M. Althoff, "Human-robot gym: Benchmarking reinforcement learning in human-robot collaboration," arXiv preprint arXiv:2310.06208, 2023.
[61] M. El-Shamouty, X. Wu, S. Yang, M. Albus, and M. F. Huber, "Towards safe human-robot collaboration using deep reinforcement learning," in 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 4899–4905.
[62] Z. Cai, Z. Feng, L. Zhou, C. Ai, H. Shao, X. Yang et al., "A framework and algorithm for human-robot collaboration based on multimodal reinforcement learning," Computational Intelligence and Neuroscience, vol. 2022, 2022.
[63] J. Wang, T. Zhang, N. Ma, Z. Li, H. Ma, F. Meng, and M. Q.-H. Meng, "A survey of learning-based robot motion planning," IET Cyber-Systems and Robotics, vol. 3, no. 4, pp. 302–314, 2021.
[64] D. Han, B. Mulyana, V. Stankovic, and S. Cheng, "A survey on deep reinforcement learning algorithms for robotic manipulation," Sensors, vol. 23, no. 7, p. 3762, 2023.
[65] P. Rivera, J. Oh, E. Valarezo, G. Ryu, H. Jung, J. H. Lee, J. G. Jeong, and T.-S. Kim, "Reward shaping to learn natural object manipulation with an anthropomorphic robotic hand and hand pose priors via on-policy reinforcement learning," in 2021 International Conference on Information and Communication Technology Convergence (ICTC). IEEE, 2021, pp. 167–171.
[66] B. Beigomi and Z. H. Zhu, "Enhancing robotic grasping of free-floating targets with soft actor-critic algorithm and tactile sensors: a focus on the pre-grasp stage," in AIAA SCITECH 2024 Forum, 2024, p. 2419.
[67] X. Yang, "Reinforcement learning for multi-robot system: A review," in 2021 2nd International Conference on Computing and Data Science (CDS). IEEE, 2021, pp. 203–213.
[68] X. Lan, Y. Qiao, and B. Lee, "Coordination of a multi robot system for pick and place using reinforcement learning," in 2022 2nd International Conference on Computers and Automation (CompAuto). IEEE, 2022, pp. 87–92.
[69] A. K. Sadhu and A. Konar, Multi-agent coordination: A reinforcement learning approach. John Wiley & Sons, 2020.
[70] M. Khamassi, "Adaptive coordination of multiple learning strategies in brains and robots," in Theory and Practice of Natural Computing: 9th International Conference, TPNC 2020, Taoyuan, Taiwan, December 7–9, 2020, Proceedings 9. Springer, 2020, pp. 3–22.
[71] T. Cao, "Study of sample efficiency improvements for reinforcement learning algorithms," in 2020 IEEE Integrated STEM Education Conference (ISEC). IEEE, 2020, pp. 1–1.
[72] F. E. Dorner, "Measuring progress in deep reinforcement learning sample efficiency," arXiv preprint arXiv:2102.04881, 2021.
[73] K. Suri, X. Q. Shi, K. N. Plataniotis, and Y. A. Lawryshyn, "Maximum mutation reinforcement learning for scalable control," arXiv preprint arXiv:2007.13690, 2020.
[74] D. Yang, X. Qin, X. Xu, C. Li, and G. Wei, "Sample efficient reinforcement learning method via high efficient episodic memory," IEEE Access, vol. 8, pp. 129274–129284, 2020.
[75] P. J. Ball, L. Smith, I. Kostrikov, and S. Levine, "Efficient online reinforcement learning with offline data," in International Conference on Machine Learning. PMLR, 2023, pp. 1577–1594.
[76] G. Li, Y. Wei, Y. Chi, Y. Gu, and Y. Chen, "Breaking the sample size barrier in model-based reinforcement learning with a generative model," Advances in Neural Information Processing Systems, vol. 33, pp. 12861–12872, 2020.
[77] A. Ly, R. Dazeley, P. Vamplew, F. Cruz, and S. Aryal, "Elastic step ddpg: Multi-step reinforcement learning for improved sample efficiency," in 2023 International Joint Conference on Neural Networks (IJCNN). IEEE, 2023, pp. 1–6.
[78] Z. Wang, J. Wang, Q. Zhou, B. Li, and H. Li, "Sample-efficient reinforcement learning via conservative model-based actor-critic," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 8, 2022, pp. 8612–8620.
[79] H. Xiong and X. Diao, "Safety robustness of reinforcement learning policies: A view from robust control," Neurocomputing, vol. 422, pp. 12–21, 2021.
[80] Z. Li, C. Hu, Y. Wang, Y. Yang, and S. E. Li, "Safe reinforcement learning with dual robustness," arXiv preprint arXiv:2309.06835, 2023.
[81] Z. Liu, Z. Guo, Z. Cen, H. Zhang, J. Tan, B. Li, and D. Zhao, "On the robustness of safe reinforcement learning under observational perturbations," arXiv preprint arXiv:2205.14691, 2022.
[82] Y. Emam, G. Notomista, P. Glotfelter, Z. Kira, and M. Egerstedt, "Safe reinforcement learning using robust control barrier functions," IEEE Robotics and Automation Letters, 2022.
[83] Y. Li, N. Li, H. E. Tseng, A. Girard, D. Filev, and I. Kolmanovsky, "Safe reinforcement learning using robust action governor," in Learning for Dynamics and Control. PMLR, 2021, pp. 1093–1104.
[84] M. Zanon and S. Gros, "Safe reinforcement learning using robust mpc," IEEE Transactions on Automatic Control, vol. 66, no. 8, pp. 3638–3652, 2020.
[85] J. Queeney, E. C. Ozcan, I. C. Paschalidis, and C. G. Cassandras, "Optimal transport perturbations for safe reinforcement learning with robustness guarantees," arXiv preprint arXiv:2301.13375, 2023.
[86] M. A. Rahman and S. Alqahtani, "Task-agnostic safety for reinforcement learning," in Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023, pp. 139–148.
[87] X. Wang, S. Nair, and M. Althoff, "Falsification-based robust adversarial reinforcement learning," in 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2020, pp. 205–212.
[88] C. Glanois, P. Weng, M. Zimmer, D. Li, T. Yang, J. Hao, and W. Liu, "A survey on interpretable reinforcement learning," arXiv preprint arXiv:2112.13112, 2021.
[89] D. Xu and F. Fekri, "Interpretable model-based hierarchical reinforcement learning using inductive logic programming," arXiv preprint arXiv:2106.11417, 2021.
[90] Y. Mansour, M. Moshkovitz, and C. Rudin, "There is no accuracy-interpretability tradeoff in reinforcement learning for mazes," arXiv preprint arXiv:2206.04266, 2022.
[91] M. K. Eckstein, L. Wilbrecht, and A. G. Collins, "What do reinforcement learning models measure? interpreting model parameters in cognition and neuroscience," Current Opinion in Behavioral Sciences, vol. 41, pp. 128–137, 2021.
[92] A. Alharin, T.-N. Doan, and M. Sartipi, "Reinforcement learning interpretation methods: A survey," IEEE Access, vol. 8, pp. 171058–171077, 2020.
[93] W. Shi, G. Huang, S. Song, Z. Wang, T. Lin, and C. Wu, "Self-supervised discovering of interpretable features for reinforcement learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 5, pp. 2712–2724, 2020.
[94] G. Dao, W. H. Huff, and M. Lee, "Learning sparse evidence-driven interpretation to understand deep reinforcement learning agents," in 2021 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2021, pp. 1–7.
[95] T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, "Meta-learning in neural networks: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5149–5169, 2021.
[96] Y. Guo, Q. Wu, and H. Lee, "Learning action translator for meta reinforcement learning on sparse-reward tasks," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 6, 2022, pp. 6792–6800.
[97] S. Narvekar, B. Peng, M. Leonetti, J. Sinapov, M. E. Taylor, and P. Stone, "Curriculum learning for reinforcement learning domains: A framework and survey," Journal of Machine Learning Research, vol. 21, no. 181, pp. 1–50, 2020.
[98] N. S. Varma, V. Sinha et al., "Effective reinforcement learning using transfer learning," in 2022 IEEE International Conference on Data Science and Information System (ICDSIS). IEEE, 2022, pp. 1–6.
[99] R. Sasso, "Multi-source transfer learning for deep model-based reinforcement learning," Ph.D. dissertation, 2021.
[100] Z. Ren, A. Liu, Y. Liang, J. Peng, and J. Ma, "Efficient meta reinforcement learning for preference-based fast adaptation," Advances in Neural Information Processing Systems, vol. 35, pp. 15502–15515, 2022.
[101] G. Dulac-Arnold, N. Levine, D. J. Mankowitz, J. Li, C. Paduraru, S. Gowal, and T. Hester, "An empirical investigation of the challenges of real-world reinforcement learning," arXiv preprint arXiv:2003.11881, 2020.
[102] T. Matsushima, H. Furuta, Y. Matsuo, O. Nachum, and S. Gu, "Deployment-efficient reinforcement learning via model-based offline optimization," arXiv preprint arXiv:2006.03647, 2020.
[103] A. H. Yahmed, A. A. Abbassi, A. Nikanjam, H. Li, and F. Khomh, "Deploying deep reinforcement learning systems: A taxonomy of challenges," in 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2023, pp. 26–38.
[104] Z. Li, K. Xu, L. Liu, L. Li, D. Ye, and P. Zhao, "Deploying offline reinforcement learning with human feedback," arXiv preprint arXiv:2303.07046, 2023.
[105] J. J. Garau-Luis, E. Crawley, and B. Cameron, "Evaluating the progress of deep reinforcement learning in the real world: aligning domain-agnostic and domain-specific research," arXiv preprint arXiv:2107.03015, 2021.
[106] A. Kanso and K. Patra, "Engineering a platform for reinforcement learning workloads," in Proceedings of the 1st International Conference on AI Engineering: Software Engineering for AI, 2022, pp. 88–89.
[107] J. García and F. Fernández, "A comprehensive survey on safe reinforcement learning," Journal of Machine Learning Research, vol. 16, no. 1, pp. 1437–1480, 2015.
[108] M. Turchetta, A. Kolobov, S. Shah, A. Krause, and A. Agarwal, "Safe reinforcement learning via curriculum induction," Advances in Neural Information Processing Systems, vol. 33, pp. 12151–12162, 2020.
