
Neurocomputing 263 (2017) 1–2


Editorial

Special issue on multi-objective reinforcement learning


Madalina Drugan a, Marco Wiering b, Peter Vamplew c,∗, Madhu Chetty c

a Technical University of Eindhoven, Eindhoven, Netherlands
b Institute of Artificial Intelligence and Cognitive Engineering, University of Groningen, Groningen, Netherlands
c School of Engineering and Information Technology, Federation University, Ballarat, Australia
∗ Corresponding author.

Article history: Received 2 June 2017; Accepted 3 June 2017; Available online 26 June 2017

Abstract

Many real-life problems involve dealing with multiple objectives. For example, in network routing the criteria may consist of energy consumption, latency, and channel capacity, which are in essence conflicting objectives. Because many problems involve multiple (conflicting) objectives, there usually does not exist a single optimal solution. In those cases, it is desirable to obtain a set of trade-off solutions between the objectives. Over the last decade this problem has also gained the attention of many researchers in the field of reinforcement learning (RL). RL addresses sequential decision problems in initially (possibly) unknown stochastic environments. The goal is the maximization of the agent’s reward in an environment that is not always completely observable. The purpose of this special issue is to obtain a broader picture of the algorithmic techniques at the confluence of multi-objective optimization and reinforcement learning. The growing interest in multi-objective reinforcement learning (MORL) was reflected in the quantity and quality of submissions received for this special issue. After a rigorous review process, seven papers were accepted for publication, and they reflect the diversity of research being carried out within this emerging field. The accepted papers consider many different aspects of algorithmic design and evaluation, and this editorial places them in a unified framework.

© 2017 Elsevier B.V. All rights reserved.

1. Test environment

The practical motivation, such as novel approaches for challenging real-world applications or the development of new algorithms with improved computational efficiency for particular problems, is essential for any proposed technique. In this issue, all the papers use benchmark environments with two or three objectives. The Deep Sea Treasure task [2,3,6] is a bi-objective environment consisting of ten Pareto-optimal states, which has often been used for testing MORL algorithms. The Bonus World used in [7] is an original three-objective environment. Another bi-objective environment that has been used to evaluate a novel multi-objective RL algorithm is the Linked Rings problem [3]. Some of the environments used consist of continuous state variables. The Cart Pole problem [5] has two objectives with continuous state values that reflect the position and velocity of the cart and the angle and angular velocity of the pole. The Water Reservoir problem [1] models an agent controlling the water level of a reservoir with three conflicting objectives: flooding, water demand, and electricity demand. The Dynamic Economic Emissions Dispatch problem [6] involves scheduling electricity generators to meet the customers’ demand while minimizing fuel cost and emissions. Resource Gathering [2] and Predator Prey [5] are three-objective environments with stochastic transitions which are related to strategic games. Two of the above multi-objective environments have stochastic transition functions [1,2]; the other environments are deterministic. In [4], an agent navigates through a maze with continuous states that contains obstacles and different kinds of areas. The problem has one primary objective, while other secondary objectives are found with an unsupervised learning method and are subsequently solved with off-policy RL techniques.
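To make the notion of a multi-objective benchmark concrete, the following minimal Python sketch (purely illustrative, not code from any of the papers above and not a faithful reproduction of Deep Sea Treasure) shows an environment whose step function returns a reward vector with one component per objective instead of a single scalar; the goal positions and payoffs are invented for the example.

import numpy as np

class ToyTwoObjectiveEnv:
    # Toy episodic chain with a 2-component reward vector: objective 0 pays
    # off when a goal is reached, objective 1 charges -1 per time step, so
    # the two objectives conflict. Dynamics are invented for illustration.

    GOALS = {3: 1.0, 7: 5.0, 10: 12.0}   # hypothetical position -> payoff

    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is -1 (move back) or +1 (move forward)
        self.pos = max(0, self.pos + action)
        reward = np.array([self.GOALS.get(self.pos, 0.0), -1.0])
        done = self.pos in self.GOALS
        return self.pos, reward, done

A policy that cares only about the first component heads for the largest payoff, while one that cares only about the second stops at the nearest goal; the trade-offs in between are exactly the Pareto-optimal policies such benchmarks are designed to expose.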


2. The methodological approach

Many of the proposed MORL algorithms use variants of the Q-learning algorithm [2–7]. In [5], multi-objectivization is used to create additional objectives alongside the primary goal in order to improve empirical efficiency. The objectives are assumed to be independent, and Q-values for each objective are learned in parallel. On top of the multi-objectivization mechanism, reward shaping is used to incorporate heuristic knowledge. The goal is to learn the Pareto front of optimal policies. The algorithm proposed in [6] uses scalarization functions and the hypervolume unary indicator to transform the reward vectors into scalar reward values. Similarly to [5], the goal is to identify the Pareto front of optimal policies when additional rewards are added to each objective through reward shaping functions. The hypervolume unary indicator is also used in [1] to measure the performance of a policy-search MORL algorithm, whose empirical performance is improved using multiple importance sampling estimators. In [3], the authors use a variant of geometric steering for multi-objective stochastic games with scalarized reward vectors. The MORL algorithm in [4] is an interesting mixture of on-line learning for the first objective and off-line learning for two independently found secondary objectives. The secondary objectives are found using unsupervised learning, and their corresponding learned policies are useful when the primary task in the environment changes. In [7], one objective is considered more important than the second objective under a so-called lexicographic ordering, and to solve this problem the RL algorithm is integrated with new variants of the softmax exploration strategy. In another line of reasoning, the authors of [2] use Pareto dominance to partially order policies; not one but several policies, with associated Q-value vectors, are optimized simultaneously.
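As a concrete illustration of the operators named in this section, the sketch below gives plain-Python versions of a linear scalarization function, a Pareto-dominance test, and a two-objective hypervolume computation (both objectives maximized). It is a generic sketch under assumed conventions, not the implementation used in [1], [2] or [6]; the example vectors and reference point are hypothetical.

import numpy as np

def linear_scalarize(reward_vec, weights):
    # Collapse a reward (or Q) vector into a single scalar via a weighted sum.
    return float(np.dot(weights, reward_vec))

def dominates(a, b):
    # True if vector a Pareto-dominates b: no worse everywhere, better somewhere.
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a >= b) and np.any(a > b))

def hypervolume_2d(points, ref):
    # Area jointly dominated by a set of 2-D points with respect to a
    # reference point ref that is dominated by every point (maximization).
    area, best_y = 0.0, ref[1]
    for x, y in sorted(points, key=lambda p: p[0], reverse=True):
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area

front = [(0.7, 8.2), (1.0, 5.0), (0.2, 9.0)]           # hypothetical returns
print(linear_scalarize(front[0], weights=(0.5, 0.5)))  # 4.45
print(dominates((1.0, 5.0), (0.7, 4.0)))               # True
print(hypervolume_2d(front, ref=(0.0, 0.0)))           # area dominated by the set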
3. Theoretical analysis

Only some of the papers in this special issue give theoretical guarantees on the expected behavior of the algorithms. In [5] and [6], proofs are provided for the convergence of MORL variants with reward shaping functions.

4. Short summary of papers in the current issue

The first three papers propose and evaluate the performance of reinforcement learning algorithms designed specifically for tasks involving multiple conflicting objectives.

• “Manifold-Based Multi-Objective Policy Search with Sample Reuse” by Simone Parisi, Matteo Pirotta, and Jan Peters: This paper extends prior approaches to policy-search learning of multiobjective policies by learning a manifold in policy-parameter space. Sampling points on this manifold can produce policies which accurately approximate the Pareto front of policies, which is more efficient than directly learning a set of these policies.
• “A Temporal Difference Method for Multi-Objective Reinforcement Learning” by Manuela Ruiz-Montiel, Lawrence Mandow and José-Luis Pérez-de-la-Cruz: Like Parisi et al., this work addresses the task of learning multiple policies which represent different Pareto-optimal tradeoffs between objectives. However, rather than policy search, this paper extends the temporal-difference Q-learning algorithm to the task of learning multiple Pareto-optimal policies (a simplified vector-valued TD update is sketched after this list).
• “Steering Approaches to Pareto-Optimal Multiobjective Reinforcement Learning” by Peter Vamplew, Rustam Issabekov, Richard Dazeley, Cameron Foale, Adam Berry, Tim Moore, and Douglas Creighton: This paper adapts the geometric steering algorithm, originally designed for stochastic multi-criteria games, to learning Pareto-optimal non-stationary policies for multiobjective Markov Decision Processes. It also provides an example of the application of the steering approach to the problem of controlling local battery storage for a household’s solar power system.
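The temporal-difference extension referenced in the second item above can be pictured as a Q-table whose entries are vectors, updated componentwise. The single-policy sketch below scalarizes the next-state Q-vectors only to pick the greedy action; it is a simplified illustration under assumed names, not the multi-policy algorithm of Ruiz-Montiel et al. [2].

import numpy as np
from collections import defaultdict

def make_q_table(n_objectives):
    # Q[s][a] is a value-estimate vector with one entry per objective.
    return defaultdict(lambda: defaultdict(lambda: np.zeros(n_objectives)))

def td_update(Q, s, a, r_vec, s_next, actions, weights, alpha=0.1, gamma=0.95):
    # Greedy next action chosen on the linearly scalarized next-state vectors;
    # the TD target and the update itself are computed componentwise.
    a_greedy = max(actions, key=lambda b: np.dot(weights, Q[s_next][b]))
    target = np.asarray(r_vec, dtype=float) + gamma * Q[s_next][a_greedy]
    Q[s][a] = Q[s][a] + alpha * (target - Q[s][a])
    return Q[s][a]

Learning several Pareto-optimal policies simultaneously, as [2] does, requires more than this single-weight update, for example maintaining sets of non-dominated Q-vectors; the sketch only shows the componentwise TD mechanics.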
The next two papers address the incorporation of additional objectives into an existing reinforcement learning task.

• “Identification and Off-Policy Learning of Multiple Objectives Using Adaptive Clustering” by Thommen Karimpanal George and Erik Wilhelm: In this paper additional objectives are discovered by the agent itself during its exploration of the environment, using online unsupervised clustering. It is shown that Q-learning can be used to learn, at least partially, the values associated with these additional objectives in parallel with learning to solve the primary goal, thereby minimizing the need for additional exploration should the goal change (a sketch of such parallel per-objective updates follows this list).
• “Multi-objectivization and Ensembles of Shapings in Reinforcement Learning” by Tim Brys, Anna Harutyunyan, Peter Vrancx, Matthew Taylor and Ann Nowé: This paper examines the use of multi-objectivization to improve the performance of a reinforcement learning agent on a single-objective task. Additional objectives are introduced either by decomposition of the original objective or based on external heuristic knowledge. This introduces an additional source of diversity, which supports the use of ensemble methods that significantly improve learning performance.
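The parallel, off-policy learning of values for additional objectives described in the two items above can be pictured as several objective-specific Q-tables updated from the same stream of experience, whichever objective is currently driving behaviour. The sketch below is a generic illustration under assumed names, not the method of [4] or the ensemble machinery of [5].

from collections import defaultdict

class ParallelObjectiveLearner:
    # One tabular Q-function per objective, all updated off-policy from the
    # same transitions (e.g. the primary task plus objectives discovered by
    # clustering, or shaped copies of a decomposed objective).
    def __init__(self, n_objectives, n_actions, alpha=0.1, gamma=0.95):
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.Q = [defaultdict(float) for _ in range(n_objectives)]

    def update(self, s, a, r_vec, s_next):
        # r_vec holds one reward component per objective for this transition.
        for k, r in enumerate(r_vec):
            best_next = max(self.Q[k][(s_next, b)] for b in range(self.n_actions))
            td_error = r + self.gamma * best_next - self.Q[k][(s, a)]
            self.Q[k][(s, a)] += self.alpha * td_error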
The final two papers in the issue examine how methods which are widely used in single-objective reinforcement learning can be applied in the context of multiobjective reinforcement learning.

• “Policy Invariance under Reward Transformations for Multi-Objective Reinforcement Learning” by Patrick Mannion, Sam Devlin, Karl Mason, Jim Duggan and Enda Howley: Potential-Based Reward Shaping (PBRS) has been shown to be an effective means of accelerating learning in single-objective problems, with proven guarantees that it does not interfere with the final optimal policy. This paper extends these theoretical guarantees to the case of multiple objectives, for both single-agent and multi-agent systems. It also provides the first empirical results for the use of PBRS within multiobjective reinforcement learning (a per-objective shaping sketch follows this list).
• “Softmax Exploration Strategies for Multiobjective Reinforcement Learning” by Peter Vamplew, Richard Dazeley, and Cameron Foale: The effectiveness of exploration strategies has been widely studied in single-objective reinforcement learning, but this paper provides one of the first intensive studies of these techniques in the context of multiple objectives, showing that unexpected complications may arise due to the introduction of additional objectives. It also proposes and evaluates two multiobjective adaptations of the widely used softmax approach to exploration.
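The two techniques covered by the last two bullets can be written down compactly. The sketch below applies a potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s) independently to each objective and selects actions with a softmax over linearly scalarized Q-vectors; both functions are generic illustrations under assumed interfaces, not the implementations from [6] or [7].

import numpy as np

def shaped_reward(r_vec, s, s_next, potentials, gamma=0.99):
    # Add a potential-based shaping term gamma*Phi(s') - Phi(s) to every
    # objective; potentials holds one potential function per objective.
    shaping = np.array([gamma * phi(s_next) - phi(s) for phi in potentials])
    return np.asarray(r_vec, dtype=float) + shaping

def softmax_action(q_vectors, weights, temperature=1.0, rng=np.random):
    # Softmax exploration over actions applied to linearly scalarized
    # Q-vectors; q_vectors has shape (n_actions, n_objectives).
    scores = np.dot(np.asarray(q_vectors), weights) / temperature
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return int(rng.choice(len(probs), p=probs))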
Acknowledgments

We would like to thank all of the authors who submitted their work for this issue, as well as the reviewers who generously gave their time and expertise during the review process. We also wish to thank the editors of Neurocomputing who supervised an independent review process for those papers for which we had a conflict of interest.

References

[1] S. Parisi, M. Pirotta, J. Peters, Manifold-based multi-objective policy search with sample reuse, Neurocomputing (2017). Special issue on multi-objective reinforcement learning.
[2] M. Ruiz-Montiel, L. Mandow, J.-L. Pérez-de-la-Cruz, A temporal difference method for multi-objective reinforcement learning, Neurocomputing (2017). Special issue on multi-objective reinforcement learning.
[3] P. Vamplew, R. Issabekov, R. Dazeley, C. Foale, A. Berry, T. Moore, D. Creighton, Steering approaches to Pareto-optimal multiobjective reinforcement learning, Neurocomputing (2017). Special issue on multi-objective reinforcement learning.
[4] T.G. Karimpanal, E. Wilhelm, Identification and off-policy learning of multiple objectives using adaptive clustering, Neurocomputing (2017). Special issue on multi-objective reinforcement learning.
[5] T. Brys, A. Harutyunyan, P. Vrancx, M. Taylor, A. Nowé, Multi-objectivization and ensembles of shapings in reinforcement learning, Neurocomputing (2017). Special issue on multi-objective reinforcement learning.
[6] P. Mannion, S. Devlin, K. Mason, J. Duggan, E. Howley, Policy invariance under reward transformations for multi-objective reinforcement learning, Neurocomputing (2017). Special issue on multi-objective reinforcement learning.
[7] P. Vamplew, R. Dazeley, C. Foale, Softmax exploration strategies for multiobjective reinforcement learning, Neurocomputing (2017). Special issue on multi-objective reinforcement learning.
