Article
Energy Management System for an Industrial Microgrid
Using Optimization Algorithms-Based Reinforcement
Learning Technique
Saugat Upadhyay, Ibrahim Ahmed and Lucian Mihet-Popa *
Faculty of Information Technology, Engineering and Economics, Østfold University College, Kobberslagerstredet 5,
1671 Fredrikstad, Norway; [email protected] (S.U.); [email protected] (I.A.)
* Correspondence: [email protected]
Abstract: The climate crisis necessitates a global shift to achieve a secure, sustainable, and affordable
energy system toward a green energy transition reaching climate neutrality by 2050. Because of this,
renewable energy sources have come to the forefront, and the research interest in microgrids that
rely on distributed generation and storage systems has exploded. Furthermore, many new markets
for energy trading, ancillary services, and frequency reserve markets have provided attractive
investment opportunities in exchange for balancing the supply and demand of electricity. Artificial
intelligence can be utilized to locally optimize energy consumption, trade energy with the main
grid, and participate in these markets. Reinforcement learning (RL) is one of the most promising
approaches to achieve this goal because it enables an agent to learn optimal behavior in a microgrid
by executing specific actions that maximize the long-term reward signal/function. The study focuses
on testing two optimization algorithms: logic-based optimization and reinforcement learning. This
paper builds on the existing research framework by combining PPO with machine learning-based
load forecasting to produce an optimal solution for an industrial microgrid in Norway under different
pricing schemes, including day-ahead pricing and peak pricing. It addresses the peak shaving and
price arbitrage challenges by taking the historical data into the algorithm and making the decisions
according to the energy consumption pattern, battery characteristics, PV production, and energy
price. The RL-based approach is implemented in Python based on real data from the site and in
combination with MATLAB-Simulink to validate its results. The application of the RL algorithm
achieved an average monthly cost saving of 20% compared with logic-based optimization. These
findings contribute to digitalization and decarbonization of energy technology, and support the fundamental goals and policies of the European Green Deal.

Keywords: EMS; PPO; BESS; optimization algorithm; peak shaving; price arbitrage

Citation: Upadhyay, S.; Ahmed, I.; Mihet-Popa, L. Energy Management System for an Industrial Microgrid Using Optimization Algorithms-Based Reinforcement Learning Technique. Energies 2024, 17, 3898. https://doi.org/10.3390/en17163898
residential microgrids [2,3]. Such microgrids help lower long-distance power transmission losses while simultaneously reducing the pollution from heavy industry [4]. IMGs are an effective instrument for adapting to diverse energy requirements. A battery energy storage system (BESS), for example, may be controlled by a microgrid to provide backup power and enhance the reliability of the IMG [5].
An energy management system (EMS) is used to optimally coordinate the power
exchange throughout the IMG and with the main grid, reducing energy costs while im-
proving flexibility and energy efficiency [6–8]. Designing and developing EMS algorithms
for day-ahead and real-time scheduling is challenging because of the complexity of the
microgrid, intermittent nature of DERs, and unpredictable load requirements [6,9]. Battery
energy storage systems (BESSs) can be effectively utilized to balance these demands and
trade energy with the main grid based on the renewable production and price of electricity.
Energy optimization in industrial microgrids has been extensively studied in the
literature. Authors in [10] developed a day-ahead multi-objective optimization framework
for industrial plant energy management, assuming that the facility had installed RESs.
Meanwhile, Ref. [11] proposed an optimal energy management method for the industrial sector that minimizes the total electricity cost with renewable generation, energy storage, and day-ahead pricing, using a state task network and mixed-integer linear programming. Ref. [12] presented a demand response strategy to reduce energy costs for industrial facilities using energy storage and distributed generation, investigated under day-ahead, time-of-use, and peak pricing schemes. These studies relied on basic optimization approaches and did not employ forecasting.
utilize forecasting. On the other hand, ref. [13] introduced an online energy management
system (EMS) for an industrial facility equipped with energy storage. The optimization
employed a rolling horizon strategy and used an artificial neural network model to forecast
and minimize the uncertainty of electricity prices. The system solved a mixed-integer
optimization problem based on the most recent forecast results for each sliding window,
which helped in scheduling responsive demands. Additionally, ref. [14] presented a real-
time EMS that used a data distribution service that incorporated an online optimization
scheme for microgrids with residential energy consumption and irradiance data from
Florida. It utilized a feed-forward neural network to predict the power consumption and
renewable energy generation. A review of energy optimization in industrial microgrids
utilizing distributed energy resources is presented in [15]. Furthermore, resource efficiency
and resiliency are important aspects of microgrid design as they affect the overall system
performance. A comparison between the microgrid stage efficiencies for each mode of
operation is presented in [16], while resilience analysis and methods of improving resiliency
in microgrids can be found in [17].
Reinforcement learning has emerged as a method to solve complex problems with
large state spaces. An RL agent starts with a random policy, which is a mapping between
the observations (inputs) and the actions (outputs), and then incrementally learns to update
and improve its policy using a reward signal that is given by the environment as an
evaluation of the quality of the action performed. The goal of the agent is to maximize the
reward signal over time. This can be achieved using a variety of methods but, generally,
there are two high-level approaches, learning through the value function, and learning
through the policy. A value function is an estimate of the future rewards obtained by taking
an action and then following a specific policy. RL agents can learn either by optimizing
the value function, the policy, or both [18]. Actor–critic learning approaches make use of
both the policy and the value function: the critic learns an estimate of the value function, and the actor updates the policy in the direction suggested by that estimate.
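To make this interaction concrete, a standard one-step actor–critic update can be written as follows (a textbook formulation included here for illustration, not a description of the specific agent used later in this paper):

$$
\begin{aligned}
\delta_t &= r_{t+1} + \gamma V_w(s_{t+1}) - V_w(s_t) &&\text{(TD error, used as an advantage estimate)}\\
w &\leftarrow w + \alpha_w\,\delta_t\,\nabla_w V_w(s_t) &&\text{(critic update)}\\
\theta &\leftarrow \theta + \alpha_\theta\,\delta_t\,\nabla_\theta \log \pi_\theta(a_t\mid s_t) &&\text{(actor update)}
\end{aligned}
$$

Here $\pi_\theta$ is the actor's policy, $V_w$ is the critic's value estimate, and $\gamma$ is the discount factor that weights future rewards.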
Reinforcement learning has been used to optimize microgrids in the residential
sector because it has proven to be a viable strategy for optimizing complex dynamic
systems [19,20]. For instance, a microgrid in Belgium saw a reduction in cost and an in-
crease in efficiency when the Deep-Q-Network (DQN) technique was implemented in [21],
assuming a fixed price of electricity. The suggested method in [22] generated three dis-
tinct consumption profiles according to the needs of the customers and used the Deep
2. Microgrid Architecture
The microgrid at the industrial site in Norway is a grid-connected system with 200 kWp
of PV generation, a 1.1 MWh battery storage system, a 360 kW electric vehicle charger, and
two types of loads. The overall system diagram can be seen in Figure 1. There are several
smart meters (denoted by SM) installed to record the energy flow. Load 1 and load 2 are
the main electricity loads, where load 1 is an industrial load and load 2 is a smaller load
from an existing old building.
The 1.1 MWh battery energy storage system (BESS) is used for backup energy supply
and storage. This stored energy is sold back to the grid when the electricity prices are
high. The 360 kW electric vehicle (EV) charger is present at the facility to charge the electric
lorries and trucks.
2.1. PV System
The PV system is distributed over three buildings. The south building has a facade-mounted configuration of 44 panels rated at 310 W each, while the southeast building is equipped with 96 roof-mounted modules at an 11° inclination. Similarly, the northwest building is configured with 74 solar panels at an 11° inclination facing northwest. The PV system also contains three inverters to couple it
with the IMG. Based on the irradiance in the area, the anticipated PV energy production
throughout 2024 was calculated using PVSOL software (Version: PVSOL premium 2024
(R2)), and the results are displayed in Figure 2. Table 1 shows the general parameters of the
PV system.
Table 1. General parameters of the PV system.

Parameters                  Values
PV Generator Output         200.88 kWp
PV Generator Surface        1059.6 m²
Number of PV Modules        648
Number of Inverters         3
PV Module Used              JAM60S01-310/PR
Speculated Annual Yield     87,594 kWh/kWp
Table 2. Battery energy storage system parameters.

Parameters                       Values
Battery Type                     LFP lithium-ion
Battery Capacity                 1105 kWh
Rated Battery Voltage            768 Vdc
Battery Voltage Range            672–852 Vdc
Max. Charge/Discharge Current    186 A
Max. Charge/Discharge Power      1000 kW
An essential component of controlling the energy transfer between the battery storage
system and the electrical grid is the bidirectional inverter or Power Conversion System
(PCS). Its primary job is to charge the batteries by converting alternating current (AC) from
the IMG into direct current (DC), and vice versa. For applications such as peak shaving,
where excess energy is kept during low demand times and released at peak demand to
sustain grid operations, this bidirectional capability is essential.
The inverter or PCS system can operate in both grid-tied and off-grid modes. It is adaptable to a range of energy storage requirements, since it handles a broad battery voltage range from 600 V to 900 V, generates up to 500 kW of nominal power, and supports up to eight battery strings, with an efficiency above 97%. For
efficient thermal management, the PCS unit uses forced air cooling, which ensures peak
performance, even at full load. The inverter specifications are displayed in Table 3.
Table 3. Inverter (PCS) specifications.

Parameters            Values
Rated Voltage         400 V (L-L)
Rated Frequency       50/60 Hz
AC Connection         3W + N
Rated Power           2 × 500 kW
Rated Current Imax    2 × 721.7 A
Power Factor          0.8–1 (leading or lagging, load-dependent)
In addition to the main components, the system also contains other IoT devices, smart meters, a GPC (Grid Power Controller), etc. These devices function as a gateway to the battery system so that it can be controlled through software. They run Linux (Ubuntu 22.04.4 LTS) and use the Modbus TCP protocol [27] to communicate with local or remote servers and to send data to the cloud, as shown in Figure 3.
(Figure 3 shows the data flow and Modbus communication between the IoT device gateway, the PV system, the BESS, and the smart meters.)
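As an illustration of how such a gateway can be polled, the snippet below reads two holding registers over Modbus TCP with the open-source pymodbus library; the IP address, register addresses, and scaling are placeholders invented for this sketch and do not reflect the site's actual register map.

```python
# Sketch: poll a Modbus TCP gateway (pymodbus 3.x).
# Host, register addresses, and scaling are illustrative placeholders.
from pymodbus.client import ModbusTcpClient

client = ModbusTcpClient("192.168.0.10", port=502)   # gateway IP (example)
if client.connect():
    # Hypothetical registers: battery SOC (x10) and active power in kW.
    result = client.read_holding_registers(address=100, count=2, slave=1)
    if not result.isError():
        soc_raw, power_kw = result.registers
        print(f"SOC: {soc_raw / 10:.1f} %, battery power: {power_kw} kW")
    client.close()
```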
Figure 1 illustrates the four smart meters in the industrial microgrid, out of which
SM1 is a virtual smart meter while SM2, SM3, and SM4 are the physically present meters
connected to the loads and DERs. These smart meters measure apparent power, active
power, and reactive power using the true RMS value measurement (TRMS) up to the 63rd
harmonic in all four quadrants [28].
The energy price data were collected from the ‘www.hvakosterstrommen.no’ (accessed
on 15 February 2024) website [29]. This website provides an open and free API to retrieve
Norwegian electricity prices along with historical data. They collect the data from ENTSO-E
in euros and convert it to the local currency using the latest exchange rate [30]. ENTSO-E is a
transparency platform where data and information on electricity generation, transportation,
and consumption are centralized and published for the benefit of the whole European
market [30].
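The prices used in this work can be retrieved programmatically; the sketch below fetches one day of hourly prices with the requests library. The endpoint pattern and the NO1 price area are assumptions based on the service's public API description and should be verified against the current documentation.

```python
# Sketch: fetch Norwegian day-ahead prices from hvakosterstrommen.no.
# URL pattern and price area (NO1) are assumptions; verify against the API docs.
import requests

def fetch_prices(year: int, month: int, day: int, area: str = "NO1"):
    url = (f"https://www.hvakosterstrommen.no/api/v1/prices/"
           f"{year}/{month:02d}-{day:02d}_{area}.json")
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Each entry carries the hourly price in NOK/kWh and EUR/kWh with timestamps.
    return [(h["time_start"], h["NOK_per_kWh"]) for h in response.json()]

for start, price in fetch_prices(2024, 2, 15):
    print(start, price)
```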
The Random Forest prediction is computed as the average of the individual tree estimates, $f(X) = \frac{1}{B}\sum_{b=1}^{B} T_b(X;\Theta_b)$, where:
• f ( X ) is the prediction function of the Random Forest.
• B is the number of trees.
• Tb ( X; Θb ) represents a single decision tree indexed by b, which is a function of the
features, X, and random parameters, Θb .
Predictions are adjusted based on PV production before making a forecast for a specific
month. Finally, the results are compiled and saved, completing the process.
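A minimal sketch of this forecasting step with scikit-learn's RandomForestRegressor [31] is shown below; the calendar features, the use of PV production as an input, and the hyperparameters are illustrative assumptions rather than the exact configuration used in the study.

```python
# Sketch: Random Forest forecast of hourly site load / grid import.
# Feature set and hyperparameters are illustrative, not the study's exact setup.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def build_features(frame: pd.DataFrame) -> pd.DataFrame:
    """frame: hourly history with a DatetimeIndex and a 'pv_kw' column."""
    return pd.DataFrame({
        "hour": frame.index.hour,
        "weekday": frame.index.weekday,
        "month": frame.index.month,
        "pv_kw": frame["pv_kw"],          # adjust predictions for PV production
    }, index=frame.index)

def train_forecaster(history: pd.DataFrame) -> RandomForestRegressor:
    """history: hourly data with columns ['load_kw', 'pv_kw']."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(build_features(history), history["load_kw"])
    return model

# Forecast a future month given its expected PV production:
# forecast_kw = model.predict(build_features(future_frame))
```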
Figure 6 shows the forecasted data: the grid import (blue line) and the site load (green line). The grid import becomes negative later in the year because, after March, PV production increases and more energy is exported to the grid.
(Flowchart of the logic-based algorithm: the controller measures the grid import (Gi), Pload, SOCinit, Ecost, and PPV, then selects hold, charge, or discharge depending on whether PPV > 0, SOC > SOCmin, and Ecost ≤ Emin.)
If the energy cost is unfavorable, the system will not charge the battery, even when the SOC is below the maximum threshold (SOCmax), in order to avoid expensive energy purchases. Upon reaching a certain power threshold (Pthres), the system determines whether the grid is needed to satisfy the energy requirements. Battery health is preserved by keeping the SOC above a minimum allowable level (SOCmin); conversely, the algorithm will discharge the battery when the SOC is above SOCmin. To maximize both economic and energy efficiency, the system additionally incorporates logic to use energy from the PV system directly for the load or to charge the battery with any excess generation.
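A condensed sketch of this decision logic, written as a plain Python function, is given below; the threshold names mirror those used in the text, while the exact ordering and form of the checks in the deployed controller are an assumption.

```python
# Sketch of the logic-based charge/discharge/hold decision.
# Threshold names follow the text; the rule ordering is an assumption.
def decide_action(p_pv, p_load, soc, price, soc_min, soc_max, price_min):
    surplus = p_pv - p_load
    if surplus > 0:
        # Excess PV generation: store it unless the battery is already full.
        return "charge" if soc < soc_max else "hold"
    if price <= price_min and soc < soc_max:
        # Favorable (cheap) grid energy: charge for later use.
        return "charge"
    if soc > soc_min:
        # No PV surplus and prices are not cheap: discharge stored energy.
        return "discharge"
    # Protect battery health below SOC_min and avoid expensive charging.
    return "hold"
```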
For peak shaving, the algorithm uses an energy management technique called “dynamic peak shaving”, which lowers the highest power demand in the system over the day. By setting a peak shaving threshold, the hourly power demand, or grid import, is kept below a certain level. This is accomplished using a battery
storage system to supplement the grid supply during times of high demand. Dynamic
peak shaving aims to minimize energy expenses, prevent peak demand charges, and lessen
the burden on the electrical system. The peak shaving threshold is dynamically determined
using the maximum load estimate for each day. This algorithm is intended to run on a
daily basis.
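The dynamic threshold can be sketched as follows: each day, a peak-shaving limit is derived from the forecast maximum load, and the battery supplies whatever grid import would otherwise exceed it. The 90% margin and the hourly resolution are illustrative assumptions.

```python
# Sketch of dynamic peak shaving: cap hourly grid import at a daily threshold
# derived from the forecast maximum load (the 0.9 margin is illustrative).
def peak_shaving_plan(hourly_load_forecast_kw, p_discharge_max_kw):
    threshold_kw = 0.9 * max(hourly_load_forecast_kw)    # daily dynamic threshold
    plan = []
    for load in hourly_load_forecast_kw:
        battery_kw = min(max(load - threshold_kw, 0.0), p_discharge_max_kw)
        grid_import_kw = load - battery_kw               # kept at or below threshold
        plan.append((grid_import_kw, battery_kw))
    return threshold_kw, plan
```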
The given equations describe the battery’s charging and discharging operations. The
charging equation limits the charge added to the battery by either the maximum charge
rate or the remaining capacity adjusted for efficiency. Similarly, the discharging equation
limits the energy discharged by either the maximum discharge rate or the current storage
level adjusted for efficiency. These equations ensure the battery operates within its physical
and efficiency constraints, optimizing its performance.
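The equations themselves are not reproduced here, but one common formulation consistent with this description, assuming a charging efficiency $\eta_{ch}$, a discharging efficiency $\eta_{dis}$, stored energy $E_t$, capacity $E_{\max}$, and a time step $\Delta t$, is:

$$
\begin{aligned}
E^{ch}_t &= \min\!\big(P_{ch,\max}\,\Delta t,\; (E_{\max}-E_t)/\eta_{ch}\big)\\
E^{dis}_t &= \min\!\big(P_{dis,\max}\,\Delta t,\; E_t\,\eta_{dis}\big)
\end{aligned}
$$

so that the energy added or removed in each step never exceeds the converter rating or the remaining (or available) capacity once conversion losses are accounted for.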
The RL agent selects a course of action (charge, discharge, or hold) at each point in time, given the site load, PV
production, grid import, and electricity price state inputs. Through a comparison of the
operational expenses with and without battery optimization, a reward signal is calculated
based on the performance of the RL agent. To quantify the economic advantages of strategic
battery management, the costs are computed using the agent’s actions and the current
power prices. After the action is carried out and the reward is assigned, the model updates
and enhances its internal policy by observing the reward and the altered condition of
the environment (next state). This cycle keeps going until an episode ends, which is the
achievement of a predetermined state or the conclusion of a series of states. The agent resets
and moves on to the next episode and keeps learning until the training session is finalized.
The RL agent is ultimately intended to learn a policy that reduces energy expenses and
earns as much profit as possible by selling the excess energy through these recurrent cycles.
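A compact sketch of this training loop, using a Gymnasium-style environment and the open-source stable-baselines3 implementation of PPO, is shown below. The observation layout, reward shaping, one-hour time step, and library choice follow the description above but are assumptions made for illustration rather than the authors' exact code.

```python
# Sketch: PPO training for battery dispatch (Gymnasium + stable-baselines3).
# Observation layout, reward shaping, and 1 h time step are illustrative.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class MicrogridEnv(gym.Env):
    """Observations: [load, PV, grid import, price, SOC]; actions: hold/charge/discharge."""

    def __init__(self, data, capacity_kwh=1105.0, p_max_kw=200.0):
        super().__init__()
        self.data, self.capacity, self.p_max = data, capacity_kwh, p_max_kw
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(5,), dtype=np.float32)
        self.action_space = spaces.Discrete(3)            # 0: hold, 1: charge, 2: discharge

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.soc = 0, 0.5
        return self._obs(), {}

    def step(self, action):
        load, pv, price = self.data[self.t]
        p_batt = (0.0, self.p_max, -self.p_max)[int(action)]      # +charge / -discharge, kW
        self.soc = float(np.clip(self.soc + p_batt / self.capacity, 0.0, 1.0))
        grid = load - pv + p_batt                                  # import if positive (kW)
        baseline = load - pv                                       # import without the battery
        reward = (baseline - grid) * price                         # hourly saving vs. no battery
        self.t += 1
        return self._obs(), reward, self.t >= len(self.data), False, {}

    def _obs(self):
        load, pv, price = self.data[min(self.t, len(self.data) - 1)]
        return np.array([load, pv, load - pv, price, self.soc], dtype=np.float32)

# data: list of (load_kw, pv_kw, price_nok_per_kwh) tuples built from the site history
# model = PPO("MlpPolicy", MicrogridEnv(data), verbose=1)
# model.learn(total_timesteps=200_000)
```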
(Figure 8 depicts the reinforcement learning agent interacting with the industrial microgrid environment: the policy function issues charge/discharge/hold actions to the BESS, while observations of Pload, PPV, Pgrid, the energy price, and the battery SOC, together with the reward, drive the episodic training. The environment comprises the BESS, PV, grid, load, and EV charger.)
Figure 8. Workflow of PPO algorithm.
Peak hour pricing scheme (based on the highest peak in the month):
• Winter (November–March): 84 NOK/kW/month
• Summer (April–October): 35 NOK/kW/month
Peak hour pricing scheme for reactive power (based on the highest peak in the month):
• Winter (November–March): 35 NOK/kVAr/month
• Summer (April–October): 15 NOK/kVAr/month
Therefore, the total energy cost depends not only on the consumption in kWh but
also on the highest peak in kW per month as it will be added to the cost, as shown in
Equation (5).
$$\text{Total Cost} = E_{\mathrm{kWh}} \times \text{Price}_{\mathrm{NOK/kWh}} + P_{\mathrm{peak,kW}} \times \text{Peak Price}_{\mathrm{NOK/kW/month}} \quad (5)$$
To help illustrate this point, take for instance the two power profiles displayed in
Figure 9. Even though the total energy consumption is the same for both profiles (area
under the curve), the cost of the blue profile is higher than the red profile because of
the higher peak power consumption that results in additional penalties under the peak
pricing scheme.
Figure 9. An example illustrating the cost difference under the peak pricing scheme for two consump-
tion profiles with the same total consumed energy (area).
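As a small numerical illustration of Equation (5), the snippet below computes the monthly cost for two hypothetical profiles with the same energy consumption but different peaks (all numbers are made up for this example):

```python
# Illustration of Equation (5): equal energy, different monthly peaks.
energy_kwh = 10_000                 # monthly consumption, same for both profiles
price_nok_per_kwh = 0.75
peak_price = 84                     # NOK/kW/month (winter peak tariff)

for name, peak_kw in (("red (flat) profile", 60), ("blue (spiky) profile", 120)):
    total_nok = energy_kwh * price_nok_per_kwh + peak_kw * peak_price
    print(f"{name}: {total_nok:,.0f} NOK")
# Same energy charge, but the spiky profile pays (120 - 60) * 84 = 5,040 NOK more.
```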
(Figure 11 comprises four panels plotted over one day in early July: the grid import and site load forecast in kW, the forecast energy price in NOK/kWh, the battery charged/discharged power in kW, and the battery SOC in %.)
Figure 11. Grid import, site load, energy price, battery power, and SOC for a day in July.
When comparing the cost reductions achieved by the RL optimization between Febru-
ary and July, the RL algorithm produced savings that were 20% greater on average than the
logic-based algorithm. This result demonstrates the strength of RL as an optimization approach and represents a substantial improvement in cost efficiency,
achieved through a combination of peak shaving and price arbitrage, both dynamically
managed by reinforcement learning. Peak shaving minimizes penalties by reducing peak
load during high-cost periods, while price arbitrage optimizes energy costs by charging
the battery during low-cost periods and discharging during high-cost periods. The RL
algorithm enhances efficiency by continuously learning and adapting to energy price fluctu-
ations and load demand, ensuring optimal battery operation. These strategies collectively
contribute to the significant cost savings.
short-term price fluctuations while simultaneously managing long-term peak power costs.
However, it is important to note that this complex trade-off between spot price trading and
peak penalty management requires longer training sessions to optimize effectively. Future
research could focus on extending training periods and fine-tuning hyperparameters to
further improve performance in this challenging scenario.
(Figure 13 plots the normalized spot price, SOC, and battery power over roughly 48 h for the two agents.)
Figure 13. Normalized results for the spot price, SOC, and battery. (a) Results with PPO; (b) results with TD3.
(Figure 14 compares TD3 and PPO over several weeks: battery power in kW, electricity cost in NOK over Weeks 2–4, grid power in pu, and SOC in % across the days of the week.)
Figure 14. Comparison of the Pbattery, electricity cost, Pgrid, and the SOC between TD3 and PPO. (a) Comparison of Pbattery; (b) comparison of the electricity cost; (c) comparison of Pgrid; (d) comparison of the SOC.
5. Conclusions
In this paper, the optimization of an industrial microgrid using logic-based and RL-
based algorithms was performed. Load forecasting and simulation validation were carried
out, and the two algorithms were benchmarked against each other. Notably, the RL algorithm
achieved an average monthly cost reduction of 20% compared with logic-based optimiza-
tion. The RL algorithm effectively manages battery energy storage systems (BESSs) by
dynamically adapting peak shaving logic to varying load projections. Battery charging
and discharging respond to energy prices and load conditions, ensuring efficient operation.
Future research directions include investigating scalability for larger microgrids, and testing
robustness under diverse scenarios.
Author Contributions: Conceptualization, S.U. and L.M.-P.; Methodology, S.U. and L.M.-P.; Software,
S.U. and I.A.; Formal analysis, I.A. and L.M.-P.; Writing—original draft, S.U. and I.A.; Writing—review
& editing, L.M.-P.; Supervision, L.M.-P. All authors have read and agreed to the published version of
the manuscript.
Funding: This work was supported in part by EEA and Norway Grants financed by Innovation
Norway in DOITSMARTER project, Ref. 2022/337335.
Data Availability Statement: The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.
Conflicts of Interest: The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
References
1. Department of Energy, Office of Electricity Delivery and Energy Reliability. Summary Report: 2012 DOE Microgrid Workshop.
2012. Available online: https://www.energy.gov/oe/articles/2012-doe-microgrid-workshop-summary-report-september-2012
(accessed on 24 May 2022).
2. Lu, R.; Bai, R.; Ding, Y.; Wei, M.; Jiang, J.; Sun, M.; Xiao, F.; Zhang, H.T. A hybrid deep learning-based online energy management
scheme for industrial microgrid. Appl. Energy 2021, 304, 117857. [CrossRef]
3. Wang, C.; Yan, J.; Marnay, C.; Djilali, N.; Dahlquist, E.; Wu, J.; Jia, H. Distributed Energy and Microgrids (DEM). Appl. Energy
2018, 210, 685–689. [CrossRef]
4. Brem, A.; Adrita, M.M.; O’Sullivan, D.T.; Bruton, K. Industrial smart and micro grid systems—A systematic mapping study.
J. Clean. Prod. 2020, 244, 118828. [CrossRef]
5. Mehta, R. A microgrid case study for ensuring reliable power for commercial and industrial sites. In Proceedings of the 2019 IEEE
PES GTD Grand International Conference and Exposition Asia (GTD Asia), Bangkok, Thailand, 19–23 March 2019; pp. 594–598.
6. Roslan, M.; Hannan, M.; Ker, P.J.; Begum, R.; Mahlia, T.I.; Dong, Z. Scheduling controller for microgrids energy management
system using optimization algorithm in achieving cost saving and emission reduction. Appl. Energy 2021, 292, 116883. [CrossRef]
7. Roslan, M.; Hannan, M.; Ker, P.J.; Uddin, M. Microgrid control methods toward achieving sustainable energy management. Appl.
Energy 2019, 240, 583–607. [CrossRef]
8. Pourmousavi, S.A.; Nehrir, M.H.; Colson, C.M.; Wang, C. Real-time energy management of a stand-alone hybrid wind-
microturbine energy system using particle swarm optimization. IEEE Trans. Sustain. Energy 2010, 1, 193–201. [CrossRef]
9. Marzband, M.; Sumper, A.; Ruiz-Alvarez, A.; Domínguez-García, J.L.; Tomoiagă, B. Experimental evaluation of a real time energy
management system for stand-alone microgrids in day-ahead markets. Appl. Energy 2013, 106, 365–376. [CrossRef]
10. Choobineh, M.; Mohagheghi, S. A multi-objective optimization framework for energy and asset management in an industrial
Microgrid. J. Clean. Prod. 2016, 139, 1326–1338. [CrossRef]
11. Ding, Y.M.; Hong, S.H.; Li, X.H. A demand response energy management scheme for industrial facilities in smart grid. IEEE
Trans. Ind. Inform. 2014, 10, 2257–2269. [CrossRef]
12. Gholian, A.; Mohsenian-Rad, H.; Hua, Y. Optimal industrial load control in smart grid. IEEE Trans. Smart Grid 2015, 7, 2305–2316.
[CrossRef]
13. Huang, X.; Hong, S.H.; Li, Y. Hour-ahead price based energy management scheme for industrial facilities. IEEE Trans. Ind. Inform.
2017, 13, 2886–2898. [CrossRef]
14. Youssef, T.A.; El Hariri, M.; Elsayed, A.T.; Mohammed, O.A. A DDS-based energy management framework for small microgrid
operation and control. IEEE Trans. Ind. Inform. 2017, 14, 958–968. [CrossRef]
15. Gutiérrez-Oliva, D.; Colmenar-Santos, A.; Rosales-Asensio, E. A review of the state of the art of industrial microgrids based on
renewable energy. Electronics 2022, 11, 1002. [CrossRef]
16. Correia, A.F.; Moura, P.; de Almeida, A.T. Technical and economic assessment of battery storage and vehicle-to-grid systems in
building microgrids. Energies 2022, 15, 8905. [CrossRef]
17. Hussain, A.; Bui, V.H.; Kim, H.M. Microgrids as a resilience resource and strategies used by microgrids for enhancing resilience.
Appl. Energy 2019, 240, 56–72. [CrossRef]
18. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
19. Arwa, E.O.; Folly, K.A. Reinforcement learning techniques for optimal power control in grid-connected microgrids: A compre-
hensive review. IEEE Access 2020, 8, 208992–209007. [CrossRef]
20. Mughees, N.; Jaffery, M.H.; Mughees, A.; Ansari, E.A.; Mughees, A. Reinforcement learning-based composite differential
evolution for integrated demand response scheme in industrial microgrids. Appl. Energy 2023, 342, 121150. [CrossRef]
21. François-Lavet, V.; Taralla, D.; Ernst, D.; Fonteneau, R. Deep reinforcement learning solutions for energy microgrids management.
In Proceedings of the European Workshop on Reinforcement Learning (EWRL 2016), Barcelona, Spain, 3–4 December 2016.
22. Chen, P.; Liu, M.; Chen, C.; Shang, X. A battery management strategy in microgrid for personalized customer requirements.
Energy 2019, 189, 116245. [CrossRef]
23. Nakabi, T.A.; Toivanen, P. Deep reinforcement learning for energy management in a microgrid with flexible demand. Sustain.
Energy Grids Netw. 2021, 25, 100413. [CrossRef]
24. Ji, Y.; Wang, J.; Xu, J.; Fang, X.; Zhang, H. Real-time energy management of a microgrid using deep reinforcement learning.
Energies 2019, 12, 2291. [CrossRef]
25. Lee, S.; Seon, J.; Sun, Y.G.; Kim, S.H.; Kyeong, C.; Kim, D.I.; Kim, J.Y. Novel architecture of energy management systems based on
deep reinforcement learning in microgrid. IEEE Trans. Smart Grid 2023, 15, 1646–1658. [CrossRef]
26. Ahmed, I.; Pedersen, A.; Mihet-Popa, L. Smart Microgrid Optimization using Deep Reinforcement Learning by utilizing the
Energy Storage Systems. In Proceedings of the 2024 4th International Conference on Smart Grid and Renewable Energy (SGRE),
Doha, Qatar, 8–10 January 2024; pp. 1–7.
27. ProSoft Technology. Introduction to Modbus TCP/IP; Acromag, Inc.: Wixom, MI, USA, 2024.
28. EEM-MA771—Measuring Instrument. 2024. Available online: https://www.phoenixcontact.com/en-no/products/measuring-instrument-eem-ma771-2908286 (accessed on 10 March 2024).
29. Hva Koster Strømmen (What Does the Electricity Cost?). 2024. Available online: https://www.hvakosterstrommen.no/ (accessed on 15 March 2024).
30. ENTSOE. Entso-e Transparency Platform. 2024. Available online: https://transparency.entsoe.eu/ (accessed on 13 March 2024).
31. RandomForestRegressor. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html (accessed on 30 May 2024).
32. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
33. Amazon Web Services. What Is Reinforcement Learning? Available online: https://aws.amazon.com/what-is/reinforcement-learning/ (accessed on 30 May 2024).
34. OpenAI. Proximal Policy Optimization. Available online: https://spinningup.openai.com/en/latest/algorithms/ppo.html
(accessed on 30 May 2024).
35. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017,
arXiv:1707.06347.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.