
energies

Article
Energy Management System for an Industrial Microgrid
Using Optimization Algorithms-Based Reinforcement
Learning Technique
Saugat Upadhyay , Ibrahim Ahmed and Lucian Mihet-Popa *

Faculty of Information Technology, Engineering and Economics, Østfold University College, Kobberslagerstredet 5,
1671 Fredrikstad, Norway; [email protected] (S.U.); [email protected] (I.A.)
* Correspondence: [email protected]

Abstract: The climate crisis necessitates a global shift to achieve a secure, sustainable, and affordable
energy system toward a green energy transition reaching climate neutrality by 2050. Because of this,
renewable energy sources have come to the forefront, and the research interest in microgrids that
rely on distributed generation and storage systems has exploded. Furthermore, many new markets
for energy trading, ancillary services, and frequency reserve markets have provided attractive
investment opportunities in exchange for balancing the supply and demand of electricity. Artificial
intelligence can be utilized to locally optimize energy consumption, trade energy with the main
grid, and participate in these markets. Reinforcement learning (RL) is one of the most promising
approaches to achieve this goal because it enables an agent to learn optimal behavior in a microgrid
by executing specific actions that maximize the long-term reward signal/function. The study focuses
on testing two optimization algorithms: logic-based optimization and reinforcement learning. This
paper builds on the existing research framework by combining PPO with machine learning-based
load forecasting to produce an optimal solution for an industrial microgrid in Norway under different
pricing schemes, including day-ahead pricing and peak pricing. It addresses the peak shaving and
price arbitrage challenges by taking the historical data into the algorithm and making the decisions
according to the energy consumption pattern, battery characteristics, PV production, and energy
price. The RL-based approach is implemented in Python based on real data from the site and in
combination with MATLAB-Simulink to validate its results. The application of the RL algorithm
achieved an average monthly cost saving of 20% compared with logic-based optimization. These
Citation: Upadhyay, S.; Ahmed, I.; findings contribute to digitalization and decarbonization of energy technology, and support the
Mihet-Popa, L. Energy Management fundamental goals and policies of the European Green Deal.
System for an Industrial Microgrid
Using Optimization Algorithms-
Keywords: EMS; PPO; BESS; optimization algorithm; peak shaving; price arbitrage
Based Reinforcement Learning
Technique. Energies 2024, 17, 3898.
https://doi.org/10.3390/en17163898

Received: 10 June 2024; Revised: 6 July 2024; Accepted: 31 July 2024; Published: 7 August 2024

1. Introduction

Clean energy sources such as hydropower, wind energy, and solar energy are gradually replacing more conventional energy sources based on fossil fuels and coal. This shift is a result of the environmental duty to become sustainable and reduce carbon emissions, in addition to the outcomes of economic and technological progress. Therefore, as the globe moves towards more sustainable solutions, the significance of microgrids and distributed generation, especially those that use and incorporate more renewable energy sources, has grown. There has been a significant shift in how the power system operates, and thus microgrids have emerged as the new method of managing distributed generation. One of the definitions of the term "microgrid", according to the US Department of Energy, is "a group of interconnected loads and distributed energy resources within clearly defined electrical boundaries that act as a single controllable entity with respect to the grid" [1]. Industrial microgrids (IMGs) are made up of industrial loads, energy storage systems (ESSs), and renewable energy sources, and have different operational requirements compared with
residential microgrids [2,3]. Such microgrids aid in lowering long-distance power transmission losses while simultaneously reducing the pollution from heavy industry [4].
IMGs are an effective instrument for adapting to diverse energy requirements. A battery
energy storage system (BESS), for example, may be controlled by a microgrid to provide
different backup power and enhance the reliability of the IMG [5].
An energy management system (EMS) is used to optimally coordinate the power
exchange throughout the IMG and with the main grid, reducing energy costs while im-
proving flexibility and energy efficiency [6–8]. Designing and developing EMS algorithms
for day-ahead and real-time scheduling is challenging because of the complexity of the
microgrid, intermittent nature of DERs, and unpredictable load requirements [6,9]. Battery
energy storage systems (BESSs) can be effectively utilized to balance these demands and
trade energy with the main grid based on the renewable production and price of electricity.
Energy optimization in industrial microgrids has been extensively studied in the
literature. Authors in [10] developed a day-ahead multi-objective optimization framework
for industrial plant energy management, assuming that the facility had installed RESs.
Meanwhile, Ref. [11] created an optimal energy management method in the industrial
sector to minimize the total electricity cost with renewable generation, energy storage,
and day-ahead pricing using state task network and mixed integer linear programming,
while [12] presented a demand response strategy to reduce energy for industrial facilities,
using energy storage and distributed generation investigated under day-ahead, time-of-use,
and peak pricing schemes. These studies utilized basic optimization approaches and did not
utilize forecasting. On the other hand, ref. [13] introduced an online energy management
system (EMS) for an industrial facility equipped with energy storage. The optimization
employed a rolling horizon strategy and used an artificial neural network model to forecast
and minimize the uncertainty of electricity prices. The system solved a mixed-integer
optimization problem based on the most recent forecast results for each sliding window,
which helped in scheduling responsive demands. Additionally, ref. [14] presented a real-
time EMS that used a data distribution service that incorporated an online optimization
scheme for microgrids with residential energy consumption and irradiance data from
Florida. It utilized a feed-forward neural network to predict the power consumption and
renewable energy generation. A review of energy optimization in industrial microgrids
utilizing distributed energy resources is presented in [15]. Furthermore, resource efficiency
and resiliency are important aspects of microgrid design as they affect the overall system
performance. A comparison between the microgrid stage efficiencies for each mode of
operation is presented in [16], while resilience analysis and methods of improving resiliency
in microgrids can be found in [17].
Reinforcement learning has emerged as a method to solve complex problems with
large state spaces. An RL agent starts with a random policy, which is a mapping between
the observations (inputs) and the actions (outputs), and then incrementally learns to update
and improve its policy using a reward signal that is given by the environment as an
evaluation of the quality of the action performed. The goal of the agent is to maximize the
reward signal over time. This can be achieved using a variety of methods but, generally,
there are two high-level approaches, learning through the value function, and learning
through the policy. A value function is an estimate of the future rewards obtained by taking
an action and then following a specific policy. RL agents can learn either by optimizing
the value function, the policy, or both [18]. Actor–critic learning approaches make use of
both the policy and the value function. The actor updates the policy using the value function estimate provided by the critic.
Reinforcement learning has been used to optimize microgrids in the residential
sector because it has proven to be a viable strategy for optimizing complex dynamic
systems [19,20]. For instance, a microgrid in Belgium saw a reduction in cost and an in-
crease in efficiency when the Deep-Q-Network (DQN) technique was implemented in [21],
assuming a fixed price of electricity. The suggested method in [22] generated three dis-
tinct consumption profiles according to the needs of the customers and used the Deep
Deterministic Policy Gradient (DDPG) algorithm to produce a very profitable scheme.


However, the results were only observed over a few weeks, and, for one of the plans, the
battery was simply discharged at the end, which does not demonstrate how the trained
algorithm would function over an extended period of time. Several distinct reinforcement learning algorithms were compared over a ten-day period using data from Finland in [23], and an enhanced version of the Advantage Actor–Critic algorithm (A3C++) achieved the best performance. In a different instance, ref. [24] reduced the operational cost by
20.75% using RL. More recently, Proximal Policy Optimization (PPO) has emerged as a
powerful RL algorithm and was utilized in [25,26] to optimize energy costs in a microgrid
with promising results. However, load forecasting was not included.
This paper builds on the existing research framework by combining PPO with machine
learning-based load forecasting to produce an optimal solution for an industrial microgrid
in Norway under different pricing schemes, including day-ahead pricing and peak pricing.
It addresses the peak shaving and price arbitrage challenges by taking the historical data
into the algorithm and making the decisions according to the pattern of energy consumption,
battery characteristics, PV production, and energy price.
The paper is distributed into four different sections. The microgrid architecture is
discussed in Section 2, with the components of the microgrid at the industrial site in
Norway. In Section 3, the design and workflow of the EMS algorithms are discussed. The
results from the algorithms are presented in Section 4, while Section 5 concludes the paper.

2. Microgrid Architecture
The microgrid at the industrial site in Norway is a grid-connected system with 200 kWp
of PV generation, a 1.1 MWh battery storage system, a 360 kW electric vehicle charger, and
two types of loads. The overall system diagram can be seen in Figure 1. There are several
smart meters (denoted by SM) installed to record the energy flow. Load 1 and load 2 are
the main electricity loads, where load 1 is an industrial load and load 2 is a smaller load
from an existing old building.

Figure 1. Overall microgrid system diagram.

The 1.1 MWh battery energy storage system (BESS) is used for backup energy supply
and storage. This stored energy is sold back to the grid when the electricity prices are
high. The 360 kW electric vehicle (EV) charger is present at the facility to charge the electric
lorries and trucks.

2.1. PV System
The PV system is distributed in three different areas in three buildings. The south
building is a facade configuration with 44 panels with 310 watts each, while the southeast
building is equipped with 96 modules with an 11° inclination and a roof-mounted config-
uration. Similarly, the northwest building is configured with 74 solar panels with an 11°
inclination towards the northwest. The PV system also contains three inverters to couple it
with the IMG. Based on the irradiance in the area, the anticipated PV energy production
throughout 2024 was calculated using PVSOL software (Version: PVSOL premium 2024
(R2)), and the results are displayed in Figure 2. Table 1 shows the general parameters of the
PV system.

Table 1. General PV system parameters.

Parameters Values
PV Generator Output 200.88 kWp
PV Generator Surface 1059.6 m²
Number of PV Modules 648
Number of Inverters 3
PV Module Used JAM60S01-310/PR
Speculated Annual Yield 87,594 kWh

Figure 2. Forecasted PV power generation throughout 2024.

2.2. Battery Energy Storage System


The BESS used is a 1.1 MWh container unit equipped with bidirectional inverters,
also called a Power Conversion System (PCS). It is outfitted with high-precision sensors
to monitor all its internal parameters such as temperature, humidity, voltage, and current,
and protect against overcharging, flooding, or fire. This is achieved using a series of
logical interlocks and a mix of hardware and software safeguards. The battery and inverter
specifications are given in Table 2.

Table 2. Table showing battery system specifications.

Parameters Values
Battery Type LFP Lithium-ion
Battery Capacity 1105 kWh
Rated Battery Voltage 768 Vdc
Battery Voltage Range 672–852 Vdc
Max. Charge/Discharge Current 186 A
Max. Charge/Discharge Power 1000 kW

An essential component of controlling the energy transfer between the battery storage
system and the electrical grid is the bidirectional inverter or Power Conversion System
(PCS). Its primary job is to charge the batteries by converting alternating current (AC) from
the IMG into direct current (DC), and vice versa. For applications such as peak shaving,
where excess energy is kept during low demand times and released at peak demand to
sustain grid operations, this bidirectional capability is essential.
The inverter or PCS system can operate in both grid-tied and off-grid modes. It is adaptable to a range of energy storage requirements since it can handle a broad battery voltage range from 600 V to 900 V, generate up to 500 kW of nominal power, and support up to eight battery strings. It has an efficiency above 97%. For
efficient thermal management, the PCS unit uses forced air cooling, which ensures peak
performance, even at full load. The inverter specifications are displayed in Table 3.

Table 3. Table showing inverter specifications.

Parameters Values
Rated Voltage 400 V (L-L)
Rated Frequency 50/60 Hz
AC Connection 3W+N
Rated Power 2 × 500 kW
Rated Current Imax 2 × 721.7 A
Power Factor 0.8–1 (leading or lagging, load-dependent)

In addition to the main components, the system also contains other IoT devices, smart
meters, GPC (Grid Power Controller), etc. These devices function as a gateway to the
battery system so that it can be controlled with the help of software programming. They
operate on LINUX (Version: Ubuntu 22.04.4 LTS) and use the MODBUS TCP protocol [27]
for communication with local or remote servers and to send data to the cloud, as shown
in Figure 3.

Figure 3. Schematic of IoT device communication.
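As an illustration of this MODBUS TCP link between the EMS software and the IoT gateway, the following minimal sketch polls battery registers over TCP. It assumes the pymodbus library (version 3.x), which is not named in the paper, and the IP address, unit ID, and register addresses are hypothetical placeholders rather than the site's actual register map.

```python
# Minimal sketch (assumptions: pymodbus 3.x; the gateway IP, unit id, and
# register addresses below are hypothetical placeholders, not the actual
# register map of the site's GPC/BESS).
from pymodbus.client import ModbusTcpClient

GATEWAY_IP = "192.168.10.50"   # hypothetical IoT-gateway address
UNIT_ID = 1                    # hypothetical MODBUS unit/slave id

client = ModbusTcpClient(GATEWAY_IP, port=502)
if client.connect():
    # Read two holding registers assumed to hold battery SOC (x0.1 %) and power (kW)
    rr = client.read_holding_registers(address=100, count=2, slave=UNIT_ID)
    if not rr.isError():
        soc_percent = rr.registers[0] / 10.0
        battery_power_kw = rr.registers[1]
        print(f"SOC: {soc_percent:.1f} %, battery power: {battery_power_kw} kW")

    # Write a charge/discharge setpoint (register address is hypothetical)
    client.write_register(address=200, value=50, slave=UNIT_ID)
    client.close()
```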

Figure 1 illustrates the four smart meters in the industrial microgrid, out of which
SM1 is a virtual smart meter while SM2, SM3, and SM4 are the physically present meters
connected to the loads and DERs. These smart meters measure apparent power, active
power, and reactive power using the true RMS value measurement (TRMS) up to the 63rd
harmonic in all four quadrants [28].

3. Energy Management System


The basic block diagram of the energy management system is shown in Figure 4.
It receives the measurements from the IMG, processes all the data, and uses different
optimization algorithms to produce energy dispatch commands that are sent back to the
IMG. These algorithms are explained in the following sections.

Figure 4. Overview of the energy management system.

3.1. Data Acquisition and Processing


The EMS development steps are shown in Figure 5. The first step to developing an
energy management system is to collect data from different components such as PV, battery
storage system, grid, etc. The data can be collected using various sources such as smart
meters, data loggers, a database or cloud system, or publicly available API services. The PV
irradiance data were taken from PVSOL simulation software. Other important data to be read
for the EMS development were the consumption data from the loads present at the industrial
site and the grid import. Since the area is primarily a manufacturing site, the majority of its
load or consumption is from heavy machines used for manufacturing. The load values and
grid import values were collected using the Phoenix Contact smart meter [28].

Figure 5. EMS development steps.

The energy price data were collected from the ‘www.hvakosterstrommen.no’ (accessed
on 15 February 2024) website [29]. This website provides an open and free API to retrieve
Norwegian electricity prices along with historical data. They collect the data from ENTSO-E
in euros and convert it to the local currency using the latest exchange rate [30]. ENTSO-E is a
transparency platform where data and information on electricity generation, transportation,
and consumption are centralized and published for the benefit of the whole European
market [30].
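A minimal sketch of retrieving hourly prices from this open API is shown below. The endpoint path, price-area code ("NO1"), and JSON field names are assumptions based on the service's public documentation, not values stated in the paper.

```python
# Sketch of retrieving hourly day-ahead prices (assumption: the public API
# exposes /api/v1/prices/{year}/{month-day}_{area}.json with fields
# "NOK_per_kWh" and "time_start"; "NO1" is an example price-area code).
import requests

def fetch_hourly_prices(year: int, month: int, day: int, area: str = "NO1"):
    url = (f"https://www.hvakosterstrommen.no/api/v1/prices/"
           f"{year}/{month:02d}-{day:02d}_{area}.json")
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # One entry per hour: keep the start time and the price in NOK/kWh
    return [(entry["time_start"], entry["NOK_per_kWh"]) for entry in response.json()]

if __name__ == "__main__":
    for time_start, price in fetch_hourly_prices(2024, 2, 15):
        print(time_start, price)
```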
3.2. Data Analysis and Forecasting


The data analysis and forecasting part consisted of four main steps: data preparation
and feature engineering, model training, forecasting and adjustment, and compilation and
output. The initial step of this process was taking the historical data and arranging them in
a specific format, removing outliers and missing values, etc. The data were collected on an
hourly basis and were aggregated from different sources including PV production, battery
state-of-charge, grid power import, and site load values, as well as the hourly electricity
prices. The forecasting process begins with loading historical data, preparing time-based
features, and defining features and target variables. A Random Forest Regressor model [31]
is trained using these data.
The Random Forest Regressor is a meta-estimator that fits several decision tree regressors on different subsamples of the dataset and averages their predictions to increase accuracy and control over-fitting. The Random Forest structure
can be represented conceptually in Equation (1) as follows [32]:
$$ f(X) = \frac{1}{B} \sum_{b=1}^{B} T_b(X; \Theta_b) \quad (1) $$

where:
• f(X) is the prediction function of the Random Forest.
• B is the number of trees.
• T_b(X; Θ_b) represents a single decision tree indexed by b, which is a function of the features, X, and random parameters, Θ_b.
Predictions are adjusted based on PV production before making a forecast for a specific
month. Finally, the results are compiled and saved, completing the process.
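A minimal sketch of this forecasting step is given below, assuming hourly historical data in a CSV file; the file name, column names, feature set, and PV profile are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch of the load-forecasting step (assumptions: hourly data in a CSV with
# columns "timestamp", "site_load_kw", "pv_kw"; feature choice is illustrative).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("site_history.csv", parse_dates=["timestamp"])

# Time-based feature engineering
df["hour"] = df["timestamp"].dt.hour
df["dayofweek"] = df["timestamp"].dt.dayofweek
df["month"] = df["timestamp"].dt.month

features = ["hour", "dayofweek", "month", "pv_kw"]
target = "site_load_kw"

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(df[features], df[target])

# Forecast one day ahead given calendar features and an assumed PV profile
future = pd.DataFrame({
    "hour": range(24),
    "dayofweek": [2] * 24,                          # e.g., a Wednesday
    "month": [7] * 24,                              # July
    "pv_kw": [0.0] * 6 + [40.0] * 12 + [0.0] * 6,   # rough PV profile (kW)
})
load_forecast_kw = model.predict(future[features])
```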
Figure 6 shows the forecasted grid import (blue line) and site load (green line) for the site. The grid import becomes negative as the year progresses because, after March, PV production increases and the surplus energy is exported to the grid.

Figure 6. Forecasted graph of grid import and site load.

3.3. Logic-Based Optimization


A logic-based optimization algorithm was developed to use as a benchmark, and the
flowchart of the algorithm is displayed in Figure 7. The energy price and battery SOC play
an important role in the optimization process. The system starts by measuring the following
important parameters: the power generated by the PV system (PPV ), the load/consumption
(Pload ), the cost of energy (Ecost ), the power imported from the grid (Gi ), and the initial state
of charge of the battery (SOCinit ). The viability of using stored energy vs. grid energy is
then evaluated based on economic factors, such as if the current cost of energy is less than
a predetermined minimum (Emin ).
Figure 7. Flowchart of the logic-based optimization algorithm.

If the energy cost is unfavorable, the system will not charge the battery, even when its SOC is below the maximum threshold (SOCmax), to avoid buying expensive energy. When a certain power threshold (Pthres) is reached, the system determines whether the grid must be used to satisfy the energy requirements. Battery health is preserved by keeping the SOC above a minimum allowable level (SOCmin); on the other hand, the algorithm will discharge the battery if the SOC is above SOCmin. To maximize
both economic and energy efficiency, the system additionally incorporates some logic to
manage energy from the PV system and use it directly for the load or to charge the battery
with any excess generation.
For peak shaving, the algorithm uses an energy management technique called “dy-
namic peak shaving”, which is used to lower the greatest power demand or load in the
system throughout the day. By setting a peak shaving threshold, the power demand, or
grid import per hour, is kept below a certain level. This is accomplished using a battery
storage system to supplement the grid supply during times of high demand. Dynamic
peak shaving aims to minimize energy expenses, prevent peak demand charges, and lessen
the burden on the electrical system. The peak shaving threshold is dynamically determined
using the maximum load estimate for each day. This algorithm is intended to run on a
daily basis.
The given equations describe the battery’s charging and discharging operations. The
charging equation limits the charge added to the battery by either the maximum charge
rate or the remaining capacity adjusted for efficiency. Similarly, the discharging equation
limits the energy discharged by either the maximum discharge rate or the current storage
level adjusted for efficiency. These equations ensure the battery operates within its physical
and efficiency constraints, optimizing its performance.

$$ \mathrm{Charge}_t = \min\left( \mathrm{Charge\ Rate},\ \frac{\mathrm{Max\ Capacity} - \mathrm{Battery\ Storage}}{\mathrm{Efficiency}} \right) \quad (2) $$

$$ \mathrm{Discharge}_t = \min\left( \mathrm{Discharge\ Rate},\ \frac{\mathrm{Battery\ Storage}}{\mathrm{Efficiency}} \right) \quad (3) $$
The algorithm determines the battery’s charge, discharge, or hold state each hour
based on site load and projected energy price. It charges the battery when prices are low,
ensuring it does not exceed the maximum SOC, and discharges when prices are high or
the site load surpasses the dynamic peak shaving level, maintaining the SOC above the
minimum. If neither condition is met, the battery remains in the “hold” state. The algorithm
adjusts the battery’s SOC and power output based on these decisions, ensuring the SOC
stays within operating limits and optimizing battery usage for cost and load needs. This
process preserves battery efficiency and lifespan while managing energy flow.
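The hourly decision rule and the battery constraints of Equations (2) and (3) can be sketched as follows; the parameter values (rates, SOC limits, efficiency, thresholds) are placeholders, not the values used on the site.

```python
# Sketch of the hourly logic-based decision (assumptions: a simple battery
# model with placeholder parameters; thresholds and efficiency are illustrative).
def battery_step(price, load_kw, soc_kwh, peak_threshold_kw,
                 price_low, price_high,
                 capacity_kwh=1105.0, max_rate_kw=200.0, efficiency=0.95,
                 soc_min_kwh=221.0, soc_max_kwh=1050.0):
    """Return (action, power_kw, new_soc_kwh) for one hour."""
    # Equation (2): charge limited by rate and remaining headroom
    charge = min(max_rate_kw, (capacity_kwh - soc_kwh) / efficiency)
    # Equation (3): discharge limited by rate and stored energy
    discharge = min(max_rate_kw, soc_kwh / efficiency)

    if price <= price_low and soc_kwh < soc_max_kwh:
        return "charge", charge, soc_kwh + charge * efficiency
    if (price >= price_high or load_kw > peak_threshold_kw) and soc_kwh > soc_min_kwh:
        return "discharge", -discharge, soc_kwh - discharge * efficiency
    return "hold", 0.0, soc_kwh
```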

3.4. Reinforcement Learning Algorithm


The reinforcement learning algorithm was developed using the same parameters
to compare its output for cost saving with the results of the logic-based optimization.
The RL agent was specifically designed to minimize costs associated with energy and
peak load charges. It leverages a reinforcement learning (RL) algorithm [33], Proximal
Policy Optimization (PPO) [34], implemented through the Stable Baselines3 library. The
first step in developing the RL agent using the PPO algorithm was to build a custom
environment built on the OpenAI Gymnasium framework, a standard toolkit for developing and comparing reinforcement learning algorithms. This environment
simulates the microgrid and allows the agent to control the battery storage system. It
includes the battery charging, discharging, and holding, and defines a discrete action space
and a continuous observation space, where the state includes normalized values of the
forecasted site load, grid import, PV production, and battery SOC.
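A minimal sketch of such a custom environment is shown below. It follows the Gymnasium API with a discrete charge/discharge/hold action space and a four-dimensional normalized observation; the reward is a simplified negative-cost signal standing in for the cost-based reward described later, and all numeric constants are placeholders rather than the authors' settings.

```python
# Minimal sketch of the custom environment (assumptions: normalized hourly
# data arrays supplied at construction; the reward is a simplified
# negative-cost signal, not the authors' exact reward function).
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class MicrogridEnv(gym.Env):
    def __init__(self, load, grid_import, pv, prices):
        super().__init__()
        self.load, self.grid_import, self.pv, self.prices = load, grid_import, pv, prices
        self.action_space = spaces.Discrete(3)  # 0 = hold, 1 = charge, 2 = discharge
        # Observation: [site load, grid import, PV production, battery SOC], normalized
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.soc = 0, 0.5
        return self._obs(), {}

    def step(self, action):
        rate = 0.1  # normalized charge/discharge step per hour (placeholder)
        if action == 1:
            self.soc = min(1.0, self.soc + rate)
        elif action == 2:
            self.soc = max(0.0, self.soc - rate)
        # Simplified reward: negative cost of the residual grid import at the current price
        net_import = max(0.0, self.load[self.t] - self.pv[self.t]
                         + (rate if action == 1 else -rate if action == 2 else 0.0))
        reward = -net_import * self.prices[self.t]
        self.t += 1
        terminated = self.t >= len(self.load)
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        i = min(self.t, len(self.load) - 1)
        return np.array([self.load[i], self.grid_import[i], self.pv[i], self.soc],
                        dtype=np.float32)
```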
The Proximal Policy Optimization algorithm works by iteratively enhancing its policy
without introducing significant, harmful revisions. The clipped surrogate objective function,
which has the following mathematical expression shown in Equation (4), is the approach
used by the PPO algorithm to limit the undesirable policy changes [35].

$$ L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t \left[ \min\left( r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\ 1-\epsilon,\ 1+\epsilon\big)\,\hat{A}_t \right) \right] \quad (4) $$

where:
• r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t) is the probability ratio of the current policy, π_θ, to the old policy, π_θ_old.
• Â_t is an estimator of the advantage function at timestep t.
• ϵ is a small value (e.g., 0.1 or 0.2) that defines the clipping range to keep the updates stable [35].
The objective of the RL agent is to identify the optimal strategy that reduces power
costs while respecting operational limitations such as battery SOC and capacity. The RL
agent is then trained for 50 million timesteps. The environment gives feedback in the form of rewards during this process, which are intended to motivate cost-cutting behaviors. For
example, the agent is rewarded when it takes advantage of cheap energy price hours to
charge, and minimizes grid usage by discharging during peak costs. It eventually learns
how to maximize battery utilization for cost optimization by iteratively improving its policy.
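Training with Stable Baselines3 can be sketched as follows, assuming the custom environment outlined above; the hyperparameters are illustrative defaults rather than the tuned values used in the study, and the 50 million timesteps follow the text.

```python
# Sketch of training the PPO agent with Stable Baselines3 (assumptions:
# MicrogridEnv is the environment sketched above; placeholder random arrays
# stand in for the real hourly data; hyperparameters are illustrative).
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

hours = 24 * 180  # roughly six months of hourly data (placeholder arrays)
env = MicrogridEnv(load=np.random.rand(hours), grid_import=np.random.rand(hours),
                   pv=np.random.rand(hours), prices=np.random.rand(hours))
check_env(env)  # verify the environment follows the Gymnasium API

model = PPO("MlpPolicy", env, learning_rate=3e-4, clip_range=0.2, verbose=1)
model.learn(total_timesteps=50_000_000)   # 50 million timesteps, as in the text
model.save("ppo_microgrid")

# Evaluate the learned policy on one episode
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)
```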
Figure 8 shows the general diagram of the PPO algorithm. The actions are evaluated
based on the rewards that are generated, to minimize costs and maximize efficiency, and
the system iteratively improves its decision-making strategy through continuous training
episodes. During training, the agent is used in a simulation to calculate the best course
of action (charge, discharge, or hold) at various points in time, given the site load, PV
production, grid import, and electricity price state inputs. Through a comparison of the
operational expenses with and without battery optimization, a reward signal is calculated
based on the performance of the RL agent. To quantify the economic advantages of strategic
battery management, the costs are computed using the agent’s actions and the current
power prices. After the action is carried out and the reward is assigned, the model updates
and enhances its internal policy by observing the reward and the altered condition of
the environment (next state). This cycle keeps going until an episode ends, which is the
achievement of a predetermined state or the conclusion of a series of states. The agent resets
and moves on to the next episode and keeps learning until the training session is finalized.
The RL agent is ultimately intended to learn a policy that reduces energy expenses and
earns as much profit as possible by selling the excess energy through these recurrent cycles.

Figure 8. Workflow of PPO algorithm.

3.5. Grid Pricing Scheme


The pricing scheme of the main grid is taken from Nordpool, the pan-European power exchange market [36]. Two pricing schemes were tested: the normal pricing scheme, in which the hourly price is taken from Nordpool data without any additional costs, and the peak hour pricing scheme, in which, in addition to the normal hourly price, a monthly penalty is applied based on the highest power consumption in kW. The peak hour pricing information is given in Table 4, where NOK stands for Norwegian Krone.

Table 4. Summary of the microgrid peak hour pricing scheme.

Peak hour pricing scheme (taken from the highest peak in the month)
Winter: November–March (84 NOK/kW/month)
Summer: April–October (35 NOK/kW/month)
Peak hour pricing scheme for reactive power (taken from the highest peak in the month)
Winter: November–March (35 NOK/kVAr/month)
Summer: April–October (15 NOK/kVAr/month)
Therefore, the total energy cost depends not only on the consumption in kWh but
also on the highest peak in kW per month as it will be added to the cost, as shown in
Equation (5).

$$ \mathrm{Total\ Cost} = E_{\mathrm{kWh}} \times \mathrm{Price}_{\mathrm{NOK/kWh}} + \mathrm{Peak}_{\mathrm{kW}} \times \mathrm{Peak\ Price}_{\mathrm{NOK/kW/month}} \quad (5) $$
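As a numeric sketch of Equation (5) under the winter rate from Table 4 (the consumption figures are purely illustrative, and a single average energy price stands in for the hourly prices):

```python
# Numeric sketch of Equation (5) with illustrative numbers (not measured site
# values): 100,000 kWh consumed in a winter month at an average 0.80 NOK/kWh,
# with a monthly peak of 180 kW and the winter peak rate of 84 NOK/kW/month.
energy_kwh = 100_000
avg_price_nok_per_kwh = 0.80
peak_kw = 180
peak_price_nok_per_kw = 84

total_cost = energy_kwh * avg_price_nok_per_kwh + peak_kw * peak_price_nok_per_kw
print(total_cost)  # 80,000 + 15,120 = 95,120 NOK
```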

To help illustrate this point, take for instance the two power profiles displayed in
Figure 9. Even though the total energy consumption is the same for both profiles (area
under the curve), the cost of the blue profile is higher than the red profile because of
the higher peak power consumption that results in additional penalties under the peak
pricing scheme.

Figure 9. An example illustrating the cost difference under the peak pricing scheme for two consump-
tion profiles with the same total consumed energy (area).

3.6. Simulation Approach


The simulation approach involved several steps. Initially, the data were acquired from
smart meters, PV production sources, and battery energy storage systems (BESSs). Next,
they were processed by removing outliers and handling missing values, standardizing the
data and implementing load forecasting. Subsequently, Phasor models and complex models
were developed using MATLAB-Simulink (Version: R2023b) to test the energy management
system (EMS). Additionally, a Python environment model based on the Markov Decision
Process (MDP) was created to train reinforcement learning (RL) agents. Hyperparameter
tuning and training took place within the Python environment. Finally, the RL agents were
evaluated by testing them both in the Python environment and through co-simulation with
MATLAB-Simulink. The overall process is summarized in Figure 10.
Figure 10. Overview of the simulation approach and steps followed.

4. Results and Discussion


4.1. Battery Scheduling with Peak Shaving
The peak shaving algorithm is used to obtain an automatic battery charging and
discharging schedule. This schedule enables the EMS to control the BESS in an advanced and organized way and can be used to communicate dispatch commands to the BESS.
Figure 11 shows the grid import, site load, energy price, battery power, and SOC
for a day in July obtained from the logic-based algorithm for automatic scheduling with
peak shaving. Here, dynamic peak shaving logic is used to determine the peak shaving
value for each day. Based on the highest anticipated load for a particular day, the dynamic
peak shaving algorithm determines the threshold for controlling peak power consumption.
The highest anticipated electricity consumption for the day is captured by the variable
‘daily-max-load’. After that, the peak shaving threshold is dynamically adjusted using
this value. If the daily maximum load is 200 kW or more, the threshold is set at 150 kW;
if it is 150 kW or less, the threshold is set at 100 kW; for load projections that fall between these values, the threshold is kept at 150 kW. By utilizing battery storage to its full potential to minimize peak power charges, this approach enables a flexible response to changing load conditions, as sketched below.
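The threshold rule described above can be expressed compactly as follows (the function name is illustrative):

```python
# Sketch of the dynamic peak-shaving threshold rule described in the text.
def peak_shaving_threshold(daily_max_load_kw: float) -> float:
    """Return the peak-shaving threshold (kW) for a day, given its forecast maximum load."""
    if daily_max_load_kw >= 200:
        return 150.0
    if daily_max_load_kw <= 150:
        return 100.0
    return 150.0  # loads between 150 kW and 200 kW keep the 150 kW threshold
```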
It can be observed from Figure 11 that the battery charges when energy prices are low
during the day and discharges under two conditions. The first is when the consumption
or site load exceeds the threshold value, and the second is when the energy prices are at their maximum for the day. The algorithm also ensures that the SOC does not fall below 20%.
This shows that the algorithm properly applies demand response by accurately using
the battery when the consumption exceeds the threshold value. After this, the battery
discharges, and the grid import is restricted to that value for that hour.
Figure 11. Grid import, site load, energy price, battery power, and SOC for a day in July.

4.2. Results Comparison from Algorithms


The developed algorithms were tested and implemented with the hourly data of each
month from February 2024 to July 2024. The bar chart in Figure 12 shows the cost savings
achieved by the RL algorithm and logic-based optimization algorithm during the six-month
period. The RL approach (shown in red) consistently outperforms the logic-based approach
(shown in blue).

Figure 12. Monthly savings results comparison of both algorithms.



When comparing the cost reductions achieved by the RL optimization between Febru-
ary and July, the RL algorithm produced savings that were 20% greater on average than the
logic-based algorithm. This result highlights the power of RL as an optimization approach and represents a substantial improvement in cost efficiency, achieved through a combination of peak shaving and price arbitrage, both dynamically
managed by reinforcement learning. Peak shaving minimizes penalties by reducing peak
load during high-cost periods, while price arbitrage optimizes energy costs by charging
the battery during low-cost periods and discharging during high-cost periods. The RL
algorithm enhances efficiency by continuously learning and adapting to energy price fluctu-
ations and load demand, ensuring optimal battery operation. These strategies collectively
contribute to the significant cost savings.

4.3. Economic Optimization Based on a Peak Pricing Scheme


The peak pricing scenario presents a complex optimization challenge for the rein-
forcement learning (RL) agent. In this scenario, the agent must not only manage power
exchanges with the main grid but also carefully control the peak power drawn from the
grid to avoid significant cost increases. This dual objective makes the optimization problem
more intricate than the normal pricing scheme. Two RL algorithms were employed to
tackle this challenge: Proximal Policy Optimization (PPO) and Twin Delayed Deep De-
terministic Policy Gradient (TD3). TD3, an enhanced version of the traditional DDPG,
incorporates three key improvements. First, it utilizes two Q-functions instead of one
(“twin”); then, it updates the policy less frequently than the Q-functions (“delayed”); and, finally, it smooths the target policy by adding clipped random noise to the target actions [37].
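For reference, a TD3 agent can be instantiated in Stable Baselines3 as sketched below. TD3 requires a continuous (Box) action space, so the battery action is assumed here to be a normalized power setpoint rather than the discrete charge/discharge/hold actions used with PPO; the environment `env`, noise level, and hyperparameters are illustrative assumptions.

```python
# Sketch of a TD3 agent with Stable Baselines3 (assumptions: `env` exposes a
# continuous Box action space, e.g., a normalized battery power setpoint in
# [-1, 1]; the noise level and hyperparameters are illustrative).
import numpy as np
from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise

n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions),
                                 sigma=0.1 * np.ones(n_actions))

model = TD3("MlpPolicy", env,
            action_noise=action_noise,   # exploration noise on actions
            policy_delay=2,              # "delayed" policy updates
            verbose=1)
model.learn(total_timesteps=1_000_000)
```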
The performance of both algorithms was evaluated through co-simulation with
MATLAB-Simulink and a Python-based mathematical model. Figure 13 shows the normal-
ized spot price, SOC, and battery power for both the TD3 and the PPO agent. The results
reveal distinct behavioral patterns for each algorithm:
TD3 Algorithm:
• Exhibits more aggressive behavior (higher peaks in Figure 14a,c).
• Responds quickly to price fluctuations.
• Discharges rapidly when prices start to increase (Figure 14d).
• Achieves lower positive peaks in grid power, indicating effective peak shaving (Figure 14c).
PPO Algorithm:
• Demonstrates less aggressive behavior (lower peaks in Figure 14a,c).
• Responds more smoothly to price fluctuations (Figure 14b).
• Focuses more on spot price trading rather than peak shaving.
Both algorithms show similar general trends, such as discharging during price peaks
(e.g., at 10 and 40 h) and charging during troughs (e.g., at 25 h), as illustrated in Figure 13.
However, TD3’s more aggressive approach seems to yield better overall performance,
particularly in managing peak power draw from the grid.
The financial flow in this peak pricing scheme can be summarized using the following
key points:
1. Spot price trading: both algorithms attempt to capitalize on price differentials by
charging when prices are low and discharging when prices are high.
2. Peak penalty avoidance: TD3, in particular, appears to prioritize reducing peak power
draw from the grid, which helps minimize the monthly peak penalty.
3. Battery utilization: the algorithms must balance the costs of battery degradation
against the potential savings from energy arbitrage and peak shaving.
4. Long-term vs. short-term optimization: the agents must weigh immediate gains from
spot price trading against long-term benefits of peak shaving.
The superior performance of TD3 can be attributed to its ability to better balance
these competing financial objectives. Its more aggressive behavior allows it to capitalize on
short-term price fluctuations while simultaneously managing long-term peak power costs.
However, it is important to note that this complex trade-off between spot price trading and
peak penalty management requires longer training sessions to optimize effectively. Future
research could focus on extending training periods and fine-tuning hyperparameters to
further improve performance in this challenging scenario.


Figure 13. Normalized results for the spot price, SOC, and battery. (a) Results with PPO; (b) results
with TD3.


Figure 14. Comparison of the Pbattery , electricity cost, Pgrid , and the SOC between TD3 and PPO.
(a) Comparison of Pbattery ; (b) comparison of the electricity cost; (c) comparison of Pgrid ; (d) compari-
son of the SOC.

5. Conclusions
In this paper, the optimization of an industrial microgrid using logic-based and RL-
based algorithms was performed. Load forecasting and simulation validation were carried
out, and two algorithms were benchmarked against one another. Notably, the RL algorithm
achieved an average monthly cost reduction of 20% compared with logic-based optimiza-
tion. The RL algorithm effectively manages battery energy storage systems (BESSs) by
dynamically adapting peak shaving logic to varying load projections. Battery charging
and discharging respond to energy prices and load conditions, ensuring efficient operation.
Future research directions include investigating scalability for larger microgrids, and testing
robustness under diverse scenarios.

Author Contributions: Conceptualization, S.U. and L.M.-P.; Methodology, S.U. and L.M.-P.; Software,
S.U. and I.A.; Formal analysis, I.A. and L.M.-P.; Writing—original draft, S.U. and I.A.; Writing—review
& editing, L.M.-P.; Supervision, L.M.-P. All authors have read and agreed to the published version of
the manuscript.
Funding: This work was supported in part by EEA and Norway Grants financed by Innovation
Norway in DOITSMARTER project, Ref. 2022/337335.
Data Availability Statement: The original contributions presented in the study are included in the
article, further inquiries can be directed to the corresponding author.
Conflicts of Interest: The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript:

BESS Battery energy storage system


EMS Energy management system
RES Renewable energy sources
PCS Power Conversion System
GPC Grid Power Controller
PPO Proximal Policy Optimization
TD3 Twin-Delayed Deep Deterministic Policy Gradient
DR Demand response
MG Microgrid
IMG Industrial microgrid
PV Photovoltaics
ESS Energy Storage System
DERs Distributed energy resources
RL Reinforcement learning
EV Electric vehicle
IoT Internet of Things
API Application Programming Interface
TCP Transmission Control Protocol

References
1. Department of Energy, Office of Electricity Delivery and Energy Reliability. Summary Report: 2012 DOE Microgrid Workshop. 2012. Available online: https://www.energy.gov/oe/articles/2012-doe-microgrid-workshop-summary-report-september-2012 (accessed on 24 May 2022).
2. Lu, R.; Bai, R.; Ding, Y.; Wei, M.; Jiang, J.; Sun, M.; Xiao, F.; Zhang, H.T. A hybrid deep learning-based online energy management
scheme for industrial microgrid. Appl. Energy 2021, 304, 117857. [CrossRef]
3. Wang, C.; Yan, J.; Marnay, C.; Djilali, N.; Dahlquist, E.; Wu, J.; Jia, H. Distributed Energy and Microgrids (DEM). Appl. Energy
2018, 210, 685–689. [CrossRef]
4. Brem, A.; Adrita, M.M.; O’Sullivan, D.T.; Bruton, K. Industrial smart and micro grid systems—A systematic mapping study.
J. Clean. Prod. 2020, 244, 118828. [CrossRef]
5. Mehta, R. A microgrid case study for ensuring reliable power for commercial and industrial sites. In Proceedings of the 2019 IEEE
PES GTD Grand International Conference and Exposition Asia (GTD Asia), Bangkok, Thailand, 19–23 March 2019; pp. 594–598.

6. Roslan, M.; Hannan, M.; Ker, P.J.; Begum, R.; Mahlia, T.I.; Dong, Z. Scheduling controller for microgrids energy management
system using optimization algorithm in achieving cost saving and emission reduction. Appl. Energy 2021, 292, 116883. [CrossRef]
7. Roslan, M.; Hannan, M.; Ker, P.J.; Uddin, M. Microgrid control methods toward achieving sustainable energy management. Appl.
Energy 2019, 240, 583–607. [CrossRef]
8. Pourmousavi, S.A.; Nehrir, M.H.; Colson, C.M.; Wang, C. Real-time energy management of a stand-alone hybrid wind-
microturbine energy system using particle swarm optimization. IEEE Trans. Sustain. Energy 2010, 1, 193–201. [CrossRef]
9. Marzband, M.; Sumper, A.; Ruiz-Alvarez, A.; Domínguez-García, J.L.; Tomoiagă, B. Experimental evaluation of a real time energy
management system for stand-alone microgrids in day-ahead markets. Appl. Energy 2013, 106, 365–376. [CrossRef]
10. Choobineh, M.; Mohagheghi, S. A multi-objective optimization framework for energy and asset management in an industrial
Microgrid. J. Clean. Prod. 2016, 139, 1326–1338. [CrossRef]
11. Ding, Y.M.; Hong, S.H.; Li, X.H. A demand response energy management scheme for industrial facilities in smart grid. IEEE
Trans. Ind. Inform. 2014, 10, 2257–2269. [CrossRef]
12. Gholian, A.; Mohsenian-Rad, H.; Hua, Y. Optimal industrial load control in smart grid. IEEE Trans. Smart Grid 2015, 7, 2305–2316.
[CrossRef]
13. Huang, X.; Hong, S.H.; Li, Y. Hour-ahead price based energy management scheme for industrial facilities. IEEE Trans. Ind. Inform.
2017, 13, 2886–2898. [CrossRef]
14. Youssef, T.A.; El Hariri, M.; Elsayed, A.T.; Mohammed, O.A. A DDS-based energy management framework for small microgrid
operation and control. IEEE Trans. Ind. Inform. 2017, 14, 958–968. [CrossRef]
15. Gutiérrez-Oliva, D.; Colmenar-Santos, A.; Rosales-Asensio, E. A review of the state of the art of industrial microgrids based on
renewable energy. Electronics 2022, 11, 1002. [CrossRef]
16. Correia, A.F.; Moura, P.; de Almeida, A.T. Technical and economic assessment of battery storage and vehicle-to-grid systems in
building microgrids. Energies 2022, 15, 8905. [CrossRef]
17. Hussain, A.; Bui, V.H.; Kim, H.M. Microgrids as a resilience resource and strategies used by microgrids for enhancing resilience.
Appl. Energy 2019, 240, 56–72. [CrossRef]
18. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
19. Arwa, E.O.; Folly, K.A. Reinforcement learning techniques for optimal power control in grid-connected microgrids: A compre-
hensive review. IEEE Access 2020, 8, 208992–209007. [CrossRef]
20. Mughees, N.; Jaffery, M.H.; Mughees, A.; Ansari, E.A.; Mughees, A. Reinforcement learning-based composite differential
evolution for integrated demand response scheme in industrial microgrids. Appl. Energy 2023, 342, 121150. [CrossRef]
21. François-Lavet, V.; Taralla, D.; Ernst, D.; Fonteneau, R. Deep reinforcement learning solutions for energy microgrids management.
In Proceedings of the European Workshop on Reinforcement Learning (EWRL 2016), Barcelona, Spain, 3–4 December 2016.
22. Chen, P.; Liu, M.; Chen, C.; Shang, X. A battery management strategy in microgrid for personalized customer requirements.
Energy 2019, 189, 116245. [CrossRef]
23. Nakabi, T.A.; Toivanen, P. Deep reinforcement learning for energy management in a microgrid with flexible demand. Sustain.
Energy Grids Netw. 2021, 25, 100413. [CrossRef]
24. Ji, Y.; Wang, J.; Xu, J.; Fang, X.; Zhang, H. Real-time energy management of a microgrid using deep reinforcement learning.
Energies 2019, 12, 2291. [CrossRef]
25. Lee, S.; Seon, J.; Sun, Y.G.; Kim, S.H.; Kyeong, C.; Kim, D.I.; Kim, J.Y. Novel architecture of energy management systems based on
deep reinforcement learning in microgrid. IEEE Trans. Smart Grid 2023, 15, 1646–1658. [CrossRef]
26. Ahmed, I.; Pedersen, A.; Mihet-Popa, L. Smart Microgrid Optimization using Deep Reinforcement Learning by utilizing the
Energy Storage Systems. In Proceedings of the 2024 4th International Conference on Smart Grid and Renewable Energy (SGRE),
Doha, Qatar, 8–10 January 2024; pp. 1–7.
27. ProSoft Technology. Introduction to Modbus TCP/IP; Acromag, Inc.: Wixom, MI, USA, 2024.
28. EEM-MA771—Measuring Instrument. 2024. Available online: https://www.phoenixcontact.com/en-no/products/measuring-instrument-eem-ma771-2908286 (accessed on 10 March 2024).
29. Hva Koster Strommen. What Does Strømmen.no Cost? 2024. Available online: https://www.hvakosterstrommen.no/ (accessed on 15 March 2024).
30. ENTSOE. Entso-e Transparency Platform. 2024. Available online: https://transparency.entsoe.eu/ (accessed on 13 March 2024).
31. RandomForestRegressor. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html (accessed on 30 May 2024).
32. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
33. Amazon Web Services. What Is Reinforcement Learning? Available online: https://aws.amazon.com/what-is/reinforcement-learning/ (accessed on 30 May 2024).
34. OpenAI. Proximal Policy Optimization. Available online: https://spinningup.openai.com/en/latest/algorithms/ppo.html (accessed on 30 May 2024).
35. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017,
arXiv:1707.06347.

36. Nordpool Market Data. Available online: https://www.nordpoolgroupqa.com/en/trading/Market-data1/Intraday/Market-data1/Market-data1/Overview/ (accessed on 10 January 2023).
37. Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the
International Conference on Machine Learning (PMLR), Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
