
Received July 5, 2021, accepted July 14, 2021, date of publication July 19, 2021, date of current version July 26, 2021.

Digital Object Identifier 10.1109/ACCESS.2021.3098169

Data-Driven Model Predictive Control of DC-to-DC Buck-Boost Converter

KRUPA PRAG 1, MATTHEW WOOLWAY 2,3, AND TURGAY CELIK 4,5, (Member, IEEE)
1 School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg 2000, South Africa
2 Department of Mathematics, Imperial College London, London SW7 2BX, U.K.
3 Faculty of Engineering and the Built Environment, University of Johannesburg, Johannesburg 2000, South Africa
4 School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg 2000, South Africa
5 Wits Institute of Data Science, University of the Witwatersrand, Johannesburg 2000, South Africa

Corresponding authors: Krupa Prag ([email protected]) and Turgay Celik ([email protected])

ABSTRACT A data-driven model predictive control (DDMPC) scheme is proposed to obtain fast convergence to a desired reference and to mitigate the destabilising effects experienced by a DC-to-DC buck-boost converter (BBC) with an active load. The DDMPC strategy uses the observed state to derive an optimal control policy using a reinforcement learning (RL) algorithm. The employed Proximal Policy Optimisation (PPO) algorithm's performance is benchmarked against the PI controller. From the simulated results obtained using the MATLAB Simulink solver, the most robust methods for short settling time and stability were the hybrid methods. These methods take advantage of the short settling time provided by the PPO algorithm and the stability provided by the PI controller or the filtering mechanism over the transient time. The source code for this study is available on GitHub to support reproducible research in the industrial electronics community.

INDEX TERMS Adaptive control, data-driven model predictive control, DC-to-DC buck-boost converter,
proximal policy optimisation, reinforcement learning.

I. INTRODUCTION
The popularity of the data-driven model predictive control (DDMPC) scheme has recently increased due to it being naturally suitable for achieving the objectives in model predictive control (MPC), which can handle non-linear system dynamics and hard constraints, whilst taking performance criteria into account. An attractive characteristic of DDMPC is that an accurate model of the plant is dispensable, as it instead utilises the plant's observational data to learn an optimal policy and make informed predictive decisions using the control feedback mechanism [1]. In contrast, model-based predictive control schemes require accurate modelling of the physical model of the considered plant using first principles, which may either be infeasible or, even if these models are available, they may be intractable for controller designs due to their complexity. For the DDMPC scheme, the functions of plant modelling, control design and the optimisation thereof are all encapsulated through learning from the observational data received from the plant [2].

Given the promise of DDMPC, it is introduced to address challenges in controlling a DC-to-DC buck-boost converter. This paper seeks to investigate the DDMPC framework's performance relative to the steady-state's settling time and the ability to maintain the steady-state in a DC-to-DC buck-boost converter.

In recent times, DC power systems have predominantly been chosen over AC power systems in various applications, given their reliability and quality. Most modern electronic loads are DC in nature, and DC has been a standard choice for microgrid (MG) designs [3]–[6]. Furthermore, there has been considerable attention drawn to the generation of power from renewable energy resources. These systems demand advanced control schemes to fully tap their potential of either extracting the maximum power by adjusting the load to match the voltage of the source, such as maximum power point tracking [7], [8], or maintaining a constant supply of power to a passive load [9], [10]. As a result, the study of a buck-boost converter, a type of DC-to-DC converter that regulates the voltage from a source to a load, has gained traction. The load of the buck-boost converter may either be passive or active.

The associate editor coordinating the review of this manuscript and approving it for publication was Zheng H. Zhu.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

These converters are analogous to step-up and step-down transformers, as the desired output voltage is less than or greater than the input or source voltage.

A buck converter steps down the voltage from the source to the load; hence, the magnitude of the output voltage is less than that of the source voltage. A boost converter steps up the voltage from the source to the load; consequently, the magnitude of the output voltage is greater than that of the source voltage. A buck-boost converter has an output voltage that is either less than or greater than the source voltage in magnitude.

A transistor switch, an inductor, and a smoothing capacitor are connected within the basic buck-boost converter circuit to smooth out switching noise into regulated DC voltages. The potential of DC-to-DC buck-boost converters is compromised by destabilising effects on the circuit, resulting in severe voltage and frequency oscillations [4], [11], [12]. This instability is caused by the existence of a limit cycle in the switch model. The DC-to-DC buck-boost converter's formulation details are given in Section II-A.

In the panorama of past to present literature, the buck-boost converter with a passive or a constant power load (CPL) has been a predominant topic of study compared to the buck-boost converter with an active or variable power load (VPL). The motivation behind this study is to support the research of optimising the use of renewable sources of energy for the production of electrical energy, in an attempt to save fossil fuel energy resources, which effectively translates models such as maximum power point tracking to buck-boost converters with a dynamic load, or schemes in general which have a constant output voltage with a fixed resistance load [8], [9], [13]. Furthermore, in response to the instability exhibited by DC-to-DC buck-boost converters, the current work's objective is to develop an adaptive control methodology to mitigate voltage instability and reach the desired voltage with a minimum settling time. Model-based strategies may be limited in effectively handling any uncertainties faced in practical applications; hence model-independent schemes are considered. Data-driven techniques use the state observation from a state-feedback control scheme to determine the actuation signal to be applied to the MOSFET switch, using controllers such as proportional integral derivative (PID) and various reinforcement learning (RL) techniques to obtain an optimal policy. This work seeks to investigate the performance of the DDMPC schemes in terms of stability and the length of the settling time to reach the reference voltage for a DC-to-DC buck-boost converter with an active load.

The paper is organised as follows. In Section II, the formulation of the DC-to-DC buck-boost converter is presented, and an overview of feedback control schemes and control methods is given with reference to the DC-to-DC buck-boost converter. Section III discusses the techniques applied in investigating the aims of the paper. The experimental results section, Section IV, details the experimental procedure and settings used in conducting the experiments, followed by the results and result analysis. Section V is the concluding section.

II. BACKGROUND AND RELATED WORK
This section presents the DC-to-DC buck-boost converter's formulation details and the general feedback control mechanism in conjunction with traditional PID controllers and DDMPC systems, including an overview of the evolution of feedback control systems from MPC to DDMPC and a discussion of RL-based controllers. A review of related work on applying controller designs to mitigate voltage instability in a buck-boost converter imposed by a CPL and a passive load with a fixed resistance is given.

A. DC-TO-DC BUCK-BOOST CONVERTER
A diagram of a DC-to-DC buck-boost circuit comprising a MOSFET switch, a diode, an inductor L, an output capacitor C, and a load R is shown in Fig. 1. The inductor and capacitor's parallel configuration in the circuit acts as a second-order low-pass filter, reducing the voltage ripple at the output. The corresponding descriptions of the DC-to-DC buck-boost converter circuit components are tabulated in Table 1.

FIGURE 1. DC-to-DC buck-boost converter circuit.

TABLE 1. Description of DC-to-DC buck-boost converter circuit components.

The output voltage in the buck-boost converter is regulated using pulse width modulation (PWM) pulses, which are given to the gate of the MOSFET switch. The switch-mode of the circuit affects the indirect transfer of energy between the inductor and the output capacitor. The desired output voltage of a buck-boost converter is adjustable based on the duty cycle D of the switching transistor [14]. The duty cycle is the ratio between the pulse width, the elapsed time between the rising and falling edges of a pulse, and the total period of a rectangular waveform. The control input of a buck-boost converter is bounded by zero and one, which are the on and off states of the MOSFET switch in the circuit.


The measured output voltage Vout over the load in a circuit with an inverting converter topology is of reversed polarity to that of the input or source voltage Vin, due to how the inductor discharges charge [15]. The output voltage Vout over the load in a buck-boost converter is defined as follows:

Vout = −(D / (1 − D)) · Vin,  (1)

where Vin is the input or source voltage and D is the duty ratio. Comparing buck and boost mode, it is found that the duty ratio D is greater in boost mode than in buck mode. In boost mode, the switch's on-state is held for a longer duration than in buck mode, thus storing more energy in the inductor, which prevents a rapid change in current from being passed to the capacitor. The output voltage is increased when enough energy is built up in the inductor and transferred to the capacitor.
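As a quick numerical illustration of Eqn. (1), the following Python sketch (an illustrative aside, not part of the authors' MATLAB/Simulink code) evaluates the ideal steady-state gain for a few duty ratios, showing buck behaviour for D < 0.5 and boost behaviour for D > 0.5:

def ideal_output_voltage(v_in: float, duty: float) -> float:
    """Ideal buck-boost steady-state output, Eqn. (1): Vout = -(D / (1 - D)) * Vin."""
    assert 0.0 <= duty < 1.0, "duty ratio must lie in [0, 1)"
    return -(duty / (1.0 - duty)) * v_in

for d in (0.25, 0.50, 0.75):
    # D = 0.25 -> |Vout| < Vin (buck); D = 0.75 -> |Vout| > Vin (boost)
    print(f"D = {d}: Vout = {ideal_output_voltage(100.0, d):.1f} V")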
In the buck-boost converter circuit, the flow of charge in the circuit is determined by the MOSFET switch state, and the diode controls the direction of the flow of charge. When the MOSFET switch in the buck-boost converter is in the on-state in the initial cycle, the circuit is closed. In this state, current flows only to the inductor, as the input voltage source is directly connected to the inductor, and the diode prevents current from flowing to the output of the circuit as the diode is reverse biased. Furthermore, while the circuit is closed and the MOSFET switch is in the on-state, the inductor accumulates charge and stores energy in the form of a magnetic field. When the MOSFET switch is in the off-state, the diode will allow current to flow from the inductor to the rest of the components of the circuit [16]. While in this state, the inductor's polarity is reversed, and the diode is forward biased. The inductor provides the energy stored and works as the source, allowing current to flow from the inductor to the capacitor and the load. In this state, the capacitor now accumulates charge and stores energy.

When the MOSFET switch is in the off-state, the inductor experiences a sudden drop in current, thus inducing a voltage to the output. In this state, the change in the inductor's current iL:

diL/dt = Vin/L − ((Ron + rL)/L) · iL,  (2)

is the difference between the input voltage Vin divided by the inductance L, and the product of the inductor's current iL and the sum of the resistance when the switch is on, Ron, and the resistance of the inductor, rL, divided by the inductance L [17]. The change in the capacitor's voltage vc:

dvc/dt = −vc/(RC),  (3)

is the negative of the capacitor's voltage divided by the product of the resistance R and the capacitance C.

The state-space representation of the system in the off-state is given by:

ẋ = A1·x + B1·Vin,
y = C1·x,  (4)

where A1, B1 and C1 are the system matrices, x the state variable, ẋ the derivative of the state variable, and y the output, which are obtained using Eqns. (2)-(3) and defined as:

A1 = [ −(Ron + rL)/L, 0 ; 0, −1/(RC) ],
B1 = [ 1/L, 0 ]^T,  x = [ iL, vc ]^T,
C1 = [ 0, 1 ],  y = Vout = vc.  (5)

In the consecutive cycles, where the MOSFET is in the on-state, the capacitor then supplies energy to the load [17]. The state-space representation for when the MOSFET switch is in the on-state is given by:

ẋ = A2·x + B2·Vin,
y = C2·x,  (6)

where A2, B2 and C2 are the system matrices, x the state variable, ẋ the derivative of the state variable, and y the output, which are respectively defined as follows:

A2 = [ −rL/(RC), 1/L ; −1/C, −1/(RC) ],
B2 = [ 0, 0 ]^T,  x = [ iL, vc ]^T,
C2 = [ 0, 1 ],  y = Vout = vc.  (7)

The average matrices A, B and C, obtained using Eqn. (5) and Eqn. (7), are given by:

A = D·A1 + (1 − D)·A2
  = [ −D(Ron + rL)/L − (1 − D)·rL/(RC), (1 − D)/L ; −(1 − D)/C, −1/(RC) ],  (8)

B = D·B1 + (1 − D)·B2 = D·B1,  (9)

C = D·C1 + (1 − D)·C2 = C1 = C2.  (10)

The steady-state is given by:

X = −A⁻¹·B·Vin
  = [ D·Vin / ((1 − D)²·R + D(Ron + rL) + (1 − D)·rL·L/(RC)) ;
      −D(1 − D)·Vin / (D(Ron + rL)/R + (1 − D)·rL·L/(R²C) + (1 − D)²) ],  (11)

which is the product of the negative inverse of A, the matrix B, and the input voltage Vin.

The transfer function G from the input voltage Vin to the output capacitor voltage vc [17] is given by:

GVin(s) = D(D − 1) / (LCs² + (L/R)·s + (1 − D)²),  (12)

which comprises the duty ratio D, the inductance L, the capacitance C and the resistance of the load R.


The following description summarises the DC-to-DC buck-boost converter's operation: the inductor gets charged through the voltage source, and the capacitor powers the load. Hence, the supply of energy to the load remains uninterrupted irrespective of the state of the MOSFET switch [16].
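To make the averaging of Eqns. (8)-(11) concrete, the sketch below (illustrative Python/NumPy, not the authors' Simulink model; the component values are placeholders rather than the values of Table 2) builds the switched-state matrices of Eqn. (5) and Eqn. (7), forms the duty-cycle-weighted averages, and evaluates the steady state X = −A⁻¹·B·Vin:

import numpy as np

# Placeholder component values; the paper's actual values are in Table 2.
L, C, R = 1.5e-3, 220e-6, 50.0       # inductance, capacitance, load resistance
r_L, R_on = 0.1, 0.05                # inductor and switch-on resistances
V_in, D = 100.0, 0.6                 # source voltage and duty ratio

A1 = np.array([[-(R_on + r_L) / L, 0.0],
               [0.0, -1.0 / (R * C)]])            # Eqn. (5)
A2 = np.array([[-r_L / (R * C), 1.0 / L],
               [-1.0 / C, -1.0 / (R * C)]])       # Eqn. (7)
B1 = np.array([[1.0 / L], [0.0]])
B2 = np.zeros((2, 1))                             # source disconnected in this state

A = D * A1 + (1.0 - D) * A2                       # Eqn. (8)
B = D * B1 + (1.0 - D) * B2                       # Eqn. (9): equals D * B1
X = -np.linalg.solve(A, B) * V_in                 # Eqn. (11): X = -inv(A) @ B * Vin
print("steady state [iL, vc]:", X.ravel())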

B. FEEDBACK CONTROL SYSTEMS
A feedback loop is a powerful tool used in control systems. It considers the plant's output and enables the system to iteratively adjust the input into the system to meet the desired output response. A simple feedback loop is illustrated in Fig. 2. Sensors are used to measure the plant's current state st at time t. The controller is then fed the current state st and the error value et, which is the difference between the current state and the reference state. Using this information, the controller determines the actuation at to be applied to the plant. The actuation applied updates the state of the plant.

FIGURE 2. Feedback control system.

Control systems that follow the basic feedback control structure are PID controllers, MPC systems and DDMPC systems.

C. PROPORTIONAL INTEGRAL DERIVATIVE CONTROL
PID controller technology employs a feedback control loop mechanism to reduce the effect that disturbances have on the system, steer the plant towards the desired state and create well-defined relations between variables in the system [18]. A PID controller takes the error at time t, et, as an input. The error is the difference between the measured and the reference value. The output returned by the PID controller is the actuation, at, which is the action to be applied to the plant or considered system. The control signal or the actuation is equal to the sum of either all or some of the following three terms, where some terms may be zero-valued: the proportional gain Kp multiplied by the magnitude of the error; the integral gain Ki multiplied by the integral of the error; the derivative gain Kd multiplied by the derivative of the error. The generic PID controller shown in Fig. 3 is given by:

at = Kp·et + Ki·∫₀ᵗ et dt + Kd·(det/dt).  (13)

FIGURE 3. Generic PID process controller.

The PID controller returns a control signal at, which is the sum of the P, I and D terms. The characteristics of these terms are:
• Proportional gain Kp: The control signal proportionally increases with respect to the error to reduce the steady-state error. This error correction is based on the present steady-state error.
• Integral gain Ki: The proportional gain Kp may cause oscillation from quick reactions. The integral gain Ki increases the control signal with respect to the past accumulation of the steady-state error.
• Derivative gain Kd: The derivative gain Kd adds the ability to anticipate future error. Considering the rate of error change, if this change in error increases, this term adds damping to the system to prevent it from overshooting. This term does not affect the steady-state error.
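A discrete-time realisation of Eqn. (13) is straightforward; the following Python sketch (illustrative only — the paper's PI controller is realised as Simulink blocks, and the gains shown are placeholders) accumulates the integral term and differences the error for the derivative term:

class PIDController:
    """Discrete PID of Eqn. (13): a_t = Kp*e_t + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0        # running approximation of the error integral
        self.prev_error = 0.0

    def step(self, error: float) -> float:
        self.integral += error * self.dt                   # I term: past errors
        derivative = (error - self.prev_error) / self.dt   # D term: error trend
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# A PI controller, as used later in this paper, is the special case Kd = 0.
pi = PIDController(kp=0.4, ki=90.0, kd=0.0, dt=1e-5)       # placeholder gains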
0 dt is then used to update the calculated control actions to be
The PID controller returns a control signal at , which is the applied by the MPC Controller. This process is repeated
sum of the P, I , and D terms. The characteristics of these multiple times to get the system acted upon to behave as
terms are: described by the Reference state.
• Proportional gain Kp : The control signal proportion- A challenge associated with solving MPC problems is
ally increases with respect to the error to reduce the that it is dependent on the accuracy of the physical system


A challenge associated with solving MPC problems is that it is dependent on the accuracy of the physical system realisation (model) of industrial systems, which generally have multiple degrees of freedom [20], [21]. Another challenge impeding the performance of MPC systems pertains to the optimal controller: the limitation of the number of recorded measurements of the observable state and the constrained number of actuations available. Given the explosion of data obtained from high-fidelity simulations, and the accessibility of hardware and faster processing computers that can now be obtained at affordable prices, DDMPC models have been proposed to replace multidimensional model representations and improve the performance of controllers [22].

FIGURE 4. Overview of the model predictive control system.

E. DATA-DRIVEN MODEL PREDICTIVE CONTROL
DDMPC uses the observation or measured data of the plant with intelligent learning techniques to improve characterising the system model, learning the control policy, and determining a combination of sensor placement and the array of actuations to be made available.

DDMPC is a particular extension of the MPC method that has gained traction, given its efficiency in formulating stochastic MPC problems and autonomously improving a repetitive task's performance. This removes the unrealistic expectation of curating a perfect model of a physical system that incorporates both the system's complex dynamic characteristics and encapsulates disturbances and uncertainty in the model through the cumbersome process of anticipating and incorporating a discrete number of disturbance scenario-based models.

The proposed method of DDMPC is to use data, i.e., recent results from past iterations, to improve both the safety and performance of the autonomous system by using historical data to represent disturbance variables, that is, the information being propagated through the dynamic system. Two particular challenges faced by controllers that can be addressed using historical data are ensuring recursive feasibility and obtaining optimality despite a short prediction horizon, and satisfying input state constraints in the presence of uncertainty. The utilisation of historical data has been applied to overcome challenges associated with solving the stochastic MPC [23]–[26].

How the DDMPC scheme learns the optimal control policy using RL is discussed in Section II-E1.

1) REINFORCEMENT LEARNING FOR CONTROL SYSTEMS
RL is a model-free framework that can be used to solve optimal control problems. As per the general feedback control structure described, the controller receives feedback from the plant in the form of a state signal and takes action in response. Similarly, the decision rule is a state feedback control law, called the policy in RL [27]. The applied actuation changes the system's state, and the latest transition to the updated state is evaluated using a reward function. The objective of the optimal control is to maximise the cumulative reward from each initial state. Given that this is a sequential decision-making process, the problem becomes to maximise the system's long-term performance.

There exist several powerful model-free RL algorithms. This paper particularly considers a policy gradient method, the proximal policy optimisation (PPO) algorithm, which directly optimises policy parameters from the observed data [27].
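The state-action-reward loop described above can be summarised as follows; this is a generic sketch of an RL control loop (the env and policy objects are hypothetical stand-ins, not the paper's implementation):

def run_episode(env, policy, max_steps: int) -> float:
    """Generic RL control loop: observe the state, act, and collect reward."""
    state = env.reset()
    cumulative_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # state-feedback control law
        state, reward, done = env.step(action)  # plant transition + reward signal
        cumulative_reward += reward             # objective: maximise this sum
        if done:
            break
    return cumulative_reward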
F. CONTROLLER FOR THE DC-TO-DC BUCK-BOOST CONVERTER
The challenge of mitigating voltage instability imposed by a CPL or a passive load with a constant resistance on DC MGs has been studied and documented in the literature. Control techniques to overcome this challenge have evolved from model-based schemes to model-independent schemes, with controller systems ranging from traditional state feedback controllers to, more recently, machine intelligence techniques.

Common power converter controllers for buck-boost converters with CPL include state-feedback controllers [28], [29], proportional-integral-derivative (PID) control [30]–[32], model predictive control (MPC) [33]–[35] and sliding mode control (SMC) [36].

Recent work on voltage regularisation and stabilisation using intelligent controllers includes the quadratic D-stable fuzzy controller [37], fuzzy-PID [38], machine learning and reinforcement learning techniques such as Deep Deterministic Policy Gradient (DDPG) [39], Proximal Policy Optimisation (PPO) and Ultralocal Model (UML) control [40], and deep reinforcement learning (DRL) such as Markov Decision Process (MDP) based deep Q network (DQN) algorithms [41] and deep deterministic policy gradient based UML [42].

III. METHODOLOGY
The general framework of the DDMPC scheme, which uses the feedback mechanism as its base structure, takes the observed voltage read over the load, or the resistance, as the feedback signal. The determined control action is the state of the MOSFET switch in the DC-to-DC BBC. The series of control signals sent to the MOSFET switch determines how quickly and efficiently the BBC converges to the desired output voltage. In this paper, the performance of a data-driven PI controller is compared to the DDMPC scheme.


The DDMPC scheme’s RL policy considered is the PPO action a in the action space when in this state. However,
algorithm. Furthermore, two hybrid cases are considered. this function does not measure how good an action is
compared to the other available actions; hence, the critic
A. PROPORTIONAL INTEGRAL CONTROL network is employed to critique the actions returned by
The details of a general PID controller are discussed in the actor network.
Section II-C. For the DC-to-DC buck-boost converter, a PI • The critic takes the observed state st and the action at
controller is used to eliminate the steady-state error et and returned by the actor network as an input and returns the
reduce the forward gain. The pulse generator generates rect- discounted long-term reward’s corresponding expecta-
angular wave pulses in a duty cycle. The integrator integrates tion. The critic network is trained to predict the value
the proportional gain over the current time step. This value function shown by Eqn. (17), which measures how good
is then subtracted from the output voltage value and then fed it is to be in a specific state st .
into a relay function, allowing its output to switch between The actor-critic network aims to maximise the surrogate
two states. The relay function compares the input to a thresh- objective function:
old value to determine which corresponding actuation output h  i
the controller should return. A summary of the PI controller L (θ) = Êt min rt (θ) Ât , clip (rt (θ) , 1 − , 1+) Ât ,
is given in Fig. 5. (14)
which is an expectation function of the advantage function by
Â, policy parameters θ, and the probability ratio rt (θ) which
is defined as:
πθ (at |st )
rt (θ) = , (15)
πθold (at |st )
which is the ratio between the current policy and the policy
based on past experiences. The comprehensive definition
FIGURE 5. PI controller for DC-to-DC buck-boost converter. of the probability ratio is: a ratio between the probability
taking an action a when in state s at time t, given the policy
The applied PI controller’s design uses only the output parameters πθ , and the probability of taking action a when in
voltage value from the DC-to-DC buck-boost converter and state s at time t using the past or old policy parameters θold
does not consider the dynamics of the model, hence making from the previous epoch.
this a model-free PI controller implementation. The general algorithmic structure of the PPO algorithm
[40] is as follows:
B. PROXIMAL POLICY OPTIMISATION ALGORITHM 1) For the PPO algorithm, the first step is to initialise the
PPO algorithm is a model-free, online, or on-policy RL parameters of the actor-critic network.
method employed in the DDMPC scheme. The algorithm 2) The next step is to generate N experiences:
entails using small batches of experiences from interacting
with the environment to update the decision making policy. {st1 , at1 , rt1 }, {st2 , at2 , rt2 }, . . . , {stN , atN , rtN },
Iteratively, once the policy is updated, past experiences are
the sequence of experiences consist of a tuple of the
discarded, and a new batch is generated to update the policy.
state-action pair and their corresponding reward value.
The PPO algorithm is a class of policy gradient training
3) Calculate the action-value function and the advantage
methods that try to reduce the gradient estimations’ vari-
function for each time instance t.
ance towards better policies, causing consistent progress and
ensuring that the policy does not drastically change from the • For each instance, the action-value function and

previous policy or go down irrecoverable paths [40]. the advantage functions are computed at each time
The PPO algorithm alternates between sampling data step t. The action-value function is defined as the
through interacting with the environment and optimising a expected return of starting at state s and taking
clipped surrogate objective function which employs stochas- action a following the policy π is given by:
tic gradient ascent [43]. The stability of training the agent Qπ (s, a) =
X
Eπθ [R (st , at ) |s, a] , (16)
is improved by utilising a clipped surrogate objective func- t
tion and limiting the size of the policy change at each
iteration [44]. where the function is the sum of the expected
The PPO algorithm maintains two function approximators, rewards given the corresponding state-action pair.
the actor and the critic networks. • The value function is the expected return of how
good it is to be in a particular state, is shown by:
• The actor network maps action choices directly to the
observed state. At any particular time, t, the actor takes V π (s) =
X
Eπθ [R (st , at ) |s] , (17)
the observed state s and returns the probability of taking t


where the function is the sum of the expected rewards given the state. The advantage function, given by:

Aπ(s, a) = Qπ(s, a) − Vπ(s),  (18)

is the difference between the action-value function Q and the value function V.
4) Over K epochs, learn from the mini-batch experiences.
• Randomly sample a set of M experiences to form part of the mini-batch, which is used to estimate the gradient.
• The critic network's parameters can be updated using the critic loss function Lc:

Lc(θv) = (1/M) · Σt=1..M ( Qπ(s, a) − V(s|θv) )²,  (19)

which minimises the loss over the sampled mini-batch.
• The actor network's parameters are updated by:

La(θ) = −(1/M) · Σt=1..M [ min( rt(θ)·Ât, clip(rt(θ), 1 − ε, 1 + ε)·Ât ) ],  (20)

which minimises the loss La over the sampled mini-batch.
5) Repeat steps (2) through to (4) until the terminating criterion is met.

Sampling actions trains the PPO-based RL agent according to the updated stochastic policy; hence it is considered a stochastic policy trained in an on-policy manner. During the initial stage of training, the state-action space is explored through randomly selecting actions. As policy training progresses, the policy becomes less random, and the update rule exploits actions found to yield higher rewards.

During training, the PPO agent estimates the associated probabilities of taking each action in the action space. An action is randomly selected based on the probability distribution over actions. The actor and critic properties are updated after training over multiple epochs, using mini-batches, as the PPO agent interacts with the environment. The PPO agent aims to train the coefficients of the actor-critic neural networks to reduce the error e between the desired output Vref and the actual value Vout.
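The clipped surrogate objective of Eqn. (14) and the losses of Eqns. (19)-(20) map directly onto code; the sketch below (illustrative PyTorch-style Python, assuming log_probs, old_log_probs, advantages, returns and values are precomputed mini-batch tensors; it is not the authors' MATLAB implementation) mirrors step 4) of the algorithm above:

import torch

def ppo_losses(log_probs, old_log_probs, advantages, returns, values,
               clip_eps: float = 0.2):
    """Actor loss of Eqn. (20) and critic loss of Eqn. (19) over a mini-batch."""
    ratio = torch.exp(log_probs - old_log_probs)   # r_t(theta), Eqn. (15)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    actor_loss = -torch.min(unclipped, clipped).mean()   # minimise -L(theta)
    critic_loss = ((returns - values) ** 2).mean()       # mean squared value error
    return actor_loss, critic_loss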

C. HYBRID APPROACHES
The hybrid approach uses the PPO algorithm, discussed in Section III-B, with either a PI controller or a filter, which are discussed in Section III-C1 and Section III-C2, respectively.

1) HYBRID I
This hybrid approach applies the PPO algorithm until a stopping condition is met, which is when the output voltage value is greater than or equal to the reference voltage in magnitude, and then employs the PI controller, as discussed in Section III-A, to determine the actions to be applied to the buck-boost converter.

2) HYBRID II
This hybrid approach conditionally utilises a filtering mechanism with the PPO algorithm. The PPO algorithm is solely used to determine the actions to be applied until the reference voltage is reached; then the filtering mechanism is used to filter the pulse signal dictated by the PPO algorithm. The filter sends a 0 pulse to the buck-boost converter if the stipulated conditional statement is violated, else it applies the signal dictated by the PPO algorithm. The reason for applying a 0 pulse if the measured output voltage is greater than the reference voltage is that when the MOSFET switch is open, the inductor dissipates current to the capacitor, which powers the fixed load, thus reducing the stored energy in the inductor and capacitor.
IV. RESULTS
The DC-to-DC buck-boost converter model's performance with an active load or fixed resistance, detailed in Section II-A, is analysed using the different applied control techniques, which are discussed in Section III. The procedure for experimentally testing the applied control techniques is outlined in Section IV-A; the corresponding experimental results, quantitative results and result analysis are reported for three different cases in Section IV-C. The three cases are the different reference voltages used in the model, which are 30 V, 80 V and 110 V.

A. EXPERIMENTAL PROCEDURE
The setup of the DC-to-DC buck-boost converter and the procedure followed to experimentally compare the performance of the four applied control methods are described in this section.

1) DC-TO-DC BUCK-BOOST CONVERTER
The model of a DC-to-DC buck-boost converter with a passive load used is as per Fig. 1. The corresponding parameters of the circuit are tabulated in Table 2.

TABLE 2. DC-to-DC buck-boost converter circuit parameters.

The model was constructed using the computation engine in MATLAB® and Simulink® R2021a.


The motivation for using the simulated model rather than a state-space model is to take advantage of the native matrix computation engine in MATLAB/Simulink [15].

2) PI CONTROLLER
The PI controller uses the output voltage from the buck-boost converter and the reference voltage value to determine the action signal pulse to be applied to the model. Details of the PI controller are given in Section III-A, and the corresponding parameters used are detailed in this section. The PI controller's corresponding parameters were selected after performing a grid search. The corresponding optimised PI controller parameters utilised in the experiments are tabulated in Table 3. The MATLAB/Simulink solver used for the PI controller is ode23tb (stiff/TR-BDF2).

TABLE 3. PI controller's parameters for the DC-to-DC buck-boost converter.

3) PPO
The DDMPC scheme's RL-based controller employs the PPO algorithm. The corresponding details of the PPO algorithm are delineated in Section III-B. The PPO RL agent parameters are consistent for the three different reference voltage cases: 30 V, 80 V and 110 V. The duration of the simulation was 0.3 s, with a sample time of 1E−5 s.

The actor-critic network architecture is built using three fully connected hidden layers for both the actor network and the critic network. Each of these hidden layers is built using 256 neurons. The non-linear mapping function used in both these networks is a rectified linear unit (ReLU). The output layer of the actor network employs a softmax activation function. The parameters of the PPO algorithm and the neural networks are tabulated in Table 4.

TABLE 4. PPO parameters for the DC-to-DC buck-boost converter.

In the PPO implementation, the error value calculated at each sample time instance t is given by:

et = Vref − abs(Vout),  (21)

which is the difference between the reference voltage value and the magnitude of the output voltage value. The absolute value of the measured output voltage is used in both the reward and error value calculation, as the output voltage is reversed in polarity to that of the input voltage, as discussed in Section II-A.

At each sample time step t, a vector representing the state st is constructed. The PPO agent measures and calculates the following parameters of the DC-to-DC buck-boost converter model, which form the state st: the output voltage Vout, the error value et and the change in error de/dt; thus the state vector is represented as st = {Vout, et, de/dt}. Eqn. (22) shows how the change in error value is calculated:

de/dt = (et−1 − et) / (t − (t − 1)).  (22)

The training of the PPO RL agent uses a fixed number of sample steps T unless the termination criterion is met. Should the output voltage value exceed the upper bound uB, the training for that episode is terminated. During the training of the PPO agent, at each sample time step t, the PPO RL agent takes the current state st and the awarded reward value rt as inputs. The reward function is given in Algorithm 1.

Algorithm 1 PPO Reward Function
Input: Vref, Vout, et, et−1, ε, VrefReached
1: uB = Vref · (1 + ε)  ▷ Upper Bound
2: lB = Vref · (1 − ε)  ▷ Lower Bound
3: if (VrefReached == True & Vout < lB) or (Vout > uB) then
4:   rt = −1
5: else
6:   rt = 1/abs(et)
7: end if
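Algorithm 1 translates directly into code; a minimal Python rendering (illustrative only, with a small guard against division by zero that the pseudocode leaves implicit) is:

def ppo_reward(v_ref: float, v_out: float, error: float,
               v_ref_reached: bool, eps: float = 0.02) -> float:
    """Reward of Algorithm 1: penalise leaving the error band, else reward 1/|e_t|."""
    upper = v_ref * (1.0 + eps)   # uB
    lower = v_ref * (1.0 - eps)   # lB
    v = abs(v_out)                # magnitude, since the output polarity is inverted
    if (v_ref_reached and v < lower) or v > upper:
        return -1.0
    return 1.0 / max(abs(error), 1e-9)   # guard: e_t may be exactly zero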
4) HYBRID I
This hybrid approach uses the PPO RL agent with a PI controller, as described in Section III-C1. The parameters used for the RL agent and the PI controller are given in Section IV-A3 and Section IV-A2, respectively. The PI controller is implemented to determine the action to be applied to the buck-boost converter once the magnitude of the output voltage exceeds that of the reference voltage, abs(Vout) > Vref.

5) HYBRID II
This hybrid approach uses the PPO RL agent with a filter, as described in Section III-C2. The parameters used for the RL agent are given in Section IV-A3. The filter mechanism is applied if the following condition is violated: the absolute output voltage is greater than the reference voltage, abs(Vout) > Vref.

The simulation time for the PI controller is 3 s and, for the PPO, Hybrid I and Hybrid II, each of the conducted experiments is simulated for 0.3 s with a fixed sample time of 1E−5 s. The corresponding Simulink models and code utilised can be found on GitHub. All experiments were conducted using an AMD RYZEN 3770x @3.6GHz CPU.

B. SETTLING TIME
The settling time is the time elapsed from the instantaneous step to when the outputs of the considered dynamical control system remain within a specified error range. The error range used is 2% of the reference voltage value; thus, the ε value used in the reward function, Algorithm 1, is 0.02. In Fig. 6, two time periods of interest for a dynamical control system are highlighted: the settling time and the transient time. All the responses, or the observed states, of a control system from the end of the simulation duration back to the first data point that does not fall within the error band (the range of accepted values with respect to a reference value) make up the transient time. The time taken to reach the transient time is known as the settling time.

FIGURE 6. Response time of dynamical control system.
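Under this definition, the settling time can be computed directly from a simulated voltage trace; the sketch below (illustrative Python/NumPy, assuming a uniformly sampled trace) locates the last sample outside the 2% error band:

import numpy as np

def settling_time(t: np.ndarray, v_out: np.ndarray, v_ref: float,
                  eps: float = 0.02) -> float:
    """Time after which |v_out| stays within +/- eps * v_ref of the reference."""
    outside = np.abs(np.abs(v_out) - v_ref) > eps * v_ref
    if not outside.any():
        return float(t[0])            # inside the band from the first sample
    last_violation = int(np.where(outside)[0][-1])
    if last_violation == len(t) - 1:
        return float("nan")           # never settles within the simulated trace
    return float(t[last_violation + 1])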

C. EXPERIMENTAL RESULTS
The experimental results for the four different control techniques applied are presented in this section, followed by the analysis of the results obtained.

The results presented in Table 5 record the settling time and the error values, both for the entire duration of the simulation time and for the transient duration after the settling time period. For each of the attributes in the results table, the average of the validation experiment values is recorded. The lowest obtained value for each corresponding quantitative measure is highlighted for the respective reference voltage cases.

The quantitative measurements used to evaluate the applied control techniques' performance are the real elapsed time, settling time, MSE, mean absolute error (MAE) and integral absolute error (IAE); the standard deviation σ is calculated for each of these attributes, respectively. These results are tabulated in Table 5.

The relationship between time and the output voltage of the buck-boost converter when employing the various controllers is presented for the three different reference voltage values, respectively, in Fig. 7.

FIGURE 7. Buck-boost converter output voltage for the employed control techniques for reference voltages: (a) 30 V, (b) 80 V, (c) 110 V.

1) PI CONTROLLER
The PI controller's settling time is proportional to the magnitude of the reference voltage. It is observed that the average settling time is greater than that of the respective reference values for all three cases.


The PI controller can be seen as a stable control technique, based on the numerical output voltage values, the plots illustrating the relationship between time and the output voltage, and the MSE, MAE and IAE values for the transient time. A disadvantage of the PI controller is that its settling time is the longest compared to the other employed control techniques. The PI controller stabilises the output voltage at a value less than that of the desired voltage; hence it does not necessarily guarantee the lowest error values over the transient time.

TABLE 5. Average quantitative measurements of the applied control techniques for the DC-to-DC buck-boost converter using Algorithm 1.

2) PPO
The PPO algorithm is found to have the shortest settling time when the converter is used for the boost cases. The variance and standard deviation values for the corresponding error values are considered when discussing stability. It is found that these values, for the entire simulation duration, are lower than when the PI or hybrid approaches are used for the boost cases, which can be attributed to the short settling time for these cases. However, if only the transient time is considered, the PPO quantitative measurements are not the lowest, indicating that this is not the most robust method in terms of stability. From the relationship between time and output voltage illustrated in Fig. 7, it can be seen that the PI controller resembles a parabolic decay rate, whilst for the PPO algorithm, exponential decay is seen over the settling time duration.

It is highlighted that the PPO algorithm does not always converge to the desired reference voltage. Hence, 100% of the experiments do not fall within the settling time, as seen for the case when the converter is used in buck mode.

3) HYBRID I
This hybrid approach uses the PPO algorithm and employs the PI controller to determine the actions to be applied to the buck-boost converter once the magnitude of the output voltage reaches or exceeds that of the desired voltage. The shortcoming of the PI controller is that it stabilises to a lower absolute output voltage than the reference voltage; this features for the boost mode instances using this hybrid technique, as was seen for the vanilla PI control method. Taking advantage of the short settling time provided by the PPO and the stability provided by the PI controller, the error values are significantly smaller than those of both the individual implementations of the PPO and PI controller for the 30 V case, for both the entire simulation duration and for that after the settling time, making it a robust method for when the converter is used in buck mode.

4) HYBRID II
Combining the PPO algorithm with a filter mechanism in this hybrid approach, it has been found that this method's variance and standard deviation values for the corresponding error values are generally less than those of the PPO algorithm and Hybrid I for the boost cases. This indicates that this approach is the most robust control method with respect to stability, as the mean settling time values, output voltage value and corresponding quantitative tabulated values substantiate the performance of this control technique.

5) REWARD FUNCTION
In [40] the reward function used is the same as Algorithm 1, whilst in [39] it is Algorithm 2; both these methods, respectively, have been applied without the conditional statements to similar DC-DC converters. The results obtained using this alternative reward function for the PPO algorithm are tabulated in Table 6.

The results of the applied PPO algorithm to the DC-to-DC buck-boost converter using the reward function defined in Algorithm 1 are tabulated in Table 5, and the results when Algorithm 2 is used are tabulated in Table 6. From these results it is found that the PPO used with the reward function defined in Algorithm 1 has a lower settling time and quantitative measurements in comparison to when Algorithm 2 is used; hence it was the reward function employed.


Algorithm 2 PPO Updated Reward Function
Input: Vref, Vout, et, et−1, ε, VrefReached
1: uB = Vref · (1 + ε)  ▷ Upper Bound
2: lB = Vref · (1 − ε)  ▷ Lower Bound
3: if (VrefReached == True & Vout < lB) or (Vout > uB) then
4:   rt = −1
5: else
6:   rt = 1/abs(et)²
7: end if

TABLE 6. Average quantitative measurements of the applied control techniques for the DC-to-DC buck-boost converter using Algorithm 2.

6) SENSITIVITY TO NOISE
The robustness of the applied controllers to the BBC can be evaluated based on the controller's performance when experiencing noise. Additive White Gaussian Noise (AWGN) is an added linear noise model applied to the transmitted signal, which has uniform power across the frequency band of the output signal and has a Gaussian distribution with respect to time. The AWGN channel model is represented by the outputs Hk at discrete time-steps k. The value of Hk is the sum of the input Fk and the noise Gk:

Hk = Fk + Gk,  (23)

where Gk is independently and identically distributed from a zero-mean normal distribution with variance N, that is, Gk ∼ N(0, N). AWGN is added to the transmitted signal to measure and compare the controller's performance when experiencing such an impairment. The measurement parameter, the signal-to-noise ratio (SNR), compares the power of the desired information signal to the power of the undesired signal or background noise, which is denoted as:

SNR = Psignal / Pnoise.  (24)
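Eqns. (23)-(24) can be combined to inject noise at a prescribed SNR; a minimal sketch (illustrative Python/NumPy — the paper applies the equivalent noise model to the measured output voltage within Simulink) is:

import numpy as np

def add_awgn(signal: np.ndarray, snr_db: float,
             rng: np.random.Generator) -> np.ndarray:
    """H_k = F_k + G_k (Eqn. 23), with the noise variance set from the SNR (Eqn. 24)."""
    p_signal = float(np.mean(signal ** 2))
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))   # SNR specified in dB
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

rng = np.random.default_rng(0)
noisy_vout = add_awgn(np.full(1000, 30.0), snr_db=25.0, rng=rng)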
TABLE 7. Quantitative measurements of the PI control techniques with AWGN for the DC-to-DC buck-boost converter with reference voltages: Table (7a) 30 V, Table (7b) 80 V and Table (7c) 110 V.

To compare the performance of the controllers, AWGN with a range of SNRs has been applied to the measured output voltage of the DC-to-DC BBC. Table 7 records the quantitative measurements when applying the PI controller with the AWGN model.

Comparing the results in Table 7 to Table 5, it can be seen that as the SNR decreases, both the average error and settling time increase, as a result of the noise signal increasing. The settling time is used to decide the threshold SNR of the PI controller.

The settling time of the PI controller without AWGN remains unchanged when AWGN with an SNR greater than or equal to 30 dB is applied. However, when the SNR is set to 25 dB, the settling time exceeds that of the PI controller with no AWGN.

Table 8 records the results of the PPO controller with AWGN when applied to a BBC for the respective reference voltages. When the Hybrid I controller with AWGN of 25 dB SNR was applied to a BBC with a reference voltage of 30 V, the results obtained indicated that the simulation was terminated, as the stopping conditions described in Algorithm 1 were met before reaching the full simulation duration of 0.3 s.

TABLE 8. Average quantitative measurements of the applied control techniques with AWGN for the DC-to-DC buck-boost converter with reference voltages: Table (8a) 30 V, Table (8b) 80 V and Table (8c) 110 V.

Thus, the results indicate a bias towards the signal value; this is a result of the error band being calculated relative to the reference voltage value, so the cases with a lower reference voltage are more sensitive to noise. Comparing the performance of the applied controllers with the AWGN model, it can be seen that the PPO and Hybrid II controllers are the most robust when considering their quantitative measurements, particularly the percentage value of episodes that converged and the error values for the settled time duration.

In summary, applying the DDMPC scheme with the PPO algorithm has been found to give a shorter settling time for the boost cases, as has Hybrid I, compared to the PI controller. However, it is found that the hybrid approaches are the most robust in light of both settling time and stability, as they take advantage of the short settling time provided by the PPO algorithm and the stability ensured by the PI controller and the filtering mechanism. Given that the literature does not document the performance of buck-boost converters with an active load or VPL, there is no direct comparison to previous work in this regard. However, considering similar work [40], where the buck-boost converter has a CPL and the PI controller is tuned using the PPO algorithm, and comparing the error values and the inferred settling time, it is found that the hybrid approaches' performance is comparable. Furthermore, from observing the impact of the reward function, we find that investigating and optimising the employed reward function does hold promise in improving the quality of the results found using the DDMPC techniques. With respect to the robustness of the controllers when experiencing noise, the PPO and Hybrid II controllers were found to be most robust.

V. CONCLUSION
The popularity of renewable energy plants and the increasing number of electronic applications, which are DC in nature, make the study of DC-to-DC buck-boost converters with active loads nascent. The buck-boost converter converts an input voltage to the desired lower reference output voltage when in buck mode, and to the desired reference with a greater output voltage magnitude in boost mode. The quality of these converters is based on the settling time to reach the reference voltage and the ability of the controller to maintain a constant output voltage. The impact of the reward function on controllers using the PPO algorithm opens up interesting lines of follow-up research for future development, as does applying and testing the robustness of the discussed control methods on a physical BBC prototype.

DDMPC techniques have been considered to improve the quality of these converters. The applied control techniques' performance on the buck-boost converter was evaluated based on the control technique's short settling time and stability. The PI controller's performance was used as a benchmark to compare the performance of the vanilla DDMPC technique using the PPO algorithm. The PPO algorithm was found to provide a short settling time to reach the reference voltage and outperformed the PI controller in this respect.


Taking advantage of the short settling time of the PPO method and the stability provided by the PI controller and the filtering mechanism, merit was found in the hybrid techniques, as their performance surpasses that of the PI controller with respect to settling time and that of the PPO algorithm with respect to stability. Furthermore, the PPO and Hybrid II controllers were found to perform comparably to the controllers without noise when AWGN was applied to the feedback signal; thus, in general, the PPO and Hybrid II controllers have merit with respect to short settling time, stability and sensitivity to noise.

REFERENCES
[1] J. Berberich, J. Kohler, M. A. Muller, and F. Allgower, "Data-driven model predictive control with stability and robustness guarantees," IEEE Trans. Autom. Control, vol. 66, no. 4, pp. 1702–1717, Apr. 2021.
[2] Z. Hou, H. Gao, and F. Lewis, "Data-driven control and learning systems," IEEE Trans. Ind. Electron., vol. 64, no. 5, pp. 4070–4075, May 2017.
[3] D. Marx, P. Magne, B. Nahid-Mobarakeh, S. Pierfederici, and B. Davat, "Large signal stability analysis tools in DC power systems with constant power loads and variable power loads—A review," IEEE Trans. Power Electron., vol. 27, no. 4, pp. 1773–1787, Apr. 2012.
[4] S. R. Huddy and J. D. Skufca, "Amplitude death solutions for stabilization of DC microgrids with instantaneous constant-power loads," IEEE Trans. Power Electron., vol. 28, no. 1, pp. 247–253, Jan. 2013.
[5] S. Singh, N. Rathore, and D. Fulwani, "Mitigation of negative impedance instabilities in a DC/DC buck-boost converter with composite load," J. Power Electron., vol. 16, no. 3, pp. 1046–1055, May 2016.
[6] Q. Xu, C. Zhang, C. Wen, and P. Wang, "A novel composite nonlinear controller for stabilization of constant power load in DC microgrid," IEEE Trans. Smart Grid, vol. 10, no. 1, pp. 752–761, Jan. 2019.
[7] V. C. Kotak and P. Tyagi, "DC to DC converter in maximum power point tracker," Int. J. Adv. Res. Electr., Electron. Instrum. Eng., vol. 3297, no. 12, pp. 6115–6125, 2007. [Online]. Available: https://www.ijareeie.com
[8] R. F. Coelho, F. Concer, and D. C. Martins, "A study of the basic DC–DC converters applied in maximum power point tracking," in Proc. Brazilian Power Electron. Conf., Sep. 2009, pp. 673–678.
[9] J. M. Carrasco, L. G. Franquelo, J. T. Bialasiewicz, E. Galván, R. C. P. Guisado, M. N. M. Prats, J. I. León, and N. Moreno-Alfonso, "Power-electronic systems for the grid integration of renewable energy sources: A survey," IEEE Trans. Ind. Electron., vol. 53, no. 4, pp. 1002–1016, Jun. 2006.
[10] O. Ibrahim, N. Z. Yahaya, and N. Saad, "State-space modelling and digital controller design for DC–DC converter," Telkomnika, Telecommun. Comput. Electron. Control, vol. 14, no. 2, pp. 497–506, 2016.
[11] A. Kwasinski and C. N. Onwuchekwa, "Dynamic behavior and stabilization of DC microgrids with instantaneous constant-power loads," IEEE Trans. Power Electron., vol. 26, no. 3, pp. 822–834, Mar. 2011.
[12] Z. Zhang, D. Zhang, and R. C. Qiu, "Deep reinforcement learning for power system applications: An overview," CSEE J. Power Energy Syst., vol. 6, no. 1, pp. 213–225, 2019.
[13] K. Osmani, A. Haddad, T. Lemenand, B. Castanier, and M. Ramadan, "An investigation on maximum power extraction algorithms from PV systems with corresponding DC–DC converters," Energy, vol. 224, Jun. 2021, Art. no. 120092, doi: 10.1016/j.energy.2021.120092.
[14] K. Rouzbehi, A. Miranian, J. M. Escaño, E. Rakhshani, N. Shariati, and E. Pouresmaeil, "A data-driven based voltage control strategy for DC–DC converters: Application to DC microgrid," Electronics, vol. 8, no. 5, p. 493, Apr. 2019.
[15] R. H. G. Tan and L. Y. H. Hoo, "DC–DC converter modeling and simulation using state space approach," in Proc. IEEE Conf. Energy Convers. (CENCON), Oct. 2015, pp. 42–47.
[16] S. Arora, P. T. Balsara, and D. K. Bhatia, "Effect of sampling time and sampling instant on the frequency response of a boost converter," in Proc. 42nd Annu. Conf. IEEE Ind. Electron. Soc. (IECON), Oct. 2016, pp. 7155–7160.
[17] X. Zhou and Q. He, "Modeling and simulation of buck-boost converter with voltage feedback control," in Proc. MATEC Web Conf., vol. 31, 2015, pp. 5–9.
[18] K. J. Astrom and T. Hägglund, "Advanced PID control," IEEE Control Syst., vol. 26, no. 1, pp. 98–101, Feb. 2006.
[19] E. F. Camacho and C. Bordons, "Nonlinear model predictive control: An introductory review," in Assessment and Future Directions of Nonlinear Model Predictive Control. Berlin, Germany: Springer, 2007, pp. 1–16.
[20] S.-K. Kim, C. R. Park, J.-S. Kim, and Y. I. Lee, "A stabilizing model predictive controller for voltage regulation of a DC/DC boost converter," IEEE Trans. Control Syst. Technol., vol. 22, no. 5, pp. 2016–2023, Jan. 2014.
[21] S. Bououden, O. Hazil, S. Filali, and M. Chadli, "Modelling and model predictive control of a DC–DC boost converter," in Proc. 15th Int. Conf. Sci. Techn. Automat. Control Comput. Eng. (STA), Dec. 2014, pp. 643–648.
[22] S. L. Brunton and J. N. Kutz, Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge, U.K.: Cambridge Univ. Press, 2019.
[23] G. C. Calafiore and L. Fagiano, "Stochastic model predictive control of LPV systems via scenario optimization," Automatica, vol. 49, no. 6, pp. 1861–1866, 2013.
[24] G. Schildbach, L. Fagiano, C. Frei, and M. Morari, "The scenario approach for stochastic model predictive control with bounds on closed-loop constraint violations," Automatica, vol. 50, no. 12, pp. 3009–3018, 2014.
[25] S. Grammatico, X. Zhang, K. Margellos, P. Goulart, and J. Lygeros, "A scenario approach for non-convex control design," IEEE Trans. Autom. Control, vol. 61, no. 2, pp. 334–345, Feb. 2016.
[26] M. Lorenzen, F. Dabbene, R. Tempo, and F. Allgöwer, "Stochastic MPC with offline uncertainty sampling," Automatica, vol. 81, no. 1, pp. 176–183, 2017.
[27] L. Buşoniu, T. de Bruin, D. Tolić, J. Kober, and I. Palunko, "Reinforcement learning for control: Performance, stability, and deep approximators," Annu. Rev. Control, vol. 46, pp. 8–28, Jan. 2018.
[28] Z. Wu, J. Zhao, and J. Zhang, "Cascade PID control of buck-boost-type DC/DC power converters," in Proc. 6th World Congr. Intell. Control Automat., vol. 2, 2006, pp. 8467–8471.
[29] M. A. A. Mohamed, Q. Guan, and M. Rashed, "Control of DC–DC converter for interfacing supercapacitors energy storage to DC micro grids," in Proc. IEEE Int. Conf. Electr. Syst. Aircr., Railway, Ship Propuls. Road Vehicles Int. Transp. Electrific. Conf. (ESARS-ITEC), Nov. 2018, pp. 1–8.
[30] R. Sumita and T. Sato, "PID control method using predicted output voltage for digitally controlled DC/DC converter," in Proc. 1st Int. Conf. Electr., Control Instrum. Eng. (ICECIE), Nov. 2019, pp. 1–7.
[31] R. D. Bhagiya and R. M. Patel, "PWM based double loop PI control of a bidirectional DC–DC converter in a standalone PV/battery DC power system," in Proc. IEEE 16th India Council Int. Conf. (INDICON), Dec. 2019, pp. 1–4.
[32] T. Kobaku, R. Jeyasenthil, S. Sahoo, R. Ramchand, and T. Dragicevic, "Quantitative feedback design-based robust PID control of voltage mode controlled DC–DC boost converter," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 68, no. 1, pp. 286–290, Jan. 2021.
[33] Q. Xu, Y. Yan, C. Zhang, T. Dragicevic, and F. Blaabjerg, "An offset-free composite model predictive control strategy for DC/DC buck converter feeding constant power loads," IEEE Trans. Power Electron., vol. 35, no. 5, pp. 5331–5342, May 2020.
[34] N. Boutchich, A. Moufid, N. Bennis, and S. E. Hani, "A constrained MPC approach applied to buck DC–DC converter for greenhouse powered by photovoltaic source," in Proc. Int. Conf. Electr. Inf. Technol. (ICEIT), Mar. 2020, pp. 1–6.
[35] Z. Zhou, L. Zhang, Z. Liu, Q. Chen, R. Long, and H. Su, "Model predictive control for the receiving-side DC–DC converter of dynamic wireless power transfer," IEEE Trans. Power Electron., vol. 35, no. 9, pp. 8985–8997, Sep. 2020.
[36] J. Fan, S. Li, J. Wang, and Z. Wang, "A GPI based sliding mode control method for boost DC–DC converter," in Proc. IEEE Int. Conf. Ind. Technol. (ICIT), Mar. 2016, pp. 1826–1831.
[37] M. M. Mardani, N. Vafamand, M. H. Khooban, T. Dragičević, and F. Blaabjerg, "Design of quadratic D-stable fuzzy controller for DC microgrids with multiple CPLs," IEEE Trans. Ind. Electron., vol. 66, no. 6, pp. 4805–4812, Jun. 2019.
[38] R. F. Bastos, C. R. Aguiar, A. F. Q. Gonçalves, and R. Q. Machado, "An intelligent control system used to improve energy production from alternative sources with DC/DC integration," IEEE Trans. Smart Grid, vol. 5, no. 5, pp. 2486–2495, Sep. 2014.
[39] M. Gheisarnejad, H. Farsizadeh, M.-R. Tavana, and M. H. Khooban, "A novel deep learning controller for DC–DC buck-boost converters in wireless power transfer feeding CPLs," IEEE Trans. Ind. Electron., vol. 68, no. 7, pp. 6379–6384, Jul. 2021.


[40] M. Hajihosseini, M. Andalibi, M. Gheisarnejad, H. Farsizadeh, and M.-H. Khooban, "DC/DC power converter control-based deep machine learning techniques: Real-time implementation," IEEE Trans. Power Electron., vol. 35, no. 10, pp. 9971–9977, Oct. 2020.
[41] C. Cui, N. Yan, and C. Zhang, "An intelligent control strategy for buck DC–DC converter via deep reinforcement learning," 2020, arXiv:2008.04542. [Online]. Available: http://arxiv.org/abs/2008.04542
[42] M. Gheisarnejad, H. Farsizadeh, M.-R. Tavana, and M. H. Khooban, "A novel deep learning controller for DC/DC buck-boost converters in wireless power transfer feeding CPLs," IEEE Trans. Ind. Electron., vol. 68, no. 7, pp. 6379–6384, Jul. 2021.
[43] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," 2017, arXiv:1707.06347. [Online]. Available: http://arxiv.org/abs/1707.06347
[44] B. Liu, Q. Cai, Z. Yang, and Z. Wang, "Neural proximal/trust region policy optimization attains globally optimal policy," 2019, arXiv:1906.10306. [Online]. Available: http://arxiv.org/abs/1906.10306

KRUPA PRAG is currently a postgraduate student at the University of the Witwatersrand, Johannesburg, South Africa. She is an Associate Lecturer with the School of Computer Science and Applied Mathematics, University of the Witwatersrand. Her research interests include optimization, optimal control theory, and computational intelligence.

MATTHEW WOOLWAY received the Ph.D. degree from the University of the Witwatersrand, Johannesburg, South Africa, in 2018. He is currently a Teaching Fellow in applied mathematics with the Department of Mathematics, Imperial College London, and a Research Associate with the Faculty of Engineering and the Built Environment, University of Johannesburg. His research interests include computational intelligence, artificial intelligence, and optimization.

TURGAY CELIK (Member, IEEE) received the second Ph.D. degree from the University of Warwick, Coventry, U.K., in 2011. He is currently a Professor of digital transformation and the Director of the Wits Institute of Data Science, University of the Witwatersrand, Johannesburg, South Africa. His research interests include signal and image processing, computer vision, machine intelligence, robotics, data science and engineering, and remote sensing. He is an Associate Editor of the ELL (IET), IEEE ACCESS, IEEE GEOSCIENCE AND REMOTE SENSING LETTERS (GRSL), IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING (JSTARS), and SIVP (Springer).
