Artificial Neural Networks Trained Through Deep Reinforcement Learning Discover Control Strategies For Active Flow Control
We present the first application of an artificial neural network trained through a deep
reinforcement learning agent to perform active flow control. It is shown that, in a
two-dimensional simulation of the Kármán vortex street at moderate Reynolds number
(Re = 100), our artificial neural network is able to learn an active control strategy from
experimenting with the mass flow rates of two jets on the sides of a cylinder. By
interacting with the unsteady wake, the artificial neural network successfully stabilizes
the vortex alley and reduces drag by approximately 8 %. This is performed while using
small mass flow rates for the actuation, of the order of 0.5 % of the mass flow rate
intersecting the cylinder cross-section once a new pseudo-periodic shedding regime
is found. This opens the way to a new class of methods for performing active flow
control.
Key words: control theory, drag reduction
1. Introduction
Drag reduction and flow control are techniques of critical interest for industry
(Brunton & Noack 2015). For example, 20 % of all energy losses on modern
heavy-duty vehicles are due to aerodynamic drag (of which a large part is due
to flow separation on tractor pillars (see Vernet et al. 2014)), and drag is naturally
the main source of energy losses for an airplane. Drag is also a phenomenon that
penalizes animals, and Nature shows examples of drag mitigation techniques. It is,
for example, thought that structures of the skin of fast-swimming sharks interact with
the turbulent boundary layer around the animal, and reduce drag by as much as 9 %
(Dean & Bhushan 2010). This is therefore a proof-of-existence that flow control can
be achieved with benefits, and is worth aiming for.
https://doi.org/10.1017/jfm.2019.62
In the past, much research has been carried out towards so-called passive drag
reduction methods, for example using micro vortex generators for passive control of
transition to turbulence (Fransson et al. 2006; Shahinfar et al. 2012). While it should
be underlined that this technique is very different from the one used by sharks,
drag reduction can also be achieved in this way. Another way to obtain drag reduction is by applying an
active control to the flow. A number of techniques can be used in active drag control
and have been proven effective in several experiments, a typical example being to use
small jets (Schoppa & Hussain 1998; Glezer 2011). Interestingly, it has been shown
that effective separation control can be achieved with even quite weak actuation, as
long as it is used in an efficient way (Schoppa & Hussain 1998). This underlines
the need to develop techniques that can effectively control a complex actuation input
into a flow, in order to reduce drag.
Unfortunately, designing active flow control strategies is a complex endeavour
(Duriez, Brunton & Noack 2016). Given a set of point measurements of the flow
pressure or velocity around an object, there is no easy way to find a strategy to
use this information in order to perform active control and reduce drag. The high
dimensionality and computational cost of the solution domain (set by the complexity
and nonlinearity inherent to fluid mechanics) mean that analytical solutions and
real-time predictive simulations (that would decide which control to use by simulating
several control scenarios in real time) seem out of reach. Despite the considerable
efforts put into the theory of flow control, and the use of a variety of analytical
and semi-analytical techniques (Barbagallo, Sipp & Schmid 2009; Barbagallo et al.
2012; Sipp & Schmid 2016), bottom-up approaches based on an analysis of the
flow equations face considerable difficulties when attempting to design flow control
techniques. A consequence of these challenges is the simplicity of the control
strategies used in most published works about active flow control, which traditionally
focus on either harmonic or constant control input (Schoppa & Hussain 1998).
Therefore, there is a need to develop efficient control methods that perform complex
active control and take full advantage of actuation possibilities. Indeed, it seems that,
as of today, the actuation possibilities are large, but only simplistic (and probably
suboptimal) control strategies are implemented. To the knowledge of the authors, only
a few published examples of successful complex active control strategies are available
with respect to the importance and extent of the field (Pastoor et al. 2008; Erdmann
et al. 2011; Gautier et al. 2015; Guéniat, Mathelin & Hussaini 2016; Li et al. 2017).
In the present work, we aim at introducing for the first time deep neural networks
and reinforcement learning to the field of active flow control. Deep neural networks
are revolutionizing large fields of research, such as image analysis (Krizhevsky,
Sutskever & Hinton 2012), speech recognition (Schmidhuber 2015) and optimal
control (Mnih et al. 2015; Duan et al. 2016). Those methods have surpassed previous
algorithms in all these examples, including methods such as genetic programming, in
terms of complexity of the tasks learned and learning speed. It has been speculated
that deep neural networks will bring advances also to fluid mechanics (Kutz 2017),
but to date those have been limited to a few applications, such as the definition
of reduced-order models (Wang et al. 2018), the effective control of swimmers
(Verma, Novati & Koumoutsakos 2018) or performing particle image velocimetry
(PIV) (Rabault, Kolaas & Jensen 2017). As deep neural networks, together with the
dimensionality). The proximal policy optimization (PPO) agent manages to control the jets and to interact with the
unsteady wake to reduce the drag. We have chosen to release all our code as open
source, to help trigger interest in those methods and facilitate further developments.
In the following, we first present the simulation environment, before giving details
about the network and reinforcement framework, and finally we offer an overview of
the results obtained.
284 J. Rabault, M. Kuchta, A. Jensen, U. Réglade and N. Cerardi
FIGURE 1. (Colour online) Unsteady non-dimensional pressure wake behind the cylinder
after flow initialization without active control. The location of the velocity probes is
indicated by the black dots. The location of the control jets is indicated by the red dots.
2. Methodology
2.1. Simulation environment
The PPO agent performs active flow control in a 2D simulation environment. In
the following, all quantities are considered non-dimensionalized. The geometry of
the simulation, adapted from the 2D test case of well-known benchmarks (Schäfer
et al. 1996), consists of a cylinder of non-dimensional diameter D = 1 immersed in
a box of total non-dimensional length L = 22 (along the X-axis) and height H = 4.1
(along the Y-axis). Similarly to the benchmark of Schäfer et al. (1996), the cylinder
is slightly off the centreline of the domain (a shift of 0.05 in the Y-direction is used),
in order to help trigger the vortex shedding. The inflow profile (on the left wall of
the domain) is parabolic, following the formula (cf. 2D-2 test case in Schäfer et al.
(1996))
U(y) = 6(H/2 − y)(H/2 + y)/H²,    (2.1)
where (U(y), V(y) = 0) is the non-dimensionalized velocity vector. Using this velocity
profile, the mean velocity magnitude is Ū = 2U(0)/3 = 1. A no-slip boundary
condition is imposed on the top and bottom walls and on the solid walls of the
cylinder. An outflow boundary condition is imposed on the right wall of the domain.
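The inflow profile (2.1), its mean velocity, and the viscosity implied by the Reynolds number Re = 100 used below can be checked with a few lines of NumPy; this is a standalone sketch, not part of the paper's solver code:

```python
import numpy as np

H = 4.1     # channel height
D = 1.0     # cylinder diameter
Re = 100.0  # Reynolds number based on mean velocity and diameter

def U(y):
    """Parabolic inflow profile, equation (2.1); vanishes at y = +/- H/2."""
    return 6.0 * (H / 2 - y) * (H / 2 + y) / H**2

U_bar = 2.0 * U(0.0) / 3.0  # mean velocity magnitude of a parabolic profile

# Cross-check the mean numerically over the inlet
y = np.linspace(-H / 2, H / 2, 100001)
U_mean_numeric = float(np.mean(U(y)))

# Kinematic viscosity implied by Re = Ubar * D / nu
nu = U_bar * D / Re
```

With these values, U(0) = 1.5, Ū = 1 and ν = 0.01.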
The configuration of the simulation is shown in figure 1. The Reynolds number
based on the mean velocity magnitude and cylinder diameter (Re = ŪD/ν, with
ν the kinematic viscosity) is set to Re = 100. Computations are performed on an
unstructured mesh generated with Gmsh (Geuzaine & Remacle 2009). The mesh
is refined around the cylinder and is composed of 9262 triangular elements. A
non-dimensional, constant numerical time step dt = 5 × 10−3 is used. The total
instantaneous drag on the cylinder C is computed as follows:

F_D = ∫_C (σ · n) · e_x dS,    (2.2)
where σ is the Cauchy stress tensor, n is the unit vector normal to the outer cylinder
surface, and e_x = (1, 0). In the following, the drag is normalized into the drag
coefficient

C_D = F_D / ((1/2) ρ Ū² D),    (2.3)
where ρ = 1 is the non-dimensional volumetric mass density of the fluid. Similarly,
the lift force F_L and lift coefficient C_L are defined as

F_L = ∫_C (σ · n) · e_y dS,    (2.4)
and
C_L = F_L / ((1/2) ρ Ū² D),    (2.5)
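Equations (2.3) and (2.5) reduce to a simple normalization; as a sketch, with the non-dimensional values used throughout the paper:

```python
rho = 1.0    # non-dimensional fluid density
U_bar = 1.0  # mean inflow velocity magnitude
D = 1.0      # cylinder diameter

def drag_coefficient(F_D):
    """C_D = F_D / (0.5 * rho * Ubar**2 * D), equation (2.3)."""
    return F_D / (0.5 * rho * U_bar**2 * D)

def lift_coefficient(F_L):
    """C_L = F_L / (0.5 * rho * Ubar**2 * D), equation (2.5)."""
    return F_L / (0.5 * rho * U_bar**2 * D)
```

Note that with ρ = Ū = D = 1 the coefficients are simply twice the corresponding forces.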
reduced to 11 and five probes in appendix E, but further parameter space study and
sensitivity analysis is beyond the scope of the present paper and is left to future
work.
An unsteady wake develops behind the cylinder, which is in good agreement with
what is expected at this Reynolds number. A simple benchmark of the simulation
was performed by observing the pressure fluctuations, drag coefficient and Strouhal
number St = f D/Ū, where f is the vortex shedding frequency. The mean value of
C_D in the case without actuation (approximately 3.205) is within 1 % of what is
reported in the benchmark of Schäfer et al. (1996), which validates our simulations,
and similar agreement is found for St (typical value of approximately 0.30). In
addition, we also performed tests on refined meshes, going up to approximately
30 000 triangular elements, and found that the mean drag varied by less than 1 %
following mesh refinement. A pressure field snapshot of the fully developed unsteady
wake is presented in figure 1.
where ⟨·⟩_T indicates the sliding average back in time over a duration corresponding
to one vortex shedding cycle. The ANN tries to maximize this function r_t, i.e. to
make it as close to zero as possible, therefore minimizing drag and mean lift (to
take into account long-term dynamics, an actualized reward is actually used during
gradient descent; see appendix C for more details). This specific reward function
has several advantages compared with using the plain instantaneous drag coefficient.
Firstly, using values averaged over one vortex shedding cycle leads to less variability
in the value of the reward function, which was found to improve learning speed
and stability. Secondly, the use of a penalization term based on the lift coefficient
is necessary to prevent the network from ‘cheating’. Indeed, in the absence of this
penalization, the ANN manages to find a way to modify the configuration of the flow
in such a way that a larger drag reduction is obtained (up to approximately 18 %
drag reduction, depending on the simulation configuration used), but at the cost of a
large induced lift, which is damaging in most practical applications.
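A reward of this shape can be sketched in a few lines; the exact lift-penalty coefficient is not restated in this excerpt, so `w_lift` below is a hypothetical value used purely for illustration:

```python
import numpy as np

def reward(CD_window, CL_window, w_lift=0.2):
    """Reward sketch: negative drag averaged over one shedding cycle, minus
    a penalty on the magnitude of the mean lift. The arrays are assumed to
    span exactly one vortex shedding cycle (the <.>_T sliding average);
    w_lift is an illustrative, hypothetical weight."""
    return -np.mean(CD_window) - w_lift * abs(np.mean(CL_window))
```

Maximizing this quantity drives the mean drag down while discouraging the lift-inducing 'cheating' strategy described above.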
The ANN used is relatively simple, being composed of two dense layers of 512
fully connected neurons, plus the layers required to acquire data from the probes,
and generate data for the two jets. This network configuration was found empirically
through trial and error, as is usually done with ANNs. Results obtained with smaller
networks are poorer, as their modelling capacity is insufficient with regard to
the complexity of the flow configuration obtained. Larger networks are also less
successful, as they are harder to train. In total, our network has slightly over 300 000
weights. For more details, readers are referred to the code implementation (see
appendix A).
At first, no learning could be obtained from the PPO agent interacting with the
simulation environment. The reason for this was the difficulty for the PPO agent
to learn the necessity to set time-correlated, continuous control signals, as the PPO
first tries purely random control and must observe some improvement on the reward
function for performing learning. Therefore, we implemented two tricks to help the
PPO agent learn control strategies:
(i) The control value provided by the network is kept constant for a duration of
50 numerical time steps, corresponding to approximately 7.5 % of the vortex
shedding period. This means, in practice, that the PPO agent is allowed to
interact with the simulation and update its control only each 50 time steps.
(ii) The control is made continuous in time to avoid jumps in the pressure and
velocity due to the use of an incompressible solver. For this, the control at each
time step in the simulation is obtained for each jet as cs+1 = cs + α(a − cs ),
where cs is the control of the jet considered at the previous numerical time step,
cs+1 is the new control, a is the action set by the PPO agent for the current 50
time steps and α = 0.1 is a numerical parameter.
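Trick (ii) is a first-order exponential smoothing of the control; a minimal sketch of how one agent action a is ramped in over the 50 solver steps:

```python
def ramp_control(a, c0=0.0, alpha=0.1, n_steps=50):
    """Apply c_{s+1} = c_s + alpha * (a - c_s) for the n_steps solver steps
    covered by one agent action, starting from the previous control c0."""
    controls = []
    c = c0
    for _ in range(n_steps):
        c = c + alpha * (a - c)
        controls.append(c)
    return controls
```

The control approaches a exponentially: after 50 steps with α = 0.1 the remaining gap to the target is a factor 0.9⁵⁰ ≈ 5 × 10⁻³ of the initial one, so successive actions join continuously without introducing pressure jumps.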
Using those technical tricks, and choosing an episode duration Tmax = 20.0 (which
spans approximately 6.5 vortex shedding periods, and corresponds to 4000 numerical
time steps, i.e. 80 actions by the network), the PPO agent is able to learn a control
strategy after typically 200 epochs, corresponding to 1300 vortex shedding periods
or 16 000 sampled actions, which requires roughly 24 hours of
training on a modern desktop using one single core. This training time could be
reduced easily by at least a factor of 10 by using more cores to parallelize the
data sampling from the epochs, which is a fully parallel process. Fine-tuning the
policy can take somewhat longer, and up to approximately 350 epochs can be necessary to
obtain a fully stabilized control strategy. A training has also been performed going
up to over 1000 episodes to confirm that no more changes were obtained if the
network is allowed to train for a significantly longer time. Most of the computation
time is spent in the flow simulation. This set-up with simple, quick simulations
makes experimentation and reproduction of our results easy, while being enough for
a proof-of-concept in the context of a first application of reinforcement learning to
active flow control and providing an interesting control strategy for further analysis.
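The bookkeeping behind these training figures is straightforward to make explicit:

```python
dt = 5e-3              # numerical time step
T_max = 20.0           # episode duration (approximately 6.5 shedding periods)
steps_per_action = 50  # trick (i): the control is held for 50 solver steps
epochs = 200           # typical number of epochs before a strategy is learnt

steps_per_episode = round(T_max / dt)                        # 4000 solver steps
actions_per_episode = steps_per_episode // steps_per_action  # 80 agent actions
sampled_actions = epochs * actions_per_episode               # 16 000 actions
```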
3. Results
3.1. Drag reduction through active flow control
Robust learning is obtained by applying the methodology presented in the previous
section. This is illustrated by figure 2, which presents the averaged learning curve
[Figure 2 plot: C_D versus training epoch, showing individual rewards, the averaged reward with its 2σ band, and the standard training duration.]
FIGURE 2. (Colour online) Illustration of the robustness of the learning process. The drag
values reported are obtained at each training epoch (including exploration noise), for 10
different trainings using the same metaparameters, but different values of the random seed.
Robust learning takes place within 200 epochs, with the fully converged strategy requiring a
few more epochs to stabilize. The drag reduction is slightly less than what is reported in
the rest of the text, as these results include the random exploration noise and are computed
over the second half of the training epochs, where some of the transient in the drag value
is still present during training.
picked up at each time step implying that, in the case of deterministic control, no
exploration noise is present), which is as expected. The final drag reduction value
obtained in the deterministic mode (not shown so as not to overload figure 2) is also
consistent across the runs.
Therefore, it is clear that the ANN is able to consistently reduce drag by applying
active flow control following training through the DRL/PPO algorithm, and that the
FIGURE 3. (Colour online) Time-resolved value of the drag coefficient C_D in the case
without (baseline curve) and with (controlled curve) active flow control, and corresponding
normalized mass flow rate of the control jet 1 (Q∗_1, inset). The effect of the flow control
on the drag is clearly visible: a reduction of the drag of approximately 8 % is observed,
and the fluctuations in time due to vortex shedding are drastically reduced. Two successive
phases can be distinguished in the mass flow rate control: first, a relatively large control
is used to change the flow configuration, up to a non-dimensional time of approximately
11, before a pseudo-periodic regime with very limited flow control is established.
learning is both stable and robust. All results presented further in both this section
and the next one are obtained using deterministic prediction, and therefore exploration
noise is not present in the following figures and results. The time series for the drag
coefficient obtained using the active flow control strategy discovered through training
in the first run, compared with the baseline simulation (no active control, i.e. Q1 =
Q2 = 0), is presented in figure 3 together with the corresponding control signal (inset).
Similar results and control laws are obtained for all training runs, and the results
presented in figure 3 are therefore representative of the learning obtained with all 10
realizations.
In the case without actuation (baseline), the drag coefficient C_D varies periodically
at twice the vortex shedding frequency, as should be expected. The mean value for
the drag coefficient is ⟨C_D⟩ ≈ 3.205, and the amplitude of the fluctuations of the drag
coefficient is approximately 0.034. By contrast, the mean value for the drag coefficient
in the case with active flow control is ⟨C′_D⟩ ≈ 2.95, which represents a drag reduction
of approximately 8 %.
To put this drag reduction into perspective, we estimate the drag obtained in
the hypothetical case where no vortex shedding is present. For this, we perform a
simulation with the upper half-domain and a symmetric boundary condition on the
lower boundary (which cuts the cylinder through its equator). More details about
this simulation are presented in appendix D. The steady-state drag obtained on a full
cylinder in the case without vortex shedding is then C_D^s = 2.93 (see appendix D),
which means that the active control is able to suppress approximately 93 % of the
drag increase observed in the baseline without control compared with the hypothetical
reference case where the flow would be kept completely stable.
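The 93 % figure follows directly from the three drag values quoted above:

```python
CD_baseline = 3.205  # mean drag coefficient, no control
CD_control = 2.95    # mean drag coefficient, active control
CD_stable = 2.93     # steady drag of the hypothetical shedding-free flow

drag_reduction = (CD_baseline - CD_control) / CD_baseline
shedding_penalty_suppressed = (CD_baseline - CD_control) / (CD_baseline - CD_stable)
```

This gives drag_reduction ≈ 0.080 and shedding_penalty_suppressed ≈ 0.93, i.e. the 8 % and 93 % quoted in the text.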
In addition to this reduction in drag, the fluctuations of the drag coefficient are
reduced to approximately 0.0016 by the active control, i.e. a factor of roughly 20
compared with the baseline. Similarly, fluctuations in lift are reduced, though by
a more modest factor of approximately 5.7. Finally, a Fourier analysis of the drag
coefficients obtained shows that the actuation slightly modifies the characteristic
frequency of the system. The actively controlled system has a shedding frequency
approximately 3.5 % lower than the baseline.
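The frequency analysis mentioned here can be reproduced with a plain FFT of the drag time series; a sketch on a synthetic signal (remembering that C_D oscillates at twice the shedding frequency):

```python
import numpy as np

def dominant_frequency(signal, dt):
    """Return the frequency of the largest FFT peak of a time series,
    after removing its mean."""
    s = np.asarray(signal) - np.mean(signal)
    spectrum = np.abs(np.fft.rfft(s))
    freqs = np.fft.rfftfreq(len(s), d=dt)
    return freqs[np.argmax(spectrum)]

# Synthetic drag-like signal oscillating at twice a shedding frequency of 0.3
dt = 5e-3
t = np.arange(4000) * dt
cd = 3.2 + 0.02 * np.sin(2 * np.pi * 0.6 * t)
f_shedding = dominant_frequency(cd, dt) / 2  # halve: drag beats at 2f
```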
Several interesting points are visible from the active control signal imposed by
the ANN presented in figure 3. Firstly, the active flow control is composed of
two phases. In the first one, the ANN changes the configuration of the flow by
performing a relatively large transient actuation (non-dimensional time ranging from
0 to approximately 11). This changes the flow configuration, and sets the system in
a state in which less drag is generated. Following this transient actuation, a second
regime is reached in which a smaller actuation amplitude is used. The actuation
in this new regime is pseudo-periodic. Therefore, it appears that the ANN has
found a way to both set the flow in a modified configuration in which less drag is
present, and keep it in this modified configuration at a relatively small cost. In a
separate simulation, the small actuation present in the pseudo-periodic regime once
the initial actuation has taken place was suppressed. This led to a rapid collapse of
the modified flow regime, and the original base flow configuration was recovered. As
a consequence, it appears that the modified flow configuration is unstable, though
only small corrections are needed to keep the system in its neighbourhood.
Secondly, it is striking to observe that the ANN resorts to quite small actuations.
The peak value for the norm of the non-dimensional control mass flow rate Q∗_1,
which is reached during the transient active control regime, is only approximately
0.02, i.e. a factor of 3 smaller than the maximum value allowed during training. Once
the pseudo-periodic regime is established, the peak value of the actuation is reduced
to approximately 0.006. This is an illustration of the sensitivity of the Navier–Stokes
equations to small perturbations, and a proof that this property of the equations can
be exploited to actively control the flow configuration, if forcing is applied in an
appropriate manner.
flow. Therefore, it appears challenging to directly analyse the control strategy from
the trained network, which should be considered rather as a black box in this regard.
Instead, we can look at macroscopic flow features and how the active control
modifies them. This pinpoints the effect of the actuation on the flow and separation
happening in the wake. Representative snapshots of the flow configuration in the
baseline case (no actuation), and in the controlled case when the pseudo-periodic
[Figure 4 plot: snapshots of the velocity magnitude field, panel (a) baseline case and panel (b) controlled case.]
regime is reached (i.e. after the initial large transient actuation), are presented in
figure 4. As can be seen in figure 4, the active control leads to a modification of
the 2D flow configuration. In particular, the Kármán alley is altered in the case
with active control, and the velocity fluctuations induced by the vortices are globally
weaker, and less pronounced close to the upper and lower walls. More strikingly, the
extent of the recirculation area is dramatically increased. Defining the recirculation
area as the region in the downstream neighbourhood of the cylinder where the
horizontal component of the velocity is negative, we observe a 130 % increase in
the recirculation area, averaged over the pseudo-period. The recirculation area in
the active control case represents 103 % of what is obtained in the hypothetical
stable configuration of appendix D (so, the recirculation area is slightly larger in
the controlled case than in the hypothetical stable case, though the difference is so
small that it may be due to a side effect such as slightly larger separation close to
the jets, rather than a true change in the extent of the developed wake), while the
recirculation area in the baseline configuration with vortex shedding is only 44 % of
this same stable configuration value. This is, similarly to what was observed for C_D,
an illustration of the efficiency of the control strategy at reducing the effect of vortex
shedding.
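The recirculation-area diagnostic used above can be sketched as follows; this assumes equal-area sampling cells for simplicity, whereas the paper's unstructured mesh would weight each cell by its true area:

```python
import numpy as np

def recirculation_area(x, ux, cell_area, x_cyl=0.0, D=1.0):
    """Area of the region downstream of the cylinder where the horizontal
    velocity component is negative. x and ux are flat arrays of sample-point
    positions and horizontal velocities; cell_area is the (uniform) area
    associated with each sample point."""
    x = np.asarray(x)
    ux = np.asarray(ux)
    downstream = x > x_cyl + D / 2  # behind the cylinder
    reversed_flow = ux < 0.0        # negative horizontal velocity
    return np.count_nonzero(downstream & reversed_flow) * cell_area
```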
To go into more details, we look at the mean and the standard deviation (STD)
of the flow velocity magnitude and pressure, averaged over a large number of vortex
shedding periods (in the case with active flow control, we consider the pseudo-periodic
regime). Results are presented in figure 5. Several interesting points are visible from
both the velocity and pressure data. Similarly to what was observed on the snapshots,
the area of the separated wake is larger in the case with active control than in the
baseline. This is clearly visible from the mean value plots of both velocity magnitude
and pressure. This feature results in a lower mean pressure drop in the wake of
the cylinder in the case with active control, which is the cause of the reduced
drag. This phenomenon is similar to boat tailing, which is a well-known method for
reducing the drag of bluff bodies. However, in the present case, this is attained
through applying small controls to the flow rather than modifying the shape of the
obstacle. The STD figures also clearly show a decreased level of fluctuations of
FIGURE 5. (Colour online) Comparison of the flow morphology without (top part of
each double panel) and with (bottom part of each double panel) actuation. (a) Velocity
magnitude comparisons: mean (upper double panel) and STD (lower double panel).
(b) Pressure comparisons: mean (upper double panel) and STD (lower double panel). The
colour bar is common to both parts of each double panel. A clear increase in size of the
recirculation area is observed with actuation, which is associated with a lower pressure
drop behind the cylinder.
both the velocity magnitude and the pressure in the wake, as well as a displacement
downstream of the cylinder of the regions where highest flow variations are recorded.
4. Conclusion
We show for the first time that the deep reinforcement learning (DRL) paradigm,
and more specifically the proximal policy optimization (PPO) algorithm, can discover
an active flow control strategy for synthetic jets on a cylinder, and control the
configuration of the 2D Kármán vortex street. From the point of view of the artificial
neural network (ANN) and DRL, this is just yet another environment to interact
with. The discovery of the control strategy takes place through the optimization of a
reward function, here defined from the fluctuations of the drag and lift components
experienced by the cylinder. A drag reduction of up to approximately 8 % is observed.
In order to reduce drag, the ANN decides to increase the area of the separated region,
which in turn induces a lower pressure drop behind the cylinder, and therefore lower
drag. This brings the flow into a configuration that presents some similarities with
what would be obtained from boat tailing. The value of the drag coefficient and
extent of the recirculation bubble when control is turned on are very close to what is
obtained by simulating the flow around a half-cylinder using a symmetric boundary
condition at the lower wall, which allows one to estimate the drag expected around
a cylinder at comparable Reynolds number if no vortex shedding was present. This
implies that the active control is able to effectively cancel the detrimental effect of
vortex shedding on drag. The learning obtained is remarkable, as little metaparameter
tuning was necessary, and training takes place in about one day on a laptop. In
addition, we have resorted to strong regularization of the output of the DRL agent
through under-sampling of the simulation and imposing a continuous control for
helping the learning process. It could be expected that relaxing those constraints, i.e.
giving more freedom to the network, could lead to even more efficient strategies.
These results are potentially of considerable importance for fluid mechanics, as they
provide a proof that DRL can be used to solve the high-dimensional, analytically
intractable problem of active flow control. The ANN and DRL approach has a number
of strengths which make it an appealing methodology. In particular, ANNs allow for
an efficient global approximation of strongly nonlinear functions, and they can be
trained through direct experimentation of the DRL agent with the flow, which makes
it in theory easily applicable to both simulations and experiments without changes
in the DRL methodology. In addition, once trained, the ANN requires only a few
calculations to compute the control at each time step. In the present case, where two
hidden layers of width 512 are used, most of the computational cost comes from a
matrix multiplication between matrices of size [512, 512]. This is
much less computationally expensive than the underlying problem. Finally, we are able
to show that learning takes place in a timely manner, requiring a reasonable number
of vortex shedding periods to obtain a converged strategy.
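The cost claim can be made concrete with a toy forward pass; the input and output sizes below are illustrative assumptions, since the probe count is not restated in this excerpt:

```python
import numpy as np

width = 512                   # hidden-layer width given in the text
n_inputs, n_outputs = 100, 2  # illustrative: probe inputs, two jet outputs

# Multiply-accumulate count per control evaluation; the 512 x 512
# hidden-to-hidden product dominates.
macs = n_inputs * width + width * width + width * n_outputs

rng = np.random.default_rng(0)
W1 = rng.standard_normal((n_inputs, width))
W2 = rng.standard_normal((width, width))
W3 = rng.standard_normal((width, n_outputs))

def control(state):
    """One forward pass of a two-hidden-layer network (biases omitted)."""
    h1 = np.tanh(state @ W1)
    h2 = np.tanh(h1 @ W2)
    return h2 @ W3
```

With these sizes the weight count is about 3.1 × 10⁵, consistent with the 'slightly over 300 000 weights' stated earlier.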
This work opens a number of research directions, including applying the DRL
methodology to more complex simulations, for example more realistic three-
dimensional large-eddy simulations or direct numerical simulations on large computer
clusters, or even applying such an approach directly to a real-world experiment. In
addition, a number of interesting questions arise from the use of ANNs and DRL.
For example, can some form of transfer learning be used between simulations and
the real world if the simulations are realistic enough (i.e. can one train an ANN in
a simulation, and then use it in the real world)? The use of DRL for active flow
control may provide a technique to finally take advantage of advanced, complex flow
actuation possibilities, such as those allowed by complex jet actuator arrays.
294 J. Rabault, M. Kuchta, A. Jensen, U. Réglade and N. Cerardi
Acknowledgement
The help of T. Kvernes for setting up the computational infrastructure used in
Here, U is the inflow velocity profile (2.1), while the f_{Q_i} are radial velocity profiles which
mimic suction or injection of the fluid by the jets. The functions are chosen such
that the prescribed velocity continuously joins the no-slip condition imposed on the
Γ_W surfaces of the cylinder. More precisely, we set f_{Q_i}(x, y) = A(θ; Q_i), where the
modulation depends on the angular coordinate θ (cf. figure 6), such that for the jet
of width ω centred at θ_0, placed on the cylinder of radius R, the modulation is
set as

    A(θ; Q) = Q (π / (2ωR²)) cos((π/ω)(θ − θ_0)).   (B 3)
We remark that with this choice the boundary conditions on the jets are in fact
controlled by single scalar values Qi . Negative values of Qi correspond to suction.
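For illustration, the modulation (B 3) can be evaluated numerically; the radius, jet width and flow-rate values below are arbitrary placeholders, not the values used in the paper.

```python
import numpy as np

def jet_profile(theta, theta0, omega, R, Q):
    """Modulation A(theta; Q) of (B 3): a cosine bump of angular width
    omega centred at theta0, whose strength is set by the scalar Q."""
    A = Q * np.pi / (2.0 * omega * R**2) * np.cos(np.pi / omega * (theta - theta0))
    # zero outside the jet opening, joining the no-slip wall continuously
    inside = np.abs(theta - theta0) <= omega / 2.0
    return np.where(inside, A, 0.0)

# placeholder geometry: jet at the top of the cylinder, 10 degrees wide
theta = np.linspace(0.0, 2.0 * np.pi, 10001)
A = jet_profile(theta, theta0=np.pi / 2, omega=np.pi / 18, R=0.05, Q=1e-3)
```

The cosine vanishes at the jet edges, so the prescribed velocity joins the no-slip condition continuously, and flipping the sign of Q turns injection into suction.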
FIGURE 6. (Colour online) Solution domain Ω (not to scale) for the Navier–Stokes
equations of the simulation environment. On parts of the cylinder boundary (in blue)
velocity boundary conditions determined by Q_i are prescribed.
    (u* − u⁰)/δt + u⁰ · (∇u⁰) = −∇p⁰ + Re⁻¹ Δ((u* + u⁰)/2)   in Ω,   (B 4)
    −Δ(p − p⁰) = −δt⁻¹ ∇ · u*   in Ω,   (B 5)
    u − u* = −δt ∇(p − p⁰)   in Ω.   (B 6)
The steps (B 4) and (B 6) are considered with the boundary conditions (B 2), while for
pressure projection (B 5) a Dirichlet boundary condition p = 0 is used on ΓO and the
remaining boundaries have n · ∇p = 0.
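A minimal sketch of the projection steps (B 5)–(B 6) may help: given any tentative velocity u*, solving the pressure Poisson problem and correcting the velocity yields a discretely divergence-free field. The sketch below uses a doubly periodic grid and a spectral Poisson solve purely for illustration; the paper's setting is the finite-element discretization with the boundary conditions just stated.

```python
import numpy as np

n, L = 64, 2.0 * np.pi
x = np.linspace(0.0, L, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing='ij')
k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)      # integer wavenumbers
KX, KY = np.meshgrid(k, k, indexing='ij')
K2 = KX**2 + KY**2
K2[0, 0] = 1.0                                    # avoid division by zero (mean mode)

def div(u, v):
    """Spectral divergence of a periodic 2-D velocity field."""
    return np.real(np.fft.ifft2(1j * KX * np.fft.fft2(u) + 1j * KY * np.fft.fft2(v)))

# tentative velocity u*: a divergence-free field plus a gradient (non-solenoidal) part
u_star = np.cos(X) * np.sin(Y) + 0.3 * np.cos(X + Y)
v_star = -np.sin(X) * np.cos(Y) + 0.3 * np.cos(X + Y)

dt = 0.1
# pressure step (B 5): -lap(phi) = -div(u*)/dt, with phi = p - p0
rhs = -div(u_star, v_star) / dt
phi_hat = np.fft.fft2(rhs) / K2                   # -lap becomes K2 in Fourier space
phi_hat[0, 0] = 0.0                               # fix the pressure mean
# velocity correction (B 6): u = u* - dt * grad(phi)
u = u_star - dt * np.real(np.fft.ifft2(1j * KX * phi_hat))
v = v_star - dt * np.real(np.fft.ifft2(1j * KY * phi_hat))
```

On this periodic grid the correction removes the divergence exactly (to round-off), which is the role the projection plays inside each IPCS time step.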
Discretization of the IPCS scheme (B 4)–(B 6) relies on the finite-element method.
More specifically, the velocity and pressure fields are discretized, respectively, in
terms of the continuous quadratic and continuous linear elements on triangular cells.
Because of the explicit treatment of the nonlinearity in (B 4), all the matrices of
linear systems in the scheme are assembled (and their solvers set up) once prior
to entering the time loop in which only the right-hand side vectors are updated. In
our implementation the solvers for the linear systems involved are the sparse direct
solvers from the UMFPACK library (Davis 2004). We remark that the finite-element
mesh used for training consists of 9262 elements and gives rise to systems with
37 804 and 4820 unknowns in (B 4), (B 6) and (B 5), respectively.
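The assemble-once, factorize-once pattern described above (with UMFPACK in the paper) can be sketched in pure NumPy on a stand-in matrix; the tridiagonal 1-D Laplacian below is an illustrative substitute for the actual FEM operators, and the Thomas algorithm stands in for the sparse direct solver.

```python
import numpy as np

n = 200
# Stand-in "assembled matrix": a 1-D Laplacian (tridiagonal)
sub = -np.ones(n - 1)      # sub-diagonal
dia = 2.0 * np.ones(n)     # diagonal
sup = -np.ones(n - 1)      # super-diagonal

# Factorize ONCE, before the time loop (LU of a tridiagonal matrix)
l = np.zeros(n - 1)        # multipliers of L
d = dia.copy()             # pivots of U
for i in range(1, n):
    l[i - 1] = sub[i - 1] / d[i - 1]
    d[i] = dia[i] - l[i - 1] * sup[i - 1]

def solve(rhs):
    """Reuse the stored factorization; only two sweeps per call."""
    y = rhs.copy()
    for i in range(1, n):              # forward substitution (L y = rhs)
        y[i] -= l[i - 1] * y[i - 1]
    xsol = np.empty(n)
    xsol[-1] = y[-1] / d[-1]
    for i in range(n - 2, -1, -1):     # backward substitution (U x = y)
        xsol[i] = (y[i] - sup[i] * xsol[i + 1]) / d[i]
    return xsol

# "Time loop": the matrix and its factorization are fixed, only the RHS changes
xs = [solve(np.sin((k + 1) * np.pi * np.linspace(0.0, 1.0, n))) for k in range(3)]
```

Amortizing the factorization over all time steps is what makes a direct solver competitive here, since each step then costs only the forward/backward substitutions.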
Once u and p have been computed, the drag and lift are integrated over the entire
surface of the cylinder. In particular, the jet surfaces are included.
where π_Θ is the policy function represented by the ANN with weights Θ, and s_t is
the (hidden) state of the system.
Following this optimization formulation, one has naturally π* = π_{Θ=Θ*}, where Θ*
is the set of weights obtained through the maximization (C 1).
The maximization (C 1) is solved through gradient descent performed on the weights
Θ of the ANN, following experimental sampling of the system through interaction
with the environment. Sophisticated gradient descent batch methods, such as Adagrad
or Adadelta, allow one to automatically set the learning rate in accordance to the
local speed of the gradient descent and provide stabilization of the gradient by adding
momentum. More specifically, if we denote by τ an (s–a–r) (state–action–reward) sequence,
and we overload the R operator as R(τ) = Σ_i γ^i r_i, then the value function obtained
with the weights Θ, which is the quantity that should be maximized, can be written
as

    V(Θ) = E[ Σ_{t=0}^{H} R(s_t, u_t) | π_Θ ] = Σ_τ P(τ, Θ) R(τ).   (C 3)
From this point, elementary manipulations lead to
    ∇_Θ V(Θ) = Σ_τ ∇_Θ P(τ, Θ) R(τ)
             = Σ_τ (P(τ, Θ)/P(τ, Θ)) ∇_Θ P(τ, Θ) R(τ)
             = Σ_τ P(τ, Θ) (∇_Θ P(τ, Θ)/P(τ, Θ)) R(τ)
             = Σ_τ P(τ, Θ) ∇_Θ log(P(τ, Θ)) R(τ).   (C 4)
The last expression represents a new expected value, which can be empirically
sampled under the policy πΘ and used as the input to the gradient descent.
In this new expression, one needs to estimate the log-prob gradient ∇ Θ log(P(τ , Θ)).
This can be performed in the following way:

    ∇_Θ log(P(τ^(i), Θ)) = ∇_Θ log[ Π_t P(s^(i)_{t+1} | s^(i)_t, a^(i)_t) π_Θ(a^(i)_t | s^(i)_t) ]
                         = ∇_Θ [ Σ_t log P(s^(i)_{t+1} | s^(i)_t, a^(i)_t) + Σ_t log π_Θ(a^(i)_t | s^(i)_t) ]
                         = Σ_t ∇_Θ log π_Θ(a^(i)_t | s^(i)_t).
This last expression depends only on the policy, not the dynamic model. This allows
effective sampling and gradient descent. In addition, one can show that this method
is unbiased.
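The sampled form of this policy gradient can be checked on a toy one-step problem. Everything below, the sigmoid policy over two actions and the bandit-style reward, is an illustrative stand-in, not the setup of the paper; the point is only that averaging ∇_Θ log π_Θ(a) R over sampled actions recovers the exact gradient of the value function.

```python
import numpy as np

rng = np.random.default_rng(0)

# One-step "bandit": policy pi_theta(a=1) = sigmoid(theta), reward r = a.
# Then V(theta) = sigmoid(theta), so the exact gradient is sigmoid'(theta).
theta = 0.0
sig = 1.0 / (1.0 + np.exp(-theta))

n = 200_000
actions = (rng.random(n) < sig).astype(float)      # sample a ~ pi_theta
rewards = actions
# grad of log pi_theta(a): (1 - sig) if a = 1, -sig if a = 0
grad_log_pi = np.where(actions == 1.0, 1.0 - sig, -sig)
grad_estimate = np.mean(grad_log_pi * rewards)     # sampled form of (C 4)

exact = sig * (1.0 - sig)                          # sigmoid'(theta)
```

The Monte Carlo estimate converges to the exact gradient as the number of sampled trajectories grows, which is the unbiasedness property mentioned above.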
In the case of continuous control, as is performed in the present work, the ANN
is used to predict the parameters of a distribution with compact support (i.e. the
distribution is 0 outside of the range of admissible actions; in the present case,
a beta distribution is used, but other choices are possible), given the input o_t. The
distribution obtained at each time step describes the probability distribution for the
optimal action to perform at the corresponding step, following the current belief
encoded by the weights of the ANN. When performing training, the action effectively
taken is sampled following this distribution. This means that there is a level of
exploration randomness when an action is chosen during training. The more uncertain
the network is about which action to take (i.e. the wider the distribution), the more
the network will try different controls. This is the source of the random exploration
during training. By contrast, when the network is used in pure prediction mode, the
DRL agent extracts the action with the highest probability and uses it for the action.
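This train-time sampling versus prediction-time mode extraction can be sketched with a compact-support (beta-family) distribution; the concentration parameters and action bounds below are placeholders, not values from the paper, and in the real agent they would be the outputs of the ANN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical network outputs: concentration parameters of a beta
# distribution, rescaled to the admissible action range [lo, hi].
alpha, beta_p = 5.0, 2.0
lo, hi = -0.06, 0.06          # illustrative jet flow-rate bounds

def training_action():
    """Exploration during training: sample from the predicted distribution."""
    return lo + (hi - lo) * rng.beta(alpha, beta_p)

def deployment_action():
    """Pure prediction: take the most probable action (the mode)."""
    mode = (alpha - 1.0) / (alpha + beta_p - 2.0)   # valid for alpha, beta > 1
    return lo + (hi - lo) * mode

samples = np.array([training_action() for _ in range(2000)])
```

A wide distribution (small concentrations) makes the sampled actions spread out, which is exactly the exploration mechanism described above; a sharply peaked one means the agent is confident and explores little.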
FIGURE 7. (Colour online) Illustration of the converged flow obtained around the
cylinder, computed on a half-domain using a symmetric boundary condition at the lower
boundary. The lower boundary cuts the cylinder through its equatorial plane. This results
in a configuration where no vortex shedding is present, which constitutes a 'hypothetical'
no-shedding baseline to which we can compare our results. The mesh is heavily refined
throughout the recirculation area.
steady state without vortex shedding CDs = 2.93. This value can be compared with
the drag obtained when active flow control is turned on, to estimate how efficient the
control is at reducing the negative effect of vortex shedding on drag. Similarly, the
asymptotic recirculation area for a half-cylinder without vortex shedding is obtained
and the value can be extended to the hypothetical steady case with a full cylinder
(As = 2.41).
FIGURE 8. (Colour online) Unsteady non-dimensional pressure wake behind the cylinder
after flow initialization without active control, and position of the pressure probes in the
cases with five (a) and 11 (b) probes. Both the resolution and the regions of the flow on
which information is provided are much reduced compared with the configuration in the
main body of the text. In both cases, the position of the probes is indicated by black dots,
while the position of the jets is indicated by red squares.
(typically, mean values of CD reach 3.03 for five probes, and 2.99 for 11 probes
in the pseudo-periodic regime). This can probably be attributed to the absence
of information about the configuration of the developed wake. This provides an
illustration of the fact that ANNs can also be used to perform efficient control even
when only partial, under-sampled flow information is available.
FIGURE 9. (Colour online) Time-resolved value of the drag coefficient C_D in the case
without (baseline curve) and with active flow control (controlled curves; cases with five,
11 and 151 probes, the last being the same as reported in the main body of the text), and
corresponding normalized mass flow rate of control jet 1 (Q*_1, inset for both cases).
This figure is plotted in a similar way to figure 3. As visible here, the asymptotic drag
reduction in the case of a reduced input information size is somewhat smaller than with
full flow information. This may be due to both the absence of information about the
far-wake configuration and the lower spatial resolution of the sampling.
REFERENCES
ABADI, M., BARHAM, P., CHEN, J., CHEN, Z., DAVIS, A., DEAN, J., DEVIN, M., GHEMAWAT, S.,
IRVING, G., ISARD, M., KUDLUR, M., LEVENBERG, J., MONGA, R., MOORE, S., STEINER,
B., TUCKER, P., VASUDEVAN, V., WARDEN, P., WICKE, M., YU, Y. & ZHENG, X. 2016
Tensorflow: a system for large-scale machine learning. In Proceedings of the 12th USENIX
Symposium on Operating Systems Design and Implementation (OSDI '16), vol. 16, pp. 265–283.
BARBAGALLO, A., DERGHAM, G., SIPP, D., SCHMID, P. J. & ROBINET, J.-C. 2012 Closed-loop
control of unsteadiness over a rounded backward-facing step. J. Fluid Mech. 703, 326–362.
BARBAGALLO, A., SIPP, D. & SCHMID, P. J. 2009 Closed-loop control of an open cavity flow using
reduced-order models. J. Fluid Mech. 641, 1–50.
BRUNTON, S. L. & NOACK, B. R. 2015 Closed-loop turbulence control: progress and challenges.
Appl. Mech. Rev. 67 (5), 050801.
DAVIS, T. A. 2004 Algorithm 832: UMFPACK v4.3 – an unsymmetric-pattern multifrontal method.
ACM Trans. Math. Softw. 30 (2), 196–199.
DEAN, B. & BHUSHAN, B. 2010 Shark-skin surfaces for fluid-drag reduction in turbulent flow: a
review. Phil. Trans. R. Soc. Lond. A 368 (1929), 4775–4806.
DUAN, Y., CHEN, X., HOUTHOOFT, R., SCHULMAN, J. & ABBEEL, P. 2016 Benchmarking deep
reinforcement learning for continuous control. In International Conference on Machine Learning,
pp. 1329–1338.
DURIEZ, T., BRUNTON, S. L. & NOACK, B. R. 2016 Machine Learning Control – Taming Nonlinear
Dynamics and Turbulence. Springer.
ERDMANN, R., PÄTZOLD, A., ENGERT, M., PELTZER, I. & NITSCHE, W. 2011 On active control
of laminar–turbulent transition on two-dimensional wings. Phil. Trans. R. Soc. Lond. A 369
(1940), 1382–1395.
FRANSSON, J. H. M., TALAMELLI, A., BRANDT, L. & COSSU, C. 2006 Delaying transition to
turbulence by a passive mechanism. Phys. Rev. Lett. 96 (6), 064501.
GAUTIER, N., AIDER, J.-L., DURIEZ, T., NOACK, B. R., SEGOND, M. & ABEL, M. 2015 Closed-loop
separation control using machine learning. J. Fluid Mech. 770, 442–457.
GEUZAINE, C. & REMACLE, J.-F. 2009 Gmsh: a 3-D finite element mesh generator with built-in
pre- and post-processing facilities. Intl J. Numer. Meth. Engng 79 (11), 1309–1331.
GLEZER, A. 2011 Some aspects of aerodynamic flow control using synthetic-jet actuation. Phil. Trans.
R. Soc. Lond. A 369 (1940), 1476–1494.
GODA, K. 1979 A multistep technique with implicit difference schemes for calculating two- or
three-dimensional cavity flows. J. Comput. Phys. 30 (1), 76–95.
GOODFELLOW, I., BENGIO, Y., COURVILLE, A. & BENGIO, Y. 2016 Deep Learning, vol. 1. MIT
Press.
GU, S., LILLICRAP, T., SUTSKEVER, I. & LEVINE, S. 2016 Continuous deep Q-learning with model-
based acceleration. In Intl Conference on Machine Learning, pp. 2829–2838.
GUÉNIAT, F., MATHELIN, L. & HUSSAINI, M. Y. 2016 A statistical learning strategy for closed-loop
control of fluid flows. Theor. Comput. Fluid Dyn. 30 (6), 497–510.
HE, K., ZHANG, X., REN, S. & SUN, J. 2016 Deep residual learning for image recognition. In Proc.
of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 770–778.
HORNIK, K., STINCHCOMBE, M. & WHITE, H. 1989 Multilayer feedforward networks are universal
approximators. Neural Networks 2 (5), 359–366.
KOBER, J., BAGNELL, J. A. & PETERS, J. 2013 Reinforcement learning in robotics: a survey. Intl J.
Robotics Res. 32 (11), 1238–1274.
KRIZHEVSKY, A., SUTSKEVER, I. & HINTON, G. E. 2012 Imagenet classification with deep
convolutional neural networks. Adv. Neural Inform. Proc. Syst., pp. 1097–1105.
KUTZ, J. N. 2017 Deep learning in fluid dynamics. J. Fluid Mech. 814, 1–4.
LECUN, Y., BENGIO, Y. & HINTON, G. 2015 Deep learning. Nature 521, 436–444.
LI, R., NOACK, B. R., CORDIER, L., BORÉE, J. & HARAMBAT, F. 2017 Drag reduction of a car
model by linear genetic programming control. Exp. Fluids 58 (8), 103.
LILLICRAP, T. P., HUNT, J. J., PRITZEL, A., HEESS, N., EREZ, T., TASSA, Y., SILVER, D. &
WIERSTRA, D. 2015 Continuous control with deep reinforcement learning. arXiv:1509.02971.
LOGG, A., MARDAL, K.-A. & WELLS, G. 2012 Automated Solution of Differential Equations by the
Finite Element Method: The FEniCS Book, vol. 84. Springer.
MNIH, V., KAVUKCUOGLU, K., SILVER, D., GRAVES, A., ANTONOGLOU, I., WIERSTRA, D. &
RIEDMILLER, M. 2013 Playing Atari with deep reinforcement learning. arXiv:1312.5602.
MNIH, V., KAVUKCUOGLU, K., SILVER, D., RUSU, A. A., VENESS, J., BELLEMARE, M. G.,
GRAVES, A., RIEDMILLER, M., FIDJELAND, A. K., OSTROVSKI, G., PETERSEN, S., BEATTIE,
C., SADIK, A., ANTONOGLOU, I., KING, H., KUMARAN, D., WIERSTRA, D., LEGG, S. &
HASSABIS, D. 2015 Human-level control through deep reinforcement learning. Nature 518
(7540), 529.
PASTOOR, M., HENNING, L., NOACK, B. R., KING, R. & TADMOR, G. 2008 Feedback shear layer
control for bluff body drag reduction. J. Fluid Mech. 608, 161–196.
RABAULT, J., KOLAAS, J. & JENSEN, A. 2017 Performing particle image velocimetry using artificial
neural networks: a proof-of-concept. Meas. Sci. Technol. 28 (12), 125301.
RAUBER, P. E., FADEL, S. G., FALCAO, A. X. & TELEA, A. C. 2017 Visualizing the hidden
activity of artificial neural networks. IEEE Trans. Vis. Comput. Graphics 23 (1), 101–110.
SCHAARSCHMIDT, M., KUHNLE, A. & FRICKE, K. 2017 Tensorforce: a tensorflow library for applied
reinforcement learning. https://github.com/tensorforce/tensorforce.
SCHÄFER, M., TUREK, S., DURST, F., KRAUSE, E. & RANNACHER, R. 1996 Benchmark computations
of laminar flow around a cylinder. In Flow Simulation with High-Performance Computers II
(ed. E. H. Hirschel), pp. 547–566. Springer.
SCHMIDHUBER, J. 2015 Deep learning in neural networks: an overview. Neural Networks 61, 85–117.
SCHOPPA, W. & HUSSAIN, F. 1998 A large-scale control strategy for drag reduction in turbulent
boundary layers. Phys. Fluids 10 (5), 1049–1051.
SCHULMAN, J., LEVINE, S., MORITZ, P., JORDAN, M. I. & ABBEEL, P. 2015 Trust region policy
optimization. arXiv:1502.05477.
SCHULMAN, J., WOLSKI, F., DHARIWAL, P., RADFORD, A. & KLIMOV, O. 2017 Proximal policy
optimization algorithms. arXiv:1707.06347.
SHAHINFAR, S., SATTARZADEH, S. S., FRANSSON, J. H. & TALAMELLI, A. 2012 Revival of
classical vortex generators now for transition delay. Phys. Rev. Lett. 109 (7), 074501.
SIEGELMANN, H. T. & SONTAG, E. D. 1995 On the computational power of neural nets. J. Comput.
Syst. Sci. 50 (1), 132–150.
SIPP, D. & SCHMID, P. J. 2016 Linear closed-loop control of fluid instabilities and noise-induced
perturbations: a review of approaches and tools. Appl. Mech. Rev. 68 (2), 020801.
VALEN-SENDSTAD, K., LOGG, A., MARDAL, K.-A., NARAYANAN, H. & MORTENSEN, M. 2012
A comparison of finite element schemes for the incompressible Navier–Stokes equations. In
Automated Solution of Differential Equations by the Finite Element Method, pp. 399–420.
Springer.
VERMA, S., NOVATI, G. & KOUMOUTSAKOS, P. 2018 Efficient collective swimming by
harnessing vortices through deep reinforcement learning. Proc. Natl Acad. Sci. USA;
http://www.pnas.org/content/early/2018/05/16/1800923115.full.pdf.
VERNET, J., ÖRLÜ, R., ALFREDSSON, P. H., ELOFSSON, P. & SCANIA, A. B. 2014 Flow
separation delay on trucks a-pillars by means of dielectric barrier discharge actuation. In
First International Conference in Numerical and Experimental Aerodynamics of Road Vehicles
and Trains (Aerovehicles 1), Bordeaux, France, pp. 1–2.
WANG, Z., XIAO, D., FANG, F., GOVINDAN, R., PAIN, C. C. & GUO, Y. 2018 Model identification
of reduced order fluid dynamics systems using deep learning. Intl J. Numer. Meth. Fluids 86
(4), 255–268.