
Preprint

Convolutional recurrent autoencoder network for learning underwater ocean acoustics

Wrik Mallik,a Rajeev K. Jaiman,b and Jasmin Jelovicac


Department of Mechanical Engineering, The University of British Columbia, Vancouver,
BC V5T 1Z4, Canada

(Dated: 13 April 2022)

arXiv:2204.05573v1 [physics.flu-dyn] 12 Apr 2022

Underwater ocean acoustics is a complex physical phenomenon involving not only widely varying physical parameters and dynamical scales but also uncertainties in the ocean parameters. Thus, it is difficult to
construct generalized physical models which can work in a broad range of situations. In this regard, we
propose a convolutional recurrent autoencoder network (CRAN) architecture, which is a data-driven deep
learning model for acoustic propagation. Being data-driven, it is independent of how the data is obtained
and can be employed for learning various ocean acoustic phenomena. The CRAN model can learn a reduced-
dimensional representation of physical data and can predict the system evolution efficiently. Two cases of
increasing complexity are considered to demonstrate the generalization ability of the CRAN. The first case
is a one-dimensional wave propagation with spatially-varying discontinuous initial conditions. The second
case corresponds to a far-field transmission loss distribution in a two-dimensional ocean domain with depth-
dependent sources. For both cases, the CRAN can learn the essential elements of wave propagation physics
such as characteristic patterns while predicting long-time system evolution with satisfactory accuracy. Such
ability of the CRAN to learn complex ocean acoustics phenomena has the potential of real-time prediction
for marine vessel decision-making and online control.

a [email protected]
b [email protected]
c [email protected]

1
DL model for learning underwater ocean acoustics

I. INTRODUCTION

In the ocean environment, underwater radiated noise (URN) propagation from marine vessel operations
acts as a stressor for underwater marine animals. With the ever-increasing marine traffic over the last few
decades, this has become a significant concern for both environmentalists and the shipbuilding industry
(Duarte et al., 2021; Erbe et al., 2019). The main noise sources come from large commercial vessels such as
cargo ships, tankers, cruise ships and ferries as well as small size watercrafts such as motorized boats, fishing
vessels and tug boats. These anthropogenic noise sources adversely influence the abilities of marine mammals
in their essential life activities e.g., foraging, communication, echolocation, migration, reproduction. Thus,
the prediction capability of far-field URN propagation is extremely important for better design and control
of marine vessels.
Predicting far-field URN propagation can be a daunting task as it involves a complex multiphysics
phenomenon with widely varying physical scales. For example, URN depends on various factors such as
temperature, density and salinity of the ocean, ocean bathymetry, ocean-bed material properties, and marine
biological conditions among others. All of these factors involve uncertainties, which pose a serious challenge
for developing physical models of underwater ocean acoustic phenomena (James and Dowling, 2005, 2011).
Far-field URN is analyzed via techniques with varying fidelity according to the nature of the physical
problem investigated. As a result, experimental measurements, high-fidelity numerical techniques like the
finite-element method, or hybrid numerical techniques like ray tracing, all have their necessities according
to specific situations. Therefore, the biggest challenge for marine vessel operation is how to compute far-
field URN generated from various operational marine vessel parameters (e.g., vessel speed, propeller pitching
angle, etc.) efficiently, in a widely varying ocean environment. If such knowledge can be acquired for efficient
decision-making, marine vessel operations can be adapted to mitigate their far-field URN signatures.
One can attempt to solve the aforementioned challenge via data-driven models, which can approximate
the actual physical system. Being data-driven, they rely only on the observed data and remain independent
of the data generation techniques. Therefore, they can encompass all conditions and parameters for which
data are available, obtained from either experimental or various fidelity computational solvers. This will
enable the data-driven models to form a general approximation or digital twin of the physical system.
Furthermore, these data-driven models can learn a much lower-dimensional representation of the system
from high-dimensional physical data. This potentially enhances their scalability and facilitates real-time
predictions. The data-driven models are generally developed via an offline-online application strategy. The
offline phase consists of training the models to learn the problem physics. During the online phase, the data-
driven model can provide real-time solutions for various applications including decision making, control and
optimization.
Various data-driven techniques have been historically employed for learning complex physical systems
exhibiting nonlinear and multi-scale behavior. Projection-based methods such as proper orthogonal de-
composition (POD) are one such popularly used technique. However, POD, and other projection-based
methods, can prove inefficient for certain convection-dominated and wave propagation problems. For such
problems, the worst-case error obtained from the best-approximated linear subspace of a high-dimensional
solution decays slowly with an increase in subspace dimension (i.e., the large Kolmogorov n-width problem)
(Greif and Urban, 2019; Mojgani and Balajewicz, 2020; Taddei, 2020). This is a limitation of employing
linear operations via projection to find the best-approximated reduced-dimensional subspace for transport
problems.
In recent years, deep learning (DL) architectures based on neural networks have been increasingly used
as data-driven models for mechanistic problems (Bergen et al., 2019; Hsieh, 2009; Reichstein et al., 2019).
Recent numerical experiments (Mansour, 2019) indicate that DL models tend to prioritize the learning of inherently simpler features of the data over complex data patterns. Such behavior is observed even
when the training data is noisy. Such learning of simple laws governing the data aligns perfectly with data
of mechanistic origin, which are governed by a few fundamental physical principles (e.g., Newton’s laws
of motion). This underscores the motivation for approximating mechanistic systems with data-driven DL
models.
Deep learning models can learn a low-dimensional representation of high-dimensional physical data via
a nonlinear mapping. Such a nonlinear mapping is developed via a composition of nonlinear activation
functions in multi-layer deep neural networks. This makes neural networks less susceptible to the large
Kolmogorov n-width problem. Also, neural networks can employ global bases from an affine subspace,
leading to highly flexible basis functions at their disposal. These properties allow them to approximate
many complex functions, which has been theoretically demonstrated by various universal approximation
theorems (Cybenko, 1989; Hornik, 1991; Leshno et al., 1993; Pinkus, 1999). Finally, it can be shown that
for a sufficiently over-parameterized DL model, standard neural network training techniques can empirically
converge to the globally optimum neural network configuration (Allen-Zhu et al., 2019). Such convergence
can be further corroborated by the fact that deep learning models generated with linear activation functions
will empirically converge to POD subspaces (Bukka et al., 2021). This has led to the employment of various
DL models like autoencoder networks and recurrent neural networks for accurately learning various nonlinear
and complex physical problems in a much lower dimension and with real-time online prediction capability
(Bukka et al., 2021; Cheng et al., 2020; Gonzalez and Balajewicz, 2018; Lee and Carlberg, 2020; Parish and
Carlberg, 2020; Sorteberg et al., 2019).
Various data-driven models, including DL models, have recently been applied in the area of underwater
acoustics for classification problems. These studies have mainly focused on the localization of ship-generated
sound sources based on their acoustic signatures in both shallow and deep ocean (Chen and Schmidt, 2021;
Chi et al., 2019; Ferguson, 2021; Huang et al., 2018; Niu et al., 2017; Wang and Peng, 2018). A comprehensive
review of data-driven modeling in various acoustics applications, including ocean acoustics, is provided in
Ref. (Bianco et al., 2019).
DL models can also be employed for low-dimensional representation learning and forecasting tasks in
acoustics. Some recent articles have presented similar tasks with DL architectures like convolutional neural
network autoencoders and long short-term memory RNNs (LSTM-RNNs) for studying the propagation of
shallow water waves (Deo and Jaiman, 2022; Fotiadis et al., 2020). There has also been some recent research
on the development of physics-based DL models for wave propagation, which tries to regularize DL training
with a kinematics-based regularizer (Mallik et al., 2021). Although such ideas can be effective for improving
the generalization of DL models during a dearth of training data, the research is still in the developmental
stage. Some attempts have also been made to completely replace numerical techniques like finite element
method, finite difference method, etc., with neural networks, for solving acoustic partial differential equations
(Borrel-Jensen et al., 2021). However, the practical applicability of such approaches is questionable. Thus,
the application of DL models for learning various problems in wave propagation is an emerging area of
research, with a significant scope of broad-scale applications, especially in the area of underwater ocean
acoustics.
In this research, we develop a convolutional recurrent autoencoder network (CRAN) as our DL archi-
tecture for learning underwater acoustics. The CRAN model is a composite encoder-propagator-decoder
framework (Bukka et al., 2021), which applies a sequence-to-sequence learning mechanism for autoregressive long-time and far-field prediction of acoustic signals. The learning is performed on a significantly reduced-dimensional subspace, which approximates the high-dimensional physical data. The CRAN is
employed for learning two different scenarios in underwater acoustics. First, we employ the CRAN to learn
solutions of the second-order wave equation subjected to discontinuous initial conditions. The presence of discontinuities poses a challenge for learning such phenomena via a projection-based reduced-order model
(Greif and Urban, 2019). Hence, it forms an interesting learning challenge for the data-driven DL model.
Furthermore, we consider learning wave propagation physics with the varying location of the discontinuities
to demonstrate a more generalized learning capability. As a second case, we employ the CRAN model to
learn the spatial distribution of far-field transmission loss from a point source in a two-dimensional under-
water ocean environment. We again consider a wide range of source depth locations to demonstrate the
generalized learning capacity of the CRAN model.
The paper discusses the development of the CRAN model and proposes modifications for a more efficient
autoregressive prediction capacity over the previous implementation of CRAN for learning unsteady flow
dynamics and fluid-structure interaction (Bukka et al., 2021; Gupta and Jaiman, 2022). The training strategy
for the CRAN model for the two different test cases and the generalized learning capacity of underwater
ocean acoustics are discussed. The efficiency of the deep learning-based autoencoder (encoder and decoder)
to learn low-dimensional spatial representation for discontinuous initial conditions is also compared to similar
operations performed via POD. Such application of the CRAN model for learning complex underwater wave
propagation phenomena and ocean acoustics is presented here for the first time. Successful application of
the CRAN promises future application of such data-driven models to approximate ocean acoustics with a
wide range of parameters, based on combined experimental and multi-fidelity computational data sets.

II. DATA-DRIVEN LEARNING OF OCEAN ACOUSTICS

Here we begin by formulating a general data-driven learning problem, which will eventually be applied for learning ocean acoustics. Any general partial differential equation depending on a spatial domain Ω := Rⁱ (i = 1, 2, 3) and on the time interval I := R+ for a real-valued parameter φ ≥ 0 can be written as

∂U/∂t (x, t; φ) = F (U (x, t; φ); φ), (t, x) ∈ I × Ω,  (1)
subject to any general initial and boundary conditions. F is any general nonlinear operator governing the
dynamics of the system. For many cases, F is known but a closed-form solution to Eq. (1) does not exist.
Under such situations, numerical techniques such as finite element or finite difference are employed to obtain
a solution over a discretized spatial domain with discrete time steps

UM,k+1 (xM , tK ; φ) = FM,K (UM,k (xM , tK ; φ) ; φ) , k = 0, 1, . . . , K − 1, (tK , xM ) ∈ I × ΩM . (2)

Here UM,k ∈ RM is the discrete solution with M spatial discretizations, and K represents the number of discrete time steps over the time interval I. The discrete operator FM,K depends on the exact form of Eq. (1) and
thus provides a forward map from the causality (initial and boundary conditions, system properties and
dynamics) to the effects (final solution).
For very complex dynamical phenomena, the forward operator F may not be known exactly due to
system uncertainties. In such situations, the forward problem must be solved via experiments for each set
of initial conditions, boundary conditions and parameter values. On the other hand, learning the forward
map with high-fidelity solvers for complex boundary conditions and a wide range of parameters can become
computationally expensive, especially as M , K, or the parameter space become large. In that regard, there
are various challenges in obtaining a forward map in ocean acoustics and most solution strategies are not
applicable for real-time predictions.
Here we resort to data-driven techniques to obtain a generalized model that can approximate the ocean
acoustics forward map over a wide range of ocean parameters and also in the presence of uncertainties. These
data-driven models are set up to learn the inverse map from the effects to the causality. Being data-driven,
these models are independent of the governing equations providing the data and can be applied to data
obtained from a medium-fidelity numerical solver employing ray tracing, a high-fidelity solver using finite element or finite difference methods, or experiments alike. Thus, they can be applied even when physical
models are not fully identified. Also, the data-driven models can be learned on low-dimensional subspaces
to enhance their scalability. According to the manifold hypothesis, such low-dimensional subspaces remain
embedded within the high-dimensional space representing the physical data. Therefore, identifying such
low-dimensional subspaces via data-driven models is possible with the correct choice of the data-driven
model.
In this research, we will specifically investigate two scenarios for data-driven learning of ocean acoustics:
time-domain solutions of second-order wave equations, and frequency-domain far-field underwater transmis-
sion loss propagation from point sources.

A. Time-domain underwater wave propagation

Underwater noise radiation in the ocean environment can be considered as a wave propagation phenomenon with the assumptions that the fluid is isotropic and homogeneous, viscous stresses are negligible, the
process is adiabatic, and the spatial variations of the ambient pressure, density and temperature are rela-
tively very small. Under these assumptions, physical quantities like density and pressure can be expressed
as a sum of their steady values and unsteady fluctuations of much smaller amplitude. Using the aforemen-
tioned assumptions, the conservation of mass and momentum, and the equation of state of the fluid, the
propagation of pressure fluctuations for x = (x, y, z) can be represented as a second-order hyperbolic partial
differential equation,
∇²p − (1/c₀²) ∂²p/∂t² = q(x, t; x₀, t₀), (t, x) ∈ I × Ω,  (3)

subject to the initial and boundary conditions, where ∇² = ∂²/∂x² + ∂²/∂y² + ∂²/∂z². Here, q is a source term subject to the source location x₀ and initiation time t₀, and c₀ is the constant speed of sound in the medium domain
represented by Ω. Equation (3) can also be expressed in the general form using Eq. (1) as

∂U/∂t (x, t; φ) = F (U (x, t; φ); φ), (t, x) ∈ I × Ω,  (4)

where U = (p, ∂p/∂t) and φ represents the set of parameters including boundary conditions, domain and
source properties. When F is known we can obtain a closed-form solution for homogeneous properties,
and uniform initial and boundary conditions. For more complex ocean properties and non-uniform initial
and boundary conditions, numerical techniques like the finite element and finite difference methods are
employed to solve Eq. (4).
Here, we employ a data-driven approach to learn the system represented by Eq. (4). The data-driven
learning problem can be stated as

p (x, ti+1 ; φ) = G (p (x, ti ; φ) ; Θ) , (5)

where φ represents the set of parameters including initial and boundary conditions, and domain and source
properties, which affect the solution of Eq. (3). G is the inverse operator dependent on the parameters
Θ, which is learned via data-driven techniques. Once G is learned successfully, it can be used during the prediction phase to obtain the solution at the (i + 1)-th iteration when the solution at the i-th iteration is provided. Autoregressive application of G over many iterations then yields the long-time evolution of the system.
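This prediction loop can be sketched as follows. Here `G` is a hypothetical stand-in for the learned operator (a simple damped linear map), used only to illustrate the autoregressive rollout, not the actual trained model:

```python
import numpy as np

def rollout(G, p0, n_steps):
    """Autoregressively apply a learned one-step operator G,
    p_{i+1} = G(p_i), collecting the long-time evolution."""
    trajectory = [p0]
    for _ in range(n_steps):
        trajectory.append(G(trajectory[-1]))
    return np.stack(trajectory)

# Hypothetical stand-in for the learned operator: a damped identity map.
A = 0.9 * np.eye(4)
G = lambda p: A @ p

traj = rollout(G, np.ones(4), n_steps=10)
print(traj.shape)  # (11, 4): initial state plus 10 predicted steps
```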

B. Frequency-domain far-field ocean acoustics

For practical ocean acoustic applications on very large ocean domains, Eq. (3) is converted to the frequency domain via a Fourier transform to obtain the Helmholtz equation,

(∇² + k²(x)) p (x, ω) = q (x, ω; x₀), x ∈ Ω,  (6)

subject to the boundary conditions. Here we consider a two-dimensional ocean domain. Herein, x = (R, z) and ∇² = ∂²/∂R² + ∂²/∂z². k(x) is the wavenumber of the medium at a radial frequency ω. Here, R can be considered as the direction along the range of the ocean domain from some reference point, and z is the depth from the ocean surface.


Pressure fluctuations in the ocean domain are subjected to geometric spreading, which can reduce their magnitude by many orders over a large domain. Under such conditions, transmission loss is generally considered for representing far-field acoustics, where transmission loss can be represented in decibels as −20 log₁₀ (p(x, ω)/p₀), with p₀ the reference pressure. For practical observations in ocean acoustics, the range considered for far-field transmission loss measurement is usually one to two orders of magnitude larger
than the ocean depth. Thus, frequency-domain far-field transmission loss computation can also be formulated
as a propagation problem along the range. Eq. (6) can thus be written in the general form using Eq. (1) as


∂U/∂R (R, z; φ) = F (U (R, z; φ); φ), (R, z) ∈ Ω,  (7)

where U = (p, ∂p/∂R) and φ represents the set of parameters including boundary conditions, domain and
source properties.
Similar to the time-domain wave propagation problem, the data-driven approach to solve Eq. (7) for the
pressure fluctuations can be formulated as

p (Ri+1 , z; φ) = G (p (Ri , z; φ) ; Θ) . (8)

G is the inverse operator with parameters Θ, which must be learned via data-driven techniques. Once G is learned successfully, it can be used to predict the solution at the (i + 1)-th range-wise location from the solution at the i-th range-wise location. Applied autoregressively over several iterations, G can thus provide the far-field propagation of the pressure magnitude from a point source.
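As a quick illustration of the transmission-loss definition used above, the decibel conversion can be written directly; the pressure magnitudes below are made-up values chosen only to show the scale compression:

```python
import numpy as np

def transmission_loss_db(p, p0=1.0):
    """Transmission loss in decibels: TL = -20 log10(|p| / p0)."""
    return -20.0 * np.log10(np.abs(p) / p0)

# Geometric spreading reduces |p| by orders of magnitude over a large
# domain; the logarithmic TL scale turns this into tens of decibels.
p = np.array([1.0, 1e-2, 1e-4])
print(transmission_loss_db(p))  # [ 0. 40. 80.]
```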

III. DATA-DRIVEN LEARNING METHODOLOGY

As discussed in the previous section, the objective of data-driven learning is to obtain the operator G
such that any physically observed field variable (pressure, velocity, etc.) can be propagated in either time
or along a spatial direction. Furthermore, we want to learn G on a reduced dimension for scalability. For
any general high-dimensional observed variable of interest UM,k ∈ RM , obtained at propagation step k, the
data-driven learning problem can be formulated as the learning of three operators E, P and D,

An,k (xM, tk; φ) = E (UM,k (xM, tk; φ); θE),
An,k+1 (xM, tk; φ) = P (An,k (xM, tk; φ); θP),  (9)
ŨM,k+1 (xM, tk; φ) = D (An,k+1 (xM, tk; φ); θD).

Here ŨM,k+1 ∈ RM and An,k ∈ Rn, for all k = 0, 1, . . . , K − 1. To successfully learn the inverse map in low dimensions, ŨM,k+1 ≈ UM,k+1 and n ≪ M. Here the operator E compresses the high-dimensional
system to a low-dimensional spatial subspace, P propagates the low-dimensional system, and D expands the
low-dimensional evolved solution to the original high dimension. Thus, G can be considered a composition
of these three operators,
G = (D ◦ P ◦ E; Θ),  (10)


where Θ = (θE , θP , θD ). Here, we will consider learning the low-dimensional spatial representation and the
evolution of the low-dimensional system as separate learning problems. These are subsequently discussed in
this section in detail.
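The shapes involved in the encoder-propagator-decoder composition of Eqs. (9)-(10) can be sketched with placeholder linear operators; the random matrices below are hypothetical stand-ins for the learned networks, shown only to fix the dimensions (n ≪ M):

```python
import numpy as np

M, n = 64, 4  # full and reduced dimensions, with n much smaller than M
rng = np.random.default_rng(0)

# Hypothetical stand-ins for the learned operators E, P and D.
We = rng.standard_normal((n, M)) / np.sqrt(M)  # encoder weights
Wd = rng.standard_normal((M, n)) / np.sqrt(n)  # decoder weights

E = lambda u: We @ u      # compress: R^M -> R^n
P = lambda a: a           # evolve the latent state: R^n -> R^n
D = lambda a: Wd @ a      # expand: R^n -> R^M

def G(u):
    """One propagation step performed in the reduced space."""
    return D(P(E(u)))

print(E(np.ones(M)).shape, G(np.ones(M)).shape)  # (4,) (64,)
```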

A. Learning low-dimensional spatial representation

Two approaches for learning the low-dimensional spatial representation are discussed below, namely the projection-based POD model and the convolutional autoencoder based on deep neural networks.

1. Projection-based low-dimensional modeling

By projecting the high-dimensional solution to a low-dimensional subspace, projection-based models such as POD attempt to find the best approximation of the high-dimensional solution. The best approximation
is considered as the low-dimensional subspace that minimizes the L2 error norm between the true and
approximate solutions. Like most POD applications we consider a Galerkin (i.e., orthogonal) projection.
Here we briefly present the POD employed via the method of snapshots. An elaborate explanation can be
found in Ref. (Rowley and Dawson, 2017).
Let us consider a set of snapshots UM,K = [u1, u2, . . . , uK]ᵀ obtained at K time intervals, where u ∈ RM. POD assumes that the snapshot matrix can be decomposed into a linear superposition of spatial basis functions, each of which is associated with temporal coefficients,

uM,k (x, tk) = Σ_{j=1}^{K} aj,k (tk) vj (x), k = 1, 2, . . . , K.  (11)

Here we assume that the rank of U is K and M > K. In order to obtain these bases, we perform a singular
value decomposition,
U = VΣWᵀ = Σ_{j=1}^{K} σj vj wjᵀ,  (12)

where VᵀV = WᵀW = I. Thus, V, W ∈ RM×K and Σ ∈ RK×K. The singular values are obtained from

UᵀU wj = σj² wj, j = 1, 2, . . . , K.  (13)

Practical computation of σj² is performed via an eigenvalue analysis of Eq. (13), and the POD modes are obtained as

vj = U wj / σj, j = 1, 2, . . . , K.  (14)

For efficient model reduction, we intend to obtain the smallest set of n modes,

ũM,k (x, tk) = Σ_{j=1}^{n} aj,k (tk) vj (x), k = 1, 2, . . . , K,
=⇒ ŨM,K (x, tK) = Vn (x) An,K (tK),  (15)

such that ŨM,K ≈ UM,K.
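In practice, the method of snapshots above amounts to a truncated SVD of the snapshot matrix. A minimal sketch with synthetic data (here the columns of U are the snapshots u_k, equivalent to the transposed convention in the text):

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, n = 100, 20, 5

# Synthetic snapshot matrix U in R^{M x K} of rank at most n.
U = rng.standard_normal((M, n)) @ rng.standard_normal((n, K))

# Singular value decomposition, Eq. (12): U = V Sigma W^T.
V, sigma, Wt = np.linalg.svd(U, full_matrices=False)

# Rank-n truncation, Eq. (15): U ~ V_n A_{n,K}.
Vn = V[:, :n]                     # first n POD modes v_j
An = np.diag(sigma[:n]) @ Wt[:n]  # temporal coefficients a_{j,k}
U_tilde = Vn @ An

print(np.allclose(U_tilde, U))  # True: rank-n data is recovered exactly
```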


POD is a widely used approach for approximating high-dimensional solutions to various physical prob-
lems. However, its efficiency is reduced when approximating solutions to hyperbolic partial differential
equations and convection-dominated problems. The worst-case error on approximating high-dimensional
solutions via projection to a lower-dimensional subspace of dimension n is defined as the Kolmogorov n-
width. It has been shown that the Kolmogorov n-width decays at a sub-exponential rate with n even for
linear transport problems once we consider non-uniform or discontinuous initial conditions (Greif and Urban, 2019). This serves as a motivation for employing a nonlinear subspace to approximate high-dimensional solutions, as presented next.
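This slow decay can be observed numerically: snapshots of a step discontinuity translating across the domain produce singular values whose energy is spread over many modes, so a linear subspace needs a large n for an accurate approximation. A small, self-contained illustration:

```python
import numpy as np

M, K = 200, 50
x = np.linspace(0.0, 1.0, M)

# Snapshots of a discontinuity translating across the domain.
U = np.stack([(x > k / K).astype(float) for k in range(K)], axis=1)

sigma = np.linalg.svd(U, compute_uv=False)
energy = np.cumsum(sigma**2) / np.sum(sigma**2)

# Number of POD modes needed to capture 99% of the snapshot energy;
# for this translating discontinuity it is a sizeable fraction of K,
# unlike diffusion-dominated problems where a few modes suffice.
n_99 = int(np.searchsorted(energy, 0.99)) + 1
print(n_99)
```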

2. Convolutional autoencoders

DL models, composed of deep neural networks, attempt to find the best low-dimensional nonlinear sub-
space, which can represent the high-dimensional system. Such nonlinear mapping between low-dimensional
and high-dimensional representations becomes especially useful for learning transport problems, where the
Kolmogorov n-width decays slowly. Here we employ a DL model, the autoencoder, for obtaining the
low-dimensional spatial representation. The autoencoder consists of an encoder, which compresses high-
dimensional spatial data to a much lower-dimensional set of latent states. The latent states can be subse-
quently expanded to their high-dimensional representation by the decoder. The autoencoder is a combination
of the encoder and the decoder and is learned in a semi-supervised manner, as we specify the input and
output to the autoencoder, but do not supervise how the latent states are learned.
The specific DL architecture considered for both the encoder and the decoder is the convolutional neural network (CNN). CNNs are specifically selected as they can provide geometric priors like translational equivariance, translational invariance, and stability in the presence of deformations and scale separation to the nonlinear subspace (Bronstein et al., 2017). Interpretability of the CNN's
learning mechanism for mechanistic problems in a three-dimensional Euclidean domain has been presented
in Ref. (Mallik et al., 2022). The method of learning low-dimensional representation of high-dimensional
physical data via convolutional autoencoders is briefly elaborated below.
Let us consider a compact M-dimensional Euclidean domain Ω = [0, 1]^M ⊂ RM on which square integrable functions U ∈ L2 (Ω) are defined. We consider a generic semi-supervised learning environment for
obtaining an unknown function a : L2 (Ω) → Y that is not observed on a training set, and square integrable
function Ũ : Y → L2 (Ω) that is observed on the training set. Thus,

{Ui , Ũi ∈ L2 (Ω) , ai = a (Ui ) , Ũi = Ũ (ai )}i∈I , (16)

where Y = Rn .
The convolutional encoder consists of several convolutional layers of the form C(U), each of which performs a set of convolution operations of the form g = κΛ (U) followed by a point-wise non-linearity ξ, acting on
a p-dimensional input U(x) = (U1 (x), . . . , Up (x)),

C(U) = ξ (κΛ (U)) . (17)


 
The convolution operation κΛ (f) operates by applying a set of kernels (or filters) Λ = {λl,l′ }, l = 1, . . . , q, l′ = 1, . . . , p,

gl (x) = Σ_{l′=1}^{p} (fl′ ∗ λl,l′) (x),  (18)

producing a q-dimensional output g(x) = (g1 (x), . . . , gq (x)), often referred to as the feature maps. Here,
(fl′ ∗ λ) (x) = ∫ fl′ (x − x′) λ(x′) dx′,  (19)

denotes a standard convolution operation. The nonlinearity in our learning mechanism is introduced with
a leaky rectified linear unit (leakyReLU) (Maas et al., 2013), ξ.
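A minimal one-dimensional sketch of such a layer, C(U) = ξ(κΛ(U)) as in Eqs. (17)-(18); the kernels here are random stand-ins for learned filters:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Point-wise non-linearity xi."""
    return np.where(x > 0, x, alpha * x)

def conv_layer(U, kernels, alpha=0.01):
    """One convolutional layer C(U) = xi(kappa_Lambda(U)).
    U: (p, N) input with p channels; kernels: (q, p, w) filter bank.
    Returns the q-channel feature maps g of Eq. (18)."""
    q, p, w = kernels.shape
    assert U.shape[0] == p
    g = np.stack([
        sum(np.convolve(U[lp], kernels[l, lp], mode="same") for lp in range(p))
        for l in range(q)
    ])
    return leaky_relu(g, alpha)

rng = np.random.default_rng(0)
U = rng.standard_normal((2, 32))      # p = 2 input channels
Lam = rng.standard_normal((4, 2, 3))  # q = 4 output channels, width-3 kernels
g = conv_layer(U, Lam)
print(g.shape)  # (4, 32)
```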
The convolutional encoder, composed of L convolutional layers, has a general hierarchical representation,

E(U) = (C(L) ◦ · · · ◦ C(2) ◦ C(1) ; θE) (U),  (20)

where θE = {Λ(1) , . . . , Λ(L) } is the set of all network parameters (all the kernel coefficients) of the convo-
lutional encoder. The model is considered deep when it consists of multiple convolutional layers. For the
present purpose, we assume convolutional neural networks with three convolutional layers, which still qualify as deep models according to popular consensus. The convolutional decoder is obtained similarly, based on
the output of the convolutional encoder,

D(a) = (C′(L) ◦ · · · ◦ C′(2) ◦ C′(1) ; θD) (a),  (21)

where C′(a) = ξ (κ′Λ (a)) represents a transpose convolution layer and κ′Λ (a) represents a transpose convolution operation.
In this article, we will primarily perform spatial dimension reduction via the convolutional autoencoder.
However, we will also present some comparisons with results obtained via POD. Figs. 1 and 2 illustrate the dimension-reduction procedures discussed here for POD and the convolutional autoencoder, respectively. Similar to POD, our dimension-reduction objective with the convolutional autoencoder is to find the lowest-dimensional subspace of dimension n such that ŨM,K ≈ UM,K.

B. Learning system evolution

The data-driven learning of the system evolution presented in Eq. (9) is posed as a sequence-to-sequence
learning problem. To achieve this we employ several deeply stacked LSTM networks. LSTMs are gated
RNNs routinely used for accurately learning sequences with a long-term data dependency. The gating
mechanism of LSTMs provides them invariance to time warping. Thus they are significantly less affected
by vanishing gradients compared to non-gated RNNs.


FIG. 1. An illustration of proper orthogonal decomposition for learning low-dimensional representation An,K from
high-dimensional data UM,K

A single LSTM cell consists of the input gate, the output gate, and the forget gate. The cell input, the
cell state, and the cell output are denoted by a, c and h, respectively. The cell output is then passed to a
fully connected layer with a linear activation to keep the input and output (y) dimensions consistent. The
operation of an LSTM cell is presented in Fig. 3, where i, f and c̃ represent the input gate, the forget gate
and the updated cell state, respectively. The operation of the LSTM cell can be explained via the following
equations.

ft = σ(Wf · [ht−1 , at ] + bf ),
it = σ(Wi · [ht−1 , at ] + bi ),
c̃t = tanh(Wc · [ht−1 , at ] + bc ),
ct = ft ∗ ct−1 + it ∗ c̃t ,        (22)
ot = σ(Wo · [ht−1 , at ] + bo ),
ht = ot ∗ tanh(ct ),

where W and b represent the weights and biases for each of the gates, respectively. On successful learning, we assume yt ≈ at+1 . LSTM cells can be stacked together to learn how a present sequence of observables evolves into a future sequence of observables. The gating mechanism of LSTMs ensures that such learning can be facilitated for long sequences, containing several observables, by mitigating the effect of vanishing gradients during training of widely stacked LSTMs.
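The gate equations of Eq. (22) can be sketched as a single NumPy cell step. The weight layout (one matrix and one bias per gate, acting on the concatenation [ht−1, at]) follows the equations above; all names here are illustrative, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(a_t, h_prev, c_prev, W, b):
    """One LSTM cell step following Eq. (22).

    W and b are dicts of weights/biases for the forget (f), input (i),
    cell (c) and output (o) gates, acting on [h_prev, a_t].
    """
    z = np.concatenate([h_prev, a_t])
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde      # updated cell state
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                # cell output
    return h_t, c_t
```

The cell output h_t would then pass through the linear fully connected layer mentioned above to produce y_t.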


FIG. 2. An illustration of convolutional autoencoder network for learning low-dimensional representation An,K of high-dimensional data UM,K

FIG. 3. Structure of an LSTM cell

We explore three stacked LSTM architectures for sequence-to-sequence learning of system evolution:
a standard LSTM network, a single-shot LSTM (SS-LSTM) network and an autoregressive LSTM (AR-
LSTM) network. These LSTM networks are shown in Figs. 4 (a), (b) and (c), respectively. For the standard


LSTM network, the prediction of each unit of a sequence of length s depends only on the previous state of
the sequence. Long-term data dependency is learned via the evolution of these states over the whole input
sequence. For the SS-LSTM, a single-shot learning mechanism is invoked: the network memorizes all the states of the current sequence and predicts the complete output sequence in one shot. The AR-LSTM is similar to the standard LSTM but adds a feedback mechanism: previously obtained output units are sequentially fed back into the LSTM as new units are predicted. Thus, for a sequence of length s, 2s − 1 LSTM cells are required, and thus 2s − 1 states have to be computed. Because of this increased operational complexity, a longer training time is expected for the AR-LSTM than for the other networks.
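The difference between the two prediction modes can be sketched with generic callables standing in for the trained networks (the `model` and `step` functions below are hypothetical stand-ins, not the paper's architectures): the single-shot network maps the whole input sequence to the whole output sequence in one call, while the autoregressive variant feeds each newly predicted unit back into its history.

```python
import numpy as np

def rollout_single_shot(model, seq):
    """SS-LSTM style: consume the whole input sequence and emit the
    whole output sequence in a single call."""
    return model(seq)

def rollout_autoregressive(step, seq):
    """AR-LSTM style: each new prediction is appended to the history
    before the next unit is predicted, so a length-s output requires
    2s - 1 cell evaluations in total."""
    s = len(seq)
    history = list(seq)
    out = []
    for _ in range(s):
        y = step(history)      # predict the next unit from all past units
        out.append(y)
        history.append(y)      # feedback of the new prediction
    return np.stack(out)
```

The feedback loop is what makes the AR-LSTM more expensive to train: every output unit depends on all earlier units through the recomputed states.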

(a) Standard LSTM (b) SS-LSTM

(c) AR-LSTM

FIG. 4. Learning mechanism of various stacked LSTM architectures

C. Convolutional recurrent autoencoder architecture

The two separate data-driven learning tasks discussed previously are performed via the composite convolutional recurrent autoencoder network (CRAN) shown in Fig. 5. Here we present a three-layer convolutional encoder
E and decoder D. The encoder and decoder presented here are generated with one-dimensional convolutional
kernels, which operate on one-dimensional inputs. However, the convolution encoding and decoding can be


performed in an analogous manner for higher-dimensional Euclidean data sets. The LSTM propagator P
presented here is a general LSTM model. Thus, any one of the three LSTM architectures presented above
can be included in the CRAN model.

FIG. 5. Illustration of 1D CRAN model for data-driven learning

We have separated the data-driven learning task into a lower-dimensional spatial representation learning
task and a system evolution learning task. The convolutional autoencoder (Fig. 2) learns the low-dimensional
representation of the high-dimensional physical data, and the LSTM network learns the system evolution.
These are trained separately. On successful training of these individual components, the CRAN model is
employed during the prediction phase in a sequence-to-sequence manner while operating on a low-dimensional
subspace. For a sequence containing S units, this can be mathematically represented as,

An,S0 (xM , tS0 ; φ) =E (UM,S0 (xM , tS0 ; φ) ; θE ) ,

An,S1 (xM , tS1 ; φ) =P (An,S0 (xM , tS0 ; φ) ; θP ) , (23)

ŨM,S1 (xM , tS1 ; φ) =D (An,S1 (xM , tS1 ; φ) ; θD ) ,

where UM,S0 , ŨM,S1 ∈ RM ×S and An,S0 , An,S1 ∈ Rn×S . Feeding back ŨM,S1 we can obtain ŨM,S2 as the output. Autoregressive application of the CRAN over a large number of iterations p enables us to obtain Ũ = [ŨM,S1 , ŨM,S2 , . . . , ŨM,Sp ], Ũ ∈ RM ×pS . Thus, starting with S initial propagation steps, autoregressive application of the CRAN model can evolve the system pS propagation steps into the horizon. The accuracy of the prediction Ũ will indicate the CRAN's data-driven learning capability.
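The autoregressive rollout of Eq. (23) can be sketched as follows, with the encoder, propagator and decoder passed in as generic callables standing in for the trained E, P and D (a sketch under those assumptions, not the exact implementation):

```python
import numpy as np

def cran_predict(encoder, propagator, decoder, U0, p):
    """Autoregressive CRAN rollout following Eq. (23).

    U0 is the initial sequence of S high-dimensional snapshots
    (shape M x S); p rollout iterations yield p*S predicted snapshots.
    """
    U = U0
    outputs = []
    for _ in range(p):
        A = encoder(U)            # reduce to latent sequence (n x S)
        A_next = propagator(A)    # evolve the latent states one sequence ahead
        U = decoder(A_next)       # reconstruct high-dimensional snapshots
        outputs.append(U)         # the reconstruction is fed back in
    return np.concatenate(outputs, axis=-1)   # shape (M, p*S)
```

Note that reconstruction error is fed back along with the prediction, which is why the long-horizon accuracy of Ũ is the real test of the model.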


IV. TEST SCENARIOS FOR CRAN-BASED LEARNING

We consider two specific test cases to demonstrate the data-driven learning capability of the CRAN
model. For the time-domain wave propagation case we consider one-dimensional wave propagation with
spatially varying discontinuous initial conditions. For the frequency-domain far-field ocean acoustic scenario,
we consider depth-dependent sound speed and source locations varying with the depth.

A. One-dimensional wave propagation with discontinuous initial condition

In the present study, we consider a homogeneous wave equation, such that q = 0. We consider a fully
reflecting boundary by applying pressure release boundary conditions,

p(−L/2, t) = p(L/2, t) = 0.    (24)

Finally, we employ a spatially varying discontinuous initial condition depending on the spatial location xs ,

p(x, 0; xs ) = 1 if x < xs , and −1 if x ≥ xs ,    x ∈ [−L/2, L/2].    (25)

For the aforesaid conditions, there is no closed-form solution to Eq. 3. Thus, a Galerkin finite element method was employed to obtain the pressure fluctuations. Temporal snapshots of the pressure fluctuations
obtained at regular intervals will serve as the training data for the convolutional autoencoder and LSTM
networks. For the bounded domain, the interference pattern obtained is periodic in nature. Thus, training
data obtained over a single period is sufficient to train the network. Here, we are also interested in the
generalized learning and prediction capability of the data-driven model with variation in the location of the
discontinuous initial condition (i.e., varying xs in Eq. 25). Thus, temporal snapshots over one period of
evolution for several spatially varying locations of the discontinuity were obtained to train the network. The
generalized learning capacity of the CRAN model will be determined by its ability to predict solutions for
any general location of discontinuity xs over the domain, which is not included in the training set. During
the prediction phase, we employ autoregressive CRAN operation following Eq. 23, to compute one period
of the evolution of the pressure fluctuation.
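The snapshot generation described above can be sketched as follows; the mesh size, domain length and sampled xs values here are illustrative, not the exact ones used in the study:

```python
import numpy as np

def step_initial_condition(x, x_s):
    """Discontinuous initial pressure of Eq. (25): +1 left of the
    discontinuity at x_s, -1 at and to the right of it."""
    return np.where(x < x_s, 1.0, -1.0)

# hypothetical mesh over [-L/2, L/2] and two sampled discontinuity locations
L = 2.0
x = np.linspace(-L / 2, L / 2, 1025)
initial_conditions = {x_s: step_initial_condition(x, x_s)
                      for x_s in (-0.0625 * L, 0.4375 * L)}
```

Each such initial condition would then be evolved by the finite element solver over one period to produce the 512 temporal snapshots per xs mentioned later.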

B. Far-field ocean acoustics with depth-dependent properties

For this scenario, we consider the ocean domain with depth-dependent sound speed following Munk’s
sound speed profile (Jensen et al., 2011) and also varying density. The computational ocean domain is
shown in Fig. 6. For the case considered here, the domain has a range of 100 kilometers and a depth of
5000 meters, with a level bathymetry profile. Since all properties and boundary conditions remain uniform
with the range, we consider this as a depth-dependent scenario.


FIG. 6. Ocean domain with depth-dependent properties

Here, we compute the pressure fluctuation magnitude in the presence of geometric spreading via BELLHOP, a ray/beam tracing software (Porter). Ray/beam tracing is a routinely used computational technique in ocean acoustics, as it can be applied over a wide range of depth- and range-dependent problems. Although it cannot account for certain complex physical aspects that can be resolved via the finite element method, it is employed here for generating training data at a reasonable computational cost. The ocean bottom
is considered as a rigid boundary and the surface is considered as a pressure release boundary. Theoretical
details of the ray tracing method and details on the computational techniques employed in BELLHOP can
be obtained from references (Jensen et al., 2011) and (Porter), respectively.

BELLHOP is employed to generate snapshots throughout the range at regular intervals. These range-
wise snapshots serve as the training data. Furthermore, similar to the previous test case, we consider point
sources distributed at varying depth-wise locations zs . The generalized learning capability of the CRAN is demonstrated if, on training the model over a range of depth-wise sources, it can predict the sound propagation in the domain for any sound source depth not included in the training set.

V. RESULTS

In this section we will present the results obtained with the CRAN model for the two test scenarios
presented earlier. The capability of convolutional autoencoders to learn low-dimensional spatial represen-
tation, and the learning of system evolution via various LSTM models, will be discussed. The training of


the CRAN model and its generalization on the test cases will also be discussed. The CRAN model and its
various components were trained with TensorFlow 2.5.0 (Abadi et al., 2015) libraries.

A. One-dimensional wave propagation with discontinuous initial condition

The results for the time-domain second-order wave propagation scenario with discontinuous initial con-
ditions are presented here. The finite element method-based solution of the pressure fluctuations obtained
for xs = 0 is presented in Fig. 7. The solution is obtained over one complete period and consists of 512
temporal snapshots over uniform time intervals. The spatial data is uniformly discretized on M points with
M = 1025. This solution will serve as the baseline case to compare the efficiency of POD and convolutional
autoencoder for learning low-dimensional spatial representations. It will also serve as a baseline to compare
the various LSTM architectures discussed earlier and select the most accurate and efficient network.

FIG. 7. Normalized pressure fluctuations for discontinuous initial condition obtained via finite element method: xs /L = 0

1. Efficiently learning low-dimensional spatial representation

Discontinuities in the initial condition can pose a challenge in model reduction via projection-based models. A recent study (Greif and Urban, 2019) claims that for such initial conditions, the worst-case error obtained on projecting the high-dimensional solution to a linear subspace of width n can decay only as 1/(4√n)


or worse. We numerically inspect such results with POD on a bounded domain and then compare them
with the dimensional reduction efficiency of the convolutional autoencoder.
Fig. 8 (a) shows the relative L∞ error of POD predictions over the domain for all time steps, predicted
with varying size of POD modes, n. Here p̃ and p are predicted and target pressure fluctuations, respectively.
It can be observed that even with 64 POD modes, the relative L∞ error remains above 10% at all time-steps.
To inspect the efficiency of model reduction via POD, we compute the ratio of L∞ errors of POD with varying n to those with n = 64 in Fig. 8 (b). It is observed that on increasing n from 4 to 64 modes, the worst-case (L∞ ) error decreases by a maximum factor of 9.5, at t = 0.45T . At other time-steps, the reduction ratio of the worst-case error is even lower. Thus, the numerical results presented here follow the theoretical bounds for worst-case error decay with projection-based models for wave propagation with discontinuous initial conditions.
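Such a numerical inspection can be sketched via the thin SVD: project the snapshot matrix onto its leading n left singular vectors (the POD modes) and measure the worst-case error per time step. This is a generic sketch, not the study's exact implementation:

```python
import numpy as np

def pod_reconstruct(U, n):
    """Project the snapshot matrix U (M x K) onto its leading n POD
    modes, obtained here from the thin SVD, and reconstruct."""
    Phi, s, Vt = np.linalg.svd(U, full_matrices=False)
    return Phi[:, :n] @ np.diag(s[:n]) @ Vt[:n, :]

def relative_linf_error(U_tilde, U):
    """Worst-case (L-infinity) error per time step, normalized by the
    peak magnitude of the target at that step."""
    return np.max(np.abs(U_tilde - U), axis=0) / np.max(np.abs(U), axis=0)
```

For a step-discontinuity data set, plotting this error against n reproduces the slow decay discussed above.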

(a) Normalized L∞ error: POD; (b) Normalized L∞ error of various POD models relative to the POD model with n = 64

FIG. 8. L∞ error of various POD models

The convolutional autoencoder was also trained with the baseline result, which serves as both training
input and output. Our aim is to find a low-dimensional subspace of dimension n, which can approximate
the baseline solution. We specifically select n = 64 to perform a direct comparison of the convolutional au-
toencoder results with their POD counterparts. The training was performed via ADAM optimizer (Kingma
and Ba, 2014), which minimizes the mean square error between the target and predicted training values.
The L∞ error of the convolutional autoencoder predictions over the domain at all time steps is compared to its POD counterpart in Fig. 9. We can see that the convolutional autoencoder errors are almost two times lower than the POD errors overall. The L∞ error does not exceed 10% except for a few time steps near t = 0.25T and t = 0.75T . However, this could also be an artifact of computing the relative L∞ error at time-steps when the target solution over most of the domain is close to zero (please see Fig. 7).
The lack of efficiency of POD could be attributed to its orthogonal bases, which can be considered
analogous to Fourier modes. Being uniform in nature over the domain, these POD bases are not efficient


FIG. 9. Comparison of proper orthogonal decomposition (POD) and convolutional autoencoder net (CAN) with n = 64

in approximating discontinuities in the solution. On the other hand, convolutional neural networks are not
restricted to such orthogonal bases, and can select basis functions from an affine subspace. The results
confirm that such affine bases and their nonlinear activation can approximate discontinuities in the solution
with greater efficiency.

2. Selection of efficient autoregressive evolution model

To compare the various LSTM networks at our disposal, we train each of them separately on the baseline
finite element method-based solution. The LSTM training for this specific purpose is performed directly on
the full-dimensional solution to quantify errors that arise solely during the autoregressive prediction via the
LSTM. The training input was divided into 32 training batches, where each batch represents one sequence.
To obtain the training output, finite element method results were obtained over one period but from time t = T /32 to t = T + T /32, where T is the time period. The training output also consisted of 32 batches over one period of data. Thus, the training output batches correspond to each input training batch but shifted by


T /32. The evolution from the input batches to the output batches over the period is learned by the LSTM architecture. The LSTMs were trained via ADAM with the objective of minimizing the mean square error between the target and predicted values. The training for each of these networks converged with less than 1% time-averaged L1 error computed over the whole period.
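The batching described above can be sketched as follows, exploiting the periodicity of the solution to wrap the final output batch back to the start of the period (array shapes and the wrap-around are illustrative assumptions consistent with the text):

```python
import numpy as np

def shifted_sequence_pairs(snapshots, n_batches):
    """Split one period of snapshots (M x K) into n_batches input
    sequences, pairing each with the same sequence shifted forward by
    one sequence length (i.e., by T / n_batches)."""
    M, K = snapshots.shape
    s = K // n_batches                  # units per sequence
    inputs, outputs = [], []
    for b in range(n_batches):
        inputs.append(snapshots[:, b * s:(b + 1) * s])
        # periodic wrap: the last output batch reuses the start of the period
        idx = np.arange(b * s + s, b * s + 2 * s) % K
        outputs.append(snapshots[:, idx])
    return inputs, outputs
```

With n_batches = 32 this reproduces the 32 input/output batch pairs shifted by T/32.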
The autoregressive prediction capability of the three trained LSTM models was subsequently tested for
the next period of the data. Since the system is periodic, there is little difference between the training and
test data sets. Initializing with a sequence spanning t = T /32, the autoregressive operation was carried
out 32 times to predict a whole period of evolution. The L1 errors in the predicted pressure fluctuations p̃
over the domain for the whole period, obtained with the various LSTM networks, are presented in Fig. 10.
We can see that errors in plain LSTM predictions begin increasing very quickly after a few iterations of its
operation. Thus, the L1 errors regularly cross 10% after t = 0.4T or about 13 iterations of the plain LSTM.
On the other hand, the L1 errors obtained via the SS-LSTM and AR-LSTM never rise beyond 5% at any
time-step during the period. Thus, even though the plain LSTM network showed training convergence, it fails to predict a similar data set during autoregressive operation with only the first test batch provided for initialization (Eq. 23).
To understand the unsatisfactory performance of the plain LSTM compared to its SS-LSTM and AR-
LSTM counterparts, we explore their learning mechanisms. During training, the plain LSTM network is
only trained to learn the mapping between units of the input sequence to their corresponding units of their
output sequence (please see Fig. 4 (a)). Thus, the dependency between individual members of the input
sequence is only provided via the LSTM cell states. Such dependency between individual time-steps in an
evolving transport phenomenon is not strong enough for autoregressive operations. Any error in any of the
time-steps will quickly accrue over several autoregressive operations and lead to poor long-time performance.
On the other hand, the AR-LSTM (Fig. 4 (c)) has a feedback mechanism where the newly predicted units
also influence the prediction of the next time-step of the output sequence. Thus, for a sequence containing
s units, the last unit of the output sequence is connected to all s input units and s − 1 previously predicted
output units via the cell states. Such a feedback mechanism requires a significantly larger number of
training computations than the plain LSTM. However, it clearly enables better learning of data-dependency
and successful autoregressive operation.
Finally, the SS-LSTM (Fig. 4 (b)) employs a completely different strategy. It learns the whole input
sequence at once and predicts the output sequence together. Such a one-shot operation induces a common
memory where the dependency between the individual members can be better learned. The SS-LSTM
requires more memory than the plain LSTM during the single-shot learning, but its total number of training computations is smaller than that of the AR-LSTM. Most importantly, its long-term system evolution prediction
via autoregressive operations is sufficiently accurate.
The parameters and training times of the three networks are presented in Table I. The number of hidden units (Nh ), which controls the model size, was selected via hyperparameter tuning to ensure that the time-averaged training L1 error does not exceed 1%. We can see that a highly parameterized plain LSTM network with a large number of hidden units was employed but failed to accurately predict the long-time


FIG. 10. Comparison of LSTM models for autoregressive evolution

evolution. The AR-LSTM model had four times fewer hidden units and a much smaller number of trainable parameters. However, because of the large number of training computations, its training time is longer than even that of the highly overparameterized plain LSTM. Finally, the SS-LSTM model had half as many hidden neurons as the AR-LSTM. However, it had a larger number of trainable parameters than the AR-LSTM. This is due to the single-shot learning mechanism connecting the whole input sequence to the LSTM cell, and subsequently to the fully connected layer, leading to a larger number of connections. Importantly, the SS-LSTM requires almost 2.5 times less training time than the AR-LSTM with comparable prediction errors. Thus, it is the most efficient sequence-to-sequence model and will be employed for the subsequent results.

3. Learning spatially distributed discontinuities via CRAN

The results for the generalized learning capacity of our CRAN model for spatially distributed disconti-
nuities along the domain will be discussed here. To train the CRAN, temporal snapshots of the evolution of


TABLE I. Training time and model size for various LSTM networks

Training parameters            LSTM         AR-LSTM      SS-LSTM
Nh                             512          128          64
Trainable parameters           2.36 × 10⁶   3.95 × 10⁵   6.81 × 10⁵
Training time (GPU seconds)    750          1031         410

the wave propagation over a period were generated for ten different locations of discontinuity in the initial
condition (xs in Eq. 25). Two different validation sets for xs and five test sets for xs were also selected.
These 17 xs values were randomly sampled from the domain and their locations were adjusted so that they
lie on the nearest node of the finite element mesh. Thus, the same mesh could be applied to generate all
the 17 xs sets. The data set for each different xs had 512 temporal snapshots.

The CRAN training was separated into a convolutional autoencoder training and an LSTM training on
the reduced-dimensional latent states obtained from the trained convolutional autoencoder’s encoder. First,
we discuss the convolutional autoencoder training. The training was performed via ADAM with the standard
objective of minimizing the mean square error between the target and predicted results. A hyperparameter
tuning was performed for optimal convolutional autoencoder training and generalization. Several network
parameters like convolutional neural network filter size, number of filters, etc. were considered for optimal
network tuning. However, the most important hyperparameter considered here is the number of training
epochs. Fig. 11 (a) shows the time-averaged L1 error obtained over the whole period, for all the ten xs
sets combined, with respect to the training epochs. Similar results are also presented for the two validation
xs sets combined. We can observe that, with 10 xs data sets for training, the network shows significant overfitting. The onset of overfitting is indicated by the increase in the validation error, which occurs somewhere after 1200 training epochs. Thus, the convolutional autoencoder trained for 1200 epochs is considered the optimally trained network.
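The epoch-selection criterion used here, stopping where the validation error bottoms out before rising, can be sketched as a simple patience rule. This is a generic stand-in for the by-inspection choice of 1200 epochs described above, with a hypothetical patience parameter:

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return the epoch of minimum validation error, stopping the scan
    once `patience` consecutive epochs fail to improve on it."""
    best, best_ep = float("inf"), 0
    since_improve = 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_ep = err, epoch
            since_improve = 0
        else:
            since_improve += 1
            if since_improve >= patience:
                break          # validation error has started rising: overfitting
    return best_ep
```

Applied to the recorded validation-error history, this would return the epoch at which training weights should be retained.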

Next, we discuss the LSTM training. Since we train the LSTM on the reduced-dimensional latent states obtained from the trained convolutional encoder, there are no specific target latent data sets. Thus, we instead inspect the error between the reconstructed CRAN predictions and the target, for both the training and validation data sets. Since we are keeping the
convolutional autoencoder network fixed, any change in the errors will solely be due to the LSTM network.
Fig. 11 (b) shows the CRAN time-averaged L1 error for the both training and validation cases. Similar
to the convolutional autoencoder training, we also observe overfitting during the LSTM training. The
validation set indicates that overfitting sets in after 1100 training epochs. Thus, the LSTM network trained
for 1100 epochs is considered the optimal one. The parameters of the trained convolutional encoder, the
LSTM network and the convolutional decoder are shown in Tables II, III and IV, respectively, for a sequence
containing 16 snapshots.


(a) Training and validation L1 error: convolutional autoencoder; (b) Training and validation L1 error: CRAN

FIG. 11. Effects of training epoch on convolutional autoencoder and CRAN training and validation: 1D wave propagation

TABLE II. Convolutional encoder parameters: 1D wave propagation

Layer #   Layer type        Output dimension   Kernel size   # filters/# neurons   Stride
          Input             (16, 1025, 1)      -             -                     -
1         Convolution 1D    (16, 1020, 32)     (6)           32                    1
          Max Pool 1D       (16, 510, 32)      (2)           -                     -
2         Convolution 1D    (16, 506, 64)      (5)           64                    1
          Max Pool 1D       (16, 253, 64)      (2)           -                     -
3         Convolution 1D    (16, 250, 96)      (4)           96                    1
          Flatten           (16, 24000)        -             -                     -
4         Fully Connected   (16, 64)           -             64                    -

The CRAN network with the trained convolutional autoencoder and LSTM was next employed on the
test xs locations to predict the evolution of the wave propagation over a whole period. The autoregressive
prediction was initiated with a sequence spanning t = T /32. The accuracy of the predicted solution compared to the target was measured via the Structural Similarity Index Measure (SSIM), a statistical measure used for comparing two images, or two data sets of the same dimensions with a pixelated representation. The formal definition


TABLE III. LSTM parameters: 1D wave propagation

Layer #   Layer type        Output dimension   # neurons
          Input             (16, 64)           -
5         LSTM              (16, 256)          256
          Fully Connected   (1024)             1024
          Reshape           (16, 64)           -

TABLE IV. Convolutional decoder parameters: 1D wave propagation

Layer #   Layer type                 Output dimension   Kernel size   # filters/# neurons   Stride
          Input                      (16, 64, 1)        -             -                     -
6         Fully Connected            (16, 24000)        -             24000                 -
          Reshape                    (16, 250, 96)      -             -                     -
7         Convolution 1D Transpose   (16, 253, 64)      (4)           64                    1
          Upsampling 1D              (16, 506, 64)      (2)           -                     -
8         Convolution 1D Transpose   (16, 510, 32)      (5)           32                    1
          Upsampling 1D              (16, 1020, 32)     (2)           -                     -
9         Convolution 1D Transpose   (16, 1025, 1)      (6)           1                     1
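The output dimensions printed in Table II can be reproduced by assuming "valid" (no-padding) convolutions with unit stride; this padding choice is an assumption consistent with the printed widths, not stated explicitly in the text. The width arithmetic below verifies them:

```python
def conv1d_valid(width, kernel, stride=1):
    """Output width of a 'valid' (no-padding) 1D convolution."""
    return (width - kernel) // stride + 1

def maxpool1d(width, pool=2):
    """Output width of a 1D max-pool of size `pool`."""
    return width // pool

def encoder_widths(width=1025):
    """Trace the spatial widths through the Table II encoder."""
    w1 = conv1d_valid(width, 6)   # layer 1 conv, kernel 6 -> 1020
    p1 = maxpool1d(w1)            # pool /2              -> 510
    w2 = conv1d_valid(p1, 5)      # layer 2 conv, kernel 5 -> 506
    p2 = maxpool1d(w2)            # pool /2              -> 253
    w3 = conv1d_valid(p2, 4)      # layer 3 conv, kernel 4 -> 250
    return w1, p1, w2, p2, w3, w3 * 96   # flattened size before Dense(64)
```

The same arithmetic, run in reverse with transpose convolutions and upsampling, reproduces the widths of the decoder in Table IV.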

of the SSIM is provided in the Appendix. An SSIM of 1.0 between two images indicates that the two images
are identical. SSIM decreases from 1 as the similarity between two images decreases. Since the solutions
here can be represented in a pixelated format on a uniform space and time grid, SSIM is employed for
comparing them and quantifying CRAN prediction accuracy. The SSIMs for the five test xs sets and the
two validation sets are presented in Fig. 12. We can observe that except for the case of xs = −0.47L, the
predicted solutions show 85% or more similarity to the target solutions.
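As a rough illustration of the measure, a single-window ("global") SSIM can be computed directly from the standard formula combining luminance, contrast and structure terms. The SSIM defined in the Appendix ordinarily averages this quantity over local windows, so the sketch below is a simplification with illustrative constants:

```python
import numpy as np

def global_ssim(x, y, data_range=2.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized fields; the full
    windowed SSIM averages this quantity over local patches."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

Identical fields give an SSIM of 1, and any magnitude or structural mismatch pulls the value below 1, as seen in the results of Fig. 12.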

To further investigate the nature of the predicted results we compare the target (‘true’) and predicted
(‘pred’) solutions at several temporal phases of the evolution for xs = −0.0625L, in Fig. 13. The results
show some differences between the magnitude of the target and predicted solution at t = 0.07T . Some
differences in the peak solution in the right-half of the domain and some on the left-half of the domain are


FIG. 12. Structural similarity index measure for validation and test sets: 1D wave propagation

also observed at t = 0.26T and t = 0.51T , respectively. However, at all time-steps, the overall structure of the predicted solution follows the target solution very closely. This can be observed more clearly on projecting the results onto an x–t plane, as shown in Fig. 14. We can see differences in the magnitude of the target and predicted solutions at some time-steps, but the predicted wave propagation always follows the characteristic lines. Such characteristics are an integral part of the solutions of any hyperbolic partial differential equation, including wave propagation. Thus, the predicted solutions show physical consistency.
The predicted and target solutions were also compared for another case, xs = 0.4375L. The results are shown in an x–t plane in Fig. 15. Similar to the solutions for xs = −0.0625L, we see some differences in the predicted magnitudes over the whole domain at some time-steps over the period. However, we again see that the characteristics of the wave propagation are accurately represented. The SSIM is a function not only of the structural similarity of the images but also of the luminance and the contrast. For the present solution, luminance and contrast can be associated with differences in the magnitude of the target and predicted images, whereas the shape of the characteristics can be associated with the structural quality. For the present case, the SSIM is 0.84. Based on the comparison of the solutions in Fig. 15, one can state that this reduction from 1 is mostly due to differences in magnitude rather than in the structural quality of the images.


(a) t = 0.07T (b) t = 0.26T (c) t = 0.51T

FIG. 13. True and predicted solutions for 1D wave propagation at various times: xs /L = −0.0625

(a) Truth (b) CRAN prediction

FIG. 14. Comparison of truth and CRAN predictions for 1D wave propagation: xs /L = −0.0625

Since the characteristics are accurately captured by the CRAN for both cases, we may safely conclude that the CRAN demonstrates generalized learning of the wave propagation physics for spatially distributed discontinuities. Overall, for all seven xs cases not observed by the network during training (Fig. 12), the CRAN achieved a mean SSIM of 0.90.

B. Far-field two-dimensional ocean acoustics

The results for the frequency-domain ocean acoustics are discussed here. As stated earlier, the training
data was obtained with ray/beam tracing solver BELLHOP. Here we have not considered any acoustic
attenuation. Thus the loss in acoustic intensity will be primarily due to geometric spreading, which varies
inversely with the square of the distance from the source in a two-dimensional domain. Such loss can lead to
several orders of reduction in pressure fluctuation magnitude far from the source. Thus, it is more sensible to represent the far-field acoustic signature in a logarithmic form via the transmission loss −20 log10 (p(x, ω)/p0 ). Here p is the magnitude of the propagated pressure fluctuations and p0 is the reference pressure. Moreover,


(a) Truth (b) CRAN prediction

FIG. 15. Comparison of truth and CRAN predictions for 1D wave propagation: xs /L = 0.4375

data in the form of transmission loss is also more suitable for training DL models than pressure fluctuations
with varying orders of magnitude.
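This logarithmic representation can be sketched as follows; the 200 dB floor included here is a clipping of the kind applied in the text to spurious low-magnitude ray-tracing values, with an illustrative default reference pressure:

```python
import numpy as np

def transmission_loss(p, p0=1.0, floor_db=200.0):
    """Transmission loss TL = -20 log10(|p| / p0) in dB, with values
    beyond floor_db clipped to produce a flat far-field region."""
    tl = -20.0 * np.log10(np.abs(p) / p0)
    return np.minimum(tl, floor_db)
```

Every factor-of-10 drop in pressure magnitude adds 20 dB of transmission loss, which keeps the training data within a narrow numeric range regardless of distance from the source.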
The transmission loss distribution computed over the domain via BELLHOP for a source at a depth zs = 950 meters is shown in Fig. 16. The transmission loss distribution was obtained by applying 2,500 Gaussian beams over an angular fan spanning from −45◦ to 45◦ . This number of Gaussian beams was selected to obtain converged ray tracing solutions. BELLHOP and most other routine ray tracing solvers lead to spurious results when the pressure fluctuation decreases by many orders of magnitude (Jensen et al., 2011; Porter). We observed such spurious results beyond 10 orders of reduction in the magnitude of the pressure fluctuations. Thus, such low-magnitude pressure fluctuations were filtered away, leading to a flat region in the transmission loss contour beyond 200 dB. For obtaining the data sets, we considered Munk's sound speed profile sampled with 26 depth-wise velocities. Thus, the ocean depth was divided into 25 depth-wise divisions with uniform fluid properties during BELLHOP's numerical ray trajectory computations.
Since we consider a depth-dependent ocean environment with depth-varying sound speed and source
locations, a sufficiently high resolution of the data along the depth is preferred. Thus, the ray-tracing
solutions were observed at 2049 uniform depth-wise locations. This led to an extremely accurate resolution
of the beam tracing solution along the depth. We expect that our data-driven autoencoders will still be able to
learn a sufficiently low-dimensional representation along the depth even for a very high depth-wise resolution.
Similarly, a range-wise discretization was also performed, but with a lower resolution along the range. To select the optimal resolution for sufficiently accurate sampling, a convergence study was performed for various range-wise discretizations. 176, 352, 704 and 1408 uniform range-wise stations were considered, and the L1 and L2 errors for sampling on the coarser discretizations relative to 1408 range-wise stations are presented in Fig. 17. The results show that 352 range-wise sampling points are sufficiently accurate.
To train the CRAN with transmission loss distribution for various depth-varying source locations, BELL-
HOP was employed to obtain data for 21 source depths. Data were also obtained for 3 validation depths


FIG. 16. Transmission loss distribution obtained via BELLHOP for zs = 950 m

FIG. 17. Sampling error for transmission loss on various range-wise discretizations relative to 1408 points

and 12 test depths. These 36 source depth locations were randomly sampled from the domain and then
corrected so that their locations coincided with the nearest depth-wise sampling point. As in the previous case, the CRAN training was separated into convolutional autoencoder training and LSTM training on the reduced-dimensional latent states obtained from the trained convolutional encoder. As in the one-dimensional wave propagation scenario, we investigate the convolutional autoencoder training and validation convergence with respect to training epochs.
obtained over the whole domain for all the 21 training sets combined, with varying training epochs. Simi-
larly, the time-averaged L2 validation errors for three validation sets are shown. We can observe that, unlike


in the previous scenario, negligible effects of overfitting are observed here as both training and validation
errors decrease uniformly with training epochs. We consider the convolutional autoencoder to be sufficiently
trained by 320 epochs as the rate of validation error decay becomes negligible by this point.
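The decoupled training described above (autoencoder first, then the LSTM on frozen-encoder latent states) can be sketched schematically. The toy classes below are hypothetical stand-ins, not the paper's TensorFlow implementation; only the two-stage structure mirrors the text:

```python
class ToyAutoencoder:
    """Stand-in for the convolutional autoencoder (illustrative only)."""
    def __init__(self):
        self.epochs = 0
    def fit_step(self, snapshots):
        self.epochs += 1                 # one training epoch over the snapshots
    def encode(self, snapshot):
        return snapshot[:2]              # toy "latent state": first two entries

class ToyLSTM:
    """Stand-in for the LSTM propagator (illustrative only)."""
    def __init__(self):
        self.epochs = 0
    def fit_step(self, latents):
        self.epochs += 1

def train_cran(autoencoder, lstm, snapshots, ae_epochs=320, lstm_epochs=3200):
    # Stage 1: train the autoencoder on the full-state snapshots.
    for _ in range(ae_epochs):
        autoencoder.fit_step(snapshots)
    # Stage 2: freeze the trained encoder, project every snapshot to its
    # latent state once, and train the LSTM purely in the latent space.
    latents = [autoencoder.encode(s) for s in snapshots]
    for _ in range(lstm_epochs):
        lstm.fit_step(latents)
    return latents
```

Because the encoder is frozen in stage 2, any subsequent change in reconstruction error is attributable to the LSTM alone, which is how the errors in Fig. 18 (b) are interpreted.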


FIG. 18. Effect of training epochs on (a) convolutional autoencoder and (b) CRAN training and validation L2 errors: far-field transmission loss propagation


Next, we focus on the LSTM training. Following the technique employed for the one-dimensional wave
propagation scenario, the accuracy of the LSTM training and validation is inspected on the target high-
dimensional training and validation data sets and their reconstructed CRAN-predicted counterparts. As
explained earlier, since the convolutional autoencoder network is kept fixed, any change in these errors
is solely due to the LSTM network. Fig. 18 (b) shows the time-averaged L2 error of the CRAN
predictions for both the training and validation sets. Similar to the convolutional autoencoder training,
we observe negligible effects of overfitting during the LSTM training. The validation error indicates the
optimal number of training epochs to be 3200, as its decay rate becomes very slow by this stage. It is clear
that increasing the number of spatially distributed data sets from 10 in the one-dimensional wave
propagation scenario to 21 here reduces overfitting and leads to better generalization. However, by
inspecting the validation error in conjunction with the training error and using early stopping, we have
shown in the previous scenario that CRAN generalization can be improved even when training data is limited.
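The early-stopping criterion used here (stop once the validation error has effectively plateaued) can be sketched as follows; `train_step` and `val_error` are generic placeholders, not the paper's actual training code:

```python
def train_with_early_stopping(train_step, val_error, max_epochs, patience, min_delta=0.0):
    """Stop when the validation error has not improved by at least
    min_delta for `patience` consecutive epochs."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        train_step()
        err = val_error()
        if err < best - min_delta:       # meaningful improvement: reset patience
            best, best_epoch, wait = err, epoch, 0
        else:
            wait += 1
            if wait >= patience:         # validation error has plateaued
                break
    return best_epoch, best

# Toy validation curve that improves and then plateaus
errs = iter([0.9, 0.5, 0.3, 0.29, 0.295, 0.296, 0.31, 0.30])
print(train_with_early_stopping(lambda: None, lambda: next(errs), 8,
                                patience=3, min_delta=0.01))   # → (3, 0.3)
```

Monitoring the validation curve against the training curve in this way is what allowed a good stopping point to be chosen even with the limited 10-case training set of the previous scenario.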
The CRAN network with the trained convolutional autoencoder and LSTM was next employed on the test
source depths to predict the far-field propagation of the transmission loss along the range. The autoregressive
prediction was initiated with a sequence spanning R = R̄/11, where R̄ is the complete range of the domain
and R denotes the distance along the range. It is important to note that a larger initiation sequence was
required in this case for the far-field propagation to be computed accurately. This is primarily because the
transmission loss distribution pattern varied significantly with source depth. Thus, the CRAN was required


to observe the near-field transmission loss evolution longer before it could properly predict the far-field
transmission loss distribution. The trained CRAN parameters for an initiation sequence of 32 range-wise
snapshots spanning R = R̄/11 are presented in Tables V, VI and VII.
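The autoregressive prediction can be expressed generically: the trained network maps the latest window of snapshots to the next one, and each prediction is fed back as input. A minimal sketch with a hypothetical step function (in the actual model the step is the trained LSTM acting on 64-dimensional latent states):

```python
def rollout(step, init_seq, n_steps):
    """Autoregressive propagation: `step` maps the current window of
    snapshots to the next snapshot, which is then fed back as input."""
    window = list(init_seq)              # initiation sequence, e.g. 32 near-field
    preds = []                           # snapshots spanning R = Rbar/11
    for _ in range(n_steps):
        nxt = step(window)
        preds.append(nxt)
        window = window[1:] + [nxt]      # slide the window forward by one
    return preds

# Toy step: next value from the two latest ones (illustrative only)
print(rollout(lambda w: w[-1] + w[-2], [1, 1], 4))   # → [2, 3, 5, 8]
```

A longer initiation window simply means `init_seq` carries more near-field snapshots before the feedback loop takes over.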

TABLE V. Convolutional encoder parameters: far-field transmission loss propagation

Layer #  Layer type       Output dimension  Kernel size  # filters/# neurons  Stride
         Input            (32, 2049, 1)     -            -                    -
1        Convolution 1D   (32, 2036, 16)    (14)         16                   1
         Max Pool 1D      (32, 509, 16)     (4)          -                    -
2        Convolution 1D   (32, 500, 32)     (10)         32                   1
         Max Pool 1D      (32, 125, 32)     (4)          -                    -
3        Convolution 1D   (32, 120, 48)     (6)          48                   1
         Flatten          (32, 5760)        -            -                    -
4        Fully Connected  (32, 64)          -            64                   -

TABLE VI. LSTM parameters: far-field transmission loss propagation

Layer #  Layer type       Output dimension  # neurons
         Input            (32, 64)          -
5        LSTM             (32, 512)         512
         Fully Connected  (2048)            2048
         Reshape          (32, 64)          -

Similar to the earlier results, the accuracy of the predicted solution compared to the target was measured
via SSIM. Fig. 19 shows the SSIM for the various validation and test cases along the domain. It can
be observed that the test cases cover most of the domain and thus can properly indicate the CRAN’s
generalization capacity. The accuracy of the CRAN for all cases is above 85%. Moreover, for most of the test
and validation cases, the CRAN accuracy is close to or above 90% based on the SSIM values.


TABLE VII. Convolutional decoder parameters: far-field transmission loss propagation

Layer #  Layer type                Output dimension  Kernel size  # filters/# neurons  Stride
         Input                     (32, 64, 1)       -            -                    -
6        Fully Connected           (32, 5760)        -            5760                 -
         Reshape                   (32, 120, 48)     -            -                    -
7        Convolution 1D Transpose  (32, 125, 32)     (6)          32                   1
         Upsampling 1D             (32, 500, 32)     (4)          -                    -
8        Convolution 1D Transpose  (32, 509, 16)     (10)         16                   1
         Upsampling 1D             (32, 2036, 16)    (4)          -                    -
9        Convolution 1D Transpose  (32, 2049, 1)     (14)         1                    1
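As a sanity check, the layer dimensions in Tables V and VII follow from the standard output-size arithmetic of stride-1 'valid' convolutions, non-overlapping pooling, and their transposed counterparts:

```python
def conv1d_valid(n, k):
    return n - k + 1                     # stride-1 'valid' convolution

def pool(n, p):
    return n // p                        # non-overlapping max pooling

def conv1d_transpose(n, k):
    return n + k - 1                     # stride-1 'valid' transposed convolution

# Encoder path of Table V (length along depth; batch/channel dims omitted)
n = 2049
n = conv1d_valid(n, 14); assert n == 2036
n = pool(n, 4);          assert n == 509
n = conv1d_valid(n, 10); assert n == 500
n = pool(n, 4);          assert n == 125
n = conv1d_valid(n, 6);  assert n == 120
assert n * 48 == 5760                    # flattened size before the 64-unit bottleneck

# Decoder path of Table VII mirrors the encoder with upsampling by 4
m = 120
m = conv1d_transpose(m, 6);  assert m == 125
m = m * 4;                   assert m == 500
m = conv1d_transpose(m, 10); assert m == 509
m = m * 4;                   assert m == 2036
m = conv1d_transpose(m, 14); assert m == 2049
print("Table V/VII dimensions consistent")
```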

To further investigate the CRAN predictions, we compare the fully propagated predictions and the target
solutions in Figs. 20, 21 and 22 for source depths zs = 161 m, zs = 1802 m and zs = 3550 m, respectively.
These source depths represent three completely different transmission loss distribution patterns. We can see an
excellent similarity between the predicted and target solutions for zs = 161 m (Fig. 20), with only minor
differences in magnitude near the upper limit of the transmission loss values. This is corroborated by
an SSIM of 0.97. For zs = 1802 m (Fig. 21), the transmission loss distribution pattern of the predictions
closely matches the target solutions. However, some differences in magnitude are observed far into the
propagated range at both high and low depths. Overall, the predictions show high accuracy with an
SSIM of 0.953.

The CRAN predictions at zs = 3550 m (Fig. 22) show the lowest SSIM of 0.85. The nearest training
source to this test source was at zs = 3250 m. The 300 m difference in source depth between test
and training for this case is the largest among all the test cases. Thus, one may consider this an
extrapolation and the most challenging test case. Notably, the outer structure of the predicted
transmission loss distribution for zs = 3550 m matches the target solution closely. However, some
differences in the magnitude and the transmission loss patterns can be observed throughout the domain,
especially for the lower transmission loss values. These differences increase as the predicted transmission
loss is propagated farther away from the source. These differences in both the magnitude and structure of
the transmission loss patterns lead to a somewhat lower SSIM for this case.

Overall, generalized prediction capabilities of far-field transmission loss propagation via the CRAN were
demonstrated for general source depths over the domain. For all the 15 source depths that were not
considered for training (Fig. 19), the CRAN could predict with a mean SSIM accuracy of 94%. Even for


FIG. 19. Structural similarity index measure for validation and test sets: far-field transmission loss propagation

FIG. 20. Comparison of (a) truth and (b) CRAN prediction for far-field transmission loss propagation: zs = 161 m

the source depth almost 300 m away from any training source depth over a domain depth of 5000 m,
we could predict the solution with 85% overall accuracy. Also, the CRAN predictions closely matched the
transmission loss distribution patterns of the target solutions, showing only some differences in magnitude
far away from the source location.


FIG. 21. Comparison of (a) truth and (b) CRAN prediction for far-field transmission loss propagation: zs = 1802 m

FIG. 22. Comparison of (a) truth and (b) CRAN prediction for far-field transmission loss propagation: zs = 3550 m

The transmission loss distribution over the whole domain was obtained within 4-6 CPU seconds with the
CRAN model for each of the source depths. On the other hand, similar predictions for each source depth
via BELLHOP required 400-450 CPU seconds. This demonstrates the potential application of the CRAN
model for real-time decision-making and control.

VI. CONCLUSIONS

The convolutional recurrent autoencoder network (CRAN) was presented in this article for the data-driven
learning of ocean acoustics. We specifically investigated two test problems. While the first test problem
involved underwater wave propagation with discontinuous initial conditions, the second problem represented
far-field transmission loss propagation from point sources in a two-dimensional domain. To demonstrate the generalized


learning capability of CRAN we considered spatially distributed discontinuities in the initial condition for
the wave propagation problem, and depth-dependent point sources for far-field transmission loss learning.

The CRAN model consisted of a convolutional autoencoder for learning a low-dimensional representation
of high-dimensional physical data and a long short-term memory propagator for learning the system evolution
in the low-dimension. We showed that the convolutional autoencoder was able to successfully reduce high-
dimensional spatial data to a lower-dimensional latent subspace, whereas projection-based techniques like
POD proved inefficient in this task. On exploring various LSTM networks, we also found that single-
shot LSTM networks were able to learn the system propagation and perform long-time propagation in an
autoregressive manner.

It was observed that the CRAN model trained with temporal snapshots for a few spatially distributed
initial conditions or sources could provide long-horizon propagation for other spatial distri-
butions not included in the training set. This was demonstrated both for the temporal propagation of
one-dimensional waves and for the far-field propagation of transmission loss patterns along the range. We
found some differences in the magnitude of CRAN predictions compared to true solutions for certain test
cases. These led to a few cases in both scenarios showing an overall prediction accuracy of 80-85%, as
the structural similarity index measures of the predicted and true solutions for these cases were in the range
of 0.80-0.85.

Overall, with training for 10 spatial locations of the discontinuity for the time-domain wave propagation
scenario, we could predict with a mean SSIM accuracy of 90% for another seven different spatial locations
of discontinuity not considered in the training set. Similarly, for the two-dimensional transmission loss
propagation scenario, with training for 21 different locations of source depth, we could predict for another
15 source depth locations with a mean SSIM accuracy of 94%. Furthermore, CRAN predictions for all
the test cases followed the characteristic lines of time-domain wave propagation and also the transmission
loss patterns in the two-dimensional domain. Thus, CRAN was able to achieve generalized learning of the
underlying physics of wave propagation phenomena.

We observed that CRAN training and predictions improved as the number of training cases for spatially
distributed parameters went up from 10 in the time-domain wave propagation scenario to 21 in the far-field
transmission loss learning scenario. However, it was shown that early stopping of network training via
careful inspection of validation errors can help improve generalization even when data is limited.

The CRAN is a data-driven model independent of how the data is obtained. Here, we considered medium-
fidelity ray tracing solvers for obtaining the ocean acoustic training data for CRAN. However, the CRAN
model is equally applicable if the present data is augmented with data obtained from higher-fidelity solvers
or experimental measurements. This shows the potential for CRAN, and data-driven deep learning models
in general, to obtain a digital twin of ocean acoustics over a wide range of parameters and encompassing
physical phenomena of varying complexity. The real-time online prediction of such a digital twin can be
applied for fast decision-making in marine vessel operations.


ACKNOWLEDGMENTS

This research was supported by the Natural Sciences and Engineering Research Council of Canada
(NSERC) [grant number IRCPJ 550069-19]. We would also like to acknowledge that the GPU facilities at
the Compute Canada clusters were used for the training of our deep learning models.

APPENDIX

Here we present a short explanation of how the SSIM was computed for the predicted solutions. Readers
can explore Ref. (Wang et al., 2004) for further details. Let x and y be two matrices of common
dimension. Then the SSIM can be computed as

    SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ,                                (26)

where l, c and s are measures of the relative luminance, contrast and structure of the two images,
respectively, and α > 0, β > 0 and γ > 0 are parameters used to adjust the relative importance of the
three components. The three components of SSIM are defined as

    l(x, y) = (2µx µy + c1) / (µx² + µy² + c1),
    c(x, y) = (2σx σy + c2) / (σx² + σy² + c2),                                          (27)
    s(x, y) = (σxy + c3) / (σx σy + c3).

Here µx and µy are the means of x and y, respectively, σx and σy are the standard deviations of x and
y, and σxy is the covariance of x and y. The constants c1 = (k1 L)², c2 = (k2 L)² and c3 = c2/2 are
employed to stabilize the divisions with weak denominators, where L is the dynamic range of the pixel
values, or the range of any data that can be represented in a pixelated format. Ideally, k1, k2 ≪ 1; we
select k1 = 0.01 and k2 = 0.03. The power coefficients α, β and γ can be selected as 1 in most cases.
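With α = β = γ = 1 and c3 = c2/2, the contrast and structure terms combine into a single factor, giving the familiar two-term SSIM formula. A minimal sketch, computed globally over the whole flattened field rather than over sliding local windows (a simplification of Wang et al., 2004):

```python
import math

def ssim(x, y, data_range, k1=0.01, k2=0.03):
    """Global SSIM of two equal-length flattened fields with
    alpha = beta = gamma = 1 and c3 = c2/2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)      # variance of x
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    # Luminance term times the combined contrast-structure term
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

x = [0.1, 0.5, 0.9, 0.4]
print(ssim(x, x, data_range=1.0))        # identical fields give SSIM = 1
```

For the transmission loss fields, `data_range` is the dynamic range of the transmission loss values over the compared fields.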

REFERENCES

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J.,
Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser,
L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M.,
Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F.,
Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X. (2015). “TensorFlow: Large-
scale machine learning on heterogeneous systems” https://www.tensorflow.org/, software available
from tensorflow.org.


Allen-Zhu, Z., Li, Y., and Song, Z. (2019). “A convergence theory for deep learning via over-
parameterization,” in International Conference on Machine Learning, PMLR, pp. 242–252.
Bergen, K. J., Johnson, P. A., Maarten, V., and Beroza, G. C. (2019). “Machine learning for data-driven
discovery in solid earth geoscience,” Science 363(6433).
Bianco, M. J., Gerstoft, P., Traer, J., Ozanich, E., Roch, M. A., Gannot, S., and Deledalle, C.-A. (2019).
“Machine learning in acoustics: Theory and applications,” The Journal of the Acoustical Society of Amer-
ica 146(5), 3590–3628.
Borrel-Jensen, N., Engsig-Karup, A. P., and Jeong, C.-H. (2021). “Physics-informed neural networks for
one-dimensional sound field predictions with parameterized sources and impedance boundaries,” JASA
Express Letters 1(12), 122402.
Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., and Vandergheynst, P. (2017). “Geometric deep
learning: going beyond euclidean data,” IEEE Signal Processing Magazine 34(4), 18–42.
Bukka, S. R., Gupta, R., Magee, A. R., and Jaiman, R. K. (2021). “Assessment of unsteady flow predictions
using hybrid deep learning based reduced-order models,” Physics of Fluids 33(1), 013601, doi: 10.1063/
5.0030137.
Chen, R., and Schmidt, H. (2021). “Model-based convolutional neural network approach to underwater
source-range estimation,” The Journal of the Acoustical Society of America 149(1), 405–420.
Cheng, M., Fang, F., Pain, C. C., and Navon, I. (2020). “Data-driven modelling of nonlinear spatio-
temporal fluid flows using a deep convolutional generative adversarial network,” Computer Methods in
Applied Mechanics and Engineering 365, 113000.
Chi, J., Li, X., Wang, H., Gao, D., and Gerstoft, P. (2019). “Sound source ranging using a feed-forward
neural network trained with fitting-based early stopping,” The Journal of the Acoustical Society of America
146(3), EL258–EL264.
Cybenko, G. (1989). “Approximation by superpositions of a sigmoidal function,” Mathematics of control,
signals and systems 2(4), 303–314.
Deo, I. K., and Jaiman, R. (2022). “Learning wave propagation with attention-based convolutional recurrent
autoencoder net,” arXiv preprint arXiv:2201.06628 .
Duarte, C. M., Chapuis, L., Collin, S. P., Costa, D. P., Devassy, R. P., Eguiluz, V. M., Erbe, C., Gordon,
T. A., Halpern, B. S., Harding, H. R. et al. (2021). “The soundscape of the anthropocene ocean,” Science
371(6529).
Erbe, C., Marley, S. A., Schoeman, R. P., Smith, J. N., Trigg, L. E., and Embling, C. B. (2019). “The
effects of ship noise on marine mammals—a review,” Frontiers in Marine Science 6, 606.
Ferguson, E. L. (2021). “Multitask convolutional neural network for acoustic localization of a transiting
broadband source using a hydrophone array,” The Journal of the Acoustical Society of America 150(1),
248–256.
Fotiadis, S., Pignatelli, E., Valencia, M. L., Cantwell, C., Storkey, A., and Bharath, A. A. (2020). “Com-
paring recurrent and convolutional neural networks for predicting wave propagation,” arXiv preprint
arXiv:2002.08981 .


Gonzalez, F. J., and Balajewicz, M. (2018). “Deep convolutional recurrent autoencoders for learning low-
dimensional feature dynamics of fluid systems”.
Greif, C., and Urban, K. (2019). “Decay of the kolmogorov n-width for wave problems,” Applied Mathe-
matics Letters 96, 216–222.
Gupta, R., and Jaiman, R. (2022). “Three-dimensional deep learning-based reduced order model for un-
steady flow dynamics with variable reynolds number,” Physics of Fluids 34(3), 033612.
Hornik, K. (1991). “Approximation capabilities of multilayer feedforward networks,” Neural networks 4(2),
251–257.
Hsieh, W. W. (2009). Machine learning methods in the environmental sciences: Neural networks and kernels
(Cambridge university press).
Huang, Z., Xu, J., Gong, Z., Wang, H., and Yan, Y. (2018). “Source localization using deep neural networks
in a shallow water environment,” The Journal of the Acoustical Society of America 143(5), 2922–2932.
James, K. R., and Dowling, D. R. (2005). “A probability density function method for acoustic field uncer-
tainty analysis,” The Journal of the Acoustical Society of America 118(5), 2802–2810.
James, K. R., and Dowling, D. R. (2011). “Pekeris waveguide comparisons of methods for predicting acoustic
field amplitude uncertainty caused by a spatially uniform environmental uncertainty (l),” The Journal of
the Acoustical Society of America 129(2), 589–592.
Jensen, F. B., Kuperman, W. A., Porter, M. B., Schmidt, H., and Tolstoy, A. (2011). Computational Ocean
Acoustics (Springer).
Kingma, D. P., and Ba, J. (2014). “Adam: A method for stochastic optimization,” arXiv preprint
arXiv:1412.6980 .
Lee, K., and Carlberg, K. T. (2020). “Model reduction of dynamical systems on nonlinear manifolds using
deep convolutional autoencoders,” Journal of Computational Physics 404, 108973.
Leshno, M., Lin, V. Y., Pinkus, A., and Schocken, S. (1993). “Multilayer feedforward networks with a
nonpolynomial activation function can approximate any function,” Neural networks 6(6), 861–867.
Maas, A. L., Hannun, A. Y., Ng, A. Y. et al. (2013). “Rectifier nonlinearities improve neural network
acoustic models,” in Proc. icml, Citeseer, Vol. 30, p. 3.
Mallik, W., Farvolden, N., Jaiman, R. K., and Jelovica, J. (2022). “Deep convolutional neural network for
shape optimization using level-set approach,” arXiv preprint arXiv:2201.06210 .
Mallik, W., Jaiman, R. K., and Jelovica, J. (2021). “Kinematically consistent recurrent neural networks for
learning inverse problems in wave propagation,” arXiv preprint arXiv:2110.03903 .
Mansour, T. (2019). “Deep neural networks are lazy: on the inductive bias of deep learning,” Master’s
thesis, Massachusetts Institute of Technology.
Mojgani, R., and Balajewicz, M. (2020). “Physics-aware registration based auto-encoder for convection
dominated pdes,” arXiv preprint arXiv:2006.15655 .
Niu, H., Ozanich, E., and Gerstoft, P. (2017). “Ship localization in santa barbara channel using machine
learning classifiers,” The Journal of the Acoustical Society of America 142(5), EL455–EL460.


Parish, E. J., and Carlberg, K. T. (2020). “Time-series machine-learning error models for approximate
solutions to parameterized dynamical systems,” Computer Methods in Applied Mechanics and Engineering
365, 112990.
Pinkus, A. (1999). “Approximation theory of the mlp model,” Acta Numerica 1999: Volume 8 8, 143–195.
Porter, M. B. “Bellhop: A beam/ray trace code” https://oalib-acoustics.org/website_resources/
AcousticsToolbox/Bellhop-2010-1.pdf.
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N. et al. (2019). “Deep
learning and process understanding for data-driven earth system science,” Nature 566(7743), 195–204.
Rowley, C. W., and Dawson, S. T. (2017). “Model reduction for flow analysis and control,” Annual Review
of Fluid Mechanics 49, 387–417.
Sorteberg, W. E., Garasto, S., Cantwell, C. C., and Bharath, A. A. (2019). “Approximating the solution of
surface wave propagation using deep neural networks,” in INNS Big Data and Deep Learning conference,
Springer, pp. 246–256.
Taddei, T. (2020). “A registration method for model order reduction: data compression and geometry
reduction,” SIAM Journal on Scientific Computing 42(2), A997–A1027.
Wang, Y., and Peng, H. (2018). “Underwater acoustic source localization using generalized regression neural
network,” The Journal of the Acoustical Society of America 143(4), 2321–2331.
Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P. (2004). “Image quality assessment: from error
visibility to structural similarity,” IEEE transactions on image processing 13(4), 600–612.

