Data-Driven Aerospace Engineering: Reframing the Industry With Machine Learning
Abstract
Data science, and machine learning in particular, is rapidly transforming the scientific and industrial landscapes.
The aerospace industry is poised to capitalize on big data and machine learning, which excels at solving the types
of multi-objective, constrained optimization problems that arise in aircraft design and manufacturing. Indeed,
emerging methods in machine learning may be thought of as data-driven optimization techniques that are ideal for
high-dimensional, non-convex, and constrained, multi-objective optimization problems, and that improve with in-
creasing volumes of data. In this review, we will explore the opportunities and challenges of integrating data-driven
science and engineering into the aerospace industry. Importantly, we will focus on the critical need for interpretable,
generalizable, explainable, and certifiable machine learning techniques for safety-critical applications. This review
will include a retrospective, an assessment of the current state-of-the-art, and a roadmap looking forward. Recent
algorithmic and technological trends will be explored in the context of critical challenges in aerospace design, man-
ufacturing, verification, validation, and services. In addition, we will explore this landscape through several case
studies in the aerospace industry. This document is the result of close collaboration between UW and Boeing to
summarize past efforts and outline future opportunities.
Keywords: Aerospace engineering, Machine learning, Data science, Optimization, Manufacturing, Design
1 Introduction
Data science is broadly redefining the state-of-the-art in engineering practice and what is possible across the scien-
tific, technological, and industrial landscapes. The big data era mirrors the scientific computing revolution of the
1960s, which gave rise to transformative engineering paradigms and allowed for the accurate simulation of com-
plex, engineered systems. Indeed, scientific computing enabled the prototyping of aircraft design through physics-
based emulators that resulted in substantial cost savings to aerospace manufacturers. The Boeing 777 was the first
aircraft to have been designed completely from simulation without a mock-up. In a similar fashion, machine learn-
ing (ML) and artificial intelligence (AI) algorithms are ushering in one of the great technological developments of
our generation [1–7]. The success of ML/AI has been undeniable in traditionally challenging fields, such as machine vision, natural language processing, fraud detection, and online recommender systems. Increasingly, new
ML/AI opportunities are emerging in engineering disciplines where processes are governed by physics [8, 9], such
as materials science, robotics, and the discovery of governing equations from data. An overview of opportunities
in data-intensive aerospace engineering is shown in Fig. 1.
Advances in data-driven science and engineering have been driven by the unprecedented confluence of (i) vast
and increasing data, (ii) advances in high-performance computation, (iii) improvements to sensing technologies,
data storage, and transfer, (iv) scalable algorithms from statistics and applied mathematics, and (v) considerable
investment by industry, leading to an abundance of open-source software and benchmark problems. Nowhere is the
opportunity for data-driven advancement more exemplified than in the field of aerospace engineering, which is data
rich and is already built on a constrained multi-objective optimization framework that is ideally suited for modern
techniques in ML/AI. Each stage of modern aerospace manufacturing is data-intensive, including manufacturing,
testing, and service. A Boeing 787 comprises 2.3 million parts that are sourced from around the globe and assembled
* Corresponding author: [email protected]
Figure 1: Data science and machine learning have the potential to revolutionize the aerospace industry.
in an extremely complex and intricate manufacturing process, resulting in vast multimodal data from supply chain
logs, video feeds in the factory, inspection data, and hand-written engineering notes. After assembly, a single flight
test will collect data from 200,000 multimodal sensors, including asynchronous signals from digital and analogue sensors measuring strain, pressure, temperature, acceleration, and video. In service, the aircraft generates a wealth
of real-time data, which is collected, transferred, and processed with 70 miles of wire and 18 million lines of code for
the avionics and flight control systems alone. Thus, big data is presently a reality in modern aerospace engineering,
and the field is ripe for advanced data analytics with machine learning.
The use of data for science and engineering is not new [10], and most key breakthroughs in the past decade have
been fundamentally catalyzed by advances in data quality and quantity. However, with an unprecedented ability to
collect and store data [11], we have entered a new era of data-intensive analysis, where hypotheses are now driven by
data. This mode of data-driven discovery is often referred to as the fourth paradigm [12], which does not supplant,
but instead complements the established modes of theoretical, experimental, and numerical inquiry. In fact, we
again emphasize the strong parallel with the rise of computational science in the past half century, which did not
replace, but instead augmented theory and experiments. Just as today computational proficiency is expected in the
workforce, so will data science proficiency be expected in the future. Thus, the data-intensive transformation in
the aerospace industry can learn from the digital transformation over the past decades. Perhaps the largest change
will be in how teams of researchers and engineers are formed with domain expertise and essential data science
proficiency, along with changes in research and development cycles for industry.
The aerospace industry presents a number of unique opportunities and challenges for the integration of data-
intensive analysis techniques and machine learning. The transformative impact of data science will be felt across the
aerospace industry, including: (i) in the factory (design for manufacturability, re-use and standardization, process
control, safety, productivity, reproducibility, inspection, automation, drilling, shimming), (ii) in testing and evalu-
ation (streamlining testing, certification, anomaly detection, data-driven modeling), (iii) in the aircraft (inspection,
design and performance, materials and composites, maintenance, future product development), (iv) in human-
machine interactions (advanced design interfaces, interactive visualizations, natural language interactions) and (v)
in the business (supply chain, sales, human resources, and marketing). Because of the exacting tolerances required
in aerospace manufacturing, many of these high-level objectives are tightly coupled in a constrained multi-objective
optimization. Traditionally, this optimization has been too large for any one group to oversee, and instead, indi-
vidual components are optimized locally within acceptable ranges. However, unforeseen interactions often cause
considerable redesign and program delays. In the worst-case scenario, accidents may occur. With improvements in
Figure 2: Schematic overview of data-driven aerospace engineering.
end-to-end database management and interaction (data standardization, data governance, a growing data-aware
culture, and system integration methods), it is becoming possible to create a digital thread of the entire design, man-
ufacturing, and testing process, potentially delivering dramatic improvements to this design optimization process.
Further, improvements in data-enabled models of the factory and the aircraft, the so-called digital twin, will allow
for the accurate and efficient simulation of various scenarios. In addition to these operational improvements, ad-
vances in data-intensive analysis are also driving fundamental advances in aerospace critical fields such as fluid
mechanics [13, 14] and material science [15]. Importantly, data science works in concert with existing methods and
workflows, allowing for transformative gains in predictive analytics and design insights gained directly from data.
Figure 2 provides a schematic of this process.
Despite these tremendous potential gains, there are several challenges facing the integration of data-science in
the aerospace industry. Foremost, due to the safety-critical aspect of aerospace engineering, data-driven models
must be certifiable and verifiable, must generalize beyond the training data, and must be both interpretable and ex-
plainable by humans. Further, collecting vast amounts of data may lead to a data mortgage, where simply collecting
and maintaining this data comes at a prohibitively high cost, and the data is cumbersome for downstream analysis.
In the aerospace industry, customers are extremely diverse, with training often provided by a range of companies and certification handled by different regulatory authorities, motivating a level of robustness that is typically not required
in other industries. There are also fundamental differences between commercial and defense sectors, limiting the
sharing of designs, transfer of technology, and joint testing; in fact, at Boeing, testing and evaluation of commercial
and defense vehicles have only recently been grouped under one organization, and these efforts still remain siloed.
The aerospace industry has always been a leader in optimization, and the earliest advances of the Wright broth-
ers may be viewed as an optimization of the flight control system. In the century that followed, much of the
aerospace industry has been centered around a constrained, multi-objective optimization with an exceedingly large
number of degrees of freedom and nonlinear interactions. ML algorithms are a growing set of data-intensive opti-
mization and regression techniques that are ideal for these types of high-dimensional, nonlinear, non-convex, and
constrained optimizations. Aided by advances in hardware and algorithms, modern ML is poised to enable this
optimization, allowing a much broader and integrated perspective. It is important to note that not all ML is deep
learning or artificial intelligence. ML is simply optimization that is driven by data rather than first-principles models,
so that it fits naturally into existing efforts, but leverages a growing and diverse set of data.
In this paper, we will explore the evolving landscape of data-driven aerospace engineering. We will discuss
emerging technologies and how they are changing what is possible in aerospace design, manufacturing, testing, and
services. These advances will be viewed through case studies, providing a review of past work and a roadmap
for the future. Further, we wish to temper the excitement about the immense opportunities of data science with a
realistic view of what is easy and what is hard. Finally, because the field is evolving rapidly, we want to establish a
common terminology, taxonomy, and hierarchy for technology surrounding data science and its ML algorithms.
2 Machine Learning and Optimization
In this section, we will discuss a number of mathematical architectures that are central to the data-intensive mod-
eling of aerospace systems. First, we will focus on understanding modern machine learning algorithms that will
be critical to process data from the aerospace industry. We then provide a brief overview of applied optimization,
which is the mathematical underpinning of machine learning. Next, we discuss specific machine learning exten-
sions and considerations for systems governed by physics, and also discuss the importance of scalable and robust
numerical algorithms. The treatment of these mathematical topics is far from comprehensive [9], and is instead in-
tended to provide a brief, high-level overview. Potential uses for these algorithms will be explored in later sections.
Figure 3: Decision tree for selecting a machine learning strategy based on the training data available from a complex system. Unsupervised learning branches into discrete clustering (e.g., k-means, spectral clustering), continuous embeddings (e.g., POD/PCA, autoencoders, diffusion maps), and generative models (GANs); partially labeled data leads to semi-supervised and reinforcement learning; supervised learning branches into discrete classification (e.g., SVM, neural networks, decision trees) and continuous regression (e.g., linear, generalized linear, Gaussian process regression). A feedback signal may be used to modify the system or control parameters.
An autoencoder learns a low-dimensional embedding by training an encoder φ and a decoder ψ to minimize the reconstruction loss

L = ‖x − ψ(φ(x))‖. (4)
When the encoder and decoder are linear functions, given by matrices, then the optimal embedding recovers the
classical singular value decomposition (SVD) or principal component decomposition (PCA) [9, 14]. However, these
functions may be nonlinear, defined by neural networks, resulting in powerful autoencoders.
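As a minimal illustration of this equivalence, the following Python sketch builds a linear encoder/decoder from the truncated SVD of a synthetic snapshot matrix; the data, dimensions, and names here are hypothetical, not from the paper.

```python
import numpy as np

# Hypothetical snapshot matrix: rows are measurements of a rank-3 process.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3)) @ rng.standard_normal((3, 50))

# Linear encoder/decoder: the rank-r truncated SVD of the centered data
# gives the optimal linear embedding (equivalent to PCA).
r = 3
mean = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)

def encode(x):
    return (x - mean) @ Vt[:r].T      # phi: project onto r principal directions

def decode(a):
    return a @ Vt[:r] + mean          # psi: lift back to the ambient space

X_hat = decode(encode(X))
print("reconstruction error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```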
Figure 4: Mathematical architectures of commonly used neural networks, including (a) a feed-forward neural net-
work, (b) a deep autoencoder network, and (c) a deep convolutional neural network.
Rewards may be thought of as sporadic and delayed labels, leading to RL often being classified as semi-supervised
learning. One canonical example is learning a set of moves, or a long term strategy, to win a game of chess.
Reinforcement learning is often formulated as an optimization to determine the policy π(s, a), which is a proba-
bility of taking action a given state s, to maximize the total reward across an episode. Given a policy π, it is possible
to define a value function that quantifies the desirability of being in a given state:
V_π(s) = E( Σ_t γ^t r_t | s_0 = s ), (5)
where E is the expected reward over the time steps t, subject to a discount rate γ. Typically, it is assumed that the state evolves according to a Markov decision process, so that the probability of transitioning to the current state depends only on the previous state. Thus a large number of trials must be evaluated in order to determine
an optimal policy. This is accomplished in chess and Go by self-play [20], which is exactly what many engineered
systems are allowed to do to learn an optimal policy. Often, in modern deep reinforcement learning, a deep neural
network is used to learn a quality function Q(s, a) that jointly describes the desirability of a given state/action pair.
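The following sketch illustrates these ideas with tabular Q-learning on a toy chain environment; the environment, reward, and hyperparameters are invented for illustration and are not from the paper.

```python
import numpy as np

# Toy chain: states 0..4, action 0 moves left, action 1 moves right;
# reward 1 only for reaching the rightmost state.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))      # quality function Q(s, a)
alpha, gamma, eps = 0.1, 0.9, 0.1        # learning rate, discount, exploration

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward, s_next == n_states - 1

rng = np.random.default_rng(0)
for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Temporal-difference update toward the Bellman target
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))  # non-terminal states should prefer action 1 (right)
```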
Figure 4 highlights several common architectures, including a feed-forward NN, a deep autoencoder, and a deep convolutional NN (DCNN) that is widely used to extract features for machine vision. The term deep refers to the number of neural network layers (typically 7-10) used to map from inputs to outputs. Neural networks are universal function approximators [21], and they assume a compositional structure

y = f_N( · · · f_2(f_1(x; θ_1); θ_2) · · · ; θ_N ).
The flexibility of this compositional structure enables the construction of classification or regression maps between
input and output data. With sufficient data, one can optimize for the NN weight parameters θn , usually via some
form of stochastic gradient descent. Deep learning is commonly used in commercial settings, with DCNNs being
the state-of-the-art for characterizing images and spatial correlations, and recurrent NNs (RNNs) enabling powerful
speech and text recognition software. NNs are typically supervised learners and require a significant amount of
data. They are also known to overfit to data and fail to generalize to new parameter regimes. Regardless, they
are a powerful technology that can power many of the supervised learning tasks required in modern data science
applications.
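To make the compositional structure and stochastic gradient descent concrete, here is a minimal NumPy sketch of a two-layer feed-forward network fit to a toy regression problem; the architecture, target function, and learning rate are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (256, 1))
y = np.sin(np.pi * x)                      # toy target function to fit

W1, b1 = 0.5 * rng.standard_normal((1, 32)), np.zeros(32)
W2, b2 = 0.5 * rng.standard_normal((32, 1)), np.zeros(1)
lr = 0.05

for it in range(2000):
    idx = rng.integers(0, len(x), 32)      # stochastic mini-batch
    xb, yb = x[idx], y[idx]
    h = np.tanh(xb @ W1 + b1)              # composition: f2(f1(x; W1); W2)
    err = (h @ W2 + b2) - yb
    # Backpropagate the mean-squared-error gradient through the composition
    gW2 = h.T @ err / len(xb); gb2 = err.mean(axis=0)
    gh = (err @ W2.T) * (1 - h**2)
    gW1 = xb.T @ gh / len(xb); gb1 = gh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

print("final MSE:", float(np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2)))
```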
Physical constraints may be imposed by augmenting the optimization objective with a regularization term,

argmin_θ L(θ) + λ g(θ),

where the regularization g(·) imposes the desired physical constraint. The parameter λ is a hyper-parameter allowing the user to impose an increasingly strong regularization to enforce this constraint. The importance of this con-
straint, or potentially multiple constraints, cannot be overstated in engineering and physics systems. Specifically,
this is where known physics or physical constraints can be explicitly incorporated into the data-driven modeling
process. Physics informed learning, often enacted with deep learning architectures, represents the state-of-the-art
in ML methods for the engineering and physical sciences.
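A minimal sketch of this idea follows, assuming a hypothetical linear model whose coefficients are softly constrained through a penalty g(θ); the specific constraint is invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Fit y = theta_0 + theta_1 * x while softly enforcing a (hypothetical)
# physical constraint theta_0 + theta_1 = 1 via a penalty weighted by lambda.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 0.3 + 0.7 * x + 0.01 * rng.standard_normal(50)

def loss(theta, lam=10.0):
    misfit = np.sum((theta[0] + theta[1] * x - y) ** 2)   # data term L(theta)
    g = (theta[0] + theta[1] - 1.0) ** 2                  # constraint penalty g(theta)
    return misfit + lam * g

theta_hat = minimize(loss, x0=np.zeros(2)).x
print(theta_hat, theta_hat.sum())  # the sum is pulled toward the constraint value 1
```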
Rather than imposing physical constraints explicitly, an alternative is to learn embeddings based on physical
models. This physics-guided paradigm involves learning embeddings from data produced by known first princi-
ples models of physics. The computation of these embeddings, in the context of aerospace and fluid dynamics, is
often known as modal analysis, and has become increasingly data-driven (either from simulation or observation) in
recent years [14, 33, 34]. The physical coupling between fluids and aerospace structures are particularly important
and the modes of these coupled interactions are impossible to discern by analyzing the Navier-Stokes equations and
structural models alone – instead, they are determined by the boundary interactions between the coupled models.
Thus, this physics guided architecture greatly enhances the understanding and design of robust engineering sys-
tems that can withstand complex interactions, turbulence, and instabilities. Improved reduced-order models of
fluid dynamics may further aid efforts in flow control [35, 36] and to reduce jet noise [37]. Modern modal analysis
techniques, such as POD/PCA [9], dynamic mode decomposition [38–40], Koopman mode decomposition [41, 42],
and resolvent mode analysis [43], naturally fall under the umbrella of unsupervised learning. Furthermore, the
dimensionality reduction achieved by these methods enable low-latency, efficient downstream tasks such as sys-
tem identification [44], airfoil shape optimization and uncertainty quantification [45], and reduced-order model-
ing [34, 46–54].
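As one concrete example of data-driven modal analysis, the following sketch implements the exact dynamic mode decomposition [38–40] for a pair of time-shifted snapshot matrices; the function and variable names are ours, following common convention.

```python
import numpy as np

def dmd(X, Xprime, r):
    """Exact DMD (sketch): fit Xprime ~ A X with a rank-r linear operator
    and return its eigenvalues and spatial modes."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Ur, Sr_inv, Vr = U[:, :r], np.diag(1.0 / s[:r]), Vt[:r].T
    Atilde = Ur.T @ Xprime @ Vr @ Sr_inv      # reduced r x r operator
    eigvals, W = np.linalg.eig(Atilde)
    modes = Xprime @ Vr @ Sr_inv @ W          # exact DMD modes
    return eigvals, modes

# Usage: for a snapshot matrix D with columns d_0, ..., d_m in time,
# eigvals, modes = dmd(D[:, :-1], D[:, 1:], r=10)
```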
2.2 Optimization
All of machine learning relies on optimization [9, 14]. In fact, ML may be viewed as a growing set of applied opti-
mization algorithms to build models from data. Mathematical optimization [55] comprises three interconnected ar-
eas: theoretical underpinnings, algorithm design and implementation, and modeling. Modeling allows us to bring
in deterministic or physical descriptions of the real-world, along with information about structure of unknowns
and statistical measures that describe beliefs about error and uncertainty. The scope of problems that can be for-
mulated in this way is broad, ranging from sensing, estimation, and control [56–63] to machine learning [9, 16, 64]
and decisions under uncertainty [65]. In all cases a modeling process gives rise to an optimization problem, where
minimizing and maximizing over parameters leads to the desired inference or learning machine. Theoretical devel-
opments, such as convex and variational analysis, capture properties of problem formulations such as smoothness,
convexity, and well-posedness of the problems themselves. In addition, deriving provable behavior of such algo-
rithms is often important to guarantee performance.
A major distinction in optimization is between convex and nonconvex problems. Here, convexity refers to the
property of the objective function to be minimized or maximized and the set of values being optimized over. Convex
optimization problems [55] are extremely well studied, as there are fast and scalable generic solution techniques
with performance guarantees. A convex objective function has a single global minimum or maximum (i.e., the
function has a single hill or valley), and gradient-based methods may be used to converge to this extremum. In
contrast, nonconvex models, which have many local minima and maxima (i.e., the objective function has many
local hills and valleys), form a much wider, and hence richer, class of problems. But, as such, there are no general
scalable solution techniques or strong convergence guarantees.
A central tool in modern nonsmooth optimization is the proximity operator of a function f:

prox_{αf}(z) = argmin_x (1/(2α)) ‖x − z‖² + f(x). (8)
In words, we minimize the sum of the function and a scaled quadratic around a base point z and return the minimiz-
ing value. When the function f is arbitrary, evaluating the operator may be difficult or impossible. However, under
moderate assumptions that are satisfied for a wide range of applications, the proximity operator has a closed form
solution or a provably fast computational routine. Many algorithms for solving nonsmooth nonconvex problems
use the proximity operator as a subroutine, and are provably convergent [80].
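For example, for the sparsity-promoting choice f(x) = λ‖x‖₁, the proximity operator has the well-known closed form of soft thresholding, sketched below.

```python
import numpy as np

def prox_l1(z, alpha_lam):
    """Proximity operator of f(x) = lambda * ||x||_1 with threshold
    alpha * lambda: entrywise soft thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha_lam, 0.0)

print(prox_l1(np.array([-2.0, -0.3, 0.1, 1.5]), 0.5))
# entries with magnitude below the threshold are set exactly to zero
```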
Robust principal component analysis (RPCA) seeks to decompose a data matrix X into a low-rank matrix L and a sparse matrix S:

X = L + S. (9)
The principal components of L are robust to the outliers and corrupt data in S. The low-rank matrix L is decomposed via the SVD into L = ΦDV^T, where coherent features are given by the matrix Φ. We generally use the first r dominant columns of Φ, corresponding to the r features that explain the most variance in the data. The SVD provides the best rank-r approximation in the least-squares sense: L̂ = Φ_r D_r V_r^T. The target rank r must be carefully chosen so that the selected features only include meaningful patterns and discard noise corresponding to small singular values. The resulting left singular vectors Φ_r are the desired low-rank features that span the
columns of L. The truncation parameter r may be determined using the optimal singular value truncation threshold
of Gavish and Donoho [97].
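A common way to compute the decomposition in Eq. (9) is principal component pursuit, solved with an augmented Lagrangian method; the sketch below uses standard default parameter choices, which may require tuning in practice.

```python
import numpy as np

def rpca(X, max_iter=500, tol=1e-7):
    """Sketch of robust PCA via principal component pursuit (inexact ALM):
    split X into a low-rank L and a sparse S."""
    m, n = X.shape
    lam = 1.0 / np.sqrt(max(m, n))          # standard sparsity weight
    mu = m * n / (4.0 * np.abs(X).sum())    # standard penalty parameter
    L = np.zeros_like(X); S = np.zeros_like(X); Y = np.zeros_like(X)
    normX = np.linalg.norm(X)
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding
        U, sig, Vt = np.linalg.svd(X - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Sparse update: entrywise soft thresholding
        R = X - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        Y += mu * (X - L - S)               # dual ascent on the constraint
        if np.linalg.norm(X - L - S) / normX < tol:
            break
    return L, S
```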
RPCA has tremendous applicability for modern problems of interest, including video surveillance [98] (where
the background objects appear in L and foreground objects appear in S), natural language processing [99], matrix
completion, and face recognition [100]. Matrix completion may be thought of in terms of the Netflix prize, where a
matrix of preferences is constructed, with rows corresponding to users and columns corresponding to movies. This
matrix is sparse, as most users only rate a handful of movies. The goal is to accurately fill in missing matrix entries,
revealing likely user ratings for movies they haven’t seen. We will demonstrate the use of RPCA for an aircraft
shimming application in the case study in Sec. 8.
3 Digital Twin and Enabling Technologies
Several key technologies are necessary to support the design, manufacturing, testing, and service of tomorrow’s
aerospace products, which will ultimately be enabled by a robust digital twin. These enabling technologies include
sensors and the internet of things, a comprehensive digital thread, rapid data access and data storage, virtual real-
ity to perform testing, reduced order models and discrepancy models, uncertainty quantification, autonomy, and
control. This section will review a number of these emerging technologies.
Physics-based models of aerospace systems are often governed by nonlinear partial differential equations of the form

u_t = N(u, u_x, u_xx, . . . , x, t; β), (10)

where u is the system state, the subscripts denote partial differentiation, and N(·) prescribes the generically non-
linear evolution. The vector β represents the parameters on which the system depends. Eq. (10) also requires a set
of initial and boundary conditions on a given domain. Typically, high-fidelity simulation of this system of equations is computationally intensive, and it may be prohibitively expensive to simulate a truly multiscale system, such
as the turbulent fluid flow over a full-scale wing at all scales. Instead, it is often necessary to leverage reduced-
order models that capture dominant physical phenomena at a fraction of the cost. Of course, some systems are
time-independent and some are spatially independent. In either case, what is critical is that a proxy, physics-based
model exists that is capable of informing how a system behaves. From robots to manufacturing lines, an accurate
virtual representation holds tremendous promise for technological advancement.
The success of digital twin technology centers on accurate virtual representations of the physical world. For
precision manufacturing, for instance, current digital twin technologies do not provide the necessary level of fidelity
for a number of processes. What distinguishes digital twins from traditional modeling efforts is the integration of
multi-physics systems and components. Thus the digital twin often represents an entire engineering process versus
individual components in the process, for which we may have good models. However, the digital twin must ensure
end-to-end performance across the multi-physics system, placing stringent requirements on the fidelity of models
and how they communicate. Discrepancy models (discussed below) provide an adaptive framework, whereby the
digital twin can continuously learn updated, high-precision physics models over the course of time from its own
sensor network. The integration of data and multi-physics models across a system is a grand challenge, requiring
intelligent, robust, and adaptive architectures for learning and control. Many of the data-driven strategies discussed
here are ideally poised to help in building accurate and viable digital twin models.
part across several aircraft). In this case, the number of measurements required to estimate the full vector x may
be dramatically reduced from the ambient dimension n. These sparse sensors can be selected to best identify the
coefficients a in the basis Φr , thus enabling robust estimation of the high-dimensional state x. There are a number of
sparse sensing paradigms, including gappy sampling [106, 107], empirical interpolation methods (EIM) [108–110],
including the discrete empirical interpolation method, or DEIM, and compressed sensing [111–116].
Sparse sampling in a tailored basis [63, 117, 118] has been widely applied to problems in the imaging sciences
as well as to develop reduced-order models of complex physics, such as unsteady fluid dynamics. Thus, even if we cannot measure the full state x, it is often possible to estimate the state from r ≪ n point measurements in the observation vector y ∈ ℝ^r given by y = Cx = CΦ_r a, where C ∈ ℝ^{r×n} is the point measurement operator. Then, the reconstruction of the remaining state reduces to a least squares estimation problem for the coefficients â = (CΦ_r)^† y, where † is the Moore-Penrose pseudoinverse. This procedure was first used to reconstruct facial images from a subsampled pixel mask [106]. Importantly, it permits a drastic reduction in computation by solving for r ≪ n unknowns. The full state estimate x̂ is subsequently recovered using the feature basis: x̂ = Φ_r â. The accuracy of reconstruction depends on the structure of the basis Φ_r and the choice of observations C.
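A minimal sketch of this reconstruction pipeline follows, using a randomly generated orthonormal basis as a stand-in for a data-driven Φ_r.

```python
import numpy as np

n, r, n_samples = 500, 10, 15
rng = np.random.default_rng(0)
Phi_r, _ = np.linalg.qr(rng.standard_normal((n, r)))  # stand-in orthonormal basis
x = Phi_r @ rng.standard_normal(r)                    # true state lies in the basis

sensors = rng.choice(n, n_samples, replace=False)     # rows selected by C
y = x[sensors]                                        # point measurements y = C x

a_hat = np.linalg.pinv(Phi_r[sensors]) @ y            # a_hat = (C Phi_r)^† y
x_hat = Phi_r @ a_hat                                 # full-state estimate
print("relative error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```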
The number of observations can be greatly reduced by optimizing the sensors to maximize the accuracy of reconstruction. However, the combinatorial search over all (n choose r) possible sensor locations is computationally intractable even for moderately large n and r. There are convex relaxations that can be solved using standard optimization techniques and semidefinite programs in O(n³) runtime. We advocate a greedy matrix volume maximization scheme using the pivoted matrix QR factorization [63, 110]. This algorithm treats the selected measurements as optimal rows of the linear operator CΦ_r, designing the sampled features Cφ_i to be as orthogonal to each other as possible. Since the selected rows of the columns φ_i are no longer orthonormal, greedy row selection attempts to maintain near-orthonormality of the features Cφ_i for a numerically well-conditioned inverse.
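A sketch of this pivoted-QR selection, assuming SciPy is available, is given below; the helper name qr_sensors is ours.

```python
import numpy as np
from scipy.linalg import qr

def qr_sensors(Phi_r, p):
    """Greedy sensor placement (sketch): the pivoted QR factorization of
    Phi_r^T returns column pivots whose first p entries select near-optimal
    measurement rows of Phi_r."""
    _, _, piv = qr(Phi_r.T, pivoting=True, mode='economic')
    return piv[:p]

# Usage: sensors = qr_sensors(Phi_r, p=Phi_r.shape[1]); then measure y = x[sensors]
```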
Reduced-order models (ROMs) expand the state in a low-dimensional modal basis,

u ≈ Φ_r a(t), (11)

where a(t) ∈ ℝ^r is the time-dependent coefficient vector and Φ_r is an optimal basis of orthogonal columns, typically generated by singular value decomposition; in this case r ≪ n. We then substitute this modal expansion into Eq. (10) and leverage the orthogonality of Φ_r to yield the reduced evolution equations

da(t)/dt = Φ_r^T L Φ_r a(t) + Φ_r^T N(Φ_r a(t); β), (12)

in which the linear part of the dynamics L has been separated from the nonlinear terms N.
By solving this small system, the solution of the original high-dimensional system can be approximated. Of critical
importance is evaluating the nonlinear terms in an efficient way, for example using gappy POD or other sparse
sampling techniques. Otherwise, evaluating the nonlinear terms still requires calculations of the high-dimensional
function with the original dimension n. In certain cases, such as the quadratic nonlinearity of Navier-Stokes, the
nonlinear terms can be computed once in an off-line manner. However, parametrized systems generally require
repeated evaluation of the nonlinear terms as the basis may change with β. Regardless, ROMs allow one to approx-
imate an n-dimensional system with an r-dimensional system, where r ≪ n, making many computations possible
that would otherwise be intractable. As such, ROMs are transforming high-performance computing by allowing
for computational studies that have been intractable in the past. Recently, model discovery techniques are also
providing low-order models of complex systems [22, 24, 44, 120–122].
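The following sketch illustrates Galerkin projection for a purely linear system u_t = Au, with a random stable operator and random snapshots standing in for a real simulation.

```python
import numpy as np
from scipy.integrate import solve_ivp

n, r = 200, 5
rng = np.random.default_rng(0)
A = -np.eye(n) + 0.01 * rng.standard_normal((n, n))   # stand-in stable operator

# Basis from snapshots of a full simulation (random snapshots here for brevity)
snapshots = rng.standard_normal((n, 50))
Phi_r = np.linalg.svd(snapshots, full_matrices=False)[0][:, :r]

A_r = Phi_r.T @ A @ Phi_r                             # reduced r x r operator
a0 = Phi_r.T @ snapshots[:, 0]                        # project initial condition
sol = solve_ivp(lambda t, a: A_r @ a, (0.0, 1.0), a0) # evolve r ODEs, not n
u_approx = Phi_r @ sol.y[:, -1]                       # lift back to the full state
```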
3.4 Discrepancy modeling
First principles modeling of physical systems has led to significant technological advances across all branches of
science. For nonlinear systems, however, small modeling errors can lead to significant deviations from the true
(measured) behavior. Even in mechanical systems, where the equations are assumed to be well-known, there are
often model discrepancies corresponding to nonlinear friction, wind resistance, etc. Discovering models for these
discrepancies remains an open challenge for many complex systems. There are many reasons why model discrep-
ancies occur [123, 124]. First, there may be measurement noise and exogenous disturbances. In this case, the Kalman
filter may be thought of as a discrepancy model where the mismatch between a simplified model and observations
is assumed to be a Gaussian process [56]. Second, the parameters of the system may be inaccurately modeled. Even
worse, the structure of the model may not be correct, either because important terms are missing or erroneous terms
are present. This is known as model inadequacy or model structure mismatch. Other challenges include incomplete
measurements and latent variables, delays, and sensitive dependence on initial data in chaotic systems.
Discrepancy modeling centers on parameter and structural uncertainties. One can consider the following gov-
erning equations for a given engineering system given by Eq. (10). In general, there is also an output measurement,
y = h(u; β) (13)
from which the state may be estimated. The discrepancy modeling problem evaluates the difference between the
model output of a quantity of interest φm (t) and the observed value φo (t):
δφ(t) = φ_o(t) − φ_m(t), (14)

where δφ is the discrepancy. The goal of discrepancy modeling is then to characterize the discrepancy δφ(t), for
example, using standard Gaussian process regression [125], dynamic mode decomposition [40] for approximating
δφ(t) with a best fit linear model, and/or model discovery for generating a nonlinear dynamical system [126].
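For instance, a Gaussian process discrepancy model can be fit in a few lines with scikit-learn; the model output and observations below are synthetic stand-ins.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

t = np.linspace(0, 10, 200).reshape(-1, 1)
phi_model = np.sin(t).ravel()                  # model output phi_m(t)
phi_obs = np.sin(t).ravel() + 0.1 * t.ravel()  # observations phi_o(t) with drift

# Fit the discrepancy delta_phi(t) = phi_o(t) - phi_m(t) with a GP
gp = GaussianProcessRegressor().fit(t, phi_obs - phi_model)
delta_hat, delta_std = gp.predict(t, return_std=True)
corrected = phi_model + delta_hat              # discrepancy-corrected prediction
```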
Uncertainty quantification (UQ) is often framed in terms of the Bayesian posterior over model parameters,

p(β | Y) ∝ p(Y | Z(β)) p(β), (15)

where Z(β) denotes the predictions provided by the candidate models, p(Y | Z) is the observation likelihood given
by the observation model, and p(β) is the prior on the model coefficients. A result of such UQ metrics is to have
performance bounds and guarantees so that reduction to practice can be assessed and achieved.
Figure 5: Robotics and self-driving cars, both broadly under the aegis of autonomy, are two leading data-intensive
fields of the modern era. (a) Robotics is largely physics-informed, with strong adherence to governing physical
laws that enable robust control with data from high-fidelity sensors on the robot. (b) In contrast, self-driving cars
are largely physics-free, powered instead by computer vision algorithms that learn to characterize the environment and its rules. (c) The future of aviation will require both key technologies.
Both paradigms for leveraging data, physics-informed and physics-free, have experienced tremendous success in the past decade.
They have also encountered fundamental limitations which must be overcome for the technologies to become com-
mercially viable. Self-driving cars have been empowered by deep learning algorithms, which are known to have
significant shortcomings in extrapolation tasks. Thus, some of the more spectacular failings of self-driving cars have
come from situations which were not part of their training sets. In contrast, our idealized physics-based models for
complex and networked systems, such as robots, often fail to properly account for discrepancies between models
and data, leading to control issues. The aerospace industry ultimately will need to leverage both paradigms in order
to fully exploit their large-scale data sets.
A traditional feedback system involves guidance (what state do we want a system to have), navigation (what
is the current state of the system), and control (how will we affect the system to achieve the desired state), shown in
Fig. 6. Key elements of autonomy and control that must be considered when leveraging of data science include:
• Performance criteria. Performance criteria for certification are generally specified in either the frequency domain or the time domain. In the time domain, the primary consideration is the transient response to a disturbance, characterized by settling time, rise time, and overshoot. In the frequency domain, steady-state stability relative to harmonic forcing is considered (bandwidth, stability to loop closure, gain margin, phase margin); a minimal margin computation is sketched after this list.
• Safety criticality. Certification generally requires demonstration of a system meeting design criteria to a
certain probability of failure (e.g., “five nines” (1 × 10⁻⁵), “seven nines” (1 × 10⁻⁷), or “nine nines” (1 × 10⁻⁹)).
• Human-in/on-the-loop. Effective control design must account for the interaction of the physical system with
human pilots and passengers. For example, in pilot-induced oscillation, the reaction time of a human pilot
relative to system dynamics can result in the pilot’s actions causing system oscillation to the point of instability.
Modeling human interaction with a physical system is exceptionally complex.
• Variable autonomy. When humans and autonomy interact, the relative amount of human authority versus
autonomous authority must be selected. Typically systems are neither fully human operated nor fully au-
tonomous. The greatest challenges appear when the level of autonomy changes during active operation.
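As a minimal example of the frequency-domain criteria above, the python-control package (assumed available) can compute gain and phase margins for a given open-loop transfer function; the transfer function below is a stand-in for illustration.

```python
import control  # the python-control package

# Stand-in open-loop transfer function G(s) = 1 / (s^3 + 2 s^2 + 3 s)
G = control.tf([1.0], [1.0, 2.0, 3.0, 0.0])

# Gain margin, phase margin, and the associated crossover frequencies
gm, pm, wcg, wcp = control.margin(G)
print(f"gain margin: {gm:.2f}, phase margin: {pm:.1f} deg")
```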
Figure 6: Schematic of a traditional feedback loop: a controller acts on the error e_k between the desired state x_{d,k} and the estimated state x̂_k, issuing actions u_k to the system dynamics/model (cockpit, power systems, flight controls, environment, sensor systems), which are subject to disturbances w_k and sensor noise v_k; an estimator/filter reconstructs the state from the measurements Y_k = {y_0, . . . , y_k}.
Figure 7: Model-based engineering design diamond.
forming, requires deep knowledge of the physical processes that govern the final structure [136]. Physics-informed
machine learning may contribute to elevating first-principles-based models to the level of accuracy and efficiency
required for an MDO study.
5 Aerospace Manufacturing
Manufacturing is a highly complex and dynamic process, involving the coordination and merging of several elab-
orate and precisely timed stages. In a modern manufacturing environment, tremendous volumes of data are being
generated, stored, and analyzed to improve process quality, reliability, and efficiency. The data generated is inher-
ently multi-modal, including hand-written reports from humans working alongside automation, high-fidelity data
from metrology equipment, video feeds, supply chain logs, part catalogues, and results from in-process inspection,
to name a few. There are several key opportunities to leverage machine learning and other data-intensive optimiza-
tion techniques to improve manufacturing processes. Several areas of high-priority include: part standardization;
automation and robotics; streamlined assembly, including reduced measurements, processing, and inspection, to-
wards just-in-time manufacturing; supply chain management; material design and fabrication; and non-destructive
inspection. In this section, we will explore several of these areas.
5.2 Standardization
The standardization of aircraft designs, manufacturing processes, parts and machines is an open industrial chal-
lenge. New designs are often airplane specific iterations that start the design process from scratch and ultimately
offer no significant differences over prior versions. However, current manufacturing paradigms do not exploit or
streamline designs to reuse and retool existing versions, resulting in inefficiencies and stresses on supply chains,
maintenance and the critical path of assembly. In aircraft design, some simple examples are brackets and fasteners.
Brackets support wiring, hydraulics, systems, and fasteners for joining many types of structures and components. If
data from prior designs can be extracted or harvested, then detailed comparisons can be performed and similarities
can be identified resulting in a reduction of unique parts that are airplane specific.
Data mining and machine learning are poised to revolutionize industry paradigms for standardization. Cur-
rently, modern digital tools for designing, inventory and quality control produce a glut of data that often sits un-
used. This data can be mined for dominant features to inform future decisions and streamline the design process.
To discover relevant, redundant patterns, feature selection criteria can be tailored to extract the desired topological,
spatiotemporal, physical, and material properties. Recent strides have been made in interpretable machine learning
models that directly incorporate sparsity-promoting, physical, temporal and topological constraints into optimiza-
tion objectives. The key features can be used to identify standard designs and reduce the number of unique designs
or parts so that they can be shared across future designs and applications.
5.3 Automation
The aircraft industry has driven the development of automated machines to fabricate detailed parts, including
machining metals, composites fabrication, assembly of structure, painting, and inspection. These systems improve
quality, reduce production cycle times and cost, and reduce repetitive injury to mechanics; however, they are expen-
sive to develop, implement, and maintain. Due to the unique requirements of aircraft designs, materials, and tight
tolerances, the decision to automate is complex and driven by specific manufacturing processes and applications.
The development of successful automated systems starts with a focus on automating the process and understanding
requirements. Thus, controlling the process is key to success. The tight tolerances of automation require monitoring
and process control to ensure consistency, which in turn employs many sensors. Sensor feedback can provide in-
sight into where processes are stable and where variation occurs. Sensor data also indicate the level of control needed in specific areas, process repeatability, what and where to measure, and a host of other outcomes.
When control is present, algorithms for machine learning control can be used to disambiguate the relevant
process dynamics from control inputs. The tools of machine learning can also be used off-the-shelf on sensor data,
not only to identify repetitive patterns and faults, but also pinpoint potential failures. Such failures can be predicted
by studying the extent to which data deviate from dominant features, or by detecting outliers using robust feature
extraction techniques on raw sensor data. The goal of robust feature extraction is to efficiently learn features and
outliers simultaneously. One example is robust principal component analysis, which has previously demonstrated
success in predictive aircraft shimming applications [138] and is discussed in detail in Section 8.
5.4 Assembly
Aircraft are manufactured in many pieces, including fuselage barrel sections, wings, stabilizers, and fasteners, that
are joined as sections or sub-assemblies and ultimately get integrated into the final product. Wings are built from
sub-assemblies of fabricated wing skins, spars, flaps and ailerons; likewise, fuselages consist of multiple barrel
sections that are assembled separately and then joined. Wings are joined to fuselages in a manufacturing position
called “wing-to-body join.” The alignment, positioning and joining of these assemblies require high precision tool-
ing and part positioning systems. In some cases, automated positioning tooling controlled by metrology systems
are used to align and fit structure. In addition, even though large sections are made to very tight tolerances, the
combined tolerances result in very small gaps that must be shimmed to meet engineering requirements. Parts are
measured with metrology equipment, 3D models are generated to determine gaps between the mating parts, and
automated machines are used to fabricate the shims that fill the gaps to ensure proper fit.
Throughout the process, enormous quantities of data are generated. Many sensors are used to provide feedback
to control the automation, measure parts, and validate quality. The use of these data are expanding from building
and assembling specific parts to being applied to predict future builds via machine learning and data analytics.
Structural features mined from historical build data can be used to predict gaps in future builds, using only a tar-
geted subset of measurements to infer the high fidelity structure. This sparse sensing methodology was successfully
applied for the predictive shimming of new wing-to-body join builds (Section 8). Not only does sparse sensing
bypass the heavy planning and processing required of high fidelity metrology, but it also identifies a set of key
structural features that may be analyzed for diagnostics and defects.
5.5 Materials
Adoption of new materials is a major challenge in the aerospace industry, due to the amount of structural testing
done to certify a material. The standard approach is the building block shown in Figure 8. This approach starts with a large number of tests at the coupon level, a much smaller number of tests at the subcomponent scale, a handful of tests of subassemblies, and one or two full-scale test articles. Moreover, additional validation is needed
when the manufacturing process or process parameters change, because they can introduce different features in
the part. There are two areas where machine learning can have significant impact: physical testing and materials
characterization.
The use of data analytics and machine learning, in addition to physics-based models, can result in a significant
reduction in testing. First, we might be able to correlate behavior at larger scales to a small number of fundamental
material properties by analyzing data on existing materials. That could already help filter out the most promis-
ing material candidates in the material screening and selection phase, without wasting time and money on extra
characterization testing. In subsequent phases the sheer number of tests for new materials could be reduced by
learning information from existing material systems. Secondly, in the absence of physical tests on new material
parameters, physics-guided ML may be applied to process and structural simulations to correlate parameters to
performance, thus enabling highly efficient exploration of material parameters. ML is also being used to design
new materials [15], including superalloys [139–142].
More recently there has been an increased focus on material characterization for fiber-reinforced composites
process modeling to reduce the amount of physical trial and error. Processing often happens when the material is
Figure 8: Schematic materials building block diagram.
uncured and at elevated temperature, requiring a different set of tests than is needed for the structural characteri-
zation. For example, numerical models used to predict the quality of thermoplastic composite laminates [143, 144]
require information on shear and bending behavior of the laminate, friction between layers (which changes when
their relative orientation changes), and friction between the composite and any contacting tooling material (which in turn could have a surface coating). Most of these properties change with temperature, and some also depend
on the forming speed and pressure. Here, sparse regression techniques in ML can help discover fundamental laws
and relationships, resulting in significant reductions in the number of tests to be conducted, as well as the time and
costs needed to introduce new materials.
Figure 9: Example of Hat-Shaped Stringers on a 787 Fuselage [148].
Therefore every part is inspected and structural integrity is verified for every occurrence, and if necessary, repairs
are made to recover any loss in performance. Data analytics and machine learning have the potential to improve
composite forming processes in three critical areas: (i) support the development of physics-based models by iden-
tifying features that are dominant in the formation of wrinkles, (ii) automate the detection and characterization
of wrinkles in inspection data, and (iii) improve part quality by using vision-based systems to track humans and
determine patterns in their actions that result in better quality parts, which could lead to better work instructions
or the development of an automated system that mimics the best manual process.
sampling rates varying from less than 1 Hz to upwards of 65 kHz. Some measurement data is only collected when
triggered or when a change is detected. All measurement data is stored for each flight test, generating gigabytes
of data. Much of this data must be synthesized across sensor modalities and in time, both within a single flight
test and across multiple tests. Similarly, high-rate asynchronous lab test data is collected in unique demanding en-
vironmental conditions, such as extreme vibration, icing conditions, sandblasting, extreme temperature, and high
structural demands. These environments impose unique requirements on the sensors and can result in the sensors
falling out of calibration.
The complexity of a typical commercial aircraft must also be considered. There are approximately 2.3 million
parts on the Boeing 787 with 70 miles of wire and 18 million lines of source code in the avionics and flight control
systems alone. Additionally, there are ten major systems interacting together in a complex, dynamic and rapidly
changing environment. As for constraints, critical system components must have a failure rate of less than 10⁻⁹ or 10⁻¹², depending on the system, and backup components must seamlessly step in when required. Flight testing is
the final validation step for every design and performance requirement. Certification is achieved when each compo-
nent, subsystem, and integrated system process can be shown to have a proven level of performance, repeatability,
and accuracy. Because aircraft are designed to have near zero tolerance for failures, these failures are intentionally
triggered during testing to evaluate the resulting effects on function and performance. The rarity of failures in the
high volume of flight test data presents an additional challenge to machine learning algorithms, which require a
sufficient number of examples to build models for failure.
6.1 Digitization
There is a significant opportunity to improve and automate verification and validation through data digitization,
which will enable a wealth of downstream data analytics for pattern extraction and anomaly detection. To un-
derstand the benefits of a digitized workflow, it is helpful to understand the current testing landscape. Existing
operations are riddled with mundane manual touch points, requiring significant process, time, and resources to
validate that there are no quality lapses. Current workflows rely on experienced engineers to synthesize upwards
of 20 data sources to search for information to substantiate a report or troubleshoot a problem. Systematic efforts to
leverage data across flights and programs and to perform exploratory analyses are virtually non-existent.
A digitized flight test workflow that leverages machine learning would enable the automation of tasks that are
currently laborious, reactive, last minute, discrete, and manual. The result would be a flow of information that is
effortlessly available to all test stakeholders and participants from the preliminary concept to the final test report.
Collection and aggregation of process metadata will enable visualization of core components, such as the test plan,
test approvals, aircraft configuration, instrumentation configuration, and conformity status, with each component
being updated as information becomes available. This centralized, real-time repository will provide engineers with
the time and resources to process critical information, collaborate with peers, and engage with the test data as it is
generated.
There are several significant challenges that face efforts to digitize verification and validation workflows. Dig-
itizing the workflow cannot compromise the integrity of the information or its ease of access. Any changes must
improve upon the users’ ability to confidently assess the quality, reliability, and traceability of the data. However,
the benefits of making operations more efficient by leveraging machine learning and uncertainty quantification
make digitization worth the effort. It is expected that digitization will result in sweeping benefits across the produc-
tion cycle and in-service life cycles.
small percentage of all data was ever stored and analyzed on any given flight. Further, if the recorder was not in a “data on” state during a critical event in flight, either planned or unplanned, that data was lost without being recorded or analyzed.
Engineers used paper strip charts, similar to an EKG printout, for real-time plotting. If the strip chart was
not set to plot the critical parameter during the test, there was no way to retrieve that data in flight. Cross plots were
often created by hand with paper and pencil. Computers had limited computing capability and often only crucial
calculations were computed in real-time. More intensive computing was performed by a mainframe computer after
the test.
Technological innovations in computing and sensors have led to significant improvements in flight test data
collection, storage, and visualization. Present day testing requirements are still largely developed in discipline-
specific silos. Domain expertise has helped identify areas where tests can be modified to be more compatible or
concurrent with other tests. Additionally, computational tools have been developed to aid human management
and optimization of the flow of the test program. However, data science methods have yet to be used at significant
scale to improve test optimization. Much of the improvement in present day testing has been the result of improved
computing hardware. Storage has improved so all data parameters can be recorded and stored for the duration of an
entire flight. Data analysis has moved from mainframe computers to laptops. There have been some improvements
in algorithms and calculation methods especially in the areas of data fitting and filtering. Data visualization has
seen the greatest improvements: paper has been replaced with electronic strip charts and plots, and the use of colors
and data triggers has allowed more data to be monitored and analyzed, with less cognitive overload. Although
computational and visualization improvements have been significant, performance and evaluation metrics and
algorithms have remained largely unchanged. Most data is still analyzed on a condition-by-condition basis, leaving
large amounts of flight data unanalyzed.
Future testing will be heavily informed by the models which make up the digital twin. Safety-critical test
conditions, where margins are thin, will be identified from models and simulations. Additional test conditions
will be determined by a principled and data-informed selection, where physical tests will increase confidence in
the model. Active learning [149, 150] may be used to streamline this testing, reducing the number of physical
queries required to validate a model. Effective test planning will involve a multi-dimensional and multi-objective
optimization to efficiently achieve convergence of the coupled, multi-scale and multi-physics models. Test plans
will need to be flexible enough to support model exploration and refinement when the model diverges from the
physical world; i.e., testing will need to inform a discrepancy model for the system. The goal of flight test will evolve
from completing as many test conditions as possible to increasing certainty in the multitude of physical, functional,
and logical models that make up the digital twin. Such future testing will require a more seamless synchronization
of the digital twin with the physical entity it represents, providing behavioral or functional predictions that are
then informed by measurement data. Robust and computationally tractable reduced-order modeling approaches,
along with data assimilation and uncertainty quantification, will be required to validate and explore models on the
physical asset in real time. Data from these models will then be used to update the digital twins.
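To make the active learning idea above concrete, the following is a minimal sketch of uncertainty-driven selection of the next test condition, assuming a Gaussian process surrogate over a two-dimensional (Mach, altitude) envelope; the candidate grid and the synthetic response standing in for a measured quantity are hypothetical placeholders, not flight data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
# Candidate test conditions: (Mach, altitude in ft); a hypothetical envelope
candidates = rng.uniform([0.2, 0.0], [0.9, 40000.0], size=(500, 2))

def measure(c):
    """Placeholder for physically flying test condition c and logging a response."""
    return np.sin(3.0 * c[0]) + 1e-5 * c[1]

# Seed the surrogate with a handful of completed test conditions
X = candidates[:5].copy()
y = np.array([measure(c) for c in X])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=[0.2, 5000.0]), normalize_y=True)
for _ in range(10):                                # each iteration is one physical test
    gp.fit(X, y)
    _, std = gp.predict(candidates, return_std=True)
    c_next = candidates[int(np.argmax(std))]       # query where the model is least certain
    X = np.vstack([X, c_next])
    y = np.append(y, measure(c_next))
```

In practice, the acquisition criterion would weight safety margins and test cost alongside predictive variance, rather than uncertainty alone.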
Flight test will likely continue to require discrete test conditions. However, an increased emphasis should be
placed on analyzing data beyond what is required for the test condition. This holistic approach differs significantly
from the past practice of capturing only the “data on” condition data, instead making 100% of the data available
for analysis. Modern techniques in machine learning will facilitate this analysis of high-dimensional, multi-modal
data, potentially extracting patterns and correlations that would have been inaccessible with previous approaches.
For this model and data revolution to be successful, additional data from disparate sources will be required beyond
what is needed to validate the digital twin. For example, metadata will link these data, including information about
which parts were installed (configuration), which airport(s) were involved, and the qualitative characteristics of the
aircraft. Combining this metadata with the digital twin will be crucial to the flight test of the future.
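As a small illustration of such linking, the sketch below joins continuous telemetry records to per-flight configuration metadata on a shared key using pandas; the table names, columns, and values are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical continuous telemetry and per-flight metadata tables
telemetry = pd.DataFrame({
    "flight_id": [101, 101, 102],
    "time_s": [0.0, 0.1, 0.0],
    "wing_strain": [0.12, 0.13, 0.09],
})
metadata = pd.DataFrame({
    "flight_id": [101, 102],
    "configuration": ["flaps-30", "flaps-15"],
    "airport": ["BFI", "MWH"],
})

# Link every raw measurement to its configuration and airport context
linked = telemetry.merge(metadata, on="flight_id", how="left")
```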
Figure 10: Schematic overview of digital verification and validation processes, integrated with digital twin technol-
ogy and optimized via machine learning.
Improved sensor fusion and filtering algorithms, based on physics-informed machine learning models of
the system and its components, will enable real-time diagnoses with confidence intervals. Sparse sensor optimiza-
tion, or identifying which sensors to listen to for a given task, will also reduce the computational burden, enabling
real-time decisions and updates to the flight plan, instead of post-mortem analysis.
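As a simple illustration of the filtering building block, the following is a minimal sketch of one predict/update cycle of a linear Kalman filter, whose state covariance supplies the kind of confidence interval described above; it is a generic textbook filter, not an aircraft-specific model, and all matrices and measurements are hypothetical.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    x_pred = F @ x                       # state prediction
    P_pred = F @ P @ F.T + Q             # covariance prediction
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new                  # diag(P_new) yields confidence bounds

# Fuse two noisy sensors observing one slowly drifting quantity (synthetic values)
F = np.eye(1)
H = np.array([[1.0], [1.0]])
Q = np.array([[1e-4]])
R = np.diag([0.04, 0.09])
x, P = np.zeros(1), np.eye(1)
for z in np.array([[0.9, 1.1], [1.0, 1.2], [1.1, 0.95]]):
    x, P = kalman_step(x, P, z, F, H, Q, R)
```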
7 Aerospace Services
In-service operations comprise the support provided to customers, or by the customer operations team, to sustain
an operational fleet. From initial delivery to long-term maintenance, there exist significant opportunities for
leveraging data to improve and optimize maintenance and airside support. Major sources of customer expense
include aircraft damage, injury, delays in airside support, and unscheduled maintenance. The positive business
impacts of autonomous airside operations and of avoiding unscheduled maintenance can be realized with emerging
sensor and algorithmic innovations.
Figure 11: Illustration of many potentially data-intensive aspects of ground service.
8 Case Study: Predictive Assembly and Shimming
Aircraft are built to exceedingly high tolerances, with components sourced from around the globe. Even when parts
are manufactured to specification, there may be significant gaps between structural components upon assembly due
to manufacturing tolerances adding up across large structures. One of the most time-consuming and expensive ef-
forts in part-to-part assembly is the shimming required to bring an aircraft into the engineering nominal shape. A
modern aircraft may require on the order of thousands of custom shims to fill gaps between structural components
in the airframe. These shims, whether liquid or solid, are necessary to eliminate gaps, maintain structural perfor-
mance, and minimize pull-down forces required to bring the aircraft into engineering nominal configuration for
peak aerodynamic efficiency.
Historically, parts have been dry-fit, gaps measured manually, and custom shims manufactured and inserted,
often involving disassembly and reassembly. Recent advancements in 3-D scanning have enabled surface
measurement prior to assembly, an approach known as predictive shimming [151–163]. Gap filling is a time-consuming
process, involving either expensive by-hand inspection or computations on vast measurement data from increas-
ingly sophisticated metrology equipment. In either case, this amounts to significant delays in production, with
much of the time being spent in the critical path of the aircraft assembly.
In this case study, we present a recent strategy for predictive shimming [138], based on machine learning and
sparse sensing to learn gap distributions from historical data and then design optimized sparse sensing strategies
to streamline the collection and processing of data; see Fig. 12. This new approach is based on the assumption that
patterns exist in shim distributions across aircraft, and that these patterns may be mined from historical produc-
tion data and used to reduce the burden of data collection and processing in future aircraft. Specifically, robust
principal component analysis (RPCA) [96] from Sec. 2.3.2 is used to extract low-dimensional patterns in the gap
measurements while rejecting outliers. RPCA is based on the computationally efficient singular value decomposi-
tion (SVD) [9, 164], and yields the most correlated spatial structures in the aircraft measurements, identifying areas
of high variance across different aircraft. Next, optimized sparse sensors [63, 118, 165] are obtained that are most
informative about the dimensions of a new aircraft in these low-dimensional principal components. The success
of the proposed approach, known within Boeing as PIXel Identification Despite Uncertainty in Sensor Technology
(PIXI-DUST), is demonstrated on historical production data from 54 representative Boeing commercial aircraft. This
algorithm successfully predicts 99% of the shim gaps within the desired measurement tolerance using around 3%
of the laser scan points that are typically required; all results are rigorously cross-validated.
Figure 12: Cartoon illustrating recent progress in predictive shimming in the last decade.
This approach to predictive shimming combines robust dimensionality reduction and sparse sensor optimiza-
tion algorithms to dramatically reduce the number of measurements required to shim a modern aircraft. In partic-
ular, RPCA from Sec. 2.3.2 is used to extract coherent patterns from historical aircraft production data. Thus, RPCA
is used to develop low-dimensional representations for high-dimensional aircraft metrology data (e.g., laser scans
or point cloud measurements). Shim scan data is collected across multiple aircraft, either leveraging historical data,
or collecting data in a streaming fashion as each new aircraft is assembled. The shim measurements for a region
of interest are flattened into column vectors $\mathbf{x}_k \in \mathbb{R}^n$, where $n$ corresponds to the number of measurements and $k$
refers to the aircraft line number. These vectors are then stacked as columns of a matrix $\mathbf{X} = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_m]$,
where the total number of aircraft $m$ is assumed to be significantly smaller than the number of measurements per
aircraft, i.e., $m \ll n$.
Because of tight manufacturing tolerances and a high degree of reproducibility from part to part, it is assumed
that the matrix X possesses low-rank structure. As described above, there may be sparse outliers that will corrupt
these coherent features, motivating the use of RPCA to extract the dominant features. Next, sparse sensors are
identified which maximally inform the patterns in future aircraft using sparse optimization techniques from Sec. 3.2.
The goal is to identify a small number of key locations that, if measured, will enable the shim geometry to be
predicted at all other locations; this is possible because of the low-dimensional structure extracted through RPCA.
Measuring at only these few locations will dramatically reduce measurement and computational times, improving
efficiency.
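The following is a minimal sketch of the RPCA step, solving principal component pursuit with the inexact augmented-Lagrangian iteration associated with [96]; it is a generic implementation rather than the production PIXI-DUST code, and X_train is a hypothetical gap-measurement matrix.

```python
import numpy as np

def rpca(X, tol=1e-7, max_iter=500):
    """Robust PCA by principal component pursuit: X = L (low rank) + S (sparse)."""
    n1, n2 = X.shape
    mu = n1 * n2 / (4 * np.abs(X).sum())      # standard step-size heuristic
    lam = 1.0 / np.sqrt(max(n1, n2))          # sparsity weight from [96]
    S = np.zeros_like(X)
    Y = np.zeros_like(X)
    for _ in range(max_iter):
        # Singular-value thresholding yields the low-rank component
        U, s, Vt = np.linalg.svd(X - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # Soft thresholding (shrinkage) yields the sparse outlier component
        T = X - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        Y += mu * (X - L - S)
        if np.linalg.norm(X - L - S) <= tol * max(np.linalg.norm(X), 1.0):
            break
    return L, S

# X_train: n measurements x m aircraft (hypothetical array of gap scans)
# L, S = rpca(X_train)   # L holds coherent gap patterns; S holds outliers
```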
We demonstrate the PIXI-DUST architecture on production data from a challenging part-to-part assembly on
a Boeing aircraft. This data set consists of 10076 laser gap measurements of the part assembly for 54 different
production instances of the same aircraft type. Measurement locations are aligned between the instances, making
the data amenable to SVD. We build a low-order model of the shim distribution using RPCA and then design
optimized measurement locations based on these data-driven features. We train the model on 53 aircraft and then
validate on the remaining aircraft; this process is repeated for all 54 possible training/test partitions to cross-validate
the results. Thus, a data matrix $\mathbf{X} \in \mathbb{R}^{10076 \times 53}$ of training data is constructed, in which each column contains all of
the shim gaps for one aircraft, and each row contains the measured gap values at one specific location for all aircraft
in the training set.
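A minimal sketch of the sensor selection and leave-one-out validation loop is given below, substituting a plain SVD for RPCA for brevity and using QR column pivoting for sensor placement in the spirit of [63]; the array X and rank r are hypothetical placeholders.

```python
import numpy as np
from scipy.linalg import qr

def predict_heldout(X_train, x_new, r=25):
    """Learn gap patterns from X_train (n x m), then predict a held-out aircraft
    from measurements at only r optimized locations."""
    U = np.linalg.svd(X_train, full_matrices=False)[0][:, :r]  # dominant gap patterns
    _, _, piv = qr(U.T, pivoting=True)          # QR column pivoting ranks locations
    sensors = piv[:r]                           # optimized measurement locations
    a = np.linalg.lstsq(U[sensors, :], x_new[sensors], rcond=None)[0]
    return U @ a, sensors                       # predicted gaps at all n locations

# Leave-one-out cross-validation over the m aircraft in a hypothetical n x m array X
# for k in range(X.shape[1]):
#     x_hat, sensors = predict_heldout(np.delete(X, k, axis=1), X[:, k])
#     print(k, np.max(np.abs(x_hat - X[:, k])))
```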
Figure 13 displays the seven separately manufactured shim segments, as well as the sensor ensembles for each
shim; the error distributions are given in Fig. 14. Prediction results are shown in Table 1. Prediction accuracy is
vastly improved, and 97–99% of the shim gap locations are predicted to within the desired machining tolerance.
Furthermore, we note that the rates of optimal measurements r vary from anywhere between 2% to 6% of all points
within the shim, which indicates that some shims are higher-dimensional and require more features (hence, sensors)
to be fully characterized. This is also reflected in the sensor ensembles in Fig. 13.
This case study demonstrates the ability of data-driven sensor optimization to dramatically reduce the number
of measurements required to accurately predict shim gaps in aircraft assembly. Sparse sensor optimization was
combined with robust feature extraction and applied to historical Boeing production data from a representative
aircraft. With around 3% of the original laser scan measurements, this learning algorithm is able to predict the
vast majority of gap values to within the desired measurement tolerance, resulting in accurate shim prediction.
These optimized measurements exhibit excellent cross-validated performance and may inform targeted, localized
laser scans in the future. Reducing the burden of data acquisition and downstream computations has tremendous
potential to improve the efficiency of predictive shimming applications, which are ubiquitous and often take place
in the critical path of aircraft assembly. Thus, streamlining this process may result in billions of dollars of savings.

Figure 13: Gap measurements. High-fidelity gap measurement locations segmented into seven canonical shim
regions (left), and ensembles of measurement locations (sensors) selected by our method (right). Sensors are sized
by the number of times they are selected across the 53 different gap-measurement training sets; they are
concentrated around edges and corners, where we expect more systematic gap variation. Reproduced with
permission from [138].

Shim No.                  1      2      3      4      5      6      7
Percent accurate      97.90  98.05  99.82  99.94  99.99  99.03  99.97
Optimal sensors (avg)    26     26     25     26     25     26     25
Total points           1003   1116    453    692    709    768    664

Table 1: Segmented prediction results show vastly improved prediction accuracies, with 97–99% of gaps predicted
to within the desired 0.005 inch measurement tolerance.

Figure 14: Absolute error distribution for optimal (blue) vs. random (orange) sensors. Shown are histograms of
pointwise absolute error across all validation tests, using reconstruction from optimal sensors and from an equal
number of random sensors (randomly selected from the available measurement locations shown in Figure 13). The
red line represents the desired measurement tolerance, and the legend indicates the percentage of points that fall
within this error tolerance. The histograms indicate that optimal sensors predict nearly twice as well as random
sensors. Reproduced with permission from [138].
9 Case Study: Digital Twin and the V-22

Figure 15: V-22 Osprey. Image reproduced from https://en.wikipedia.org/wiki/File:V22-Osprey.jpg.
A 2001 review panel on the V-22 recommended the program “Extend high-rate-of-descent testing, formation
flying (and other deferred flight tests as appropriate) to sufficiently define and understand the high-risk portion
of the flight envelope under all appropriate flight conditions.” [166] The panel went further and recommended
that the results of the high risk, high rate-of-descent testing be used to update operating limitations as well as the
flight simulator used for crew training. The call for additional high rate-of-descent testing came as a result of a
fatal accident in April 2000 involving both high-rate-of-descent and formation flying. The cause of the accident
was attributed to an aerodynamic condition known as a vortex ring state (VRS), which is an unsteady aerodynamic
condition that occurs for rotorcraft operating at low forward speed and high rate-of-descent. In this condition,
the rotorcraft does not fly away from the rotor wake, resulting in highly unsteady airflow. For a helicopter, this
often results in a high rate-of-descent, and for a tiltrotor the result is typically roll-off. Both situations are perilous,
especially in close proximity to the ground. VRS remains a significant area of research.
The goal of a digital twin is to bridge the gap between the physical world and the virtual world. Two aspects
of a digital twin that differentiate it from traditional modeling are: 1) continuously learning and updating internal
models with real-world data, and 2) integration of data from one or more multi-physics models across a system. For
a digital twin to continuously learn, it needs to be tested, updated, and re-tested. In the case of the V-22, a digital
twin may begin with a model of VRS such as the one presented in [166], which is merged with the model and flight
test data for the specific aircraft. Discrepancy modeling can be used as a data-driven method to explain and update
model divergence. On at least two occasions, V-22 pilots experienced significant uncommanded roll events during
formation flying [167]. As a result of these events, and the aforementioned accident, the V-22 program undertook an
extensive formation-flying and high-rate-of-descent flight test campaign. Combining a VRS model with formation
flying models would be an ideal opportunity to integrate multi-physics models across a system. The goal of
integration as data becomes available is to work toward model convergence, or the bridging of the physical and
virtual such that the digital model accurately represents the physical environment. The integration of unsteady,
nonlinear, multi-physics models is no easy task. Implementing a real-time reduced-order model may help discover
and explore areas of significant model discrepancy in flight test. The use of physics-informed learning and custom
regularization in the machine learning optimizations may aid in maintaining physical constraints as the two models
are merged to form a digital twin. With that enhanced model in hand, there is benefit in learning from a high number
of simulations. Flight testing could be reduced to the most critical test conditions: safety, mission-critical, and model
confirmation.
Certainly, this simple case study is not without conjecture. However, the tools presented in this paper will form the
basis for data-driven hypotheses that serve as the roadmap for future aerospace engineering possibilities.
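As a simplified illustration of discrepancy modeling, the sketch below fits a Gaussian process to the residual between a model's predictions and measurements at matched conditions, so that the corrected prediction carries an uncertainty estimate; the synthetic model, data, and condition variables are hypothetical placeholders, not V-22 data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
# Hypothetical test conditions, e.g., (normalized descent rate, forward speed)
conditions = rng.uniform(0.0, 1.0, size=(40, 2))

def model_prediction(c):
    """Placeholder physics model evaluated at conditions c."""
    return np.sin(c @ np.array([3.0, 1.0]))

# Synthetic "flight data" that diverges from the model at high descent rates
measurements = (model_prediction(conditions) + 0.3 * conditions[:, 0] ** 2
                + 0.02 * rng.standard_normal(len(conditions)))

# Learn the discrepancy delta(c) = measurement - model as a function of condition
gp = GaussianProcessRegressor(kernel=RBF(0.3) + WhiteKernel(1e-3), normalize_y=True)
gp.fit(conditions, measurements - model_prediction(conditions))

# Corrected prediction at a new condition, with uncertainty to guide test planning
c_new = np.array([[0.8, 0.4]])
delta, std = gp.predict(c_new, return_std=True)
corrected = model_prediction(c_new) + delta
```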
10 Case Study: Urban Air Mobility
Urban air mobility is an emerging transportation framework with the goal of providing on-demand, personal
point-to-point transportation within and between the obstacle-rich environments of urban domains by leveraging
the airspace above roads and between and above buildings and other physical infrastructure. Urban air mobility
has great promise for alleviating transportation congestion, decreasing travel times, decreasing levels of pollution
through the use of next-generation electric aircraft, and producing technology to benefit a wide range of industries;
see Fig. 16. The purpose of this case study is to demonstrate how the component subsystems of urban air mobility
have benefited, and can continue to benefit, from data science.
In order to achieve the promise of point-to-point, on-demand personal transportation, a number of key tech-
nological challenges must be addressed. These challenges include, but are not limited to: vehicle design, testing,
and certification; vehicle fleet health monitoring; vertical garages (vertiport) design and construction for passenger
on- and off-boarding; minor maintenance such as battery charging, fueling, cleaning and major maintenance and
repair; logistical scheduling systems, resembling land-based ride-shares, taxis, trains, and buses; air traffic rout-
ing, requiring next generation air traffic control; integration of autonomy for guidance, navigation, and control;
and autonomous deconfliction, such as automatic dependent surveillance-broadcast (ADS-B). Nearly all of these
technologies will rely on improved sensor networks, robust data communication and processing, and advanced
machine learning algorithms. As discussed next, these system elements are at varying levels of maturity and tech-
nology readiness.
A large number of companies, both established and startup, have been exploring the development of vehicles
suitable for dense and safe personal transportation in urban settings using electric propulsion. Traditional fixed
wing aircraft require long runways that are spatially infeasible in cities, while helicopter designs are unable to scale
to large numbers because of maneuverability and safety issues stemming from the lack of unpowered gliding ca-
pability. Innovative designs have been proposed over the past decade to achieve safe, dense operation with high
maneuverability and green propulsion. The design of such vehicles requires the perspective of multi-
disciplinary optimization with regard to aero-servo-elasticity considerations. Specifically, there is a tight physical
coupling between the unsteady aerodynamics over the flexible airframe, which is further influenced by the propul-
sion and control surfaces. Thus, these components cannot be decoupled and designed independently, making the
design space quite rich and challenging.
Figure 16: The urban airspace of the future will involve multiple platforms operating on a variety of spatial and temporal scales.
Historically, successful innovation of viable airframes has required years of extensive experience. The advent of
modern computational capabilities will enable faster exploration of the design space through computational fluid
dynamics (CFD) and finite element analysis (FEA) tools to simulate vehicle performance in a range of flight and
loading conditions. These tools, and the design space over which they are leveraged, are inherently data rich.
While CFD and FEA are effectively already data science tools, the potential exists for more extensive analytics to
intelligently reduce the vast design space to a more manageable level. An example of how such design tools can
impact outcomes can be seen in the multirotor design tool [168] that produces unexpected configurations for vehi-
cles with multiple sets of rotating blades like a helicopter. While those designs are not likely options for passenger
vehicles, the current innovation environment coupled with both the traditional design experience and modern data
science tools has led to a rich array of proposed vehicles for urban operation. Options under consideration include
multirotors, short-runway and vertical takeoff and landing configurations, and various hybrid rotor/propulsion designs. At this time,
no single vehicle design is a clear frontrunner, and as with traditional fixed wing aircraft, the most likely outcome
will be a series of vehicle designs from companies that will share the market and the urban air space.
As with long-range air transportation, stable operation of fielded fleets of vehicles requires vehicle health mon-
itoring, maintenance systems, and technical support. Onboard sensing and data-logging capabilities for mechanical
systems have been developing rapidly, with many individual subsystems (e.g., power, propulsion, cabin environ-
ment) being equipped with data systems that log and either save or regularly transmit system health information
to a data server (e.g., Rolls Royce and BMW). The sheer volume of data provided by these systems is immense,
and appropriate data mining and monitoring tools are needed to flag operation and maintenance issues and to
remediate issues that arise during flight; a minimal sketch of such flagging appears after this paragraph. Maintenance
and service for these vehicles will likely differ somewhat from that for long-range flight vehicles. For aircraft powered
by batteries, down time between passengers may require a quick battery charge or replacement; for vehicles with
combustion engines, fuel tanks will need to be filled.
Operational inspections will need to take place to ensure passenger safety. These tasks must all be performed in
such a way as to facilitate efficient vehicle take offs, landings, and ground maneuvering. Solutions for efficiently
managing these tasks are actively in development for other applications such as autonomous ocean transportation,
autonomous ground transportation, and smart city infrastructure. In addition, vertical garages will be key to the
successful adoption of urban air transportation, enabling safe takeoff and landing and facilitating efficiencies that
might not be possible otherwise.
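As promised above, the following is a minimal sketch of flagging off-nominal telemetry records with an isolation forest, one of many possible data mining tools for this task; the telemetry array and contamination rate are hypothetical placeholders, and a fielded system would add physics-informed features, thresholds, and human review.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
# Stand-in for logged subsystem channels (rows = records, columns = channels)
telemetry = rng.normal(size=(10000, 6))
telemetry[-20:] += 4.0                          # inject a few off-nominal records

detector = IsolationForest(contamination=0.005, random_state=0).fit(telemetry)
flags = detector.predict(telemetry)             # -1 marks candidate anomalies
scores = detector.decision_function(telemetry)  # lower scores = more anomalous
anomalous_idx = np.flatnonzero(flags == -1)
```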
Logistical systems appropriate for handling on-demand transportation requests are quite mature. The advent
of ride-share companies, such as Uber and Lyft, has driven the development of effective mobile applications that
handle all aspects of the process, from user registry, payment, localization, service connection, tasking, scheduling,
and routing, to feedback surveys. These tools are at a level of maturity that should translate readily to the urban air
mobility domain. Data science tools have been developed specifically to address these needs and are extensively
leveraged for many aspects of the entire logistics system. One exception in this technology is air traffic routing.
Current tools for ground transportation rely on a well-structured operational environment which has extensive
physical infrastructure (e.g., roads, buildings, bridges, traffic lights, signage, etc) and rules-of-the-road that inform
and are informed by regulatory authorities.
A key question regarding urban air transportation is whether aircraft will be piloted, unpiloted, or a mix
of both. Given the designs under development, the most likely scenario is the hybrid case, where a mix of many
different types of vehicle piloting operate simultaneously. Even in the case of human-piloted vehicles, a great deal
of vehicle autonomy is in operation at any time to manage the myriad subsystems that are required for a modern
flight system. The dense airspace of the future will likely exacerbate the existing challenges of autonomy. Further,
the current human-centric air traffic control framework will simply be unable to address the temporal and spatial
density needed for urban air mobility to be viable. However, autonomous routing, deconfliction, and obstacle
avoidance are not yet at a level of maturity to be fielded. The issues are partly technological and partly policy. On
the technology side, sensors and algorithms are in late-stage development (e.g., ADS-B), but they have not yet been
adopted into standard practice. On the policy side, work has been underway on Next Generation Air Traffic Control
(NextGen ATC) for the past two decades, but adoption has not yet reached the final stages that were planned for
2025. To realize the NextGen ATC system, data science techniques are needed to provide real-time routing of all
vehicles, much as ground-based transportation systems already do.
11 Outlook
This paper has provided a high-level review and roadmap for the uses of data science and machine learning in the
aerospace industry. Aerospace engineering is a data-rich endeavor, and it is poised to capitalize on the big-data and
machine learning revolution that is redefining the modern scientific, technological, and industrial landscapes. We
have outlined a number of high-priority opportunities in aerospace design, manufacturing, testing, and services
that all may be enabled by advanced data science algorithms. For example, future data-driven algorithms may
enable the design of new super-materials, streamlined flight testing for safer and more flexible designs, and en-
hanced manufacturing processes including standardization, predictive assembly, and non-destructive inspection.
The array of potential applications is both riveting and, at times, overwhelming. Some of these advanced
technologies, such as predictive shimming and assembly, are already experiencing considerable success in
commercial production. Other applications will be enabled by the digital twin of the future, with advanced optimization
and learning algorithms seamlessly fusing multi-physics and multi-fidelity models with real-world data streams.
These models will continuously improve, with potentially radical implications for the entire aerospace production
pipeline.
There are clear parallels between the recent rise of data science and the transformative impact of scientific
computing beginning in the 1960s. Computational science began as a specialized focus and has since evolved into
a core engineering competency across many disciplines. Likewise, data science and machine learning proficiency
will be expected as a core competency in the future workforce, highlighting the need for robust education initiatives
focused on data-driven science and engineering [9] (see databookuw.com for video lectures, syllabi, and code
tailored for a science and engineering audience). Data-driven aerospace engineering will also require changes
to how teams of researchers and engineers are formed and how decisions are made. A recent Harvard Business
Review article [169] pointed out the need for data scientists to be integrated into the decision making process,
likening this to having Spock on the bridge in Star Trek. The last challenge to discuss is how to develop a data-first
culture, which is imperative to fully harness the potential of data-enabled engineering. A data-first culture requires
enterprise-wide education and adoption of principles that promote cooperation and data sharing, reproducible
practices, common terminology, and an understanding of the value of data and the promise and limitations of
machine learning algorithms. Moreover, this transformation will require ongoing curiosity and deep thinking from
engineers and leaders about how processes might be improved with emerging technologies.
Current and future aircraft programs will be increasingly enabled by a wealth of data. There is low-hanging
fruit in current aircraft programs that may provide immediate benefits with existing data. However, the full po-
tential of data-enabled aerospace engineering will take generations of aircraft programs to realize. New programs
under development, such as the new midsize airplane (NMA), will provide an opportunity to develop and test en-
tirely new design, manufacturing, and testing capabilities. The digital twin will improve design and testing cycles
through a more holistic data-driven model of critical processes and their interactions. These integrated models
will rely on the integration of myriad data sources into a digital thread. Lessons learned may then be leveraged into
more seamless integration of data into the aerospace industry for future programs, such as the future small aircraft
(FSA). Much as information technology companies like Google are valued based on their data, aerospace giants will
learn to extract value and competitive advantage by leveraging their wealth of data.
Never before has there been an architecture problem of this size, requiring the integrated efforts of informa-
tion technology experts, aerospace domain engineers, and data science teams. The aerospace industry currently
generates tremendous volumes of data throughout a product life cycle, but the data storage systems are not always
designed to have their data extracted, much less at near real-time rates. Data must be seamlessly accessible while
maintaining security and control; software must enable analytics while performing its primary objective. Data must
be integrated while being served by decades-old systems. Data exploration must be near-effortless and allow for
analysis of complex systems. Analytic results must be fed back into the source systems as suggestions or predictions,
without forcing all data to be migrated or new systems to be installed wholesale. There is also the
risk of a data mortgage and paralysis, where more effort is spent collecting and curating data than analyzing it.
This risk motivates a shift from big data to smart data, where edge-computing and sparse/robust algorithms are used
to extract key data features in real time. Moving from a process largely grounded in storing data in local silos to
openly sharing data will be difficult and will motivate entirely new incentive structures for engineers. Teams must
come to identify the value they provide in part based upon the data they generate and curate.
It is also important to reiterate the need to develop machine learning algorithms that are specifically tailored for
the aerospace industry. Machine learning must be demystified: it is not a magic wand, but rather a growing corpus
of applied optimization algorithms to build models from data. These models are only as good as the data used
to train them, and great care must be taken to understand how and when these models are valid. Most machine
learning algorithms are fundamentally interpolative, and extrapolation algorithms are both rare and challenging.
Because of the need for reliable and certifiable algorithms, it is critical that physics is baked into machine learning
algorithms. Many of the concepts related to data analytics detailed within this paper constitute novel applications
that may not support a direct showing of compliance to existing regulations. Regulatory compliance is a critical
challenge for emerging approaches based on data analytics. Historically, civil aviation requirements have been
created and revised in response to negative events in the industry, primarily in reaction to accidents and incidents.
Further, these requirements are often written under the expectation that the systems they govern are deterministic
in nature. As non-deterministic systems are introduced into commercial aviation, the basic regulatory philosophy
will not support the current methods of compliance. Anywhere that data analytics are leveraged in the certification
process, the same philosophical issues will need to be addressed. In place of a deterministic answer, statistical
approaches, models with dissimilar architectures, or other approaches not yet identified will become a necessity.
Fortunately, statistics are already used in various areas of aircraft certification related to aircraft safety assurance
following single and multiple failure events.
Powered human flight is one of the greatest achievements in the history of humankind, realizing the culmina-
tion of thousands of years of science fiction, and having a profound impact on the past century. The next century of
aerospace engineering will challenge us to envision and realize a new science fiction future based on the breathtak-
ing array of new data-enhanced technologies.
Acknowledgments
We would like to acknowledge several individuals and organizations within the University of Washington and The
Boeing Company: The Boeing Advanced Research Center (BARC) and its directors Per Reinhall, Santosh Devasia,
and Sam Pedigo; Tia Benson Tolle, Steve Chisholm, Chris Schuller, and Todd Zarfos at Boeing; Ashis Banerjee, Bing
Brunton, Ramulu Mamidala, and Navid Zobeiry at UW; the eScience Institute and its directors Ed Lazowska and
Magda Balazinska. We would also like to give special thanks to Mike Bragg for his support as Dean of the College
of Engineering at UW and Greg Hyslop, Lynne Hopper, and Nancy Pendleton at Boeing for expanding the scope
of this collaboration. We are also grateful for advice and interviews with Ryan Smith, Laura Garnett, Phil Crothers,
Nia Jetter, Howard McKenzie, and Darren Macer. Finally, we are indebted to the creative efforts of Michael Erickson
and the envisioneering team who translated our engineering discussions into many of the graphics in this paper.
References
[1] J. Harding, M. Shahbaz, A. Kusiak et al., “Data mining in manufacturing: a review,” Journal of Manufacturing Science and Engineering, vol.
128, no. 4, pp. 969–976, 2006.
[2] C. Lynch, “Big data: How do your data grow?” Nature, vol. 455, no. 7209, pp. 28–29, 2008.
[3] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, S. Y. Philip et al., “Top 10 algorithms in data
mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, 2008.
[4] V. Marx, “Biology: The big challenges of big data,” Nature, vol. 498, no. 7453, pp. 255–260, 2013.
[5] M. J. Khoury and J. P. Ioannidis, “Medicine. big data meets public health.” Science, vol. 346, no. 6213, pp. 1054–1055, 2014.
[6] L. Einav and J. Levin, “Economics in the age of big data,” Science, vol. 346, no. 6210, p. 1243089, 2014.
[7] M. Jordan and T. Mitchell, “Machine learning: Trends, perspectives, and prospects,” Science, vol. 349, no. 6245, pp. 255–260, 2015.
[8] J. N. Kutz, Data-Driven Modeling & Scientific Computation: Methods for Complex Systems & Big Data. Oxford University Press, 2013.
[9] S. L. Brunton and J. N. Kutz, Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University
Press, 2019.
[10] D. Donoho, “50 years of data science,” in Tukey Centennial Workshop, Princeton, NJ, 2015.
[11] A. J. Hey and A. E. Trefethen, The data deluge: An e-science perspective. Wiley and Sons, 2003.
[12] A. J. Hey, S. Tansley, K. M. Tolle et al., “The fourth paradigm: data-intensive scientific discovery,” Microsoft Research Redmond, WA, 2009.
[13] K. Duraisamy, G. Iaccarino, and H. Xiao, “Turbulence modeling in the age of data,” Annual Reviews of Fluid Mechanics, vol. 51, pp. 357–377,
2019.
[14] S. L. Brunton, B. R. Noack, and P. Koumoutsakos, “Machine learning for fluid mechanics,” Annual Review of Fluid Mechanics, vol. 52, pp.
477–508, 2020.
[15] S. L. Brunton and J. N. Kutz, “Methods for data-driven multiscale model discovery for materials,” Journal of Physics: Materials, vol. 2, no. 4,
p. 044002, 2019.
[16] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[17] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE Conference
on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural
information processing systems, 2012, pp. 1097–1105.
[19] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
[20] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel et al., “A general
reinforcement learning algorithm that masters chess, shogi, and go through self-play,” Science, vol. 362, no. 6419, pp. 1140–1144, 2018.
[21] K. Hornik, M. Stinchcombe, H. White et al., “Multilayer feedforward networks are universal approximators.” Neural networks, vol. 2, no. 5,
pp. 359–366, 1989.
[22] J.-C. Loiseau and S. L. Brunton, “Constrained sparse Galerkin regression,” Journal of Fluid Mechanics, vol. 838, pp. 42–67, 2018.
[23] M. Raissi and G. E. Karniadakis, “Hidden physics models: Machine learning of nonlinear partial differential equations,” Journal of Compu-
tational Physics, vol. 357, pp. 125–141, 2018.
[24] J.-C. Loiseau, B. R. Noack, and S. L. Brunton, “Sparse reduced-order modeling: sensor-based dynamics to full-state estimation,” J. Fluid
Mech., vol. 844, pp. 459–490, 2018.
[25] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner
et al., “Relational inductive biases, deep learning, and graph networks,” arXiv preprint arXiv:1806.01261, 2018.
[26] M. Raissi, P. Perdikaris, and G. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and
inverse problems involving nonlinear partial differential equations,” Journal of Computational Physics, vol. 378, pp. 686–707, 2019.
[27] F. Noé, S. Olsson, J. Köhler, and H. Wu, “Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning,”
Science, vol. 365, no. 6457, p. eaaw1147, 2019.
[28] J. Köhler, L. Klein, and F. Noé, “Equivariant flows: sampling configurations for multi-body systems with symmetric energies,” arXiv
preprint arXiv:1910.00753, 2019.
[29] M. D. Cranmer, R. Xu, P. Battaglia, and S. Ho, “Learning symbolic physics with graph networks,” arXiv preprint arXiv:1909.05862, 2019.
[30] M. Raissi, A. Yazdani, and G. E. Karniadakis, “Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations,”
Science, vol. 367, no. 6481, pp. 1026–1030, 2020.
[31] M. Cranmer, A. Sanchez-Gonzalez, P. Battaglia, R. Xu, K. Cranmer, D. Spergel, and S. Ho, “Discovering symbolic models from deep
learning with inductive biases,” arXiv preprint arXiv:2006.11287, 2020.
[32] M. Cranmer, S. Greydanus, S. Hoyer, P. Battaglia, D. Spergel, and S. Ho, “Lagrangian neural networks,” arXiv preprint arXiv:2003.04630,
2020.
[33] K. Taira, S. L. Brunton, S. Dawson, C. W. Rowley, T. Colonius, B. J. McKeon, O. T. Schmidt, S. Gordeyev, V. Theofilis, and L. S. Ukeiley,
“Modal analysis of fluid flows: An overview,” AIAA Journal, vol. 55, no. 12, pp. 4013–4041, 2017.
[34] K. Taira, M. S. Hemati, S. L. Brunton, Y. Sun, K. Duraisamy, S. Bagheri, S. T. Dawson, and C.-A. Yeh, “Modal analysis of fluid flows:
Applications and outlook,” AIAA Journal, vol. 58, no. 3, pp. 998–1022, 2020.
[35] T. Colonius, “An overview of simulation, modeling, and active control of flow/acoustic resonance in open cavities,” in 39th Aerospace
Sciences Meeting and Exhibit, 2001, p. 76.
[36] S. L. Brunton and B. R. Noack, “Closed-loop turbulence control: Progress and challenges,” Applied Mechanics Reviews, vol. 67, pp. 050 801–
1–050 801–48, 2015.
[37] P. Jordan and T. Colonius, “Wave packets and turbulent jet noise,” Annual Review of Fluid Mechanics, vol. 45, pp. 173–195, 2013.
[38] P. J. Schmid, “Dynamic mode decomposition of numerical and experimental data,” Journal of Fluid Mechanics, vol. 656, pp. 5–28, Aug. 2010.
[39] J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz, “On dynamic mode decomposition: theory and applications,”
Journal of Computational Dynamics, vol. 1, no. 2, pp. 391–421, 2014.
[40] J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor, Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM,
2016.
[41] I. Mezic, “Analysis of fluid flows via spectral properties of the Koopman operator,” Annual Review of Fluid Mechanics, vol. 45, pp. 357–378,
2013.
[42] S. L. Brunton, B. W. Brunton, J. L. Proctor, and J. N. Kutz, “Koopman invariant subspaces and finite linear representations of nonlinear
dynamical systems for control,” PLoS ONE, vol. 11, no. 2, p. e0150171, 2016.
[43] B. J. McKeon and A. S. Sharma, “A critical layer model for turbulent pipe flow,” Journal of Fluid Mechanics, vol. 658, pp. 336–382, 2010.
[44] S. L. Brunton, J. L. Proctor, and J. N. Kutz, “Discovering governing equations from data by sparse identification of nonlinear dynamical
systems,” Proceedings of the National Academy of Sciences, vol. 113, no. 15, pp. 3932–3937, 2016.
[45] S. A. Renganathan, “Koopman-based approach to nonintrusive reduced order modeling: Application to aerodynamic shape optimization
and uncertainty propagation,” AIAA Journal, vol. 58, no. 5, pp. 2221–2235, 2020.
[46] B. R. Noack, K. Afanasiev, M. Morzynski, G. Tadmor, and F. Thiele, “A hierarchy of low-dimensional models for the transient and post-
transient cylinder wake,” Journal of Fluid Mechanics, vol. 497, pp. 335–363, 2003.
[47] K. Carlberg, C. Farhat, J. Cortial, and D. Amsallem, “The GNAT method for nonlinear model reduction: effective implementation and
application to computational fluid dynamics and turbulent flows,” Journal of Computational Physics, vol. 242, pp. 623–647, 2013.
[48] T. Bui-Thanh, K. Willcox, O. Ghattas, and B. van Bloemen Waanders, “Goal-oriented, model-constrained optimization for reduction of
large-scale systems,” Journal of Computational Physics, vol. 224, no. 2, pp. 880–896, 2007.
[49] T. Bui-Thanh, K. Willcox, and O. Ghattas, “Model reduction for large-scale systems with high-dimensional parametric input space,” SIAM
Journal on Scientific Computing, vol. 30, no. 6, pp. 3270–3288, 2008.
[50] D. Amsallem, M. Zahr, Y. Choi, and C. Farhat, “Design optimization using hyper-reduced-order models,” Structural and Multidisciplinary
Optimization, vol. 51, no. 4, pp. 919–940, 2015.
[51] P. Benner, S. Gugercin, and K. Willcox, “A survey of projection-based model reduction methods for parametric dynamical systems,” SIAM
review, vol. 57, no. 4, pp. 483–531, 2015.
[52] K. Carlberg, M. Barone, and H. Antil, “Galerkin v. least-squares Petrov–Galerkin projection in nonlinear model reduction,” Journal of
Computational Physics, vol. 330, pp. 693–734, 2017.
[53] K. T. Carlberg, A. Jameson, M. J. Kochenderfer, J. Morton, L. Peng, and F. D. Witherden, “Recovering missing CFD data for high-order
discretizations using deep neural networks and dynamics learning,” arXiv preprint arXiv:1812.01177, 2018.
[54] A. P. Singh, S. Medida, and K. Duraisamy, “Machine-learning-augmented predictive modeling of turbulent separated flows over airfoils,”
AIAA journal, vol. 55, no. 7, pp. 2215–2227, 2017.
[55] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004.
[56] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[57] D. P. Bertsekas, Dynamic programming and optimal control. Athena scientific Belmont, MA, 1995, vol. 1(2).
[58] C. V. Rao, S. J. Wright, and J. B. Rawlings, “Application of interior-point methods to model predictive control,” Journal of optimization theory
and applications, vol. 99, no. 3, pp. 723–757, 1998.
[59] G. E. Dullerud and F. Paganini, A course in robust control theory: A convex approach, ser. Texts in Applied Mathematics. Berlin, Heidelberg:
Springer, 2000.
[60] S. Skogestad and I. Postlethwaite, Multivariable feedback control: analysis and design, 2nd ed. Hoboken, New Jersey: John Wiley & Sons,
Inc., 2005.
[61] S. Joshi and S. Boyd, “Sensor selection via convex optimization,” IEEE Transactions on Signal Processing, vol. 57, no. 2, pp. 451–462, 2009.
[62] A. Aravkin, J. V. Burke, L. Ljung, A. Lozano, and G. Pillonetto, “Generalized Kalman smoothing: Modeling and algorithms,” Automatica,
vol. 86, pp. 63–86, 2017.
[63] K. Manohar, B. W. Brunton, J. N. Kutz, and S. L. Brunton, “Data-driven sparse sensor placement for reconstruction: Demonstrating the
benefits of exploiting known patterns,” IEEE Cont. Sys. Mag., vol. 38, no. 3, pp. 63–86, 2018.
[64] L. Bottou, F. E. Curtis, and J. Nocedal, “Optimization methods for large-scale machine learning,” Siam Review, vol. 60, no. 2, pp. 223–311,
2018.
[65] A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on stochastic programming: modeling and theory. SIAM, 2009.
[66] J. Tsitsiklis, D. Bertsekas, and M. Athans, “Distributed asynchronous deterministic and stochastic gradient optimization algorithms,” IEEE
transactions on automatic control, vol. 31, no. 9, pp. 803–812, 1986.
[67] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of COMPSTAT’2010. Springer, 2010, pp. 177–
186.
[68] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[69] D. Davis, D. Drusvyatskiy, S. Kakade, and J. D. Lee, “Stochastic subgradient method converges on tame functions,” Foundations of Compu-
tational Mathematics, pp. 1–36, 2018.
[70] S. J. Reddi, S. Sra, B. Poczos, and A. J. Smola, “Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization,” in
Advances in Neural Information Processing Systems, 2016, pp. 1145–1153.
[71] A. Aravkin and D. Davis, “Trimmed statistical estimation via variance reduction,” Mathematics of Operations Research, 2019.
[72] R. T. Rockafellar and R. J.-B. Wets, Variational analysis. Springer Science & Business Media, 2009, vol. 317.
[73] J. V. Burke and M. C. Ferris, “A gauss—newton method for convex composite optimization,” Mathematical Programming, vol. 71, no. 2, pp.
179–194, 1995.
[74] D. Davis and D. Drusvyatskiy, “Stochastic model-based minimization of weakly convex functions,” SIAM Journal on Optimization, vol. 29,
no. 1, pp. 207–239, 2019.
[75] G. Golub and V. Pereyra, “Separable nonlinear least squares: the variable projection method and its applications,” Inverse problems, vol. 19,
no. 2, p. R1, 2003.
[76] P. Zheng, T. Askham, S. L. Brunton, J. N. Kutz, and A. Y. Aravkin, “Sparse relaxed regularized regression: SR3,” IEEE Access, vol. 7, no. 1,
pp. 1404–1423, 2019.
[77] J. Nocedal and S. Wright, Numerical optimization. Springer Science & Business Media, 2006.
[78] P. L. Combettes and J.-C. Pesquet, “Proximal splitting methods in signal processing,” in Fixed-point algorithms for inverse problems in science
and engineering. Springer, 2011, pp. 185–212.
[79] N. Parikh, S. Boyd et al., “Proximal algorithms,” Foundations and Trends® in Optimization, vol. 1, no. 3, pp. 127–239, 2014.
[80] H. Attouch, J. Bolte, and B. F. Svaiter, “Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms,
forward–backward splitting, and regularized gauss–seidel methods,” Mathematical Programming, vol. 137, no. 1-2, pp. 91–129, 2013.
[81] M. W. Mahoney, “Randomized algorithms for matrices and data,” Foundations and Trends in Machine Learning, vol. 3, pp. 123–224, 2011.
[82] N. Halko, P. G. Martinsson, and J. A. Tropp, “Finding structure with randomness: Probabilistic algorithms for constructing approximate
matrix decompositions,” SIAM Review, vol. 53, no. 2, pp. 217–288, 2011.
[83] E. Liberty, “Simple and deterministic matrix sketching,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining. ACM, 2013, pp. 581–588.
[84] N. B. Erichson, S. Voronin, S. L. Brunton, and J. N. Kutz, “Randomized matrix decompositions using R,” Journal of Statistical Software,
vol. 89, no. 1, pp. 1–48, 2019.
[85] T. Sarlos, “Improved approximation algorithms for large matrices via random projections,” in Foundations of Computer Science. 47th Annual
IEEE Symposium on, 2006, pp. 143–152.
[86] P.-G. Martinsson, V. Rokhlin, and M. Tygert, “A randomized algorithm for the decomposition of matrices,” Applied and Computational
Harmonic Analysis, vol. 30, pp. 47–68, 2011.
[87] V. Rokhlin, A. Szlam, and M. Tygert, “A randomized algorithm for principal component analysis,” SIAM Journal on Matrix Analysis and
Applications, vol. 31, pp. 1100–1124, 2009.
[88] N. Halko, P.-G. Martinsson, Y. Shkolnisky, and M. Tygert, “An algorithm for the principal component analysis of large data sets,” SIAM
Journal on Scientific Computing, vol. 33, pp. 2580–2594, 2011.
[89] J. A. Duersch and M. Gu, “Randomized QR with column pivoting,” SIAM Journal on Scientific Computing, vol. 39, no. 4, pp. C263–C291,
2017.
[90] G. Shabat, Y. Shmueli, Y. Aizenbud, and A. Averbuch, “Randomized LU decomposition,” Applied and Computational Harmonic Analysis,
2016.
[91] N. B. Erichson, K. Manohar, S. L. Brunton, and J. N. Kutz, “Randomized CP tensor decomposition,” Machine Learning: Science and Technol-
ogy, vol. 1, no. 2, p. 025012, 2020.
[92] N. B. Erichson, L. Mathelin, J. N. Kutz, and S. L. Brunton, “Randomized dynamic mode decomposition,” SIAM Journal on Applied Dynamical
Systems, vol. 18, no. 4, pp. 1867–1891, 2019.
[93] A. Alla and J. N. Kutz, “Randomized model order reduction,” Advances in Computational Mathematics, vol. 45, no. 3, pp. 1251–1271, 2019.
[94] Z. Bai, E. Kaiser, J. L. Proctor, J. N. Kutz, and S. L. Brunton, “Dynamic mode decomposition for compressive system identification,” AIAA
Journal, vol. 58, no. 2, pp. 561–574, 2020.
[95] P. J. Huber, “John W. Tukey’s contributions to robust statistics,” Annals of statistics, pp. 1640–1648, 2002.
[96] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM, vol. 58, no. 3, p. 11, 2011.
[97] M. Gavish and D. L. Donoho, “The optimal hard threshold for singular values is 4/√3,” IEEE Transactions on Information Theory, vol. 60,
no. 8, pp. 5040–5053, 2014.
[98] T. Bouwmans and E. H. Zahzah, “Robust PCA via principal component pursuit: A review for a comparative evaluation in video surveil-
lance,” Computer Vision and Image Understanding, vol. 122, pp. 22–34, 2014.
[99] D. Kondor, I. Csabai, L. Dobos, J. Szule, N. Barankai, T. Hanyecz, T. Sebok, Z. Kallus, and G. Vattay, “Using robust PCA to estimate
regional characteristics of language use from geo-tagged twitter messages,” in Cognitive Infocommunications (CogInfoCom), 2013 IEEE 4th
International Conference on. IEEE, 2013, pp. 393–398.
[100] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma, “Robust principal component analysis: Exact recovery of corrupted low-rank matrices
via convex optimization,” in Advances in neural information processing systems, 2009, pp. 2080–2088.
[101] S. Boschert and R. Rosen, “Digital twin—the simulation aspect,” in Mechatronic futures. Springer, 2016, pp. 59–74.
[102] M. Grieves and J. Vickers, “Digital twin: Mitigating unpredictable, undesirable emergent behavior in complex systems,” in Transdisci-
plinary perspectives on complex systems. Springer, 2017, pp. 85–113.
[103] F. Tao, J. Cheng, Q. Qi, M. Zhang, H. Zhang, and F. Sui, “Digital twin-driven product design, manufacturing and service with big data,”
The International Journal of Advanced Manufacturing Technology, vol. 94, no. 9-12, pp. 3563–3576, 2018.
[104] A. Rasheed, O. San, and T. Kvamsdal, “Digital twin: Values, challenges and enablers,” arXiv preprint arXiv:1910.01719, 2019.
[105] F. Chinesta, E. Cueto, E. Abisset-Chavanne, J. L. Duval, and F. El Khaldi, “Virtual, digital and hybrid twins: a new paradigm in data-based
engineering and engineered data,” Archives of computational methods in engineering, vol. 27, no. 1, pp. 105–134, 2020.
[106] R. Everson and L. Sirovich, “Karhunen–Loève procedure for gappy data,” Journal of the Optical Society of America A, vol. 12, no. 8, pp.
1657–1664, 1995.
[107] K. Willcox, “Unsteady flow sensing and estimation via the gappy proper orthogonal decomposition,” Computers & Fluids, vol. 35, no. 2,
pp. 208–226, 2006.
[108] M. Barrault, Y. Maday, N. C. Nguyen, and A. T. Patera, “An ‘empirical interpolation’ method: application to efficient reduced-basis
discretization of partial differential equations,” Comptes Rendus Mathematique, vol. 339, no. 9, pp. 667–672, 2004.
[109] S. Chaturantabut and D. C. Sorensen, “Nonlinear model reduction via discrete empirical interpolation,” SIAM Journal on Scientific Com-
puting, vol. 32, no. 5, pp. 2737–2764, 2010.
[110] Z. Drmac and S. Gugercin, “A new selection operator for the discrete empirical interpolation method—improved a priori error bound and
extensions,” SIAM Journal on Scientific Computing, vol. 38, no. 2, pp. A631–A648, 2016.
[111] E. J. Candès et al., “Compressive sampling,” in Proceedings of the International Congress of Mathematicians, vol. 3. Madrid, Spain, 2006, pp.
1433–1452.
[112] D. L. Donoho, “Compressed sensing,” Information Theory, IEEE Transactions on, vol. 52, no. 4, pp. 1289–1306, 2006.
[113] E. J. Candès, J. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and
Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
[114] E. J. Candès and T. Tao, “Near optimal signal recovery from random projections: Universal encoding strategies?” IEEE Transactions on
Information Theory, vol. 52, no. 12, pp. 5406–5425, 2006.
[115] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency infor-
mation,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[116] R. G. Baraniuk, “Compressive sensing,” IEEE signal processing magazine, vol. 24, no. 4, 2007.
[117] N. K. Dhingra, M. R. Jovanovic, and Z.-Q. Luo, “An ADMM algorithm for optimal sensor and actuator selection,” 53rd IEEE Conference on
Decision and Control, pp. 4039–4044, 2014.
[118] K. Manohar, S. L. Brunton, and J. N. Kutz, “Environmental identification in flight using sparse approximation of wing strain,” Journal of
Fluids and Structures, vol. 70, pp. 162–180, 2017.
[119] K. Maute, M. Nikbay, and C. Farhat, “Coupled analytical sensitivity analysis and optimization of three-dimensional nonlinear aeroelastic
systems,” AIAA journal, vol. 39, no. 11, pp. 2051–2061, 2001.
[120] J. Bongard and H. Lipson, “Automated reverse engineering of nonlinear dynamical systems,” Proceedings of the National Academy of Sciences,
vol. 104, no. 24, pp. 9943–9948, 2007.
[121] M. Schmidt and H. Lipson, “Distilling free-form natural laws from experimental data,” Science, vol. 324, no. 5923, pp. 81–85, 2009.
[122] S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz, “Chaos as an intermittently forced linear system,” Nature Communica-
tions, vol. 8, no. 19, pp. 1–9, 2017.
[123] M. C. Kennedy and A. O’Hagan, “Bayesian calibration of computer models,” Journal of the Royal Statistical Society: Series B (Statistical
Methodology), vol. 63, no. 3, pp. 425–464, Aug 2001.
[124] P. Arendt, D. W. Apley, and W. Chen, “Quantification of model uncertainty: Calibration, model discrepancy, and identifiability,” Journal of
Mechanical Design, vol. 134, p. 100908, 09 2012.
[125] J. Quiñonero-Candela and C. E. Rasmussen, “A unifying view of sparse approximate Gaussian process regression,” Journal of Machine
Learning Research, vol. 6, no. Dec, pp. 1939–1959, 2005.
[126] K. Kaheman, E. Kaiser, B. Strom, J. N. Kutz, and S. L. Brunton, “Learning discrepancy models from experimental data,” arXiv preprint
arXiv:1909.08574, 2019.
[127] D. P. Raymer, Aircraft Design: A Conceptual Approach, 2nd ed., ser. AIAA Education Series. Washington, DC: American Institute of
Aeronautics and Astronautics, 1992.
[128] W. B. Oswald, “The transverse force distribution on ellipsoidal and nearly ellipsoidal bodies moving in an arbitrary potential flow,” Ph.D.
dissertation, California Institute of Technology, 1932.
[129] ——, “General formulas and charts for the calculation of airplane performance,” National Advisory Committee for Aeronautics, Tech. Rep.
NACA-TR-408, 1932.
[130] E. J. Cramer, J. E. Dennis, Jr., P. D. Frank, R. M. Lewis, and G. R. Shubin, “Problem formulation for multidisciplinary optimization,” SIAM
Journal on Optimization, vol. 4, pp. 754–776, 11 1994.
[131] A. Booker, J. Dennis, P. Frank, D. Serafini, V. Torczon, and M. Trosset, “A rigorous framework for optimization of expensive functions by
surrogates,” Structural Optimization, vol. 17, 09 1998.
[132] K. Bowcutt, G. Kuruvila, T. A. Grandine, T. A. Hogan, and E. J. Cramer, “Advancements in multidisciplinary design optimization applied
to hypersonic vehicles to achieve closure,” in 15th International Space Planes and Hypersonic Systems and Technologies Conference. Dayton,
OH: AIAA 2008-2591, 2008.
[133] R. Henderson, J. Martins, and R. Perez, “Aircraft conceptual design for optimal environmental performance,” The Aeronautical Journal, vol.
116, pp. 1–22, 01 2012.
[134] J. Martins and A. Lambe, “Multidisciplinary design optimization: A survey of architectures,” AIAA Journal, vol. 51, pp. 2049–2075, 09
2013.
[135] N. Bons and J. Martins, “Aerostructural wing design exploration with multidisciplinary design optimization,” in AIAA Scitech 2020
Forum. Orlando, FL: AIAA 2020-0544, 2020. [Online]. Available: https://arc.aiaa.org/doi/abs/10.2514/6.2020-0544
[136] I. Baran et al., “A review on the mechanical modeling of composite manufacturing processes,” Archives of Computational Methods in Engineering,
vol. 24, pp. 365–395, 2017.
[137] S. Singh, E. Shehab, N. Higgins, K. Fowler, T. Tomiyama, and C. Fowler, “Challenges of digital twin in high value manufacturing,” SAE
Technical Paper, Tech. Rep., 2018.
[138] K. Manohar, T. Hogan, J. Buttrick, A. G. Banerjee, J. N. Kutz, and S. L. Brunton, “Predicting shim gaps in aircraft assembly with machine
learning and sparse sensing,” Journal of Manufacturing Systems, vol. 48, pp. 87–95, 2018.
[139] B. Conduit, N. Jones, H. Stone, and G. Conduit, “Design of a nickel-base superalloy using a neural network,” Materials & Design, vol. 131,
pp. 358–365, 2017.
[140] P. Verpoort, P. MacDonald, and G. J. Conduit, “Materials data validation and imputation with an artificial neural network,” Computational
Materials Science, vol. 147, pp. 176–185, 2018.
[141] B. Conduit, N. G. Jones, H. J. Stone, and G. J. Conduit, “Probabilistic design of a molybdenum-base alloy using a neural network,” Scripta
Materialia, vol. 146, pp. 82–86, 2018.
[142] A. G. Green, G. Conduit, and F. Krüger, “Quantum order-by-disorder in strongly correlated metals,” Annual Review of Condensed Matter
Physics, vol. 9, pp. 59–77, 2018.
[143] U. Sachs, “Friction and bending in thermoplastic composites forming processes,” Ph.D. dissertation, University of Twente, Enschede, The Netherlands,
2014.
[144] S. Hannappel, “Forming of UD fibre reinforced thermoplastics,” Ph.D. dissertation, University of Twente, Enschede, The Netherlands, 2013.
[145] P. Juarez, E. Gregory, and K. Cramer, “In situ thermal inspection of automated fiber placement manufacturing,” SAMPE Journal, 2020.
[146] J. Cemenska, T. Rudberg, and M. Henscheid, “Automated in-process inspection system for AFP machines,” SAE International Journal of
Aerospace, vol. 8, no. 2, 2015.
[147] C. Sacco, A. Radwan, T. Beatty, and R. Harik, “Machine learning based AFP inspection: A tool for characterization and integration,”
SAMPE Journal, 2020.
[148] A. Blom, “Structural performance of fiber-placed, variable-stiffness composite conical and cylindrical shells,” Ph.D. dissertation, Delft University
of Technology, 2010.
[149] D. A. Cohn, Z. Ghahramani, and M. I. Jordan, “Active learning with statistical models,” Journal of Artificial Intelligence Research, vol. 4, pp.
129–145, 1996.
[150] B. Settles, “Active learning literature survey,” University of Wisconsin-Madison Department of Computer Sciences, Tech. Rep., 2009.
[151] B. J. Marsh, “Laser tracker assisted aircraft machining and assembly,” SAE Technical Paper, Tech. Rep., 2008.
[152] J. Jamshidi, A. Kayani, P. Iravani, P. G. Maropoulos, and M. Summers, “Manufacturing and assembly automation by integrated metrology
systems for aircraft wing fabrication,” Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, vol.
224, no. 1, pp. 25–36, 2010.
[153] J. E. Muelaner and P. G. Maropoulos, “Design for measurement assisted determinate assembly (MADA) of large composite structures,” in
Journal of the Coordinate Metrology Systems Conference. University of Bath, 2010.
[154] B. J. Marsh, T. Vanderwiel, K. VanScotter, and M. Thompson, “Method for fitting part assemblies,” U.S. Patent, Jul. 13, 2010.
[155] J. E. Muelaner, A. Kayani, O. Martin, and P. Maropoulos, “Measurement assisted assembly and the roadmap to part-to-part assembly,” in
7th international conference on digital enterprise technology. University of Bath, 2011, pp. 11–19.
[156] B. Chouvion, A. Popov, S. Ratchev, C. Mason, and M. Summers, “Interface management in wing-box assembly,” SAE Technical Paper,
Tech. Rep., 2011.
[157] J. E. Muelaner and P. Maropoulos, “Integrated dimensional variation management in the digital factory,” in 7th International Conference on
Digital Enterprise Technology. University of Bath, 2011, pp. 39–46.
[158] J. Muelaner, O. Martin, and P. Maropoulos, “Achieving low cost and high quality aero structure assembly through integrated digital
metrology systems,” Procedia CIRP, vol. 7, pp. 688–693, 2013.
[159] T. M. Boyl-Davis, D. D. Jones, and T. E. Zimmerman, “Digitally designed shims for joining parts of an assembly,” 2014.
[160] C. M. Vasquez, T. M. Boyl-Davis, D. I. Valenzuela, and D. D. Jones, “Systems and methods for robotic measurement of parts,” 2014.
[161] D. Valenzuela, T. M. Boyl-Davis, and D. D. Jones, “Systems, methods, and apparatus for automated predictive shimming for large struc-
tures,” 2015.
[162] T. M. Boyl-Davis, D. D. Jones, and T. E. Zimmerman, “Methods of fabricating shims for joining parts,” U.S. Patent, Aug. 30, 2016.
[163] J. C. Antolin-Urbaneja, J. Livinalli, M. Puerto, M. Liceaga, A. Rubio, A. San-Roman, and I. Goenaga, “End-effector for automatic shimming
of composites,” SAE Technical Paper, Tech. Rep., 2016.
[164] G. Golub and W. Kahan, “Calculating the singular values and pseudo-inverse of a matrix,” Journal of the Society for Industrial & Applied
Mathematics, Series B: Numerical Analysis, vol. 2, no. 2, pp. 205–224, 1965.
[165] B. W. Brunton, S. L. Brunton, J. L. Proctor, and J. N. Kutz, “Sparse sensor placement optimization for classification,” SIAM Journal on
Applied Mathematics, vol. 76, no. 5, pp. 2099–2122, 2016.
[166] W. Johnson, “Model for vortex ring state influence on rotorcraft flight dynamics,” NASA, Tech. Rep., 2005.
[167] G. J. Gray, “Report of the panel to review the V-22 program,” Department of Defense Panel Review, Tech. Rep., 2001.
[168] T. Du, A. Schulz, B. Zhu, B. Bickel, and W. Matusik, “Computational multicopter design,” ACM Transactions on Graphics, vol. 35, no. 6, 2016.
[169] T. H. Davenport and D. J. Patil, “Data scientist: The sexiest job of the 21st century,” Harvard Business Review, vol. 90, no. 10, pp. 70–76, 2012.