Machine Learning Approaches to Single-Cell Data Integration and Translation

This article discusses the integration of machine learning techniques to address challenges in analyzing single-cell biological data, which often suffers from limitations due to its destructive nature and lack of comprehensive measurements. It highlights three key translation problems: integrating different data modalities, generating cell lineages over time, and predicting the effects of perturbations across cell types. The authors propose using generative modeling, optimal transport, and causal inference as computational approaches to overcome these experimental limitations and gain biological insights.


Machine Learning Approaches to Single-Cell Data Integration and Translation

This article describes how problems arising in the analysis of individual biological cells and of cell-to-cell variability have inspired foundational developments in machine learning.

By CAROLINE UHLER AND G. V. SHIVASHANKAR

Manuscript received September 6, 2021; revised March 15, 2022; accepted March 31, 2022. Date of publication April 25, 2022; date of current version May 19, 2022. The work of Caroline Uhler was supported in part by NSF under Grant DMS-1651995, in part by the Office of Naval Research (ONR) under Grant N00014-17-1-2147, in part by the Eric and Wendy Schmidt Center, in part by the MIT-IBM Watson AI Lab, in part by the MIT J-Clinic for Machine Learning and Health, and in part by a Simons Investigator Award. The work of G. V. Shivashankar was supported by ETH Zurich. (Corresponding author: Caroline Uhler.)

Caroline Uhler is with the Broad Institute and the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]).

G. V. Shivashankar is with the Paul Scherrer Institute, 5232 Villigen, Switzerland, and also with ETH Zürich, 8092 Zürich, Switzerland (e-mail: [email protected]).

Digital Object Identifier 10.1109/JPROC.2022.3166132

ABSTRACT | Experimental single-cell data often presents an incomplete picture due to its destructive nature: 1) we collect certain experimental measurements of cells but lack measurements under different experimental conditions or data modalities; 2) we collect data of cells at certain time points but lack measurements at other time points; or 3) we collect data of cells under certain perturbations but lack data for other types of perturbations. In this article, we will discuss machine learning approaches to address these types of translation and counterfactual problems. We will begin by giving an overview of single-cell biology applications and the relevant translation problems. Subsequently, we will provide an overview of approaches for multidomain alignment and translation in machine learning, including methods based on generative modeling, optimal transport, and causal inference. The bulk of this article will focus on how these approaches have been tailored and applied to important translation problems in single-cell biology, illustrated through concrete examples from our own work. We end with open problems and a perspective on how biology may not only be uniquely suited to being one of the greatest beneficiaries of machine learning but also one of the greatest sources of inspiration for it.

KEYWORDS | Causality; deep learning; generative modeling; genome organization; imaging; optimal transport; representation learning; sequencing; single-cell biology; spatial transcriptomics.

I. INTRODUCTION

Cells in tissue microenvironments are highly heterogeneous [1], [2]. For one, humans are made of many different cell types (muscle cells, bone cells, neurons, and so on) that are very different in their architecture (i.e., appearance) and function. In addition, within a cell type, there are many different subpopulations of cells that take on different roles and show different spatial organization within a tissue. Research in recent years has started to reveal that how a cell interacts with its neighbors impacts both its architecture, which can be analyzed via imaging, as well as its functional output, which is measured in the expression of a cell’s genes and the translation to proteins [3], [4]. In order to better understand the role of heterogeneity at the single-cell level as well as the relationship between structure/architecture and function of a cell, it has become essential to study single cells.

Furthermore, an increasing number of studies provide compelling evidence that many diseases start at the single-cell level; for example, tumorigenesis is believed to be initiated at the single-cell level within the stromal

0018-9219 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Vol. 110, No. 5, May 2022 | PROCEEDINGS OF THE IEEE 557

Authorized licensed use limited to: CATHOLIC UNIVERSITY OF KOREA. Downloaded on April 15,2023 at 03:14:20 UTC from IEEE Xplore. Restrictions apply.
Uhler and Shivashankar: Machine Learning Approaches to Single-Cell Data Integration and Translation

Fig. 1. Coupling between cellular architecture and function. (a) Fibroblast cells embedded into a collagen gel showing heterogeneous
cellular architectures. (b) Geometric control of cellular architectures results in distinct gene expression profiles; color coding in expression
output indicates a number of differentially expressed genes in the two geometries. (c) Cells in different architectures when stimulated with
the same signal exhibit distinct architecture-dependent expression responses. (d) Schematic of how cellular architecture is linked to gene
expression response to environmental stimuli via the spatial organization of the genome. Figures adapted from our works [9], [16].

microenvironment [5]. In addition, aging is conjectured to arise due to the decline of cellular function in single cells that transition into more senescent cell states [6], leading to aging-related disorders such as fibrosis [7] as well as neurodegeneration [8]. It is therefore critical to study single cells, how they behave in their physical microenvironment and how they interpret signals from their microenvironment to regulate their functional properties, in order to understand the heterogeneous cell states that are necessary to promote tissue homeostasis and discern from this the cell states leading to disease.

From a genomics perspective, the coupling between the architecture of a cell and gene expression is intimately linked to the way our genetic material is packed in the cell nucleus [9]. Although our body consists of many different cell types, it is the same genetic material that is packed in each of the cells. This genetic material, i.e., the DNA, can be seen as an information template. While it is the same in every cell, only parts of this template are read out in each cell to produce cell-type-specific expression. For example, neurons, bone cells, and muscle cells all show very distinct gene expression programs. The DNA in human cells is packed into 46 chromosomes, and when stretched out, it is approximately 2 m long [10]. This information template has to be packed into a cell nucleus only a few micrometers in size to produce cell-state-specific gene expression programs. Recent studies have revealed that how the DNA is packed in the cell nucleus is central to the control of gene expression [11], [12]. DNA packing involves two major principles of organization: cell-state-specific condensation patterns as well as spatial proximity of specific sequences of the DNA [13], [14]. More precisely, recent research has shown that the sequences of genes coding for proteins that are not required for a given cell type are highly condensed, whereas genes that need to be transcribed are found in more open regions of the DNA. In addition, some of the key regulatory genes that define a cell type are often found to be spatially clustered in the cell nucleus, thereby facilitating their efficient transcription and coregulation [15], see Fig. 1.

Recent years have seen an explosion of single-cell technologies that allow profiling thousands of cells as well as measuring in every cell thousands of variables. For example, imaging technologies can visualize single cells at high resolution using multiple color channels to resolve subcellular structures, such as the cell membrane, packing of the DNA in the cell nucleus, other organelles, as well as RNA and protein localization [17]–[19]. Likewise,
single-cell sequencing methods provide detailed information on the expression of all genes in a cell [20], [21]; in humans, this is an approximately 20 000-dimensional measurement for every cell. In addition, other sequencing methods, such as ATAC-seq and ChIP-seq, measure which regions of the genome are accessible and what regulatory factors are decorating the genome at these locations [22], [23]. Furthermore, chromosome conformation capture methods provide detailed information on which sequences of the DNA come together in 3-D space [24], [25]. While these experimental techniques provide high-throughput and high-dimensional measurements, an important current limitation is that acquiring such data requires fixing a cell to either purify the DNA/RNA for various sequencing measurements or stain a cell for imaging. These preparation methods mean that it is challenging experimentally to: 1) track a cell over long periods of time to obtain time-course data and 2) obtain paired (i.e., in the same cell) measurements of different data modalities such as imaging and sequencing. However, studying and understanding the structure–function relationship in single cells as well as how cell states are altered during disease onset and progression requires temporal data that measure different data modalities simultaneously. These experimental limitations are a unique opportunity for the development of computational methods that could help provide important biological insights.

In addition to the recent explosion of single-cell imaging and sequencing data, another revolution has taken place in biology that provides a unique opportunity for the development of computational methods. Unlike many other fields, biology has access to precise and large-scale perturbational tools, such as CRISPR-based technologies and small molecule chemical screens, that make it possible to experimentally manipulate and probe the role of different variables in a cell. For example, CRISPR-Cas methods allow editing the DNA sequence, deleting a single gene or even subsets of genes, and measuring how the expression of the other genes changes [26]. In addition to genetic perturbations, small molecules can be used to perturb cells and probe the role of particular genes and their function [27]. Such genetic and chemical perturbation experiments can be performed at the single-cell level in high throughput combined with sequencing or imaging assays. For example, perturb-seq provides single-cell RNA-seq data together with information on which genes were deleted in every cell [28], [29], whereas cell painting provides single-cell images together with information on which genes were deleted or overexpressed or which pharmacological inhibitors were applied [30], [31]. Collectively, these experimental methods provide unique opportunities to understand regulatory modules in cellular function and identify novel therapeutic interventions to revert diseased cell states back to the normal state. Although such perturbational experiments can be performed in high throughput, the main bottleneck is the huge space of possible perturbations. It will never be possible to perform experiments on all subsets of 20 000 genes or thousands of compounds. This provides unique opportunities for the development of computational methods to predict the effect of unseen perturbations and thus virtually screen perturbations and identify interesting ones without having to fully explore the space experimentally.

In this article, we will discuss three important problems for single-cell biology identified above, which provide unique opportunities for the development of computational methods to overcome experimental limitations and provide important biological insights.

1) How can we integrate and translate between different data modalities, such as imaging or sequencing, which cannot be measured in the same cell?
2) How can we generate lineages of single cells during cell-state transitions (e.g., between healthy and diseased states) forward and backward in time when only having access to snapshots of different populations in time?
3) How can we predict the effect that one perturbation has in a new cell type or that a new perturbation has in a given cell type?

Although these three computational challenges may seem unrelated at first, we propose to view them under a unifying framework; namely, we propose to view these important problems arising in single-cell biology as transfer/translation problems as follows (see also Fig. 2):

1) transferring between different data modalities;
2) transferring between different time points;
3) transferring between perturbation-cell-type pairs.

Here, we present emerging computational approaches that can overcome some of these experimental limitations, illustrated through examples derived from our own work. We will in particular concentrate on three areas in machine learning that have seen many developments in recent years, namely, generative modeling, optimal transport, and causal inference, where some of our own works have contributed methodology as well as examples of how these methods could be applied to gain important biological insights. We will first provide a brief introduction to these three areas in machine learning and then discuss each of the three transfer problems separately, showing how generative modeling, optimal transport, and causal inference could be used to overcome the above-described experimental limitations of current technologies. We end by outlining open problems and future computational challenges in the field.

II. MACHINE LEARNING APPROACHES FOR SINGLE-CELL BIOLOGY

A. Generative Models

Machine learning has seen rapid progress in the last decade [32], [33], with deep learning algorithms surpassing human-level performance in many tasks, including computer vision, natural language processing, online advertisement, and recommender systems, to name a

Fig. 2. Data integration and translation challenges in single-cell biology. (a) Translating between different data modalities such as imaging
and sequencing. (b) Translating between different time points such as snapshots taken of different populations of cells during disease
progression. (c) Transporting the effect of perturbations between different cell types/diseases. (d) Predicting the effect of new perturbations
based on other perturbations.

few [34]. This progress has been achieved by directly optimizing predictive performance with few, if any, modeling assumptions. While being extremely successful in many applications, this approach relies on the availability of large curated datasets. However, in certain applications, labeled data can be expensive to collect or even unavailable because it is not clear a priori what to annotate. This is, for example, the case in many problems in biology, where we want to develop approaches that help us identify the unknown underlying biological mechanisms. Therefore, traditional supervised machine learning methods for discriminative modeling (e.g., neural networks, random forests, support vector machines, and logistic regression), which learn a conditional distribution of a class label given the features to perform classification, are not sufficient.

Representation learning is recognized as a key driver for practical success in such applications, as it allows for extracting latent structures that capture the intrinsic behavior without labeled data [35], [36]. In biology, such latent variables correspond to fundamental biological processes, and the goal of representation learning is to map biological measurements to latent embeddings, where downstream analyses can be more easily performed and interpreted. Many successful approaches for unsupervised representation learning are based on generative models [37], which estimate the joint distribution of all variables and can also be used to generate new examples from this joint distribution (see Fig. 3(a) for a comparison of discriminative versus generative models). In the following, we will review three prominent examples of generative models [see Fig. 3(b)], namely, autoencoders, generative adversarial networks (GANs), and flow-based models, which will form the basis for the development of methods for translating between different data modalities.

1) Autoencoders: They are a family of neural network models that consist of two parts: an encoder that maps the data into a latent space and a decoder that maps back to the data space [36], [38]. Classically, autoencoder architectures contain a bottleneck, i.e., a layer with fewer units than the input. The neural network is trained to minimize reconstruction error and thus aims to learn (in the bottleneck) a compressed representation of the high-dimensional data. It can be seen as a nonlinear version of principal component analysis (PCA). More precisely, let $P_X$ on $\mathbb{R}^p$ denote the data distribution. Let $d < p$ and $E : \mathbb{R}^p \to \mathbb{R}^d$ denote the encoder and $D : \mathbb{R}^d \to \mathbb{R}^p$ the

Fig. 3. Machine learning approaches for single-cell biology. (a) Distinction between discriminative and generative modeling. (b) Different
classes of generative models. (c) Optimal transport to map between different distributions.
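Panels (b) and (c) of Fig. 3 link generative models to optimal transport. Section II-A motivates Wasserstein-GANs by the behavior of the Jensen–Shannon divergence when the data and generator distributions have disjoint supports. The following minimal sketch illustrates that point on a toy example that is not taken from the article: two point masses on an integer grid, with the hypothetical helper names `js_divergence`, `wasserstein_1d`, and `point_mass`. It relies on the standard fact that in one dimension the Wasserstein distance with cost $|x - y|$ equals the summed absolute difference of the two cumulative distribution functions.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def wasserstein_1d(p, q):
    """Wasserstein distance with cost |x - y| on the grid 0, 1, ..., n-1.

    In 1-D this equals the sum of |CDF_p - CDF_q| over the grid (unit spacing).
    """
    total, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        total += abs(cdf_p - cdf_q)
    return total


def point_mass(n, k):
    """Distribution on 0..n-1 that puts all mass on grid point k."""
    return [1.0 if i == k else 0.0 for i in range(n)]


if __name__ == "__main__":
    p = point_mass(10, 0)
    for shift in (1, 3, 6):
        q = point_mass(10, shift)
        # JS stays at log(2) for every shift (disjoint supports),
        # whereas the Wasserstein distance equals the shift itself.
        print(shift, js_divergence(p, q), wasserstein_1d(p, q))
```

In this toy setting the Jensen–Shannon divergence carries no information about how far apart the two distributions are, while the Wasserstein distance shrinks as the generated mass moves toward the data, which is the kind of signal a generator needs during training.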
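Fig. 3(c) depicts optimal transport between two distributions, and Section II-B notes that for discrete distributions the entropy-regularized Kantorovich problem can be solved with the Sinkhorn algorithm [59]. The sketch below shows the usual alternating scaling iterations in plain Python, assuming two small histograms on a three-point grid with cost $|i - j|$; the function names, the regularization strength `eps`, and the iteration count are illustrative choices, not values from the article.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, n_iters=500):
    """Entropy-regularized optimal transport between histograms a and b.

    a, b  : lists of positive weights, each summing to 1
    cost  : cost[i][j] is the cost of moving mass from bin i of a to bin j of b
    eps   : entropic regularization strength (assumed value for this sketch)
    Returns the transport plan as a nested list with plan[i][j] >= 0.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m  # scaling vectors (exponentiated dual variables)
    for _ in range(n_iters):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Expected cost of a transport plan."""
    return sum(plan[i][j] * cost[i][j]
               for i in range(len(plan)) for j in range(len(plan[0])))


# Toy example: two histograms on the grid {0, 1, 2} with cost |i - j|.
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
cost = [[abs(i - j) for j in range(3)] for i in range(3)]

if __name__ == "__main__":
    plan = sinkhorn(a, b, cost)
    print("row sums   :", [sum(row) for row in plan])
    print("column sums:", [sum(plan[i][j] for i in range(3)) for j in range(3)])
    # The unregularized optimum for this example is 0.6; the entropic plan
    # is smoother and therefore costs at least that much.
    print("cost of entropic plan:", transport_cost(plan, cost))
```

Each row of the returned plan, normalized by its row sum, is a conditional distribution over target bins, which is the stochastic transport map that the Kantorovich formulation yields. Smaller values of `eps` bring the plan closer to the unregularized optimum at the price of slower and numerically more delicate iterations.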

decoder; both maps are represented by neural networks. Then, the weights of these neural networks are trained to minimize a reconstruction loss

$$\min_{E,D} \; \mathbb{E}_{x \sim P_X}\big[L(x, D(E(x)))\big]$$

where the loss $L(\cdot)$ is often taken to be the squared $\ell_2$-norm.

Denoting by $E\#$ the pushforward operator, i.e., $E\#P_X$ is the data distribution pushed through the encoder, the autoencoder not only provides an informative low-dimensional representation of the data distribution (namely, $E\#P_X$) but also allows for generating new data points by sampling $z \in \mathbb{R}^d$ and mapping it through the decoder to obtain $D(z) \in \mathbb{R}^p$. This is an important feature for probing and understanding the structure of the latent space. For example, if in the latent space, we identify two clusters with mean $\mu_1$ and $\mu_2$ in $\mathbb{R}^d$ (e.g., corresponding to healthy and diseased), then one can analyze how a data point $x$ from cluster 1 would look if it were to move toward cluster 2 by comparing $x$ with $D(E(x) + \lambda(\mu_2 - \mu_1))$ for a small $\lambda > 0$. This is represented by the cartoon in Fig. 3(a), where taking a step from the cat cluster to the rabbit cluster results in a mixture of a rabbit and a cat, e.g., a “cabbit,” by introducing critical rabbit features into the image. In the single-cell context, cats and rabbits would correspond to two observed cell states, where the “cabbit” would correspond to a cell state that is unobserved in the dataset and lies between the two observed cell states. To obtain better generative properties, variational autoencoders have been introduced [39]. In these autoencoders, both the encoder and the decoder are replaced by stochastic maps $P_{Z|X}$ and $P_{X|Z}$. $P_{Z|X}$ is standardly chosen to be an isotropic Gaussian for computational reasons; as a consequence, the distribution in the latent space has continuous support, thereby allowing random sampling and interpolating between examples.

Motivated by the fact that humans can easily recognize an object even under substantial corruption or occlusion, denoising autoencoders were introduced [40], where $M : \mathbb{R}^p \to \mathbb{R}^p$ defines a stochastic map that adds noise to a sample, and the autoencoder is trained by optimizing

$$\min_{E,D} \; \mathbb{E}_{x \sim P_X}\big[L(x, D(E(M(x))))\big].$$

Such a loss function encourages the autoencoder to learn a map that is robust and can “repair” corrupted samples. Building on this idea, contractive autoencoders were introduced that encourage the autoencoder to learn a map that is contractive at the training examples by adding a term to the loss function that penalizes sensitivity to the input [41]. Such sensitivity can be measured by $\|J_E(x)\|_F^2$,
the squared Frobenius norm of the Jacobian matrix of the encoder with respect to the input, resulting in the following loss function:

$$\min_{E,D} \; \mathbb{E}_{x \sim P_X}\big[L(x, D(E(x))) + \lambda \|J_E(x)\|_F^2\big]$$

where $\lambda > 0$ can be tuned to balance reconstruction versus contractivity properties of the learned map.

Note that if $d \ge p$, then the autoencoder has sufficient parameters to learn the identity map (denoted id), in which case $E \approx D \approx \mathrm{id}$ and, thus, the autoencoder would be useless, since the learned latent representation would be the same as the input representation. Such a phenomenon is considered “overfitting.” When $d < p$, while the reconstruction loss encourages the autoencoder to learn a map similar to the identity, the bottleneck prevents this from happening. The main motivation for introducing denoising as well as contractive autoencoders was to further constrain the learned map and avoid overfitting. Motivated by research in the classification setting showing that neural networks can generalize well despite being overparameterized [42], [43], we considered autoencoders with $d \ge p$. Interestingly, we showed that while such overparameterized autoencoders have the capacity to learn the identity map, they do not “overfit” [44]. To be more precise, we showed that while there are many interpolating solutions, i.e., pairs of functions $(E, D)$ that achieve zero training error

$$\min_{E,D} \; \mathbb{E}_{x \sim P_X}\big[L(x, D(E(x)))\big] = 0$$

the identity map being one of them, an autoencoder trained using gradient descent learns a map that is contractive at the training examples without the need of any additional regularizers. Thus, overparameterized autoencoders are self-regularizing; in fact, overparameterization has a similar effect as the added regularizer in contractive autoencoders. In practice, we have observed that overparameterized autoencoders are often much easier to train and more stable, making them an interesting alternative to the standardly used bottlenecked autoencoders. Such autoencoders will be used in the biological applications discussed in the following.

2) Generative Adversarial Networks: They are a family of neural network models that consist of two parts: a generator that outputs synthetic samples from noise and a discriminator that estimates the probability of a given sample coming from the real data distribution [45]. Optimizing both leads to a minimax problem, a competition where the discriminator works as a critic that is optimized to distinguish between the real and fake samples, while the generator tries to fool the discriminator into believing that the generated images are real. More precisely, let $P_X$ on $\mathbb{R}^p$ denote the data distribution as before and $P_Z$ be a noise distribution on $\mathbb{R}^d$ (often taken to be the uniform distribution on $[0, 1]^d$) with $d < p$; $G : \mathbb{R}^d \to \mathbb{R}^p$ generates fake samples and $C : \mathbb{R}^p \to [0, 1]$ critiques the sample and outputs a probability of it belonging to the real dataset. As before, both maps are represented by neural networks and the weights are optimized to solve the following minimax problem:

$$\min_{G} \max_{C} \; \mathbb{E}_{x \sim P_X}\big[\log C(x)\big] + \mathbb{E}_{z \sim P_Z}\big[\log(1 - C(G(z)))\big].$$

Although GANs have been highly successful in generating realistic images, language, and music, their training can be slow and unstable. One reason for this is that both $P_X$ and $G\#P_Z$ are supported on a low-dimensional manifold (for $P_X$ because real images have been observed to concentrate on a low-dimensional manifold and for $G\#P_Z$ because $d < p$) and thus are disjoint almost surely, meaning that a perfect discriminator can be found [46]. This leads to vanishing gradients and, as a consequence, slow convergence. Another common issue is mode collapse, where the generator may be able to fool the discriminator by outputting very similar images that do not represent the full data distribution.

Interestingly, approaches to overcome these issues are connected to optimal transport, a topic that we will discuss in more detail in the following. In particular, one can easily show that for an optimal discriminator $C^*$, the GAN loss provided above corresponds to the Jensen–Shannon divergence between $P_X$ and $G\#P_Z$ (the distribution $P_Z$ pushed through the generator). If these distributions have disjoint supports, then the Jensen–Shannon divergence is constant (equal to $\log 2$) and thus provides no useful gradient for training the generator. Wasserstein-GANs [47] have been introduced to overcome this issue by replacing the Jensen–Shannon divergence by the Wasserstein distance (also known as the Kantorovich–Rubinstein metric)

$$W(P, Q) := \inf_{R \in \Pi(P,Q)} \mathbb{E}_{(x,y) \sim R}\big[\|x - y\|\big] \tag{1}$$

where $\Pi(P, Q)$ denotes the set of all couplings of $P$ and $Q$, i.e., joint distributions with marginals $P$ and $Q$. $W(P, Q)$ is also known as the Earth mover’s distance and was introduced by Kantorovich in the setting of optimal transport [48] (see also the following); it can be understood as moving a pile of mud (or a probability distribution) $P$ into a pile of mud (or a probability distribution) $Q$ while minimizing the transport cost. Since it is computationally intractable to compute the infimum over joint distributions $\Pi(P, Q)$, Wasserstein-GANs instead optimize the Kantorovich–Rubinstein dual formulation

$$W(P, Q) = \sup_{f : \|f\|_L \le 1} \mathbb{E}_{x \sim P}\big[f(x)\big] - \mathbb{E}_{x \sim Q}\big[f(x)\big] \tag{2}$$

over all 1-Lipschitz functions. Computationally, the Lipschitz constraint is often enforced by clipping the weights of
the neural network [47] or using gradient penalties [49]. Wasserstein-GANs often show improved stability and fewer issues with mode collapse.

3) Flow-Based Models: They are a more recent class of generative models that explicitly learn the data distribution $P_X$ by transforming it through a sequence of invertible functions to a simple latent distribution, such as an isotropic Gaussian [50]. More precisely, a generative flow $F$ consists of a sequence of invertible functions $F = F_1 \circ \cdots \circ F_\ell$ that transform a variable $x \sim P_X$ into a Gaussian variable. Then, the generative process is defined by inverting these functions and mapping Gaussian noise to a sample from $P_X$, i.e.,

$$z_0 \sim \mathcal{N}(\mu, \Sigma), \quad \text{and} \quad z_i = F_i^{-1}(z_{i-1}) \ \text{ for all } 1 \le i \le \ell$$

with $x = z_\ell$. The transformations $F_i$, $i = 1, \ldots, \ell$, are obtained by maximizing the log likelihood, which using the change-of-variables formula can be computed to be

$$\log p(x) = \log p_{\mathcal{N}}(z_0; \mu, \Sigma) - \sum_{i=1}^{\ell} \log \left| \det \frac{\partial F_i^{-1}}{\partial z_{i-1}} \right| \tag{3}$$

where $p_{\mathcal{N}}(z_0; \mu, \Sigma)$ denotes the Gaussian probability density function. It is thus desirable to use transformations $F_i$ for which the inverse can be easily computed and the Jacobian matrix $\partial F_i^{-1} / \partial z_{i-1}$ has a simple form. Various such transformations have been proposed [51]–[53], a prominent one being GLOW, where each transformation $F_i$ is parameterized by an activation normalization layer, followed by an invertible 1 × 1 convolution, followed by affine coupling layers. In this case, it can be shown that the resulting Jacobian matrix is triangular, and thus its log determinant can easily be computed. Consequently, the log likelihood in (3) is tractable and can be optimized efficiently.

While flow-based models have been found to be less prone to training instability and mode collapse, the above flow architectures cannot scale to microscopy images of standard resolution (e.g., 512 × 512). To enable scaling to such images, in recent work [54], we developed a flow-based model building on Haar wavelet image pyramids, which allows generating image features at different spatial resolutions separately. While separate generation of image features at different resolutions has been demonstrated using GANs [55], this was an open problem for flow-based models but is important in order to disentangle coarse features (such as the location of a cell in an image) from fine features (such as the subcellular localization of proteins) and at the same time allows scaling to high resolutions.

B. Optimal Transport

In some applications, we may not only want to model a particular joint distribution and be able to sample from it, as in generative models, but also couple two different distributions, such as the distributions at two time points, e.g., healthy and diseased cell states. Recently, an old concept has been rediscovered for this purpose. Optimal transport is a framework for comparing two distributions by finding a way to “push” or redistribute one distribution to the other while incurring minimal transport cost [56], see Fig. 3(c). Optimal transport was first developed by Monge [57], who was tasked by Napoleon with figuring out how to transport soil between different places for building fortifications so as to minimize the required labor. Despite being over two centuries old, its utility for data science has become apparent only recently with applications spanning computer graphics, computational linguistics, statistics, and biology [58]. In the following, we provide various mathematical descriptions of the problem of optimal transport and review methods for solving this problem. These will form the basis for the development of methods for translating between different time points.

Given two distributions $P$ and $Q$ on $\mathbb{R}^p$ and a transport cost $c : \mathbb{R}^p \times \mathbb{R}^p \to \mathbb{R}_{\ge 0}$, Monge [57] formulated the optimal transport problem as a search over deterministic transport maps $T : \mathbb{R}^p \to \mathbb{R}^p$ that satisfy $T\#P = Q$ and optimize

$$\inf_{T} \; \mathbb{E}_{x \sim P}\big[c(x, T(x))\big]. \tag{4}$$

While this problem has an intuitive interpretation, it is in general nonconvex and may be infeasible. As above, let $\Pi(P, Q)$ denote the set of all couplings of $P$ and $Q$, i.e., joint distributions with marginals $P$ and $Q$. Kantorovich [48] proposed a convex relaxation of this problem by formulating the problem as a search over couplings (also known as transport maps)

$$W(P, Q) = \inf_{R \in \Pi(P,Q)} \mathbb{E}_{(x,y) \sim R}\big[c(x, y)\big] \tag{5}$$

which generalizes (1). It is interesting to note that in contrast to the Monge formulation in (4), the Kantorovich formulation in (5) results in a stochastic transport map given by the conditional distribution of the optimal coupling in $\Pi(P, Q)$. We discussed above that Wasserstein-GANs make use of the dual formulation to the Kantorovich problem [provided in (2)]. For discrete probability distributions, the Kantorovich relaxation in (5) is a linear program, which can be simplified by introducing an entropic regularization term and considering the dual problem. In fact, this dual optimization problem can be solved efficiently using the Sinkhorn algorithm [59]. Building on the entropy-regularized dual problem of (5), stochastic algorithms were developed that could also handle continuous measures [60], [61].

In Section IV, we will discuss how to use optimal transport formulations to transfer single-cell data between different time points. While classical optimal transport as described above does not allow for mass variation, this
Vol. 110, No. 5, May 2022 | P ROCEEDINGS OF THE IEEE 563

Authorized licensed use limited to: CATHOLIC UNIVERSITY OF KOREA. Downloaded on April 15,2023 at 03:14:20 UTC from IEEE Xplore. Restrictions apply.
Uhler and Shivashankar: Machine Learning Approaches to Single-Cell Data Integration and Translation

is a highly restrictive assumption in biological systems since cells can divide and/or die. In fact, understanding which subpopulations become more or less prevalent in the target distribution is often the main goal of the analysis. Unbalanced optimal transport considers generalizations of the theory of optimal transport that allow for mass variation [62]. In particular, the Kantorovich formulation in (5) was extended to the unbalanced setting, and similar to the classical setting, the dual of the entropy-regularized Kantorovich formulation was considered [63]. Based on this dual formulation, generalizations of the Sinkhorn algorithm were developed for solving unbalanced optimal transport in the discrete setting [64], [65]. In recent work, we provided a method based on GANs for solving the unbalanced optimal transport problem that could also handle continuous distributions. In particular, instead of considering the Kantorovich formulation in (5), we extended the Monge formulation in (4) to the unbalanced setting by explicitly modeling mass variation and allowing for the transport map to be stochastic [66]. Allowing stochastic transport maps is natural in biological applications since a cell in a source population may potentially give rise to different cells in a target population. To allow for mass variation, let $\xi : \mathbb{R}^p \to \mathbb{R}_{\ge 0}$ denote a scaling factor, and to allow for stochasticity, let $R$ denote a distribution on $\mathbb{R}^p$ (e.g., the Gaussian distribution). Then, we consider

$$\inf_{T, \xi} \; \mathbb{E}_{x \sim P,\, z \sim R}[c_1(x, T(x, z))\, \xi(x)] + \lambda\, \mathbb{E}_{x \sim P}[c_2(\xi(x))] \tag{6}$$

subject to the constraint that $T \# (\xi \# P \times R) = Q$, where $c_1 : \mathbb{R}^p \times \mathbb{R}^p \to \mathbb{R}_{\ge 0}$ denotes the transport cost and $c_2 : \mathbb{R}^p \to \mathbb{R}_{\ge 0}$ the scaling cost and $\lambda \ge 0$ balances transport cost versus mass variation cost. To provide more intuition for the first term in (6), consider the deterministic setting where we remove $z \sim R$; then, the term simplifies to $\mathbb{E}_{x \sim P}[c_1(x, T(x))\, \xi(x)]$, which is the cost of transporting $x$ to $T(x)$ scaled by $\xi(x)$, which describes how much more or less prevalent $x$ becomes in the target distribution. Since the equality constraint $T \# (\xi \# P \times R) = Q$ is challenging to satisfy from a computational perspective, it can be replaced with a divergence penalty. The problem can then be solved using stochastic gradient descent and was applied to lineage tracing of cells during Zebrafish embryogenesis based on single-cell RNA-seq data [66].

C. Causal Inference From Interventional Data
Unlike classical machine learning applications, such as image classification or online advertising in which the goal is to optimize predictive performance, in biology, there are natural laws to be discovered, phenomena are physically interpretable, and predictive accuracy is often not sufficient, but identifying the underlying regulatory or causal mechanisms is the ultimate goal. The foundations of a principled statistical framework of causality were laid by two crucial advances made independently in the 1920s. Neyman [67] established a formal distinction between ordinary random variables and random variables under randomization via the potential outcome notation. Wright [68], [69] pioneered the use of graphs to represent the cause–effect relationships using structural equation models. However, skepticism among statisticians resulted in the causal interpretation of structural equation models being overlooked and almost forgotten (see [70] for a historical account). The reemergence of causal inference in statistics began in the 1970s driven by major contributions by Pearl [71], Robins [72], Rubin [73], [74], and Spirtes et al. [75].

Since in many fields, it has been unethical, too expensive, or even impossible to perform large-scale interventional studies, research traditionally concentrated on the purely observational setting. The development of genome editing technologies [26], [76] and the explosion of interventional data from genetic or chemical perturbation experiments in biology [28], [31], [77] call for a theoretical and algorithmic framework for causal inference from a mix of observational and interventional data. In the following, we will first discuss the problem of causal structure discovery in this setting, i.e., how to learn a causal graph from observational and interventional data on the nodes. Without any constraints on the causal graph, interventions on every node are needed to fully identify the graph. In most applications, such complete interventional data are not available, but one has access to partial interventions in different related cell types or states (i.e., domains). It is thus of interest to infer the effect of an intervention in a particular domain from other observed intervention-domain pairs. We will subsequently review this causal imputation problem.

1) Causal Structure Discovery: It is the problem of learning a causal graph from data and has been the focus of much recent work in causality [78], [79]. We represent a causal network by a directed graph $G = (V, E)$ consisting of vertices $V = \{1, \ldots, p\}$ and directed edges $E$ representing direct causal relationships. We make the common assumption that $G$ is a directed acyclic graph (DAG), meaning that there are no directed cycles $i_0 \to i_1 \to \cdots \to i_m \to i_0$, since causal effects only act forward in time. In a structural equation model [68], [69], each node $i \in V$ is associated with a random variable $X_i$ and is a deterministic function of its parents, denoted by $\mathrm{pa}(i)$, and independent noise $\epsilon_i$. For example, a structural equation model on the four-node DAG $1 \to 2$, $2 \to 3$, $3 \to 4$, and $1 \to 4$ is given by

$$X_1 \leftarrow f_1(\epsilon_1), \quad X_2 \leftarrow f_2(X_1, \epsilon_2), \quad X_3 \leftarrow f_3(X_2, \epsilon_3), \quad X_4 \leftarrow f_4(X_1, X_3, \epsilon_4).$$

A structural equation model not only encodes the observational distribution, i.e., the distribution of $X = (X_1, \ldots, X_p)$, but also the interventional distributions.
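As a minimal sketch of the four-node structural equation model above, the following Python code samples from it and from one interventional distribution. The linear-Gaussian form of the functions $f_i$, the coefficients, and the name `sample_sem` are illustrative assumptions and not taken from the article; the intervention shown replaces the structural equation of $X_3$ by a constant while keeping all other equations.

```python
import numpy as np


def sample_sem(n, rng, do_x3=None):
    """Sample n draws from an SEM on the DAG 1->2, 2->3, 3->4, 1->4.

    Assumption: each f_i is linear with additive Gaussian noise; the
    coefficients below are arbitrary illustrative choices.
    If do_x3 is given, the equation for X3 is replaced by that constant,
    and every other structural equation is left unchanged.
    """
    eps = rng.normal(size=(n, 4))          # independent noise eps_1..eps_4
    x1 = eps[:, 0]                          # X1 <- f1(eps_1)
    x2 = 0.8 * x1 + eps[:, 1]               # X2 <- f2(X1, eps_2)
    if do_x3 is None:
        x3 = -1.5 * x2 + eps[:, 2]          # X3 <- f3(X2, eps_3)
    else:
        x3 = np.full(n, float(do_x3))       # intervened: X3 set to a constant
    x4 = 0.5 * x1 + 2.0 * x3 + eps[:, 3]    # X4 <- f4(X1, X3, eps_4)
    return np.column_stack([x1, x2, x3, x4])


rng = np.random.default_rng(0)
obs = sample_sem(50_000, rng)               # observational distribution
itv = sample_sem(50_000, rng, do_x3=1.0)    # interventional distribution

# The downstream variable X4 shifts under the intervention on X3,
# whereas the upstream variables X1 and X2 keep their distribution.
print("mean of X4 (observational): ", obs[:, 3].mean())
print("mean of X4 (X3 set to 1.0): ", itv[:, 3].mean())
print("var of X2 (obs, intervened):", obs[:, 1].var(), itv[:, 1].var())
```

In this sketch, the same set of equations generates both datasets; only the equation of the intervened node changes, which is what allows a single model to describe the observational and the interventional distributions.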


A theoretical framework for modeling different types of interventions was developed in [80]: A hard (also known as perfect, ideal, or deterministic) intervention assumes that all causal dependencies between intervened targets and their causes are removed (as in Pearl’s do-calculus [71]), an example being a perfectly performed gene knockout experiment, where the expression of a gene is set to zero, and hence, all interactions between this gene and its upstream regulators are eliminated. In practice, interventions often only modify the causal dependencies between a targeted node and its causes without fully removing them. An example of such a soft intervention is a gene knockdown experiment. In both cases, it is important to note that intervening on a node is generally not the same as conditioning on it; for example, in the two-node DAG $X_1 \to X_2$, intervening on $X_2$ does not change the distribution of $X_1$, while conditioning on $X_2$ might, i.e., in general $P(X_1) \neq P(X_1 \mid X_2)$; see [71] for an introduction.

A structural equation model provides a factorization of the joint distribution, which implies certain conditional independence (CI) relations through the Markov property, namely, $X_i \perp\!\!\!\perp X_{\mathrm{nd}(i)} \mid X_{\mathrm{pa}(i)}$, where $\mathrm{nd}(i)$ denotes the nondescendants and $\mathrm{pa}(i)$ denotes the parents of node $i$ (see, e.g., [81] for an introduction to graphical models). A standard approach for causal structure discovery is to infer CI relations from the sample distribution and then infer the causal DAG from these relations. However, in general, the causal DAG is not identifiable since multiple DAGs can encode the same set of CI relations [82]. Interventional data can increase identifiability in a causal graph. For example, the edge direction in the two-node DAG $X_1 \to X_2$ is not identifiable from purely observational data but can, for example, be identified from an intervention on node $X_1$, since under the true model, the distribution of $X_2$ will change under such an intervention, while if the DAG was $X_2 \to X_1$, then the distribution of $X_2$ would not change by intervening on $X_1$. A characterization of causal edges that are identifiable from interventional data was first presented in [81] for hard interventions [83], and we subsequently showed that the same results also hold for soft interventions [84]. This has interesting consequences for the design of perturbations in genomics. Namely, this means that the more invasive hard interventions such as gene knockout experiments generally do not provide more information about the underlying causal regulatory network than soft interventions such as gene knockdowns.

Since the overwhelming majority of available data so far has been observational, much research in causality has concentrated on this setting. A standard approach to causal structure discovery is constraint-based, i.e., to treat causal inference as a constraint satisfaction problem, with the constraints being the CI relations inferred from the data, a prominent example being the PC algorithm [75]. A different approach that often performs better in practice is score-based, a prominent example being greedy equivalence search (GES) [85], which greedily optimizes a penalized likelihood score over causal models. More recently, we showed that a greedy search over a smaller search space, namely, the space of permutations instead of the space of graphs, could also be used for consistent causal structure discovery [86], [87]. Given the surge of interventional datasets in various fields including genomics, advertising, and online education, causal structure discovery algorithms have been developed, which can make use of a mix of observational and interventional data. The first such approach was an interventional adaptation of GES [83]. Unfortunately, this approach is in general not consistent, i.e., it generally does not output the correct causal model even with infinite samples [88]. Interestingly, greedy search over permutations can easily be extended to the interventional setting to give rise to a consistent algorithm when hard or soft interventions are available [84], [88]. In fact, we showed recently that such an approach can also learn the intervention targets simultaneously with learning the causal model [89]. This is critical for applications to learning gene regulatory networks from CRISPR perturbations, which are known to have off-target effects.

Given the large space of possible perturbations in genomics (e.g., humans have 20 000 genes and just performing all combinations of knockouts on three genes would mean $20\,000^3$ experiments), an interesting line of research going forward is to identify the most informative perturbations and develop active learning strategies for causal structure discovery. Most existing works in experimental design for causal DAGs are focused on identifying the causal DAG uniquely while minimizing some cost associated with doing experiments [90]–[93]. However, in many applications, the budget is fixed and the goal is to choose the best interventions within the budget constraints in an active fashion. Toward this, two recent approaches have been proposed, which greedily optimize different objective functions, namely, mutual information [94] or the number of oriented edges in the causal graph [95]. While it is now possible to perturb multiple genes at once, both approaches only consider single-node interventions. In general, a set of $q$-node interventions may orient up to $q$ times more edges in a DAG than single-node interventions, but allowing for multinode interventions leads to an exponentially larger search space. Very recently, novel submodularity arguments were established to deal with such doubly exponential search domains and obtain greedy algorithms with approximation guarantees also in the multinode intervention setting [96]. Much work remains to be done to make such intervention design algorithms scalable to thousands of nodes for applications to genomics.

2) Causal Imputation: In large-scale genetic and chemical screens such as the CMAP dataset [97], there are over 40 000 different perturbations and over 70 different cell types, leading to ∼3 million possible pairs, only ∼200 000 of which have been measured. When the effect of each


perturbation on each cell type is completely unrelated, then every single pair would need to be measured. However, in many applications, it is reasonable to assume underlying structure to the data matrix so that we may be able to predict the effect of an intervention in a domain where it has not been observed. This leads to a problem that we label causal imputation: predicting the effect of perturbations across different “domains” (where a “domain” refers to an experimental condition, such as a cell type or a disease state), given a partial dataset describing the effects of some perturbations on some domains. This encompasses the traditional causal transportability problem [98], where the goal is to predict the effect of a particular intervention in a target domain, given the effect of that intervention in one or more source domains. Interestingly, necessary and sufficient conditions for causal transportability of an effect between different domains have been provided, given that the underlying causal graph as well as the targets of the intervention are known [98]. Unfortunately, in biological applications, the underlying causal graph is generally unknown, and thus, causal imputation methods need to be developed, which do not require such complete background knowledge.

One approach is to make use of the causal structure discovery methods described above and then apply results on causal transportability. However, learning the underlying causal graph requires huge sample sizes [99], while knowing the full causal graph may not even be required for causal imputation. The causal imputation problem is also closely related to techniques for estimating counterfactuals. Synthetic control [100] is an especially popular approach from the policy evaluation literature, which estimates the counterfactual trajectory of an intervened entity, had the intervention not taken place. The key idea behind this approach is to express the counterfactual trajectory for the target entity as a linear combination of the observed trajectories of entities that did not receive the intervention; the linear coefficients can be picked via regression on preinterventional data, time-independent covariates, or both. The recent framework of synthetic interventions [101] extends the synthetic control approach to predict the counterfactual of any outcome (e.g., the outcome under any intervention or control), by viewing the problem as a matrix completion task (e.g., with rows corresponding to interventions and columns to domains). Matrix completion is a fundamental problem in machine learning, a prominent application being the Netflix challenge of inferring movie preferences from sparsely populated matrices of user ratings. Standard approaches to matrix completion, such as nuclear norm minimization [102]–[104] or deep matrix factorization [105], aim for a completion that yields a low-rank matrix. While such methods can be effective in applications such as the Netflix challenge, where low rank can capture user similarity, such an objective function can lead to ineffective solutions for applications, including drug response imputation, where the imputation for a new drug using a low-rank objective would replace all missing entries with a fixed constant [106]. Thus, there is a need for more general approaches to matrix completion that can easily adapt to the structures in different applications, a recent approach being to encode the relationships between coordinates of the matrix, akin to semisupervised learning [106]. Since matrix completion is not inherently causal, an interesting problem with some initial work [107] is to understand for what causal models such an imputation procedure works and, more generally, to connect the synthetic interventions approach to work on causal transportability.

III. MULTIDOMAIN DATA INTEGRATION AND TRANSLATION USING AUTOENCODERS
Integrating different data modalities taken from a population of cells is critical in order to understand single-cell heterogeneity and its function [4]. To appreciate the difficulty of this problem, note that batch correction (or more generally domain adaptation) is a special case, where the same data modality is measured in different experiments. Recently, various methods have been developed for integrating data from the same modality (e.g., for batch correction or domain adaptation) [108]–[112] or from data modalities with similar data structures such as sequencing-based datasets [113]–[116]. These methods range from methods for image-to-image translation using GANs, to the use of canonical correlation analysis to identify relationships between single cells from different datasets based on the same set of genes, to methods based on autoencoders to map to a shared latent space (described in detail next). These methods allow relating gene expression (measured using RNA-seq [20], [21]) to DNA accessibility (measured using ATAC-seq [23]) as well as modifications of the DNA (measured using ChIP-seq [22]). While such datasets are often unpaired, i.e., obtained in different cells, experimental developments in recent years have made paired sequencing measurements possible [117], [118], thereby allowing us to validate the computational methods.

Single-cell imaging modalities are often cheaper to obtain than sequencing modalities and can provide highly complementary information on the architecture of cells. While for the integration of sequencing modalities, such as RNA-seq and ATAC-seq, genes represent a common coordinate system, the integration of imaging and sequencing data is a significantly more difficult task since there is no direct approach to map image pixels to genes and vice versa. In addition, imaging and sequencing datasets are usually unpaired, and thus, supervised approaches cannot be applied. Importantly, a successful approach would also combine the benefits of prior approaches that provide a joint embedding of the different datasets (as is the case for canonical correlation analysis [108], [113]) in order to be able to perform downstream analysis across different data modalities and be generative (as is the case for


Fig. 4. Multidomain data integration and translation using autoencoders. (a) Schematic showing how different data modalities are mapped
to an integrated latent space via autoencoders that are specialized to each data modality. (b) Translation between different data modalities
is achieved using the encoder from one modality and the decoder from another modality. (c) Autoencoder framework applied to single-cell
images and RNA-seq data in T-cells. (d) Autoencoder framework applied to the CelebA dataset to translate black-haired women into
blond-/brown-/black-haired women and black-/blond-haired men demonstrating the generality of this method. Figures adapted from
[119] and [120].

GANs [112]) in order to be able to translate from one modality to another and generate the missing modality of a cell.

In recent work, we presented a computational framework using autoencoders for integrating and translating between different data modalities with very distinct structures and without the need for paired datasets [119], [120]. In these studies, each data distribution is mapped to a common latent distribution using an autoencoder [see Fig. 4(a)]. To properly align the embeddings of each data modality, we matched the distributions obtained by the different autoencoders in the latent space using adversarial training [121]. To be more precise, let $E_i$ and $D_i$ denote the encoder and decoder of the autoencoder that maps modality $i$ into the shared latent space, respectively, $P_{X_i}$ denote the distribution given by each dataset, and $E_i \# P_{X_i}$ denote the distribution obtained after embedding the data from modality $i$ into the latent space. The autoencoders are trained by minimizing a weighted sum of two losses (to keep notation simple, we discuss this for the case with just two data modalities)

$$\min_{E_1, D_1} \; \mathbb{E}_{x \sim P_{X_1}}\left[L_1(x, D_1(E_1(x))) + \lambda L_2(E_1 \# P_{X_1}, E_2 \# P_{X_2})\right]$$

$$\min_{E_2, D_2} \; \mathbb{E}_{x \sim P_{X_2}}\left[L_1(x, D_2(E_2(x))) + \lambda L_2(E_1 \# P_{X_1}, E_2 \# P_{X_2})\right]$$

where the first loss is a standard reconstruction loss and the second loss ensures that the distributions from the different data modalities are matched in the latent space. This can, for example, be done using the discriminative loss

$$L_2(P, Q) := \max_{f} \; \mathbb{E}_{x \sim P}[\log(f(x))] + \mathbb{E}_{x \sim Q}[\log(1 - f(x))]$$

which tries to identify the best possible classifier $f$ for distinguishing samples that come from $P$ versus $Q$. Minimizing the discriminative loss means identifying a joint embedding where even the best classifier cannot distinguish between samples from the two data modalities,


thereby ensuring that the two modalities are matched in the latent space.

Importantly, such an autoencoder framework allows integrating unpaired data modalities of very different structures using specialized neural network architectures for each modality (e.g., convolutional neural networks for images, fully connected networks for sequencing data, and graph neural networks for Hi-C data), see Fig. 4(b). If paired data are available, a loss can be added to encourage the embeddings of the paired samples to be close in the latent space. In addition, if the information on clusters (e.g., different cell types) or markers (e.g., a differentiation marker) is available in each modality, the discriminative loss can be conditioned on these factors in order to match clusters and align markers to obtain a better latent space embedding. The joint latent space enables performing downstream analysis effectively across all the different modalities at once.

In addition, using the encoder and decoder modules of different autoencoders allows for an efficient translation between different data modalities that cannot yet be measured together in the same cell [see Fig. 4(b)]. For example, translating between imaging data (containing information on DNA packing) and sequencing data can provide unique insights into structure–function links that cannot be obtained experimentally [see Fig. 4(c) and Case Study 1]. We also illustrate this via an application of this autoencoder framework to the CelebA dataset in Fig. 4(d), where the different “data modalities” correspond to actors (male or female) with different hair colors, and we translate from black-haired female actors (real images on the left) to blond-/brown-/black-haired female actors and black-/blond-haired male actors (all images on the right are generated).

The ability to perform multidomain data translation also provides a unique opportunity for single-cell biological applications where obtaining data is easier in one data modality than in the other. For example, alterations in DNA packing are a hallmark of many diseases, including aging, cancer, fibrosis, and neurodegeneration [122]; we and others have shown that morphometric features of DNA packing can be identified and used as powerful disease diagnostic measures [123]. However, many important applications, including the identification of new therapeutic targets, require linking these morphometric features to functional features. Given that the imaging space is often much easier to explore experimentally, multidomain data translation methods provide unique opportunities to link the imaging and sequencing modalities to trace disease progression and identify novel therapeutic avenues.

Case Study 1 (Translating Between Single-Cell Images and RNA-seq During T-Cell Activation): T-cell activation is a fundamental biological process involved in eliciting immune response [124]. A critical step in the activation process is that the T-cells in our blood start proliferating upon recognizing infection signals. To achieve this, the T-cell gene expression programs are altered. Recent studies have shown that such activation processes are also accompanied by changes in DNA packing to facilitate large-scale gene expression changes [125]. In particular, these studies identified that naive T-cells exhibit two distinct DNA packing patterns (one with more centrally condensed DNA and the other with more peripherally condensed DNA) in about equal proportions, while upon activation, one of the patterns becomes highly dominant (the one with more centrally condensed DNA). Experiments also showed that the two subsets of naive T-cells are functionally distinct; namely, cells with centrally condensed DNA were more mechanically pliable and more prone to activation [125]. We hypothesized that the differential DNA patterns would also correspond to distinct gene expression patterns. Indeed, when analyzing single-cell RNA-seq data of naive T-cells, we identified two distinct subclusters, one being closer to the gene expression profile of activated T-cells [119]. These separate imaging and sequencing experiments provided an interesting application for translating between very distinct data modalities in order to connect structure to function. Fig. 2 shows the translation between these data modalities. In particular, this allowed us to identify marker genes that are specific to the particular DNA packing patterns. Such multidomain data integration methods provide powerful and comprehensive insights into the complex T-cell activation process.

IV. LINEAGE TRACING FROM SNAPSHOTS IN TIME USING OPTIMAL TRANSPORT
Cell-state transitions are a major hallmark of development and disease [126]. To better understand these processes, it is critical to be able to trace single cells within heterogeneous populations as they undergo cell-state transitions. However, with current experimental methods, it is often not feasible to perform time-course single-cell experiments over time scales relevant to development or disease progression. Developing computational methods to model single-cell trajectories has therefore become essential. This problem, also known as lineage tracing or pseudotemporal ordering, requires the identification of all ancestors and descendants of a given cell. Current experimental methods can provide high-throughput and high-dimensional observations of single-cell states, either using imaging or sequencing methods, in different populations of cells at different time points of development or disease progression. In this context, a computational framework to infer lineages of cells from snapshot datasets of the cell population at different time points is critical to understand the functional implications of cell-state transitions.

Recent research has explored a number of strategies to infer pseudo-lineages from snapshots in time based on single-cell transcriptomic measurements; methods include Monocle [127], [128], SLICE [129], Waterfall [130], TSCAN [131], SCUBA [132], Wanderlust [133], Wishbone [134], PAGA [135], PBA [136], and others. To


infer single-cell trajectories, these methods use a combination of PCA, independent component analysis (ICA), or t-distributed stochastic neighbor embedding (t-SNE) to first embed the high-dimensional datasets into a low-dimensional space and then apply various cluster- or graph-based approaches together with curve-fitting methods to infer the trajectories. The differences in these strategies mainly result from different assumptions about the nature of the underlying biological process, such as a limited number of trajectories or branching points. While information on the order (i.e., time) between the different snapshots is often available, these methods do not explicitly use this information for inferring the pseudo-lineages. A more recent approach that takes time information into account is Waddington-OT [137], which proposes the use of optimal transport, after dimension reduction as in the previous methods, to reconstruct cell trajectories from single-cell RNA sequencing experiments taken at different time points during the process of reprogramming. By assuming that the data distributions are discrete, a transport map (balanced or unbalanced) is learned between the data distributions of consecutive time points using the Sinkhorn algorithm. This approach provides more flexibility than previous methods in that it does not make assumptions on the number of trajectories or branching points between consecutive time points, and it has provided unique insights into single-cell trajectories during cellular reprogramming.

Critical in any method for lineage tracing is how the similarity between two cells is measured. All the above approaches rely on classical methods for dimension reduction such as PCA, ICA, or t-SNE and then measure similarity between cells in this reduced feature space using the Euclidean distance. Features learned using linear methods such as PCA or ICA may be insufficient since the real data distribution is likely to lie on a complex low-dimensional manifold. While the above approaches have been applied to sequencing data, this limitation becomes particularly apparent when attempting to learn single-cell trajectories based on imaging data. While genes provide a natural coordinate system for sequencing data, pixel i in one image usually has no relation with pixel i in another image, and thus, the Euclidean distance in a linearly transformed feature space of the images is not adequate to capture meaningful semantic relationships between cells. While t-SNE allows for a nonlinear embedding, the transformation is not reversible and therefore does not allow for interrogating the latent space.

Given what we have discussed in Sections II and III, it is natural to consider the use of an autoencoder to embed the datasets from the different time points into a joint latent space and then perform optimal transport in this latent space, an approach that we proposed in [138] to infer pseudo-lineages from DAPI-stained single-cell images during breast cancer progression (see Fig. 5 and Case Study 2). Importantly, an autoencoder framework allows for using a cost function that is directly learned from the data. This is critical since the inferred transport maps depend heavily on the choice of the cost function for matching distributions between time points. More precisely, our approach, ImageAEOT, proceeds as follows. First, the single-cell imaging datasets are embedded into a joint coordinate system using an autoencoder (see Fig. 5(a) and Section II-A). Subsequently, optimal transport is used (see Fig. 5(b) and Section II-B) to learn a probabilistic coupling between different time points and trace cell lineage trajectories. By decoding the trajectories in the latent space back to the original image space [see Fig. 5(c)], we can interrogate the structural cellular features that are changing along the predicted lineage trajectories, thereby providing important information on cell architectural changes that accompany cell-state transitions.

An important assumption in the use of optimal transport methods is that the time resolution of the data is high enough to capture the critical cell-state transitions and that cells move along straight lines in the latent space learned by an autoencoder. If a straight line does not fit the different time points in the latent space, then a curve could be fit and distances along this curve could be used for the transport cost. A complementary approach is to use overparameterized autoencoders [44]. In recent work, we showed that such autoencoders learn a latent space that stretches along the main axis of variation [139], which in time-varying processes is likely to be along the time axis. Stretching along this direction would lead to a latent space where the different time points are better aligned, and thus, optimal transport could be applied using the Euclidean distance as a cost function. In addition, unbalanced optimal transport can be used to account for cell proliferation and/or death and analyze which subpopulations of cells become more or less prominent in earlier or later time points. Using our unbalanced optimal transport framework described in Section II-B, we were able to use the inferred scaling factor ξ(·) of each cell to identify cells that are prone to differentiation during zebrafish embryogenesis as well as genes that are critical to this process [66]. If different data modalities are available at different time points, then the multimodal data integration methods as described in Section III can be combined with optimal transport in the latent space in order to identify and analyze the features in each data modality that are critical to the underlying cell-state transitions.

Case Study 2 (Lineage Tracing During Breast Cancer Progression Using Single-Cell Images): Breast cancer initiation and progression is a slow process involving multiple cell-state transitions. In particular, this involves breast epithelial cells, for example, in the ducts, accumulating mutations and transitioning into more mesenchymal (i.e., migratory) states and eventually turning into invasive cancer cells. Since these transitions occur over long periods, following a population of cells throughout disease progression is currently experimentally infeasible. The current approaches to track disease progression involve carrying out biopsies. This provides datasets showing snapshots


Fig. 5. Lineage tracing from snapshots in time using a combined autoencoder and optimal transport framework. (a) Overview of method
consisting of three steps. (b) Encoding the images from different snapshots in time into a common latent space using an autoencoder.
(c) Learning a transport map between the different time points using (unbalanced) optimal transport. (d) Decoding points in the latent space
back to single-cell images of earlier cell states generated from metastatic cells. Figures adapted from [66] and [138].
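
The transport step of the pipeline summarized in Fig. 5 can be illustrated with a minimal sketch. It assumes that the cells from two time points have already been encoded into a shared latent space; here the latent codes are synthetic Gaussian point clouds, not real autoencoder outputs. The function names (`sinkhorn_coupling`, `predicted_descendants`), the squared Euclidean cost, and the regularization strength `epsilon` are illustrative assumptions and are not taken from ImageAEOT. The sketch covers only balanced, entropy-regularized transport, not the unbalanced variant.

```python
import numpy as np


def sinkhorn_coupling(x, y, epsilon=0.1, n_iters=500):
    """Entropy-regularized OT coupling between latent point clouds x (n, d) and y (m, d).

    Uniform weights on both clouds and a squared Euclidean cost are assumed.
    """
    n, m = x.shape[0], y.shape[0]
    a = np.full(n, 1.0 / n)  # mass of each earlier-time cell
    b = np.full(m, 1.0 / m)  # mass of each later-time cell
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()  # rescale so that epsilon is on a comparable scale
    kernel = np.exp(-cost / epsilon)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        # Alternate scaling updates; the last update makes column sums match b.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def predicted_descendants(coupling, y):
    """Barycentric projection: expected later-time latent code of each earlier cell."""
    return (coupling @ y) / coupling.sum(axis=1, keepdims=True)


# Synthetic stand-ins for latent codes at two consecutive time points.
rng = np.random.default_rng(0)
latent_t0 = rng.normal(size=(50, 2))
latent_t1 = rng.normal(size=(60, 2)) + np.array([3.0, 0.0])

coupling = sinkhorn_coupling(latent_t0, latent_t1)
descendants = predicted_descendants(coupling, latent_t1)
```

In a full pipeline, each row of `descendants` would be passed through the decoder to visualize the predicted later state of a cell. Replacing the uniform weights with learned per-cell scaling factors would be one way to move toward the unbalanced setting described in the text.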

of different populations in time. A major challenge is to develop computational approaches to use these datasets and generate at the single-cell level predictions of what a cell would look like at earlier or later time points during disease progression. Such an analysis could provide novel biomarkers for early onset of disease. Fig. 5 shows a 3-D cell culture model and the resulting DNA images of breast cancer cell lines at various stages of tumor progression, i.e., normal, fibrocystic, cancerous, and metastatic. By embedding these images into a joint latent space and connecting the distributions via optimal transport, we were able to generate cell lineages as they pass through cell-state transitions. We thereby identified specific features in the DNA packing that are altered during disease progression and could be used as novel disease biomarkers.

V. FROM SINGLE-CELL PERTURBATIONS TO GENE REGULATION AND DRUG DISCOVERY USING CAUSAL INFERENCE

The ultimate goal in biological applications is often to obtain a mechanistic understanding of a process of interest. At a cellular level, this often means understanding how genes regulate each other to give rise to a particular phenotype. For example, comparing such gene regulatory networks between a diseased and normal cell state can provide insights into candidate targets for therapeutic interventions. Predicting the effect of an intervention requires taking a causal approach. Given that performing large-scale perturbations is possible in biology [28]–[31], [77], [97] but the space of domains (cell types/states) and perturbations is huge (see Section II-C), the question then often becomes how to transfer the effect of a perturbation from one domain to another [e.g., given expression data from a large-scale drug screen in cancer such as CMap [97], predict the effect of the screened drugs on SARS-CoV-2 infected cells to identify drugs that could be repurposed against COVID-19, see Fig. 2(c)] or how to predict the effect of an unseen intervention from the observed ones in a fixed domain [e.g., given imaging data from a large-scale drug screen in cancer such as Cell Painting [30], [31], predict the effect of new molecules on cancer cells to identify new drugs, see Fig. 2(d)]. Viewing large-scale perturbation screens as a partially observed matrix of perturbations × domains, both of these transfer questions are causal imputation problems, where the former asks to


Fig. 6. From single-cell perturbations to gene regulation and drug discovery using causal inference. (a) Overparameterized autoencoder
framework for aligning the effect of a perturbation (e.g., drug) across different cell types. (b) Overparameterized autoencoder shows better
alignment of drugs across different cell types (A549 and MCF7) than standard bottlenecked autoencoders while also being able to perfectly
reconstruct the data. (c) Identifying drugs that could revert the effect of a disease on cell states. (d) Causal graph inferred from single-cell
RNA-seq data to identify the effect of targeting specific genes (e.g., RIPK1) for drug discovery. Figures adapted from [140].
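
The idea in Fig. 6(a) of treating a drug effect as a direction in latent space can be illustrated with a minimal sketch. It assumes that latent codes are already available; here they are synthetic and constructed so that the drug shift is shared across cell types, which is precisely the alignment assumption that has to hold in practice. The helper names (`drug_signature`, `transfer_drug_effect`, `signature_alignment`) are hypothetical, and applying the signature by simple vector addition is one possible reading of the latent-space vector arithmetic, not necessarily the exact operation used in the cited work.

```python
import numpy as np


def drug_signature(latent_control, latent_treated):
    """Mean latent shift induced by a drug within one cell type."""
    return latent_treated.mean(axis=0) - latent_control.mean(axis=0)


def transfer_drug_effect(latent_control_target, signature):
    """Apply a drug signature to control cells of another cell type by vector addition."""
    return latent_control_target + signature


def signature_alignment(signature_a, signature_b):
    """Cosine similarity between two drug signatures; values near 1 indicate alignment."""
    norm_product = np.linalg.norm(signature_a) * np.linalg.norm(signature_b)
    return float(signature_a @ signature_b / norm_product)


# Synthetic latent codes: the drug shift is shared across cell types by construction.
rng = np.random.default_rng(0)
latent_dim = 8
true_shift = rng.normal(size=latent_dim)

control_a = rng.normal(size=(100, latent_dim))  # cell type A, untreated
treated_a = control_a + true_shift + 0.05 * rng.normal(size=(100, latent_dim))
control_b = rng.normal(loc=2.0, size=(80, latent_dim))  # cell type B, untreated

signature_a = drug_signature(control_a, treated_a)
predicted_treated_b = transfer_drug_effect(control_b, signature_a)
```

With real data, the predicted latent codes would be decoded back to expression space, and `signature_alignment` could be evaluated on drugs observed in both cell types to check whether the alignment assumption is plausible before transferring unseen drugs.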

complete across columns (domains) and the latter across rows (perturbations).

Various prior works have considered this causal imputation problem as a matrix completion task. In particular, the large-scale perturbation screen, CMap [97], motivated various computational approaches for matrix completion: A low-rank matrix completion approach as well as a weighted nearest neighbor scheme for predicting missing drug/cell-type combinations were considered in [141]. At a high level, the goal is to develop methods that can use the given observations to identify similarities/structure between drugs as well as between cell types to fill in the missing entries. This was done explicitly in [106] by taking an approach akin to semisupervised learning and encoding the similarities between drug-cell-type pairs. Additional datasets can be used to infer such similarities; for example, representations of molecules using graph neural networks have been found to be extremely powerful for predicting chemical properties [142]; such representations were combined with a flow-based approach in [54] to predict the effect of new drugs on single-cell images and solve the causal imputation task on the Cell Painting dataset [30], [31]. Also, the synthetic interventions approach in [107] makes this explicit by writing a given intervention or domain as a linear combination of other interventions/domains. Taking inspiration from generative models that are commonly used for style transfer in image-to-image translation tasks [112], [143], an interesting approach was proposed in [144] and [145] that overcomes the linearity assumption in the synthetic interventions approach. In particular, the effect of a drug is represented by a vector in the latent space of a GAN or an autoencoder and is seen as a “style” that can be applied to a new cell type by appending this vector to the cell type in the latent space to generate the effect of a drug on this cell type [see Fig. 6(a)]. Note that such an approach only works if the effect of a drug on different cell types is aligned in the latent space. Given our recent work describing various benefits of using autoencoders to learn a latent representation of the data that is higher dimensional than the original space [44], [139], it is natural to consider overparameterized autoencoders for this task; we found that such autoencoders not only lead to better reconstruction of the data than bottlenecked autoencoders but also to a better alignment of drug signatures between different cell types (see Fig. 6(b) and [140]). An approach based on vector arithmetic in


the latent space also provides an avenue for predicting the effect of combinations of perturbations [146], an important research problem going forward.

Approaches to causal imputation based on matrix completion described above usually treat the rows (perturbations) and columns (domains) in the same way. However, it is important to note that this is inappropriate from a causal perspective. The causal imputation problem asks to predict the effect of an intervention (perturbation) when conditioning on a particular domain (cell type/state). As described also in Section V, intervening and conditioning on a variable are different operations. An important open problem is to develop generative models that can distinguish between these two operations and obtain truly causal representations.

We now consider a specific instance of the causal imputation problem, which is particularly difficult to solve; namely, consider the transport problem in Fig. 2(d) of predicting the effect of a particular unseen perturbation in a given domain from observed perturbations in this domain and assume that the perturbation of interest has not been observed in any other domains (i.e., no prior data are available on this perturbation). If the perturbation is of chemical nature, then the molecular structure can be used to relate the perturbation of interest to other observed perturbations and thereby solve the causal imputation problem (see above and also [54]). However, if the perturbation of interest is of genetic nature, the problem becomes more difficult. Given interventional data based on interventions on a subset of genes I ⊂ {1, . . . , p}, how can we predict the effect of an intervention on gene j ∉ I or a subset of genes J ≠ I? This question is, for example, of great importance for cell reprogramming to identify combinations of transcription factors that can give rise to cell-state transitions [147], [148], e.g., early exhaustive experiments identified four transcription factors, known as Yamanaka factors, that when overexpressed could reprogram somatic cells into induced pluripotent stem cells [149]. A computational approach to this problem requires relating the effect of an unobserved intervention on gene j to observed interventions on genes I. The effect of an intervention on gene i is similar to the effect of an intervention on gene j if and only if i and j are close in the causal DAG representing the gene regulatory network (where a directed edge from gene i to j means that (the product of) gene i directly regulates (the product of) gene j). In recent work, we thus used causal structure discovery algorithms that can make use of a mix of observational and interventional data (see Section II-C) to learn the regulatory network and predict the effect of unobserved genetic interventions. In particular, we developed and used causal structure discovery algorithms for directly learning the difference between gene regulatory networks in different cell states (e.g., healthy and diseased) [150], [151] and jointly learning the gene regulatory networks of related cell types [152]. In the following case study, we discuss how causal structure discovery and causal imputation can be used to identify the gene regulatory network affected by SARS-CoV-2 infection and propose candidate drugs and drug targets against COVID-19.

Case Study 3 (Predicting the Effect of Drugs in the Context of COVID-19 for Identifying Candidates for Drug Repurposing): The recent COVID-19 pandemic has made it important to develop methods that could rapidly identify FDA-approved drugs that could also be effective in the context of COVID-19. Making use of large-scale drug screens that were obtained in other disease contexts to identify such drugs leads to a causal imputation problem. COVID-19 is a disease that shows elevated morbidity and fatality rates in the aging population. The lung tissue becomes stiffer with age and we conjectured that SARS-CoV-2 may make use of these altered mechanical properties of the lung tissue for its replication [153]. In particular, the cells lining the lung epithelium undergo epithelial-to-mesenchymal transitions as a function of the increased stiffness that accompanies aging [154], and the virus signals may turn on different expression programs in these different cell states. In recent work [140], we used gene expression profiles available from the lung of young and old individuals, as well as epithelial cells infected and noninfected with SARS-CoV-2. Interestingly, we found that the gene expression signatures of aging and SARS-CoV-2 infection were highly correlated, providing further evidence for their interplay. Using an overparameterized autoencoder framework, we predicted the effect of FDA-approved drugs on SARS-CoV-2 infected lung epithelial cells through transport from the CMap dataset. The drugs were ranked based on how well their effect was predicted to revert the effect of SARS-CoV-2 infection [see Fig. 6(c)]. The top-ranked drugs consisted mainly of serine/threonine and tyrosine kinase inhibitors. These drugs have many known targets. In order to identify the putative causal gene that could be targeted more specifically, we used causal structure discovery algorithms for identifying a causal graph on a subset of genes that are differentially expressed under SARS-CoV-2 infection and aging and identified RIPK1 as a target of the identified top-ranked drugs that is upstream of the differentially expressed genes [see Fig. 6(d)]. Interestingly, RIPK1 has been found to bind to SARS-CoV-2 proteins [155], and an RIPK1 inhibitor is currently in clinical trials against COVID-19. Such computational approaches combining gene expression data with causal discovery methods provide powerful approaches for repurposing drugs in various disease contexts.

VI. CONCLUSION AND FUTURE DIRECTIONS

With the plethora of available methods to profile single cells in functional contexts comes the opportunity to understand the programs of life, including how genes interact in order to form all the different cell types that make up our body, how homeostasis is maintained, and how diseases start at the single-cell level. While providing high-throughput and high-dimensional


measurements, due to the destructive nature of single-cell experiments, a major challenge to obtain a complete picture of a cell is the integration and translation between different data modalities, time points, and perturbations. In this article, we presented a brief introduction to three machine learning areas (generative modeling, optimal transport, and causal inference), in which developments have opened new avenues for overcoming these experimental challenges and providing insights into the structure–function relationship at the cellular level.

Using three specific examples, we aimed to showcase not only how single-cell biology has benefited from developments in these machine learning areas but also how biological problems have been a source of inspiration for novel foundational developments in these areas. In the first example, we discussed developments in generative models that allow integrating and translating between very different data modalities such as imaging and sequencing, which cannot yet be obtained in the same cell. In the second example, we discussed research in optimal transport that allows integrating and translating between snapshots in time to obtain pseudo-lineages of cells. In particular, we discussed how such a framework applied to imaging data acquired during cancer progression provides unique opportunities to study cell-state transitions and provide novel biomarkers for early disease diagnostics. In the third example, we moved from predictive to causal modeling and discussed developments in causal methods for predicting the effect of unseen interventions and how such methods could be used for virtual drug screening. In summary, the explosion of high-throughput experimental technologies for profiling single cells together with the tremendous progress in machine learning provide countless possibilities and a paradigm shift in our understanding of single-cell heterogeneity and cell-state transitions in development and disease.

New exciting opportunities stem from the recent rise of spatially resolved transcriptomic methods, which make it possible to obtain transcriptomic data together with the 3-D position in the tissue. While single-cell technologies allow comprehensively characterizing the heterogeneity of cell types and states in a tissue and understanding how these states change during disease initiation and progression, missing in this picture is that cells do not operate in a vacuum. A cell is sculpted by its microenvironment, it has neighbors that signal to each other, and they work as a community. Understanding the spatial organization of cells in tissues and deciphering the communication between cells is critical to our understanding of how organisms develop as well as of how defects at cellular scale can affect tissue organization and lead to disease. The spatial transcriptomic revolution has been made possible by melding imaging and genomics, with technologies such as MERFISH [156], seqFISH [157], STARmap [158], Visium [159], and Slide-seq [160] spanning the spectrum of more imaging to more genomics-based technologies. While rapid progress has been made on the experimental front, a new computational framework is needed in order to tackle single-cell biology in its spatial context. We envision that current and future progress in spatial transcriptomics will also drive foundational developments in machine learning and that the three areas of machine learning identified in this article will continue to play a critical role in these endeavors. In particular, to more deeply understand the structure–function relationship and identify tissue motifs, we need representations of cells that are not only informed by the architecture and expression profile of a cell but also by its neighbors in the 3-D tissue context. In addition, we need methods for generating whole tissue lineages forward and backward in time in order to understand how cell-state transitions in one cell can affect other cells and lead to disease. Finally, we need causal inference methods to infer intercellular gene regulatory networks and predict the effect of perturbations, thereby providing an avenue for novel therapeutic interventions that intersect cell-to-cell communication.

Acknowledgment

The authors would like to thank all current and former members of the Uhler and Shivashankar groups for stimulating discussions, in particular Karren Dai Yang, Adityanarayanan Radhakrishnan, Anastasiya Belyaeva, Chandler Squires, and Saradha Venkatachalapathy, without whose contributions this article would not exist.

REFERENCES
[1] D. A. Lawson, K. Kessenbrock, R. T. Davis, N. Pervolarakis, and Z. Werb, “Tumour heterogeneity and metastasis at single-cell resolution,” Nature Cell Biol., vol. 20, no. 12, pp. 1349–1360, Dec. 2018.
[2] E. Papalexi and R. Satija, “Single-cell RNA sequencing to explore immune cell heterogeneity,” Nature Rev. Immunol., vol. 18, no. 1, pp. 35–45, Jan. 2018.
[3] G. V. Shivashankar, “Mechanical regulation of genome architecture and cell-fate decisions,” Current Opinion Cell Biol., vol. 56, pp. 115–121, Feb. 2019.
[4] I. C. Macaulay, C. P. Ponting, and T. Voet, “Single-cell multiomics: Multiple measurements from single cells,” Trends Genet., vol. 33, no. 2, pp. 155–168, 2017.
[5] L. Keller and K. Pantel, “Unravelling tumour heterogeneity by single-cell profiling of circulating tumour cells,” Nature Rev. Cancer, vol. 19, no. 10, pp. 553–567, Oct. 2019.
[6] T. M. Consortium, “A single cell transcriptomic atlas characterizes aging tissues in the mouse,” Nature, vol. 583, no. 7817, pp. 590–595, 2020.
[7] J. H. W. Distler, A.-H. Györfi, M. Ramanujam, M. L. Whitfield, M. Königshoff, and R. Lafyatis, “Shared and distinct mechanisms of fibrosis,” Nature Rev. Rheumatol., vol. 15, no. 12, pp. 705–730, Dec. 2019.
[8] J. V. Pluvinage and T. Wyss-Coray, “Systemic factors as mediators of brain homeostasis, ageing and neurodegeneration,” Nature Rev. Neurosci., vol. 21, no. 2, pp. 93–102, Feb. 2020.
[9] C. Uhler and G. V. Shivashankar, “Regulation of genome organization and gene expression by nuclear mechanotransduction,” Nature Rev. Mol. Cell Biol., vol. 18, no. 12, p. 717, 2017.
[10] B. Alberts et al., Molecular Biology of the Cell, 7th ed. New York, NY, USA: Garland Science, 2002.
[11] H. Zheng and W. Xie, “The role of 3D genome organization in development and cell differentiation,” Nature Rev. Mol. Cell Biol., vol. 20, no. 9, pp. 535–550, Sep. 2019.
[12] C. Uhler and G. V. Shivashankar, “Chromosome intermingling: Mechanical hotspots for genome regulation,” Trends Cell Biol., vol. 27, no. 11, pp. 810–819, Nov. 2017.
[13] B. van Steensel and E. E. M. Furlong, “The role of transcription in shaping the spatial organization of the genome,” Nature Rev. Mol. Cell Biol., vol. 20, no. 6, pp. 327–337, Mar. 2019.
[14] T. Misteli, “The self-organizing genome: Principles of genome architecture and function,” Cell, vol. 183, no. 1, pp. 28–45, Oct. 2020.
[15] A. Belyaeva, S. Venkatachalapathy, M. Nagarajan, G. V. Shivashankar, and C. Uhler, “Network


analysis identifies chromosome intermingling Machines to Paint Write Compose and Play. entropy-transport problems and a new
regions as regulatory hotspots for transcription,” Sebastopol, CA, USA: O’Reilly Media, 2019. Hellinger–Kantorovich distance between positive
Proc. Nat. Acad. Sci. USA, vol. 114, no. 52, [38] G. E. Hinton and R. R. Salakhutdinov, “Reducing measures,” Invent. Math., vol. 211, no. 3,
pp. 13714–13719, Dec. 2017. the dimensionality of data with neural networks,” pp. 969–1117, Mar. 2018.
[16] N. Jain, K. V. Iyer, A. Kumar, and G. V. Science, vol. 313, no. 5786, pp. 504–507, 2006. [64] L. Chizat, G. Peyré, B. Schmitzer, and F.-X. Vialard,
Shivashankar, “Cell geometric constraints induce [39] D. P Kingma and M. Welling, “Auto-encoding “Scaling algorithms for unbalanced transport
modular gene-expression patterns via variational Bayes,” 2013, arXiv:1312.6114. problems,” 2016, arXiv:1607.05816.
redistribution of HDAC3 regulated by actomyosin [40] P. Vincent, H. Larochelle, Y. Bengio, and [65] L. Chizat and S. Di Marino, “A tumor growth
contractility,” Proc. Nat. Acad. Sci. USA, vol. 110, P. A. Manzagol, “Extracting and composing robust model of Hele-Shaw type as a gradient flow,”
no. 28, pp. 11349–11354, Jul. 2013. features with denoising autoencoders,” in Proc. 2017, arXiv:1712.06124.
[17] Y. Takei et al., “Integrated spatial genomics reveals Int. Conf. Mach. Learn., 2008, pp. 1096–1103. [66] K. D. Yang and C. Uhler, “Scalable unbalanced
global architecture of single nuclei,” Nature, [41] S. Rifai, P. Vincent, X. Müller, X. Glorot, and optimal transport using generative adversarial
vol. 590, no. 7845, pp. 344–350, Feb. 2021. Y. Bengio, “Contractive auto-encoders: Explicit networks,” in Proc. Int. Conf. Learn. Represent.,
[18] E. H. Finn et al., “Extensive heterogeneity and invariance during feature extraction,” in Proc. 2019.
intrinsic variation in spatial genome 28th Int. Conf. Mach. Learn. (ICML, 2011, [67] J. Neyman, “Sur les applications de la théorie des
organization,” Cell, vol. 176, no. 6, pp. 833–844. probabilités aux experiences agricoles: Essai des
pp. 1502–1515, 2019. [42] C. Zhang, S. Bengio, M. Hardt, B. Recht, and principes,” Roczniki Nauk Rolniczych, vol. 10,
[19] Y. Goltsev et al., “Deep profiling of mouse splenic O. Vinyals, “Understanding deep learning requires no. 1, pp. 1–51, 1923.
architecture with CODEX multiplexed imaging,” rethinking generalization,” in Proc. Int. Conf. [68] S. Wright, “Correlation and causation,” J. Agricult.
Cell, vol. 174, no. 4, pp. 968–981, 2018. Learn. Represent. (ICLR), 2016. Res., vol. 20, no. 7, pp. 557–585, 1921.
[20] E. Z. Macosko et al., “Highly parallel genome-wide [43] M. Belkin, D. Hsu, S. Ma, and S. Mandal, [69] S. Wright, “The method of path coefficients,” Ann.
expression profiling of individual cells using “Reconciling modern machine-learning practice Math. Statist., vol. 5, no. 3, pp. 161–215, 1934.
nanoliter droplets,” Cell, vol. 161, no. 5, and the classical bias–variance trade-off,” Proc. [70] J. Pearl, “The causal foundations of structural
pp. 1202–1214, 2015. Nat. Acad. Sci. USA, vol. 116, no. 32, equation modeling,” in Handbook of Structural
[21] A. M. Klein et al., “Droplet barcoding for pp. 15849–15854, Aug. 2019. Equation Modeling, R. H. Hoyle, Ed. New York, NY,
single-cell transcriptomics applied to embryonic [44] A. Radhakrishnan, M. Belkin, and C. Uhler, USA: Guilford Press, 2012, pp. 68–91.
stem cells,” Cell, vol. 161, no. 5, pp. 1187–1201, “Overparameterized neural networks implement [71] J. Pearl, Causality: Models, Reasoning and
May 2015. associative memory,” Proc. Nat. Acad. Sci. USA, Inference. Cambridge, U.K.: Cambridge Univ.
[22] A. Rotem et al., “Single-cell ChIP-seq reveals cell vol. 117, no. 44, pp. 27162–27170, Nov. 2020. Press, 2000.
subpopulations defined by chromatin state,” [45] I. J. Goodfellow et al., “Generative adversarial [72] J. M. Robins, “Association, causation, and
Nature Biotechnol., vol. 33, no. 11, nets,” in Proc. NIPS, 2014. marginal structural models,” Synthese, vol. 121,
pp. 1165–1172, Nov. 2015. [46] M. Arjovsky and L. Bottou, “Towards principled pp. 151–179, Jan. 1999.
[23] J. D. Buenrostro et al., “Single-cell chromatin methods for training generative adversarial [73] D. B. Rubin, “Estimating causal effects of
accessibility reveals principles of regulatory networks,” 2017, arXiv:1701.04862. treatments in randomized and nonrandomized
variation,” Nature, vol. 523, no. 7561, [47] M. Arjovsky, S. Chintala, and L. Bottou, studies,” J. Educ. Psychol., vol. 66, no. 5,
pp. 486–490, Jul. 2015. “Wasserstein GAN,” 2017, arXiv:1701.07875. pp. 688–701, 1974.
[24] T. Nagano et al., “Single-cell Hi-C reveals [48] L. V. Kantorovich, “On the translocation of [74] D. B. Rubin, “Causal inference using potential
cell-to-cell variability in chromosome structure,” masses,” in Proc. Doklady Akademii Nauk SSSR, outcomes,” J. Amer. Stat. Assoc., vol. 100, no. 469,
Nature, vol. 502, no. 7469, pp. 59–64, Oct. 2013. vol. 37, 1942, pp. 199–201. pp. 322–331, Mar. 2005.
[25] T. J. Stevens et al., “3D structures of individual [49] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, [75] P. Spirtes, C. Glymour, and R. Scheines, Causation,
mammalian genomes studied by single-cell Hi-C,” and A. Courville, “Improved training of Prediction and Search. Cambridge, MA, USA:
Nature, vol. 544, no. 7648, pp. 59–64, Apr. 2017. Wasserstein GANs,” 2017, arXiv:1704.00028. MIT Press, 2001.
[26] G. J. Knott and J. A. Doudna, “CRISPR-cas guides [50] D. J. Rezende and S. Mohamed, “Variational [76] L. Cong et al., “Multiplex genome engineering
the future of genetic engineering,” Science, inference with normalizing flows,” in Proc. ICML, using CRISPR/cas systems,” Science, vol. 339,
vol. 361, no. 6405, pp. 866–869, Aug. 2018. 2015, pp. 1530–1538. no. 6121, pp. 819–823, 2013.
[27] U. Eggert and T. Mitchison, “Small molecule [51] L. Dinh, D. Krueger, and Y. Bengio, “NICE: [77] D. Feldman et al., “Optical pooled screens in
screening by imaging,” Current Opinion Chem. Non-linear independent components estimation,” human cells,” Cell, vol. 179, no. 3, pp. 787–799,
Biol., vol. 10, no. 3, pp. 232–237, Jun. 2006. 2014, arXiv:1410.8516. 2019.
[28] A. Dixit et al., “Perturb-Seq: Dissecting molecular [52] L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density [78] C. Glymour, K. Zhang, and P. Spirtes, “Review of
circuits with scalable single-cell RNA profiling of estimation using real NVP,” in Proc. ICLR, 2017. causal discovery methods based on graphical
pooled genetic screens,” Cell, vol. 167, [53] D. P. Kingma and P. Dhariwal, “Glow: Generative models,” Frontiers Genet., vol. 10, p. 524,
pp. 1853–1866, Dec. 2016. flow with invertible 1x1 convolutions,” in Proc. Jun. 2019.
[29] D. A. Jaitin et al., “Dissecting immune circuits by Adv. Neural Inf. Process. Syst., 2018, [79] C. Heinze-Deml, M. H. Maathuis, and
linking CRISPR-pooled screens with single-cell pp. 10215–10224. N. Meinshausen, “Causal structure learning,”
RNA-seq,” Cell, vol. 167, no. 7, pp. 1883–1896, [54] K. Yang et al., “Improved conditional flow models Annu. Rev. Statist. Appl., vol. 5, no. 1,
2016. for molecule to image synthesis,” in Proc. Conf. pp. 371–391, 2018.
[30] M.-A. Bray et al., “Cell painting, a high-content Comput. Vis. Pattern Recognit. (CVPR), 2021, [80] F. Eberhardt and R. Scheines, “Interventions and
image-based assay for morphological profiling pp. 6684–6694. causal inference,” Philos. Sci., vol. 74, no. 5,
using multiplexed fluorescent dyes,” Nature [55] E. Denton, S. Chintala, A. Szlam, and R. Fergus, pp. 981–995, 2007.
Protocols, vol. 11, no. 9, pp. 1757–1774, “Deep generative image models using a Laplacian [81] S. L. Lauritzen, Graphical Models. Oxford, U.K.:
Sep. 2016. pyramid of adversarial networks,” in Proc. Adv. Clarendon Press, 1996.
[31] M.-A. Bray et al., “A dataset of images and Neural Inf. Process. Syst., 2015, pp. 1486–1494. [82] T. Verma and J. Pearl, “Equivalence and synthesis
morphological profiles of 30 000 small-molecule [56] C. Villani, Optimal Transport: Old and New, of causal models,” in Proc. 6th Annu. Conf.
treatments using the cell painting assay,” vol. 338. Paris, France: Springer, 2008. Uncertainty Artif. Intell., 1990, pp. 255–270.
GigaScience, vol. 6, no. 12, Dec. 2017, [57] G. Monge, “Mémoire sur la théorie des déblais et [83] A. Hauser and P. Bühlmann, “Characterization and
Art. no. giw014. des remblais,” Histoire de l’Académie Royale des greedy learning of interventional Markov
[32] S. S. Shai and B. D. Shai, Understanding Machine Sciences de Paris, 1781. equivalence classes of directed acyclic graphs,”
Learning—From Theory to Algorithms. Cambridge, [58] G. Peyré and M. Cuturi, “Computational optimal J. Mach. Learn. Res., vol. 13, pp. 2409–2464,
U.K.: Cambridge Univ. Press, 2014. transport,” 2018, arXiv:1803.00567. Aug. 2012.
[33] T. Hastie, R. Tibshirani, and J. Friedman, [59] M. Cuturi, “Sinkhorn distances: Lightspeed [84] K. D. Yang, A. Katcoff, and C. Uhler,
The Elements of Statistical Mining, Inference and computation of optimal transport,” in Proc. Adv. “Characterizing and learning equivalence classes
Prediction (Springer Series in Statistics). New Neural Inf. Process. Syst., 2013, pp. 2292–2300. of causal DAGs under interventions,” in Proc. Int.
York, NY, USA: Springer, 2009. [60] A. Genevay, M. Cuturi, G. Peyré, and F. Bach, Conf. Mach. Learn., vol. 80, 2018, pp. 5537–5546.
[34] Y. LeCun, Y. Bengio, and G. Hinton, “Deep “Stochastic optimization for large-scale optimal [85] D. M. Chickering, “Optimal structure
learning,” Nature, vol. 521, no. 7553, transport,” in Proc. Adv. Neural Inf. Process. Syst., identification with greedy search,” J. Mach. Learn.
pp. 436–444, May 2015. 2016, pp. 3440–3448. Res., vol. 3, pp. 507–554, Nov. 2002.
[35] Y. Bengio, A. Courville, and P. Vincent, [61] V. Seguy, B. B. Damodaran, R. Flamary, N. Courty, [86] L. Solus, Y. Wang, and C. Uhler, “Consistency
“Representation learning: A review and new A. Rolet, and M. Blondel, “Large-scale optimal guarantees for greedy permutation-based causal
perspectives,” IEEE Trans. Pattern Anal. Mach. transport and mapping estimation,” 2017, inference algorithms,” Biometrika, vol. 108, no. 4,
Intell., vol. 35, no. 8, pp. 1798–1828, Aug. 2013. arXiv:1711.02283. pp. 795–814, Nov. 2021.
[36] I. Goodfellow, Y. Bengio, and A. Courville, Deep [62] L. Chizat, G. Peyré, B. Schmitzer, and F.-X. Vialard, [87] D. I. Bernstein, B. Saeed, C. Squires, and C. Uhler,
Learning. Cambridge, MA, USA: MIT Press, 2016. “Unbalanced optimal transport: Dynamic and “Ordering-based causal structure learning in the
[Online]. Available: Kantorovich formulation,” 2015, presence of latent variables,” Proc. Mach. Learn.
http://www.deeplearningbook.org arXiv:1508.05216. Res., vol. 108, pp. 4098–4108, Jun. 2020.
[37] D. Foster, Generative Deep Learning: Teaching [63] M. Liero, A. Mielke, and G. Savaré, “Optimal [88] Y. Wang, L. Solus, K. D. Yang, and C. Uhler,

PROCEEDINGS OF THE IEEE | Vol. 110, No. 5, May 2022

Uhler and Shivashankar: Machine Learning Approaches to Single-Cell Data Integration and Translation

“Permutation-based causal inference algorithms species,” Nature Biotechnol., vol. 36, no. 5, p. 411, Nature Methods, vol. 14, no. 10, pp. 979–982,
with interventions,” in Proc. Neural Inf. Process. 2018. 2017.
Syst., vol. 31, 2017, pp. 5824–5833. [109] L. Haghverdi, A. T. Lun, M. D. Morgan, and [129] M. Guo, E. L. Bao, M. Wagner, J. A. Whitsett, and
[89] C. Squires, Y. Wang, and C. Uhler, J. C. Marioni, “Batch effects in single-cell Y. Xu, “SLICE: Determining cell differentiation and
“Permutation-based causal structure learning with RNA-sequencing data are corrected by matching lineage based on single cell entropy,” Nucleic Acids
unknown intervention targets,” in Proc. 36th Conf. mutual nearest neighbors,” Nature Biotechnol., Res., vol. 20, no. 45, Dec. 2016, Art. no. gkw1278.
Uncertainty Artif. Intell. (UAI), 2020, vol. 36, no. 5, p. 421, 2018. [130] J. Shin et al., “Single-cell RNA-seq with waterfall
pp. 1039–1048. [110] R. Lopez, J. Regier, M. B. Cole, M. I. Jordan, and reveals molecular cascades underlying adult
[90] F. Eberhardt, C. Glymour, and R. Scheines, “On the N. Yosef, “Deep generative modeling for single-cell neurogenesis,” Cell Stem Cell, vol. 17, no. 3,
number of experiments sufficient and in the worst transcriptomics,” Nature Methods, vol. 15, no. 12, pp. 360–372, Sep. 2015.
case necessary to identify all causal relations pp. 1053–1058, Dec. 2018. [131] Z. Ji and H. Ji, “TSCAN: Pseudo-time
among n variables,” in Proc. 21st Conf. Uncertainty [111] M. Amodio et al., “Exploring single-cell data with reconstruction and evaluation in single-cell
Artif. Intell., 2005, pp. 178–184. deep multitasking neural networks,” Nature RNA-seq analysis,” Nucleic Acids Res., vol. 44,
[91] A. Hyttinen, F. Eberhardt, and P. O. Hoyer, Methods, vol. 16, no. 11, pp. 1139–1145, no. 13, p. e117, Jul. 2016.
“Experiment selection for causal discovery,” 2019. [132] E. Marco et al., “Bifurcation analysis of single-cell
J. Mach. Learn. Res., vol. 14, no. 1, [112] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, gene expression data reveals epigenetic
pp. 3041–3071, Oct. 2013. “Unpaired image-to-image translation using landscape,” Proc. Nat. Acad. Sci. USA, vol. 111,
[92] E. Lindgren, M. Kocaoglu, A. G. Dimakis, and cycle-consistent adversarial networks,” in Proc. no. 52, pp. E5643–E5650, Dec. 2014.
S. Vishwanath, “Experimental design for IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, [133] S. C. Bendall et al., “Single-cell trajectory
cost-aware learning of causal graphs,” in Proc. pp. 2223–2232. detection uncovers progression and regulatory
Adv. Neural Inf. Process. Syst., 2018, [113] T. Stuart et al., “Comprehensive integration of coordination in human B cell development,” Cell,
pp. 5279–5289. single-cell data,” Cell, vol. 177, no. 7, vol. 157, no. 3, pp. 714–725, Apr. 2014.
[93] C. Squires, S. Magliacane, K. Greenewald, D. Katz, pp. 1888–1902, 2019. [134] M. Setty et al., “Wishbone identifies bifurcating
M. Kocaoglu, and K. Shanmugam, “Active [114] J. Liu, Y. Huang, R. Singh, J.-P. Vert, and developmental trajectories from single-cell data,”
structure learning of causal DAGs via directed W. S. Noble, “Jointly embedding multiple Nature Biotechnol., vol. 34, no. 6, p. 637, 2016.
clique trees,” in Proc. Adv. Neural Inf. Process. single-cell omics measurements,” in Proc. 19th Int. [135] F. A. Wolf et al., “PAGA: Graph abstraction
Syst., vol. 33, 2020, pp. 21500–21511. Workshop Algorithms Bioinf. (WABI), 2019, reconciles clustering with trajectory inference
[94] R. Agrawal, C. Squires, K. D. Yang, K. vol. 143, no. 10, pp. 1–13. through a topology preserving map of single
Shanmugam, and C. Uhler, “Abcd-strategy: [115] G. Gundersen, B. Dumitrascu, J. T. Ash, and cells,” Genome Biol., vol. 20, no. 1, p. 59,
Budgeted experimental design for targeted causal B. E. Engelhardt, “End-to-end training of deep Dec. 2019.
structure discovery,” in Proc. Mach. Learn. Res., probabilistic CCA on paired biomedical [136] C. Weinreb, S. Wolock, B. K. Tusi, M. Socolovsky,
vol. 89, 2019, pp. 3400–3409. observations,” in Proc. 35th Conf. Uncertainty Artif. and A. M. Klein, “Fundamental limits on dynamic
[95] A. Ghassami, S. Salehkaleybar, and N. Kiyavash, Intell., 2019, pp. 945–955. inference from single-cell snapshots,” Proc. Nat.
“Interventional experiment design for causal [116] K. E. Wu, K. E. Yost, H. Y. Chang, and J. Zou, Acad. Sci. USA, vol. 115, no. 10, pp.
structure learning,” 2019, arXiv:1910.05651. “BABEL enables cross-modality translation E2467–E2476, Mar. 2018.
[96] S. Sussex, A. Krause, and C. Uhler, “Near-optimal between multiomic profiles at single-cell [137] G. Schiebinger et al., “Reconstruction of
multi-perturbation experimental design for causal resolution,” Proc. Nat. Acad. Sci. USA, vol. 118, developmental landscapes by optimal-transport
structure learning,” 2021, arXiv:2105.14024. no. 15, Apr. 2021, Art. no. e2023070118. analysis of single-cell gene expression sheds light
[97] A. Subramanian et al., “A next generation [117] J. Cao et al., “Joint profiling of chromatin on cellular reprogramming,” Cell, vol. 176, no. 4,
connectivity map: L1000 platform and the first accessibility and gene expression in thousands of pp. 928–943, 2019.
1,000,000 profiles,” Cell, vol. 171, no. 6, single cells,” Science, vol. 361, no. 6409, [138] K. D. Yang, K. Damodaran, S. Venkatchalapathy,
pp. 1437–1452, 2017. pp. 1380–1385, Sep. 2018. A. C. Soylemezoglu, G. V. Shivashankar, and
[98] E. Bareinboim and J. Pearl, “Causal inference and [118] S. Ma et al., “Chromatin potential identified by C. Uhler, “Autoencoder and optimal transport to
the data-fusion problem,” Proc. Nat. Acad. Sci. shared single-cell profiling of RNA and infer single-cell trajectories of biological
USA, vol. 113, no. 27, pp. 7345–7352, chromatin,” Cell, vol. 183, no. 4, pp. 1103–1116, processes,” PLoS Comput. Biol., vol. 16, Jan. 2020,
Jul. 2016. 2020. Art. no. e1007828.
[99] C. Uhler, G. Raskutti, P. Bühlmann, and B. Yu, [119] K. D. Yang et al., “Multi-domain translation [139] S. Jain, A. Radhakrishnan, and C. Uhler,
“Geometry of the faithfulness assumption in between single-cell imaging and sequencing data “A mechanism for producing aligned latent spaces
causal inference,” Ann. Statist., vol. 41, no. 2, using autoencoders,” Nature Commun., vol. 12, with autoencoders,” 2021, arXiv:2106.15456.
pp. 436–463, Apr. 2013. no. 1, p. 31, Dec. 2021. [140] A. Belyaeva et al., “Causal network models of
[100] A. Abadie, A. Diamond, and J. Hainmueller, [120] K. D. Yang and C. Uhler, “Multi-domain SARS-CoV-2 expression and aging to identify
“Synthetic control methods for comparative case translation by learning uncoupled autoencoders,” candidates for drug repurposing,” Nature
studies: Estimating the effect of California’s in Proc. Comput. Biol. Workshop, Int. Conf. Mach. Commun., vol. 12, no. 1, p. 1024, Dec. 2021.
tobacco control program,” J. Amer. Stat. Assoc., Learn., 2019. [141] R. Hodos et al., “Cell-specific prediction and
vol. 105, no. 490, pp. 493–505, 2007. [121] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, application of drug-induced gene expression
[101] A. Agarwal, D. Shah, and D. Shen, “Synthetic and B. Frey, “Adversarial autoencoders,” 2015, profiles,” in Proc. Pacific Symp. Biocomput.,
interventions,” 2020, arXiv:2006.07691. arXiv:1511.05644. vol. 23, 2018, pp. 32–43.
[102] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed [122] C. Uhler and G. V. Shivashankar, “Nuclear [142] K. Yang et al., “Analyzing learned molecular
minimum-rank solutions of linear matrix mechanopathology and cancer diagnosis,” Trends representations for property prediction,” J. Chem.
equations via nuclear norm minimization,” Soc. Cancer, vol. 4, no. 4, pp. 320–331, Apr. 2018. Inf. Model., vol. 59, no. 8, pp. 3370–3388,
Ind. Appl. Math. Rev., vol. 52, no. 3, pp. 471–501, [123] A. Radhakrishnan, K. Damodaran, Aug. 2019.
2010. A. C. Soylemezoglu, C. Uhler, and [143] M. Y. Liu and O. Tuzel, “Coupled generative
[103] E. J. Candès and B. Recht, “Exact matrix G. V. Shivashankar, “Machine learning for nuclear adversarial networks,” in Proc. Adv. Neural Inf.
completion via convex optimization,” Commun. mechano-morphometric biomarkers in cancer Process. Syst., vol. 29, 2016, pp. 469–477.
ACM, vol. 55, no. 6, pp. 111–119, 2012. diagnosis,” Sci. Rep., vol. 7, no. 1, p. 17946, [144] M. Lotfollahi, F. A. Wolf, and F. J. Theis, “ScGen
[104] E. J. Candès and T. Tao, “The power of convex Dec. 2017. predicts single-cell perturbation responses,”
relaxation: Near-optimal matrix completion,” IEEE [124] J. E. Smith-Garvin, G. A. Koretzky, and Nature Methods, vol. 16, no. 8, pp. 715–721,
Trans. Inf. Theory, vol. 56, no. 5, pp. 2053–2080, M. S. Jordan, “T cell activation,” Annu. Rev. Aug. 2019.
May 2010. Immunol., vol. 27, pp. 591–619, Apr. 2009. [145] A. Ghahramani, F. M. Watt, and N. M. Luscombe,
[105] S. Arora, N. Cohen, W. Hu, and Y. Luo, “Implicit [125] S. Gupta et al., “Developmental heterogeneity in “Generative adversarial networks simulate gene
regularization in deep matrix factorization,” in DNA packaging patterns influences T-cell expression and predict perturbations in single
Proc. Adv. Neural Inf. Process. Syst., H. Wallach, activation and transmigration,” PLoS ONE, vol. 7, cells,” bioRxiv:10.1101/262501, Jan. 2018,
H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, no. 9, Sep. 2012, Art. no. e43718. Art. no. 262501.
E. Fox, and R. Garnett, Eds. Red Hook, NY, USA: [126] H. Acloque, M. S. Adams, K. Fishwick, [146] M. Lotfollahi et al., “Learning interpretable
Curran Associates, 2019. M. Bronner-Fraser, and M. A. Nieto, cellular responses to complex perturbations in
[106] A. Radhakrishnan, G. Stefanakis, M. Belkin, and “Epithelial-mesenchymal transitions: high-throughput screens,”
C. Uhler, “Simple, fast, and flexible framework for The importance of changing cell state in bioRxiv:10.1101/2021.04.14.439903v2,
matrix completion with infinite width neural development and disease,” J. Clin. Invest., Jan. 2021.
networks,” 2021, arXiv:2108.00131. vol. 119, no. 6, pp. 1438–1449, Jun. 2009. [147] S. A. Morris and G. Q. Daley, “A blueprint for
[107] C. Squires, D. Shen, A. Agarwal, D. Shah, and [127] C. Trapnell et al., “The dynamics and regulators of engineering cell fate: Current technologies to
C. Uhler, “Causal imputation via synthetic cell fate decisions are revealed by pseudotemporal reprogram cell identity,” Cell Res., vol. 23, no. 1,
interventions,” 2020, arXiv:2011.03127. ordering of single cells,” Nature Biotechnol., pp. 33–48, Jan. 2013.
[108] A. Butler, P. Hoffman, P. Smibert, E. Papalexi, and vol. 32, no. 4, pp. 381–386, Apr. 2014. [148] J. H. Hanna, K. Saha, and R. Jaenisch,
R. Satija, “Integrating single-cell transcriptomic [128] X. Qiu et al., “Reversed graph embedding resolves “Pluripotency and cellular reprogramming: Facts,
data across different conditions, technologies, and complex single-cell developmental trajectories,” hypotheses, unresolved issues,” Cell, vol. 143,


no. 4, pp. 508–525, Nov. 2010. [153] C. Uhler and G. V. Shivashankar, [157] E. Lubeck, A. F. Coskun, T. Zhiyentayev,
[149] K. Takahashi and S. Yamanaka, “Induction of “Mechano-genomic regulation of coronaviruses M. Ahmad, and L. Cai, “Single-cell in situ RNA
pluripotent stem cells from mouse embryonic and and its interplay with ageing,” Nature Rev. Mol. profiling by sequential hybridization,” Nature
adult fibroblast cultures by defined factors,” Cell, Cell Biol., vol. 21, no. 5, pp. 247–248, May 2020. Methods, vol. 11, no. 4, pp. 360–361, Apr. 2014.
vol. 126, no. 4, pp. 663–676, Aug. 2006. [154] C. Uhler and G. V. Shivashankar, [158] X. Wang et al., “Three-dimensional intact-tissue
[150] Y. Wang, C. Squires, A. Belyaeva, and C. Uhler, “Mechanogenomic coupling of lung tissue sequencing of single-cell transcriptional states,”
“Direct estimation of differences in causal graphs,” stiffness, EMT and coronavirus pathogenicity,” Science, vol. 361, no. 6400, Jul. 2018,
in Proc. Adv. Neural Inf. Process. Syst., vol. 31, Current Opinion Solid State Mater. Sci., vol. 25, Art. no. eaat5691.
2018, pp. 3774–3785. no. 1, Feb. 2021, Art. no. 100874. [159] 4210x Genomics Acquires Spatial Transcriptomics
[151] A. Belyaeva, C. Squires, and C. Uhler, “DCI: [155] D. E. Gordon et al., “A SARS-CoV-2 protein 2018. Accessed: Feb. 9, 2021. [Online]. Available:
Learning causal differences between gene interaction map reveals targets for drug https://www.10xgenomics.com/spatial-
regulatory networks,” Bioinformatics, vol. 37, repurposing,” Nature, vol. 583, no. 7816, transcriptomics
no. 18, pp. 3067–3069, Sep. 2021. pp. 459–468, 2020. [160] S. G. Rodriques et al., “Slide-seq: A scalable
[152] Y. Wang, S. Segarra, and C. Uhler, [156] K. H. Chen, A. N. Boettiger, J. R. Moffitt, S. Wang, technology for measuring genome-wide
“High-dimensional joint estimation of multiple and X. Zhuang, “Spatially resolved, highly expression at high spatial resolution,” Science,
directed Gaussian graphical models,” Electron. J. multiplexed RNA profiling in single cells,” Science, vol. 363, no. 6434, pp. 1463–1467,
Statist., vol. 14, no. 1, pp. 2439–2483, Jan. 2020. vol. 348, no. 6233, Apr. 2015, Art. no. aaa6090. Mar. 2019.

ABOUT THE AUTHORS


Caroline Uhler received the B.Sc. degree in biology, the M.Sc. degree in mathematics, and the M.Ed. degree from the University of Zürich, Zürich, Switzerland, all in 2011, and the Ph.D. degree in statistics from the University of California at Berkeley (UC Berkeley), Berkeley, CA, USA, in 2011.
She spent three years as an Assistant Professor at IST Austria, Klosterneuburg, Austria, in 2015. She is currently an Associate Professor with the Department of Electrical Engineering and Computer Science and the Institute for Data, Systems, and Society, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA. She is also a Core Institute Member of the Broad Institute, where she codirects the Eric and Wendy Schmidt Center. Her research interest lies at the intersection of machine learning, statistics, and genomics, with a particular focus on computational models of genome packing and regulation.
Dr. Uhler is a Simons Investigator, a Sloan Research Fellow, and an elected member of the International Statistical Institute. Her scientific awards also include NSF Career Award, the Sofja Kovalevskaja Award from the Humboldt Foundation, and the START Award from the Austrian Science Foundation.

G. V. Shivashankar received the Ph.D. degree from Rockefeller University, New York, NY, USA, in 1999. He did his postdoctoral research at the NEC Research Institute, Princeton, NJ, USA, from 1999 to 2000.
He is currently a Full Professor of mechanogenomics with the Department of Health Sciences and Technology, ETH Zürich, Zürich, Switzerland, and heads the Laboratory for Nanoscale Biology, Paul Scherrer Institute, Villigen, Switzerland. He was a tenured Faculty Member at the National Center for Biological Sciences, NCBS-TIFR, Bengaluru, India, from 2000 to 2009, before relocating to the National University of Singapore (NUS), Singapore, in 2010. He was the Deputy Director of the Mechanobiology Institute, NUS, from 2011 to 2019. From 2014 to 2019, he was the IFOM-NUS Chair Professor before joining ETH Zürich. His research interest lies at the intersection of cell biology and bioengineering with a particular focus on understanding the links between microenvironmental control of genome regulation in health and disease.
Dr. Shivashankar was elected to the Indian Academy of Sciences in 2010 and the EMBO Membership in 2019.
