Pasta Pattern Analysis for Spatial Omics Data
Martin Emons1,† , Samuel Gunz1,† , Helena L. Crowell2 , Izaskun Mallona1 , Reinhard Furrer3 , and
Mark D. Robinson1,∗
arXiv:2412.01561v1 [q-bio.QM] 2 Dec 2024
1 Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich,
Zurich, Switzerland
2 Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
3 Department of Mathematical Modeling and Machine Learning, University of Zurich, Zurich, Switzerland
† Equal contribution: both reserve the right to list themselves as first author; author order was determined
by flipping a Swiss 5 franc coin.
∗ Correspondence to: [email protected]
December 3, 2024
Abstract
Spatial omics assays allow for the molecular characterisation of cells in their spatial context. Notably,
the two main technological streams, imaging-based and high-throughput sequencing-based, can give
rise to very different data modalities. The characteristics of the two data types are well known in
adjacent fields such as spatial statistics as point patterns and lattice data, and there is a wide range
of tools available. This paper discusses the application of spatial statistics to spatially-resolved
omics data and in particular, discusses various advantages, challenges, and nuances. This work is
accompanied by a vignette, pasta, that showcases the usefulness of spatial statistics in biology using
several R packages.
Introduction
Molecular profiling of cells in organs or tissues can be accomplished either in bulk or at the single-cell
level. However, tissue must be dissociated, which may select against certain cell types and results
in the loss of the spatial organization of the cells. This limitation is overcome with spatial profiling
techniques, which can range from spatial proteomics (e.g., IMC, MIBI-TOF, 4i, CODEX, mIF [1,
2, 3, 4, 5]), to spatial transcriptomics assays based on fluorescence in situ hybridisation (e.g., FISH,
MERFISH, CosMx, Xenium [6, 7, 8]) and based on sequencing (e.g., 10x Visium and Visium HD,
Slide-seq V1&V2 [9, 10, 11, 12]) to even spatial epigenomics [13]. Moreover, the combination of
individual methods enabled spatial multiomic research [14]. The technologies and their application
in biological research have been the topic of various detailed reviews [15, 16, 17, 18, 19, 20].
In this work, we focus on the concepts and application of exploratory statistical approaches for
spatial data with a focus on downstream steps; our overview also centres around key differences
between data modalities that result from current spatial profiling technologies. In this regard, we
complement the ongoing discussion [21, 22] about the challenges in applying spatial statistics to
spatial (transcript)omics data.
Most spatial omics assays can be classified as either high-throughput sequencing (HTS) based or
imaging-based. In HTS-based approaches, positional information is recorded using a predetermined
array of spots or beads. Imaging-based approaches, however, either target the molecules of interest
with fluorescent probes, ablate regions stained with a cocktail of metal-tagged antibodies,
or amplify and sequence target sequences in situ. Several technologies are emerging, but the
main trade-offs stem from the resolution, number of features, and sensitivity of the readout [17,
23]. For example, HTS-based approaches capture the entire transcriptome (i.e., untargeted), but
come at a resolution determined by the spot size. On the other hand, imaging-based approaches
typically have lower depth of information (i.e., targeted) but show a higher resolution in comparison
to HTS-based approaches. Note also that the landscape of spatially-resolved molecular technologies
is rapidly changing, mostly in terms of higher resolution for HTS-based methods and comprehensive
marker panels for imaging-based approaches (Figure 1, Technology) [16, 17, 24].
In terms of data analysis, the technological streams are quite distinct. HTS-based approaches
collect data along regularly-spaced spots or beads resulting in a so-called regular lattice. In contrast,
imaging-based approaches measure features at exact locations that can be assumed to have originated
from a stochastic process known as a point process [17]. Thus, we will distinguish between
lattice-based and point pattern-based spatial omic data. While the focus is often on the technology
(cf. imaging-based versus HTS-based), we argue that the distinction should lie at the data
representation level [17]. This is important, since there are technologies that can be represented as
both point patterns and lattice data, depending on the resolution and data processing. For example,
imaging-based technologies that detect transcripts at subcellular resolution result in data where
transcript locations can be represented as a point pattern, while for the same dataset, segmented
cells can be represented either as an irregular lattice, or as a point pattern via their cell centroids.
In general, if events of a point pattern are aggregated in specified regions (e.g., transcripts per cell
from segmentation), we end up with lattice data (Figure 1, Data modality) [25, 26].
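The aggregation of a point pattern into lattice data can be illustrated with a minimal Python sketch. The coordinates, the bin size, and the helper name `aggregate_to_grid` are illustrative assumptions and not part of pasta or any package mentioned here; in practice, the regions would typically be segmented cells rather than square bins.

```python
from collections import Counter


def aggregate_to_grid(points, bin_size):
    """Count events per square bin; keys are (column, row) bin indices."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // bin_size), int(y // bin_size))] += 1
    return counts


# Hypothetical transcript locations (arbitrary units)
transcripts = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.4), (1.7, 1.9), (0.1, 1.2)]

# Each bin becomes one spatial unit of a regular lattice with a count attached
lattice_counts = aggregate_to_grid(transcripts, bin_size=1.0)
print(dict(lattice_counts))
```

After this step the individual event locations are no longer used; only the counts per spatial unit and the arrangement of the units remain.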
Since we can represent some spatial omic datasets as either lattice or point pattern, it is important
to further understand the assumptions and analysis strategies of such data modalities [27]. On the
one hand, spatial point pattern analysis assumes that the point locations were generated by a
stochastic process, a so-called event-based view on the data; the goal is to study the properties of
this process [28]. On the other hand, one could assume that the locations are fixed and known at
the time of sampling and study the associated features at each location via an observation-based
view on the data, while recognising that the observations in the lattice are not independent due to
the spatial structure [29].
Spatial data modalities have been present in other fields for decades, such as geography, and in
particular, the field of spatial statistics offers a large analysis toolbox. For example, spatial omics
data collected across a lattice can be analysed by exploratory tools, such as Local Moran’s I and
Bivariate Lee’s L. Imaging data generated by a stochastic process can be input for methods of point
pattern analysis, including summaries such as Ripley’s K or empty-space functions [30, 25, 28, 26]
(Figure 1, Analysis).
There are already several tools for omics analysis that employ spatial statistics approaches.
Amongst others, Voyager is a framework and a collection of use cases for lattice data, written in
both R and Python [22]; it provides data structures and functionalities to compute spatial statistics
in molecular biology. Voyager comes with comprehensive vignettes for several spatial profiling
technologies. The methodological focus of their vignettes is on lattice data analysis and the spdep
geospatial package. Since Voyager offers efficient implementations for lattice data analysis, we
build on their framework in our vignettes for the lattice data component. Other methods written
in R include: SPIAT, which contains various tools for spatial analysis, such as neighbourhood
analysis, local metrics, and heterogeneity scores [31]; scFeatures, a toolbox comprising lattice
data analysis functionality such as Moran’s I and point pattern metrics such as the L-function [32];
spicyR, which uses a compressed version of the global L-function for cross-sample comparison [33];
lisaClust, a spatial domain detection method via LISA (local indicators of spatial association) L
curves [34]; mxfda, which uses point pattern summary functions for survival analysis [35]; spatialDM,
which uses global and local bivariate Moran’s I to score colocalisation of ligand-receptor pairs [36]; and
MERINGUE, which uses nearest-neighbour autocorrelation for spatially variable gene selection [37].
Methods in Python include squidpy, a package with various spatial statistics tools for both point
pattern and lattice data [38]. For a detailed view on general methods development in the field of
spatial transcriptomics, we refer readers to the online Supplementary table of the museum of spatial
transcriptomics paper [18].
Here, we will explore the application of spatial statistics to spatially-resolved omics data guided
by the two main streams of data, point pattern-based and lattice-based methods. We will compare
the two streams and show their strengths and limitations. Finally, we give an outlook on the
challenges and research gaps in the field of spatial data analysis for omics data. Furthermore, we
provide a vignette1 that showcases analysis of data from multiple technologies, where concepts and
assumptions are discussed in detail, with inline R code. Overall, our resource Pattern Analysis for
SpaTial omics datA (pasta) will highlight the usefulness and transferability of existing exploratory
spatial statistics approaches in the context of spatial tissue profiling.
1 https://robinsonlabuzh.github.io/pasta
[Figure 1 schematic. Technology: imaging-based (targeted; higher resolution; trade-off area / time) versus HTS-based (untargeted; lower resolution; standardised arrays). Data modality: depending on resolution. Analysis: point pattern analysis versus lattice data analysis.]
Figure 1: Technology: Spatial omics technologies can be divided into two major streams: imaging-
based and high-throughput sequencing (HTS)-based. Examples of both imaging-based (STARmap
PLUS; left) and HTS-based (10X Visium; right) datasets are shown (Data: [39, 40], clustered with
Banksy [41]). Data modality: these technology streams lead to distinct data modalities. Imaging-
based omics can represent both stochastically-generated point patterns as well as irregular lattices.
Most HTS-based data, on the other hand, can be interpreted as a regular lattice due to the regularity
of the sampling locations; approaches are emerging that could allow high-resolution HTS-based data
to be segmented into cells [42]. Analysis: depending on the data representation, different approaches
of data analysis are available to the analyst.
Results
Point Pattern Analysis
Definitions Most imaging-based technologies give a high-resolution readout of either subcellular
compartments (e.g., laser ablation pixels in IMC [1]) or even individual transcripts (e.g., FISH [43]).
In molecule-level technologies, the expression of features (transcript quantifications, ion counts,
fluorescent intensities) is recorded where they occur [15]. Often, we are interested in the distribution
and composition of cell types. In order to annotate cells, we require (good) cell segmentation
boundaries (an active area of research not discussed here; see [44, 45, 46, 47, 48]), allowing transcripts
to be allocated to a given cell [49], and then cell type inference. Since the locations of the annotated
cells are generated by a stochastic biological process, we can approximate e.g., the cell centroids as
points and analyse their location as point patterns.
In the analysis of these point patterns, the goal is to make inferences on the point process that
generated the data, not on the patterns themselves [28, p. 127]. Points in a point pattern can also
carry a mark, which is a (univariate or multivariate) variable associated with a point [28, pp. 147,
563]. In cell biology, we could imagine both categorical (e.g., cell type or spatial domain labels) or
continuous marks (e.g., gene expression).
When a point pattern with many cell types is considered, there are two approaches to formulate
the setup. In the first variant, we consider cells in a tissue to depend on each other so their
distribution is due to one overarching biological process; this is referred to as a multitype view. In
contrast, when we consider the m patterns to be created by m point processes, we assume that
these processes can be individually analysed; this represents the multivariable viewpoint (change
in terminology in comparison to [28]). The processes do not need to be independent, but the
independence of components (and functions) is often used as a null model for multivariable analysis
(see box Point Processes) [28, pp. 565ff.].
Assumptions Many imaging platforms measure patterns in a field of view (FOV) and collect
several such FOVs per sample. In some technologies, there are no gaps between the individual
FOVs and they can be assembled into one big image. If there are gaps, consecutive FOVs can
either be stitched together (which can be computationally complex) or analysed individually [15,
50]. When analysing individual FOVs, we do not observe the entire pattern but rather a subset of
the bigger pattern. This is called window sampling because the window is just a sample of the bigger
point pattern. A related concept is the small world model, which describes that points can only be
observed in a finite world and not beyond these boundaries [28, pp. 143ff.]. Often, the distinction
between the window sampling and small world model concepts is not clear. An example is the
arrangement of the epithelium, which can be imaged with several FOVs (window sampling) but
there are no cells expected outside of the epithelial boundary (small world) [51]. Therefore, in cell
biology, we often encounter a mixture of these two concepts.
Apart from the window of measurement, we need to make assumptions on the statistical properties
of the point pattern, including, most importantly, whether points can be considered homogeneous
or not. Homogeneity (see Box Point Processes) assumes that the number of points in a given region
B is proportional to its area |B|, for arbitrary B; i.e., homogeneity refers to a uniform intensity
of points across the window of measurement. If that is not the case, the point process is said to
be inhomogeneous [28, pp. 132-133]. This difference has important implications for the analysis of a
process. For example, an apparently clustered pattern may, after adjusting for varying local intensities,
turn out to be inhomogeneous rather than genuinely clustered. This is called the confounding between
intensity and interaction (see box Point Processes) [28, pp. 151-152].
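A simple way to probe the homogeneity assumption is to count points in equally sized quadrats and compare the variance of the counts with their mean. The sketch below is a minimal illustration in plain Python; the function names, the unit-square window, and the toy coordinates are assumptions made for this example. Under a homogeneous Poisson process the index of dispersion is expected to be close to 1; a much larger value is compatible with both inhomogeneity and clustering, which is exactly the confounding described above.

```python
def quadrat_counts(points, width, height, nx, ny):
    """Count points in an nx-by-ny grid of equal quadrats covering the window."""
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        col = min(int(x / width * nx), nx - 1)
        row = min(int(y / height * ny), ny - 1)
        counts[row][col] += 1
    return [c for row in counts for c in row]


def dispersion_index(counts):
    """Variance-to-mean ratio of the quadrat counts (sample variance)."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean


# Hypothetical pattern: 16 points packed into one corner of the unit square
corner = [(0.1 * i, 0.1 * j) for i in range(1, 5) for j in range(1, 5)]
# Hypothetical pattern: one point at the centre of each of the four quadrats
spread = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]

corner_counts = quadrat_counts(corner, 1.0, 1.0, 2, 2)
spread_counts = quadrat_counts(spread, 1.0, 1.0, 2, 2)
print(dispersion_index(corner_counts), dispersion_index(spread_counts))
```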
epithelial cells and B cells [28, pp. 561ff.].
Examples A well-known point pattern summary is Ripley’s K-function, defined as the expected
number of points falling in a specified radius around an arbitrary point, averaged over all points in
the observation window (e.g., FOV) (potentially adjusted for edge effects; see box Point Processes).
Ripley’s K-function thus quantifies the correlation structure of a point pattern, with within-type and
cross-type versions. A point pattern where points have no preference for location shows complete
spatial randomness (CSR). This means that the number of points that fall in an arbitrary region
follows a Poisson distribution with a rate parameter that is proportional to the size of the region.
A clustered process would have values of the K-function larger than those of the Poisson process at
that specific scale; values of K smaller than those of the Poisson process indicate a self-inhibiting
process (i.e., where points repel each other) [53, 54], [28, pp. 203ff.].
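To make the comparison with CSR concrete, the following minimal Python sketch computes a naive estimate of Ripley’s K (the number of ordered point pairs within distance r, scaled by the window area and the number of pairs) and compares it with the theoretical CSR value πr² in two dimensions. It deliberately omits edge correction, and the function name, seed, and simulated patterns are assumptions for illustration only; established implementations should be preferred for real analyses.

```python
import math
import random


def ripley_k(points, r, area):
    """Naive estimate of Ripley's K at radius r, without edge correction."""
    n = len(points)
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= r:
                pairs += 1
    return area * pairs / (n * (n - 1))


random.seed(1)
# Simulated CSR pattern in the unit square
csr = [(random.random(), random.random()) for _ in range(300)]
# Simulated clustered pattern: points scattered tightly around the centre
clustered = [
    (0.5 + 0.05 * random.gauss(0, 1), 0.5 + 0.05 * random.gauss(0, 1))
    for _ in range(300)
]

r = 0.05
print("CSR estimate:      ", ripley_k(csr, r, 1.0))
print("Clustered estimate:", ripley_k(clustered, r, 1.0))
print("Theoretical CSR:   ", math.pi * r ** 2)
```

Without edge correction the estimate for the CSR pattern tends to fall slightly below πr², because neighbours outside the window are not observed.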
Ripley’s K-function can be variance stabilised by taking the square root, known as Besag’s L-
function [55]. It is a global estimate that measures correlation across an entire observation window
and can be used instead of the standard K-function [28, p.402], [56].
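For reference, in two dimensions this transformation is usually written as follows (a standard definition given here as background; the notation is not taken from the text above):

```latex
L(r) = \sqrt{\frac{K(r)}{\pi}}, \qquad
K_{\mathrm{CSR}}(r) = \pi r^{2} \;\Longrightarrow\; L_{\mathrm{CSR}}(r) = r .
```

Under CSR the L-function is therefore the identity line, which makes departures towards clustering (L(r) > r) or inhibition (L(r) < r) easier to read off a plot.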
Figure 2A-C show point patterns of mature oligodendrocytes over three different serial sections
of the same mouse hypothalamic preoptic region (the numbers −0.09 mm, 0.01 mm, 0.21 mm indi-
cate the z-axis position of the slices: one slice every 50 µm along the anterior-posterior axis [57]).
Figure 2D shows the homogeneous global L-function, which quantifies the correlation within each
slice and the respective L-function for a Poisson process (dashed line). The homogeneous L-function
indicates strong clustering, even for the 0.21 slice. However, the inhomogeneous L-function, which
makes a local adjustment for intensity, indicates a level of homogeneity very similar to a (completely
spatially random) Poisson process (Supp. Figure S1A-C vs. D-F). This is an example of intensity
vs. interaction confounding (see box Point Processes). Whether slice 0.21 should be interpreted as
clustered therefore depends on the assumption of homogeneity for that slice. In addition, there are differences
between the inhomogeneity corrections in Figure S3; the corrections are discussed in more detail in
the vignette.
Local versions of the various summary functions exist. The LISA (local indicators of spatial
association) framework provides local versions for many spatial statistics (including point pattern
and lattice data analysis) [58]. For example, Figure 2E shows LISA L curves for the slice 0.01
[59, 58]. That is, for each point, we calculate a local L-function. These curves can be analysed
with functional principal component analysis (fPCA), which allows the extraction of the main functional
modes of variation [60]. The idea to perform fPCA on spatial statistics summaries has been used in
ecological and spatial omics data [61, 62, 63]. Figure 2F shows the scores of the first two functional
PCs. There are two main clusters, representing either the physically clustered oligodendrocytes or
those in low-density regions.
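The idea of one local L curve per point can be sketched in a few lines of Python. This is a simplified version without edge correction, using the common local form L_i(r) = sqrt(|W| n_i(r) / ((n − 1) π)), where n_i(r) is the number of other points within distance r of point i; the function name and the toy coordinates are assumptions for illustration and do not reproduce the analysis behind Figure 2E.

```python
import math


def local_l(points, radii, area):
    """Local L curve for every point (one value per radius), no edge correction."""
    n = len(points)
    curves = []
    for i, p in enumerate(points):
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        curves.append([
            math.sqrt(area * sum(d <= r for d in dists) / ((n - 1) * math.pi))
            for r in radii
        ])
    return curves


# Hypothetical pattern: a tight group of four cells plus one isolated cell
cells = [(1.0, 1.0), (1.2, 1.0), (1.0, 1.2), (1.2, 1.2), (8.0, 8.0)]
curves = local_l(cells, radii=[0.5, 1.0, 2.0], area=100.0)

for cell, curve in zip(cells, curves):
    print(cell, [round(v, 2) for v in curve])
```

The resulting set of curves (one per cell) is the kind of functional data that could then be passed to fPCA, which is not shown here.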
Challenges and limitations Analysing biological samples formally as point patterns comes with
several challenges and limitations. The first and most important is viewing the biological sample
as the realisation of a point process. It is generally not easy to define whether the entire pattern
[Figure 2 graphic. Recoverable panel titles: A) −0.09 (N = 240); C) 0.21 (N = 1136); D) Lest metric for OD Mature (legend: Slice −0.09, 0.01, 0.21; estimate: iso); F) Biplot of LISA curves (PC1 versus PC2). Spatial panels use x and y axes.]
Figure 2: Panels A-C) show the distribution of mature oligodendrocytes across three different serial
slices (−0.09 mm, 0.01 mm, 0.21 mm) from the Moffitt et al. [57] dataset. Cells in B) are coloured
according to E). D) shows the L-function for the three slices. The black dashed line indicates the
homogeneous L-function for a completely random process. E) shows a LISA curve for each point
in the slice 0.01, coloured by their functional value at the radius 200 µm (indicated in black). F) is
the biplot of the first two principal components (from fPCA on the set of LISA curves from panel
E) coloured as in panel E).
of several cell types should be represented as a single point process (multitype analysis). Analysing
the cell types individually and considering them as separate point processes might underestimate
dependencies among the cells (multivariable analysis) [28, pp. 565 ff.]. If, as in our example in
Figure 2, the sections are far apart from each other, it can be justified to analyse the processes in
isolation as a multivariable analysis. However, if we consider cells in two adjacent slices, these have
probably been formed by one biological process and should not be analysed in isolation.
Obtaining a representative sample of a point pattern is also a challenge. In spatial omics, we
are often provided with FOVs that an experimentalist has selected from a much larger region or
that stem from the technology itself. Such a sample might have different characteristics compared
to the point pattern of the larger region, as FOVs are sometimes selected based on morphological
properties (e.g., H&E staining). In cases where we have a small world scenario, we would want to
limit our observation window to the full region where the points can occur.
Assumptions Once we have recorded the outline and arrangement of the spatial units in the
lattice, we can specify the strength of spatial relationships between each unit. Each pair of units is
assigned a weight: the stronger the connection between units, the higher the weight. The collection
of all weights w_ij between locations i and j forms the weight matrix W (see box Lattice Data). The
construction of the weight matrix is critical for all downstream analyses as it encodes the spatial
relationship between the units in the lattice [30, pp. 321ff.] and there exist various strategies:
contiguity-based (i.e., in direct contact), graph-based, distance-based, or higher-order neighbours
[65, 30, 26].
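Two of these strategies, nearest-neighbour and distance-based weights, can be sketched in plain Python as follows. The helper names and the toy centroids are assumptions for illustration; dedicated packages provide these constructions together with contiguity-based neighbours, which require polygon outlines and are not shown.

```python
import math


def knn_weights(coords, k):
    """Binary weights: w[i][j] = 1 if j is among the k nearest neighbours of i."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )
        for j in order[:k]:
            w[i][j] = 1.0
    return w


def distance_band_weights(coords, threshold):
    """Binary weights: w[i][j] = 1 if units i and j lie within `threshold`."""
    n = len(coords)
    return [
        [1.0 if i != j and math.dist(coords[i], coords[j]) <= threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def row_standardise(w):
    """Scale each row to sum to 1; rows without neighbours stay all zero."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


# Hypothetical cell centroids along a line; the last cell is far from the rest
centroids = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
w_knn = knn_weights(centroids, k=1)
w_band = row_standardise(distance_band_weights(centroids, threshold=1.5))
```

In this toy example the distant cell still receives a neighbour under the nearest-neighbour rule but has none under the distance rule, and nearest-neighbour weights are in general not symmetric.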
Options for analysis A common exploratory analysis for lattice data is based on the concept
of spatial autocorrelation. Spatial autocorrelation measures the degree of association of features,
e.g., the expression of a gene, that are assumed to be dependent in space. For each combination of
spatial locations, a measure of association is calculated and scaled by the weight of the connection
[64]. This approach considers both the proximity (via the weight matrix) and the characteristics of
the locations (via the metric) [30, pp. 327ff.] [26, pp. 209ff.] [66]. Over the years, various association
measures have emerged, each specific to an aspect of the data. While many allow for univariate
comparisons, in which one variable (e.g., expression of a gene) is compared over multiple locations,
others allow for the comparison of two (bivariate) or more (multivariate) features at once [26, 22].
For most metrics, the features are assumed to be continuous measurements. However, some metrics allow
for the measurement of spatial autocorrelation of categorical variables (e.g., join count statistics).
In addition to the number of features and the type of measurement that can be compared in the
association score, there is the scale to consider: global spatial autocorrelation metrics estimate the average
level of spatial autocorrelation across all locations whereas local measures give a statistic at each
location. The global statistic can be seen as the weighted sum of its respective local statistics [58].
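The relationship between global and local statistics can be checked numerically for Moran’s I. The sketch below uses a standard textbook formulation (local values I_i = (z_i / m2) Σ_j w_ij z_j with m2 = Σ_i z_i² / n, and the global value as the sum of the local values divided by the sum of all weights); scaling conventions differ between implementations, and the function name and toy data are assumptions for illustration.

```python
def morans_i(values, w):
    """Return (global Moran's I, list of local Moran's I values)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    s0 = sum(sum(row) for row in w)  # sum of all weights
    local = [
        z[i] / m2 * sum(w[i][j] * z[j] for j in range(n))
        for i in range(n)
    ]
    return sum(local) / s0, local


# Hypothetical expression values at four spots arranged in a line
values = [1.0, 2.0, 3.0, 4.0]
# Row-standardised weights between directly adjacent spots
w = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]

global_i, local_i = morans_i(values, w)
print(global_i, local_i)
```

With row-standardised weights the sum of all weights equals the number of units, so the global value is simply the mean of the local values.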
Box: Lattice Data
[...] random variable along the lattice Y_i = Y(A_i) [30].
Regular and irregular lattices In a regular lattice, all spatial units have the same size, shape, and
the observations are placed on a regular grid. If this is not the case, the lattice is irregular.
Observations that follow the outline of natural objects are usually irregular lattices (e.g., cells in a
tissue) [30].
Weight matrix The weight matrix W = (w_ij) defines the spatial relationships between locations i and
j. The neighbourhood matrix is a special case of a weight matrix, where all entries are 1 (direct
neighbour, connection) or 0 (not a neighbour, no connec[...]
[...] sum of its respective local statistics. The concept of measuring local associations exists both in
point pattern (cf. Figure 2E) and in lattice data analysis (cf. Figure 3B) [58].

[...]bours [67]. A positive value of Moran’s I indicates that locations with similar values are clustered
while a negative value indicates the clustering of dissimilar values, giving a measure of spatial
heterogeneity. The global value of Moran’s I is bounded by −1 and 1 with an expected value under
spatial randomness near 0 for large n (E(I) = −1/(n − 1)). Figure 3A shows evidence for spatial
autocorrelation of Nrgn expression in mouse brain tissue; the (global) [...] non-zero spatial
autocorrelation). When interpreting local autocorrelation measures, it is important to consider both
the effect size estimates and the significance level. Since the significance level is calculated for each
spot separately, it is recommended to adjust for multiple testing (e.g., Benjamini and Hochberg
[70]). Local Moran’s I statistics reveal locations in the tissue that have similar values to their
neighbours (e.g., the upper layers of the tissue). Regions with the highest and most significant
local Moran’s I value lie in the part of the tissue where expression (amongst neighbours) is very
low. Notably, the local Moran’s I measure depends on both the value of the log-transformed
counts and the similarity among neighbours (cf. Supplementary Figure S2A-B). For the analysis
of categorical data, there are metrics such as the join count statistics [25, pp. 141ff.]. In essence,
the join count statistic calculates the frequency of categories among neighbours and compares this
value with a theoretical distribution or permutations of the labels to get a significance score. This
can be used to investigate the interactions between cell types in a lattice since the distance can be
specified when constructing the neighbourhood graph.
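The permutation variant of the join count statistic described above can be sketched as follows. The helper names, the chain-shaped neighbourhood graph, and the labels are assumptions for illustration; the p-value is a one-sided Monte Carlo estimate and depends on the number of permutations and the seed.

```python
import random


def same_label_joins(labels, edges, category):
    """Number of neighbour pairs (edges) in which both units carry `category`."""
    return sum(labels[i] == category and labels[j] == category for i, j in edges)


def join_count_permutation(labels, edges, category, n_perm=999, seed=0):
    """Observed join count and one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = same_label_joins(labels, edges, category)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute labels, keep the neighbourhood graph fixed
        if same_label_joins(shuffled, edges, category) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical chain of eight cells; neighbours are adjacent positions
cell_types = ["T", "T", "T", "T", "B", "B", "B", "B"]
neighbour_edges = [(i, i + 1) for i in range(7)]

observed, p_value = join_count_permutation(cell_types, neighbour_edges, "T")
print(observed, p_value)
```

Changing the edge list (for example, to include neighbours within a larger distance) changes the scale at which co-occurrence of categories is assessed.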
Challenges and limitations The choice of the weight matrix poses a challenge in the analysis of
lattice data, as it might influence downstream analyses and conclusions. A common choice is to give a
non-zero weight only to direct neighbours of a spatial unit (i.e., contiguity-based). Neighbours based
on graphs or on distance [26, pp. 191 ff.] are also used; the open question is which is most suited for
spatial omics data. If one uses contiguity-based neighbours in imaging-based spatial transcriptomics,
results ultimately depend heavily on the accuracy of the cell segmentation, which is known to be
challenging [45, 46, 47, 48, 44]. If neighbours beyond the adjacent ones are used, the scale (e.g., the
distance at which cells still interact) needs to be specified. Figure 3D-F show the difference between the local
Moran’s I calculation when based on contiguity-based neighbours in D, the 10 nearest neighbours in
E or neighbours within a 1000 pixel distance (∼180 µm) in F. While the overall differences are small,
some cells show different local Moran’s I values. Smoothing of the local Moran’s I values occurs
when more neighbours are considered (Supplementary Figure S2C). Supplementary Figure S2A and
B further show that for some cells, no contiguous neighbours were found, which results in a zero
estimate of the local Moran’s I. Since cells do not function in isolation but form complex anatomical
structures including extracellular components, an open question is whether analyses improve when
weight matrix construction takes anatomical structures or regions into account. Overall, it remains
to be investigated how much the construction of the weight matrix influences downstream analyses
in spatial omics data.
[Figure 3 graphic. Panel titles: A) Expression of Nrgn; B) Local Moran’s I (Direct neighbours); C) Local Moran’s I (Adjusted significance levels); D) Local Moran’s I (Contiguous Neighbours); E) Local Moran’s I (10 Nearest Neighbours); F) Local Moran’s I (Neighbours in 1000 pixel distance).]
Figure 3: Panel A) shows log-transformed counts of Nrgn expression in the Visium mouse coronal
brain section data [40]. B) shows the local Moran’s I values calculated based on the values in A). C)
shows adjusted p-values corresponding to the calculations of Moran’s I in B). P-values were adjusted
using the Benjamini and Hochberg [70] method. D-F) show local Moran’s I values calculated based
on the log-transformed counts of KRT17 in a subsection of the CosMx human non-small cell lung
cancer dataset [71]. A subset of the data is shown for illustration. The weight matrix was constructed
using contiguity-based neighbours in D), 10 nearest neighbours in E) and neighbours within a 1000
pixel distance (∼180 µm) in F). Axes values correspond to pixels with arbitrary origin.
more than randomly expected. In lattice data analysis, however, we are not directly interested in the
arrangement of points, but rather locations are important to define the spatial relationship between
locations (i.e., via the weight matrix). We study the interaction of features, such as gene expression,
given their spatial location. For example, if ligand-receptor dynamics are of interest, the weight
matrix can be specified in a way that captures spatial associations of gene expression between direct
neighbours or at a specified scale.
Second, the observation scale has to be chosen. Sometimes, this is a technological scale such
as multiple FOVs in imaging. Figure 4A shows a single FOV with islets from the human pancreas
acquired using IMC [72]. An analysis considering the entire FOV will tell us about the processes
that gave rise to the spatial distribution of the endocrine pancreas (islets). As we see in Figure 4B,
a Ripley’s K-function under a homogeneity assumption indicates that in this observation window,
endocrine cells are clustered within pancreatic islets. However, there might also be a relevant
biological scale where we subset our observation window to the islets themselves (Figure 4D). In
this case, Ripley’s K-function indicates a homogeneous distribution of the islet cells within the
observation window (Figure 4E). Overall, the correct scale needs to be chosen in accordance with
the research question, since the interpretation depends on it.
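The dependence on the observation window can be reproduced with a toy example. In the sketch below, the same hypothetical group of cells is analysed once with the whole field of view as the window and once with a window restricted to the group itself. The coordinates, window sizes, and the naive estimator without edge correction are assumptions for illustration and do not correspond to the IMC data of Figure 4.

```python
import math


def ripley_k(points, r, area):
    """Naive estimate of Ripley's K at radius r, without edge correction."""
    n = len(points)
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= r
    )
    return area * pairs / (n * (n - 1))


# Hypothetical islet: 100 evenly spaced cells inside a 20 x 20 region
islet = [(40 + 2 * i + 1, 40 + 2 * j + 1) for i in range(10) for j in range(10)]

r = 5.0
k_fov = ripley_k(islet, r, area=100.0 * 100.0)  # window: whole 100 x 100 FOV
k_islet = ripley_k(islet, r, area=20.0 * 20.0)  # window: the islet region only
k_csr = math.pi * r ** 2                        # theoretical value under CSR

print(k_fov, k_islet, k_csr)
```

Relative to the whole field of view the cells appear strongly clustered, whereas within the restricted window the estimate is of the same order as the CSR reference.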
In point pattern analysis, the analyst needs to decide whether (and how) to control for inhomo-
geneous intensities, leading back to potential confounding of intensity and interaction [28, pp. 151
ff.]. Analysts can look at smoothed density estimates to inspect potential inhomogeneity prior to
analysing the pattern more formally. Furthermore, it depends again on the research question and
biological domain knowledge whether the assumption of homogeneity is appropriate. While tissues
such as the adipose tissue form regular structures that could be assumed to be homogeneous, other
tissues such as stratified epithelial structures appear to be more inhomogeneous. As shown in Figure 4,
the homogeneous and inhomogeneous variants of the spatial summaries can lead to different
interpretations.
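A smoothed density (intensity) estimate of the kind mentioned above can be sketched with a Gaussian kernel. The function name, the bandwidth, and the toy coordinates are assumptions for illustration; no edge correction is applied, and the choice of bandwidth strongly affects how inhomogeneous a pattern appears.

```python
import math


def kernel_intensity(points, location, bandwidth):
    """Gaussian kernel estimate of the intensity at `location` (no edge correction)."""
    x0, y0 = location
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    return sum(
        norm * math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2.0 * bandwidth ** 2))
        for x, y in points
    )


# Hypothetical cell centroids: a dense group and a single distant cell
cells = [(1.0, 1.0), (1.5, 1.2), (1.2, 1.6), (0.8, 1.4), (8.0, 8.0)]

dense = kernel_intensity(cells, (1.1, 1.3), bandwidth=1.0)
sparse = kernel_intensity(cells, (8.0, 8.0), bandwidth=1.0)
print(dense, sparse)
```

Evaluating the estimate on a grid of locations gives a density map that can be inspected visually before deciding between homogeneous and inhomogeneous summary functions.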
A fundamental choice in lattice data analysis is the construction of the weight matrix. As seen
in Figure 3D-F, there are differences in the interpretation of results between different weight matrix
choices. The appropriate weight matrix design depends on the research question. For instance, if
the feature of interest is a ligand that requires direct cell contact, using contiguous neighbours is a
reasonable choice. If the feature of interest is a molecule that diffuses to nearby cells (e.g., paracrine
signalling), a nearest neighbour or distance-based approach should be considered for constructing
the neighbourhood matrix.
Overall, we advise analysts to consider both scale and homogeneity carefully, keeping the biolog-
ical context in mind and to document the decisions made and their rationale.
Discussion
In this work, we presented multiple options for the exploratory data analysis of spatial omics data.
We introduced the foundations of both point pattern and lattice data analysis and showed the
[Figure 4 graphic. Panel titles: B) Kest metric for islet; C) Kinhom metric for islet; E) Kest metric for islet; F) Kinhom metric for islet. Function panels show the iso estimate against r.]
Figure 4: A) Single image of an IMC dataset showing islets in the human pancreas [72]. The
dashed line indicates that the analysis window is set to be the entire FOV. B) Global analysis of
islet cells using a homogeneous K-function. C) Global analysis of islet cells using an inhomogeneous
K-function. D) Local analysis on cells belonging to an islet and window (dashed line) around it using
E) homogeneous K-function and F) inhomogeneous K-function. Coordinates in A)/D) correspond
to r values in the plots.
view segmentations as an irregular lattice (and aggregate expression per cell). Similarly, HTS-based
data with subcellular resolution (e.g., Visium HD [10]) can be interpreted as a regular lattice
of (sub)cellular locations or as an irregular lattice from segmentations [42]; alternatively, the reconstructed
cells can be approximated by their centroids as points and analysed using point pattern analysis.
We have focused here on spatial statistics and highlighted two main streams of analysis. Another
popular way to represent spatial arrangements is spatial graphs [73, 74]. The methods described in
the section on lattice data are closely connected to spatial graphs, especially in the construction of a
weight matrix. Point pattern analysis concepts are also related to spatial graphs via edge rules. For
example, Ripley’s K can be interpreted as a graph problem with a distance threshold edge rule, as
explained in [25, p. 136].
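The graph reading of Ripley's K can be sketched as follows: connect two points whenever their distance is at most r, count the edges, and rescale by the window area and the number of ordered point pairs. This is a simplified estimator without edge correction, written in Python for illustration only; the function names and the toy coordinates are assumptions, and established implementations (e.g., in spatstat) additionally apply edge corrections.

```python
import math
from itertools import combinations


def threshold_graph_edges(points, r):
    """Edges (i, j), i < j, of the graph that links points at distance <= r."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if math.dist(p, q) <= r
    ]


def ripley_k_uncorrected(points, r, window_area):
    """K(r) estimate without edge correction, computed from the edge count."""
    n = len(points)
    # Each undirected edge corresponds to two ordered pairs (i, j) and (j, i).
    ordered_pairs = 2 * len(threshold_graph_edges(points, r))
    return window_area * ordered_pairs / (n * (n - 1))


# Hypothetical pattern: the four corners of a unit square in a 2 x 2 window.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

edges_r1 = threshold_graph_edges(corners, r=1.0)    # only the four sides
k_r1 = ripley_k_uncorrected(corners, r=1.0, window_area=4.0)
k_r15 = ripley_k_uncorrected(corners, r=1.5, window_area=4.0)  # sides and diagonals
```

Increasing r adds edges to the graph, so the estimate is non-decreasing in r, mirroring the cumulative nature of the K-function.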
One common challenge in all spatial analyses is the question of the correct scale. In Figure 4,
we showed that the scale influences the interpretation of the spatial analysis: what is found to be
spatially heterogeneous at one scale can be locally homogeneous at another [75, 25]. Thus, the
scale should be defined in accordance with the scientific question.
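To make the dependence on the analysis window concrete, the sketch below evaluates the same uncorrected homogeneous K estimate for one hypothetical block of cells under two windows: a large field of view and a tight box around the structure. The coordinates, window sizes and the omission of edge correction are simplifying assumptions and do not reproduce the analysis in Figure 4.

```python
import math
from itertools import combinations


def k_uncorrected(points, r, window_area):
    """Homogeneous K(r) estimate without edge correction."""
    n = len(points)
    close_pairs = sum(1 for p, q in combinations(points, 2) if math.dist(p, q) <= r)
    return window_area * 2 * close_pairs / (n * (n - 1))


# Hypothetical "islet": a 5 x 5 block of cells with unit spacing.
islet = [(float(x), float(y)) for x in range(5) for y in range(5)]

r = 1.5
csr_reference = math.pi * r**2  # K(r) under complete spatial randomness

# Window = whole field of view (assumed 20 x 20 units).
k_fov = k_uncorrected(islet, r, window_area=20.0 * 20.0)
# Window = tight box around the islet (assumed 5 x 5 units).
k_local = k_uncorrected(islet, r, window_area=5.0 * 5.0)
```

With the field of view as window, the estimate (96) lies far above the reference of roughly 7.07 and reads as strong clustering; with the tight window, the estimate (6) is close to the reference. The points are identical in both cases, only the window, and hence the estimated intensity, differs.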
The concept of spatial autocorrelation is based on Tobler’s famous first law of geography, which
states that “everything is related to everything else, but near things are more related than distant
things.” [76]. Whether this statement is applicable to biology where we have anatomical structures
that span the body and share similarities at different places, such as the nervous system, remains
to be explored. Moreover, the definition of cells and spatial structures or domains is often not
straightforward and in turn leads to challenges that can be summarised as the modifiable areal
unit problem (MAUP) [77, 78], which states that the definition of a region will affect downstream
analyses and conclusions. For more discussion about the MAUP in spatial transcriptomics, we refer
the reader to Zormpas et al. [21].
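A toy illustration of the MAUP, under the assumption of a hypothetical 4 x 4 grid of expression values, is given below: the same measurements are aggregated into two regions in two different ways, and the regional summaries lead to opposite conclusions. The grid values, region labels and function name are made up for illustration.

```python
from statistics import mean

# Hypothetical expression values on a 4 x 4 grid (rows are listed top to bottom).
expression = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [9, 9, 9, 9],
    [9, 9, 9, 9],
]


def region_means(grid, assign):
    """Mean value per region; `assign(row, col)` returns the region label."""
    regions = {}
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            regions.setdefault(assign(r, c), []).append(value)
    return {label: mean(values) for label, values in regions.items()}


# Zoning 1: split the tissue into a top and a bottom half.
by_rows = region_means(expression, lambda r, c: "top" if r < 2 else "bottom")
# Zoning 2: split the same tissue into a left and a right half.
by_cols = region_means(expression, lambda r, c: "left" if c < 2 else "right")
```

The first zoning yields regional means of 1 and 9, suggesting a strong regional difference, whereas the second yields 5 and 5, suggesting none, although the underlying values are unchanged.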
There are various other limitations when studying spatially-resolved omics data. On the tech-
nical side, imperfect sections can lead to artifacts or non-overlapping FOVs, which can introduce
missing data. Furthermore, cells are never perfectly arranged within a histological section, thus
cells overlapping in the z-axis are to be expected. In terms of analysis, two-dimensional views ne-
glect processes that occur in three dimensions. For example, in lattice data we might neglect an
important contiguous neighbour in an adjacent slice. The same applies to point pattern analysis
where we consider only processes in two dimensions even though the underlying biological process
likely happens in three dimensions. Both lattice data and point pattern analysis extend conceptu-
ally to 3D, however with increased computational complexity. Moreover, in point pattern analysis,
we often implicitly assume that patterns are invariant to rotations (see box Point Processes). This
assumption is not anatomically accurate for organs and tissues with layered structures, such as the
brain. There are other limitations arising from the concept of spatial metrics. Lattice data analysis
methods were originally designed for rather low dimensional datasets, while spatial omics often deals
with thousands of features (e.g., full transcriptome). Similarly, point pattern methods usually only
allow for pairwise comparison of features, which can lead to a high number of cross comparisons,
e.g., when there are a dozen cell types. Moreover, with newer technologies, the size of the biological
sample and resolution of measurements are constantly expanding, which can lead to computational
performance issues (e.g., to store cell segmentations in memory to construct weight matrices). Since
point pattern based methods only store coordinates, they usually scale better in terms of memory
requirements.
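Two of the scaling arguments above can be put into rough numbers. The sketch below counts the pairwise comparisons for a given number of cell types and contrasts the memory of stored coordinates with that of a dense weight matrix. The dense matrix is a deliberately pessimistic assumption (implementations typically store sparse weights, and the text above attributes the cost to holding segmentations in memory); the function names, 8 bytes per entry and the 100,000 cells are likewise illustrative assumptions.

```python
from math import comb


def n_cross_comparisons(n_types, include_self=False):
    """Number of unordered cell type pairs, optionally with same-type pairs."""
    pairs = comb(n_types, 2)
    return pairs + n_types if include_self else pairs


def dense_weight_matrix_bytes(n_cells, bytes_per_entry=8):
    """Memory of a dense n x n weight matrix (worst-case assumption)."""
    return n_cells**2 * bytes_per_entry


def coordinate_bytes(n_cells, n_dims=2, bytes_per_entry=8):
    """Memory needed to store one coordinate tuple per cell."""
    return n_cells * n_dims * bytes_per_entry


cross_pairs = n_cross_comparisons(12)                      # a dozen cell types
all_pairs = n_cross_comparisons(12, include_self=True)     # plus same-type curves

n_cells = 100_000
weights_gb = dense_weight_matrix_bytes(n_cells) / 1e9
coords_mb = coordinate_bytes(n_cells) / 1e6
```

For a dozen cell types this gives 66 cross-type pairs (78 when same-type curves are included), and for 100,000 cells about 80 GB for the dense matrix versus 1.6 MB for the coordinates.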
In our accompanying vignette, we discuss concepts and assumptions in more detail using biological
examples with R. The choice of R was motivated by the number of spatial statistics libraries
available. Since the R package Voyager is an extensive framework and resource for lattice data, we
build on it for the lattice data analysis part of our work. In addition, we provide vignettes for point
pattern analysis to complement the discussions in [21] and [22].
Outlook
The spatial metrics as described in point pattern and lattice data analysis offer interesting ways to
quantify spatial patterns in molecular biology. There are still open research questions including the
correct application (e.g., scale and homogeneity) of methods to a given biological context, the high
dimensional nature of spatial omics data and the effect of preprocessing steps that come upstream
of the application of spatial statistics. Furthermore, comparisons of spatial metrics across multiple
samples and conditions and parametric spatial models offer interesting options for future work.
Acknowledgments
We thank all members of the Robinsonlab for constructive feedback on the manuscript and vignette.
In particular, we thank Peiying Cai, Alice Driessen, Reto Gerber, Pierre-Luc Germain, Maruša
Kodermann, Vladyslav Korobeynyk, Siyuan Luo, Giulia Moro, Emanuel Sonder, Jiayi Wang and
David Wissel for their careful reading and input to the vignette.
Author contributions
ME: Conceptualisation, Methodology, Software (Vignette), Analysis, Visualisation, Writing - Origi-
nal Draft; SG: Conceptualisation, Methodology, Software (Vignette), Analysis, Visualisation, Writ-
ing - Original Draft; HLC: Conceptualisation, Vignette, Writing - Review & Editing; IM: Method-
ology, Writing - Review & Editing; RF: Methodology, Writing - Review & Editing; MDR: Con-
ceptualisation, Methodology, Writing - Original Draft, Supervision, Funding acquisition.
Funding
This work was supported by Swiss National Science Foundation (SNSF) project grant 310030 204869
to MDR. MDR acknowledges support from the University Research Priority Program Evolution in
Action at the University of Zurich. HLC acknowledges support by SNSF grant number 222136.
Conflict of interest
We declare no conflict of interest.
References
[1] Charlotte Giesen et al. “Highly Multiplexed Imaging of Tumor Tissues with Subcellular Reso-
lution by Mass Cytometry”. In: Nature Methods 11.4 (Apr. 2014), pp. 417–422. doi: 10.1038/
nmeth.2869.
[2] Leeat Keren et al. “MIBI-TOF: A Multiplexed Imaging Platform Relates Cellular Phenotypes
and Tissue Structure”. In: Science Advances 5.10 (Oct. 2019), eaax5851. doi: 10.1126/
sciadv.aax5851.
[3] Gabriele Gut, Markus D. Herrmann, and Lucas Pelkmans. “Multiplexed Protein Maps Link
Subcellular Organization to Cellular States”. In: Science 361.6401 (Aug. 2018), eaar7042. doi:
10.1126/science.aar7042.
[4] Sarah Black et al. “CODEX Multiplexed Tissue Imaging with DNA-conjugated Antibodies”.
In: Nature Protocols 16.8 (Aug. 2021), pp. 3802–3835. doi: 10.1038/s41596-021-00556-8.
[5] Takahiro Tsujikawa et al. “Quantitative Multiplex Immunohistochemistry Reveals Myeloid-
Inflamed Tumor-Immune Complexity Associated with Poor Prognosis”. In: Cell Reports 19.1
(Apr. 2017), pp. 203–217. doi: 10.1016/j.celrep.2017.03.037.
[6] Pennina R. Langer-Safer, Michael Levine, and David C. Ward. “Immunological Method for
Mapping Genes on Drosophila Polytene Chromosomes.” In: Proceedings of the National Academy
of Sciences of the United States of America 79.14 (July 1982), pp. 4381–4385. doi: 10.1073/
pnas.79.14.4381.
[7] Kok Hao Chen et al. “Spatially Resolved, Highly Multiplexed RNA Profiling in Single Cells”.
In: Science 348.6233 (Apr. 2015), aaa6090. doi: 10.1126/science.aaa6090.
[8] Amanda Janesick et al. “High Resolution Mapping of the Tumor Microenvironment Using
Integrated Single-Cell, Spatial and in Situ Analysis”. In: Nature Communications 14.1 (Dec.
2023), p. 8353. doi: 10.1038/s41467-023-43458-x.
[9] Patrik L. Ståhl et al. “Visualization and Analysis of Gene Expression in Tissue Sections by
Spatial Transcriptomics”. In: Science 353.6294 (July 2016), pp. 78–82. doi: 10.1126/science.
aaf2403.
[10] Michelli F. Oliveira et al. Characterization of Immune Cell Populations in the Tumor Mi-
croenvironment of Colorectal Cancer Using High Definition Spatial Profiling. June 2024. doi:
10.1101/2024.06.04.597233.
[11] Samuel G. Rodriques et al. “Slide-Seq: A Scalable Technology for Measuring Genome-Wide
Expression at High Spatial Resolution”. In: Science 363.6434 (Mar. 2019), pp. 1463–1467. doi:
10.1126/science.aaw1219.
[12] Robert R. Stickels et al. “Highly Sensitive Spatial Transcriptomics at Near-Cellular Resolution
with Slide-seqV2”. In: Nature Biotechnology 39.3 (Mar. 2021), pp. 313–319. doi: 10.1038/
s41587-020-0739-1.
[13] Tian Lu, Cheen Euong Ang, and Xiaowei Zhuang. “Spatially Resolved Epigenomic Profiling of
Single Cells in Complex Tissues”. In: Cell 185.23 (Nov. 2022), 4448–4464.e17. doi: 10.1016/
j.cell.2022.09.035.
[14] Katy Vandereyken et al. “Methods and Applications for Single-Cell and Spatial Multi-Omics”.
In: Nature Reviews Genetics 24.8 (Aug. 2023), pp. 494–515.
doi: 10.1038/s41576-023-00580-2.
[15] Michaela Asp, Joseph Bergenstråhle, and Joakim Lundeberg. “Spatially Resolved Transcriptomes—
Next Generation Tools for Tissue Exploration”. In: BioEssays 42.10 (2020), p. 1900221. doi:
10.1002/bies.201900221.
[16] Jeffrey R. Moffitt, Emma Lundberg, and Holger Heyn. “The Emerging Landscape of Spatial
Profiling Technologies”. In: Nature Reviews Genetics (July 2022). doi: 10.1038/s41576-022-
00515-3.
[17] Anjali Rao et al. “Exploring Tissue Architecture Using Spatial Transcriptomics”. In: Nature
596.7871 (Aug. 2021), pp. 211–220. doi: 10.1038/s41586-021-03634-9.
[18] Lambda Moses and Lior Pachter. “Museum of Spatial Transcriptomics”. In: Nature Methods
19.5 (May 2022), pp. 534–546. doi: 10.1038/s41592-022-01409-2.
[19] Giovanni Palla et al. “Spatial Components of Molecular Tissue Biology”. In: Nature Biotech-
nology 40.3 (Mar. 2022), pp. 308–318. doi: 10.1038/s41587-021-01182-1.
[20] Dario Bressan, Giorgia Battistoni, and Gregory J. Hannon. “The Dawn of Spatial Omics”. In:
Science 381.6657 (Aug. 2023), eabq4964. doi: 10.1126/science.abq4964.
[21] Eleftherios Zormpas et al. “Mapping the Transcriptome: Realizing the Full Potential of Spatial
Data Analysis”. In: Cell (Dec. 2023), S0092867423012199. doi: 10.1016/j.cell.2023.11.
003.
[22] Lambda Moses et al. Voyager: Exploratory Single-Cell Genomics Data Analysis with Geospatial
Statistics. Aug. 2023. doi: 10.1101/2023.07.20.549945.
[23] Jiwoon Park et al. “Spatial Omics Technologies at Multimodal and Single Cell/Subcellular
Level”. In: Genome Biology 23.1 (Dec. 2022), p. 256. doi: 10.1186/s13059-022-02824-6.
[24] Anne Rademacher et al. Comparison of Spatial Transcriptomics Technologies Using Tumor
Cryosections. Apr. 2024. doi: 10.1101/2024.04.03.586404.
[25] Mark R. T. Dale and Marie-Josée Fortin. Spatial Analysis: A Guide for Ecologists. Second
Edition. Cambridge ; New York: Cambridge University Press, 2014.
[26] Edzer Pebesma and Roger Bivand. Spatial Data Science: With Applications in R. 1st ed. New
York: Chapman and Hall/CRC, May 2023. doi: 10.1201/9780429459016.
[27] Noel A. C. Cressie. Statistics for Spatial Data. Rev. ed. Wiley Series in Probability and Math-
ematical Statistics. New York, NY: Wiley, 1993.
[28] Adrian Baddeley, Ege Rubak, and Rolf Turner. Spatial Point Patterns. 1st ed. CRC Interdis-
ciplinary Statistics Series. CRC Press, Taylor & Francis Group, Dec. 2015.
[29] Luc Anselin. “What Is Special About Spatial Data? Alternative Perspectives on Spatial Data
Analysis”. In: Spring 1989 Symposium on Spatial Statistics, Past, Present and Future. 1989.
[30] Alain F. Zuur, Elena N. Ieno, and Graham M. Smith. Analysing Ecological Data. Statistics for
Biology and Health. New York: Springer, 2007.
[31] Yuzhou Feng et al. “Spatial Analysis with SPIAT and spaSim to Characterize and Simulate
Tissue Microenvironments”. In: Nature Communications 14.1 (May 2023), p. 2697. doi: 10.
1038/s41467-023-37822-0.
[32] Yue Cao et al. “scFeatures: Multi-View Representations of Single-Cell and Spatial Data for Dis-
ease Outcome Prediction”. In: Bioinformatics 38.20 (Oct. 2022). Ed. by Olga Vitek, pp. 4745–
4753. doi: 10.1093/bioinformatics/btac590.
[33] Nicolas P Canete et al. “spicyR: Spatial Analysis of in Situ Cytometry Data in R”. In: Bioin-
formatics 38.11 (May 2022), pp. 3099–3105. doi: 10.1093/bioinformatics/btac268.
[34] Ellis Patrick et al. “Spatial Analysis for Highly Multiplexed Imaging Data to Identify Tissue
Microenvironments”. In: Cytometry Part A 103.7 (May 2023), pp. 593–599. doi: 10.1002/
cyto.a.24729.
[35] Julia Wrobel et al. “Mxfda: A Comprehensive Toolkit for Functional Data Analysis of Single-Cell
Spatial Data”. In: Bioinformatics Advances 4.1 (Jan. 2024), vbae155.
doi: 10.1093/bioadv/vbae155.
[36] Zhuoxuan Li et al. “SpatialDM for Rapid Identification of Spatially Co-Expressed Ligand–
Receptor and Revealing Cell–Cell Communication Patterns”. In: Nature Communications 14.1
(July 2023), p. 3995. doi: 10.1038/s41467-023-39608-w.
[37] Brendan F. Miller et al. “Characterizing Spatial Gene Expression Heterogeneity in Spatially
Resolved Single-Cell Transcriptomics Data with Nonuniform Cellular Densities”. In: Genome
Research (May 2021), gr.271288.120. doi: 10.1101/gr.271288.120.
[38] Giovanni Palla et al. “Squidpy: A Scalable Framework for Spatial Omics Analysis”. In: Nature
Methods 19.2 (Feb. 2022), pp. 171–178. doi: 10.1038/s41592-021-01358-2.
[39] Hailing Shi et al. “Spatial Atlas of the Mouse Central Nervous System at Molecular Resolu-
tion”. In: Nature 622.7983 (Oct. 2023), pp. 552–561. doi: 10.1038/s41586-023-06569-5.
[40] 10x Genomics. Mouse Brain Section (Coronal), Spatial Gene Expression Dataset Analyzed
Using Space Ranger 1.0.0. Dec. 2019.
[41] Vipul Singhal et al. “BANKSY Unifies Cell Typing and Tissue Domain Segmentation for
Scalable Spatial Omics Data Analysis”. In: Nature Genetics (2024). doi: 10.1038/s41588-
024-01664-3.
[42] Krzysztof Polański et al. “Bin2cell Reconstructs Cells from High Resolution Visium HD Data”.
In: Bioinformatics 40.9 (Sept. 2024), btae546. doi: 10.1093/bioinformatics/btae546.
[43] Andrea M. Femino et al. “Visualization of Single RNA Transcripts in Situ”. In: Science
280.5363 (Apr. 1998), pp. 585–590. doi: 10.1126/science.280.5363.585.
[44] Zhenzhou Wang. “Cell Segmentation for Image Cytometry: Advances, Insufficiencies, and
Challenges”. In: Cytometry Part A 95.7 (2019), pp. 708–711. doi: 10.1002/cyto.a.23686.
[45] Carsen Stringer et al. “Cellpose: A Generalist Algorithm for Cellular Segmentation”. In: Nature
Methods 18.1 (Jan. 2021), pp. 100–106. doi: 10.1038/s41592-020-01018-x.
[46] Noah F. Greenwald et al. “Whole-Cell Segmentation of Tissue Images with Human-Level Per-
formance Using Large-Scale Data Annotation and Deep Learning”. In: Nature Biotechnology
40.4 (Apr. 2022), pp. 555–565. doi: 10.1038/s41587-021-01094-0.
[47] Viktor Petukhov et al. “Cell Segmentation in Imaging-Based Spatial Transcriptomics”. In:
Nature Biotechnology 40.3 (Mar. 2022), pp. 345–354. doi: 10.1038/s41587-021-01044-w.
[48] Anwai Archit et al. Segment Anything for Microscopy. Aug. 2023. doi: 10.1101/2023.08.
21.554208.
[49] Xiaohang Fu et al. “BIDCell: Biologically-informed Self-Supervised Learning for Segmentation
of Subcellular Spatial Transcriptomics Data”. In: Nature Communications 15.1 (Jan. 2024),
p. 509. doi: 10.1038/s41467-023-44560-w.
[50] Fatemeh Sadat Mohammadi, Hasti Shabani, and Mojtaba Zarei. “Fast and Robust Feature-
Based Stitching Algorithm for Microscopic Images”. In: Scientific Reports 14.1 (June 2024),
p. 13304. doi: 10.1038/s41598-024-61970-y.
[51] Alba Garrido-Trigo et al. “Macrophage and Neutrophil Heterogeneity at Single-Cell Spatial
Resolution in Human Inflammatory Bowel Disease”. In: Nature Communications 14.1 (July
2023), p. 4506. doi: 10.1038/s41467-023-40156-6.
[52] Adrian J. Baddeley et al. “Analysis of a Three-Dimensional Point Pattern with Replication”.
In: Journal of the Royal Statistical Society. Series C (Applied Statistics) 42.4 (1993), pp. 641–
668. doi: 10.2307/2986181.
[53] Brian D. Ripley. “The Second-Order Analysis of Stationary Point Processes”. In: Journal of
Applied Probability 13.2 (1976), pp. 255–266. doi: 10.2307/3212829.
[54] Brian D. Ripley. “Modelling Spatial Patterns”. In: Journal of the Royal Statistical Society.
Series B (Methodological) 39.2 (1977), pp. 172–212.
doi: 10.1111/j.2517-6161.1977.tb01615.x.
[55] Julian Besag. “Contribution to the Discussion on Dr Ripley’s Paper”. In: JR Stat Soc B 39
(1977), pp. 193–195. doi: 10.1111/j.2517-6161.1977.tb01616.x.
[56] Adrian Baddeley and Rolf Turner. “Spatstat: An R Package for Analyzing Spatial Point Pat-
terns”. In: Journal of Statistical Software 12 (Jan. 2005), pp. 1–42. doi: 10.18637/jss.v012.
i06.
[57] Jeffrey R. Moffitt et al. “Molecular, Spatial, and Functional Single-Cell Profiling of the
Hypothalamic Preoptic Region”. In: Science 362.6416 (Nov. 2018), eaau5324.
doi: 10.1126/science.aau5324.
[58] Luc Anselin. “Local Indicators of Spatial Association—LISA”. In: Geographical Analysis 27.2
(1995), pp. 93–115. doi: 10.1111/j.1538-4632.1995.tb00338.x.
[59] Arthur Getis and Janet Franklin. “Second-Order Neighborhood Analysis of Mapped Point
Patterns”. In: Ecology 68.3 (1987), pp. 473–477. doi: 10.2307/1938452.
[60] JO Ramsay and BW Silverman. “Principal Components Analysis for Functional Data”. In:
Functional data analysis (2005), pp. 147–172. doi: 10.1007/0-387-22751-2_8.
[61] Janine Illian et al. “Principal Component Analysis for Spatial Point Processes — Assessing the
Appropriateness of the Approach in an Ecological Context”. In: Case Studies in Spatial Point
Process Modeling. Ed. by Adrian Baddeley et al. New York, NY: Springer, 2006, pp. 135–150.
doi: 10.1007/0-387-31144-0_7.
[62] Thao Vu et al. “FunSpace: A Functional and Spatial Analytic Approach to Cell Imaging Data
Using Entropy Measures”. In: PLOS Computational Biology 19.9 (Sept. 2023), e1011490. doi:
10.1371/journal.pcbi.1011490.
[63] Thao Vu et al. “SPF: A Spatial and Functional Data Analytic Approach to Cell Imaging
Data”. In: PLOS Computational Biology 18.6 (June 2022), e1009486. doi: 10.1371/journal.
pcbi.1009486.
[64] Luc Anselin. An Introduction to Spatial Data Science with GeoDa: Volume 1: Exploring Spatial
Data. CRC Press, 2024.
[65] Arthur Getis. “Spatial Weights Matrices”. In: Geographical Analysis 41.4 (2009), pp. 404–410.
doi: 10.1111/j.1538-4632.2009.00768.x.
[66] Arthur Getis and J. K. Ord. “The Analysis of Spatial Association by Use of Distance Statis-
tics”. In: Geographical Analysis 24.3 (1992), pp. 189–206. doi: 10.1111/j.1538-4632.1992.
tb00261.x.
[67] P. A. P. Moran. “Notes on Continuous Stochastic Phenomena”. In: Biometrika 37.1/2 (1950),
pp. 17–23. doi: 10.2307/2332142.
[68] Lukas M. Weber et al. “nnSVG for the Scalable Identification of Spatially Variable Genes
Using Nearest-Neighbor Gaussian Processes”. In: Nature Communications 14.1 (July 2023),
p. 4059. doi: 10.1038/s41467-023-39748-z.
[69] Carissa Chen, Hani Jieun Kim, and Pengyi Yang. “Evaluating Spatially Variable Gene Detec-
tion Methods for Spatial Transcriptomics Data”. In: Genome Biology 25.1 (Jan. 2024), p. 18.
doi: 10.1186/s13059-023-03145-y.
[70] Yoav Benjamini and Yosef Hochberg. “Controlling the False Discovery Rate: A Practical and
Powerful Approach to Multiple Testing”. In: Journal of the Royal Statistical Society: Series B
(Methodological) 57.1 (Jan. 1995), pp. 289–300. doi: 10.1111/j.2517-6161.1995.tb02031.x.
[71] Shanshan He et al. “High-Plex Imaging of RNA and Proteins at Subcellular Resolution in
Fixed Tissue by Spatial Molecular Imaging”. In: Nature Biotechnology 40.12 (Dec. 2022),
pp. 1794–1806. doi: 10.1038/s41587-022-01483-z.
[72] Nicolas Damond et al. “A Map of Human Type 1 Diabetes Progression by Imaging Mass
Cytometry”. In: Cell Metabolism 29.3 (Mar. 2019), 755–768.e5. doi: 10.1016/j.cmet.2018.
11.014.
[73] Mayar Ali et al. “GraphCompass: Spatial Metrics for Differential Analyses of Cell Organization
across Conditions”. In: Bioinformatics 40.Supplement 1 (July 2024), pp. i548–i557. doi: 10.
1093/bioinformatics/btae242.
[74] Jonas Windhager et al. “An End-to-End Workflow for Multiplexed Image Processing and
Analysis”. In: Nature Protocols 18.11 (Nov. 2023), pp. 3565–3613.
doi: 10.1038/s41596-023-00881-0.
[75] Monica G. Turner et al. “Effects of Changing Spatial Scale on the Analysis of Landscape
Pattern”. In: Landscape Ecology 3.3 (Dec. 1989), pp. 153–162. doi: 10.1007/BF00131534.
[76] W. R. Tobler. “A Computer Movie Simulating Urban Growth in the Detroit Region”. In:
Economic Geography 46 (1970), pp. 234–240. doi: 10.2307/143141.
[77] Stan Openshaw. The Modifiable Areal Unit Problem. Concepts and Techniques in Modern
Geography 38. Norwich: Geo, 1984.
[78] Carol A Gotway and Linda J Young. “Combining Incompatible Spatial Data”. In: Journal
of the American Statistical Association 97.458 (June 2002), pp. 632–648.
doi: 10.1198/016214502760047140.
Supplementary Figures
[Figure S1, panels A-H. Panel titles: A) Kest, B) Lest, C) pcf, D) Kinhom, E) Linhom, F) pcfinhom, G) Kscaled and H) Lscaled metric for OD Mature. Horizontal axis: r; vertical axis: iso; colour legend: Slice (−0.09, 0.01, 0.21).]
Figure S1: Each plot shows a spatial statistic curve for each of the three slices (−0.09, 0.01, 0.21;
indicated with colours); data from [57]. A-C) show the homogeneous Ripley’s K-, Besag’s L- and pair
correlation functions. D-F) show the inhomogeneous variants of the same functions. G-H) show the
locally scaled K- and L-functions.
[Figure S2, panels A-E. A) x-axis: Nrgn; y-axis: locEffect; colour: −log10(p.val.adj). B) x-axis: locEffect; y-axis: −log10(p.val.adj); colour: Nrgn. C) x-axis: Local Moran's I (10 Nearest Neighbours); y-axis: Local Moran's I (Contiguous Neighbours). D) x-axis: Local Moran's I (Neighbours in 1000 pixel distance); y-axis: Local Moran's I (Contiguous Neighbours). E) x-axis: KRT17; y-axis: count.]
Figure S2: In A-B) each point represents a spot in the Visium mouse brain dataset [40]. A) shows the
dependence between the log-transformed counts of the gene Nrgn and the corresponding local Moran’s I
values; B) shows the local Moran’s I values against the adjusted p-values. The colours indicate the
adjusted p-value in A) and the log-transformed counts in B). C-D) compare local Moran’s I values
for the log-transformed counts of the gene KRT17 in the CosMx human non-small cell lung cancer
dataset [71] under different neighbourhood definitions: contiguity-based neighbours vs. 10 nearest
neighbours in C), and contiguity-based neighbours vs. neighbours within a distance of 1000 pixels
in D). Red indicates cells with no contiguous neighbours. The green line indicates x = y. Blue
lines indicate local densities. Note the high density close to the origin in C) and D) resulting from
the sparse expression of the gene KRT17, cf. the histogram in E).
[Figure S3, panels A-J. Panels B-E and G-J are titled "Kinhom metric for islet". Horizontal axis: r; vertical axes: iso (B, G), trans (C, H), border (D, I) and bord.modif (E, J).]
Figure S3: A) Single image of an IMC dataset showing islets in the human pancreas [72]. The analysis
window (dashed line) is set to correspond to the entire FOV. Global analysis using inhomogeneous
K-function with B) isotropic, C) translational, D) border and E) modified border correction. F)
Local analysis on subset of islet cells and window (dashed line) around tissue structure using in-
homogeneous K-function with G) isotropic, H) translational, I) border and J) modified border
correction.