GNN in Histopathology

This review discusses the emerging trends and future directions of Graph Neural Networks (GNNs) in histopathology, highlighting their advantages over traditional Convolutional Neural Networks (CNNs) in modeling spatial dependencies of whole slide images (WSIs). It identifies four key trends: Hierarchical GNNs, Adaptive Graph Structure Learning, Multimodal GNNs, and Higher-order GNNs, and proposes future research directions to enhance histopathological analysis. The paper serves as a comprehensive guide for researchers and practitioners looking to innovate in the field using GNN methodologies.

Medical Image Analysis 101 (2025) 103444

Contents lists available at ScienceDirect

Medical Image Analysis


journal homepage: www.elsevier.com/locate/media

Graph neural networks in histopathology: Emerging trends and future directions

Siemen Brussee a,∗, Giorgio Buzzanca a, Anne M.R. Schrader a, Jesper Kers a,b

a Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
b Amsterdam University Medical Center, Meibergdreef 9, 1105 AZ, Amsterdam, The Netherlands

ARTICLE INFO

Dataset link: https://zenodo.org/records/14599598

MSC: 05C62, 05C90, 62M45, 92C55

Keywords: Graph neural networks, Computational pathology, Graph representation learning, Hierarchical graph representation learning, Adaptive graph structure learning, Multimodal graph representation learning, Higher-order graph representation learning

ABSTRACT

Histopathological analysis of whole slide images (WSIs) has seen a surge in the utilization of deep learning methods, particularly Convolutional Neural Networks (CNNs). However, CNNs often fail to capture the intricate spatial dependencies inherent in WSIs. Graph Neural Networks (GNNs) present a promising alternative, adept at directly modeling pairwise interactions and effectively discerning the topological tissue and cellular structures within WSIs. Recognizing the pressing need for deep learning techniques that harness the topological structure of WSIs, the application of GNNs in histopathology has experienced rapid growth. In this comprehensive review, we survey GNNs in histopathology, discuss their applications, and explore emerging trends that pave the way for future advancements in the field. We begin by elucidating the fundamentals of GNNs and their potential applications in histopathology. Leveraging quantitative literature analysis, we explore four emerging trends: Hierarchical GNNs, Adaptive Graph Structure Learning, Multimodal GNNs, and Higher-order GNNs. Through an in-depth exploration of these trends, we offer insights into the evolving landscape of GNNs in histopathological analysis. Based on our findings, we propose future directions to propel the field forward. Our analysis serves to guide researchers and practitioners towards innovative approaches and methodologies, fostering advancements in histopathological analysis through the lens of graph neural networks.

1. Introduction

Histopathology analysis is an important examinatory tool that can be used for disease diagnosis, estimating disease prognosis, and monitoring therapeutic effects. Since the digitization of whole slide images (WSIs) in the early 2000s, the computational analysis of histopathology images has become an increasingly important part of histopathology. Starting with image analysis algorithms, the field moved to a deep learning approach after the rise of the convolutional neural network in the 2010s, which can be largely attributed to the availability of large datasets (e.g., ImageNet; Deng et al., 2009) and deeper convolutional architectures (e.g., AlexNet; Krizhevsky et al., 2012). In the last 5 years, paradigms in the field have become more heterogeneous, with the advent of attention-based multiple instance learning (Ilse et al., 2018; Sudharshan et al., 2019), vision transformers (Dosovitskiy et al., 2020; Wang et al., 2021a), self-supervised learning (Chen et al., 2020b; Ciga et al., 2022) and graph neural network (Scarselli et al., 2008; Li et al., 2018) approaches.

The emergence of Graph Neural Networks (GNNs) (Scarselli et al., 2008) has allowed effective modeling of naturally graph-structured data, such as social networks, (bio)chemical molecules (Schütt et al., 2018; Li et al., 2021), geospatial data (Cui et al., 2019; Zhu et al., 2020), and tabular data which can be effectively modeled as a graph, such as in recommendation systems (Ying et al., 2018a) and drug interactions (Zitnik et al., 2018). GNNs are well suited for problems involving pairwise interactions between entities in data. Furthermore, the topological inductive bias that can be encoded in the graph structure allows GNN models to learn based on the topology of the problem. We can define a graph neural network as an optimizable transformation on all graph attributes that preserves graph symmetries by being permutation invariant. Fundamental to the graph neural network is the notion of message passing, in which a learned transformation exchanges feature information between entities in the graph, leading to topology-aware feature vectors. How the message-passing function is defined depends on the type of GNN used, of which many varieties exist (e.g., GCN (Kipf and Welling, 2016), GAT (Veličković et al., 2017), GIN (Xu et al., 2018)). In 2018, Graph Neural Networks were also introduced to histopathology (Li et al., 2018) and have since gained tremendous popularity in the field.

∗ Corresponding author.
E-mail address: [email protected] (S. Brussee).

https://doi.org/10.1016/j.media.2024.103444
Received 18 June 2024; Received in revised form 18 November 2024; Accepted 17 December 2024
Available online 7 January 2025
1361-8415/© 2025 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Fig. 1. Overview of the emerging subtopics of GNNs in histopathology covered in this review: Hierarchical GNNs, Multimodal GNNs, Higher-order Graphs, and Adaptive Graph Structure Learning (Mirabadi et al., 2024; Pati et al., 2020; Chen et al., 2020a; Di et al., 2022a; Zhu et al., 2021b).

While review papers on the application of graph neural networks exist, they either give a general overview (Ahmedt-Aristizabal et al., 2022) or focus on the clinical applications of GNNs in histopathology (Meng and Zou, 2023). Instead, we focus on identifying and quantifying emerging trends in the application of GNNs in histopathology and use them to provide future directions in the field.

Our review is organized into three main sections: First, we introduce graph theory, graph neural networks, and their applications in histopathology. Then, we identify emerging trends in the application of GNNs in histopathology and select some emerging paradigms from these trends, which are discussed in more depth (Fig. 1). Lastly, based on our findings, we provide future directions for the field.

2. Graph neural networks in histopathology

2.1. Graph neural networks

A graph G is defined as a set of nodes V connected by edges E: G = (V, E). The set of edges is defined as tuples of nodes: E = {(x, y) | x, y ∈ V and x ≠ y}. The connectivity of the nodes in a graph is captured in the n × n adjacency matrix A, where n is the number of nodes in G. Each entry a_ij ∈ A denotes the existence of an edge e_ij ∈ E as follows:

    a_ij = 1 if e_ij ∈ E, and a_ij = 0 if e_ij ∉ E    (1)

Alternatively, the values of a_ij can denote edge weights ranging from 0 to 1, representing the connectivity strength between nodes i and j. Given an undirected graph G = (V, E), we can define the k-neighborhood of any node v ∈ V, denoted N_k(v), recursively as follows:

    N_0(v) = {v},    (2)
    N_1(v) = {u | (v, u) ∈ E or (u, v) ∈ E},    (3)
    N_k(v) = {u | ∃w ∈ N_{k-1}(v) such that (w, u) ∈ E or (u, w) ∈ E}.    (4)

In Graph Neural Networks, we aggregate feature information from the k-neighborhood of each node, where k denotes the number of hops from the target node and also directly corresponds to the number of GNN layers used. This aggregated information is used to update the node feature representation, h, in each GNN layer. Mathematically, the node representation update is defined as follows:

    h_u^(k+1) = UPDATE^(k)(h_u^(k), AGGREGATE^(k)({h_v^(k), ∀v ∈ N_k(u)})) = UPDATE^(k)(h_u^(k), m_N(u)^(k))    (5)

where UPDATE and AGGREGATE denote the functions that update the node representation h_u and aggregate the hidden representations from u's neighborhood N_k(u), respectively. The exact definition of the UPDATE and AGGREGATE functions depends on the message-passing scheme used and is usually parameterized by two learnable weight matrices. However, all message-passing schemes employ a permutation-invariant AGGREGATE function (e.g., sum, mean). We can generally distinguish two types of message-passing schemes: spectral message-passing, based on spectral graph properties (e.g., eigenvalues) calculated using the graph Fourier transform, and spatial message-passing, which is applied directly on the connectivity structure present in the input graphs. We focus on the spectral ChebNet method and the spatial GCN, GAT, GIN, and GraphSAGE methods, as these are applied in the vast majority of histopathology applications using GNNs. We first denote a tuple (G, A, X), where G denotes the input graph, A is the associated adjacency matrix, and X the input node feature matrix. During message passing, we transform the feature matrix X into a hidden feature representation matrix H, typically using a learned weight matrix W and a non-linear activation function σ.

One of the most widely adopted and earliest graph neural network schemes is the Graph Convolutional Network (GCN) (Kipf and Welling, 2016). Its message-passing function uses a normalized adjacency matrix to update the hidden representations of nodes based on their neighborhood. To acquire the hidden representation matrix H, the message-passing function in GCN layer l is defined as follows:

    H^(l+1) = σ(D̃^(-1/2) Ã D̃^(-1/2) H^l W^l)    (6)

in which D̃ denotes the degree matrix of G and Ã represents the adjacency matrix with added self-loops for each node.

The Graph Attention Network (GAT) (Veličković et al., 2017) extends the graph convolutional network scheme by adding attention weights to each edge of the graph. This essentially allows models to learn the importance of nodes during message passing. For each edge e_vu connecting nodes v and u, we first calculate an attention score:

    e_vu = σ(a⃗^T [W h_v^(l) ∥ W h_u^(l)])    (7)

where ∥ denotes concatenation and a⃗ is a trainable, shared parameter vector. Using this score, we can calculate the corresponding edge attention weight as follows:

    α_vu = exp(e_vu) / Σ_{u′ ∈ N(v)} exp(e_vu′)    (8)

We then update the hidden node representation h_v^(l) ∈ H^l as follows:

    h_v^(l+1) = σ(Σ_{u ∈ N(v)} α_vu ⋅ W^l h_u^(l))    (9)

GraphSAGE (Hamilton et al., 2017) provides a scalable and flexible framework to decide how neighboring nodes should be aggregated. It differs from other message-passing schemes in that it samples S neighbors in the neighborhood of each node, instead of using all neighbors. Given a hidden node representation h_v^(l) ∈ H^l, we can define its message-passing scheme as follows:

    h_v^(l+1) = σ(W^(l) ⋅ AGG^(l)({h_u^(l) : u ∈ N_S(v)}))    (10)

where AGG denotes an aggregation function at layer l, which can be any permutation-invariant function (e.g., sum, mean), and N_S(v) denotes the sampled neighborhood of v.

Xu et al. introduced the Graph Isomorphism Network (GIN) (Xu et al., 2018), which has an expressive message-passing scheme to optimally differentiate between isomorphic graph structures. For any hidden node representation h_v^(l) ∈ H^l, the message passing is defined as follows:

    h_v^(l+1) = MLP^(l)((1 + ε^(l)) ⋅ h_v^(l) + Σ_{u ∈ N(v)} h_u^(l))    (11)

Here, the MLP denotes a multilayer perceptron which processes each node's aggregated feature vector. ε^(l) is a learnable parameter that learns how to scale the node's own feature vector.

The spectral ChebNet (Tang et al., 2019) method uses Chebyshev polynomials to approximate spectral graph convolution. First, we define the graph Laplacian as L = D − A, where D is the degree matrix and A is the adjacency matrix of the graph. We then rescale the graph Laplacian L using the largest eigenvalue of L, λ_max: L̂ = (2L/λ_max) − I. Given the approximation parameter k, we can compute the approximated Chebyshev polynomial Z^(k) as follows:

    Z^(1) = X
    Z^(2) = L̂ ⋅ X    (12)
    Z^(k) = 2 ⋅ L̂ ⋅ Z^(k−1) − Z^(k−2)

Finally, our message-passing function to update the hidden representation matrix in layer l, H^l, is defined as follows:

    H^(l+1) = Σ_{k=1}^{K} Z^(k) ⋅ W^(l)    (13)

Prediction tasks using GNNs can be categorized into node-level, edge-level, and graph-level prediction tasks. Node-level tasks, such as node classification, predict labels of target nodes based on the transformed representations after message passing. Edge-level tasks include edge classification, where edge labels are predicted, and link prediction. In link prediction, the aim is to predict whether links between nodes should exist based on the node features after message passing. Lastly, graph-level tasks predict graph-level labels. These tasks rely on a global pooling step, which aggregates information from the node and/or edge level into a global representation. Let us define a graph G = (V, E) with an associated node feature matrix X. We can then use any permutation-invariant function to pool the node features into a global representation:

    pool(G) = ⨁_{v ∈ V} X(v)    (14)

where ⨁ is any permutation-invariant function (e.g., sum).

2.2. GNNs in histopathology

Graphs have been used in digital pathology since the 1990s (Sharma et al., 2015) and have later been combined with classical machine learning algorithms (e.g., SVMs) for diagnosis tasks (Bilgin et al., 2007). Since then, Graph Neural Networks have been gaining popularity throughout the 2010s to become the primary method for graph-based machine learning tasks. Since the first application of GNNs in histopathology in 2018 (Li et al., 2018), the use of GNNs in histopathology has grown rapidly to more than 200 publications in 2024. Applying GNNs to histopathology requires some decision-making and algorithmic steps (Fig. 4). First, we preprocess the WSI (e.g., quality control, stain normalization). Then, either a cell segmentation algorithm can be applied, from which a cell graph can be constructed, or one extracts patches, from which a patch graph can be constructed. Using the extracted image entities, a graph can be defined using a chosen graph construction algorithm. This graph can then be used as input for a GNN model. The predictions given by the GNN model can be explained using various GNN explainability methods. We will further explore this typical workflow of GNNs in histopathology in the following sections.

2.2.1. Defining the input graph

For GNNs to be applied to histopathology images, one first needs to define what entities the nodes in the input graph will represent. Most GNNs applied to histopathology use one of three types of input graphs, as shown in Fig. 2: Cell Graphs, where nodes represent cells or nuclei, segmented using a segmentation algorithm or model (for example, HoVerNet); Patch Graphs, where nodes represent patches of the image; and lastly, Tissue Graphs, where nodes represent larger-scale semantic entities in the image. These tissue graph entities can be acquired from a semantic segmentation map, superpixels (usually generated using the SLIC algorithm; Achanta et al., 2012), or clustered superpixels, which represent similar regions in the input image. Some alternate approaches also exist: notably, approaches that treat image pixels as nodes and approaches that construct a patch-based hypergraph.¹

This flexibility in defining the notion of a local neighborhood is what sets GNNs apart from other model architectures like CNNs and Transformers, which can be viewed as specific cases of a GNN where the graph structure is fixed. CNNs, for example, can be viewed as a special case of GNNs, where the underlying graph is a regular grid and each node (pixel) is connected to its immediate neighbors, defined by the convolutional kernel. In contrast, Transformers can be seen as a GNN operating on a fully connected graph, where every input token or feature (node) is connected to all others.

Once the nodes have been established, one needs to decide how the nodes should be connected. For this, histopathology GNNs usually apply one of four graph construction strategies, or combinations of these strategies (as shown in Fig. 3). First, we can use a simple distance threshold, where we connect each node to all other nodes having a distance (e.g., Euclidean) from the target node less than a set threshold t. Second, we can use the k-Nearest Neighbor (k-NN) algorithm. Here, we set a parameter k, which denotes how many neighbors each node will have. Then, we connect the k closest neighbors of each node to the target node. Note that for both approaches, we can base our notion of distance on spatial distance or distance between the node-associated feature vectors. Third, we can construct a Region Adjacency Graph (RAG), where we connect all entities that share a border.² Typically, this approach is used for patch or tissue graphs, where there is a clear border between entities. Lastly, we can use Delaunay triangulation. Here, we define the graph edges as all possible triangles between the nodes, such that the circumcircle of each triangle does not contain nodes other than the 3 nodes the triangle consists of.

¹ A graph where edges can connect any number of nodes instead of the pairwise edges seen in regular graphs.
² In patch graphs, this is equivalent to using a k = 4 k-NN without diagonal neighbors and k = 8 k-NN with diagonal neighbors.
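To make the graph-construction and message-passing steps concrete, the following is a minimal, self-contained NumPy sketch (our own illustration, not code from the paper): it builds a k-NN cell graph from toy nuclei coordinates, applies one GCN-style layer in the spirit of Eq. (6), and mean-pools the node embeddings into a graph-level representation as in Eq. (14). All function names and the random toy data are our own.

```python
import numpy as np

def knn_adjacency(coords, k):
    """Symmetric k-NN adjacency matrix from 2D node coordinates
    (e.g., nuclei centroids), using Euclidean distance."""
    n = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)          # no self-edges
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(dist[i])[:k]:   # k spatially closest neighbors
            A[i, j] = A[j, i] = 1.0         # symmetrize: undirected edges
    return A

def gcn_layer(A, X, W):
    """One GCN message-passing step, cf. Eq. (6):
    ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_t = A + np.eye(len(A))                # add self-loops
    d_inv_sqrt = np.diag(A_t.sum(1) ** -0.5)
    A_hat = d_inv_sqrt @ A_t @ d_inv_sqrt   # normalized adjacency
    return np.maximum(A_hat @ X @ W, 0.0)

# Toy cell graph: 5 "nuclei" with 2D centroids and 4-d node features.
rng = np.random.default_rng(0)
coords = rng.random((5, 2))
X = rng.random((5, 4))                      # node feature matrix
W = rng.random((4, 8))                      # weight matrix (random stand-in)
A = knn_adjacency(coords, k=2)
H = gcn_layer(A, X, W)                      # topology-aware node embeddings
g = H.mean(axis=0)                          # global mean pooling, cf. Eq. (14)
print(H.shape, g.shape)                     # (5, 8) (8,)
```

In practice, libraries such as PyTorch Geometric provide optimized sparse implementations of these operations; the dense matrices here are only for readability.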

Fig. 4. Overview of a typical workflow of applying GNNs to histopathology whole slide images. (A) First, preprocessing steps such as slide quality thresholds and tissue segmentation (e.g., using Otsu thresholding) are applied. (B) Then, for patch graph approaches, the WSI images are divided into smaller image patches. (C) When a cell graph approach is used, nuclei segmentation algorithms are applied to acquire a mask of the nuclei in the WSI. (D) For each acquired entity (patch, nucleus), features are extracted, typically using a pretrained CNN model (e.g., ResNet), to acquire a feature matrix X. (E) Using a graph construction strategy (e.g., k-NN), entities are connected to other entities to form a cell/patch graph G. (F) The resulting graph, along with its associated feature matrix, can be used as input for a GNN model applying message-passing operations to learn a representation and provide predictions. (G) (Graph) explainability methods can be applied to the GNN model to acquire interpretable information on the model's behavior and predictions.
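Step (G) of this workflow, attention-based node-level explainability, can be sketched as follows. This is our own simplified illustration (a softmax gate over node embeddings mapped to overlay colors), not a published explainability method; all names and the toy data are hypothetical.

```python
import numpy as np

def node_attention_scores(H, a):
    """Softmax-normalized per-node attention scores.
    H: (n, d) node embeddings after message passing; a: (d,) gate vector."""
    logits = H @ a
    logits -= logits.max()                  # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def scores_to_colors(scores):
    """Map scores to blue-to-red RGB triples for a graph overlay
    (higher attention = more red)."""
    s = (scores - scores.min()) / (scores.max() - scores.min() + 1e-12)
    return np.stack([s, np.zeros_like(s), 1.0 - s], axis=1)

rng = np.random.default_rng(3)
H = rng.normal(size=(6, 8))                 # embeddings for 6 cells
a = rng.normal(size=8)                      # attention gate (random stand-in)
scores = node_attention_scores(H, a)
colors = scores_to_colors(scores)           # one RGB triple per node
print(round(scores.sum(), 6))               # 1.0 (softmax normalization)
```

In a real pipeline, the colors would be drawn at the node (cell) positions on top of the WSI to produce the overlay described in Section 2.2.5.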

Fig. 2. Most widely used graph types in GNNs for histopathology. (A) Cell graph, (B) Patch graph, (C) Tissue graph (based on superpixels, clustered superpixels, or a semantic segmentation mask; the superpixel image was acquired from Bejnordi et al., 2015).

Fig. 3. Most widely used graph construction techniques in GNNs for histopathology. (A) Delaunay triangulation, (B) k-NN with k = 3, (C) Distance threshold with threshold t, (D) RAG with diagonal neighbors (k = 8).

2.2.2. Extracting features

To allow a GNN to use the image-based features present in whole slide images, one usually extracts features associated with the entity of the node and attaches them to the node as node features. Similarly, features can also be added to the graph edges, which the GNN can use in the message-passing function. Backbones of pretrained³ CNN (e.g., ResNet; He et al., 2016) or Vision Transformer (Dosovitskiy et al., 2020) models are primarily used for node feature extraction, where we take an image patch corresponding to the node entity, process it using the feature extraction model, and extract the feature vectors of this image in the intermediate layers of the model as node features. Sometimes, the feature extraction model is trained in a supervised manner on the histopathology images for the problem at hand, or fine-tuned for the prediction task at hand, which allows for more problem-specific features. More recently, self-supervised training has been applied for feature extraction, providing model features that generalize better across prediction tasks (Tendle and Hasan, 2021). Handcrafted features, based on morphology, texture, or intensity measurements, can also be used as node features. Furthermore, (spatial) graph features (e.g., node degree) can be calculated on a node, edge, or graph level to more directly incorporate topological information in the model.

2.2.3. Graph neural network architectures

Most message-passing schemes used in histopathology GNNs are not specific to histopathology. Popular schemes include Graph Convolutional Networks (GCNs), Graph Attention Networks (GATs), GraphSAGE, and GINs (Graph Isomorphism Networks). Some approaches invented schemes specific to their problem (Gao et al., 2021; Zhang et al., 2022; Hou et al., 2022b; Hasegawa et al., 2023; Nakhli et al., 2023; Wang et al., 2023b), and more recently, Graph Transformer models have gained traction as a popular alternative or addition to regular message passing. In particular, many approaches combine message-passing layers with other neural network modules, such as transformers, LSTMs, MLPs, and MIL aggregation layers. For graph-level prediction problems, global pooling layers are applied, sometimes combined with sequentially applied local pooling layers, which hierarchically coarsen the graph.

2.2.4. Applications

GNNs in histopathology have been applied to a wide variety of tasks, primarily supervised prediction tasks such as survival prediction, region-of-interest (ROI) classification, cancer grading, cancer subtyping, cell classification, and the prediction of treatment response. Some applications aim to predict data in other modalities, such as genetic mutations or (spatial) gene expression. Although most of the use cases are classification problems, some research has used GNNs for semantic segmentation (Anklin et al., 2021; Zhang et al., 2021; He et al., 2023) or nuclei detection (Bahade et al., 2023; Wang et al., 2023c). Generative approaches also exist, such as H&E image-based text generation or IHC staining generation. Another interesting application is Content-Based Histopathological Image Retrieval (CBHIR) (Zheng et al., 2019, 2020). Here, we first use GNNs to extract and save a graph representation of an ROI in a WSI. When pathologists grade new cases, we can use these embeddings to retrieve similar ROIs, helping in the diagnostic process. Most GNN applications focus on cancer as a disease, with a few exceptions (Wojciechowska et al., 2021; Nair et al., 2022; Hasegawa et al., 2023; Gallagher-Syed et al., 2023; Lee et al., 2023; Su et al., 2023; Acharya et al., 2024).

³ Usually on the ImageNet dataset (Deng et al., 2009).
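The CBHIR retrieval step described above reduces to a nearest-neighbor search over stored graph-level embeddings. A minimal sketch of that lookup (our own illustration with random toy embeddings; `retrieve_similar` is a hypothetical helper, not from the cited works):

```python
import numpy as np

def retrieve_similar(query_emb, stored_embs, top_k=3):
    """Indices of the top_k stored ROI embeddings most similar to the
    query, ranked by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    S = stored_embs / np.linalg.norm(stored_embs, axis=1, keepdims=True)
    return np.argsort(-(S @ q))[:top_k]

# Toy database: 50 stored ROI graph embeddings of dimension 128.
rng = np.random.default_rng(1)
stored = rng.normal(size=(50, 128))
query = stored[7] + 0.01 * rng.normal(size=128)  # near-duplicate of ROI 7
print(retrieve_similar(query, stored))           # ROI 7 ranks first
```

For large databases, the brute-force scan here would typically be replaced by an approximate nearest-neighbor index.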

2.2.5. Explainability inductive bias for modality fusion, which allows a precise defini-
One major advantage GNNs have over other model types in tion of the spatial context that should be modeled. This approach
histopathology is interpretability. The model output can be explained is harder with other model types, where modality fusion is either
on an entity level and visualized using a graph overlay. For example, defined on a very local (e.g., CNNs) or a very global scale
one can pool nodes in a cell graph using an attention mechanism, (e.g., Transformers). Furthermore, this approach is efficient as
calculate the attention scores for each node, assign a color based on no additional model modules are required for the injection of
the attention score per node, and then visualize the attention scores multimodal information into the GNN (Ektefaie et al., 2023; Li
on a cellular level as a graph overlay on the WSI. Many methods et al., 2022).
for GNN explainability have emerged since the inception of the GNN
(e.g., GNNExplainer (Ying et al., 2019), GCExplainer (Magister et al., 2.2.7. Challenges of applying GNNs to histopathology
2021)). There have also been efforts to develop histopathology-specific Despite the advantages discussed, one should also be aware of the
GNN explainability methods (Jaume et al., 2020; Yu et al., 2021; challenges specific to the application of GNNs in histopathology. We
Abdous et al., 2023) or to use combinations of existing GNN explain- discuss these challenges and provide suggestions on how to overcome
ability techniques to extract a clinically interpretable model output (di them.
Villaforesta et al., 2023). 1. Graph structure definition: Constructing a graph that adequately
captures the phenomenon of interest using a WSI can be non-
2.2.6. Advantages of GNNs for histopathology trivial. Since WSIs alone cannot accurately decide whether tissue
Having explained how GNNs can be used in histopathology, it is im- elements (e.g., cells) are biologically interacting, graphs are usu-
portant to point out why GNNs are used in histopathology. We highlight ally constructed somewhat arbitrarily based on spatial proximity
several important advantages of GNN-based modeling in histopathol- alone. This can be a problem, since the input graph has a large
ogy compared to other model types (e.g., CNNs, Transformers): effect on the patterns learned by the GNN and thus on the
downstream performance (Thang et al., 2022). As such, careful
1. GNNs acquire relationship-aware representations: By exchanging graph construction is vital to adequately capture the information
information between nodes on the input graph, GNNs learn required for the downstream task.
context-aware representations. This is important in pathology, 2. Computational Complexity: When constructing graphs based on
where the importance of certain biological entities often depends nuclei, the number of nodes in the graph can grow very large,
on the cellular or regional context (Santoiemma and Powell, limiting the number of graphs that can fit into GPU memory
2015). during training. This can lead to slow model training and po-
2. GNNs can capture the topological information in the WSI : Graphs tentially introduce numerous nodes without predictive value to
are a natural way to capture topology. In histopathology, factors the model. To mitigate this, it is often helpful to select a subset
such as cellular density can be important in diagnosis, which of nodes from the graph (e.g., using random or node-label-based
can be captured through topological information in the graph sampling) or only define a graph at a specified region of interest
structure (Ali et al., 2013; Reynolds et al., 2014). in the WSI. Another solution would be to use scalable sampling-
based GNN methods (e.g., GraphSAGE) (Hamilton et al., 2017),
3. GNNs can model the entire WSI at once: Due to the sheer size of
removing the need to load the entire graph into memory at once.
whole slide images, traditional deep learning methods usually
split the WSI into patches, which are pooled to obtain a global
3. Dependence on segmentation output : When defining an input
slide representation. GNNs can instead model the entire WSI as
graph, one often relies on the output of a segmentation model
a single graph, effectively capturing the global structure of the to define the graph entities (e.g., cell segmentation, semantic
WSI without having to reconstruct this from a combination of segmentation). Consequently, the quality of the graph will be
patches (Adnan et al., 2020). largely dependent on the quality of the provided segmentation.
4. GNNs allow for hierarchical modeling : In pathology image analy- Therefore, it is vital to assess whether the segmentation quality is
sis, diagnosis often relies on information acquired from multiple sufficient for the downstream task. For example, missing a subset
spatial scales of the WSI (e.g., global patterns combined with of nuclei can be detrimental in tasks involving inflammatory
atypical cell structures). GNNs allow modeling both of these patterns, where a small subset of cells can characterize the
scales in a single model, either by connecting graphs on differ- state of the tissue. In contrast, in most cancerous tissues, this
ent scales or by learning the global structure through pooling problem is less severe, as missing cells can be compensated for
operations (Pati et al., 2020; Zheng et al., 2019). by an overwhelming number of cancerous cells detected during
5. GNNs allow for entity-wise interpretability: Whereas CNN-based segmentation.
methods usually rely on pixel-level explainability, GNNs allow for entity-wise explainability. This allows pathologists to investigate the dependence of the model prediction on certain biological entities, such as cells or substructures, in the WSI (Sureka et al., 2020).
6. GNNs allow for injecting task-specific inductive biases: The input graph structure can be modified based on prior information about the task at hand. This, in turn, allows for more specific explainability and efficient modeling of the problem (Hasegawa et al., 2023).
7. GNNs enable the efficient fusion and joint modeling of spatially-coordinated multimodal data: In cases where multimodal information is spatially aligned (e.g., IHC staining w.r.t. H&E staining), GNNs can combine the information from the same location as a single feature vector attached to a graph entity (e.g., node, edge). The multimodal features are then jointly modeled in the message-passing layers, allowing for spatially-aware fusion of the modality information. The graph structure provides a spatial

4. Overreliance on nuclei-based features: Since cell graphs are traditionally based on nuclei segmentations, one should carefully consider how to integrate information on the surrounding cytoplasm and non-cellular features in the cellular neighborhood. One way to accomplish this is by extracting features from a patch centered on the nucleus, taking into account the surrounding area. Recent efforts propose another solution by segmenting the membrane as well as the nucleus (Gu et al., 2024), allowing cytoplasm-specific features to be measured.
5. Obtaining accurate fine-grained labels: For tasks relying on fine-grained labels (e.g., cell classification using GNN), many annotations are required per WSI, which is very labor-intensive if manually labeled. Automatic cell classification pipelines mitigate this problem (Graham et al., 2019), but can often be inaccurate, leading to incorrect node labels, especially in tissue types not seen in the training data. Similarly to the point above, we suggest carefully inspecting the quality of the labels obtained and assessing whether it is adequate for the downstream task.
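The spatially-aligned fusion described in point 7 can be made concrete with a few lines of array code. This is a minimal sketch under assumed inputs: the `he_features` and `ihc_features` arrays are hypothetical stand-ins for per-node features extracted at the same coordinates from registered H&E and IHC images, not part of any cited method.

```python
import numpy as np

# Hypothetical inputs: per-node feature vectors extracted at the SAME spatial
# coordinates from two registered modalities (e.g., H&E and IHC).
rng = np.random.default_rng(0)
n_nodes = 5
he_features = rng.standard_normal((n_nodes, 8))   # H&E-derived features per node
ihc_features = rng.standard_normal((n_nodes, 3))  # IHC-derived features per node

# Spatially-aware fusion: concatenate the modalities into a single feature
# vector per graph node, so later message-passing layers model them jointly.
node_features = np.concatenate([he_features, ihc_features], axis=1)
print(node_features.shape)  # (5, 11)
```

Because the fused vector is attached to a node at a fixed location, subsequent message passing mixes modality information only along the spatial graph structure.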

5
S. Brussee et al. Medical Image Analysis 101 (2025) 103444

Fig. 5. Cumulative and relative cumulative frequency of publications on GNNs applied to histopathology. The left four panels show the usage of the emerging trends we explore
further in this review. The right four panels depict more general characteristics of the publications, such as graph type and application.

3. Methodology

Using Google Scholar, we identified 204 papers applying GNNs to histopathology, dating from September 2018 up to November 2024. We
included all papers applied on H&E stained whole slide images or tissue
microarrays (TMAs) where GNNs (i.e. message passing) were part of
the methodology. The papers were categorized based on the following
properties:

• Message-passing scheme
• Type(s) of input graph
• Application(s)
• Tissue type(s)
• Uses hierarchy
• Uses adaptive graph structure learning
• Uses multimodality
• Uses higher-order graphs

We quantified the frequencies in each of these properties to identify emerging trends in the literature (Fig. 5).
From our quantification, we identified upcoming trends and selected four to explore more deeply:

1. Hierarchical GNNs
2. Adaptive Graph Structure Learning
3. Multimodal GNNs
4. Higher-order graphs

Fig. 6. (A) Pre-established hierarchy, where different graphs are constructed at different semantic- or magnification levels, which are connected hierarchically (e.g. using an assignment matrix) (Pati et al., 2020; Mirabadi et al., 2024). (B) Learned Hierarchy, where trainable local pooling operations sequentially coarsen the graph structure (Ying et al., 2018a).
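The per-property tallies behind Fig. 5 amount to cumulative counts over publication dates. The records below are illustrative placeholders, not the review's actual 204-paper dataset; the field names are likewise assumptions for the sketch.

```python
from collections import Counter

# Illustrative placeholder records (NOT the review's data): one dict per paper,
# with its publication year and the boolean trend properties tallied above.
papers = [
    {"year": 2020, "hierarchical": True,  "agsl": False, "multimodal": False, "higher_order": False},
    {"year": 2022, "hierarchical": True,  "agsl": True,  "multimodal": False, "higher_order": False},
    {"year": 2023, "hierarchical": False, "agsl": True,  "multimodal": True,  "higher_order": False},
    {"year": 2024, "hierarchical": True,  "agsl": False, "multimodal": True,  "higher_order": True},
]

def cumulative_by_year(papers, trend):
    """Cumulative number of papers using `trend`, keyed by year (cf. Fig. 5)."""
    counts = Counter(p["year"] for p in papers if p[trend])
    total, cumulative = 0, {}
    for year in sorted({p["year"] for p in papers}):
        total += counts.get(year, 0)
        cumulative[year] = total
    return cumulative

print(cumulative_by_year(papers, "hierarchical"))  # {2020: 1, 2022: 2, 2023: 2, 2024: 3}
```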


The remaining trends we identified are discussed in a more concise format in the Other trends section.

4. Emerging trends

4.1. Hierarchical GNNs

Diagnostic- and prognostic information present on WSIs often exists on multiple levels of coarsity. For example, the cellular microenvironment can be an important diagnostic factor but can depend on where this microenvironment is globally located in the tissue. Cellular graphs are suitable for capturing the microenvironment but can miss the global tissue information present in the WSI. Similarly, patch- or tissue-based graphs can capture global information in the WSI, but miss the topological information present in the cellular structures (Pati et al., 2022). To connect information on different levels of coarsity, we can either apply local pooling layers which learn a hierarchical representation of the input graph in an end-to-end manner, which we denote as Learned Hierarchy, or we can define the hierarchy between graphs prior to model training, which we denote as Pre-established Hierarchy. Both are illustrated in Fig. 6.

In a learned hierarchy, we apply local pooling layers that iteratively coarsen the graph structure hierarchically. Let us define our input graph with associated node features as G_0 = (V_0, E_0, X_0). Assuming that we have k local pooling layers in our GNN architecture, we sequentially coarsen our input graph to G_1, G_2, …, G_k, where G_k is the final pooled graph representation. Mathematically, we define a local pooling to coarsen the graph G_i to G_(i+1) as follows:

G_(i+1) = pool_i(G_i), ∀i ∈ {0, 1, 2, …, k − 1}    (15)

where pool_i is defined by any permutation-invariant pooling function. Prominently used examples include DiffPool (Ying et al., 2018a), SAGPool (Lee et al., 2019), and MinCutPool (Bianchi et al., 2020). Apart from local pooling, we also classify methods that learn the hierarchy using a cross-hierarchical transformer layer (Hou et al., 2022a; Shi et al., 2023a; Azadi et al., 2023) as learned hierarchy methods.

Learned hierarchy methods learn a node assignment matrix S^(l) which denotes the changes in the graph structure after applying the pooling operation. Often, multiple local pooling layers are applied subsequently to coarsen the graph. One sets a pooling ratio hyperparameter, denoted as k, which determines how many nodes should be present after the pooling operation. For a pooling layer l, the pooling operation updates the adjacency matrix of the input graph, A, and its corresponding node attributes X. The hidden representations are denoted H, where X = H^0. We denote the pooling operation as:

(A^(l+1), H^(l+1)) = POOL(A^(l), H^(l))    (16)

The exact pooling operation is dependent on the pooling function used. DiffPool (Ying et al., 2018b) applies GNNs to learn a differentiable cluster assignment matrix which maps nodes to clusters, which are then used as the new nodes after pooling. DiffPool uses two GNNs: one for obtaining node embeddings, GNN_(l,embed), and one for assigning the nodes to cluster nodes, GNN_(l,pool). In each DiffPool layer l, we use the embedding GNN for extracting a feature matrix Z:

Z^l = GNN_(l,embed)(A^l, H^l)    (17)

Then, we calculate the assignment matrix using the pooling GNN:

S^l = softmax(GNN_(l,pool)(A^l, H^l))    (18)

Now, we update both the hidden node representations and a new adjacency matrix:

H^(l+1) = (S^l)^⊤ Z^l
A^(l+1) = (S^l)^⊤ A^l S^l    (19)

Self-Attention Graph Pooling (SAGPool) (Lee et al., 2019) uses the self-attention mechanism to learn which nodes are important and to discard unimportant ones. First, we calculate the self-attention score using a graph convolution operation:

H^(l+1) = σ(D̃^(−1/2) Ã D̃^(−1/2) H^l W^l)    (20)

Here, W^l is a learned weight matrix that we use to calculate the attention score. For each node v ∈ V, we calculate:

α_i^l = softmax(W^l h_i^l)    (21)

where h_i is the feature embedding of v_i. SAGPool then ranks the nodes on their attention scores and selects the top k nodes to retain. These k nodes are used to mask the adjacency matrix to obtain H_mask, which gets multiplied with the original adjacency matrix to coarsen the graph: A^(l+1) = A ⊙ H_mask.

Lastly, MinCutPool (Bianchi et al., 2020) uses the mincut partition objective function to decide the assignment matrix S. Similarly to the DiffPool method, we first generate a GNN-based node feature matrix H^(l+1):

H^(l+1) = GNN(H^l, A^l, W_GNN^l)    (22)

where W_GNN^l is the learned weight matrix of the GNN. Using the updated representation, we can use a multilayer perceptron (MLP) to calculate the node assignment matrix S:

S = MLP(H^(l+1), W_MLP^l)    (23)

where W_MLP^l are the learned weights of the MLP. Both W_GNN^l and W_MLP^l are trained by minimizing two loss terms: L_c, denoting the cut loss term, and L_o, denoting the orthogonality loss term. The cut loss term approximates the Mincut objective, by aiming to minimize the number of edges between clusters while maximizing the edges within clusters. The orthogonality loss term encourages orthogonal cluster assignments and similarly sized clusters. Together, these loss functions form the objective loss L_u:

L_u = L_c + L_o = −Tr(S^⊤ Ã S)/Tr(S^⊤ D̃ S) + ‖ S^⊤S/‖S^⊤S‖_F − I_K/√K ‖_F    (24)

where D̃ is the degree matrix of the normalized adjacency matrix Ã, I_K is the identity matrix and K is the number of desired clusters.

The pooling operation is performed as follows:

A^(l+1) = S^⊤ Ã S
H^(l+1) = S^⊤ H    (25)

4.1.1. Learned hierarchy

As Table 1 shows, the vast majority of GNN applications in histopathology use existing local pooling functions such as in the examples above. In this section, we give some examples of newly designed learned hierarchy methods, specifically for problems in histopathology.

Local Pooling: Hou et al. proposed Iterative Hierarchical Pooling (IHPool), which they combined with a pre-established hierarchy (Hou et al., 2022b). As input, the authors used a pyramidal heterogeneous patch graph, with one graph existing on 10x resolution, one on 5x resolution, and one on thumbnail resolution. IHPool was designed to filter redundant information for the downstream prediction task while retaining this pyramidal structure when applying the pooling operation. The method achieves this by conditioning the set of nodes to be pooled on each resolution level on the pooling outcome of the lower-resolution nodes. Let X be a matrix of node features, A be the adjacency matrix of the input graph, k be the ratio of nodes to retain after pooling and P be a learnable projection layer. Now, let us denote the input graph G = (V, E, R) where R represents the set of different resolutions in the graph. For each resolution r ∈ R, nodes are pooled hierarchically, such that nodes in higher magnification levels are subordinate to nodes in lower levels. Thumbnail nodes are individually assigned, while other


Table 1
Publications applying GNNs to histopathology which used learned hierarchies.
Publication Date Application Learned hierarchy method
Zheng et al. (2019) 2019/10 CBHIR DiffPool
Zhou et al. (2019) 2019/10 Cancer grading DiffPool
Sureka et al. (2020) 2020/10 Binary classification DiffPool
Zheng et al. (2020) 2020/12 CBHIR DiffPool
Chen et al. (2020a) 2020/09 Survival prediction, Cancer grading SAGPool
Jiang et al. (2021) 2021/01 Cancer grading DiffPool
Zheng et al. (2021) 2021/04 CBHIR DiffPool
Wang et al. (2021b) 2021/09 Survival prediction SAGpool
Xiang and Wu (2021) 2021/10 Binary classification DiffPool
Xie et al. (2022a) 2022/01 Treatment response prediction TopKPooling
Dwivedi et al. (2022) 2022/04 Cancer grading SAGPool
Hou et al. (2022b) 2022/06 Binary classification IHPool
Bai et al. (2022) 2022/08 Cancer subtyping MinCutPool
Zuo et al. (2022) 2022/09 Survival prediction SAGPool
Hou et al. (2022a) 2022/09 Cancer subtyping Hierarchical attention mechanism
Lim and Jung (2022) 2022/10 Survival prediction SAGPool
Wang et al. (2023a) 2023/02 Cancer subtyping Scattering Cell Pooling
Zhao et al. (2023) 2023/02 Cancer subtyping, Cancer grading GCMinCut
Ding et al. (2023b) 2023/02 Cancer subtyping, Cancer grading Fractal paths
Ding et al. (2023a) 2023/04 Survival prediction SAGPool
Li et al. (2023) 2023/09 Node classification Graph V-Net
Gallagher-Syed et al. (2023) 2023/09 Rheuma subtyping SAGPool
Shi et al. (2023a) 2023/09 Cancer subtyping, mutation prediction Hierarchical attention mechanism
Wu et al. (2023) 2023/10 Survival prediction SAGPool
Nakhli et al. (2023) 2023/10 Survival prediction SAGPool
Azadi et al. (2023) 2023/10 Survival prediction MinCutPool, Hierarchical attention mechanism
Hou et al. (2023) 2023/10 Survival prediction Matrix multiplication
Abbas et al. (2023) 2023/12 Cancer grading DiffPool
Xu et al. (2023) 2023/12 Cancer subtyping DiffPool
Azher et al. (2023) 2024/01 Cancer grading, Survival prediction SAGPool
Yang et al. (2024) 2024/03 Binary classification, Survival prediction MinCutPool
Breen et al. (2024) 2024/07 Cancer subtyping SAGPool
Gindra et al. (2024) 2024/07 Cancer subtyping, binary classification Multihead cross-attention pooling

nodes are pooled iteratively. For all nodes, a fitness score is calculated and nodes are assigned to clusters based on spatial distance and fitness difference between nodes. Specifically, for each node n ∈ N on resolution r, we use a learnable projection matrix P to calculate the fitness set as follows:

φ_n^r = tanh((V_n^r ⋅ P)/‖P‖)    (26)

where V_n^r is the set of nodes to be pooled, based on the hierarchical edges between resolutions. Based on the calculated node assignments, we create a new node feature matrix X′. The adjacency matrix A′ is updated to maintain graph connectivity based on the node assignments.

Wang et al. proposed a new module for pooling information from cell graphs to use as embeddings for clusters of cells, called cell community forests (Wang et al., 2023a). Their method incorporates two parallel pooling operations: a max pool and an average pool, whose results are concatenated and processed using a linear layer. The authors applied DBSCAN clustering to cell embeddings where they applied different density values d to acquire a vector of densities d_1, …, d_N with which they acquired different clusters j ∈ J. Mathematically, given a cell embedding in cluster j with clustering density d_i, C_(i,j)^x, x ∈ S_(i,j), where S_(i,j) denotes the set of cells with density d_i in cluster j, we calculate the pooling outcome P_(ij) for cluster j with density d_i as follows:

P_(ij) = W^0 × ([Max(C_(i,j)^x) ∥ Mean(C_(i,j)^x)]) + b^0    (27)

where W^0 and b^0 are learnable parameters.

Zhao et al. proposed an extension of the popular MinCutPool by adding a message-passing layer in the pooling equation (Zhao et al., 2023). For acquiring the cluster assignment matrix S, where each node s ∈ S will be a single node in the coarsened graph, the authors used the following equation:

S = H(σ(Â H W_pool))    (28)

where Â is the Laplacian-normalized adjacency matrix, H denotes the hidden representation matrix of the nodes, W_pool denotes a learnable pooling weight matrix and σ denotes a nonlinear activation function (e.g. ReLU).

Attention-based Interaction Modeling: Azadi et al. proposed two attention-based methods for exchanging information between different levels of graph coarsity (Azadi et al., 2023). The authors used a local graph, where nodes represent patches in the WSI, and a global graph, where nodes represent MinCutPool-based clusters of nodes in the local graph. Now, attention scores are calculated for each node in the local- and global graph. The first method the authors proposed for exchanging information between the local- and global graph was Mixed Co-Attention (MCA), in which the information is not mixed directly, but weight sharing is applied between parallel processing of the local- and global nodes. The second method, Mixed Guided Attention, expanded MCA by directly infusing the calculated local node feature representation into the attention score calculation of the global nodes. The authors found that the mixed co-attention strategy worked optimally for their use case.

Starting with a 20x resolution patch graph for each WSI, Gindra et al. first process each node using multiple stacked GIN-layers (Gindra et al., 2024). The authors then learned node cluster assignments using multihead cross-attention pooling. Each cluster was used as input to a transformer block whose output was processed using a multilayer perceptron to obtain a slide-level prediction.

Alternative Approaches: Ding et al. did not learn a hierarchical representation using pooling layers, instead using a FractalNet architecture (Ding et al., 2023b). Here, the input graph is given to separate processing paths which consist of different numbers of GNN layers, thereby representing different semantic levels in the tissue. The hierarchy between the paths is encoded using a combination of a gated bimodal unit and an MLP mixer architecture. The former calculates a weighted combination of representations, while the latter enhances communication between the path representations and strengthens connections among different path features.
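The assignment-matrix coarsening that these learned-hierarchy variants build on (the DiffPool update of Eqs. (17)–(19)) can be sketched in plain NumPy. The one-layer GCN, tanh nonlinearity, and random weights below are illustrative assumptions for the sketch, not any specific published implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gcn_layer(A, H, W):
    # One GCN propagation: normalize adjacency (with self-loops), then project.
    A_hat = A + np.eye(A.shape[0])
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))
    return np.tanh(D_inv @ A_hat @ H @ W)

def diffpool_layer(A, H, W_embed, W_pool):
    Z = gcn_layer(A, H, W_embed)          # Eq. (17): node embeddings
    S = softmax(gcn_layer(A, H, W_pool))  # Eq. (18): soft cluster assignments
    H_next = S.T @ Z                      # Eq. (19): pooled node features
    A_next = S.T @ A @ S                  # Eq. (19): coarsened adjacency
    return A_next, H_next

rng = np.random.default_rng(0)
n, d, k = 6, 4, 2                         # 6 nodes, 4 features, pool to 2 clusters
A = (rng.random((n, n)) > 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T            # symmetric adjacency, no self-loops
H = rng.standard_normal((n, d))
A2, H2 = diffpool_layer(A, H, rng.standard_normal((d, d)), rng.standard_normal((d, k)))
print(A2.shape, H2.shape)                 # (2, 2) (2, 4)
```

SAGPool and MinCutPool follow the same pattern, with S replaced by a top-k attention mask (Eq. (21)) or an MLP-derived cluster assignment (Eq. (23)), respectively.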


Li et al. propose a hierarchical Graph V-Net to encode hierarchy in a patch graph input (Li et al., 2023). First, attention-based message-passing is used to exchange information between adjacent patches. Then, the authors used a graph coarsening operation where the node features are arranged as a 2D grid based on the spatial location of the patches. This grid is then evenly divided into submatrices and each submatrix is projected to a single feature vector using a learnable layer, which will act as a node after the coarsening operation. Notably, the Graph V-Net also uses graph upsampling layers, which add nodes until the size of the input graph has been restored, similar to what is done in UNet-architectures.

4.1.2. Pre-established hierarchy

In pre-established hierarchy, we encode the hierarchy prior to model training. For example, we can construct multiple graphs at different levels of coarsity in the WSI, and connect them using assignment matrices, denoting how nodes are connected between the hierarchical levels. Often, during message-passing, the learned representations of the lower hierarchy level are aggregated and used as input for the corresponding nodes at the higher hierarchy level. We differentiate between approaches connecting graphs on different semantic levels (e.g. cells and tissues), and approaches connecting different magnifications of the WSI (e.g. 40x, 20x). An overview of publications using this approach is given in Table 2.

Semantic Hierarchies: Pati et al. were the first to introduce a pre-established hierarchy in the graph to use as input for a GNN model (Pati et al., 2020). They constructed a cell graph, CG, using a nuclei segmentation map and a tissue graph, TG, constructed by clustering the superpixels based on similarity. To introduce the hierarchy, they introduced an assignment matrix S_(CG→TG), such that S_(CG→TG)(i, j) = 1 if a cellular node i from the cell graph belongs to tissue node j in the tissue graph.

Wang et al. introduced hierarchy by applying separate message-passing operations on both a cell graph and a patch graph (Wang et al., 2021b). For each patch, they used the pooled cell graph representation on that patch as the node feature in the corresponding patch in the patch graph. The authors combined learned hierarchy learning with pre-established hierarchy by also applying self-attention graph pooling on both the cell- as well as the patch graph.

Sims et al. connected a cell graph with a level-1 and level-2 patch graph, which represent patches of increasing size (400 μm, 800 μm) (Sims et al., 2022). They define their message passing for any cellular node i as CG_i ⟶ L1_i ⟶ L2_i ⟶ L1_i ⟶ CG_i, where each ⟶ defines a message-passing function, CG_i represents the node in the cell graph and L1_i, L2_i represent the node corresponding to the level-1 patch and the level-2 patch on which this cell exists. By applying message-passing in this way, the model can exchange information between distant cells without using many message-passing layers, as the cellular nodes belonging to the same layer-2 node can be 800 μm away.

Guan et al. proposed a Node-aligned hierarchical graph-to-local clustering approach, inspired by the Bag-Of-Visual-Words (BOVW) methodology in NLP (Guan et al., 2022). Starting with a set of WSIs, the authors first clustered the patches for each WSI, using a codebook with patches from all other WSIs in the dataset. Then, a local clustering approach is used that samples global clusters from B to divide the global clusters into local subclusters K_G, such that B = sub_1, sub_2, …, sub_(K_G). From each subcluster, they use K-means clustering to divide the subcluster sub_k into bins S_k and randomly select one patch in each bin. Now, for each WSI, we have a patch for each bin S_k ∈ K_G in each subcluster K_G ∈ B. The patches in each subcluster are connected using inner-sub-bag edges, and the subclusters themselves are connected using outer-sub-bag edges. This graph structure allows hierarchical information flow during message-passing in the GNN model.

Hou et al. proposed constructing a cell graph along with superpixel-based tissue graphs at two levels (CG, TG_l1, TG_l2) (Hou et al., 2022a). They generated features for the cell graph by using a pretrained ResNet on a patch around the nucleus centroid while generating tissue graph representations by averaging ResNet embeddings from all crops belonging to a superpixel. The hierarchical information flow is modeled using a Transformer block that calculates the cross-attention between the graphs at different levels.

Shi et al. used graphs at 4 different levels of hierarchy: a tissue graph on 5x resolution, consisting of superpixels constructed using the SLIC algorithm, and 3 patch graphs at 5x, 10x, and 20x resolution, respectively (Shi et al., 2023b). The 5x resolution patch graph is used to generate features for the tissue graph. Then, after applying message-passing to the 10x- and 20x patch graphs, the interaction between the different hierarchical levels is modeled using a hierarchical attention module. This module produces a tissue graph where the interactions are captured in the node features. Message-passing layers, global attention layers, and a fully connected layer are applied subsequently to the tissue graph to come to a final prediction.

Gupta et al. modeled a tissue graph and a cell graph together as a heterogeneous graph with cellular nodes, tissue nodes, cell–cell edges, tissue–tissue edges, and cell–tissue edges: H = {C, T, E_(cell→cell), E_(tissue→tissue), E_(cell→tissue)} (Gupta et al., 2023). After applying message-passing layers, they calculated the cross-attention between the cellular and tissue nodes using the transformer architecture to model the hierarchical relationships.

Abbas et al. established four separate hierarchical levels, where one level is a global image analyzed using a CNN model and the other levels are cell graphs constructed at different levels (global, spanning the entire WSI (G^(0)), 512 × 512px (G^(1)), 256 × 256px (G^(2))) (Abbas et al., 2023). For each level, a subset of the segmented cells is randomly selected to build a cell graph. After applying message-passing layers on each level separately, the outputs are combined and processed using a fully connected layer. The combined representation and the representations gathered at each cell graph level separately are combined using an entropy weighting strategy, which weights the different representations based on the uncertainty of the model prediction given that representation.

Multiresolution Hierarchies: Xing et al. constructed hierarchical patch graphs at several levels of image resolution, thus aggregating information from multiple resolution levels. Starting with a single patch, they subsampled the same patch at increasingly lower resolution and connected the lower-resolution patches to the corresponding higher-resolution patch it was sampled from. This input graph was then used for a GNN model (Xing et al., 2021).

Bazargani et al. introduced hierarchy into their approach by constructing separate patch graphs on 5x, 10x and 20x resolution and then performing message-passing operations both on each graph separately as well as between graphs with different resolutions (Bazargani et al., 2022).

Bontempo et al. used a knowledge distillation approach combined with two patch graphs at different resolutions (high, low) (Bontempo et al., 2023). They performed message-passing both hierarchically between high and low resolution and in each resolution graph itself. They treated the high-resolution graph as a ‘teacher’ and the low-resolution graph as a ‘student’ network, between which they optimize the KL-divergence for the bag-level predictions at each resolution.

Mirabadi et al. proposed modeling the pyramidal multi-magnification structure in whole slide images as a multiresolution graph, where information on both the inter-magnification and the intra-magnification levels could be modeled (Mirabadi et al., 2024). They extracted patches from three magnification levels (20x, 10x and 5x), such that the patches on the higher resolutions are spatially equivalent to center crops of the patches at the lower resolutions. A RAG-graph was constructed such that nodes on each level were connected to both their adjacent neighbors on the same resolution as well as the spatially corresponding lower- and higher-level patch nodes. This allowed information to be exchanged between resolutions during


Table 2
Publications applying GNNs to histopathology which used a pre-established hierarchy. All hierarchies are shown small to large, such that when
𝑋 → 𝑌 , entities in 𝑋 are subordinate to the entities in 𝑌 . CG: Cell Graph, PG: Patch Graph, TG: Tissue Graph, PHG: Patch Hypergraph.
Publication Date Application Hierarchy
Pati et al. (2020) 2020/07 ROI classification 𝐶𝐺 → 𝑇 𝐺
Xing et al. (2021) 2021/08 Cancer subtyping 𝑃 𝐺40𝑥 → 𝑃 𝐺10𝑥 → 𝑃 𝐺5𝑥
Wang et al. (2021b) 2021/09 Survival prediction 𝐶𝐺 → 𝑃 𝐺
Sims et al. (2022) 2022/01 ROI classification 𝐶 𝐺 → 𝑃 𝐺1 → 𝑃 𝐺2
Hou et al. (2022b) 2022/06 Binary classification 𝑃 𝐺10𝑥 → 𝑃 𝐺5𝑥 → 𝑃 𝐺𝑡ℎ𝑢𝑚𝑏𝑛𝑎𝑖𝑙
Guan et al. (2022) 2022/06 Cancer subtyping 𝑆𝑘 → 𝐾𝐺 → 𝐵
Hou et al. (2022a) 2022/09 Cancer subtyping 𝐶 𝐺 → 𝑇 𝐺𝑙1 → 𝑇 𝐺𝑙2
Shi et al. (2023b) 2023/01 Cancer grading 𝑃 𝐺20𝑥 → 𝑃 𝐺10𝑥 → 𝑇 𝐺5𝑥
Wang et al. (2023a) 2023/02 Cancer subtyping 𝐶𝐺 → 𝐶𝐶𝐹 𝐺
Gupta et al. (2023) 2023/07 Cancer subtyping, binary classification 𝐶𝐺 → 𝑇 𝐺
Bazargani et al. (2022) 2023/08 Cancer subtyping 𝑃 𝐺20𝑥 → 𝑃 𝐺10𝑥 → 𝑃 𝐺5𝑥
Bontempo et al. (2023) 2023/10 Binary classification 𝑃 𝐺ℎ𝑖𝑔ℎ → 𝑃 𝐺𝑙𝑜𝑤
Abbas et al. (2023) 2023/12 Cancer grading 𝐶 𝐺256𝑝𝑥 → 𝐶 𝐺512𝑝𝑥 → 𝐶 𝐺𝑔𝑙𝑜𝑏𝑎𝑙 → 𝑊 𝑆 𝐼𝑡ℎ𝑢𝑚𝑏𝑛𝑎𝑖𝑙
Mirabadi et al. (2024) 2024/02 Cancer subtyping 𝑃 𝐺20𝑥 → 𝑃 𝐺10𝑥 → 𝑃 𝐺5𝑥
Godson et al. (2023) 2024/03 Binary classification 𝑃 𝐺40𝑥 → 𝑃 𝐺20𝑥 → 𝑃 𝐺10𝑥
Ibañez et al. (2024) 2024/03 Cancer grading 𝑃 𝐺1𝑥 → 𝑃 𝐺5𝑥 → 𝑃 𝐺10𝑥 → 𝑃 𝐺20𝑥
Liang et al. (2024b) 2024/04 Binary classification 𝑃 𝐻 𝐺40𝑥 → 𝑃 𝐺20𝑥
Han et al. (2024) 2024/04 Survival prediction 𝑃 𝐺20𝑥 → 𝑃 𝐻 𝐺10𝑥
Paul et al. (2024) 2024/05 Cancer grading 𝐶 𝐺, 𝑃 𝐺
Cai et al. (2024) 2024/05 Survival prediction 𝑃𝐺 → 𝑇𝐺
Breen et al. (2024) 2024/06 Cancer subtyping 𝑃 𝐺10𝑥 → 𝑃 𝐺5𝑥

message passing. After message passing, a mean pooling operation was applied on each resolution level, resulting in a 3-node graph. This three-node graph embedding is then used for the downstream classification task.

Godson et al. hierarchically connected RAG patch graphs defined on 40x, 20x and 10x resolution (Godson et al., 2023). Nodes in each resolution were assigned resolution-specific one-hot encoded vectors to uniquely barcode the nodes for each resolution. Similarly, all edges were assigned unique barcode vectors, depending on their resolution and whether the edges were hierarchical connections or intra-resolution connections. GATv2 layers were used to process the resulting hierarchical graph, where information was exchanged both between resolutions as well as inside each resolution graph.

Ibañez et al. constructed patch graphs across 4 resolution levels: 1x, 5x, 10x and 20x, which were hierarchically connected with edges (Ibañez et al., 2024). During message-passing, information is exchanged between all resolution levels and through spatially adjacent patches. The authors introduced an average scaled influence score to allow comparative heatmaps across resolutions. They compute this score by calculating the median attention scores per resolution and then normalizing these across all WSIs.

Paul et al. used a cell-level graph for each patch and constructed a patch-level graph, but did not directly connect the two graphs using hierarchical edges (Paul et al., 2024). Instead, they use handcrafted global graph features gained from the cell-graph on each patch as the basis for the feature embedding for each patch. The patch-level graph was then constructed based on the cosine similarity between these features, effectively conditioning the patch graph structure on the cellular structure seen in each patch.

4.1.3. Discussion and future prospects

We observed that hierarchical graph neural networks are an increasingly popular modeling technique for histopathology WSIs, due to the information in whole slide images existing on different levels of coarsity. One future approach will be to learn the necessary level of coarsening to establish an effective hierarchical structure end-to-end, which is currently controlled using a pooling ratio hyperparameter. We argue that different levels of graph coarsity might be optimal for different problems, as some problems in histopathology rely more on cell-level information, while others on larger tissue structures. Lastly, in most current approaches, message-passing occurs on each level of hierarchy separately, not directly between hierarchies. We argue that the field could move to message-passing schemes that are more effective at taking into account the hierarchical graph structure (Zhong et al., 2023).

4.2. Adaptive graph structure learning

Most GNN applications in histopathology use a fixed input graph with fixed edge connectivity. While successful results have been achieved using this approach, we argue that it is suboptimal. Whether connections between nodes should exist is not clearly defined in the histopathology image, leading to the wide range of different approaches for constructing the input graphs, as previously discussed. These approaches are usually not based on biological or medical information and thus introduce inductive bias which might not reflect the biology in the tissue. To counteract this problem, one can either adjust the message-passing equation such that some edges are given more representative power than others (e.g. using GAT (Veličković et al., 2017)), or one can learn the graph structure during model training. The second approach, Adaptive Graph Structure Learning (AGSL), has gained more popularity recently (Table 3). In GNNs for histopathology, the strategies used for AGSL can be subdivided into three main paradigms: Learned transformation based AGSL, CNN-filter based AGSL and Patch-selection based AGSL. The first is based on a learned transformation that updates the adjacency matrix based on the model gradients. In the second approach, the graph is constructed using the learned filters inherent to convolutional neural networks, which are updated based on the model gradients. Lastly, some approaches using patch graphs learn to select a discriminative subset of patches from each slide during training, thereby also changing the nodes that will be present in the graph structure.

Learned Transformation: In 2020, Adnan et al. introduced adaptive graph learning for the classification of lung cancer subtypes (Adnan et al., 2020). The authors modeled the whole slide image as a fully connected graph of representative patches. Then, they used a pretrained DenseNet for feature extraction. The graph connectivity is learned end-to-end using both global WSI context and local pairwise context between patches. Let us denote WSI W with patches w_1, …, w_n, where for each patch w_i we have a feature vector x_i. The authors first pooled the patch representations into a global context vector c using a pooling function φ (e.g. sum):

c = φ(x_1, x_2, …, x_n)    (29)

The global vector c is concatenated to each patch feature vector x_i and is jointly processed by MLP layers, which gives a feature vector x_i^∗ that contains both local and global context information. Finally, the matrix X^∗, which holds all feature vectors x_i^∗, is processed using a cross-correlation layer that determines the connectivity of the output graph


Table 3
Publications applying GNNs in histopathology and using adaptive graph structure learning strategies.
Publication Date Application Adaptive learning mechanism
Adnan et al. (2020) 2020/05 Binary classification Learned transformation
Gao et al. (2022) 2022/02 Cancer subtyping CNN-filter based
Hou et al. (2022a) 2022/09 Cancer subtyping Learned transformation
Behzadi et al. (2024) 2022/12 Cancer grading Learned transformation
Ding et al. (2023b) 2023/02 Cancer subtyping, Cancer grading CNN-filter based
Liu et al. (2023) 2023/04 Survival prediction Patch selection
Li et al. (2024b) 2024/06 Cancer subtyping, Cancer grading Learned transformation
Shu et al. (2024) 2024/07 Binary classification Learned transformation
Kim et al. (2024) 2024/07 Binary classification Patch selection
Liu et al. (2024) 2024/08 Patch classification CNN-filter based
Lu et al. (2024) 2024/10 Survival prediction Patch selection

in 𝐴, where each element 𝑎𝑖𝑗 ∈ 𝐴 represents the correlation between patches 𝑤𝑖 and 𝑤𝑗, which are used as edge weights in the learned graph structure.

Hou et al. described a spatial-hierarchical GNN framework that could dynamically learn the graph structure during model training (Hou et al., 2022a). Their Dynamic Structure Learning module first embeds the representation of both node features 𝑉 and centroid coordinates 𝑃 together into a single representation 𝐽, using the following equation:

𝐽 = 𝐶𝑜𝑛𝑐𝑎𝑡[𝜎(𝑃ᵀ𝑊1), 𝜎(𝑉ᵀ𝑊2)]    (30)

where 𝑊1 and 𝑊2 are learned weight matrices and 𝜎 denotes a nonlinear activation function. Next, the authors applied a distance-thresholded k-NN algorithm on the acquired embedding 𝐽 to determine the edge connectivity. Given a set of nodes 𝑉, a set of edges 𝐸, a distance threshold 𝑑min and the number of neighbors 𝑘, we use the following equation to determine the edges in 𝐸:

𝑒𝑢𝑣 ∈ 𝐸 ⟺ {𝑢, 𝑣 ∈ 𝑉 ∣ ‖𝑣 − 𝑢‖2 ≤ min(𝑑𝑘, 𝑑min)}    (31)

Here, 𝑑𝑘 denotes the distance between node 𝑢 and its 𝑘-closest neighbor. The weight matrices are trained using the overall loss function in the GNN framework, which allows the graph connectivity to adapt to the learning task at hand.

Liu et al. propose learning the graph structure based on the cosine similarity between the transformed patch feature vectors (Liu et al., 2023). Given an input feature matrix 𝑋 and a transformation matrix 𝑇, we create a projected matrix 𝑃 = 𝑋𝑇. They then calculate the cosine similarity between each pair of patches in 𝑃, which is saved as a symmetric adjacency matrix 𝐴𝐿, which holds the 'edge strength' between any two patches in 𝑃. The edge strength is then thresholded using a set threshold 𝜖:

𝑒𝑢𝑣 ∈ 𝐸 ⟺ {𝑢, 𝑣 ∈ 𝑉 ∣ (𝑃[𝑢] ⋅ 𝑃[𝑣]) ∕ (‖𝑃[𝑢]‖ ⋅ ‖𝑃[𝑣]‖) ≤ 𝜖}    (32)

where 𝑃[𝑢] and 𝑃[𝑣] denote the projected feature vectors of nodes 𝑢 and 𝑣, respectively. Note that the transformation matrices are learned, which allows the graph structure to be adapted during model training.

Li et al. introduced tail and head embeddings calculated as two separate linear projections of feature vectors from each WSI patch (Li et al., 2024b). The head projection is trained to capture the correlation between other patches and itself, while the tail captures the contribution of the patch to other patches. The authors then calculate the similarity between each head and tail embedding to decide which patches should be connected. Note that these connections are represented as directed edges from the patch of the head embedding to the patch of the tail embedding, showing adequate similarity. Additionally, each directed edge is assigned an edge embedding defined as a weighted sum of the head and tail embeddings. The learned directed graph is then processed using a knowledge-aware attention mechanism that takes into account the head, tail and edge embeddings.

Shu et al. implemented a learnable slide-level graph (Shu et al., 2024). By pooling patch embeddings from a pretrained encoder model, they generated a slide-level embedding for each WSI in their dataset. In addition to a more traditional MIL-classification branch which uses these slide embeddings, the embeddings were also combined into a slide-graph. This graph was processed with GCN layers and treated as a node classification task to obtain slide-level predictions. Knowledge distillation was applied between the MIL classifier and the GNN classifier to regularize the classification. Since the construction of the slide-level graph is conducted using K-NN on a learned projection of the slide embeddings, the slide-level graph structure changes during training.

Patch-selection: Kim et al. constructed a patch graph based on learned patch clusters from which representative patches were selected using Gumbel softmax (Kim et al., 2024). The similarity between these patches was captured in a similarity matrix. The top 𝑘 most similar patches were connected to form the patch graph. Allowing the input patches and associated embeddings to change during training leads to a learned graph structure.

Behzadi et al. propose selecting a discriminative subset of patches to form a concise patch graph for each WSI (Behzadi et al., 2024). They use a patch image autoencoder to obtain latent features for each WSI patch. These latent features are processed using a multi-head attention mechanism to obtain attention scores per patch, which are used as the selection criterion for inclusion in the patch graph. The selected patches are connected using the K-NN algorithm for subsequent GNN-based processing.

CNN-filter Based: Gao et al. and Ding et al. both use a very different approach, where the learned feature maps generated by a CNN are used as the basis for the graph construction (Gao et al., 2022; Ding et al., 2023b). More specifically, they treat the pixels in each feature map as nodes, in which the features are spatially concatenated across channels into a node feature vector. Then the k-NN algorithm is used to connect the nodes. By basing the graph structure on learned CNN feature maps, the graph structure is learned by training the CNN. Since each pixel in the feature maps corresponds to a patch in the input WSI, the constructed graph can capture spatial dependencies between regions in the WSI. Given the acquired node embedding matrix 𝑋 ∈ 𝐑𝑁×𝐶, where 𝑁 is the number of nodes and 𝐶 the number of feature map channels, we determine the existence of edges as follows:

𝑒𝑢𝑣 ∈ 𝐸 ⟺ {𝑢, 𝑣 ∈ 𝑉 ∣ ‖𝑢𝑓 − 𝑣𝑓‖2 ≤ 𝑑𝑘}    (33)

where 𝑢𝑓 and 𝑣𝑓 are the feature vectors of nodes 𝑢 and 𝑣, and 𝑑𝑘 is the distance between node 𝑢 and the 𝑘-closest neighbor of 𝑢.

Liu et al. expanded on this idea of using convolutional feature maps as the basis for graph construction by hierarchically applying convolutional layers, constructing graphs based on the learned feature maps, processing this graph using a message passing layer and finally applying a transformer layer to model long-range dependencies (Liu et al., 2024). This results in a set of feature vectors which can be used as input for the convolutional layer in the next hierarchical layer. This process is repeated 4 times before using several linear layers to obtain the final prediction probabilities. The new graph constructed at every hierarchical layer utilizes dilated K-NN, which skips every 𝑑th nearest neighbor, effectively exponentially increasing each node's receptive field.
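The dilated K-NN used by Liu et al. (2024) can be sketched as follows: instead of the 𝑘 nearest neighbors, every 𝑑th nearest neighbor is kept, so the same number of edges spans a roughly 𝑑-times larger neighborhood. This is a NumPy sketch; the exact rank convention used by the authors is an assumption.

```python
import numpy as np

def dilated_knn(D, k, d):
    """Dilated K-NN: keep k neighbours at distance ranks d, 2d, 3d, ...

    D: (n, n) pairwise distance matrix. Returns {node: [neighbour indices]}.
    Skipping every d-th neighbour widens the receptive field without
    increasing the number of edges per node.
    """
    out = {}
    for u in range(D.shape[0]):
        ranks = np.argsort(D[u])[1:]   # neighbours ordered by distance, self excluded
        out[u] = [int(v) for v in ranks[d - 1::d][:k]]
    return out

# six points on a line: node 0's plain 2-NN would be [1, 2]; with dilation
# d=2 it reaches twice as far for the same k
D = np.abs(np.arange(6)[:, None] - np.arange(6)[None, :]).astype(float)
nbrs = dilated_knn(D, k=2, d=2)
```

Applied across hierarchical layers, each of which coarsens the feature maps, this dilation compounds, which is what produces the exponentially growing receptive field.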


Table 4
Applications of Multimodal GNNs in histopathology. CD: Clinical Data, CNV: Copy Number Variation, TC: Trichrome, MP: MultiPhoton
microscopy, TPEF: two-photon excited fluorescence microscopy, MRI: Magnetic Resonance Imaging, SHG: Second-Harmonic Generation
microscopy, ST: Spatial Transcriptomics, IHC: Immunohistochemistry, GE: Gene Expression, miRSeq: microRNA sequencing, DNA-Met: DNA
methylation.
Publication Date Application Fusion Modalities
Chen et al. (2020a) 2020/09 Survival prediction, Cancer subtyping Late H&E WSI, GE, CNV
Dwivedi et al. (2022) 2022/04 Cancer grading Late H&E WSI, TC WSI
Qiu et al. (2022) 2022/07 Survival prediction Early H&E WSI, MP, TPEF
Zuo et al. (2022) 2022/09 Survival prediction Late H&E WSI, GE
De et al. (2022) 2022/10 Cancer subtyping None H&E WSI, MRI
Li et al. (2022) 2022/11 Cancer subtyping Early H&E WSI, SHG
Xie et al. (2022b) 2022/12 Survival prediction Late H&E WSI, GE
Fatemi et al. (2023) 2023/03 ST-prediction None H&E WSI, ST
Jiang et al. (2023) 2023/03 Mutation prediction Late H&E WSI, CD
Gao et al. (2023) 2023/07 ST-prediction, survival prediction None H&E WSI, ST
Gallagher-Syed et al. (2023) 2023/09 Rheumatoid Subtyping Early H&E WSI, IHC WSI
Pati et al. (2023) 2023/12 Survival prediction, Cancer grading Both H&E, virtual IHC
Azher et al. (2023) 2024/01 Survival prediction, Cancer grading Early H&E WSI, ST
Zheng et al. (2023) 2024/01 Survival prediction Late H&E WSI, GE
Zhang et al. (2024) 2024/04 Survival prediction Late H&E WSI, RNA-seq
Chi et al. (2024) 2024/06 ST prediction None H&E WSI, GE, ST
Palmal et al. (2024) 2024/09 Binary classification Early H&E WSI, GE, miRSeq, DNA-Met, CNV, CD
Li et al. (2024a) 2024/10 ST-prediction None H&E WSI
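As a concrete illustration of one of the fusion strategies listed in Table 4, the Kronecker-product (pathomic) fusion of Chen et al. (2020a) can be sketched as follows; the gating and projection weights are random stand-ins for the learned parameters, so this is an illustration of the tensor algebra rather than the authors' implementation.

```python
import numpy as np

def pathomic_fusion(h_i, h_g, h_n, seed=0):
    """Kronecker-product fusion of image (h_i), graph (h_g) and genomic (h_n)
    embeddings, cf. Eqs. (34)-(35); all weights are random stand-ins."""
    rng = np.random.default_rng(seed)
    joint = np.concatenate([h_i, h_g, h_n])            # [h_i, h_g, h_n]
    gated = []
    for hm in (h_i, h_g, h_n):
        W_m = rng.standard_normal((hm.size, hm.size))
        W_gate = rng.standard_normal((hm.size, joint.size))
        hm_p = np.maximum(W_m @ hm, 0.0)               # h_m = ReLU(W_m . h_m)
        z_m = 1.0 / (1.0 + np.exp(-(W_gate @ joint)))  # z_m = sigma(W_{ign->m} . joint)
        gated.append(np.append(z_m * hm_p, 1.0))       # gate, then append a constant 1
    # Eq. (35): outer (Kronecker) product of the three 1-padded vectors
    return np.einsum("a,b,c->abc", *gated)

h_fusion = pathomic_fusion(np.ones(4), np.ones(3), np.ones(2))
```

Appending the constant 1 to each gated vector before the outer product is what keeps the unimodal and bimodal interaction terms alongside the trimodal ones in the fused tensor.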

4.2.1. Discussion and future prospects

Outside of histopathology, most adaptive graph structure learning assumes graph homophily (Zhu et al., 2021a), where similar nodes are likely to be colocated. This is not always the case in histopathology, as some structures might be composed of different cell types that can vary widely in morphology. Furthermore, most applications focus on homogeneous graphs, where a single type of node and edge exists. Work by Zhao et al. (2021) showed that we can learn a heterogeneous graph optimized for downstream tasks, which is suitable for graphs showing heterophily, which is often the case in histopathology. Therefore, we argue that heterogeneous graph learning will be a useful approach for histopathology if we model the whole slide images as a heterogeneous graph.

4.3. Multimodal GNNs

In histopathology diagnostics, different modalities are often combined to assist in clinical decision-making and prognostic predictions. While most applications of GNNs in histopathology focus solely on H&E image data, approaches considering multiple modalities have gained popularity recently. Combining data from multiple modalities helps increase model accuracy and generalization. Graph Neural Networks are especially suitable for multimodal integration, since data from different modalities can be easily combined in the node- and edge feature vectors (Ding et al., 2022b), which can then be jointly modeled with a controllable spatial inductive bias. Multiple approaches combined IHC-stained biopsy images with H&E-stained biopsy images, while other approaches incorporated spatial transcriptomics or genetic data in the model input. We differentiate between Stain multimodality, where the same whole slide images with different stainings (e.g. IHC) are combined, and Full multimodality, where the modalities are not based on whole slide images (e.g. CT-scans, gene expression data). An overview of multimodal GNNs in histopathology is given in Table 4.

An important challenge in multimodal integration in Deep Learning models is how and where in the model architecture data from different modalities should be combined, which we call fusion. In a GNN context, we broadly differentiate between early fusion, where data from different modalities are combined before message passing and thus jointly modeled in the GNN, and late fusion, where data is combined after the message passing steps (Fig. 7).

Fig. 7. Early fusion (A) versus late fusion (B). In early fusion, information from different modalities is typically integrated in the node features before message passing, enabling joint spatial modeling of modalities. In late fusion, meanwhile, modalities are separately processed and combined before the final model layers which calculate the model prediction. FCN: Fully Connected Layer.

We broadly categorize multimodal GNNs into four groups: Pathomic fusion-based, which uses the pathomic fusion strategy popularized by Chen et al. (2020a); Early fusion; Late fusion; and Modality prediction, encompassing models that predict one modality using another. Models that do not directly fuse modalities but use predictions from one modality to drive how the other modalities are processed are considered late-fusion models.

4.3.1. Full multimodality

Pathomic Fusion: Chen et al. integrated whole slide image information together with RNA-Seq count data and copy number variant (CNV) information (Chen et al., 2020a). They used this combined information for cancer subtyping and survival analysis in TCGA data sets for clear cell renal cell carcinoma and glioma. Their multimodal model fused information from 3 different modules: a CNN-based image module, a GNN-based cell graph module, and a genomic module, which took CNV and RNA-seq information as input. Each of these modalities was first processed individually before fusing the information. Their approach for multimodal fusion, which they call Pathomic fusion, models


interactions between modalities via the Kronecker product of attention-gated representations. The attention gating is applied to the hidden representation of modality 𝑚, ℎ𝑚, by learning a transformation 𝑊ign→𝑚 which assigns an importance score for each modality, which we denote as 𝑧𝑚:

ℎ𝑚,gated = 𝑧𝑚 ∗ ℎ𝑚, ∀𝑚 ∈ {𝑖, 𝑔, 𝑛}
where ℎ𝑚 = ReLU(𝑊𝑚 ⋅ ℎ𝑚)    (34)
𝑧𝑚 = 𝜎(𝑊ign→𝑚 ⋅ [ℎ𝑖, ℎ𝑔, ℎ𝑛])

where ℎ𝑖, ℎ𝑔, and ℎ𝑛 are the gated representation vectors of the image module, graph module, and genomic module, respectively. The authors calculated the Kronecker product of these vectors to get a combined representation ℎ𝑓𝑢𝑠𝑖𝑜𝑛:

ℎ𝑓𝑢𝑠𝑖𝑜𝑛 = [ℎ𝑖; 1] ⊗ [ℎ𝑔; 1] ⊗ [ℎ𝑛; 1]    (35)

where ⊗ denotes the outer product and [ℎ𝑚; 1] denotes the vector ℎ𝑚 appended with a 1. The result, ℎ𝑓𝑢𝑠𝑖𝑜𝑛, is a three-dimensional tensor that can then be connected to a fully connected layer for classification tasks or survival prediction.

Jiang et al. predicted EGFR-gene mutations in lung cancer (Jiang et al., 2023) by augmenting the approach used by Chen et al. (2020a). The authors' approach differs from that of Chen et al. by not using genomic data but instead using clinical information (e.g., gender, age) as the third modality, next to a spatial cell graph and whole slide image. Compared with a previous model from the same group (Xiao et al., 2022), which used a cell graph and image module but no clinical features, the authors found considerable performance increases for the multimodal model.

Early Fusion: Azher et al. integrated spatial transcriptomics data with accompanying H&E WSI data to predict survival and cancer grade in colorectal cancer (Azher et al., 2023). The authors first constructed an embedding model that used an ImageNet-pretrained CNN to encode H&E patches and fully connected layers to encode spatial gene expression data at the same location. They then optimized a projection layer to merge the data from these modalities into a single vector using a combination of unimodal and cross-modal loss functions. This effectively trained the model to encode a cross-modal embedding vector. The acquired embeddings were used as node vectors in a GNN model for downstream tasks. The authors showed that the use of expression-aware embeddings improved model performance on all tasks, indicating that pretraining using coupled H&E WSIs and spatial transcriptomics datasets can help retrieve more discriminative embeddings for downstream tasks.

Palmal et al. integrated data from six different modalities: WSI patches, miRSeq, DNA methylation, copy number variation (CNV), mRNASeq and clinical data to predict short- or long-term survival using the TCGA-BRCA dataset (Palmal et al., 2024). Each modality was represented as a graph, which was processed using several GCN layers. The learned embeddings are enhanced using contrastive learning, with the objective of aligning the embeddings of neighboring nodes while separating those of nodes further apart. The learned node embeddings are utilized in cross-attention blocks that model the interaction between the modality-specific graphs. Finally, the outputs of the cross-attention blocks are concatenated and used as input in a random forest model to provide the model prediction.

Late Fusion: Zuo et al. integrated H&E stained whole slide images with genomic biomarker information (Zuo et al., 2022). Specifically, they constructed a graph of Tumor Infiltrating Lymphocyte (TIL) patches with Tumor patches and analyzed this graph using a GNN. Genomic data consisted of mRNA gene counts, which were transformed to a gene co-expression module matrix using the lmQCM algorithm. They then applied a concrete autoencoder model to the co-expression matrix to identify survival-associated features. The GNN- and autoencoder outputs were then fused using a self-attention layer.

De et al. combined MRI- and H&E stained whole slide images of brain tumors to predict the type of brain cancer (De et al., 2022). The modalities were not directly fused; instead, the authors first used a 3D-CNN model to detect whether the cancer was one of the possible cancer types (Glioblastoma). If this was the case, the model simply outputs glioblastoma as its prediction. When this was not the case, a patch graph was constructed from the H&E image, which was used as input for a GNN model. Finally, this GNN model could predict one of the remaining subtypes (Normal, Astrocytoma, or Oligodendroglioma).

Xie et al. combined gene expression with H&E whole slide image data for survival prediction in gastric cancer (Xie et al., 2022b). Here, the authors first processed the WSI data and gene expression data separately using MLP layers. Then the interaction between each WSI patch and each gene feature vector was calculated using a cross-modal attention layer. After this processing, the data from both modalities was aggregated using a MIL-aggregation module and finally fused using concatenation. The fused embeddings were used to construct a patient graph, based on the similarity of the fused embeddings between the patients. A GNN was used to process this graph, which produced a survival prediction.

Zheng et al. fused gene expression signatures with a WSI patch graph using their Genomic Attention Module approach (Zheng et al., 2023). After message-passing on the patch graph, the pairwise interactions between each patch and each individual gene signature are modeled using a self-attention mechanism. This allows the model to learn the interactions between spatial tissue regions and gene signatures, which allowed the authors to visualize which gene signatures were associated with certain regions in the WSI.

Zhang et al. fused pathway-enriched RNASeq data with H&E stained WSI images by constructing modality-specific graphs for each (Zhang et al., 2024). The RNASeq data was used to train the model to construct a graph where each node represents a significantly enriched pathway in the gene expression data, and where edges are defined based on the correlation of biological function and the overlap in associated genes. This graph is then used in a graph autoencoder model, which reconstructs the adjacency matrix based on its latent representation. The WSI-graph consisted of nodes representing 20x resolution patches along with pretrained ResNet50 features, connected using a RAG-strategy. The pooling of these patch features to a WSI-level embedding was regularized by a gene set variation analysis obtained from the RNA-Seq data. Further interaction between the RNA-Seq and WSI data was modeled by aligning linear projections of the WSI-graph and the pathway-node graph and by applying cross-modality attention blocks. Subsequently, the outputs were pooled using global attention pooling and mapped to a risk value for survival prediction.

Elforaici et al. integrated clinical data along with GNN-processed cell graphs defined on WSIs in a support vector machine (SVM) classification model (Elforaici et al., 2024). They used their modeling approach for accurate tumor response grading in colorectal liver metastases.

Modality Prediction: Fatemi et al. integrated spatial transcriptomic data with co-localized H&E whole slide imaging data to characterize spatial tumor heterogeneity in colorectal cancer (Fatemi et al., 2023). They achieved this by training a model to predict spatial gene expression from the H&E whole slide image. The authors tried to predict spatial gene expression using both a CNN- and a GNN-based network and showed that, for this task, the CNN-based methods performed better.

Gao et al. predicted spatial transcriptomic data using H&E images by integrating image and cell graph data using CNN- and GNN-based models (Gao et al., 2023). The authors showed that integrating the graph- and image-based information together did significantly improve over using either one alone.

Chi et al. utilized the deep graph infomax (Veličković et al., 2018) GNN-based contrastive learning framework to align spatial transcriptomics data with WSI patches and raw gene expression data (Chi et al.,


2024). During inference using a single WSI patch, the top 𝑘 most similar gene expressions were retrieved from a database of gene expression embeddings to predict spatial gene expression.

4.3.2. Stain multimodality

Early fusion: Li et al. fused information from Second-Harmonic Generation (SHG) microscopy images and H&E whole slide images together to differentiate between pancreatic ductal adenocarcinoma and chronic pancreatitis in pancreatic cancer (Li et al., 2022). The images from both modalities were registered. The features from each modality were combined into node features for the input graph, where nodes represented registered patches in both modalities. An ImageNet-pretrained ResNet model was used to retrieve features from the H&E patches, while collagen fiber-specific handcrafted features were extracted for each SHG patch. An H&E-SHG graph was constructed where the node vectors contained the concatenation of the patch features from both modalities. This graph was used in a GNN model which predicted between the two classes.

Gallagher-Syed et al. integrated data from IHC- (CD138, CD68, CD20) and H&E stained synovial biopsy samples to predict a Rheumatoid Arthritis subtype using a GNN model (Gallagher-Syed et al., 2023). Information between the staining modalities was exchanged by modeling each patch, from each staining, as a node and connecting the nodes based on their feature similarity to get a single multistain graph. The authors showed that the features across stains were similar enough to cause nodes from different stainings to mix in the graph and, thus, enable information exchange between the modalities in the message passing layers of the GNN. The authors used the multimodal graph as input for a GNN model whose output was used to predict the rheuma subtype.

Late fusion: Dwivedi et al. combined trichrome (TC) and H&E stainings of liver biopsies to predict liver fibrosis (Dwivedi et al., 2022). The authors experimented with different mid- and late fusion techniques. Their experiments showed that their late concatenation or addition and the pathomic fusion strategy proposed by Chen et al. (2020a) performed the best for fibrosis prediction. In the late and pathomic fusion strategies, they separately processed both the H&E and TC tissues as graphs using a GNN and then fused the features from both modalities together.

Qiu et al. combined information from H&E stainings, multiphoton microscopy (MP), and two-photon excited fluorescence (TPEF) applied to the same breast cancer biopsies (Qiu et al., 2022). Instead of fusing the modalities in the model itself, the authors determined tumor-associated collagen signatures from the 3 different modalities in different regions to calculate an 8-bit binary vector for each region. The regions sampled were treated as graph nodes having the binary vector as node attributes. Using these nodes, a fully connected graph was constructed and used as input for a survival prediction GNN model.

Modality prediction: Pati et al. used a generative approach to virtually predict IHC-stained tissue images from H&E slide images, and then used a multimodal GNN Transformer model to perform survival prediction and cancer grading tasks in prostate cancer, breast cancer, and colorectal cancer (Pati et al., 2023). The authors used three strategies for fusion (no fusion, early fusion, late fusion) and found that early fusion works optimally for both tasks. In early fusion, the authors combined ImageNet-pretrained ResNet features from the same patch in all modalities to form the node features in the input graph. In late fusion, meanwhile, all modalities were assigned a separate input graph, which was processed separately using the GNN Transformer model. Subsequently, the output features were combined. The authors hypothesized that early fusion allowed the model to learn multimodal spatial interactions during message passing, causing a performance gain compared to the other fusion strategy.

4.3.3. Discussion and future prospects

In this section, we highlighted the use of graph-based modeling in multimodal approaches, which have been successfully applied both for integration of multimodal data to improve prediction accuracy and for predicting a translation from one modality to another, particularly in the context of spatial transcriptomics data. We argue that, in cases where the modalities are not spatially aligned, graphs themselves could be utilized more for the multimodal integration itself. For example, several researchers have used the concept of a Patient graph, where nodes represent (aggregated) data points from different medical modalities corresponding to the same patient (Kim, 2023) or multiple patients (Gao et al., 2020; Ochoa and Mustafa, 2022). Some approaches use graphs to model time series data, where, for example, medical information on the same patient gathered at different time points can be effectively utilized (Rocheteau et al., 2021; Daneshvar and Samavi, 2022). Zheng et al. proposed a framework in which adaptive graph structure learning and GNNs are combined to integrate data from different medical modalities for disease prediction (Zheng et al., 2022). One major problem in the application of multimodal approaches in histopathology is that, often, not every modality is available for each patient. This effectively creates a missing modality problem. Ma et al. proposed a Bayesian meta-learning framework which mitigates this problem, allowing effective multimodal learning and prediction even when numerous modalities are missing in the data (Ma et al., 2021). We argue that these approaches should be combined to effectively model the relationships between modalities, based on the task at hand, even in settings where modalities are missing.

4.4. Higher-order graphs

While graphs have been shown to be adequate formats for the representation of histopathology slides, they are limited by the fact that only pairwise relations can be modeled. Furthermore, the entities in the graphs can solely be modeled as nodes and edges. This limitation has inspired extensions to the graph modeling framework, which are suited to model higher-order graphs, which move beyond pairwise node-edge interactions. Examples of higher-order graphs are hypergraphs, cellular complexes, and combinatorial complexes. To allow learning from these higher-order graph structures, message-passing frameworks called topological neural networks (TNNs) have been developed (Papillon et al., 2023).

In histopathology, TNNs have not yet been widely adopted, but there has been a steadily increasing number of publications that model WSIs as hypergraphs. Hypergraphs extend the graph modeling framework with hyperedges, which can connect sets containing an arbitrary number of nodes in the graph. This allows hypergraphs to model relations that rely on more than 2 pairwise entities. Deep learning on hypergraphs is typically achieved using hypergraph neural network architectures, such as HGNN (Feng et al., 2019) and HyperGAT (Ding et al., 2020). We provide an overview of publications using higher-order graphs in histopathology in Table 5.

Let us denote a hypergraph as 𝐺 = (𝑉, 𝐸hyp), which consists of a set of nodes 𝑉 and a set of hyperedges 𝐸hyp. Each hyperedge in 𝐸hyp is a subset of 𝑉, connecting any number of vertices. For example, a hypergraph with vertices 𝑉 = {𝑣1, 𝑣2, 𝑣3, 𝑣4} and hyperedges 𝐸hyp = {{𝑣1, 𝑣2}, {𝑣2, 𝑣3, 𝑣4}, {𝑣1, 𝑣3, 𝑣4}} expresses relationships between multiple nodes simultaneously. We denote the connectivity of a hypergraph using an incidence matrix 𝐻 ∈ {0, 1}|𝑉|×|𝐸| whose entries are defined as:

ℎ(𝑣, 𝑒) = 1 if 𝑣 ∈ 𝑒, and 0 if 𝑣 ∉ 𝑒    (36)

for nodes 𝑣 ∈ 𝑉 and edges 𝑒 ∈ 𝐸hyp. For any node 𝑣, its degree is defined as 𝑑(𝑣) = Σ𝑒∈𝐸hyp ℎ(𝑣, 𝑒); similarly, for any edge 𝑒 ∈ 𝐸hyp, its degree is defined as 𝑑(𝑒) = Σ𝑣∈𝑉 ℎ(𝑣, 𝑒). These degrees are saved in diagonal matrices 𝐷𝑒 and 𝐷𝑣, which contain the edge degrees and node degrees,


Table 5
Publications which utilized hypergraph neural networks for histopathology WSI analysis.
Publication Date Application Hypergraph type Message-Passing
Di et al. (2020) 2020/09 Survival prediction Patch Hypergraph HGNN
Bakht et al. (2021) 2021/05 Patch classification Patch Hypergraph HGNN
Di et al. (2022b) 2022/09 Survival prediction Patch Hypergraph HGMConv
Benkirane et al. (2022) 2022/11 Survival prediction Patch Hypergraph HGCN, HGAT
Liang et al. (2024a) 2024/02 Binary classification Patch HyperGraph Adaptive HGNN
Han et al. (2024) 2024/04 Survival prediction Patch Hypergraph HMHA
Liang et al. (2024b) 2024/04 Binary classification Patch Hypergraph HGNN
Cai et al. (2024) 2024/05 Survival prediction Patch Hypergraph HGNN
Shi et al. (2024) 2024/08 Cancer subtyping, mutation prediction Patch Hypergraph HGAT
Li et al. (2024a) 2024/10 Gene expression prediction Patch Hypergraph HGNN
Zhou et al. (2024) 2024/10 Metastasis prediction Patch Hypergraph (Multi-slice) HGNN
Lu et al. (2024) 2024/10 Survival prediction Patch Hypergraph HGNN
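The incidence-matrix bookkeeping of Eq. (36) and the HGNN propagation rule of Eq. (37), which most publications in Table 5 build on, can be sketched as follows. This is a NumPy illustration in which identity matrices stand in for the learned weights 𝑊 and Θ.

```python
import numpy as np

def hgnn_layer(X, hyperedges):
    """One HGNN message-passing step, cf. Eqs. (36)-(37).

    X: (n, d) node features; hyperedges: list of sets of node indices.
    Identity matrices stand in for the learned weights W and Theta.
    """
    n, d = X.shape
    H = np.zeros((n, len(hyperedges)))       # incidence matrix, Eq. (36)
    for j, e in enumerate(hyperedges):
        for v in e:
            H[v, j] = 1.0
    Dv = np.diag(H.sum(axis=1))              # node degrees d(v)
    De = np.diag(H.sum(axis=0))              # hyperedge degrees d(e)
    Dv_inv_sqrt = np.linalg.inv(np.sqrt(Dv))
    W, Theta = np.eye(len(hyperedges)), np.eye(d)
    # Eq. (37): X' = sigma(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta), sigma = ReLU
    return np.maximum(
        Dv_inv_sqrt @ H @ W @ np.linalg.inv(De) @ H.T @ Dv_inv_sqrt @ X @ Theta, 0.0
    )

# the example hypergraph from the text: V = {v1..v4} (0-indexed),
# E_hyp = {{v1, v2}, {v2, v3, v4}, {v1, v3, v4}}
X_out = hgnn_layer(np.eye(4), [{0, 1}, {1, 2, 3}, {0, 2, 3}])
```

The product 𝐻 𝐷𝑒⁻¹ 𝐻ᵀ first averages node features within each hyperedge and then redistributes the hyperedge-level features back to the nodes, which is exactly the two-stage aggregation visualized in Fig. 8.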

respectively. Lastly, we denote the matrix of node features as 𝑋. The decision on which nodes to connect to a hyperedge is usually based on feature similarity or spatial distance (i.e., closely related nodes are connected together by a single hyperedge). Feng et al. introduced the hypergraph neural network (Feng et al., 2019) (visualized in Fig. 8), which defined a message passing operation on hypergraphs as follows:

𝐗(𝑙+1) = 𝜎(𝐃𝑣^−1∕2 𝐇 𝐖 𝐃𝑒^−1 𝐇ᵀ 𝐃𝑣^−1∕2 𝐗(𝑙) Θ(𝑙))    (37)

where 𝑊 is a learned weight matrix, 𝜎 denotes a nonlinear activation function, and 𝛩 is a learnable filter matrix used for feature extraction. After applying message passing, we have an updated feature matrix 𝑋. This can then be used to obtain features on the hyperedge level, mathematically defined as: 𝑋ℎ𝑒(𝑙+1) = 𝐻ᵀ × 𝑋. Finally, the updated node-level embeddings are acquired by multiplying the hyperedge features with the incidence matrix: 𝑋(𝑙+1)′ = 𝑋ℎ𝑒(𝑙+1) × 𝐻.

Fig. 8. Graphical overview of the hypergraph neural network framework (Feng et al., 2019). First, message-passing gets applied between all nodes connected to the same hyperedge. Then the learned features are calculated on a hyperedge level. Finally, the hyperedge-level features are used to calculate the new node features.

Di et al. were the first to model WSIs as hypergraphs (Di et al., 2020). They used their hypergraph approach for survival prediction in lung and brain cancer datasets. The authors started by constructing sets of 𝐾 similar patches based on the Euclidean distance between the feature vectors, which were retrieved using an ImageNet-pretrained ResNet model. 𝑁 hyperedges are then used to connect the patches in each of the sets. The authors then used the node feature matrix 𝑋 with the defined hypergraph, captured in 𝐻, and updated the features using a series of convolutional hypergraph layers (HGNN). The acquired representations after message passing were then used for the downstream survival prediction task. The authors show that their hypergraph-based method outperforms other CNN- and GNN-based frameworks for survival prediction.

Bakht et al. extended this idea by introducing a new strategy for constructing the hypergraph (Bakht et al., 2021). Given a fixed neighbor parameter 𝑘, their hypergraph construction strategy starts by defining the distance between any two patches 𝑖, 𝑗 as:

𝑑𝑘(𝑖, 𝑗) = exp(−‖𝑥𝑖 − 𝑥𝑗‖²₂)    (38)

This distance is then used to define the probability of a node 𝑣 to be connected using hyperedge 𝑒:

ℎ(𝑣, 𝑒) = exp(−𝑑 ∕ (𝑝max ⋅ 𝑑avg)) if 𝑣 ∈ 𝑒, and 0 if 𝑣 ∉ 𝑒    (39)

Here 𝑑 denotes the distance between the current node and the neighboring node, 𝑝max denotes the maximum probability, and 𝑑avg is the average distance between all 𝑘 nearest neighbors. Finally, they use this incidence matrix to calculate the node and edge degrees:

𝑑(𝑣) = Σ𝑒∈𝐸 ℎ(𝑣, 𝑒),  𝑑(𝑒) = Σ𝑣∈𝑉 ℎ(𝑣, 𝑒)    (40)

The degrees are combined into matrices 𝐷𝑣 and 𝐷𝑒, which are used, together with the incidence matrix 𝐻 and node feature matrix 𝑋, in 3 HGNN message passing layers. The output of these layers was used to predict the label of patches in the WSI.

Di et al. then expanded on their previous work by using multiple hypergraphs that are fused together to be used as input for message passing layers (Di et al., 2022b). Specifically, they construct a topological hypergraph and a phenotype (feature-based) hypergraph. The authors sampled patches sequentially from the tissue boundary to the tissue center and grouped the patches in the same sequence step in the same topological area. The topological hypergraph is constructed by connecting neighboring patches with a hyperedge if they belong to the same topological area. The phenotype hypergraph, meanwhile, is constructed using K-NN based on the vector similarity between the patch features. The two hypergraphs are then concatenated together to form a total incidence matrix 𝐻. For processing the constructed hypergraph the authors use max-mask convolutional layers, which are defined in 4 iterative steps:

1. Hyperedge Feature Gathering: First, hyperedge-level features are formed by multiplying the hypergraph incidence matrix (𝐻) and the node feature matrix (𝐹𝑣(𝑙)). This step aggregates the information from nodes connected by each hyperedge, resulting in hyperedge-level features (𝐹𝑒(𝑙)).
2. Max-Mask Operation: After gathering hyperedge-level features, a max-mask operation is performed on each dimensionality of 𝐹𝑒(𝑙). This operation aims to avoid overfitting by disregarding the contribution of dominant hyperedges that take the largest values.
3. Node Feature Aggregating: By multiplying the hyperedge features with the transposed incidence matrix (𝐻ᵀ × 𝐹𝑒(𝑙)), we can calculate the node features (𝐹𝑣(𝑙+1)).
4. Node Feature Reweighting: Finally, the output node features are further weighted using learnable parameters (𝜄(𝑙)), which are
2𝜎 2 represented as a diagonal matrix. This reweighting is followed by
where 𝑥𝑖 , 𝑥𝑗 represent the feature vectors of patch 𝑖 and 𝑗, respec- a non-linear activation function (𝜎). The reweighting step allows
tively, and 𝜎 is a bandwidth parameter. Then, the authors calculated the model to learn the importance of different node features and
the vertex-edge probabilistic incidence matrix which determines the adaptively adjust them.
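The HGNN propagation rule of Eq. (37), which several of the works above build on, is compact enough to sketch directly. Below is a minimal NumPy illustration; the toy incidence matrix, the identity hyperedge weights, and the choice of ReLU as the nonlinearity are our own illustrative assumptions, not taken from any of the cited implementations.

```python
import numpy as np

def hgnn_layer(X, H, Theta, w=None):
    """One HGNN step, Eq. (37): sigma(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta)."""
    n, m = H.shape
    w = np.ones(m) if w is None else w       # hyperedge weights (diagonal of W)
    dv = H @ w                               # node degrees
    de = H.sum(axis=0)                       # hyperedge degrees
    Dv_inv_sqrt = np.diag(dv ** -0.5)
    out = Dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ Dv_inv_sqrt @ X @ Theta
    return np.maximum(out, 0.0)              # ReLU as the nonlinearity sigma

# Toy hypergraph: 4 patches, 2 hyperedges; patch 2 belongs to both
H = np.array([[1., 0.], [1., 0.], [1., 1.], [0., 1.]])
X = np.arange(8.0).reshape(4, 2)             # 2-dimensional patch features
out = hgnn_layer(X, H, Theta=np.eye(2))
print(out.shape)  # (4, 2)
```

Note that `np.diag(w / de)` folds the product $W D_e^{-1}$ of Eq. (37) into a single diagonal matrix.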

S. Brussee et al. Medical Image Analysis 101 (2025) 103444

Mathematically, the max-mask convolutional layer is defined as follows:

$$X^{(l+1)} = \sigma\left((I - L)X^{(l)} + H^{-1}(I - L)X^{(\lambda)} \cdot \iota^{(l)}\right), \qquad F_e^{(l+1)} = H^{-1}(I - L)X^{(l)} + X^{(\lambda)} \tag{41}$$

Here, L is the multigraph Laplacian matrix, and I denotes the identity matrix. The term $H^{-1}(I - L)X^{(\lambda)}$ functionally ensures that the top $\lambda$ attribute feature dimensionalities are ignored during the gradient calculation.

Benkirane et al. used adaptive agglomerative clustering to construct a patch hypergraph, which was then processed using a combination of HGNN and HGAT layers (Benkirane et al., 2022). In the agglomerative clustering, a similarity kernel was used that took into account both the spatial location and feature similarity between patches. This kernel calculated pairwise similarity scores between all patches. If the similarity score was higher than a fixed threshold $\delta$, the patches were assigned to the same cluster $C_k$. For each cluster, the representation of the patches in the cluster was averaged to obtain cluster-level representations. Each clustered patch was treated as a hypergraph node. The hyperedges connected all nodes with a feature similarity higher than a fixed threshold $\delta_h$. We denote the neighborhood of a clustered node $c_i$ as $\gamma(c_i) = \{c_j \in C \mid \kappa_h(c_i, c_j) \geq \delta_h\}$. Here, C denotes the set of all clusters and $\kappa_h(c_i, c_j)$ denotes the output of the feature similarity kernel $\kappa_h$. Having determined the neighborhood, we can calculate the incidence matrix H, where:

$$h(k, j) = \begin{cases} 1, & \text{if } c_j \in \gamma(c_k) \\ 0, & \text{otherwise} \end{cases} \tag{42}$$

The authors then used the incidence matrix H and node feature matrix X as input for a series of combined HGNN-HGAT layers, which were subsequently pooled into a global representation. Finally, this representation was used as input for an MLP layer, which predicted the hazard score for survival prediction.

Liang et al. introduced the adaptive HGNN to histopathology, for the classification of sentinel node metastases and the differentiation between lung squamous cell carcinoma and lung adenocarcinoma (Liang et al., 2024a). The authors used the K-NN algorithm on patch-level ImageNet-pretrained ResNet features to construct a hypergraph of patches, where the k most similar patches were connected using a hyperedge. Their main innovation comes in the form of an adaptive HGNN, which can adjust the correlation strength between nodes and hyperedges on the graph during model training. They first denote a matrix of edge strengths in layer l as $T^{(l)}$. Each element $t_i^{(l)} \in T^{(l)}$, which denotes the attention score of node i and its associated hyperedge $e_{i,i'}$ in the l-th layer, is defined as:

$$t_i^{(l)} = \frac{\exp\left(\sigma\left(\operatorname{sim}(f_i M^{(l)}, e_{i,i'} M^{(l)})\right)\right)}{\sum_{k \in N_j} \exp\left(\sigma\left(\operatorname{sim}(f_i M^{(l)}, e_{i,k} M^{(l)})\right)\right)} \tag{43}$$

Here, $M^{(l)}$ denotes a feature transformation matrix and $e_{i,i'}$ denotes the hyperedge connecting nodes i and i'. By calculating these edge strength scores, the incidence matrix can be updated as follows:

$$\tilde{H}_i''^{(l)} = D_v^{-1/2}\left(T_i^{(l)} \odot H_i\right) W D_e^{-1} \left(T_i^{(l)} \odot H_i\right)^{\top} D_v^{-1/2} \tag{44}$$

where $D_v$, $D_e$ denote the node degree and edge degree matrices, $T^{(l)}$ denotes the edge strength matrix, and W is a learnable weight matrix. This function essentially adapts the interconnection of the nodes in H using the edge strengths calculated in $T^{(l)}$. Note that the edge strength changes depending on the layer l, as the feature similarities also change between the layer embeddings. The feature matrix is updated as follows:

$$\tilde{F}_i^{(l+1)} = \{\tilde{f}_{i,j}\}_{j=1}^{P} = \sigma\left(\tilde{H}_i''^{(l)} F_i^{(l)} P_i^{(l)}\right) \tag{45}$$

where $\sigma$ is a nonlinear activation function and $P_i^{(l)}$ denotes a learned projection matrix.

In a later publication, Liang et al. expanded their approach, utilizing a multiscale hypergraph model enhanced by cross-scale contrastive learning (Liang et al., 2024b).

Cai et al. introduced hierarchy into the hypergraph-GNN framework by defining hypergraphs at the patch and tissue levels (Cai et al., 2024). On each of these levels, the hypergraph is defined using the K-NN algorithm in both the spatial and the feature space. Patch-level features were pooled hierarchically to obtain tissue-level embeddings. Using attention-based pooling, patch- and tissue-level hypernodes were pooled to obtain a WSI-level representation. The authors show their approach performs competitively on survival prediction tasks on various TCGA datasets.

Han et al. introduced another hierarchical hypergraph approach, defining hypergraph nodes on 10x resolution tissue patches based on the heterogeneity of a subordinate heterogeneous cell graph defined on 20x resolution patches (Han et al., 2024). The authors used a heterogeneous approach, where they separated the hypernodes into mix nodes, boundary nodes and random nodes, connected using mix hyperedges, boundary hyperedges and neighbor hyperedges. They used a heterogeneity-aware hypergraph learning layer to pass messages, additionally incorporating a self-attention mechanism. Their approach explicitly focuses on tissue regions where the cellular composition changes considerably, which is an important factor in prognostic prediction. As such, their method showed competitive results for survival prediction on TCGA datasets.

Li et al. used hypergraph-based WSI modeling to predict spatial gene expression (Li et al., 2024a). They first extracted image features using a combination of convolutional block attention modules and vision transformers defined at multiple latent stages of the pipeline. The incidence matrix of their hypergraph was defined by a linear combination of both the spatial- and feature-wise distances between image patches:

$$\mathrm{Inc\_Mat}(v_i, v_j) = \operatorname{Norm}(\mathrm{Dis}(v_i, v_j)) + \operatorname{Norm}(\mathrm{Pos}(v_i, v_j)) \tag{46}$$

where $v_i$ and $v_j$ are two image patches and Dis, Pos denote the feature-wise distance and spatial distance, respectively.

Shi et al. extended the hypergraph-based modeling of WSIs by introducing masked hypergraph modeling, which generates masked views of the hypergraph during model training (Shi et al., 2024). The model was then trained to reconstruct the masked nodes and edges. This strategy aimed to increase the robustness and generalizability of the features learned by the GNN, enabling better performance on unseen samples. Starting with a set of 20x patches from the WSI, along with corresponding ViT-based image features, the authors defined their hypergraph by clustering the feature space using k-means and clustering the spatial domain using agglomerative clustering. The obtained clusters are then stacked to define a joined hypergraph.

Zhou et al. moved beyond hypergraphs defined on single WSIs to hypergraphs defined using multiple WSIs from the same patient (Zhou et al., 2024). Starting with WSI patches and their corresponding features, the authors first defined two hypergraphs: an intra-hypergraph, defined using K-NN in both the feature and coordinate space, and a cross-hypergraph, solely defined using K-NN in the feature space. The intra-hypergraph is WSI-specific, while the cross-hypergraph contains hypernodes across WSIs. The features for the hypernodes in the cross-hypergraph are obtained by applying HGNN layers to the intra-hypergraphs, leading to spatially aware representations. A specifically designed multi-slice hypergraph block was then applied to the cross-hypergraph to exchange information across WSIs, which was subsequently pooled to obtain patient-level embeddings.

Finally, Lu et al. introduced reinforcement learning (RL) techniques to optimize their hypergraph-GNN framework for survival prediction (Lu et al., 2024). First, the authors clustered 20x resolution patch features and patch coordinates using K-means. An RL agent was trained to select representative patches from each cluster, which were consequently used as hypernodes in the hypergraph. These hypernodes were connected using intra-cluster edges if they existed within the same K-means cluster, while the clusters themselves were connected by inter-cluster edges. The authors used contrastive learning between different hypergraphs of the same slide to improve the data diversity and aid generalization.
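A recurring construction in the works above connects each patch to its k most similar patches with a single hyperedge. The sketch below combines this with the Gaussian kernel of Eq. (38) as the similarity measure; the function name, the inclusion of the centroid patch itself, and the parameter defaults are our own illustrative choices rather than the exact procedure of any single cited paper.

```python
import numpy as np

def knn_hypergraph(features, k, sigma=1.0):
    """Incidence matrix H (n patches x n hyperedges): hyperedge j groups
    patch j with its k most similar patches in feature space."""
    n = features.shape[0]
    sq_dist = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-sq_dist / (2 * sigma ** 2))    # Gaussian kernel of Eq. (38)
    H = np.zeros((n, n))
    for j in range(n):
        nearest = np.argsort(-sim[j])[: k + 1]   # patch j itself plus k neighbors
        H[nearest, j] = 1.0
    return H

# Four toy patch embeddings forming two well-separated groups
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
H = knn_hypergraph(feats, k=1)
print(H.astype(int))   # hyperedges pair up the two nearby patches in each group
```

Spatial variants, such as the coordinate-space hypergraphs of Cai et al. and Zhou et al., follow the same pattern with patch coordinates in place of feature vectors.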

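The four max-mask steps of Di et al. (2022b) can be illustrated with the sketch below. Note that this only mirrors the gather, mask, aggregate and reweight flow of the four steps; the exact formulation in Eq. (41) additionally involves the multigraph Laplacian-based normalization, which we omit here for brevity.

```python
import numpy as np

def max_mask_step(H, Fv, iota, lam=1):
    """Illustrative gather -> max-mask -> aggregate -> reweight step."""
    # 1. Hyperedge feature gathering: aggregate node features per hyperedge
    Fe = H.T @ Fv                                  # (m, d) hyperedge features
    # 2. Max-mask: zero the lam largest hyperedge values per feature dimension
    Fe_masked = Fe.copy()
    for d in range(Fe.shape[1]):
        dominant = np.argsort(-Fe[:, d])[:lam]     # dominant hyperedges
        Fe_masked[dominant, d] = 0.0
    # 3. Node feature aggregating: project hyperedge features back to nodes
    Fv_new = H @ Fe_masked                         # (n, d)
    # 4. Node feature reweighting: diagonal iota, then a nonlinearity
    return np.maximum(Fv_new * iota, 0.0)

H = np.array([[1., 0.], [1., 1.], [0., 1.]])       # 3 nodes, 2 hyperedges
Fv = np.array([[1., 2.], [3., 4.], [5., 6.]])
out = max_mask_step(H, Fv, iota=np.ones(2))
print(out)   # the dominant second hyperedge is masked out in both dimensions
```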

4.4.1. Discussion and future prospects

In this section, we described the increasing popularity of hypergraph-based modeling of histopathology slides. Interestingly, this approach has only been applied at the patch level, whereas we argue that hypergraph-based modeling might be very well suited for cell-level modeling. For example, cells can be organized as clusters that carry important diagnostic context (Chandran et al., 2012). Such cell clusters could be modeled using hypergraphs, where homogeneous clusters are connected using a single hyperedge. Furthermore, there exist many other higher-order graph types, such as cellular complexes and combinatorial complexes (Papillon et al., 2023), which have not seen use in histopathology. We anticipate that these approaches will also be tested in a histopathological context. For example, using cellular complexes, different semantic tissue structures (e.g., hair follicles in the skin) can be modeled jointly with cells, but as separate graph entities.

5. Other trends

In addition to the trends explored above, we identified several other trends, which we will discuss more briefly.

5.1. Graph transformers

In the last few years, graph neural networks have been combined with transformer architectures, which has given birth to the Graph Transformer modeling paradigm. Graph Transformers either use the positional embedding of the graph in the input to the transformer module, use the graph structure as a prior to build an attention mask for each input, or directly combine message passing layers with transformer blocks in the model architecture (Min et al., 2022). Graph transformers are especially suited for modeling long-distance relations in graphs, as they do not suffer from oversmoothing, where node representations become almost identical throughout the graph as GNN layer depth increases, or from oversquashing, where information from an exponentially growing receptive field is compressed into fixed-size node representations (Kreuzer et al., 2021). In histopathology, all three types of graph transformers have been used (Shi et al., 2023a; Hou et al., 2023; Lou et al., 2024; Wang et al., 2024). One major challenge in the application of graph transformers is their scalability, as the time and memory complexity of the attention mechanism in Transformers grows quadratically (O(|V|²), where |V| is the number of nodes). This is especially a problem for cell graphs in histopathology, as these graphs often exceed 10,000 nodes in size. Recently, efforts have been made to greatly mitigate this scalability challenge (Rampášek et al., 2022; Shirzad et al., 2023; Wu et al., 2024), leading us to believe that the popularity of graph transformers in histopathology will continue to grow.

5.2. Heterogeneous GNNs

Histopathology tissues are composed of different entities (e.g., glands, cells), which can be subdivided further into biological subtypes (e.g., cell types). In a graph, we can explicitly encode these differences as different node types to obtain a heterogeneous graph. Similarly, different interactions between histopathological entities can also be explicitly defined as different types of edges. Heterogeneous GNNs are specifically designed to work on heterogeneous graphs, learning separate model parameters for each node/edge type. Given the large number of different cell types and biological interactions present in tissues, heterogeneous GNNs are a natural model for histopathological tissues. Interestingly, only a handful of publications have made use of heterogeneous GNNs: most have modeled entities at different hierarchies (e.g., 20x, 40x patches) as different node types (Hou et al., 2022b; Gupta et al., 2023; Bazargani et al., 2024), and 2 articles modeled different node types based on predicted cell types (Chan et al., 2023; Han et al., 2024).

5.3. Self-supervised learning using GNNs

Due to the high costs of annotation in histopathology, the adoption of self-supervised learning (SSL) has been steadily growing for histopathology applications, particularly for image feature extraction. As such, SSL models have become the basis for the feature extraction models often used in GNN approaches. Recently, SSL methods specifically designed for graphs have been introduced (Liu et al., 2022), which can be subcategorized into contrastive, generative or auxiliary approaches. Contrastive approaches aim to align the embeddings of graph entities (e.g., nodes, subgraphs, graphs) to each other, typically using mutual information (MI) maximization. Generative approaches learn to reconstruct either the graph structure or features defined on the graph, based on latent space features or on different views of the graph. Lastly, auxiliary approaches predict graph properties (e.g., degree) of the graph from its latent representation. In histopathology, contrastive learning has emerged as the predominant strategy for GNN-based SSL (Pina and Vilaplana, 2022; Liang et al., 2024b; Shao et al., 2024; Chi et al., 2024; Lu et al., 2024), while some recent publications have used generative approaches (Pan et al., 2024; Saeidi et al., 2024; Elforaici et al., 2024). The focus on contrastive and generative techniques leaves room for auxiliary approaches in the field. For example, one could predict global graph statistics such as the density or average clustering coefficient, which could act as a regularization to gain more general-purpose graph-level features before introducing image-based information. Based on the rapid increase of GNN-based SSL approaches and the considerable shortage of labeled data in histopathology, we argue these techniques will continue to gain popularity.

5.4. Semi-supervised learning using GNNs

Label scarcity is a big challenge in histopathology, as evidenced by the popularity of weakly supervised multiple instance learning strategies. Semi-supervised learning aims to utilize both the labeled and unlabeled examples in partly labeled datasets, usually by inferring labels for unlabeled examples based on the labeled ones, termed label inference. In a feature similarity-based graph, we can propagate the labels of labeled examples to unlabeled examples. Often, edges are weighted based on the strength of the similarity, leading to a more nuanced label propagation process. GNNs implicitly perform label propagation, transferring both structural and feature information from labeled nodes to unlabeled nodes via message passing, thus bypassing the need for explicit label propagation (Song et al., 2022). When predicting WSI-level labels using GNNs in histopathology, there is often a weakly supervised setting where the graph is labeled while the nodes themselves are not. This can be regarded as a form of semi-supervised learning, where the features learned from the unlabeled nodes are pooled to a labeled graph-level example. On node-level tasks, however, semi-supervised learning is still relatively unexplored in histopathology. We argue this type of learning can be useful for cell classification tasks where some cells have a cell type label (e.g., through the use of IHC staining) while other nuclei are unlabeled (Turkki et al., 2016).
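The auxiliary targets suggested above (density, average clustering coefficient) are cheap to compute and require no labels. Below is a plain-Python sketch of how such graph-level regression targets could be derived from a cell or patch graph; the implementation is our own illustration, not taken from any cited work.

```python
def graph_statistics(edges, n):
    """Global graph statistics usable as auxiliary self-supervised targets."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    density = 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0
    # Local clustering: fraction of a node's neighbor pairs that are connected
    coeffs = []
    for v in range(n):
        nb, k = adj[v], len(adj[v])
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for u in nb for w in nb if u < w and w in adj[u])
        coeffs.append(2 * links / (k * (k - 1)))
    return density, sum(coeffs) / n

# Toy cell graph: a triangle with one pendant node
density, avg_c = graph_statistics([(0, 1), (1, 2), (0, 2), (2, 3)], n=4)
print(round(density, 3), round(avg_c, 3))  # 0.667 0.583
```

A GNN trained to regress such statistics from its pooled latent representation would be nudged towards general-purpose structural features before any image-based supervision is introduced.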

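The graph-based label propagation described in Section 5.4 can be sketched in a few lines. The similarity weights, the clamping of labeled nodes after every iteration, and the toy chain graph are our own illustrative choices; class labels are assumed to be coded 0..C-1, with -1 marking unlabeled nodes.

```python
import numpy as np

def propagate_labels(W, labels, n_iter=50):
    """Propagate labels over a similarity-weighted graph.

    W: (n, n) symmetric edge weights; labels: -1 marks unlabeled nodes."""
    classes = np.unique(labels[labels >= 0])
    Y = np.zeros((len(labels), len(classes)))
    mask = labels >= 0
    Y[mask] = np.eye(len(classes))[labels[mask]]      # one-hot seed labels
    P = W / W.sum(axis=1, keepdims=True)              # row-normalized transitions
    for _ in range(n_iter):
        Y = P @ Y
        Y[mask] = np.eye(len(classes))[labels[mask]]  # clamp known labels
    return Y.argmax(axis=1)

# Chain 0-1-2-3 with node 0 labeled class 0 and node 3 labeled class 1
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
pred = propagate_labels(W, np.array([0, -1, -1, 1]))
print(pred)  # [0 0 1 1]
```

A GNN trained end-to-end on the same graph performs this propagation implicitly through message passing, while additionally exploiting node features.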

5.5. Foundation models in computational histopathology

The rise of self-supervised learning, as well as the increased availability of histopathology datasets, has allowed the training of very large deep neural networks, termed foundation models, on huge amounts of (unlabeled) histopathology images (Vorontsov et al., 2023). These models can be used for effective feature extraction in a wide variety of tissue types. In both natural language processing and computer vision, there has been a move to foundation models that incorporate an even broader spectrum of modalities (video: Christensen et al., 2023; audio: Gardner et al., 2023; knowledge graphs: Luo et al., 2023). Recent approaches have introduced medical texts in addition to image data (Lu et al., 2023; Huang et al., 2023), which allows associating image data with medical texts and is thus very suitable for CBHIR applications. We argue that in histopathology, and medical imaging in general, there will also be a move towards broader multimodality, especially given the number of different modalities available in the medical domain (WSI, IHC, MRI, CT, EHR, etc.). Graph models of WSIs can also be used as input in these models, encoding the topological information present in WSIs and correlating that with the image data. We also argue that, in addition to the patch-based histopathology foundation models currently available, cellular foundation models could prove very useful. Although there exist some SSL-based embedding models specifically designed at the cellular level (Feng et al., 2021; Nakhli et al., 2024), these models are not yet scaled to the training data sizes we see for patch-level foundation models. Since the clinical relevance of cells is largely dependent on the context in which they exist, we argue that cellular foundation models could greatly benefit from GNN-based learning methods, encoding topological, spatial, and image-based information for each cell. As such, we envision a cellular foundation model that uses a GNN-based encoder, or at least uses encodings of spatial/topological features in the training process.

5.6. Generalization of GNNs in histopathology

Computational pathology models are plagued by large variations in scanner configurations, staining variability and tissue preparation between medical centers (Leo et al., 2016). Techniques such as stain normalization (Janowczyk et al., 2017), color augmentation (Tellez et al., 2018) and domain adaptation methods (Lafarge et al., 2017) have partly mitigated this issue, but there often remains a considerable performance gap when transferring models to multicenter settings. Graph Neural Networks, especially those based on cell graphs, may be well suited to help improve the generalization capabilities of histopathology models, since cellular positions, and thus the resulting graph structures themselves, are invariant to differences in staining and scanner configuration (assuming that the nuclei segmentation is robust to these differences). No conclusive evidence is available to prove this generalization advantage GNNs might have for computational histopathology, as direct comparisons between CNNs/ViTs and GNNs in histopathology generalization performance have not yet been conducted. As such, we argue this potential advantage should be studied in future research. To improve the generalization capabilities of GNNs themselves, many approaches have been proposed, ranging from self-supervised graph contrastive learning approaches to simpler augmentation methods such as edge perturbation or feature masking. We refer to Ding et al. (2022a) for a complete overview of data augmentation in GNNs.

6. Conclusion

In this review, we provided a comprehensive overview of the recent developments in the applications of GNNs in histopathology, which can be used to guide new research in the field. We quantified the growth of different modeling paradigms in the use of GNNs in histopathology. Based on our quantification, we provided a comprehensive overview of several emerging subfields, including hierarchical graph models, adaptive graph structure learning, multimodal modeling with GNNs, and higher-order graph models. We also provided future directions for the field, including the use of topological deep learning, adaptive structure learning for heterophilous graphs, graph-based multimodal fusion with missing modalities, the use of graph transformer models, heterogeneous GNNs, self- and semi-supervised learning using GNNs, and graph-based foundation models.

CRediT authorship contribution statement

Siemen Brussee: Writing – original draft, Visualization, Software, Methodology, Investigation, Formal analysis, Conceptualization. Giorgio Buzzanca: Writing – review & editing. Anne M.R. Schrader: Writing – review & editing, Project administration, Funding acquisition. Jesper Kers: Writing – review & editing, Supervision, Project administration.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Siemen Brussee reports financial support was provided by Hanarth Fund Foundation. Giorgio Buzzanca reports financial support was provided by European Commission. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This project has been funded by the Hanarth Foundation for AI in oncology, The Netherlands.

Appendix A. Supplementary data

As supplementary material, we provide a complete table of all the articles researched with the corresponding categorizations. Supplementary material related to this article can be found online at https://doi.org/10.1016/j.media.2024.103444.

References

Abbas, S.F., Le Vuong, T.T., Kim, K., Song, B., Kwak, J.T., 2023. Multi-cell type and multi-level graph aggregation network for cancer grading in pathology images. Med. Image Anal. 90, 102936.

Abdous, S., Abdollahzadeh, R., Rohban, M.H., 2023. KS-GNNExplainer: Global model interpretation through instance explanations on histopathology images. arXiv preprint arXiv:2304.08240.

Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S., 2012. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34 (11), 2274–2282.

Acharya, V., Choi, D., Yener, B., Beamer, G., 2024. Prediction of tuberculosis from lung tissue images of diversity outbred mice using jump knowledge based cell graph neural network. IEEE Access.

Adnan, M., Kalra, S., Tizhoosh, H.R., 2020. Representation learning of histopathology images using graph neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. pp. 988–989.

Ahmedt-Aristizabal, D., Armin, M.A., Denman, S., Fookes, C., Petersson, L., 2022. A survey on graph-based deep learning for computational histopathology. Comput. Med. Imaging Graph. 95, 102027.

Ali, S., Veltri, R., Epstein, J.A., Christudass, C., Madabhushi, A., 2013. Cell cluster graph for prediction of biochemical recurrence in prostate cancer patients from tissue microarrays. In: Medical Imaging 2013: Digital Pathology. Vol. 8676, SPIE, pp. 164–174.

Anklin, V., Pati, P., Jaume, G., Bozorgtabar, B., Foncubierta-Rodríguez, A., Thiran, J.-P., Sibony, M., Gabrani, M., Goksel, O., 2021. Learning whole-slide segmentation from inexact and incomplete labels using tissue graphs. arXiv preprint arXiv:2103.03129.

Azadi, P., Suderman, J., Nakhli, R., Rich, K., Asadi, M., Kung, S., Oo, H., Keyes, M., Farahani, H., MacAulay, C., et al., 2023. ALL-IN: A local global graph-based distillation model for representation learning of gigapixel histopathology images with application in cancer risk assessment. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 765–775.

Azher, Z.L., Fatemi, M., Lu, Y., Srinivasan, G., Diallo, A.B., Christensen, B.C., Salas, L.A., Kolling IV, F.W., Perreard, L., Palisoul, S.M., et al., 2023. Spatial omics driven crossmodal pretraining applied to graph-based deep learning for cancer pathology analysis. In: Pacific Symposium on Biocomputing 2024. World Scientific, pp. 464–476.

Bahade, S.S., Edwards, M., Xie, X., 2023. Cascaded graph convolution approach for nuclei detection in histopathology images. J. Image Graph. 11 (1).


Bai, Y., Mi, Y., Su, Y., Zhang, B., Zhang, Z., Wu, J., Huang, H., Xiong, Y., Gong, X., Wang, W., 2022. A scalable graph-based framework for multi-organ histology image classification. IEEE J. Biomed. Health Inf. 26 (11), 5506–5517.

Bakht, A.B., Javed, S., AlMarzouqi, H., Khandoker, A., Werghi, N., 2021. Colorectal cancer tissue classification using semi-supervised hypergraph convolutional network. In: 2021 IEEE 18th International Symposium on Biomedical Imaging. ISBI, IEEE, pp. 1306–1309.

Bazargani, R., Fazli, L., Gleave, M., Goldenberg, L., Bashashati, A., Salcudean, S., 2024. Multi-scale relational graph convolutional network for multiple instance learning in histopathology images. Med. Image Anal. 96, 103197.

Bazargani, R., Fazli, L., Goldenberg, L., Gleave, M., Bashashati, A., Salcudean, S., 2022. Multi-scale relational graph convolutional network for multiple instance learning in histopathology images. arXiv preprint arXiv:2212.08781.

Behzadi, M.M., Madani, M., Wang, H., Bai, J., Bhardwaj, A., Tarakanova, A., Yamase, H., Nam, G.H., Nabavi, S., 2024. Weakly-supervised deep learning model for prostate cancer diagnosis and Gleason grading of histopathology images. Biomed. Signal Process. Control 95, 106351.

Bejnordi, B.E., Litjens, G., Hermsen, M., Karssemeijer, N., van der Laak, J.A., 2015. A multi-scale superpixel classification approach to the detection of regions of interest in whole slide histopathology images. In: Medical Imaging 2015: Digital Pathology. Vol. 9420, SPIE, pp. 99–104.

Benkirane, H., Vakalopoulou, M., Christodoulidis, S., Garberis, I.-J., Michiels, S., Cournède, P.-H., 2022. Hyper-adac: Adaptive clustering-based hypergraph representation of whole slide images for survival analysis. In: Machine Learning for Health. PMLR, pp. 405–418.

Bianchi, F.M., Grattarola, D., Alippi, C., 2020. Spectral clustering with graph neural networks for graph pooling. In: International Conference on Machine Learning. PMLR, pp. 874–883.

Bilgin, C., Demir, C., Nagi, C., Yener, B., 2007. Cell-graph mining for breast tissue modeling and classification. In: 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, pp. 5311–5314.

Bontempo, G., Porrello, A., Bolelli, F., Calderara, S., Ficarra, E., 2023. DAS-MIL: Distilling across scales for MIL classification of histological WSIs. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 248–258.

Breen, J., Allen, K., Zucker, K., Orsi, N.M., Ravikumar, N., 2024. Multi-resolution histopathology patch graphs for ovarian cancer subtyping. arXiv preprint arXiv:2407.18105.

Cai, H., Yi, W., Huang, W., Wang, Z., Zhang, Y., Song, J., 2024. A hierarchical hypergraph attention network for survival analysis from pathological images. In: 2024 IEEE International Symposium on Biomedical Imaging. ISBI, IEEE, pp. 1–4.

Chan, T.H., Cendra, F.J., Ma, L., Yin, G., Yu, L., 2023. Histopathology whole slide image analysis with heterogeneous graph representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 15661–15670.

Chandran, P.S., Byju, N.B., Deepak, R., Kumar, R.R., Sudhamony, S., Malm, P., Bengtsson, E., 2012. Cluster detection in cytology images using the cellgraph method. In: 2012 International Symposium on Information Technologies in Medicine and Education. Vol. 2, pp. 923–927. http://dx.doi.org/10.1109/ITIME.2012.6291454.

Chen, T., Kornblith, S., Norouzi, M., Hinton, G., 2020b. A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning. PMLR, pp. 1597–1607.

Chen, R.J., Lu, M.Y., Wang, J., Williamson, D.F., Rodig, S.J., Lindeman, N.I., Mahmood, F., 2020a. Pathomic fusion: an integrated framework for fusing

Di, D., Zou, C., Feng, Y., Zhou, H., Ji, R., Dai, Q., Gao, Y., 2022b. Generating hypergraph-based high-order representations of whole-slide histopathological images for survival prediction. IEEE Trans. Pattern Anal. Mach. Intell. 45 (5), 5800–5815.

di Villaforesta, A.F., Magister, L.C., Barbiero, P., Liò, P., 2023. Digital histopathology with graph neural networks: Concepts and explanations for clinicians. arXiv preprint arXiv:2312.02225.

Ding, S., Gao, Z., Wang, J., Lu, M., Shi, J., 2023b. Fractal graph convolutional network with MLP-mixer based multi-path feature fusion for classification of histopathological images. Expert Syst. Appl. 212, 118793.

Ding, R., Rodriguez, E., Da Silva, A.C.A.L., Hsu, W., 2023a. Using graph neural networks to capture tumor spatial relationships for lung adenocarcinoma recurrence prediction. In: 2023 IEEE 20th International Symposium on Biomedical Imaging. ISBI, IEEE, pp. 1–5.

Ding, K., Wang, J., Li, J., Li, D., Liu, H., 2020. Be more with less: Hypergraph attention networks for inductive text classification. arXiv preprint arXiv:2011.00387.

Ding, K., Xu, Z., Tong, H., Liu, H., 2022a. Data augmentation for deep graph learning: A survey. ACM SIGKDD Explor. Newsl. 24 (2), 61–77.

Ding, K., Zhou, M., Wang, Z., Liu, Q., Arnold, C.W., Zhang, S., Metaxas, D.N., 2022b. Graph convolutional networks for multi-modality medical imaging: Methods, architectures, and clinical applications. arXiv preprint arXiv:2202.08916.

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al., 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.

Dwivedi, C., Nofallah, S., Pouryahya, M., Iyer, J., Leidal, K., Chung, C., Watkins, T., Billin, A., Myers, R., Abel, J., et al., 2022. Multi stain graph fusion for multimodal integration in pathology. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1835–1845.

Ektefaie, Y., Dasoulas, G., Noori, A., Farhat, M., Zitnik, M., 2023. Multimodal learning with graphs. Nat. Mach. Intell. 5 (4), 340–350.

Elforaici, M.E.A., Azzi, F., Trudel, D., Nguyen, B., Montagnon, E., Tang, A., Turcotte, S., Kadoury, S., 2024. Cell-level GNN-based prediction of tumor regression grade in colorectal liver metastases from histopathology images. In: 2024 IEEE International Symposium on Biomedical Imaging. ISBI, IEEE, pp. 1–5.

Fatemi, M., Feng, E., Sharma, C., Azher, Z., Goel, T., Ramwala, O., Palisoul, S.M., Barney, R.E., Perreard, L., Kolling, F.W., et al., 2023. Inferring spatial transcriptomics markers from whole slide images to characterize metastasis-related spatial heterogeneity of colorectal tumors: A pilot study. J. Pathol. Inform. 14, 100308.

Feng, C., Vanderbilt, C., Fuchs, T., 2021. Nuc2vec: Learning representations of nuclei in histopathology images with contrastive loss. In: Medical Imaging with Deep Learning. PMLR, pp. 179–189.

Feng, Y., You, H., Zhang, Z., Ji, R., Gao, Y., 2019. Hypergraph neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33, pp. 3558–3565.

Gallagher-Syed, A., Rossi, L., Rivellese, F., Pitzalis, C., Lewis, M., Barnes, M., Slabaugh, G., 2023. Multi-stain self-attention graph multiple instance learning pipeline for histopathology whole slide images. arXiv preprint arXiv:2309.10650.

Gao, Z., Lu, Z., Wang, J., Ying, S., Shi, J., 2022. A convolutional neural network and graph convolutional network based framework for classification of breast histopathological images. IEEE J. Biomed. Health Inf. 26 (7), 3163–3173.

Gao, J., Lyu, T., Xiong, F., Wang, J., Ke, W., Li, Z., 2020. MGNN: A multimodal graph neural network for predicting the survival of cancer patients. In: Proceedings of
histopathology and genomic features for cancer diagnosis and prognosis. IEEE the 43rd International ACM SIGIR Conference on Research and Development in
Trans. Med. Imaging 41 (4), 757–770. Information Retrieval. pp. 1697–1700.
Chi, C., Shi, H., Zhu, Q., Zhang, D., Shao, W., 2024. Spatially resolved gene ex-
Gao, Z., Shi, J., Wang, J., 2021. GQ-GCN: Group quadratic graph convolutional
pression prediction from histology via multi-view graph contrastive learning with
network for classification of histopathological images. In: Medical Image Computing
HSIC-bottleneck regularization. arXiv preprint arXiv:2406.12229.
and Computer Assisted Intervention–MICCAI 2021: 24th International Conference,
Christensen, M., Vukadinovic, M., Yuan, N., Ouyang, D., 2023. Multimodal foundation
Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VIII 24.
models for echocardiogram interpretation. arXiv preprint arXiv:2308.15670.
Springer, pp. 121–131.
Ciga, O., Xu, T., Martel, A.L., 2022. Self supervised contrastive learning for digital
Gao, R., Yuan, X., Ma, Y., Wei, T., Johnston, L., Shao, Y., Lv, W., Zhu, T., Zhang, Y.,
histopathology. Mach. Learn. Appl. 7, 100198.
Zheng, J., et al., 2023. Predicting gene spatial expression and cancer prognosis: An
Cui, Z., Henrickson, K., Ke, R., Wang, Y., 2019. Traffic graph convolutional recurrent
integrated graph and image deep learning approach based on HE slides. bioRxiv.
neural network: A deep learning framework for network-scale traffic learning and
Gardner, J., Durand, S., Stoller, D., Bittner, R.M., 2023. Llark: A multimodal foundation
forecasting. IEEE Trans. Intell. Transp. Syst. 21 (11), 4883–4894.
model for music. arXiv preprint arXiv:2310.07160.
Daneshvar, H., Samavi, R., 2022. Heterogeneous patient graph embedding in
readmission prediction. In: AI. Gindra, R.H., Zheng, Y., Green, E.J., Reid, M.E., Mazzilli, S.A., Merrick, D.T., Burks, E.J.,
De, A., Mhatre, R., Tiwari, M., Chowdhury, A.S., 2022. Brain tumor classification from Kolachalama, V.B., Beane, J.E., 2024. Graph perceiver network for lung tumor and
radiology and histopathology using deep features and graph convolutional network. bronchial premalignant lesion stratification from histopathology. Am. J. Pathol..
In: 2022 26th International Conference on Pattern Recognition. ICPR, IEEE, pp. Godson, L., Alemi, N., Nsengimana, J., Cook, G.P., Clarke, E.L., Treanor, D.,
4420–4426. Bishop, D.T., Newton-Bishop, J., Magee, D., 2023. Multi-level graph represen-
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L., 2009. Imagenet: A large- tations of melanoma whole slide images for identifying immune subgroups. In:
scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision International Conference on Medical Image Computing and Computer-Assisted
and Pattern Recognition. Ieee, pp. 248–255. Intervention. Springer, pp. 85–96.
Di, D., Li, S., Zhang, J., Gao, Y., 2020. Ranking-based survival prediction on Graham, S., Vu, Q.D., Raza, S.E.A., Azam, A., Tsang, Y.W., Kwak, J.T., Rajpoot, N.,
histopathological whole-slide images. In: International Conference on Medical 2019. Hover-net: Simultaneous segmentation and classification of nuclei in
Image Computing and Computer-Assisted Intervention. Springer, pp. 428–438. multi-tissue histology images. Med. Image Anal. 58, 101563.
Di, D., Zhang, J., Lei, F., Tian, Q., Gao, Y., 2022a. Big-hypergraph factorization neural Gu, Z., Wang, S., Rong, R., Zhao, Z., Wu, F., Zhou, Q., Wen, Z., Chi, Z., Fang, Y.,
network for survival prediction from whole slide image. IEEE Trans. Image Process. Peng, Y., et al., 2024. CSGO: A deep learning pipeline for whole-cell segmentation
31, 1149–1160. in hematoxylin and eosin stained tissues. Lab. Invest. 102184.

19
S. Brussee et al. Medical Image Analysis 101 (2025) 103444
Conference on Medical Image Computing and Computer-Assisted Intervention.
Springer, pp. 222–231.