0% found this document useful (0 votes)

2 views

Evaluation_of_Deep_Neural_Network_Models_for_Insta

This document evaluates 15 deep neural network models for instance segmentation of lumbar spine MRI images, aiming to improve the diagnosis of intervertebral disc disease. A new data augmentation technique, SSMSpine, was introduced to create a synthetic dataset for training and evaluation purposes. The study highlights the advantages of using deep learning methods over traditional segmentation techniques and emphasizes the need for automated analysis in medical imaging.

Uploaded by

gita.adu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

2 views

Evaluation_of_Deep_Neural_Network_Models_for_Insta

Uploaded by

gita.adu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 21

bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024.

The copyright holder for this preprint (which

was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

Evaluation of Deep Neural Network Models for Instance

Segmentation of Lumbar Spine MRI

Jiasong Chen a, Linchen Qian a, Linhai Ma a, Timur Urakov b, Weiyong Gu c, and Liang Liang a

a
Department of Computer Science, University of Miami, Coral Gables, FL
b
Department of Neurological Surgery, University of Miami, Coral Gables, FL
c
Department of Mechanical and Aerospace Engineering, University of Miami, Coral Gables, FL

For correspondence:

Liang Liang, Ph.D.

Department of Computer Science

University of Miami

Ungar Building, Room 330K

Coral Gables, FL, 33146

Tel: (305) 284-8381; Email: [email protected]

bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

Abstract
Intervertebral disc disease, a prevalent ailment, frequently leads to intermittent or persistent low back pain, and
diagnosing and assessing of this disease rely on accurate measurement of vertebral bone and intervertebral disc
geometries from lumbar MR images. Deep neural network (DNN) models may assist clinicians with more efficient
image segmentation of individual instances (discs and vertebrae) of the lumbar spine in an automated way, which
is termed as instance image segmentation. In this work, we evaluated 15 existing DNN models for lumbar spine
MR image segmentation. We introduced a new data augmentation technique to create synthetic yet realistic MR
image dataset, named SSMSpine, which is made publicly available. The 15 image segmentation models are
evaluated on our private in-house dataset and the public SSMSpine dataset, using two metrics, Dice Similarity
Coefficient and 95% Hausdorff Distance. The SSMSpine dataset are available at
https://github.com/jiasongchen/SSMSpine.

Keywords: Lumbar spine MRI, Medical image instance segmentation, Data augmentation
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

1. Introduction
The intervertebral discs in humans can undergo a profound degenerative process as early in the adolescence
(Cox et al., 2014; Kos et al., 2019), which can be accompanied by facet arthropathy and hypertrophy. This
degeneration can manifest as various conditions, including discogenic low back pain, disc herniation, spinal stenosis,
and spondylolisthesis, which may necessitate the implementation of surgical or non-surgical interventions aimed at
alleviating pain and restoring normal functionality. Magnetic resonance imaging (MRI) is the most widely used
technique for specifically quantifying intervertebral discs degeneration (IDD) by assessing changes in disc geometry
deformation and signal strength degradation (Mallio et al., 2022; Roberts et al., 2021; Tamagawa et al., 2022). The
information derived from imaging data is of utmost importance for medical professionals in terms of both
diagnosing medical conditions and planning appropriate treatments. Furthermore, this information serves as a
critical foundation for developing patient-specific computational models, which hold the potential to mature over
time and eventually enable accurate predictions of treatment outcomes within clinical settings. Presently, the
process of geometry reconstruction, signal measurements, and grading from magnetic resonance (MR) images
heavily relies on manual annotation. However, this process is not only time-consuming but also vulnerable to human
bias. Consequently, there is an urgent need for automated MR image analysis methods to address these challenges.

In medical imaging, semantic/instance image segmentation, which divides the images into distinct sections
at the pixel level so that each pixel belongs to a specific region, has the potential to be carried out through automated
techniques (Galbusera et al., 2019). The traditional methods, such as watershed and level set, have demonstrated
satisfactory performance in medical image segmentation tasks. The watershed method treats an image as a
topological map where intensity represents the altitude of the pixels. The watershed segmentation is determined by
the watershed lines on a topographic surface (Chevrefils et al., 2007; Huang and Chen, 2004). The level set method
performs image segmentation by utilizing dynamic variational boundaries (Huang et al., 2013). However, the
traditional method suffers from the clinical variation of different patients and the noise effect of different medical
imaging equipment, problems like the over-segmentation and time-consuming consist (Li et al., 2007).

Since the increasingly vast amount of medical imaging data and computational resources have become
available, machine learning (ML) methods, especially deep neural network techniques, show superior performance
than traditional methods. Convolutional neural network (CNN) has a significant edge over its predecessors in that
it possesses the capability to recognize essential components/features without requiring any human intervention
(Suganyadevi et al., 2022). CNNs are specifically designed to effectively utilize spatial and configural information
by accepting 2D or 3D images as input. This approach helps to prevent the loss or disruption of structural and
configural information in medical images (Shen et al., 2017).Various deep CNNs, including UNet++ (Zhou et al.,
2018), Attention U-Net (Oktay et al., 2018), MultiResUNet (Ibtehaz and Rahman, 2020) and UNeXt (Valanarasu
and Patel, 2022) have been proposed for image segmentation for different medical imaging modalities and different
organs (e.g. heart (Cao et al., 2023; Gao et al., 2021; Huang et al., 2023), lung (Zhou et al., 2018), brain
(Hatamizadeh et al., 2022, 2021; Hu et al., 2022; Valanarasu et al., 2021), pancreas (Oktay et al., 2018), gland
(Valanarasu et al., 2021; Wang et al., 2022), spine (Sekuboyina et al., 2018; Wang et al., 2023), retina blood vessels
(Moccia et al., 2018; Soomro et al., 2019), aorta (Berhane et al., 2020; Noothout et al., 2018; Pepe et al., 2020),
etc). Although these methods have achieved promising performance, there are still some limitations in a more
complex context coping with long-range dependency explicitly due to the intrinsic locality of convolutions.
Recently, Transformer, an ML technique, has shown exceptional performance not only on natural language
processing (NLP) challenges like machine translation (Vaswani et al., 2017), but also image analysis tasks including
image classification (Shamshad et al., 2023) and segmentation (Chen et al., 2021; Hatamizadeh et al., 2022, 2021;
Liu et al., 2021; Wang et al., 2022). Various variations of Transformer models have demonstrated that the global
information perceived by the self-attention operations is beneficial in medical imaging tasks. TransUNet was the
first Transformer-based network specifically for medical image segmentation on the synapse multi-organ
segmentation dataset (Chen et al., 2021). Wang et al. (2022) substituted the original skip connection scheme of U-
Net with the proposed UCTransNet that includes a multi-scale Channel Cross fusion Transformer and a Channel-
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

wise Cross-Attention and tested the network on the gland segmentation dataset (Sirinukunwattana et al., 2017) and
synapse multi-organ segmentation dataset (Landman et al., 2015). Hatamizadeh et al., (2021, 2022) proposed both
UNETR and Swin UNETR for 3D medical imaging segmentation. UNETR utilizes a U-shape network with a vision
Transformer as the encoder and a CNN-based decoder. Swin UNETR is constructed by replacing the vision
transformer encoder in UNETR architecture with the Swin Transformer encoder. Feng et al. (2022) proposed SLT-
Net to utilize CSwin Transformer (Dong et al., 2022) as the encoder for feature extraction and the multi-scale
context Transformer as the skip connection for skin lesion segmentation. Swin-Unet adopted Swin Transformer
(Liu et al., 2021) with shifted windows as encoder and a symmetric Swin Transformer-based decoder with patch
expanding layer as decoder for multi-organ segmentation task (Cao et al., 2023). Pu et al., (2023) proposed a semi-
supervised learning framework with Inception-SwinUnet adopting convolution and sliding window attention in
different channels for vessel segmentation on small amount of labeled data. Besides the self-attention mechanism,
position embeddings are another crucial component of Transformer models. Regarding changing the order of the
input, a Transformer model is invariant (Vaswani et al., 2017) without position embeddings. However, since text
data inherently has a sequential structure, the absence of position information results in the ambiguous or undefined
meaning of a sentence (Dufter et al., 2022). For image segmentation, usually, an image patch is treated as a token,
and Transformers process the entire input sequence of tokens in parallel. With position embeddings, a Transformer
would be able to differentiate between image patches with similar content that appear in different positions in the
input image, which is beneficial for image segmentation applications. A variety of different methods may be used
to incorporate the position information into Transformer models. Absolute position encoding and relative position
encoding are two main categories to encode a token’s position information. Vaswani firstly introduced absolute and
relative position embedding in the vanilla Transformer model (Vaswani et al., 2017). Shaw extended the self-
attention mechanism with the capacity of effectively incorporating the representation of relative position (Shaw et
al., 2018). Valanarasu et al. (2021) proposed a gated position-sensitive axial attention mechanism to cope the
difficulty in learning position encoding for the images.
Specifically for lumbar spine research, instance segmentation of MR images is preferred, which not only
determines whether or not a pixel belongs to a disc, but also labeling the precise instance to which it belongs
(Galbusera et al., 2019). In recent years, most instance segmentation methods for spine image segmentation are
based on CNNs-only networks, and only a few Transformer-based networks are employed. For example, Kuang et
al. (2020) built an unsupervised segmentation network for spine image segmentation using the rule-based region of
interest (ROI) detection, a voting mechanism accompanied by a CNN network. Sekuboyina et al. (2018) proposed
a dual branch fully convolutional network that take advantages of both low-resolution attention information on two-
dimensional sagittal slices and high-resolution segmentation context on three-dimensional patches for effective
segmentation of the vertebrae. MLKCA-Unet incorporates multi-scale large-kernel convolution and convolutional
block attention into the U-net architecture for efficient feature extraction in spine MRI segmentation (Wang et al.,
2023). Pang et al. (2022) introduced a mixed-supervised segmentation network and it was trained on a strongly
supervised dataset with full segmentation labels and a weakly-supervised dataset with only key points. BianqueNet
combined new modules with a modified deeplabv3+ network (Chen et al., 2018), which includes a Swin
Transformer-skip connection module, for segmentation of lumbar intervertebral disc degeneration related regions
(Zheng et al., 2022).

It was shown that the Transformer-based models only perform effectively when trained on large-scale
datasets since the lack of inductive bias (Dosovitskiy et al., 2021). The utilization of Transformer-based networks
for medical imaging tasks poses a challenge due to the limited availability of labeled images in medical datasets.
Obtaining well-annotated medical imaging datasets presents significantly greater challenges compared to curating
traditional computer vision datasets. Dealing with expensive imaging equipment, complex image acquisition
pipelines, expert annotation requirements, and privacy concerns are all part of the problematic issues (Litjens et al.,
2017). This scarcity hampers the effective application of Transformer-based models in the medical domain. In such
scenarios, the adoption of suitable and feasible data augmentation techniques becomes crucial prior to model
training. These techniques can help to increase the effective size of the medical image dataset and improve the
performance of the Transformer-based model.
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

In this study, we evaluated 15 DNN modes for lumbar spinal MRI instance segmentation. For this purpose,
we developed a novel data synthesis method based on statistical shape model (SSM) and biomechanics. This SSM-
biomechanics-based data synthesis method generates lumbar spine images with large and plausible deformations,
which can be used for model training and evaluation.

2. Review of DNN models for Lumbar Image Segmentation

2.1. Transformer-based Networks for the segmentation of non-spine medical images
Transformer-based networks have shown promising performance on medical image segmentation because
of their ability to capture long-range dependencies. Comparably, the inductive bias of CNN networks benefits from
its local connectivity and parameter sharing property. Therefore, many networks, combining CNN and
Transformers and leveraging both benefits, have been proposed in the past few years. TransUNet (Chen et al., 2021)
firstly combines CNNs and Transformer in a cascaded manner in its encoder for medical imaging tasks, in which
the low-level features are collected from the CNNs and then fed to the Transformer to capture global interactions.
Other designs of CNN and Transformer networks, including UNETR (Hatamizadeh et al., 2021), UTNet (Gao et
al., 2021), UCTransNet (Wang et al., 2022), Swin UNETR (Hatamizadeh et al., 2022) and MedT (Valanarasu et al.,
2021) have shown better segmentation performances in different medical image modalities, compared to CNN-only
networks. UNETR replaces the encoder of a UNet with Transformer layers with 1D learnable positional embedding,
and its decoder only has convolution layers (Hatamizadeh et al., 2021). UTNet inserts Transformer layers into a
Unet in a sequential manner: a convolution layer followed by a Transformer layer with learnable relative position
embedding (Gao et al., 2021). UCTransNet embeds Transformer layers into the skip-connections of a Unet with
fully learnable absolute position embedding (Wang et al., 2022). Swin UNETR replaces the Transformer in UNETR
with Swin-Transformer (Hatamizadeh et al., 2022). Medical Transformer (MedT) uses gated-axial Transformer
layers in the encoder of a Unet (Valanarasu et al., 2021). HSNet used PVTv2 (Wang et al., 2022) as encoder and a
dual-branch structure which Transformer branch and CNN branch fused by element-wise product as decoder for
polyp segmentation (Zhang et al., 2022).

2.2. Transformer-based Networks for the segmentation of spine images

To the best of our knowledge, there are only a few Transformer-based networks (You et al., 2022; Tao et
al., 2022) specifically designed for lumbar spine image segmentation, including EG-Trans3DUNet (You et al.,
2022), Spine-transformers (Tao et al., 2022), APSegmenter (Zhang et al., 2022), and BianqueNet (Zheng et al.,
2022). However, it is worth noting that most of these networks were developed for the segmentation of vertebral
bodies using CT modality, which may not be directly applicable to the segmentation of the intervertebral discs in
order to study disc degeneration. Amony those networks, only the code of BianqueNet is publicly available.
EG-Trans3DUNet combines two vision transformer branches to handle both local patches and resized
global spinal CT images (You et al., 2022), and it merges edge characteristics and semantic features generated by
a CNN-based edge detection block. Spine-transformers was designed for spinal CT image segmentation with a two-
stage pipeline to handle arbitrary Field-Of-View input images (Tao et al., 2022). In its first stage, a Transformer
with a CNN backbone is utilized as a 3D object detector to locate individual vertebrae, and then the input image is
cropped into regions of individual vertebrae. In its second stage, a multi-task encoder-decoder CNN network is
applied to each cropped region to segment the vertebra. The source code of the second stage is not publicly available.
APSegmenter (Zhang et al., 2022) combines a ViT-style Transformer with a mask Transformer to segment spine
X-ray images, and an adaptive postprocessing is applied to further refine the result. BianqueNet (Zheng et al., 2022)
employed a resnet101 network to perform feature extraction, followed by upsampling using the Swin Transformer-
skip connection module and a double upsampling operations. It also used a multi-scale feature fusion module to
generate the segmentation of regions associated with intervertebral disc degeneration.
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

In addition to the segmentation of vertebral bodies, the segmentation of intervertebral discs (IVDs) is vital
for lumbar spinal disease diagnosis and treatment. Since the water content in IVDs cannot be revealed on CT images,
currently, MRI is the gold standard imaging modality for the evaluation of IVD pathologies (Kirnaz et al., 2022).
Our study aims to explore the benefit of combining CNN and transformer for instance segmentation of lumbar spine
MR images.

2.3. Self-Attention in Transformer

As opposed to convolutional operations, the self-attention mechanism within a Transformer network has
the fundamental advantage of effectively capturing global features and long-range contextual dependency. It uses
the Key, Query and Value vectors to better describe the features’ connections. Nonetheless, because of the inherited
properties of self-attention, it doesn’t retrieve the position information on its own, which is important for instance
segmentation. One of the best ways to tackle this problem is to use a well-designed position embedding mechanism
to inject the position relationships into the self-attention calculation.
2.3.1. The classic self-attention mechanism with additive position embedding
The plain Transformer is constructed with the multi-head self-attention modules (MHSA), which enable
Transformer to capture and utilize more accurate and detailed spatial information (Vaswani et al., 2017). Given an
input token set (e.g., image patches) 𝑋, three individual linear transformations (𝑊𝑄 , 𝑊𝐾 , 𝑊𝑉 ) are applied to 𝑋 to
generate query embedding (𝑄), key embedding (𝐾), and value embedding (𝑉). Then, the self-attention score, 𝐴𝑡𝑡𝑛,
is calculated as a scale-product of these three embedding as following:
𝑄𝐾 𝑇
𝐴𝑡𝑡𝑛 = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥 ( ) (1)
√𝑑

𝑂𝑢𝑡 = 𝐴𝑡𝑡𝑛 × 𝑉 (2)

In the above equations, 𝑄 = (𝑋 + 𝑃)𝑊𝑄 , 𝐾 = (𝑋 + 𝑃)𝑊𝐾 , 𝑉 = (𝑋 + 𝑃)𝑊𝑉 . 𝑃 is the encoded position, and 𝑑 is
the dimension of embedding in each head. 𝑂𝑢𝑡 is the final output of the self-attention module.

2.3.2. Position Embedding

Without using any position embedding, the self-attention mechanism in Eq.(1) is permutation-invariant and
cannot distinguish tokens (e.g., image patches) at different spatial locations. Therefore, it is essential to design
efficient position embedding cooperating with the self-attention mechanism.
The attention matrix in the classic self-attention Eq.(1) can be decomposed into three terms:
𝐴𝑡𝑡𝑛 ~ 𝑄𝐾 𝑇 = 𝑋𝑊𝑄 (𝑋𝑊𝐾 )𝑇 + 𝑃 𝑊𝑄 (𝑋𝑊𝐾 )𝑇 + 𝑋𝑊𝑄 (𝑃𝑊𝐾 )𝑇 + 𝑃𝑊𝑄 (𝑃𝑊𝐾 )𝑇 (3)

Content-Content Content-Position Position-Position

Therefore, the attention considers three interactions/correlations among tokens: 𝑋𝑊𝑄 (𝑋𝑊𝐾 )𝑇 for content to content
interaction, 𝑃 𝑊𝑄 (𝑋𝑊𝐾 )𝑇 + 𝑋𝑊𝑄 (𝑃𝑊𝐾 )𝑇 for interaction between content and position, and 𝑃𝑊𝑄 (𝑃𝑊𝐾 )𝑇 for
position to position interaction. As the instance segmentation task is location-specific, a well-designed interaction
between content and position could improve self-attention ability to utilize both content and position information
to accomplish the instance segmentation task.
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

Generally, there are mainly two steps to define the position embedding. The first step is defining the position
function or distance function, which is used for encoding the position information of input tokens. There are plenty
of position functions such as index function, Euclidean distance, and sinusoidal functions etc., The second step is
defining methods to incorporate the encoded position information into self-attention.
Absolute position embedding and relative position embedding are two main position representation
methods to incorporate the position information into input tokens. Absolute position embedding encodes the
absolute positions of each input tokens as individual encoding vectors, and relative position embedding focus on
the relative positional relationships of pairwise input tokens (Lin et al., 2022; Wu et al., 2021). In the vanilla
Transformer designed for NLP (Vaswani et al., 2017), it used a combination of absolute and relative position
embedding to add position information to the tokens. It is inconclusive that if relative position embedding is better
or worse than absolute position embedding, and the answer seems to be dependent on specific applications (Dufter
et al., 2022; Huang et al., 2020; Shaw et al., 2018; Wu et al., 2021). Relative position encoding benefits from
capturing the details of relative distance/direction and is invariant to tokens’ shifting. The intuition is that, in the
self-attention mechanism, the pairwise positional relationship (both in terms of direction and distance) between
input elements might be more advantageous than absolute position of individual elements (Lin et al., 2022). In such
a case, position information in Transformer is an extensive research area, and various relative position encodings
have been proposed for medical imaging segmentation (Dosovitskiy et al., 2021). For example, UTNet proposed
the 2-dimensional relative position encoding by adding relative height and width information (Gao et al., 2021).
MedT updated self-attention mechanism with position encoding along the width axis with the inspiration of axial
attention (Valanarasu et al., 2021; Wang et al., 2020; Zhang and Zhang, 2022). The Parameter-Efficient Transformer
added a trainable position vector to the input to encode relative distances (Hu et al., 2022). In this work, we propose
a novel relative position embedding method for segmentation performance improvement.

2.3.3. Image Self-Attention in the Existing Image Segmentation Models

We compared instance segmentation performances of 15 representative image segmentation models,
including 11 Transformer-based models and 4 CNN-only models.
Table 1 summarizes the image self-attention mechanisms in the existing 11 Transformer-based models.
TransUnet and UNETR add the position embedding directly into the input patches (Chen et al., 2021; Hatamizadeh
et al., 2021). Swin-Unet, Swin UNETR, and Inception-SwinUnet incorporate the position embedding into attention
score instead of input tokens. (Cao et al., 2023; Hatamizadeh et al., 2022; Pu et al., 2023). BianqueNet utilized the
position embedding within Swin-Transformer (Zheng et al., 2022). According to the source code, SLT-Net
introduces a lepe distance to represent the position bias (embedding), which is directly added into the output matrix
(Feng et al., 2022). MedT proposed a gated position-sensitive axial attention mechanism where four learnable gates
(𝐺) control the amount of position embedding contained in key (𝐾) , query (𝑄) and value (𝑉) embeddings
(Valanarasu et al., 2021). UTNet introduced the 2-dimensional relative position encoding by adding relative position
logits along height and width dimensions(𝑅ℎ𝑒𝑖𝑔ℎ𝑡 , 𝑅𝑤𝑖𝑑𝑡ℎ ) into the key embedding (Gao et al., 2021). HSNet and
UCTransNet do not include position embedding in their models (Zhang et al., 2022; Wang et al., 2022)
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

Table 1. Image Self-Attention in the attention/Transformer-based segmentation models

Model Self-Attention
𝑋𝑊𝑄 (𝑋𝑊𝐾 )𝑇
𝑠𝑜𝑓𝑡𝑚𝑎𝑥 ( + 𝑅) (𝑋𝑊𝑉 )
Swin UNETR √𝑑
Note: 𝑅 is called relative position bias in the reference
𝑋𝑊𝑄 (𝑋𝑊𝐾 )𝑇
𝑠𝑜𝑓𝑡𝑚𝑎𝑥 ( ) (𝑋𝑊𝑉 ) + 𝑙𝑒𝑝𝑒(𝑋𝑊𝑉 )
SLT-Net √𝑑
Note: 𝑙𝑒𝑝𝑒 is a convolution kernel
𝑇
(𝑋 + 𝑃)𝑊𝑄 ((𝑋 + 𝑃)𝑊𝐾 )
UNETR 𝑠𝑜𝑓𝑡𝑚𝑎𝑥 ( ) (𝑋 + 𝑃)𝑊𝑉
√𝑑
𝑋𝑊𝑄 (𝑋𝑊𝐾 )𝑇
Inception- 𝑠𝑜𝑓𝑡𝑚𝑎𝑥 ( + 𝑅) (𝑋𝑊𝑉 )
√𝑑
SwinUnet
Note: 𝑅 is called relative position bias in the reference
𝑋𝑊𝑄 (𝑋𝑊𝐾 )𝑇
HSNet 𝑠𝑜𝑓𝑡𝑚𝑎𝑥 ( ) (𝑋𝑊𝑉 )
√𝑑
𝑋𝑊𝑄 (𝑋𝑊𝐾 )𝑇
𝑠𝑜𝑓𝑡𝑚𝑎𝑥 ( + 𝑅) (𝑋𝑊𝑉 )
Swin-Unet √𝑑
Note: 𝑅 is called relative position bias in the reference
𝑇
(𝑋 + 𝑃)𝑊𝑄 ((𝑋 + 𝑃)𝑊𝐾 )
TransUNet 𝑠𝑜𝑓𝑡𝑚𝑎𝑥 ( ) (𝑋 + 𝑃)𝑊𝑉
√𝑑
𝑇
𝑠𝑜𝑓𝑡𝑚𝑎𝑥 (𝑋𝑊𝑄 (𝑋𝑊𝐾 )𝑇 + 𝐺𝑄 𝑋𝑊𝑄 (𝑅𝑄 ) + 𝐺𝐾 𝑋𝑊𝐾 (𝑅𝐾 )𝑇 ) (𝐺𝑉1 𝑋𝑊𝑉
MedT + 𝐺𝑉2 𝑅𝑉 )
Note: 𝐺𝑄 , 𝐺𝐾 , 𝐺𝑉1 , 𝐺𝑉2 are learnable parameters for gating mechanism.
𝑅𝑄 , 𝑅𝐾 , 𝑅𝑉 are the position bias for query, key and value
𝑇
𝑋𝑊𝑄 (𝑋𝑊𝐾 + 𝑅𝑤𝑖𝑑𝑡ℎ + 𝑅ℎ𝑒𝑖𝑔ℎ𝑡 )
𝑠𝑜𝑓𝑡𝑚𝑎𝑥 ( ) 𝑋𝑊𝑉
UTNet √𝑑
Note: 𝑅𝑤𝑖𝑑𝑡ℎ and 𝑅ℎ𝑒𝑖𝑔ℎ𝑡 are the relative height and width information

𝑋𝑐 𝑊𝑄 (𝑋𝑐 𝑊𝐾 )𝑇
𝑠𝑜𝑓𝑡𝑚𝑎𝑥 ( ) (𝑋𝑐 𝑊𝑉 )
UCTransNet √𝑑
Note: 𝑋𝑐 is composed of image channels instead of image patches.

𝑋𝑊𝑄 (𝑋𝑊𝐾 )𝑇
𝑠𝑜𝑓𝑡𝑚𝑎𝑥 ( + 𝑅) (𝑋𝑊𝑉 )
BianqueNet √𝑑
Note: 𝑅 is called relative position bias in the reference
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

3. Methods
3.1. Novel data augmentation/synthesis method based on SSM and biomechanics
For medical image data augmentation, elastic deformation is often used for nonlinear deformation of the
images to increase diversity of training data (Ronneberger et al., 2015). Briefly, the input space is discretized by a
grid, and a random displacement field on the grid is generated by sampling from a normal distribution with standard
deviation equal to 𝜎 × grid resolution (i.e., the size of a grid cell). The parameter 𝜎 determines deformation
magnitude. To ensure a large deformation with diffeomorphism, the grid needs to be coarser than the input size (i.e.,
512 ×512). In this study, we applied two successive elastic deformations to each training image, with grid sizes of
9 × 9 and 17 × 17. As shown in Figure 1, when the deformation parameter 𝜎 is larger than 0.5, the generated images
and spine shapes are highly unrealistic.

Fig. 1. Data augmentation/synthesis examples using elastic deformation with sigma from 𝜎 to 2.0.
In this work, we developed a new method to synthesize lumbar spine MR images suitable for model training
and evaluation. First, we built a statistical shape model (SSM) of lumbar spine shapes (i.e., contours of discs and
vertebrae) in a dataset set, and the SSM represents the probability distribution of lumbar spine shapes. We refer the
reader to the reference papers (Ambellan et al., 2019; Sarkalkan et al., 2014; Hufnagel et al., 2007; Davies et al.,
2003; Cootes et al., 1995) for the details of constructing an SSM. By sampling from the SSM, different lumbar
spine shapes can be generated, and each generated lumbar spine shape could be considered from a virtual patient.
We note that the SSM technique has been used to generate virtual but realistic patient geometries in many
applications, such as generating aortic aneurysm geometries (Liang et al., 2017; van Veldhuizen et al., 2022;
Wiputra et al., 2023). Given a lumbar spine shape, if a lumbar spine MR image can be generated and consistent
with the shape, then we will have a new sample with ground-truth. For this purpose, we developed a biomechanics-
based method to generate a lumbar spine MR image 𝐼̃ from a lumbar spine shape 𝑆̃ by using a reference image 𝐼
with its ground-truth shape 𝑆. Intuitively speaking, a nonlinear spatial transform from the shape 𝑆 to the shape 𝑆̃ is
determined by using biomechanics principles, and then 𝐼̃ is obtained by applying the spatial transform to 𝐼. The
generated images are visually plausible, as shown in Figure 2.
In the implementation, we obtain the spatial transform 𝑇 from 𝑆̃ to 𝑆, and apply the spatial transform 𝑇 to
a regular mesh grid around the shape 𝑆̃ to obtain a deformed grid in the space of the reference image 𝐼, and then 𝐼̃
is obtained by interpolating pixel values of 𝐼 at each node of the deformed grid, i.e., 𝐼̃(𝑥, 𝑦) = 𝐼(𝑇(𝑥, 𝑦)) where
(𝑥, 𝑦) denotes a 2D spatial point and 𝑇(𝑥, 𝑦) is the transformed point. By using biomechanics and finite element
analysis (FEA), the spatial transform, i.e., the deformation field on the mesh grid, is obtained by minimizing the
following energy/loss function Π:
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

4
Π = ∫𝑉 Ψ 𝑑𝑉 + 𝜆 ∙ 𝑎𝑣𝑔𝑖 ‖𝑇(𝑆̃(𝑖 )) − 𝑆(𝑖 )‖ (1)

In the above Eq.(1), 𝑉 represents the undeformed mesh grid of the image 𝐼̃ to be generated, 𝑆̃(𝑖 ) represents the i-
th point location of the shape 𝑆̃, 𝑇(𝑆̃(𝑖 )) is the transformed point location that needs to be equal to 𝑆(𝑖 ), and ‖∎‖
denotes vector L2 norm. 𝑎𝑣𝑔 is the average operator. 𝜆 is a weight constant (set to 16 in experiments). Ψ is the
strain energy density function that is determined by deformation and mechanical property of soft biological tissues
around the lumbar spine. From the perspective of FEA and biomechanics, the Eq.(1) simulates the scenario that
under the external “force” proportional to 𝑇(𝑆̃(𝑖 )) − 𝑆(𝑖 ) at each point of the lumbar spine shape, the soft biological
tissues of human body will deform and reach to an equilibrium state of minimum energy. To speed up the
optimization process, we use a deep neural network with sine activation functions to parameterize the transform T,
i.e., 𝑇(𝑥, 𝑦) = 𝐷𝑁𝑁(𝑥, 𝑦), and then the energy function in Eq.(1) becomes a function of the DNN internal
parameters. The energy optimization problem is resolved by adjusting/optimizing the parameters of the DNN. Once
the optimization is done, the deformation field is obtained and then the image 𝐼̃ is generated. Since our goal is to
generate plausible images for model training and evaluation in the image segmentation tasks, not for patient-specific
FEA simulation of human body deformation, we made an assumption about the strain energy density function to
reduce computation cost: tissue mechanical behavior follows the Ogden hyperelastic model with homogeneous
tissue properties (Ogden and Hill, 1997; Dwivedi et al., 2022). The whole procedure is implemented by using our
newly developed PyTorch-FEA library for large deformation biomechanics (Liang et al., 2023).

Fig. 2. data augmentation/synthesis examples (a-f) using our method. Please zoom in for better visualization.

3.2. Loss function

For each model, we use the original loss function if it is applicable to our application. If the original loss
function is not suitable (e.g., it is for binary classification only), then we use the loss function ℒ that combines a
Dice loss ℒ𝐷𝑖𝑐𝑒 and an area-weighted cross entropy loss ℒ𝑎𝑤_𝑐𝑒 .

ℒ = 0.5𝐿𝐷𝑖𝑐𝑒 + 0.5𝐿𝑎𝑤_𝑐𝑒 (2)

2 ∑𝑖,𝑗(𝑦(𝑖,𝑗)𝑝̂𝑚 (𝑖,𝑗))+𝜖
ℒ𝐷𝑖𝑐𝑒 = 1 − ∑ (3)
𝑖,𝑗(𝑦(𝑖,𝑗)+𝑝̂𝑚 (𝑖,𝑗))+𝜖
ℒ𝑎𝑤_𝑐𝑒 = − ∑𝑖,𝑗,𝑚 𝑤𝑚 𝑦𝑚 (𝑖, 𝑗)𝑙𝑜𝑔(𝑝̂𝑚 (𝑖, 𝑗)) (4)

𝑝̂𝑚 (𝑖, 𝑗) is the m-th element in the output tensor from the softmax layer at the pixel location (𝑖, 𝑗), which corresponds
to the m-th object (a disc or a vertebra) at the location (𝑖, 𝑗). 𝑦(𝑖, 𝑗) is the true label of the pixel at location (𝑖, 𝑗).
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

𝑤𝑚 is a nonnegative weight inversely proportional to the area of the m-th object, and ∑𝑚 𝑤𝑚 = 1. 𝜖 is a small
constant (1e-4) to prevent the case of 0/0 in Eq.(3). In a lumbar spine image, where the background area is
substantially larger than the combined area of the discs and vertebrae, employing area-weighted cross entropy loss
effectively reduces the influence of the background in the loss function.

4. Experiments

4.1. Original Dataset and Augmented Datasets

Our dataset consists of a total of 100 patients’ lumbar spine MR images from the University of Miami
medical school, with personal identification information removed. Following the protocol in (Hu et al., 2018), five
lumbar discs (D1, D2, D3, D4, D5) and six vertebral bones (L1, L2, L3, L4, L5, S1) in each patient’s mid-sagittal
MR image was manually annotated by three trained operators to identify and mark the boundaries and landmarks
of the lumbar discs and vertebrae. To ensure accuracy and consistency, the three operators engaged in discussion
to reach a consensus on the best annotation (i.e., ground-truth), for each mid-sagittal MR image. In the literature,
lumbar disc D1 is also called L1/L2, similarly D2 for L2/L3, D3 for L3/L4, D4 for L4/L5, and D5 for L5/S1. The
MR images are of various resolutions, and each of the images are resized to 512 × 512. Each image is also pre-
processed independently by normalizing the intensities into range [0,1]. The average pixel spacing is 0.7004mm.
The dataset of 100 patients was divided into 70 training samples, 10 validation samples, and 20 test samples, and
those samples are referred to as the original training/validation/test samples in this paper.
Subsequently, the augmented dataset, named SSMSpine, is generated by using our method in Section 3.1.
The SSMSpine dataset is divided into three sets: an augmented training set with 7000 samples, an augmented
validation set with 250 samples, and an augmented test set with 2500 samples. To generate the augmented test set,
an SSM was constructed using the 20 original test samples, and then 125 virtual shapes were generated from the
SSM. Using each of the 20 original test samples as a reference image and each of the 125 virtual shapes, 20 × 125
(=2500) new MR images were generated using the method in Section 3.1. The augmented training and validation
sets were generated in a similar way using another SSM built on the original training and validation samples.

4.2. Model evaluation and comparison

In our study, we trained and compared a total of 15 models on our lumbar spine MRI dataset. These models
are Attention U-Net (Oktay et al., 2018), HSNet (Zhang et al., 2022), Inception-SwinUnet (Pu et al., 2023), MedT
(Valanarasu et al., 2021), MultiResUNet (Ibtehaz and Rahman, 2020), SLT-Net (Feng et al., 2022), Swin-Unet (Cao
et al., 2023), UNETR (Hatamizadeh et al., 2021), Swin UNETR (Hatamizadeh et al., 2022), TransUNet (Chen et
al., 2021), UCTransNet (Wang et al., 2022), UNet++ (Zhou et al., 2018), UNeXt (Valanarasu and Patel, 2022),
UTNet (Gao et al., 2021), and BianqueNet (Zheng et al., 2022). We were unable to test all models mentioned in
Section 2 due to either unavailability (e.g., no source code) or incompatibility (e.g., size not matching). By training
and evaluating these models, we aimed to compare their performance and determine the most effective approach
for spine MRI instance segmentation on our dataset. To ensure compatibility with our dataset, we made minor
adjustments to the original codes of some models if necessary.
We conducted two experiments: experiment-A and experiment-B. In experiment-A, each model is trained
using the original training set with elastic deformation (Section 3.1). The top 4 models in experiment-A are selected
for training using the augmented training set in experiment-B. In both experiments, we applied random translations
to the input images within 16 pixels during training, which is a common data augmentation method. In both
experiments, the augmented validation set is used for hyper-parameter tuning. The evaluation process is divided
into two parts: instance segmentation evaluation and translation robustness evaluation.
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

In the instance segmentation evaluation, we assess model segmentation performance for individual lumbar
spine instances/objects in the MR images. The instance segmentation task is formulated as a task of labeling 12
distinct objects (5 lumbar discs, 6 vertebrae, and a background). The input to each model is a single-channel mid-
sagittal lumbar spine MR image with size of 512 × 512 pixels. In the segmentation output, each class is represented
by a distinct channel as a binary segmentation map. We employ both the Dice Similarity Coefficient and the 95%
Hausdorff Distance (HD95) as evaluation metrics. In both experiments, the original test set with 20 samples and
the augmented test set with 2500 samples are used separately for model performance assessment on unseen data.
Each model was trained on a Nvidia A6000 GPU with 48GB VRAM. During the training process, a batch
size of 6 was used for most of the models, except for training MedT, where a batch size of 2 was utilized due to its
large model size. The Adam optimizer with an initial learning rate of 0.0001 was employed for model optimization.
A low learning rate is generally preferred to ensure stable convergence during training. Although a low learning
rate might slow down the convergence process, it helps avoid convergence failures. Gradient clipping is applied
during training to prevent potentially large gradients from causing instability in the learning process. We performed
model selection based on the performance on the validation set.
4.3. Results of Experiment-A with the original training set
In experiment-A, the 15 models were trained on the original training set with elastic deformation and
random-shift, and then the models were evaluated on both the original test set and the augmented test set to measure
instance segmentation accuracy and translation robustness. Figure 3 displays the performance of the top 4 models.

Fig. 3. Top 4 Model Comparison Results (Dice) on the augmented test set
4.3.1. Instance Segmentation Evaluation
Table 2 summarizes the instance segmentation results for vertebrae bodies (VB) and intervertebral discs
(IVD) in terms of the Dice Similarity Coefficient on the original test set consisting of 20 samples. For better clarity
and ease of understanding, we have converted Dice Similarity Coefficient into percentage ratios between 0 and
100%. The results show that Transformer-based models surpasses the CNN-only models.
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

Table 2. Dice (the higher, the better) of each model on the original test set

L1 L2 L3 L4 L5 S1 D1 D2 D3 D4 D5 Average

93.126 96.813 93.874 95.805 94.380 92.194 92.757 94.299 92.821 89.777 91.077 93.357
Swin UNETR
±7.356 ±1.641 ±11.389 ±3.189 ±4.816 ±6.449 ±4.568 ±3.519 ±6.563 ±7.645 ±10.179 ±4.098
91.967 97.052 94.328 93.619 95.763 93.805 92.990 92.938 90.282 91.481 92.589 93.347
SLT-Net
±6.991 ±0.902 ±10.261 ±13.504 ±2.553 ±1.815 ±4.893 ±6.914 ±16.896 ±5.639 ±6.048 ±5.886
94.355 96.795 91.900 95.546 94.099 92.988 93.261 93.554 90.931 90.435 91.446 93.210
UNETR
±3.131 ±1.709 ±18.125 ±3.196 ±3.814 ±4.539 ±3.878 ±4.254 ±9.003 ±5.678 ±4.198 ±4.333
89.330 93.332 91.819 95.245 96.011 94.193 90.069 92.566 93.635 93.539 94.326 93.097
BianqueNet
±21.689 ±16.860 ±18.667 ±6.709 ±2.155 ±5.037 ±21.017 ±9.930 ±4.852 ±3.373 ±4.325 ±2.840
Inception- 91.397 97.335 91.926 93.837 91.156 91.592 93.533 93.281 89.739 87.727 91.233 92.069
SwinUnet ±19.415 ±0.812 ±20.366 ±12.637 ±18.678 ±8.410 ±4.614 ±7.055 ±20.753 ±14.249 ±11.216 ±9.514
94.240 94.116 90.123 91.649 93.456 92.662 92.948 88.941 89.008 89.158 91.986 91.662
HSNet
±12.384 ±10.544 ±22.355 ±21.243 ±11.688 ±9.901 ±7.903 ±21.43 ±21.266 ±17.099 ±12.085 ±13.085
90.839 93.514 91.781 91.781 90.076 92.197 91.638 90.000 89.755 87.864 88.554 90.727
Swin-Unet
±20.912 ±16.261 ±21.085 ±21.08 ±21.037 ±6.612 ±11.014 ±20.632 ±20.720 ±20.629 ±16.934 ±16.984
92.354 95.835 91.220 89.282 88.818 87.325 92.082 91.271 87.280 85.790 89.065 90.029
UNeXt
±15.512 ±4.808 ±21.036 ±21.93 ±18.828 ±14.428 ±9.072 ±16.774 ±21.834 ±19.574 ±17.579 ±12.849
92.245 87.526 88.178 91.605 91.219 89.705 81.444 86.309 88.812 88.060 90.294 88.672
TransUNet
±9.199 ±25.59 ±24.371 ±21.122 ±21.025 ±20.496 ±32.928 ±24.176 ±21.076 ±20.596 ±20.782 ±20.293
88.707 94.461 91.011 90.999 87.121 87.183 87.639 89.689 87.318 82.663 85.180 88.361
MedT
±20.984 ±8.57 ±20.989 ±20.981 ±25.111 ±15.317 ±20.944 ±19.384 ±20.279 ±23.855 ±22.131 ±18.641
81.322 82.886 85.159 88.802 90.776 87.080 77.511 78.711 83.856 85.608 89.312 84.639
UTNet
±33.18 ±31.442 ±29.06 ±24.539 ±20.988 ±14.823 ±32.691 ±32.650 ±28.478 ±21.674 ±18.307 ±23.716
75.346 78.997 81.862 80.907 86.107 86.968 80.058 80.407 76.787 81.144 88.063 81.513
UCTransNet
±31.945 ±28.28 ±26.04 ±29.371 ±22.629 ±20.202 ±25.635 ±22.833 ±30.266 ±25.322 ±19.596 ±20.303
Attention 74.333 74.288 68.732 76.499 86.964 85.474 73.700 68.943 68.092 80.147 87.694 76.806
U-Net ±35.642 ±33.164 ±36.905 ±29.188 ±22.583 ±25.111 ±37.113 ±37.476 ±35.719 ±25.685 ±21.912 ±25.335
68.734 66.998 74.033 71.468 81.297 83.980 65.450 71.473 67.208 79.834 84.529 74.091
UNet++
±40.68 ±35.694 ±27.911 ±40.229 ±31.709 ±27.527 ±39.937 ±28.932 ±40.117 ±28.708 ±26.387 ±26.461
77.421 74.309 66.317 66.828 76.870 84.644 75.710 70.025 58.563 72.553 75.217 72.587
MultiResUNet
±31.596 ±29.915 ±34.739 ±35.945 ±35.698 ±22.624 ±33.508 ±27.873 ±39.241 ±33.342 ±31.98 ±26.547
Table 3 shows the instance segmentation results measured by 95% Hausdorff distance (HD95) on the
original test set. The results show that Transformer-based models surpasses the CNN-only models.
Table 3. HD95 (the lower, the better) of each model on the original test set.

L1 L2 L3 L4 L5 S1 D1 D2 D3 D4 D5 Average
8.594 11.024 2.992 3.147 3.759 9.454 19.057 1.894 3.466 3.814 8.128 6.848
Swin UNETR
±12.047 ±33.718 ±3.483 ±6.521 ±5.905 ±13.059 ±41.678 ±2.472 ±5.801 ±3.828 ±12.917 ±10.137
13.716 1.332 2.371 2.778 2.363 8.11 2.348 2.004 3.118 2.865 4.762 4.161
SLT-Net
±16.122 ±0.64 ±3.562 ±5.493 ±2.252 ±17.93 ±3.912 ±2.338 ±5.032 ±2.174 ±12.584 ±4.089
3.744 1.512 3.679 2.237 5.927 4.724 2.442 1.704 3.589 3.277 7.517 3.668
UNETR
±6.201 ±1.261 ±7.65 ±1.722 ±6.302 ±5.909 ±3.25 ±1.3 ±4.982 ±2.968 ±8.093 ±2.34
5.056 7.158 3.624 3.375 2.290 5.308 4.278 5.106 3.138 5.857 3.713 4.446
BianqueNet
±10.105 ±15.206 ±7.993 ±7.988 ±2.285 ±9.319 ±9.886 ±10.580 ±7.382 ±12.764 ±6.602 ±9.739
Inception- 3.378 1.197 3.5 3.033 3.362 6.39 7.53 2.051 1.59 3.957 5.715 3.791
SwinUnet ±7.275 ±0.555 ±8.067 ±6.334 ±3.809 ±10.21 ±25.967 ±3.129 ±0.876 ±4.714 ±9.957 ±4.134
3.006 3.634 4.121 3.94 5.547 5.196 2.869 5.378 4.145 5.966 5.045 4.441
HSNet
±6.714 ±8.238 ±8.874 ±9.34 ±11.36 ±10.482 ±7.442 ±11.527 ±9.551 ±11.862 ±11.035 ±7.376
2.131 2.226 1.753 1.971 2.767 4.732 2.221 2.937 1.679 2.08 6.339 2.803
Swin-Unet
±3.459 ±4.686 ±1.591 ±2.484 ±2.596 ±7.47 ±3.907 ±6.313 ±1.159 ±1.308 ±10.19 ±2.24
2.805 2.016 1.933 7.259 13.485 10.8 5.98 2.536 5.69 7.292 11.01 6.437
UNeXt
±4.959 ±2.768 ±1.542 ±19.018 ±17.479 ±15.286 ±14.131 ±4.028 ±15.364 ±13.081 ±15.926 ±8.162
7.727 8.556 9.261 3.836 4.372 7.595 8.968 8.662 3.876 5.176 4.15 6.562
TransUNet
±15.874 ±20.98 ±19.824 ±9.163 ±9.063 ±12.805 ±22.198 ±19.145 ±9.056 ±10.127 ±9.494 ±11.646
8.665 2.557 6.168 2.386 7.056 10.406 5.684 7.024 63.762 5.359 18.524 12.508
MedT
±14.016 ±2.564 ±11.39 ±2.751 ±10.149 ±10.254 ±10.784 ±11.272 ±83.727 ±8.133 ±33.337 ±7.994
6.897 7.209 10.986 5.092 8.523 20.203 7.421 13.133 11.486 10.463 8.903 10.029
UTNet
±12.365 ±12.726 ±16.485 ±10.427 ±13.468 ±35.303 ±13.484 ±18.251 ±17.583 ±17.173 ±13.608 ±12.359
11.554 21.176 30.833 12.752 11.034 9.536 20.424 23.576 20.578 14.814 7.444 16.702
UCTransNet
±13.053 ±20.644 ±45.698 ±15.773 ±15.015 ±15.319 ±32.647 ±27.95 ±22.595 ±16.837 ±13.146 ±12.286
Attention 9.394 13.821 16.946 21.225 10.944 7.474 13.878 15.805 28.632 10.027 7.124 14.116
U-Net ±12.866 ±17.43 ±18.222 ±32.217 ±15.876 ±13.279 ±18.345 ±19.732 ±35.708 ±15.161 ±13.263 ±13.022
6.733 16.872 22.11 15.901 12.515 9.518 9.862 21.599 19.863 10.369 9.929 14.116
UNet++
±11.95 ±17.754 ±20.232 ±19.52 ±16.298 ±14.037 ±14.455 ±20.421 ±26.463 ±16.036 ±15.288 ±11.826
8.841 17.292 19.035 15.814 9.93 9.582 9.63 18.331 22.223 10.918 10.632 13.839
MultiResUNet
±11.614 ±16.037 ±17.013 ±15.802 ±13.24 ±13.03 ±14.817 ±18.619 ±19.32 ±15.737 ±14.034 ±11.524
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

We also assessed the segmentation performance of all models on the augmented test set consisting of 2500
samples. Table 4 (Dice) and Table 5 (HD95) summarize the instance segmentation performance of each model
evaluated on the augmented test set.
Table 4. Dice (the higher, the better) of each model on the augmented test set

L1 L2 L3 L4 L5 S1 D1 D2 D3 D4 D5 Average
93.108 96.337 93.286 95.270 94.695 93.072 93.127 94.191 91.970 90.438 91.308 93.346
Swin UNETR
±8.399 ±3.289 ±14.544 ±5.054 ±4.905 ±5.52 ±5.447 ±4.227 ±9.192 ±6.799 ±6.215 ±4.983
91.925 96.736 93.889 92.538 94.557 92.437 92.712 93.833 89.858 90.419 91.149 92.732
SLT-Net
±11.161 ±2.401 ±12.742 ±18.37 ±8.863 ±8.045 ±7.199 ±4.888 ±18.676 ±11.831 ±11.399 ±9.07
93.839 96.260 91.773 94.984 94.052 89.817 93.039 93.865 90.486 90.317 89.976 92.583
UNETR
±6.152 ±3.454 ±18.475 ±5.13 ±5.003 ±10.051 ±3.998 ±4.332 ±9.81 ±5.791 ±6.801 ±5.384
92.103 95.329 94.670 95.486 96.003 95.191 92.275 93.724 93.752 93.384 94.187 94.191
BianqueNet
±16.133 ±10.126 ±10.097 ±7.397 ±2.653 ±2.992 ±13.814 ±7.138 ±6.614 ±5.339 ±4.917 ±8.996
Inception- 91.988 96.977 91.923 92.382 94.237 92.904 93.660 93.689 89.551 88.781 91.991 92.553
SwinUnet ±18.095 ±3.847 ±20.632 ±18.076 ±8.303 ±9.058 ±6.763 ±6.904 ±20.673 ±14.933 ±7.246 ±10.665
94.628 95.868 92.061 91.512 95.849 95.682 93.117 93.874 89.820 89.645 94.570 93.330
HSNet
±14.373 ±10.502 ±18.001 ±21.12 ±8.781 ±3.865 ±13.282 ±10.222 ±19.811 ±17.916 ±4.643 ±9.196
91.328 93.299 91.703 91.669 91.198 91.366 91.283 90.320 89.189 87.815 90.297 90.861
Swin-Unet
±18.901 ±17.946 ±21.094 ±21.131 ±18.97 ±12.268 ±14.972 ±20.105 ±20.739 ±20.844 ±13.743 ±17.347
91.280 95.036 91.013 88.953 91.170 89.812 91.255 92.186 87.131 87.645 90.288 90.525
UNeXt
±16.796 ±8.14 ±21.094 ±23.119 ±13.263 ±12.544 ±11.141 ±12.876 ±22.224 ±15.647 ±10.6 ±13.111
92.319 93.879 92.586 94.676 94.741 94.003 90.806 91.343 90.749 91.550 92.894 92.686
TransUNet
±16.2 ±16.636 ±17.395 ±10.604 ±7.245 ±5.979 ±18.041 ±16.735 ±16.481 ±10.169 ±6.709 ±10.357
86.861 92.629 91.421 89.039 88.147 84.398 86.124 89.722 86.538 83.335 83.075 87.390
MedT
±23.723 ±15.296 ±19.013 ±21.809 ±20.587 ±18.84 ±22.55 ±19.247 ±20.526 ±23.549 ±21.478 ±18.852
82.709 80.782 84.073 85.725 91.096 91.306 76.822 80.621 83.916 85.401 90.734 84.835
UTNet
±28.31 ±30.95 ±26.989 ±27.893 ±16.215 ±9.014 ±30.988 ±28.224 ±27.028 ±20.584 ±11.839 ±19.788
72.343 76.444 81.393 79.606 87.287 92.642 73.083 81.171 74.872 79.460 89.673 80.725
UCTransNet
±34.307 ±29.91 ±24.405 ±28.661 ±18.466 ±8.999 ±32.634 ±23.118 ±29.442 ±23.231 ±13.893 ±18.567
Attention 68.915 69.948 65.592 75.681 86.721 93.227 66.819 63.876 66.104 75.905 91.537 74.939
U-Net ±36.382 ±35.755 ±34.768 ±28.502 ±22.809 ±8.204 ±38.116 ±37.865 ±32.424 ±28.8 ±11.763 ±20.816
65.096 69.498 69.993 66.925 81.392 90.460 64.565 72.047 63.061 72.931 87.159 73.012
UNet++
±40.792 ±36.367 ±31.743 ±37.244 ±28.029 ±13.191 ±40.33 ±31.75 ±38.234 ±31.167 ±20.266 ±23.317
73.766 72.475 70.752 70.171 78.962 87.809 68.730 69.854 68.738 72.479 79.657 73.945
MultiResUNet
±33.559 ±32.877 ±32.574 ±35.186 ±32.165 ±18.254 ±36.759 ±34.213 ±34.452 ±33.345 ±28.326 ±25.091

Table 5. HD95 value (the lower, the better) of each model on the augmented test set
L1 L2 L3 L4 L5 S1 D1 D2 D3 D4 D5 Average
11.353 9.468 7.005 5.783 5.375 16.07 13.829 3.708 6.191 4.775 11.15 8.609
Swin UNETR
±33.189 ±31.478 ±23.16 ±17.828 ±11.093 ±35.564 ±40.509 ±15.61 ±18.392 ±11.302 ±17.265 ±15.102
8.464 2.537 3.463 3.078 3.766 5.409 3.96 2.376 3.835 4.023 5.751 4.242
SLT-Net
±17.817 ±9.812 ±9.614 ±8.708 ±9.767 ±15.909 ±15.752 ±5.825 ±13.919 ±10.257 ±11.967 ±7.772
5.873 2.63 4.111 2.506 5.014 10.871 2.772 2.103 4.111 3.735 6.404 4.557
UNETR
±10.615 ±6.155 ±7.566 ±3.004 ±7.516 ±19.237 ±4.803 ±3.727 ±6.555 ±4.708 ±8.943 ±4.305
5.555 5.767 3.398 3.173 3.318 3.540 5.303 5.818 2.231 2.868 3.384 4.032
BianqueNet
±10.365 ±13.823 ±8.156 ±6.849 ±7.518 ±11.313 ±11.274 ±15.604 ±4.499 ±6.468 ±7.982 ±10.045
Inception- 3.41 1.529 3.685 3.125 4.276 5.534 1.677 2.424 3.671 3.923 6.093 3.577
SwinUnet ±10.003 ±5.754 ±10.956 ±7.22 ±10.702 ±13.91 ±3.326 ±7.185 ±9.82 ±6.581 ±11.203 ±5.158
2.575 2.72 3.826 2.955 1.958 2.37 2.165 2.778 3.13 2.894 2.258 2.693
HSNet
±6.116 ±9.091 ±9.386 ±7.523 ±5.101 ±4.584 ±6.245 ±7.9 ±8.164 ±6.34 ±6.682 ±4.533
3.718 2.281 2.092 2.044 4.841 5.539 2.601 2.392 2.103 2.735 6.01 3.305
Swin-Unet
±12.477 ±9.182 ±5.183 ±5.193 ±11.639 ±9.335 ±10.854 ±9.554 ±4.736 ±5.601 ±10.775 ±4.993
7.212 4.233 4.913 7.77 11.646 9.462 8.457 3.848 7.327 7.806 11.651 7.666
UNeXt
±18.727 ±12.846 ±16.54 ±17.801 ±17.788 ±21.051 ±20.29 ±10.381 ±19.721 ±12.668 ±16.133 ±9.932
6.245 6.008 6.191 5.246 6.385 7.016 5.417 8.688 7.87 5.241 9.589 6.718
TransUNet
±18.622 ±18.599 ±17.579 ±18.663 ±20.503 ±19.462 ±18.977 ±23.985 ±21.105 ±14.506 ±24.032 ±13.318
8.115 3.878 4.385 5.283 10.009 13.935 6.139 4.428 72.075 8.635 12.997 13.625
MedT
±12.876 ±6.953 ±7.671 ±9.288 ±13.817 ±16.408 ±11.146 ±9.506 ±82.33 ±16.218 ±17.864 ±9.645
8.667 10.592 15.484 9.202 14.237 14.724 10.674 15.102 11.572 10.649 8.581 11.771±
UTNet
±17.117 ±20.449 ±20.453 ±17.838 ±27.825 ±22.491 ±16.985 ±21.452 ±19.287 ±18.77 ±16.592 11.935
14.267 21.247 23.542 20.769 12.504 9.226 21.576 27.888 28.971 17.921 10.608 18.956
UCTransNet
±20.828 ±25.68 ±31.268 ±31.321 ±16.412 ±22.252 ±32.676 ±31.415 ±34.852 ±25.073 ±19.215 ±13.438
Attention 21.033 18.984 22.082 31.271 11.244 6.783 18.986 19 32.27 19.275 11.469 19.309
U-Net ±29.85 ±20.704 ±22.065 ±36.969 ±20.511 ±16.671 ±25.984 ±21.259 ±31.194 ±26.051 ±28.816 ±14.67
18.011 16.192 23.953 18.749 12.423 6.939 13.378 20.232 25.36 17.679 8.728 16.513
UNet++
±27.449 ±19.325 ±20.546 ±21.889 ±16.826 ±12.271 ±20.277 ±21.096 ±30.082 ±20.881 ±15.29 ±12.019
13.291 16.623 20.534 16.424 8.357 7.503 13.491 20.504 19.608 11.27 8.239 14.168
MultiResUNet
±19.988 ±17.923 ±20.18 ±18.717 ±12.915 ±11.828 ±20.581 ±23.767 ±22.555 ±16.494 ±13.843 ±11.371
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

Figure 4 shows segmentation examples of the 15 models. All of the models produce misclassifications or
fragmentation errors. For example, BianqueNet exhibits hollow holes in both D3 and D4. TransUnet incorrectly
identifies D3 as D2 and exhibits a significant segmentation error in the lower left corner in the MR scan. Also seen
in Figure 4, many models produce incorrect predictions around pixels in close proximity to the boundary of two
adjacent objects.

. . 1 . 1 1. . 11. .1 1. . . . 1 1 . .1 1.

1. 10. 1 0. 1 .10 1.0 . 0. 1 .1 . . . 0. . 0 . 1 . 1 .1

Fig. 4. Segmentation examples of the 15 models. IMG is the input image. GT indicates the Ground-Truth annotation.
Dice (green color) and HD95 (yellow color) are shown on top of each image. The 11 lumbar objects are shown in
different colors. Segmenttion errors are indicated by pink arrows.

4.4. Results of Experiment-B with the augmented training set

In this section, we show the advantages of our data augmentation method in Section 3.1s. The top four
models in Table 2 are Swin UNETR, SLT-Net, UNETR, BianqueNet. In the experiment-B, each of the four models
was trained from scratch using the augmented training set consisting of 7000 samples. Model evaluations were
conducted on the augmented test set (see Table 8 and Table 9). The results show that the segmentation performance
of most models is improved, suggesting that the augmented data can enhance the generalization and overall
performance of a model.
It is noteworthy that training Transformer-based models with limited data is challenging. Nevertheless, the
new data augmentation method effectively tackles the challenge of data scarcity with the support of SSM and
biomechanics. Our data augmentation can generate synthetic images that closely resemble real data, effectively
enhancing the model training process. This approach assists in augmenting the available data, enabling Transformer
models to learn from a more diverse and representative dataset. The augmented/synthesized datasets can be made
publicly available without any concerns related to medical data privacy.
By using the instance segmentation evaluation and the additional translation robustness assessment, a
comprehensive evaluation of each model’s performance is achieved. This ensures that models excel not only
instance segmentation ability but also maintain their accuracy under realistic and varied conditions.

4.4.1. Instance Segmentation Evaluation

Table 8 (Dice) and Table 9 (HD95) present the instance segmentation evaluation results on the augmented
test set. It is evident that the segmentation performances of all models have improved by using the augmented
dataset, as indicated by enhancements in both evaluation metrics, except for Swin UNETR on the HD95 metric.
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

The augmented dataset offers a broader domain, enabling the models to acquire more diverse knowledge and
achieve improved segmentation performance. This increased resilience to data variations is a notable benefit of the
augmented training dataset. Increasing the complexity of the training dataset helps prevent overfitting and regulating
models from memorizing the training.
Table 8. Dice (the higher, the better) of each model on the augmented test set. The “change” is the Dice
difference between training on the augmented training set and training on the original training set
L1 L2 L3 L4 L5 S1 D1 D2 D3 D4 D5 Average Change
Swin 94.593 96.124 93.027 94.442 95.345 93.384 94.949 94.369 92.323 92.806 93.900 94.115
+0.769
UNETR ±8.113 ±7.352 ±17.824 ±12.723 ±6.840 ±6.406 ±3.461 ±7.674 ±13.660 ±5.711 ±4.372 ±9.581
90.928 96.708 92.836 92.604 96.143 95.545 91.135 95.435 91.254 90.982 95.427 93.545
SLT-Net +0.813
±21.398 ±5.097 ±18.462 ±19.937 ±4.279 ±2.669 ±18.062 ±3.448 ±17.130 ±15.676 ±2.972 ±14.102
96.099 96.878 93.981 94.798 95.509 94.832 94.930 94.486 92.957 92.616 93.974 94.642
UNETR +2.059
±2.574 ±2.479 ±11.711 ±8.922 ±3.888 ±2.951 ±2.159 ±5.341 ±8.036 ±4.935 ±3.926 ±6.081
93.381 95.339 93.786 96.670 96.789 95.666 92.427 93.723 94.040 94.184 95.276 94.662
BianqueNet +0.471
±16.166 ±12.916 ±13.760 ±4.308 ±2.243 ±2.302 ±16.211 ±11.633 ±7.633 ±4.318 ±3.268 ±10.231

Table 9. HD95 (the lower, the better) of each model on the augmented test set. The “change” is the HD
difference between training on the augmented training set and training on the original training set
L1 L2 L3 L4 L5 S1 D1 D2 D3 D4 D5 Average Change
Swin 17.591 13.865 12.207 11.290 10.393 19.144 9.570 12.249 7.145 5.524 11.808 11.889
+3.280
UNETR ±52.428 ±44.911 ±42.917 ±38.870 ±35.903 ±52.802 ±35.018 ±43.617 ±30.825 ±21.039 ±30.508 ±40.236
3.720 1.577 3.227 1.899 2.012 1.871 2.344 1.387 3.100 2.584 1.602 2.302
SLT-Net -1.940
±8.774 ±3.310 ±7.838 ±5.205 ±2.264 ±2.843 ±6.312 ±2.514 ±8.190 ±4.752 ±2.714 ±5.549
1.923 1.774 2.679 3.029 3.056 3.340 1.329 1.788 2.348 2.640 2.455 2.396
UNETR -2.161
±2.598 ±3.164 ±4.416 ±7.859 ±5.502 ±5.462 ±1.5817 ±2.9369 ±3.860 ±3.108 ±3.079 ±4.341
3.815 2.907 3.849 2.605 2.268 2.667 3.383 4.257 3.149 2.061 2.860 3.075
BianqueNet -0.957
±9.985 ±8.092 ±10.217 ±9.389 ±7.335 ±10.589 ±9.089 ±11.508 ±10.065 ±5.126 ±13.087 ±9.734

5. Conclusion
In this paper, we present an evaluation of 15 DNN models for instance segmentation of lumbar spine MR
images, using the original dataset of 100 patients and the generated SSMSpine dataset of thousands of virtual
patients. We developed the SSM-biomechanics based data augmentation method to further improve model
performance by providing large and diverse datasets of synthetic images with ground-truth. Given that our
augmented datasets consist entirely of synthetic data, we have made our augmented dataset, SSMSpine, publicly
available. The results presented indicate that models trained on the augmented training set had comparably or even
better performance than the same models trained on the original training set. This underscores that our data
augmentation method can generate synthetic data that eliminates privacy concerns while retaining in the same image
domain.
Our current study mainly focused on the mid-sagittal lumbar spine MR images for two major reasons. First,
as shown in a clinical study (Hu et al., 2018), the mid-sagittal image of a patient provides the most useful
information for the diagnosis of lumbar spine degeneration. Second, the slice thickness of a lumbar MR scan in the
sagittal direction is often much larger than 5mm, which causes difficulties to create accurate 3D ground-truth
annotation for model training. Nevertheless, the models could be directly extended to handle 3D images once the
slice thickness becomes acceptably small with the advancement of imaging technology.
The 15 DNN models often generate segmentation artifacts, such as: (1) extra areas not belonging to any
discs or vertebrae, (2) assigning the same class label to two different discs, and (3) broken area of a disc or vertebrae.
Thus, new models are needed for artifact-free geometry reconstruction of lumbar spine from MR images.
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

References

Ambellan, F., Lamecker, H., von Tycowicz, C., Zachow, S., 2019. Statistical Shape Models: Understanding and
Mastering Variation in Anatomy, in: Rea, P.M. (Ed.), Biomedical Visualisation : Volume , Advances in
Experimental Medicine and Biology. Springer International Publishing, Cham, pp. 67–84.
https://doi.org/10.1007/978-3-030-19385-0_5
Berhane, H., Scott, M., Elbaz, M., Jarvis, K., McCarthy, P., Carr, J., Malaisrie, C., Avery, R., Barker, A.J.,
Robinson, J.D., Rigsby, C.K., Markl, M., 2020. Fully automated 3D aortic segmentation of 4D flow MRI
for hemodynamic analysis using deep learning. Magn. Reson. Med. 84, 2204–2218.
https://doi.org/10.1002/mrm.28257
Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M., 2023. Swin-Unet: Unet-Like Pure
Transformer for Medical Image Segmentation, in: Karlinsky, L., Michaeli, T., Nishino, K. (Eds.),
Computer Vision – ECCV 2022 Workshops, Lecture Notes in Computer Science. Springer Nature
Switzerland, Cham, pp. 205–218. https://doi.org/10.1007/978-3-031-25066-8_9
Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A.L., Zhou, Y., 2021. TransUNet:
Transformers Make Strong Encoders for Medical Image Segmentation.
https://doi.org/10.48550/arXiv.2102.04306
Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H., 2018. Encoder-Decoder with Atrous Separable
Convolution for Semantic Image Segmentation. Presented at the Proceedings of the European Conference
on Computer Vision (ECCV), pp. 801–818.
Chevrefils, C., Chériet, F., Grimard, G., Aubin, C.-E., 2007. Watershed Segmentation of Intervertebral Disk and
Spinal Canal from MRI Images, in: Kamel, M., Campilho, A. (Eds.), Image Analysis and Recognition,
Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, pp. 1017–1027.
https://doi.org/10.1007/978-3-540-74260-9_90
Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J., 1995. Active Shape Models-Their Training and Application.
Comput. Vis. Image Underst. 61, 38–59. https://doi.org/10.1006/cviu.1995.1004
Cox, M., Serra, R., Shapiro, I., Risbud, M., 2014. The Intervertebral Disc: Molecular and Structural Studies of the
Disc in Health and Disease.
Davies, R.H., Twining, C.J., Daniel Allen, P., Cootes, T.F., Taylor, C.J., 2003. Building optimal 2D statistical
shape models. Image Vis. Comput., British Machine Vision Computing 2001 21, 1171–1182.
https://doi.org/10.1016/j.imavis.2003.09.003
Dong, X., Bao, J., Chen, Dongdong, Zhang, W., Yu, N., Yuan, L., Chen, Dong, Guo, B., 2022. CSWin
Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows.
https://doi.org/10.48550/arXiv.2107.00652
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer,
M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N., 2021. An Image is Worth 16x16 Words:
Transformers for Image Recognition at Scale. https://doi.org/10.48550/arXiv.2010.11929
Dufter, P., Schmitt, M., Schütze, H., 2022. Position Information in Transformers: An Overview. Comput.
Linguist. 48, 733–763. https://doi.org/10.1162/coli_a_00445
Dwivedi, K.Kr., Lakhani, P., Kumar, S., Kumar, N., 2022. A hyperelastic model to capture the mechanical
behaviour and histological aspects of the soft tissues. J. Mech. Behav. Biomed. Mater. 126, 105013.
https://doi.org/10.1016/j.jmbbm.2021.105013
Feng, K., Ren, L., Wang, G., Wang, H., Li, Y., 2022. SLT-Net: A codec network for skin lesion segmentation.
Comput. Biol. Med. 148, 105942. https://doi.org/10.1016/j.compbiomed.2022.105942
Galbusera, F., Casaroli, G., Bassani, T., 2019. Artificial intelligence and machine learning in spine research. JOR
SPINE 2, e1044. https://doi.org/10.1002/jsp2.1044
Gao, Y., Zhou, M., Metaxas, D.N., 2021. UTNet: A Hybrid Transformer Architecture for Medical Image
Segmentation, in: de Bruijne, M., Cattin, P.C., Cotin, S., Padoy, N., Speidel, S., Zheng, Y., Essert, C.
(Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, Lecture Notes in
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

Computer Science. Springer International Publishing, Cham, pp. 61–71. https://doi.org/10.1007/978-3-

030-87199-4_6
Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H.R., Xu, D., 2022. Swin UNETR: Swin Transformers
for Semantic Segmentation of Brain Tumors in MRI Images, in: Crimi, A., Bakas, S. (Eds.), Brainlesion:
Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Lecture Notes in Computer Science.
Springer International Publishing, Cham, pp. 272–284. https://doi.org/10.1007/978-3-031-08999-2_22
Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B., Roth, H., Xu, D., 2021. UNETR:
Transformers for 3D Medical Image Segmentation. https://doi.org/10.48550/arXiv.2103.10504
Hu, X., Chen, M., Pan, J., Liang, L., Wang, Y., 2018. Is it appropriate to measure age-related lumbar disc
degeneration on the mid-sagittal MR image? A quantitative image study. Eur. Spine J. 27, 1073–1081.
https://doi.org/10.1007/s00586-017-5357-3
Hu, Y., Zhang, L., Mu, N., Liu, L., 2022. Parameter-Efficient Transformer with Hybrid Axial-Attention for
Medical Image Segmentation. https://doi.org/10.48550/arXiv.2211.09533
Huang, J., Jian, F., Wu, H., Li, H., 2013. An improved level set method for vertebra CT image segmentation.
Biomed. Eng. OnLine 12, 48. https://doi.org/10.1186/1475-925X-12-48
Huang, Y., Yang, X., Liu, L., Zhou, H., Chang, A., Zhou, X., Chen, R., Yu, J., Chen, J., Chen, C., Chi, H., Hu, X.,
Fan, D.-P., Dong, F., Ni, D., 2023. Segment Anything Model for Medical Images?
https://doi.org/10.48550/arXiv.2304.14660
Huang, Y.L., Chen, D.R., 2004. Watershed segmentation for breast tumor in 2-D sonography. Ultrasound Med.
Biol. 30, 625–632. https://doi.org/10.1016/j.ultrasmedbio.2003.12.001
Hufnagel, H., Pennec, X., Ehrhardt, J., Handels, H., Ayache, N., 2007. Shape Analysis Using a Point-Based
Statistical Shape Model Built on Correspondence Probabilities, in: Ayache, N., Ourselin, S., Maeder, A.
(Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2007, Lecture Notes in
Computer Science. Springer, Berlin, Heidelberg, pp. 959–967. https://doi.org/10.1007/978-3-540-75757-
3_116
Ibtehaz, N., Rahman, M.S., 0 0. MultiResUNet : Rethinking the U-Net architecture for multimodal biomedical
image segmentation. Neural Netw. 121, 74–87. https://doi.org/10.1016/j.neunet.2019.08.025
Kirnaz, S., Capadona, C., Wong, T., Goldberg, J.L., Medary, B., Sommer, F., McGrath, L.B., Härtl, R., 2022.
Fundamentals of Intervertebral Disc Degeneration. World Neurosurg. 157, 264–273.
https://doi.org/10.1016/j.wneu.2021.09.066
Kos, N., Gradisnik, L., Velnar, T., 2019. A Brief Review of the Degenerative Intervertebral Disc Disease. Med.
Arch. 73, 421–424. https://doi.org/10.5455/medarh.2019.73.421-424
Kuang, X., Cheung, J.P., Wu, H., Dokos, S., Zhang, T., 2020. MRI-SegFlow: a novel unsupervised deep learning
pipeline enabling accurate vertebral segmentation of MRI images, in: 2020 42nd Annual International
Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). Presented at the 2020
42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC),
pp. 1633–1636. https://doi.org/10.1109/EMBC44109.2020.9175987
Landman, B., Xu, Z., Igelsias, J., Styner, M., Langerak, T., Klein, A., 2015. Miccai multi-atlas labeling beyond
the cranial vault–workshop and challenge. Presented at the Proc. MICCAI Multi-Atlas Labeling Beyond
Cranial Vault—Workshop Challenge, p. 12. https://doi.org/10.7303/syn3193805
Li, N., Liu, M., Li, Y., 2007. Image Segmentation Algorithm using Watershed Transform and Level Set Method,
in: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP ’0 .
Presented at the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing -
ICASSP ’0 , p. I-613-I–616. https://doi.org/10.1109/ICASSP.2007.365982
Liang, L., Liu, M., Elefteriades, J., Sun, W., 2023. PyTorch-FEA: Autograd-enabled finite element analysis
methods with applications for biomechanical analysis of human aorta. Comput. Methods Programs
Biomed. 238, 107616. https://doi.org/10.1016/j.cmpb.2023.107616
Liang, L., Liu, M., Martin, C., Elefteriades, J.A., Sun, W., 2017. A machine learning approach to investigate the
relationship between shape features and numerically predicted risk of ascending aortic aneurysm.
Biomech. Model. Mechanobiol. 16, 1519–1533. https://doi.org/10.1007/s10237-017-0903-9
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

Lin, T., Wang, Y., Liu, X., Qiu, X., 2022. A survey of transformers. AI Open 3, 111–132.
https://doi.org/10.1016/j.aiopen.2022.10.001
Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A.W.M., van
Ginneken, B., Sánchez, C.I., 2017. A survey on deep learning in medical image analysis. Med. Image
Anal. 42, 60–88. https://doi.org/10.1016/j.media.2017.07.005
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B., 2021. Swin Transformer: Hierarchical
Vision Transformer using Shifted Windows. https://doi.org/10.48550/arXiv.2103.14030
Mallio, C.A., Vadalà, G., Russo, F., Bernetti, C., Ambrosio, L., Zobel, B.B., Quattrocchi, C.C., Papalia, R.,
Denaro, V., 2022. Novel Magnetic Resonance Imaging Tools for the Diagnosis of Degenerative Disc
Disease: A Narrative Review. Diagnostics 12, 420. https://doi.org/10.3390/diagnostics12020420
Moccia, S., De Momi, E., El Hadji, S., Mattos, L.S., 2018. Blood vessel segmentation algorithms — Review of
methods, datasets and evaluation metrics. Comput. Methods Programs Biomed. 158, 71–91.
https://doi.org/10.1016/j.cmpb.2018.02.001
Noothout, J.M.H., Vos, B.D. de, Wolterink, J.M., Išgum, I., 01 . Automatic segmentation of thoracic aorta
segments in low-dose chest CT, in: Medical Imaging 2018: Image Processing. Presented at the Medical
Imaging 2018: Image Processing, SPIE, pp. 446–451. https://doi.org/10.1117/12.2293114
Ogden, R.W., Hill, R., 1997. Large deformation isotropic elasticity – on the correlation of theory and experiment
for incompressible rubberlike solids. Proc. R. Soc. Lond. Math. Phys. Sci. 326, 565–584.
https://doi.org/10.1098/rspa.1972.0026
Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla,
N.Y., Kainz, B., Glocker, B., Rueckert, D., 2018. Attention U-Net: Learning Where to Look for the
Pancreas. https://doi.org/10.48550/arXiv.1804.03999
Pang, S., Pang, C., Su, Z., Lin, L., Zhao, L., Chen, Y., Zhou, Y., Lu, H., Feng, Q., 2022. DGMSNet: Spine
segmentation for MR image by a detection-guided mixed-supervised segmentation network. Med. Image
Anal. 75, 102261. https://doi.org/10.1016/j.media.2021.102261
Pepe, A., Li, J., Rolf-Pissarczyk, M., Gsaxner, C., Chen, X., Holzapfel, G.A., Egger, J., 2020. Detection,
segmentation, simulation and visualization of aortic dissections: A review. Med. Image Anal. 65, 101773.
https://doi.org/10.1016/j.media.2020.101773
Pu, Y., Zhang, Q., Qian, C., Zeng, Q., Li, N., Zhang, L., Zhou, S., Zhao, G., 2023. Semi-supervised segmentation
of coronary DSA using mixed networks and multi-strategies. Comput. Biol. Med. 156, 106493.
https://doi.org/10.1016/j.compbiomed.2022.106493
Roberts, S., Gardner, C., Jiang, Z., Abedi, A., Buser, Z., Wang, J.C., 2021. Analysis of trends in lumbar disc
degeneration using kinematic MRI. Clin. Imaging 79, 136–141.
https://doi.org/10.1016/j.clinimag.2021.04.028
Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional Networks for Biomedical Image
Segmentation, in: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (Eds.), Medical Image Computing
and Computer-Assisted Intervention – MICCAI 2015, Lecture Notes in Computer Science. Springer
International Publishing, Cham, pp. 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
Sarkalkan, N., Weinans, H., Zadpoor, A.A., 2014. Statistical shape and appearance models of bones. Bone 60,
129–140. https://doi.org/10.1016/j.bone.2013.12.006
Sekuboyina, A., Kukačka, J., Kirschke, J.S., Menze, B.H., Valentinitsch, A., 01 . Attention-Driven Deep
Learning for Pathological Spine Segmentation, in: Glocker, B., Yao, J., Vrtovec, T., Frangi, A., Zheng, G.
(Eds.), Computational Methods and Clinical Applications in Musculoskeletal Imaging, Lecture Notes in
Computer Science. Springer International Publishing, Cham, pp. 108–119. https://doi.org/10.1007/978-3-
319-74113-0_10
Shamshad, F., Khan, S., Zamir, S.W., Khan, M.H., Hayat, M., Khan, F.S., Fu, H., 2023. Transformers in medical
imaging: A survey. Med. Image Anal. 88, 102802. https://doi.org/10.1016/j.media.2023.102802
Shaw, P., Uszkoreit, J., Vaswani, A., 2018. Self-Attention with Relative Position Representations.
https://doi.org/10.48550/arXiv.1803.02155
Shen, D., Wu, G., Suk, H.-I., 2017. Deep Learning in Medical Image Analysis. Annu. Rev. Biomed. Eng. 19,
221–248. https://doi.org/10.1146/annurev-bioeng-071516-044442
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

Si, C., Yu, W., Zhou, P., Zhou, Y., Wang, X., Yan, S., 2022. Inception Transformer.
https://doi.org/10.48550/arXiv.2205.12956
Sirinukunwattana, K., Pluim, J.P.W., Chen, H., Qi, X., Heng, P.-A., Guo, Y.B., Wang, L.Y., Matuszewski, B.J.,
Bruni, E., Sanchez, U., Böhm, A., Ronneberger, O., Cheikh, B.B., Racoceanu, D., Kainz, P., Pfeiffer, M.,
Urschler, M., Snead, D.R.J., Rajpoot, N.M., 2017. Gland segmentation in colon histology images: The
glas challenge contest. Med. Image Anal. 35, 489–502. https://doi.org/10.1016/j.media.2016.08.008
Soomro, T.A., Afifi, A.J., Zheng, L., Soomro, S., Gao, J., Hellwich, O., Paul, M., 2019. Deep Learning Models
for Retinal Blood Vessels Segmentation: A Review. IEEE Access 7, 71696–71717.
https://doi.org/10.1109/ACCESS.2019.2920616
Suganyadevi, S., Seethalakshmi, V., Balasamy, K., 2022. A review on deep learning in medical image analysis.
Int. J. Multimed. Inf. Retr. 11, 19–38. https://doi.org/10.1007/s13735-021-00218-1
Tamagawa, S., Sakai, D., Nojiri, H., Sato, M., Ishijima, M., Watanabe, M., 2022. Imaging Evaluation of
Intervertebral Disc Degeneration and Painful Discs—Advances and Challenges in Quantitative MRI.
Diagnostics 12, 707. https://doi.org/10.3390/diagnostics12030707
Tao, R., Liu, W., Zheng, G., 2022. Spine-transformers: Vertebra labeling and segmentation in arbitrary field-of-
view spine CTs via 3D transformers. Med. Image Anal. 75, 102258.
https://doi.org/10.1016/j.media.2021.102258
Valanarasu, J.M.J., Oza, P., Hacihaliloglu, I., Patel, V.M., 2021. Medical Transformer: Gated Axial-Attention for
Medical Image Segmentation, in: de Bruijne, M., Cattin, P.C., Cotin, S., Padoy, N., Speidel, S., Zheng,
Y., Essert, C. (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2021,
Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 36–46.
https://doi.org/10.1007/978-3-030-87193-2_4
Valanarasu, J.M.J., Patel, V.M., 2022. UNeXt: MLP-Based Rapid Medical Image Segmentation Network, in:
Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (Eds.), Medical Image Computing and Computer
Assisted Intervention – MICCAI 2022, Lecture Notes in Computer Science. Springer Nature Switzerland,
Cham, pp. 23–33. https://doi.org/10.1007/978-3-031-16443-9_3
van Veldhuizen, W.A., Schuurmann, R.C.L., IJpma, F.F.A., Kropman, R.H.J., Antoniou, G.A., Wolterink, J.M.,
de Vries, J.-P.P.M., 2022. A Statistical Shape Model of the Morphological Variation of the Infrarenal
Abdominal Aortic Aneurysm Neck. J. Clin. Med. 11, 1687. https://doi.org/10.3390/jcm11061687
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I., 01 .
Attention is All you Need, in: Advances in Neural Information Processing Systems. Curran Associates,
Inc.
Wang, B., Qin, J., Lv, L., Cheng, M., Li, L., Xia, D., Wang, S., 2023. MLKCA-Unet: Multiscale large-kernel
convolution and attention in Unet for spine MRI segmentation. Optik 272, 170277.
https://doi.org/10.1016/j.ijleo.2022.170277
Wang, H., Cao, P., Wang, J., Zaiane, O.R., 2022. UCTransNet: Rethinking the Skip Connections in U-Net from a
Channel-Wise Perspective with Transformer. Proc. AAAI Conf. Artif. Intell. 36, 2441–2449.
https://doi.org/10.1609/aaai.v36i3.20144
Wang, H., Zhu, Y., Green, B., Adam, H., Yuille, A., Chen, L.-C., 2020. Axial-DeepLab: Stand-Alone Axial-
Attention for Panoptic Segmentation, in: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (Eds.),
Computer Vision – ECCV 2020, Lecture Notes in Computer Science. Springer International Publishing,
Cham, pp. 108–126. https://doi.org/10.1007/978-3-030-58548-8_7
Wang, W., Xie, E., Li, X., Fan, D.-P., Song, K., Liang, D., Lu, T., Luo, P., Shao, L., 2022. PVT v2: Improved
Baselines with Pyramid Vision Transformer. Comput. Vis. Media 8, 415–424.
https://doi.org/10.1007/s41095-022-0274-8
Wiputra, H., Matsumoto, S., Wagenseil, J.E., Braverman, A.C., Voeller, R.K., Barocas, V.H., 2023. Statistical
shape representation of the thoracic aorta: accounting for major branches of the aortic arch. Comput.
Methods Biomech. Biomed. Engin. 26, 1557–1571. https://doi.org/10.1080/10255842.2022.2128672
Wu, K., Peng, H., Chen, M., Fu, J., Chao, H., 2021. Rethinking and Improving Relative Position Encoding for
Vision Transformer. Presented at the Proceedings of the IEEE/CVF International Conference on
Computer Vision, pp. 10033–10041.
bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under aCC-BY-NC-ND 4.0 International license.

You, X., Gu, Y., Liu, Y., Lu, S., Tang, X., Yang, J., 2022. EG-Trans3DUNet: A Single-Staged Transformer-
Based Model for Accurate Vertebrae Segmentation from Spinal Ct Images, in: 2022 IEEE 19th
International Symposium on Biomedical Imaging (ISBI). Presented at the 2022 IEEE 19th International
Symposium on Biomedical Imaging (ISBI), pp. 1–5. https://doi.org/10.1109/ISBI52829.2022.9761551
Zhang, L., Yang, J., Liu, D., Zhang, F., Nie, S., Tan, Y., Guo, T., 2022. Spine X-ray Image Segmentation Based
on Transformer and Adaptive Optimized Postprocessing, in: 2022 IEEE 2nd International Conference on
Software Engineering and Artificial Intelligence (SEAI). Presented at the 2022 IEEE 2nd International
Conference on Software Engineering and Artificial Intelligence (SEAI), pp. 88–92.
https://doi.org/10.1109/SEAI55746.2022.9832144
Zhang, W., Fu, C., Zheng, Y., Zhang, F., Zhao, Y., Sham, C.-W., 2022. HSNet: A hybrid semantic network for
polyp segmentation. Comput. Biol. Med. 150, 106173.
https://doi.org/10.1016/j.compbiomed.2022.106173
Zhang, Z., Zhang, W., 2022. Pyramid Medical Transformer for Medical Image Segmentation.
https://doi.org/10.48550/arXiv.2104.14702
Zheng, H.-D., Sun, Y.-L., Kong, D.-W., Yin, M.-C., Chen, J., Lin, Y.-P., Ma, X.-F., Wang, H.-S., Yuan, G.-J.,
Yao, M., Cui, X.-J., Tian, Y.-Z., Wang, Y.-J., 2022. Deep learning-based high-accuracy quantitation for
lumbar intervertebral disc degeneration from MRI. Nat. Commun. 13, 841.
https://doi.org/10.1038/s41467-022-28387-5
Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J., 2018. UNet++: A Nested U-Net Architecture for
Medical Image Segmentation, in: Stoyanov, D., Taylor, Z., Carneiro, G., Syeda-Mahmood, T., Martel, A.,
Maier-Hein, L., Tavares, J.M.R.S., Bradley, A., Papa, J.P., Belagiannis, V., Nascimento, J.C., Lu, Z.,
Conjeti, S., Moradi, M., Greenspan, H., Madabhushi, A. (Eds.), Deep Learning in Medical Image
Analysis and Multimodal Learning for Clinical Decision Support, Lecture Notes in Computer Science.
Springer International Publishing, Cham, pp. 3–11. https://doi.org/10.1007/978-3-030-00889-5_1

Spine Magnetic Resonance Image Segmentation Using Deep Learning Techniques
No ratings yet
Spine Magnetic Resonance Image Segmentation Using Deep Learning Techniques
6 pages
Computational Intelligence and Neuroscience - 2022 - Wang - Automatic Segmentation of Lumbar Spine MRI Images Based on (1)
No ratings yet
Computational Intelligence and Neuroscience - 2022 - Wang - Automatic Segmentation of Lumbar Spine MRI Images Based on (1)
10 pages
Bharath Simha Reddy 2021 IOP Conf. Ser. Mater. Sci. Eng. 1022 012020
No ratings yet
Bharath Simha Reddy 2021 IOP Conf. Ser. Mater. Sci. Eng. 1022 012020
11 pages
UNesT - Local Spatial Representation Learning With Hierarchical Transformer For Efficient Medical Segmentation
No ratings yet
UNesT - Local Spatial Representation Learning With Hierarchical Transformer For Efficient Medical Segmentation
21 pages
Miscnn: A Framework For Medical Image Segmentation With Convolutional Neural Networks and Deep Learning
No ratings yet
Miscnn: A Framework For Medical Image Segmentation With Convolutional Neural Networks and Deep Learning
11 pages
Sustainability 13 01224 v2
No ratings yet
Sustainability 13 01224 v2
29 pages
Article - 1 - Automated Segmentation of Multiple Sclerosis Lesions Based On Convolutional Neural Networks
No ratings yet
Article - 1 - Automated Segmentation of Multiple Sclerosis Lesions Based On Convolutional Neural Networks
20 pages
2017 Article 9983-Read
No ratings yet
2017 Article 9983-Read
11 pages
Lumbar Spine Segmentation in MR Images: A Dataset and A Public Benchmark
No ratings yet
Lumbar Spine Segmentation in MR Images: A Dataset and A Public Benchmark
9 pages
Poster
No ratings yet
Poster
1 page
Medical Image Segmentation With Deep Learning
No ratings yet
Medical Image Segmentation With Deep Learning
42 pages
diagnostics-13-02658-v3
No ratings yet
diagnostics-13-02658-v3
17 pages
Deep Learning and Convolutional Neural Networks For Medical Imaging and Clinical Informatics
No ratings yet
Deep Learning and Convolutional Neural Networks For Medical Imaging and Clinical Informatics
452 pages
Paper 19-A Comprehensive Study on Medical Image Segmentation
No ratings yet
Paper 19-A Comprehensive Study on Medical Image Segmentation
18 pages
MISSFormer An Effective Transformer For 2D Medical Image Segmentation
No ratings yet
MISSFormer An Effective Transformer For 2D Medical Image Segmentation
12 pages
H2Former_An_Efficient_Hierarchical_Hybrid_Transformer_for_Medical_Image_Segmentation
No ratings yet
H2Former_An_Efficient_Hierarchical_Hybrid_Transformer_for_Medical_Image_Segmentation
13 pages
Multi-task-deep-learning-for-medical-image-comput_2023_Computers-in-Biology-
No ratings yet
Multi-task-deep-learning-for-medical-image-comput_2023_Computers-in-Biology-
15 pages
Retraction: Retracted: Deep Neural Networks For Medical Image Segmentation
No ratings yet
Retraction: Retracted: Deep Neural Networks For Medical Image Segmentation
16 pages
U-Net-Based Medical Image Segmentation
No ratings yet
U-Net-Based Medical Image Segmentation
16 pages
Transattunet: Multi-Level Attention-Guided U-Net With Transformer For Medical Image Segmentation
No ratings yet
Transattunet: Multi-Level Attention-Guided U-Net With Transformer For Medical Image Segmentation
13 pages
1-s2.0-S1361841524002548-main
No ratings yet
1-s2.0-S1361841524002548-main
14 pages
Final Version
No ratings yet
Final Version
26 pages
AI in MRI
No ratings yet
AI in MRI
9 pages
Bioengineering 12 00140 v2
No ratings yet
Bioengineering 12 00140 v2
16 pages
Literature Survey: Performance Comparision of Residual Deep Network For The Brain Tumor Detection
No ratings yet
Literature Survey: Performance Comparision of Residual Deep Network For The Brain Tumor Detection
19 pages
Nihms 1034737
No ratings yet
Nihms 1034737
42 pages
Project Name: Center of Excellence in Artificial Intelligence For Medical Image Segmentation
No ratings yet
Project Name: Center of Excellence in Artificial Intelligence For Medical Image Segmentation
6 pages
Medical Image Segmentation With 3D Convolutional Neural Networks: A Survey
No ratings yet
Medical Image Segmentation With 3D Convolutional Neural Networks: A Survey
34 pages
IET Image Processing - 2022 - Wang - Medical Image Segmentation Using Deep Learning A Survey
No ratings yet
IET Image Processing - 2022 - Wang - Medical Image Segmentation Using Deep Learning A Survey
25 pages
Sharmila
No ratings yet
Sharmila
8 pages
An Intelligent Deep Hash Coding Network For Content Base - 2024 - Egyptian Infor
No ratings yet
An Intelligent Deep Hash Coding Network For Content Base - 2024 - Egyptian Infor
16 pages
Dataset Meds2
No ratings yet
Dataset Meds2
8 pages
4
No ratings yet
4
16 pages
Transfer - Learning - For - Medical - Image - Classification SLR
No ratings yet
Transfer - Learning - For - Medical - Image - Classification SLR
14 pages
2211.14830v1
No ratings yet
2211.14830v1
38 pages
Machine Learning Based Image Analysis Tool f0r Assessing Degenerative Changes in the Intervertebral Disc
No ratings yet
Machine Learning Based Image Analysis Tool f0r Assessing Degenerative Changes in the Intervertebral Disc
15 pages
s12859-023-05196-1
No ratings yet
s12859-023-05196-1
22 pages
jimaging-07-00074
No ratings yet
jimaging-07-00074
4 pages
A Comprehensive Analysis of Medical Image Segmentation Using Deep Learning
No ratings yet
A Comprehensive Analysis of Medical Image Segmentation Using Deep Learning
10 pages
Intelligent Computing Techniques On Medical Image Segmentation and Analysis A Survey
No ratings yet
Intelligent Computing Techniques On Medical Image Segmentation and Analysis A Survey
6 pages
MTANet_Multi-Task_Attention_Network_for_Automatic_Medical_Image_Segmentation_and_Classification
No ratings yet
MTANet_Multi-Task_Attention_Network_for_Automatic_Medical_Image_Segmentation_and_Classification
12 pages
04 Manuscript
No ratings yet
04 Manuscript
15 pages
Research Proposal Azeem
No ratings yet
Research Proposal Azeem
10 pages
2020 - Singh - 3D Deep Learning On Medical Images
No ratings yet
2020 - Singh - 3D Deep Learning On Medical Images
26 pages
Deep Convolutional Neural Networks For Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
No ratings yet
Deep Convolutional Neural Networks For Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
14 pages
1 s2.0 S0097849320300546 Main
No ratings yet
1 s2.0 S0097849320300546 Main
10 pages
Research Paper
No ratings yet
Research Paper
5 pages
An Image Classification For Larger Healthcare Datasets Using Machine Learning
No ratings yet
An Image Classification For Larger Healthcare Datasets Using Machine Learning
10 pages
Research Paper
No ratings yet
Research Paper
12 pages
Deep Learning For Medical Image Analysis Applicati
No ratings yet
Deep Learning For Medical Image Analysis Applicati
10 pages
1 s2.0 S187705092030644X Main
No ratings yet
1 s2.0 S187705092030644X Main
11 pages
Machine Learning and Deep Learning Approach For Medical Image Analysis: Diagnosis To Detection
No ratings yet
Machine Learning and Deep Learning Approach For Medical Image Analysis: Diagnosis To Detection
39 pages
Brain MRI
No ratings yet
Brain MRI
24 pages
Advantages of Transformer and Its Application For Medical Image Segmentation: A Survey
No ratings yet
Advantages of Transformer and Its Application For Medical Image Segmentation: A Survey
22 pages
verma2021
No ratings yet
verma2021
6 pages
Depp Learning For Medical Image Processing
No ratings yet
Depp Learning For Medical Image Processing
57 pages
TotalSegmentator MRI Sequence-Independent Segmentation of 59 Anatomical Structures in MR images
No ratings yet
TotalSegmentator MRI Sequence-Independent Segmentation of 59 Anatomical Structures in MR images
17 pages
R2 Unet PDF
No ratings yet
R2 Unet PDF
12 pages
Augmented Reality Assisted Surgery: Enhancing Surgical Precision through Computer Vision
From Everand
Augmented Reality Assisted Surgery: Enhancing Surgical Precision through Computer Vision
Fouad Sabry
No ratings yet
Age-related macular degeneration: Diagnosis, symptoms and treatment, an overview
From Everand
Age-related macular degeneration: Diagnosis, symptoms and treatment, an overview
Christopher Schütze
No ratings yet
A Prospective Study in Analgesic Effect of Distal Sodium Channel Blockers (DSCB) in Patients With Sciatica
100% (1)
A Prospective Study in Analgesic Effect of Distal Sodium Channel Blockers (DSCB) in Patients With Sciatica
5 pages
General Orthopaedics and Basic Science Nikolaos K. Paschos pdf download
No ratings yet
General Orthopaedics and Basic Science Nikolaos K. Paschos pdf download
57 pages
LP, 201
No ratings yet
LP, 201
36 pages
Head - Spinal - Injury SC1
100% (1)
Head - Spinal - Injury SC1
48 pages
The Assignment Problem: Examwise Marks Disrtibution-Assignment
No ratings yet
The Assignment Problem: Examwise Marks Disrtibution-Assignment
59 pages
Diagnosis and Management of Lumbar Spinal Stenosis
No ratings yet
Diagnosis and Management of Lumbar Spinal Stenosis
12 pages
Questions - Emberyo
No ratings yet
Questions - Emberyo
3 pages
Lumbar Spine Access Surgery - A Comprehensive Guide To Anterior and Lateral Approaches 2024
100% (1)
Lumbar Spine Access Surgery - A Comprehensive Guide To Anterior and Lateral Approaches 2024
360 pages
Points and Pathways 2022 2
No ratings yet
Points and Pathways 2022 2
114 pages
100 Challenging Spinal Pain Syndrome Cases L. G. F. Giles pdf download
100% (1)
100 Challenging Spinal Pain Syndrome Cases L. G. F. Giles pdf download
51 pages
Dermatomes+Myotomes Lower Limb Lecture Slides
No ratings yet
Dermatomes+Myotomes Lower Limb Lecture Slides
13 pages
تجميعة اسئلة
No ratings yet
تجميعة اسئلة
14 pages
Orientation Last Year
No ratings yet
Orientation Last Year
4 pages
Laporan Akhir Transmisi Distribusi
No ratings yet
Laporan Akhir Transmisi Distribusi
123 pages
Tugas Besar Anstruk 1 (Muhammad)
No ratings yet
Tugas Besar Anstruk 1 (Muhammad)
57 pages
2150 Keith Drive - Construction Schedule (6-4-2024)
No ratings yet
2150 Keith Drive - Construction Schedule (6-4-2024)
2 pages
Model Question Pape 1 M3 - 21-22
No ratings yet
Model Question Pape 1 M3 - 21-22
5 pages
Uhv-Mid-Ii QP
No ratings yet
Uhv-Mid-Ii QP
1 page
Spinal Nerves
No ratings yet
Spinal Nerves
2 pages
تقرير فقرات الرقبه والفقرات القطنيه 07-10-2023
No ratings yet
تقرير فقرات الرقبه والفقرات القطنيه 07-10-2023
4 pages
Previous-Load Cases Definition CII PDF
100% (1)
Previous-Load Cases Definition CII PDF
8 pages
Plexus
No ratings yet
Plexus
16 pages
Practice Quiz - Deep Back and Spinal Cord
No ratings yet
Practice Quiz - Deep Back and Spinal Cord
10 pages
2200 Smopea02 E20 00001 - 02 - Ifa
No ratings yet
2200 Smopea02 E20 00001 - 02 - Ifa
29 pages
BB Protection Stability Secondary
100% (1)
BB Protection Stability Secondary
16 pages
Spinal Nerves Wikipedia
No ratings yet
Spinal Nerves Wikipedia
6 pages
The Lumbar Plexus - Spinal Nerves - Branches - TeachMeAnatomy
No ratings yet
The Lumbar Plexus - Spinal Nerves - Branches - TeachMeAnatomy
5 pages
Case Presenation Potts Disease
No ratings yet
Case Presenation Potts Disease
43 pages
Orthopedic Management of the Hip and Pelvis
No ratings yet
Orthopedic Management of the Hip and Pelvis
433 pages
Nöropunktur Nörobilim Akupunkturunun Klinik El Kitabı, İkinci Baskı
100% (1)
Nöropunktur Nörobilim Akupunkturunun Klinik El Kitabı, İkinci Baskı
175 pages

Evaluation_of_Deep_Neural_Network_Models_for_Insta

Uploaded by

Evaluation_of_Deep_Neural_Network_Models_for_Insta

Uploaded by

bioRxiv preprint doi: https://doi.org/10.1101/2024.04.02.587810; this version posted April 3, 2024.

The copyright holder for this preprint (which

Evaluation of Deep Neural Network Models for Instance

Liang Liang, Ph.D.

Department of Computer Science

Ungar Building, Room 330K

Coral Gables, FL, 33146

Tel: (305) 284-8381; Email: [email protected]

2. Review of DNN models for Lumbar Image Segmentation

2.2. Transformer-based Networks for the segmentation of spine images

2.3. Self-Attention in Transformer

𝑂𝑢𝑡 = 𝐴𝑡𝑡𝑛 × 𝑉 (2)

2.3.2. Position Embedding

Content-Content Content-Position Position-Position

2.3.3. Image Self-Attention in the Existing Image Segmentation Models

Table 1. Image Self-Attention in the attention/Transformer-based segmentation models

3.2. Loss function

ℒ = 0.5𝐿𝐷𝑖𝑐𝑒 + 0.5𝐿𝑎𝑤_𝑐𝑒 (2)

4.1. Original Dataset and Augmented Datasets

4.2. Model evaluation and comparison

1. 10. 1 0. 1 .10 1.0 . 0. 1 .1 . . . 0. . 0 . 1 . 1 .1

4.4. Results of Experiment-B with the augmented training set

4.4.1. Instance Segmentation Evaluation

Computer Science. Springer International Publishing, Cham, pp. 61–71. https://doi.org/10.1007/978-3-

You might also like