U-Transformer: Self and Cross Attention for Medical Image Segmentation
1 Introduction
Organ segmentation is of crucial importance in medical imaging and computer-aided diagnosis, e.g. for radiologists to assess physical changes in response to a treatment or for computer-assisted interventions.
Currently, state-of-the-art methods rely on Fully Convolutional Networks
(FCNs), such as U-Net and variants [9, 2, 7, 18]. U-Nets use an encoder-decoder
architecture: the encoder extracts high-level semantic representations by using
a cascade of convolutional layers, while the decoder leverages skip connections
to re-use high-resolution feature maps from the encoder in order to recover lost
spatial information from high-level representations.
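As a point of reference, here is a minimal sketch of this encoder-decoder pattern with a single skip connection, written in TensorFlow/Keras (the framework used in the experiments of Section 3); the depth, layer widths and function names are toy choices for illustration, not the backbone actually used in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(filters):
    # Two 3x3 convolutions, the basic U-Net building block.
    return tf.keras.Sequential([
        layers.Conv2D(filters, 3, padding="same", activation="relu"),
        layers.Conv2D(filters, 3, padding="same", activation="relu")])

def tiny_unet(input_shape=(512, 512, 1), n_classes=2):
    inp = layers.Input(shape=input_shape)
    s1 = conv_block(32)(inp)                                # high-resolution encoder features
    bottom = conv_block(64)(layers.MaxPool2D(2)(s1))        # high-level semantic features
    up = layers.Conv2DTranspose(32, 2, strides=2)(bottom)   # recover spatial resolution
    dec = conv_block(32)(layers.Concatenate()([up, s1]))    # skip connection re-uses s1
    out = layers.Conv2D(n_classes, 1, activation="softmax")(dec)
    return tf.keras.Model(inp, out)
```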
Despite their outstanding performances, FCNs suffer from conceptual limitations in complex segmentation tasks, e.g. when dealing with local visual ambiguities and low contrast between organs. This is illustrated in Fig. 1a) for segmenting the blue cross region corresponding to the pancreas with U-Net: the limited Receptive Field (RF) framed in red does not capture sufficient contextual information, making the segmentation fail, see Fig. 1c).
Fig. 1. Global context is crucial for complex organ segmentation but cannot be captured by vanilla U-Nets with a limited receptive field, i.e. the blue cross region in a) with failed segmentation in c). The proposed U-Transformer network represents full image context by means of attention maps b), which leverage long-range interactions with other anatomical structures to properly segment the complex pancreas region in d).
In this paper, we introduce the U-Transformer network, which leverages the
strong abilities of transformers [13] to model long-range interactions and spatial
relationships between anatomical structures. U-Transformer keeps the inductive
bias of convolution by using a U-shaped architecture, but introduces attention
mechanisms at two main levels, which help to interpret the model's decisions.
Firstly, a self-attention module leverages global interactions between semantic
features at the end of the encoder to explicitly model full contextual information.
Secondly, we introduce cross-attention in the skip connections to filter out non-
semantic features, allowing a fine spatial recovery in the U-Net decoder.
Fig. 1b) shows a cross-attention map induced by U-Transformer, which highlights the most important regions for segmenting the blue cross region in Fig. 1a): our model leverages the long-range interactions with respect to other organs (liver, stomach, spleen) and their positions to properly segment the whole pancreas region, see Fig. 1d). Quantitative experiments conducted on two abdominal CT-image datasets show the large performance gain brought by U-Transformer compared to U-Net and to the local attention in [11].
Related Work. Attention mechanisms are a relatively recent topic in medical imaging [16, 8, 10–12]. Attention in segmentation is often based on multi-resolution features combined with a simple attention module [16, 6]. These contributions however fail to incorporate long-range dependencies. Recent works successfully tackle this aspect through dual attention networks [12, 5], proving the importance of full-range attention, but at the cost of a large parameter overhead and multiple concurrent loss functions.
Transformer models [13] also bring global attention and have witnessed increasing success in the last five years, starting in natural language processing with text embeddings [3]. A pioneering use of transformers in computer vision is non-local neural networks [15].
2.1 Self-attention
The MHSA module is designed to extract long-range structural information from the images. To this end, it is composed of multi-head self-attention functions, as described in [13], positioned at the bottom of the U-Net as shown in Figure 2.
The main goal of MHSA is to connect every element in the highest-level feature map with each other, thus giving access to a receptive field covering the whole input image. The decision for one specific pixel can thus be influenced by any input pixel. The attention formulation is given in Equation 1. A self-attention module takes three inputs: a matrix of queries $Q \in \mathbb{R}^{n \times d_k}$, a matrix of keys $K \in \mathbb{R}^{n \times d_k}$, and a matrix of values $V \in \mathbb{R}^{n \times d_k}$.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V = AV \qquad (1)$$
A line of the attention matrix $A \in \mathbb{R}^{n \times n}$ corresponds to the similarity of a given element in Q with respect to all the elements in K. Then, the attention function performs a weighted average of the elements of the value V to account for all the interactions between the queries and the keys, as illustrated in Figure 3. In our segmentation task, Q, K and V share the same size and correspond to different learnt embeddings of the highest-level feature map, denoted by X in Figure 3. The embedding matrices are denoted as $W^q$, $W^k$ and $W^v$. The attention is calculated separately in multiple heads before being combined through another embedding. Moreover, to account for absolute contextual information, a positional encoding is added to the input features. It is especially relevant for medical image segmentation, where the different anatomical structures follow fixed spatial positions. The positional encoding can thus be leveraged to capture absolute and relative positions between organs in MHSA.
Fig. 3. MHSA module: the input tensor is embedded into a matrix of queries Q,
keys K and values V . The attention matrix A in purple is computed based on Q and
K. (1) A line of A corresponds to the attention given to all the elements in K with
respect to one element in Q. (2) A column of the value V corresponds to a feature
map weighted by the attention in A.
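As a concrete illustration of Equation 1 applied to a 2D feature map, the following TensorFlow/Keras sketch flattens the highest-level feature map into one token per spatial location, adds a learned positional encoding, and applies multi-head self-attention. The class name, head count and zero-initialized positional encoding are illustrative assumptions, not the authors' implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

class MHSA2D(layers.Layer):
    """Multi-head self-attention over all spatial positions of a feature map."""
    def __init__(self, channels, num_heads=8, **kwargs):
        super().__init__(**kwargs)
        self.attn = layers.MultiHeadAttention(num_heads=num_heads,
                                              key_dim=channels // num_heads)

    def build(self, input_shape):
        # Learned absolute positional encoding, one vector per spatial location.
        _, h, w, c = input_shape
        self.pos = self.add_weight(name="pos_enc", shape=(1, h, w, c),
                                   initializer="zeros")

    def call(self, x):
        # x: (B, H, W, C) highest-level feature map (static spatial size assumed)
        x = x + self.pos                      # inject absolute position information
        b = tf.shape(x)[0]
        h, w, c = x.shape[1], x.shape[2], x.shape[3]
        seq = tf.reshape(x, (b, h * w, c))    # one token per spatial location
        # Q = K = V: every position attends to every other position (Eq. 1).
        out, scores = self.attn(seq, seq, return_attention_scores=True)
        return tf.reshape(out, (b, h, w, c)), scores   # scores: (B, heads, H*W, H*W)
```

For a feature map of size H × W this produces the n × n attention matrix discussed above, with n = H·W, so every pixel of the bottleneck can influence every other.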
2.2 Cross-attention
The MHSA module allows every element in the input to be connected with every other. Attention may also be used to increase the efficiency of the U-Net decoder, and in particular to enhance the lower-level feature maps that are passed through the skip connections. Indeed, while these skip connections preserve high-resolution information, they lack the semantic richness that can be found deeper in the network.
The idea behind the MHCA module is to turn off irrelevant or noisy areas from the skip connection features and to highlight regions that present a significant interest for the application. Figure 4 shows the cross-attention module. The MHCA block is designed as a gating operation of the skip connection S, based on the attention given to a high-level feature map Y. The computed weights are re-scaled between 0 and 1 through a sigmoid activation function. The resulting tensor, denoted Z in Figure 4, is a filter where low-magnitude elements indicate noisy or irrelevant areas to be reduced. A cleaned-up version of S is then given by the Hadamard product Z ⊙ S. Finally, the result of this filtering operation is concatenated with the high-level feature tensor Y. Here, the keys and queries are computed from the same source, since we are designing a filtering operation, whereas for NLP tasks, having homogeneous keys and values may be more meaningful. This configuration proved empirically more effective.
Fig. 4. MHCA module: the value of the attention function corresponds to the skip
connection S weighted by the attention given to the high level feature map Y . This
output is transformed into a filter Z and applied to the skip connection.
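To make the gating explicit, here is a hedged Keras sketch of such a cross-attention filter. It assumes Y is first resized to the spatial resolution of S and shares its channel count; the head count, layer names and resizing step are illustrative assumptions rather than the authors' exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

class MHCAGate(layers.Layer):
    """Cross-attention gate: queries/keys from Y, values from the skip connection S."""
    def __init__(self, channels, num_heads=4, **kwargs):
        super().__init__(**kwargs)
        self.attn = layers.MultiHeadAttention(num_heads=num_heads,
                                              key_dim=channels // num_heads)

    def call(self, s, y):
        # s: skip features (B, H, W, C); y: high-level features (B, h, w, C)
        b = tf.shape(s)[0]
        h, w, c = s.shape[1], s.shape[2], s.shape[3]
        y_up = tf.image.resize(y, (h, w))          # assumption: align Y to S's resolution
        s_seq = tf.reshape(s, (b, h * w, c))       # values come from the skip connection S
        y_seq = tf.reshape(y_up, (b, h * w, c))    # queries and keys come from Y
        weighted_s, _ = self.attn(query=y_seq, value=s_seq, key=y_seq,
                                  return_attention_scores=True)
        z = tf.sigmoid(tf.reshape(weighted_s, (b, h, w, c)))  # filter Z in [0, 1]
        return tf.concat([z * s, y_up], axis=-1)   # Z ⊙ S, concatenated with Y
```

The sigmoid keeps the filter Z in [0, 1], so the Hadamard product can only attenuate, never amplify, the skip-connection features, which is consistent with its role as a gate.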
3 Experiments
We evaluate U-Transformer for abdominal organ segmentation on the public TCIA pancreas dataset and on an internal multi-organ dataset.
Accurate pancreas segmentation is particularly difficult due to its small size, complex and variable shape, and the low contrast with neighboring structures, see Fig. 1. In addition, the multi-organ setting assesses how U-Transformer can leverage attention from multi-organ annotations.
Experimental setup. The TCIA pancreas dataset contains 82 CT-scans with pixel-level annotations. Each CT-scan has between 181 and 466 slices of 512 × 512 pixels and a voxel spacing of ([0.66 ∼ 0.98] × [0.66 ∼ 0.98] × [0.5 ∼ 1.0]) mm³.
We also experiment with an Internal Multi-Organ (IMO) dataset composed of 85 CT-scans annotated with 7 classes: liver, gallbladder, pancreas, spleen, right and left kidneys, and stomach. Each CT-scan has between 57 and 500 slices of 512 × 512 pixels and a voxel spacing of ([0.42 ∼ 0.98] × [0.42 ∼ 0.98] × [0.63 ∼ 4.00]) mm³.
All experiments follow a 5-fold cross-validation, using 80% of the images for training and 20% for testing. We use the TensorFlow library to train the models, with the Adam optimizer (10⁻⁴ learning rate, exponential decay scheduler).
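For concreteness, the optimization setup could be written as follows in TensorFlow/Keras; only the 10⁻⁴ learning rate and the exponential schedule are stated above, while the decay steps and rate are illustrative placeholders.

```python
import tensorflow as tf

# Adam with an exponentially decaying learning rate, as described above.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=10_000,   # assumption: not specified in the text
    decay_rate=0.96)      # assumption: not specified in the text
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```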
We compare U-Transformer to the U-Net baseline [9] and Attention U-
Net [11] with the same convolutional backbone for fair comparison. We also
report performances with self-attention only (MHSA, Section 2.1) and with cross-attention only (MHCA, Section 2.2). U-Net has ∼30M parameters; the overhead from U-Transformer is limited (MHSA ∼5M, each MHCA block ∼2.5M).
Table 1 reports the performance in Dice, averaged over the 5 folds, and over organs for IMO. U-Transformer outperforms U-Net by 2.4pts on TCIA and 1.3pts on IMO, and Attention U-Net by 1.7pts on TCIA and 1.6pts on IMO. The gains are consistent across all folds, and paired t-tests show that the improvement is significant, with p-values < 3% for every experiment.
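As a reminder, the reported metric is the Dice similarity coefficient, DSC(A, B) = 2|A ∩ B| / (|A| + |B|); a minimal NumPy sketch for binary masks is given below (the smoothing term eps is an illustrative convention, not taken from the paper).

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    # Dice similarity coefficient between two binary segmentation masks.
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```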
Table 1. Results for each method in Dice similarity coefficient (DSC, %)

       | U-Net | Attn U-Net |     MHSA      |                MHCA
       |       |            | wo PE | w PE  | 1 lvl wo PE | 1 lvl w PE | multi-lvl w PE
TCIA   | 76.35 |   77.23    | 78.17 | 78.90 |    77.18    |   78.88    |     80.65
IMO    | 88.18 |   87.52    | 88.16 | 88.76 |    87.96    |   88.52    |     89.13

Fig. 5. Segmentation results for U-Net [9], Attention U-Net [11] and U-Transformer on the multi-organ IMO dataset (first row) and on TCIA pancreas (second row).
4 Conclusion
This paper introduces the U-Transformer network, which augments a U-shaped
FCN with Transformers. We propose to use self and cross-attention modules
to model long-range interactions and spatial dependencies. We highlight the
relevance of the approach for abdominal organ segmentation, especially for small
and complex organs. Future work could include the study of U-Transformer in 3D networks, with other modalities such as MRI or US images, as well as for other medical imaging tasks.
References
1. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-
to-end object detection with transformers. In: European Conference on Computer
Vision. pp. 213–229. Springer (2020)
2. Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3d u-net:
Learning dense volumetric segmentation from sparse annotation. In: MICCAI. pp.
424–432 (2016)
3. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirec-
tional transformers for language understanding. CoRR abs/1810.04805 (2018),
http://arxiv.org/abs/1810.04805
4. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner,
T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.:
An image is worth 16x16 words: Transformers for image recognition at scale. In:
International Conference on Learning Representations (2021)
5. Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., Lu, H.: Dual attention network for
scene segmentation. In: Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR) (June 2019)
6. Li, C., Tong, Q., Liao, X., Si, W., Sun, Y., Wang, Q., Heng, P.A.: Attention based
hierarchical aggregation network for 3d left atrial segmentation. In: Statistical At-
lases and Computational Models of the Heart. Atrial Segmentation and LV Quan-
tification Challenges. pp. 255–264 (2019)
7. Milletari, F., Navab, N., Ahmadi, S.: V-net: Fully convolutional neural networks for
volumetric medical image segmentation. In: 2016 Fourth International Conference
on 3D Vision (3DV). pp. 565–571 (2016)
8. Nie, D., Gao, Y., Wang, L., Shen, D.: Asdnet: Attention based semi-supervised deep
networks for medical image segmentation. In: Frangi, A., Fichtinger, G., Schnabel,
J., Alberola-López, C., Davatzikos, C. (eds.) MICCAI 2018. pp. 370–378. Lecture
Notes in Computer Science, Springer Verlag (2018)
9. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomed-
ical image segmentation. In: MICCAI. pp. 234–241 (2015)
10. Roy, A.G., Navab, N., Wachinger, C.: Concurrent spatial and channel squeeze &
excitation in fully convolutional networks. In: MICCAI. vol. abs/1803.02579 (2018)
11. Schlemper, J., Oktay, O., Schaap, M., Heinrich, M., Kainz, B., Glocker, B., Rueckert, D.: Attention gated networks: Learning to leverage salient regions in medical images. Medical Image Analysis 53 (2019). https://doi.org/10.1016/j.media.2019.01.012
12. Sinha, A., Dolz, J.: Multi-scale self-guided attention for medical image segmenta-
tion. IEEE Journal of Biomedical and Health Informatics pp. 1–1 (2020)
13. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser,
L., Polosukhin, I.: Attention is all you need. In: NeurIPS. pp. 5998–6008 (2017)
14. Wang, H., Zhu, Y., Green, B., Adam, H., Yuille, A., Chen, L.C.: Axial-deeplab:
Stand-alone axial-attention for panoptic segmentation. In: European Conference
on Computer Vision. pp. 108–126 (2020)
15. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Pro-
ceedings of the IEEE conference on computer vision and pattern recognition. pp.
7794–7803 (2018)
16. Wang, Y., Deng, Z., Hu, X., Zhu, L., Yang, X., Xu, X., Heng, P.A., Ni, D.: Deep attentional features for prostate segmentation in ultrasound. In: MICCAI (2018)
17. Ye, L., Rochan, M., Liu, Z., Wang, Y.: Cross-modal self-attention network for
referring image segmentation. In: Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. pp. 10502–10511 (2019)
18. Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J.: Unet++: A nested
u-net architecture for medical image segmentation. In: Deep Learning in Medical
Image Analysis and Multimodal Learning for Clinical Decision Support. pp. 3–11
(2018)
Fig. 7. Evolution of the Dice Score on TCIA (fold 1) when the number of heads varies
between 0 and 8 in MHSA.