0% found this document useful (0 votes)
11 views

Multimodal Medical Image Fusion Network Based On Target Information Enhancement

Uploaded by

Sabeer Reigns
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views

Multimodal Medical Image Fusion Network Based On Target Information Enhancement

Uploaded by

Sabeer Reigns
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 19

Received 20 April 2024, accepted 11 May 2024, date of publication 20 May 2024, date of current version 28 May 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3402965

Multimodal Medical Image Fusion Network


Based on Target Information Enhancement
YUTING ZHOU 1, XUEMEI YANG 1, SHIQI LIU 1, AND JUNPING YIN 2
1 China Academy of Engineering Physics, Beijing, Sichuan 100193, China
2 Institute of Applied Physics and Computational Mathematics, Beijing 100094, China

Corresponding author: Junping Yin ([email protected])


This work was supported in part by Beijing Natural Science Foundation under Grant Z210003, in part by the National Natural Science
Foundation of China under Grant NSFC12026607 and Grant NSFC12031016, and in part by the Key Research and Development Program
of the Scientific Research Department under Grant 2020YFA0712203 and Grant 2020YFA0712201.

ABSTRACT Glioma is a kind of brain disease with high incidence, high recurrence rate, high mortality,
and low cure rate. To obtain accurate diagnosis results of brain glioma, doctors need to manually
compare the imaging results of different modalities many times, which will increase the diagnosis time
and reduce the diagnostic efficiency. Image fusion technology has been widely used in recent years to
obtain information on multimodal medical images. This paper proposes a novel image fusion framework,
target information enhanced image fusion network (TIEF), using cross-modal learning and information
enhancement techniques. The framework consists of a multi-sequence feature extraction block, a feature
selection block, and a fusion block. The multi-sequence feature extraction block consists of multiple sobel
dense conv leaky ReLu block (SDCL-block). SDCL-block mainly realizes the extraction of edge features,
shallow features, and deep features. The feature selection block identifies the feature channels with rich
texture information and strong discrimination ability through the effective combination of global information
entropy criterion and feature jump connection. The feature fusion block mainly comprises multi-head and
spatial attention mechanisms, which can realize the fusion of intra-modality and inter-modality features.
On this basis, considering the influence of tumor spatial location and structure information on the fusion
results, a loss function is designed, which is a weighted combination of texture loss, structure loss, and
saliency loss so that texture information from multimodal magnetic resonance imaging (MMRI) and saliency
information from different anatomical structures of the brain can be fused at the same time to improve the
expression ability of features. In this paper, the TIEF algorithm is trained and validated on the MMRI and
(Single-Photon Emission Computed Tomography-MRI) SPECT-MRI datasets of glioma and generalized
on the (Computed Tomography-MRI) CT-MRI dataset of meningioma to verify the performance of the
TIEF algorithm. In the image fusion task, quantitative results showed that TIEF exhibited optimal or
suboptimal performance in information entropy, spatial frequency, and average gradient metrics. Qualitative
results indicate that the fused images can highlight tumor and edematous features. A downstream image
segmentation task was used for evaluation to further verify TIEF’s effectiveness. TIEF achieved the
best results in both (Dice similarity coefficient) Dice and (Hausdorff distance 95%) HD95 segmentation
metrics. In the generalization task, quantitative results indicated that TIEF obtained more information in the
meningioma dataset. In conclusion, TIEF can effectively achieve cross-domain information acquisition and
fusion and has robustness and generalization ability.

INDEX TERMS Medical image fusion, multimodal magnetic resonance imaging, transformer, feature
selection.
I. INTRODUCTION
Brain tumors rank among the most common diseases
The associate editor coordinating the review of this manuscript and globally. From 2019 to 2020, China recorded an average
approving it for publication was Zhen Ren . of 12,768 brain tumor patients annually on the (National
2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
VOLUME 12, 2024 For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ 70851
Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

Image fusion methods can be divided into two branches:


traditional fusion framework and deep learning fusion
framework. Conventional methods include multi-scale trans-
form [11], sparse representation [12], spatial domain
method [13], and hybrid method [14]. Traditional image
fusion methods are limited by factors such as the complexity
FIGURE 1. MMRI of brain tumors in the BraTs2019 dataset.
of source images, the complex design of artificial fusion
rules, and the prolonged processing time [15]. Medical image
fusion algorithms based on deep learning are divided into
Brain Tumor Registry Research Platform) NBTRC platform, convolutional neural networks (CNN), generative adversarial
a figure nearly 10 times higher than the cases reported in networks (GAN) and Transformer. On the contrary, deep
the past decade [1]. Cancer arises from the mutation or learning methods CNN [16] and GAN [17] have the
change of cellular function [2], resulting in an inability of advantages of detailed edge texture information, reduced
cells to undergo programmed death [3]. These tumors affect computational cost, and elimination of explicit fusion rule
various organs and tissues [4], [5]. While brain tumors rarely design. However, due to the localized nature of convolution
spread to other parts of the body, they remain perilous. The operations [18], they need help to capture comprehensive
growth of tumors can lead to the proliferation and harm of global knowledge. In addition, the Transformer model has
brain tissue in neighboring areas. Even benign tumors can been successful in various vision tasks [19] and applied
exert significant pressure on brain tissue, causing high-impact to medical imaging. Still, the Transformer model mainly
complications [6], [7]. Brain tumors account for about 2.17% focuses on the global information within the domain and
of all cancer-related deaths, and their 5-year survival rate is ignores the crucial cross-domain integration in the image
only 5.6% [8]. In diagnosing brain tumors, clinicians need to fusion task. This approach faces challenges distinguishing
combine different sequences of multimodal MRI, or CT and between target volumes, such as enhanced tumors and
MRI, to determine the condition of brain tumors and further background.
determine whether the brain tumors are benign or malignant Although the results of existing multimodal image fusion
and what kind of treatment plan to use. algorithms are better than those of traditional image fusion
MMRI represents different sequences of MRI. Different methods, many things could still be improved. Due to the lack
MRI sequences offer distinct different details about brain of ground truth for medical image fusion, most methods based
tissues and anatomical features. Clinicians will combine on deep learning achieve image fusion by designing loss
multiple modes of MMRI to comprehensively judge the functions. At present, most of the design of the loss function
situation of tumors. In recent years, MMRI has emerged as is limited to the global pixel information level, which is not
an indispensable tool for precise and personalized medical enough to form a fused image better than the source image
care [9]. Unlike techniques involving ionizing radiation, MRI and limits the quality of the fused image, thereby limiting
remains unaffected by sampling errors and internal variations. the applicability of image fusion in medical applications.
Fig. 1 shows a panel of brain tumor images obtained Although the fusion results of some methods also contain
through various MRI sequences. Observation of Fig. 1 rich texture details, they have no significant contrast and
reveals unique characteristics across different sequences. cannot clearly distinguish the target from the background.
T1 weighted imaging sequence (T1WI) presents clear The features containing rich texture information and edge
anatomical structures but fails to distinctly depict lesions. information only exist in specific feature channels. Using all
In contrast, T1 weighted enhancement scanning imaging the shallow features for fusion reduces the fusion effect of the
sequence (T1Gd) highlights areas with active blood flow, model.
a crucial criterion for accentuating tumors. T2 weighted To solve the above problems, this paper proposes a brain
imaging sequence (T2WI) displays relatively straightforward tumor MMRI fusion network based on feature selection
images aiding in overall tumor assessment. Fluid attenuated and attention mechanism: TIEF. This network’s fusion
inversion recovery image sequence (FLAIR) suppressing image of a brain tumor contains clear brain structure
high signals in cerebrospinal fluid, delineates peritumoral and anatomical information. More importantly, it integrates
edema areas. These diverse imaging patterns capture addi- the description of the edema part, enhanced tumor, and
tional pathological information. Given the limitations of necrotic tumor core of multiple modalities, and the dis-
individual imaging modes, image fusion aims to merge crimination degree of each area is evident, which can
multi-modal images into a unified output, amalgamating provide doctors with more precise and more accurate tumor
complementary information to facilitate enhanced human information. This work contributes significantly to multiple
visual perception and automated tumor detection. Multi- aspects.
modal brain MRI image fusion contributes to more precise 1) We proposed a feature information measurement block
insights into lesion shapes, organizational structures, and based on information entropy, a simple yet robust tool
relative space position [10], facilitating the design of more that measures feature information effectively. This block
accurate individualized treatment plans. establishes efficient skip connections between encoding and

70852 VOLUME 12, 2024


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

decoding stages, filtering high-detail and texture-rich feature information (MI) and universal quality index (UIQI) by
channels to enhance feature reuse. 5-15%. Additionally, Guo et al. [23] proposed a multi-
2) We designed a fusion block devised to extract and merge modal image fusion framework based on two-scale image
multi-modality deep features. Comprising a cross-modality- decomposition and sparse representation, overcoming the
based token learner block, transformer block, token fusion limitations of single traditional methods. This approach
block, and spatial attention block, this block dynamically retained finer details and edge features, showing an average
identifies critical areas within the input multi-modality deep improvement of 30% in metrics like edge intensity (EN)
features, enabling spatial and cross-modal fusion. compared to optimal strategies.
3) We proposed a new loss function that incorporates Traditional multimodal medical image fusion methods
modality and tissue weighting, utilizing the regional contrast combine the target task to set the fusion rules and improve
index. This function controls the preservation degree of the image clarity by processing the complementary infor-
information from source images and focuses on regions of mation between multiple images. However, although these
interest vital in various medical applications. traditional algorithms are relatively simple, they are only
The remainder of this paper is organized as follows. applicable to specific tasks or specific datasets, have limited
Section II provides a brief review of existing methods in generalization ability, and they require more demanding
multimodal medical image fusion. Section III introduces an feature extraction and processing, leading to slower com-
efficient method tailored to the task of multimodal MRI putation speeds. The image fusion algorithm based on deep
brain tumor fusion. Section IV delves into experimental learning provides a promising solution to solve the limitations
Settings, Outlines implementation details, presents fusion of traditional methods by enhancing the image fusion
experimental results, performs ablation studies and gen- effect.
eralization studies, compares efficiencies and parameters,
and discusses limitations and potential future directions.
Section V draws conclusions based on the findings in this B. DEEP LEARNING IMAGE FUSION METHOD
paper. Deep learning-based image fusion methods encompass
various techniques such as CNN, GAN, and Transformer.
II. RELATED WORK CNN excel at processing spatial and structural information
In this section, the focus is on reviewing pertinent research within adjacent regions of input medical images. Typically
in image fusion and vision transformer techniques. These composed of convolutional, pooling, and fully connected
two techniques hold considerable relevance to the method layers, CNN extract features from source images, mapping
adopted in this study, and we aim to provide an overview of them to final outputs. These networks define image fusion as
their significant developments. a classification problem, utilizing CNN-based algorithms to
transform images, measure activity levels, and devise fusion
A. TRADITIONAL MEDICAL IMAGE FUSION METHOD rules. Medical image fusion based on CNN mainly includes
Traditional methods for medical image fusion can be pixel-level fusion and feature-level fusion. Pixel-level fusion
categorized into spatial domain techniques, frequency is simply a weighted average of pixel values. The fusion of
domain-based fusion, and sparse representation approaches. feature levels mostly involves joining or adding the channels
Jiang et al. [20] introduced and applied the linked inde- of the feature map. For instance, Vaswani et al. [24] designed
pendent component analysis method in a multi-modal MRI an early CNN-based fusion method, integrating traditional
study of Alzheimer’s patients. The study revealed increased activity level measurements with CNN-based feature extrac-
mean diffusivity, decreased gray matter volume, alterations in tion to produce fused images via pixel-weighted averages or
anisotropy fraction and diffusion tensor patterns in the corpus selected fusion strategies. Similarly, Li et al. [25] designed a
callosum and forceps, and increased anisotropy fraction multi-scale CNN framework, training the network to generate
and diffusion tensor pattern in the regions of the superior decision graphs for image fusion. Despite CNN’s ability to
longitudinal fasciculus passing through the descending fibers, learn from limited medical image datasets, the challenge
such as the internal capsule, corona radiata, and superior of overfitting persists due to the scarcity of medical image
longitudinal fasciculus. Wang et al. [21] proposed a joint samples. CNN learn hierarchical features, enhancing image
Laplacian pyramid method integrating multiple features to content comprehension and analysis. However, because CNN
effectively transfer salient features from source images to a only focuses on local information, the complexity and
single fused image, improving indicators such as standard diversity of multimodal medical image fusion limits its ability
deviation (STD) by 10-15% compared to other traditional to achieve optimal results.
methods. Kang et al. [22] presented a novel approach The GAN algorithm differs significantly from CNN,
utilizing group sparsity and graph positivity regularization in employing a generator and discriminator for feature extrac-
dictionary learning (DL-GSGR) for medical image denoising tion and optimization. In GAN, the generator produces an
and fusion. This method demonstrated more effective feature image, while the discriminator discerns between real and
extraction compared to standard sparse representation and generated images. GAN is trained using an adversarial
multi-resolution analysis, enhancing indicators like mutual loss function, where the generator and discriminator are

VOLUME 12, 2024 70853


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

FIGURE 2. The overall architecture of the proposed TIEF method.

in a constant adversarial interplay, striving for equilibrium. framework. For better fusion result, Du et al. introduced
The medical image fusion of GAN mainly generates the parallel Transformer and CNN architecture into the AE-
fused image through the generator and judges whether based fusion framework, (i.e., TransMEF [31]). Furthermore,
the generated image is realistic by discrimination. The Wang et al. also injected Transformer into GAN-based fusion
confrontation between the generator and the discriminator framework to learn the global fusion relations [32]. To reduce
optimizes the fusion effect. Liu et al. [26] proposed the fusion computational costs, Guo et al. [33] proposed a hierarchical
GAN algorithm, treating image fusion as an image generation Transformer (i.e., Swin Transformer) by adopting shifted
task and utilizing the least squares GAN objective to stabilize windows to compute the representation. In their method,
training. While GAN-based fusion methods address some Swin Transformer allowed cross-window connection and
issues through adversarial confrontation, the simplicity of limited self-attention computation to non-overlapping local
single-scale networks in generators may lead to information windows, which achieved greater efficiency and flexibility.
loss and excessive smoothing, causing distortions in the Motivated by [34], residual Swin Transformer (RSTB)
fused image. To address this challenge, Liang et al. [27] has been proposed to extract deep feature for image
introduced the fusion network, incorporating lightweight restoration [35]. The image fusion method based on the
transformer blocks and adversarial learning to emphasize Transformer ignores the cross-domain information, fails
global fusion. This model enables interaction between to capture the local and correlation information between
shallow CNN-extracted features and the transformer fusion different modalities, and fails to highlight the lesion in the
block, refining spatial and cross-channel fusion relationships. tumor tissue area emphasized by enhanced tumor and other
GAN models excel in retaining the selected information multimodal images. This is the key to the problem of MMRI
from source images without requiring labeled data, delivering brain tumor image fusion.
clear and minimally distorted images. However, due to the
complexity of the GAN model, the gradient is prone to III. METHODS
disappear. Although the generator and discriminator of GAN In this section, we designed a TIEF network tailored for
can realize cross-modal learning, they cannot adaptively mining and fusing multi-modal MRI images, illustrated in
learn complementary information and screen important Fig. 2. The architecture primarily comprises the SDCL-block,
information and channels. feature selection block, and fusion block. The SDCL-block
Recently, Transformer-based algorithms has received a operates as a detail-enhanced dual branch for deep feature
lot of attention in the image fusion community. There are extraction, while the feature selection block serves as a block
many medical image fusion frames based on Transformer for channel selection, focusing on information-rich channels.
that have achieved impressive performance. To extract The fusion block is responsible for integrating intra-modality
local and global information, Du et al. [28] used Patch and inter-modality deep features obtained from the encoder.
Pyramid Transformer (PPT) to extract non-local information TIEF adopts a U-shaped framework, featuring four branches
from the entire image [29], based on the AE-based fusion in the encoding section for individual extraction of deep
framework. In addition, Maqsood et al. designed spatio- features from four source images. Conversely, the decoding
Transformer as a multi-scale fusion strategy to capture both section consists of a single branch dedicated to reconstructing
local and global contexts [30], based on CNN-based fusion the fused image.

70854 VOLUME 12, 2024


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

FIGURE 3. Illustration of the proposed SDCL block.

We denoted the source multi-modal MRI images as


X ∈ RI ×H ×W , where I represents the number of modalities. FIGURE 4. Illustration of the process of feature selection.

For the MMRI fusion task, I = 4, cor-responding to T1WI,


T1Gd, T2WI, and FLAIR modalities. In the case of SPECT
-MRI fusion, I = 3, cor-responding to T1WI, T2WI, and (stride is 1). The two-dimensional information entropy of Fd
SPECT modalities. The symbols of W and H denote the is defined as:
image’s width and height. X255 X255
H (Fd ) = − pij log2 pij (1)
To enhance feature extraction within the encoding stage, i=0 j=0
we designed a novel SDCL-block, depicted in Fig. 3, f (in , jn )
pij = (2)
employing a double parallel structure. One branch comprised WH
a dense block, optimizing the utilization of features extracted where in denotes the gray value of the center pixel in the
through various convolutional layers. The other branch nth sliding window, jn represents the mean gray value of
employed gradient operations to calculate feature gradient the neighborhood of the rest pixels centered at in in the nth
magnitudes, focusing on texture information extraction. The sliding window. Thus, we get a set {(in , jn )}HW n=1 to reflect
Conv up and Conv down stages incorporated 1 × 1 con- the comprehensive characteristics of the central pixel and its
volutional layers to standardize channel counts within the surrounding pixels. The occurrence probability of f (in , jn ) in
double-branch structure features. Subsequently, an additional the image is defined as pij , where f (in , jn ) is the occurrence
operation integrated the depth and detail features obtained number of (in , jn ), and W and H are the dimensions of the
from the dual branches. The latter part of the SDCL-block feature map.
further accentuates feature integration by repeating the dense Visualizing feature maps across different depths and
block structure, reinforcing the propagation strength of the channels and calculating their information entropy assists
extracted features. in discerning the relationship between feature information
richness and feature map entropy. This process is particularly
A. FEATURE SELECTION valuable in identifying feature maps that align more closely
According to the pruning algorithm [36], the importance with human vision.
of neurons, filters, and channels can be measured using We calculated the entropy of Flair and T1ce feature maps
specific criteria. The less important branches can be pruned across different layer depths (Fig. 5 and Fig. 6). Both figures
to reduce the model size and speed up the calculation without demonstrate that each channel extracts distinct information.
compromising the accuracy. To optimize feature utilization The value in the upper right corner of the figure is the
in decoding, we’ve devised a novel feature selection block. information calculated by Eq. (1), and the circle in the upper
This block dynamically filters richer-detail features for more left corner represents the channel whose information entropy
effective skip connections. is greater than the threshold. Channels within the same layer
We opted for information entropy as the criterion of focus on varied details and different areas. Channels with
choice. Information theory supports entropy as an eval- higher entropy values, compared to those with lower entropy
uation metric to quantify the information content within values, exhibit richer texture details and more salient pixels
an image or feature. It effectively reflects the intensity in tumor areas, aiding in visual tumor detection. To optimize
distribution’s spatial and aggregative characteristics. A higher fusion image reconstruction in the decoding phase, selecting
entropy value signifies greater information content within feature maps with rich information–specifically, channels
an image. While one-dimensional entropy assesses the gray with high entropy–is essential. We calculated and ranked all
value aggregation, it does not capture spatial information. channel features’ entropy values {H (Fd )}D d=1 in lth layer,
Contrastingly, two-dimensional entropy encapsulates spatial selecting the top r entropy channels. In this paper, the
characteristics. In this study, the two-dimensional entropy researcher adopted r = 8. Merely selecting features with
enables the characterization of content abundance within each rich spatial information might not accurately and compre-
feature channel. For a feature map Fd ∈ RH ×W in l th layer, hensively represent image content. Hence, a feature selection
where d = 1, . . . , D(with D being the number of channels). model is integrated as a network branch, addressing crucial
We used a 3 × 3 sliding window to traverse the whole map features in decoding through skip connections (Fig. 4).

VOLUME 12, 2024 70855


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

FIGURE 5. Visualization and entropy of shallow features.

FIGURE 6. Visualization and entropy of deeper layer features.

B. FUSION BLOCK comprises a cross-modality-based token learner block, trans-


In this study, we introduce a fusion block aimed at mining former block, token fusion block, and spatial attention block.
and integrating multi-modality deep features. This block These components adaptively tokenize crucial regions within

70856 VOLUME 12, 2024


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

FIGURE 7. The architecture of Fusion Block: Cross-Modality-Based Token Learner Block, Transformer Block, Fused Tokens Block, and Spatial
Attention Block.

the input multi-modality deep features, facilitating spatial features effectively, ensuring robust integration of comple-
and modality-based fusion. The network architecture of this mentary information.
proposed fusion block is depicted in Fig. 7. We represented We started by applying a simple linear layer (denoted as
the deep feature maps from each modality as {F1 , . . . ,FI }, flinear , where flinear ∈ RHW ×SI ) independently across each
Fi ∈ RC×H ×W (i = 1, 2, . . . ,I ), where C denotes the number channel of F, incorporating a sigmoid activation function
of channels. Upon concatenating {F1 , . . . ,FI }, we derived a and reshaping operation. This operation results in Ft ∈
tensor F ∈ RIC×H ×W . RSI ×C×HW . Subsequently, the token tensor, denoted as
Ttokens ∈ RSI ×C , is generated by the transformer block.
1) CROSS-MODALITY-BASED TOKEN LEARNER BLOCK We then executed Fs = Ft ⊙Ttokens (resulting in F s ∈
We learned to generate a series of tokenizer function {Ai }Si=1 , RSI ×C×HW ) and executed token-wise addition on Fs along
where S represents the number of tokens, aiming to map the the token axis. Consequently, we obtained the modality-
modality feature F to a token vector Vi . enhanced feature embedding Ffused tokens ∈ RC×HW .
Vi = Global average pool(F ⊙ Ai (F)) (3)
3) SPATIAL ATTENTION BLOCK
where Vi ∈ RIC×1×1 and ⊙ is the Hadamard product Spatial attention serves to identify crucial regions within an
(i.e., element-wise multiplication). This approach enabled the image by assigning significance scores to various spatial
tokens to dynamically adapt their spatial selections rather regions within the feature map. This mechanism accentuates
than being fixed splits of the input tensor. These varying important areas while dampening features in less relevant
tokens effectively mine intra-modality and inter-modality regions. The spatial attention block employs global max-
deep features, enabling the modeling of their relationships pooling and global average-pooling operations along the
and interactions. The resulting tokens are aggregated to form channel axis, concatenating their outputs to generate an
the learned token tensor V ∈ RSI ×C . In this paper, we adopted effective feature descriptor. The computation of the spatial
S = 8. Subsequently, the learned token tensor is forwarded attention mechanism unfolds as follows:
to the subsequent transformer block.   
Matrix max = Global max pool Layernorm F C (4)
  
2) FUSED TOKENS BLOCK Matrix avg = Global average pool Layernorm F C
Following the token generation by the cross-modality-
based token learner block and subsequent processing by (5)
the transformer block, the fused tokens block is employed Weight spatial = Sigmoid (Conv (Concat (Matrix1, Matrix2)))
to further amalgamate information among the tokens. This (6)
functionality facilitates the model in capturing cross-modality
Ffused features = Weight spatial ⊙ Ffused tokens (7)
‘patterns’ formulated by these tokens. The synergy between
the cross-modality-based token learner block and the fused Subsequently, we acquired the fused feature maps that
tokens block aims to fuse intra- and inter-modality deep encompass a selection of pixels, spatial locations, and modal-

VOLUME 12, 2024 70857


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

ities, ensuring an adaptive and informative amalgamation uniform distribution of pixels and a mild gradient outside the
across modalities and spatial aspects. tumor. The distribution of pixel intensity in FLAIR images
was not uniform, and the gradient change was noticeable.
C. LOSS FUNCTION However, relying solely on gradients as a measure may
To facilitate the reconstruction of multi-modal image fusion, lead to misleading results that significantly affect marginal
we established a comprehensive loss function considering assessments.
three perspectives: texture information, structural informa- The entropy calculation depends on the probability dis-
tion, and salient target information. tribution of the individual gray levels within the image.
Considering the overall pixel intensity distribution through
1) TEXTURE LOSS the probability distribution, it is not easy to be affected by
Different source images exhibit different features, such as the sparse gradient. The higher the information entropy in
independent units, signal-to-noise ratio, voxel count, spatial the image, the richer the content contained in the image.
smoothness, and intensity distribution. Images of the same The contribution of different modal photos to the final image
morphology but different regions share overall structural fusion was calculated according to the information entropy of
similarity but demonstrate different specific details and the image, and the corresponding weight was given.
textures. The purpose of fused images is to bridge the eκHij
detail gap caused by modal heterogeneity while preserving wij = P kH (10)
i,j e
ij
the complex texture details. Through feature visualization
experiments, an optimal texture loss function is determined in where κ is the adjustment coefficient, balancing the ratio
this paper. This function preserves more texture information between Hij .
by fusing different modes of the image. At the same In similarity calculation, a mask is considered for the tumor
time, to reduce the loss of image details, the loss function region to ensure that critical information is covered within a
introduces the Canny operator to depict the subtle differences small receptive field range. Therefore, this paper designs a
in the texture. With these considerations in mind, texture loss mask region similarity loss function:
was formulated to encourage fused images to contain richer 1 XI XJ
texture information. Mathematically defined as: LM −SSIM = 1 − SSIM (wij IMij ,Gj ) (11)
IJ i=1 j=1
1 IMij = Ii ⊙ Mask j (12)
Ltexture = ∥|∇G| − max(|∇Ii |)∥1 (8)
HW Gj = G ⊙ Mask j (13)
where ∇ denotes the canny operator and ∥.∥1 denotes the loss
Here, i = 1, 2, . . . . . . ,I ; j = 1, 2, . . . . . . ,J ; I =
of L1 .
4;J = 4.⊙ denotes element-wise multiplication, and Mask j
represents masks of different tumor regions, with Mask 1
2) STRUCTURE LOSS
indicating the normal tissue area.
The Structural Similarity (SSIM) [37] metric is commonly
employed to impose structural constraints, ensuring that the
3) SALIENT LOSS
fusion results encompass adequate structural details. The
To better meet the human vision and realize the significant
SSIM applies a structural similarity index measurement to
presentation of different tumor parts in the fusion image, this
constrain the resemblance between the fusion image and the
paper increased the contrast between different tissues of the
source images. Mathematically defined as:
tumor to make the tumor salient. Therefore, we introduce the
(2µx µy + C1 )(2σxy + C2 ) Salient Loss term:
SSIM (x, y) = (9)
(µ2x+ µ2y + C1 )(σx2 + σy2 + C2 )  ′
1 X4 R Gj − R(Gj )
Lsalient = 1− (14)
where µ and σ are the mean and variance operations, σxy 3 j=2 R(G)
denotes the covariance, C1 and C2 are two constants.  sum(Gj )
Entropy is used to measure the richness of image informa- R Gj = (15)
tion. From the information theory perspective, regions with sum(Mask j )
richer texture details have higher information content and G′j denotes the rest of the image region of G except Gj .
entropy. It is worth noting that although both entropy and
gradient methods can evaluate texture richness, the entropy
  sum(G′j )
R G′j = (16)
method is more advantageous than the gradient method. The sum(A − Mask j )
imaging gradients showed varying degrees of response in Here, A is an all-ones matrix of size H × W .
different modes. For example, on T2WI, there is a strong
gradient in the tumor region at the edge of the tumor, while 4) TOTAL LOSS
the gradient in the other areas is sparse and more minor. T1WI
Total loss is calculated by the following formulae:
showed a slight tumor response with a weak gradient. T1Gd
showed marked intratumoral enhancement with a relatively Ltotal = αL texture + βLM −SSIM + ηL salient (17)

70858 VOLUME 12, 2024


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

where α, β, η is the balance coefficient, which is used to similarities between source and fused images in terms of
control the proportion of the influence of texture information, correlation, luminance, and contrast distortion. Higher SSIM
structure information, and saliency information on the fusion values indicate lower structural loss and distortion. PSNR
result. represents the ratio of peak value power to noise power
in the fused image, where higher PSNR values signify
IV. RESULTS AND DISCUSSION closer proximity to the source images. EN quantifies image
This section presents a comparative analysis of TIEF and information, where greater information entropy signifies
several state-of-the-art methods using multimodal med- richer knowledge in the fused image. AG measures grayscale
ical images. These experiments involve qualitative and changes across image boundaries, indicating image sharpness
quantitative comparisons using publicly available datasets. and detail contrast. Higher AG values indicate better fusion
In addition, we performed ablation and generalization studies performance. SF gauges row and column frequency in
to delve into the performance and components of the method. the fused image, reflecting image texture and edge detail
richness. CI signifies contrast between foreground and
A. DATASETS
background, aiding in differentiating diseased and normal
tissue areas visually. Higher CI values improve tumor
The fusion effect of TIEF was verified by using several
visibility. MI assesses image intensity similarity between
different multimodal medical datasets, and the generalization
source and fused images, while QAB/F measures edge
effect of TIEF was verified by using one multimodal medical
information similarity. Greater MI and QAB/F values denote
dataset of various diseases. The BraTs2019 dataset [38]
superior fusion performance.
consists of 335 cases, each containing four MRI sequences
(FLAIR, T1WI, T1Gd, T2WI) and tag sequences that outline
the tumor core, post-enhancement tumor, edema, and the C. EXPERIMENTAL DETAILS
entire tumor region. These labels helped we make masks. The epoch count was set at 320, employing an initial learning
A fusion experiment was performed on RGB multimodal rate of 0.0005 with exponential decay and using the Adam
medical images from the neoplastic disease (brain tumor) optimizer. Each batch size is set to 32. BraTs2019 dataset
dataset in AANLIB [39] to verify the fusion performance. cases, comprising FLAIR, T1WI, T1Gd, and T2WI multi-
This dataset included SPECT-T1WI, GAD, and T2WI modal MRI images, were aligned to FLAIR modality and
images. Notably, the BraTs2019 dataset consists of gray- resized to 128 × 128 × 32. AANLIB dataset cases, which
scale images, while the SPECT-T1WI images in the AANLIB encompassed T2WI, GAD, and SPECT-T1WI multi-modal
dataset are in RGB format. Medical image fusion of CT MRI images, were aligned to GAD modality and resized to
and MRI was performed using meningioma data from the 128 × 128 × 32. For hyperparameters in Eq. (10), κ was set to
ANNLIB dataset to verify the model’s generalization. 0.25. In Eq. (17), α, β, and η were set to 0.3, 0.4, and 0.3. All
experiments were conducted using PyTorch on a Windows
B. COMPARSION METHODS AND EVALUATION workstation equipped with an Intel®Core™i9-10900X CPU
INDICATORS and an NVIDIA Geforce GTX Titan A100 GPU.
We compared the proposed TIEF with a comprehensive
set of established image fusion methods used in the field. D. RESULT
This comparison included traditional methods such as CBF 1) BRATS2019 MULTI-MODAL MRI FUSION
(2015) [40] and MGFF (2019) [41], alongside contemporary The fusion results on the BraTs2019 dataset are shown
techniques like U2Fusion (2020) [42], EMFusion (2021) in Fig 8, showcasing outcomes from three experiments,
[43], and SeAFusion (2022) [44], which are CNN-based each involving four distinct MRI image modalities: FLAIR,
fusion approaches. Additionally, we evaluate the proposed T1 WI, T1Gd, and T2WI. We selected three specific images
technique against GAN-based methodologies such as Fusion- that highlight variations in modalities regarding information
GAN (2019) [45], DDcGAN (2020) [46], and GANMcC richness and imaging quality, notably the image quality of
(2020) [47]. Furthermore, the performance of the proposed T1WI. Each mode encompasses distinct details about the
TIEF was assessed against recent Transformer-based fusion tumor, resulting in noticeable differences.
methods, including SwinFusion (2022) [48], MRSCFusion The fusion results on the BraTs2019 dataset are shown
(2023) [49] and DesTrans (2024) [50]. in Fig. 8, and each experiment involved four different
Further, for quantitative comparison, we utilized eight MRI image modalities: FLAIR, T1WI, T1Gd, and T2WI.
metrics to assess fusion performance across all models We selected three images that showed differences in
presented in this study. These metrics included average information richness and imaging quality. Combined with
gradient (AG) [51], spatial frequency (SF) [52], entropy Fig. 8, it can be found that the fusion results of CBF,
(EN) [53], mutual information (MI) [54], peak signal-to- FusionGAN, DDcGAN, GANMcC, and MGFF lose more
noise ratio (PSNR) [55], structural similarity index measure details of the source map, which reduces the identifiability
(SSIM) [56], gradient-based fusion performance (QAB/F ) of structural information and makes the image unclear.
[39], and contrast index (CI) [57]. SSIM evaluates structural In contrast, TIEF preserves the critical information, modal

VOLUME 12, 2024 70859


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

TABLE 1. Quantitative comparison of different methods for 8 evaluation items indicators in the Brats2019 dataset (Red: Optimal, Blue: Suboptimal).

structural details, texture details for each modality, and differences, and the perception result of a region will be
pixel intensities in the fusion results, thus enhancing clarity, affected by the brightness of its neighboring areas. Just as
structure, and texture. This advantage is because the adopted TIEF expects the brightness of different brain lesion tissues to
method enhances the extraction and transfer of structural be significantly different in fused images. Therefore, PSNR
information, ensuring that the extraction and retention of is inconsistent with human subjective feeling, and PSNR is
source image knowledge is more comprehensive than other suboptimal in several algorithms. The MI metric of TIEF
techniques. TIEF shares information in the fusion block and is suboptimal because the information is converted from
the loss function part. Therefore, when a particular mode the source map to the fusion map, and theoretically, the
image quality is low and the texture is unclear, the fusion amount of information should remain the same. However, the
result minimizes the interference of the low-quality mode, fusion images obtained by TIEF emphasize the advantages of
strengthens the information of other modes, and prevents different modes, and other lesions are clearly distinguished in
the loss of edge texture cues. On the contrary, in the third the fusion images. The fused image enhanced the individual
line of the experiment, the fused images of other methods and cooperative information, reducing the interference of
were significantly affected by the T1WI mode, resulting in redundant information. So, although the overall mutual infor-
a decrease in image quality. In terms of tumor details, while mation decreases from a macro perspective, the synergy of
EMFusion, SeAFusion, and U2Fusion retain information information increases. So, there is a specific reduction in MI.
from a variety of patterns representing different tumor tissues,
like SwinFusion, MRSCFusion, and DesTrans, in their fusion
results, The boundaries of the tumor core, post-enhancement 2) EXTENSION TO SPECT AND MRI FUSION
tumor, and edema were not apparent. In contrast, TIEF TIEF was experimentally performed on SPECT and MRI
delineates the pixels of different tumor tissues in the loss images within the AANLIB dataset to further demonstrate
function, which ensures a more obvious distinction between the generality of the proposed method. Seventy pairs of multi-
tumor core, enhanced tumor, and edema in the final fusion model MRI/SPECT images were used for training, resampled
result. to 128 × 128, of which 5 pairs were used for testing. Since
The qualitative fusion results are shown in Table 1. The the SPECT images are RGB, the investigators converted
comparative analysis with the other 11 methods showed them to YUV and extracted the Y-channel for fusion with
that TIEF had better EN, SF, AG, and SSIM scores, the TIEF single-channel grayscale MRI images. The output
indicating that the fusion image quality was higher, and the y-component is the basis for the fused image, which is
multimodal image feature information was better preserved. then converted back to the RGB of the final image. T2WI
In addition, TIEF obtained the best QAB/F and CI scores, images have rich texture details, and in GAD and SPECT-
indicating reduced distortion, improved visual quality, and T1WI images, there is a clear contrast between the pixel
good agreement with human visual perception. The PSNR values of normal tissue and the diseased area. Therefore,
index of TIEF is suboptimal. The reason is that PSNR is the the adopted fusion evaluation criteria prioritize preserving
most common and widely used objective image evaluation precise texture details, structural information, and pixel con-
index, but it is different from human visual characteristics. trast within the lesion area. The evaluation results are shown
The human eye has a high sensitivity to luminance contrast in Table 2 and Fig. 9.

70860 VOLUME 12, 2024


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

FIGURE 8. On the three typical image pairs of the Brats2019 dataset. a, b, and c correspond to three typical images. The fusion results
obtained by CBF, MGFF, FusionGAN, DDcGAN, GANMcC, EMFusion, SeAFusion, U2Fusion, SwinFusion, MRSCFusion, DesTran and TIEF are
shown in order. The enlarged section in the bottom corner provides a more detailed comparison.

The results showed that CBF introduced noise and reduced and U2Fusion images are blurred, lack texture detail, and
image quality. MGFF, FusionGAN, GANMcC, SwinFusion, have low pixel intensity and contrast in critical areas.

VOLUME 12, 2024 70861


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

FIGURE 8. (Continued.) On the three typical image pairs of the Brats2019 dataset. a, b, and c correspond to three typical images. The fusion results
obtained by CBF, MGFF, FusionGAN, DDcGAN, GANMcC, EMFusion, SeAFusion, U2Fusion, SwinFusion, MRSCFusion, DesTran and TIEF are shown in
order. The enlarged section in the bottom corner provides a more detailed comparison.

TABLE 2. Quantitative comparison of different methods for 8 evaluation items indicators in the AANLIB dataset (Red: Optimal, Blue: Suboptimal).

SeAFusion retained relatively rich texture and structural contrast. Compared with other methods, TIEF effectively
information, but the overall image was dark, probably due preserves the intricate texture of the source image and
to the ubiquitous black background in SPECT-T1WI, which the color information of RGB, which is more suitable
affected the pixel intensity of the final image. MRSCFusion, for human visual perception. Based on the quantitative
DDcGAN, EMFusion, and DesTrans retain more comprehen- analysis, TIEF obtained the optimal EN, SF, QAB/F , CI,
sive source image information but require enhanced detail AG, and the suboptimal MI, PSNR, and SSIM in the

70862 VOLUME 12, 2024


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

FIGURE 9. On the three typical image pairs of the ANNLIB dataset. a, b, and c correspond to three typical images. The fusion results obtained by CBF,
MGFF, FusionGAN, DDcGAN, GANMcC, EMFusion, SeAFusion, U2Fusion, SwinFusion, MRSCFusion, DesTran and TIEF are shown in order. The enlarged
section in the bottom corner provides a more detailed comparison.

test images. The main reasons for suboptimal MI and is higher than that of TIEF 0.012. However, from the
PSNR are the same as the BraTs2019 dataset, but the visual point of view, the gradient change inside the tumor
main reason for suboptimal SSIM is that the SeAFusion of TIEF is more prominent, while there is no gradient
network takes fine-grained details into account when it is change inside the tumor of SeAFusion, which will have a
constructed, and the SeAFusion network does not use any particular impact on the localization of the cancer, and this
downsampling, which indicates that SEAFusion keeps more result is further verified in the downstream segmentation
similar information. This is why the SSIM of SeAFusion task.

VOLUME 12, 2024 70863


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

FIGURE 10. Qualitative comparison of three typical image pairs in the BraTs2019 dataset to validate the
effect of different blocks. From left to right, the sequence comprises FLAIR, T1WI, T1Gd, and T2WI images,
followed by the fusion results of our method, fusion results without the fusion block, fusion results without
the feature selection block, and fusion results without the feature loss (M-SSIM) block.

TABLE 3. Quantitative results obtained with a combination of different blocks in the Brats2019 dataset (Red: Optimal, Blue: Suboptimal).

FIGURE 11. Qualitative comparison of three typical image pairs in the AANLIB dataset to validate the
effect of different blocks. The sequence from left to right includes T2WI, GAD, and SPECT-T1WI images,
followed by the fusion results of our method, fusion results without the fusion block, fusion results
without the feature selection block, and fusion results without the feature loss (M-SSIM) block.

3) ABLATION STUDY embody diverse modalities, each showcasing distinct tumor-


To assess the impact of various blocks on the model’s related information. The primary aim was to ensure that the
efficacy, we conducted experiments on both the BraTs2019 fusion results maintained the intricate details from the source
and AANLIB test datasets. Three representative images were maps while accentuating discrepancies among tumor regions.
selected for these experimental evaluations. These images The adopted ablation experiments encompassed different

70864 VOLUME 12, 2024


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

TABLE 4. Quantitative results obtained with a combination of different blocks in the AANLIB dataset (Red: Optimal, Blue: Suboptimal).

TABLE 5. Segmentation task results.

combinations of the loss function, feature selection, and fusion results maintained the intricate details from the source
fusion blocks. The ‘‘Baseline’’ scenario denotes training maps while accentuating discrepancies among tumor regions.
the network without any additional blocks. Observing the The adopted ablation experiments encompassed different
outcomes (Fig. 10), in the absence of the fusion block, combinations of the loss function, feature selection, and
although the fused image retains part of the texture and fusion blocks. The ‘‘Baseline’’ scenario denotes training the
structural information and the tumor area is also apparent, network without any additional blocks. Observing the out-
there is a significant deviation from the actual image. This comes (Fig. 10), in the absence of the fusion block, although
bias leads to substantial information loss, contrary to human the fused image retains part of the texture and structural
visual perception. Comparing the results across experiments information and the tumor area is also apparent, there is a
(Table. 3), the addition of the fusion block enhances various significant deviation from the actual image. This bias leads to
performance metrics such as QAB/F , EN, SF, MI, and PSNR, substantial information loss, contrary to human visual percep-
while showing suboptimal performance in AG, CI, and SSIM tion. Comparing the results across experiments (Table. 3), the
metrics. addition of the fusion block enhances various performance
metrics such as QAB/F , EN, SF, MI, and PSNR, while show-
4) ABLATION STUDY ing suboptimal performance in AG, CI, and SSIM metrics.
To assess the impact of various blocks on the model’s
efficacy, we conducted experiments on both the BraTs2019
and AANLIB test datasets. Three representative images were 5) ABLATION STUDY
selected for these experimental evaluations. These images To assess the impact of various blocks on the model’s
embody diverse modalities, each showcasing distinct tumor- efficacy, we conducted experiments on both the BraTs2019
related information. The primary aim was to ensure that the and AANLIB test datasets. Three representative images were

VOLUME 12, 2024 70865


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

TABLE 6. Quantitative comparison of different methods for 8 evaluation items indicators in the AANLIB meningioma dataset.

selected for these experimental evaluations. These images of adding loss function blocks to the experiment is inves-
embody diverse modalities, each showcasing distinct tumor- tigated. The addition of this block significantly improved
related information. The primary aim was to ensure that the EN, AG, CI, MI, PSNR, and other evaluation indicators.
fusion results maintained the intricate details from the source Comparing the TIEF method with or without the addition of
maps while accentuating discrepancies among tumor regions. the loss function block confirmed its importance in enhancing
The adopted ablation experiments encompassed different the contrast between different tumor tissues. The figure shows
combinations of the loss function, feature selection, and that the paired comparison of the fusion results could be
fusion blocks. The ‘‘Baseline’’ scenario denotes training better without the loss function, highlighting the superiority
the network without any additional blocks. Observing the of TIEF.
outcomes (Fig. 10), in the absence of the fusion block, The same test was conducted on the AANLIB dataset.
although the fused image retains part of the texture and Fig. 11 shows that the experimental results without the
structural information and the tumor area is also apparent, fusion block deviate from the image’s source and distort
there is a significant deviation from the actual image. This the color part. The absence of the feature selection block
bias leads to substantial information loss, contrary to human in the experiments leads to a noticeable lack of detailed
visual perception. Comparing the results across experiments information, resulting in blurred images. Similarly, when
(Table. 3), the addition of the fusion block enhances various the experiments lacked a loss function block, the contrast
performance metrics such as QAB/F , EN, SF, MI, and PSNR, in the front and back scenes was notably diminished.
while showing suboptimal performance in AG, CI, and SSIM Table. 4 also illustrates the different effects of the fusion,
metrics. feature selection, and loss function blocks from different
The fusion block better integrated the original image perspectives.
information and fused imaging. This underscores its critical
role in producing high-quality fusion results. In this study,
the baseline approach of feature selection blocks to enhance 6) DOWNSTREAM TASK VALIDATION
fusion results was compared with TIEF, eliminating the effect To verify the model’s effectiveness further, we connected
of feature selection blocks. For example, although the image’s the segmentation network (no-new-Net) nn-Unet after 11
edge texture information from the first row of T2WI is very comparison methods and TIEF to further verify the model’s
prominent, this detail must be accurately represented in the effectiveness through the effect of segmentation. The
fusion results. The advantages of using feature selection obtained segmentation results are shown in Table 5. Dice
blocks become more apparent through our comparison. The represents the similarity of the two samples, ranging from 0
baseline method with a feature selection block significantly to 1, and the closer to 1, the better the segmentation effect.
improved the evaluation indicators compared with the base- HD95 indicates the degree of overlap of the boundaries, and
line method alone. From the comparative analysis in the table, smaller values represent better segmentation. ET represents
the lack of feature selection reduces the information richness the enhancing tumor, TC represents the tumor core, and WT
of the source map in the fusion results. The experimental represents the whole tumor. Through the segmentation results
results show that the feature selection block dramatically of enhancing tumor, tumor core, and the whole tumor, it was
improves the network’s fusion effect. In addition, the impact found that TIEF had the largest Dice and the lowest HD95,

70866 VOLUME 12, 2024


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

E. HYPER PARAMETERS COMPARISON


In Eq. (17), α, β, η is the balance coefficient (η = 1−α −
β, α ̸= 0, β ̸= 0, η ̸= 0). In this paper, we set the
value of α = {0.1,0.2,0.3,0.4,0.5,0.7} and the value of
β = {0.1,0.2,0.3,0.4,0.5,0.7} for experimentation. When
α = 0.6 or β = 0.6, the loss function fails to
converge. Each combination was assessed based on various
evaluation indices, and the values were recorded and ranked
in descending order. The best result is when α = 0.3,β =
0.4, η = 1−α − β = 0.3. Observing the loss function of
the pre-experiment, it is found that TIEF begins to decline
FIGURE 12. The parameters are selected as grid plots. Values represent smoothly at an epoch equal to 200 and reaches convergence
index ranking.
around an epoch equal to 320, so epoch 320 is selected
for the formal experiment in this paper. In this paper,
κ is the adjustment coefficient, which is used to control
the proportional change of information entropy of different
modes in weight calculation, to ensure the best fusion effect
of the model. κ mainly plays a key role in Eq. (10) and
Eq. (11), which correspond to the two evaluation indexes EN
and SSIM respectively. The curve of EN and SSIM with the
change of κ value showed that when κ was 0.25, both EN and
SSIM reached the maximum.

FIGURE 13. Loss function curve. V. CONCLUSION


This paper proposed a multi-modal MRI image fusion
method, emphasizing the description of the edema part,
enhanced tumor, and necrotic tumor core in different modal-
ities to generate a fusion image with rich texture information
and clear structure. In the coding region, based on denseness,
we adopted the parallel double-branch design of deep
feature extraction and structural feature extraction to maintain
the balance between structural information and functional
information to better extract and transmit the knowledge
FIGURE 14. Curves for changes in age-adjusted coefficients of EN and of the source image. The feature information measurement
SSIM. method based on information entropy was introduced, and
the feature channels containing rich texture information and
complex structure information were selected to fuse with the
indicating that the segmentation effect of TIEF-based image deep features to enhance the richness of the fused image
fusion was the best, indicating that the fused image of TIEF information. The content richness of the source image of
was of high quality. different modes was used as the weight and combined with
the regional structural similarity index and regional contrast,
7) GENERALIZATION STUDY the loss function was constructed to enhance the difference
To further verify the generalization of the model. We used between tumor tissues. The transformer block with an
meningioma CT and MRI data from the ANNLIB data set. attention mechanism replaced the manually designed fusion
Meningioma is a primary intracranial tumor, primarily benign strategy, and the cross-modal image features were fused.
and asymptomatic in the early stage. However, with the Experiments demonstrate that the linked images generated by
compression of the tumor, headache and epilepsy may occur, the TIEF model produce satisfactory results in multi-modal
and the loss of vision, hearing, and smell may occur in severe brain tumor image fusion tasks for both MMRI, SPECT-MRI
cases. Meningiomas grow between the human skull and brain and CT-MRI images. Qualitative and quantitative analysis
tissue, which differs from the growth location of glioma. verified the validity and generalization of TIEF.
Early screening and diagnosis of meningioma can prolong
the survival of patients. TIEF performed a fusion of CT and REFERENCES
MRI, and the results are shown in Table 6. By calculating the [1] D. Xiao, C. Yan, D. Li, T. Xi, X. Liu, D. Zhu, G. Huang, J. Xu, Z. He,
generalization results, it was found that the indexes reached A. Wu, C. Ma, J. Long, and K. Shu, ‘‘National brain tumour registry of
China (NBTRC) statistical report of primary brain tumours diagnosed
optimal, indicating that the generalization effect of TIEF was in China in years 2019–2020,’’ Lancet Regional Health-Western Pacific,
good. vol. 34, May 2023, Art. no. 100715.

VOLUME 12, 2024 70867


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

[2] T. M. Mack and M. Cockburn, Cancers in the Urban Environment. [26] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B.
Amsterdam, The Netherlands: Elsevier, 2004, pp. 5–8. Guo, ‘‘Swin transformer: Hierarchical vision transformer using shifted
[3] D. Ray et al., ‘‘Apoptosis reference block in biomedical sciences,’’ Biomed. windows,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021,
Sci., vol. 10, pp. 10–16, Dec. 2014. pp. 9992–10002.
[4] J. R. Foster et al., ‘‘Introduction to neoplasia,’’ in Comprehensive [27] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, ‘‘SwinIR:
Toxicology, vol. 14, Feb. 2017, pp. 1–10. Image restoration using Swin transformer,’’ in Proc. IEEE/CVF Int. Conf.
[5] J. Yokota, ‘‘Tumor progression and metastasis,’’ Carcinogenesis, vol. 21, Comput. Vis. Workshops (ICCVW), Oct. 2021, pp. 1833–1844.
no. 3, pp. 497–503, 2000. [28] J. Du, W. Li, B. Xiao, and Q. Nawaz, ‘‘Union Laplacian pyramid with
[6] S. J. Moon et al., ‘‘Tumors of the brain central nervous system cancer multiple features for medical image fusion,’’ Neurocomputing, vol. 194,
rehabilitation,’’ Tumors Brain, vol. 10, pp. 19–39, Feb. 2019. pp. 326–339, Jun. 2016.
[7] H. Sontheimer, ‘‘Infectious diseases of the nervous system,’’ 2015. [29] S. Li, H. Yin, and L. Fang, ‘‘Group-sparse representation with dictionary
learning for medical image denoising and fusion,’’ IEEE Trans. Biomed.
[8] N. Reynoso-Noverón et al., ‘‘Epidemiology of brain tumors,’’ in Principles
Eng., vol. 59, no. 12, pp. 3450–3459, Dec. 2012.
of Neuro-Oncology, vol. 7, Dec. 2020, pp. 15–25.
[30] S. Maqsood and U. Javed, ‘‘Multi-modal medical image fusion based
[9] S. Li, B. Yang, and J. Hu, ‘‘Performance comparison of different multi- on two-scale image decomposition and sparse representation,’’ Biomed.
resolution transforms for image fusion,’’ Inf. Fusion, vol. 12, no. 2, Signal Process. Control, vol. 57, Mar. 2020, Art. no. 101810.
pp. 74–84, Apr. 2011.
[31] C. Du and S. Gao, ‘‘Image segmentation-based multi-focus image fusion
[10] Y. Zheng, E. Blasch, and Z. Liu, Multispectral Image Fusion and through multi-scale convolutional neural network,’’ IEEE Access, vol. 5,
Colorization. Bellingham, WA, USA: SPIE Press, Mar. 2018. pp. 15750–15761, 2017.
[11] J. Jose, N. Gautam, M. Tiwari, T. Tiwari, A. Suresh, V. Sundararaj, and [32] Z. Wang, X. Li, H. Duan, X. Zhang, and H. Wang, ‘‘Multifocus image
R. Mr, ‘‘An image quality enhancement scheme employing adolescent fusion using convolutional neural networks in the discrete wavelet trans-
identity search algorithm in the NSST domain for multimodal medical form domain,’’ Multimedia Tools Appl., vol. 78, no. 24, pp. 34483–34512,
image fusion,’’ Biomed. Signal Process. Control, vol. 66, Apr. 2021, Aug. 2019.
Art. no. 102480.
[33] X. Guo, R. Nie, J. Cao, D. Zhou, L. Mei, and K. He, ‘‘FuseGAN:
[12] K. Zhang, Y. Huang, and C. Zhao, ‘‘Remote sensing image fusion Learning to fuse multi-focus image via conditional generative adversarial
via RPCA and adaptive PCNN in NSST domain,’’ Int. J. Wavelets, network,’’ IEEE Trans. Multimedia, vol. 21, no. 8, pp. 1982–1996,
Multiresolution Inf. Process., vol. 16, no. 5, Sep. 2018, Art. no. 1850037. Aug. 2019.
[13] J. Ma, C. Chen, C. Li, and J. Huang, ‘‘Infrared and visible image fusion [34] D. Rao, T. Xu, and X.-J. Wu, ‘‘TGFuse: An infrared and visible
via gradient transfer and total variation minimization,’’ Inf. Fusion, vol. 31, image fusion approach based on transformer and generative adversarial
pp. 100–109, Sep. 2016. network,’’ IEEE Trans. Image Process., early access, May 10, 2023, doi:
[14] W. Xu, Y.-L. Fu, H. Xu, and K. K. L. Wong, ‘‘Medical image fusion 10.1109/TIP.2023.3273451.
using enhanced cross-visual cortex model based on artificial selection and [35] Y. Fu, T. Xu, X. Wu, and J. Kittler, ‘‘PPT fusion: Pyramid patch
impulse-coupled neural network,’’ Comput. Methods Programs Biomed., transformerfor a case study in image fusion,’’ 2021, arXiv:2107.13967.
vol. 229, Feb. 2023, Art. no. 107304. [36] V. Vs, J. M. Jose Valanarasu, P. Oza, and V. M. Patel, ‘‘Image fusion
[15] A. R. Groves, C. F. Beckmann, S. M. Smith, and M. W. Woolrich, transformer,’’ in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2022,
‘‘Linked independent component analysis for multimodal data fusion,’’ pp. 3566–3570.
NeuroImage, vol. 54, no. 3, pp. 2198–2217, Feb. 2011. [37] L. Qu, S. Liu, M. Wang, and Z. Song, ‘‘Transmef: A transformer-based
[16] W. Huang, H. Zhang, H. Guo, W. Li, X. Quan, and Y. Zhang, ‘‘ADDNS: multi-exposure image fusion framework using self-supervised multi-
An asymmetric dual deep network with sharing mechanism for medical task learning,’’ in Proc. AAAI Conf. Artif. Intell., 2022, vol. 36, no. 2,
image fusion of CT and MR-T2,’’ Comput. Biol. Med., vol. 166, Nov. 2023, pp. 2126–2134.
Art. no. 107531. [38] D. P. Bavirisetti, G. Xiao, J. Zhao, R. Dhuli, and G. Liu, ‘‘Multi-
[17] B. Zhan, D. Li, X. Wu, J. Zhou, and Y. Wang, ‘‘Multi-modal MRI image scale guided image and video fusion: A fast and efficient approach,’’
synthesis via GAN with multi-scale gate mergence,’’ IEEE J. Biomed. Circuits, Syst., Signal Process., vol. 38, no. 12, pp. 5576–5605,
Health Informat., vol. 26, no. 1, pp. 17–26, Jan. 2022. May 2019.
[18] J. Huang, Z. Le, Y. Ma, F. Fan, H. Zhang, and L. Yang, ‘‘MGMDc- [39] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, ‘‘Image quality
GAN: Medical image fusion using multi-generator multi-discriminator assessment: From error visibility to structural similarity,’’ IEEE Trans.
conditional generative adversarial network,’’ IEEE Access, vol. 8, Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
pp. 55145–55157, 2020. [40] U. Baid, S. Ghodasara, S. Mohan, M. Bilello, E. Calabrese, E. Colak,
[19] T. Zhou, Q. Li, H. Lu, Q. Cheng, and X. Zhang, ‘‘GAN review: Models K. Farahani, J. Kalpathy-Cramer, F. C. Kitamura, S. Pati, and
and medical image fusion applications,’’ Inf. Fusion, vol. 91, pp. 134–148, L. M. Prevedello, ‘‘The RSNA-ASNR-MICCAI BraTS 2021 benchmark
Mar. 2023. on brain tumor segmentation and radiogenomic classification,’’ 2021,
[20] M. Jiang, M. Zhi, L. Wei, X. Yang, J. Zhang, Y. Li, P. Wang, J. Huang, and arXiv:2107.02314.
G. Yang, ‘‘FA-GAN: Fused attentive generative adversarial networks for [41] D. Summers, ‘‘Harvard whole brain atlas,’’ J. Neurol. Neurosurgery
MRI image super-resolution,’’ Computerized Med. Imag. Graph., vol. 92, Psychiatry, vol. 73, no. 3, p. 288, 2003.
Sep. 2021, Art. no. 101969. [42] B. K. S. Kumar, ‘‘Image fusion based on pixel significance using cross
[21] L. Wang, C. Chang, B. Hao, and C. Liu, ‘‘Multi-modal medical image bilateral filter,’’ Signal, Image Video Process., vol. 9, no. 5, pp. 1193–1204,
fusion based on GAN and the shift-invariant shearlet transform,’’ in Proc. Jul. 2015.
IEEE Int. Conf. Bioinf. Biomed. (BIBM), Dec. 2020, pp. 2538–2543. [43] H. Xu, J. Ma, J. Jiang, X. Guo, and H. Ling, ‘‘U2Fusion: A unified
[22] J. Kang, W. Lu, and W. Zhang, ‘‘Fusion of brain PET and MRI images unsupervised image fusion network,’’ IEEE Trans. Pattern Anal. Mach.
using tissue-aware conditional generative adversarial network with joint Intell., vol. 44, no. 1, pp. 502–518, Jan. 2022.
loss,’’ IEEE Access, vol. 8, pp. 6368–6378, 2020. [44] H. Xu and J. Ma, ‘‘EMFusion: An unsupervised enhanced medical image
[23] K. Guo, X. Hu, and X. Li, ‘‘MMFGAN: A novel multimodal brain fusion network,’’ Inf. Fusion, vol. 76, pp. 177–186, Dec. 2021.
medical image fusion based on the improvement of generative adversarial [45] L. Tang, J. Yuan, and J. Ma, ‘‘Image fusion in the loop of high-level
network,’’ Multimedia Tools Appl., vol. 81, no. 4, pp. 5889–5927, vision tasks: A semantic-aware real-time infrared and visible image fusion
Feb. 2022. network,’’ Inf. Fusion, vol. 82, pp. 28–42, Jun. 2022.
[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, and L. Jones, ‘‘Attention [46] J. Ma, W. Yu, P. Liang, C. Li, and J. Jiang, ‘‘FusionGAN: A generative
is all you need,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, adversarial network for infrared and visible image fusion,’’ Inf. Fusion,
pp. 1–13. vol. 48, pp. 11–26, Aug. 2019.
[25] W. Li, Y. Zhang, G. Wang, Y. Huang, and R. Li, ‘‘DFENet: A dual-branch [47] J. Ma, H. Xu, J. Jiang, X. Mei, and X.-P. Zhang, ‘‘DDcGAN:
feature enhanced network integrating transformers and convolutional A dual-discriminator conditional generative adversarial network for
feature learning for multimodal medical image fusion,’’ Biomed. Signal multi-resolution image fusion,’’ IEEE Trans. Image Process., vol. 29,
Process. Control, vol. 80, Feb. 2023, Art. no. 104402. pp. 4980–4995, 2020.

70868 VOLUME 12, 2024


Y. Zhou et al.: Multimodal Medical Image Fusion Network Based on Target Information Enhancement

[48] J. Ma, H. Zhang, Z. Shao, P. Liang, and H. Xu, ‘‘GANMcC: A generative XUEMEI YANG received the Bachelor of Science
adversarial network with multiclassification constraints for infrared degree in mathematics from Minzu University of
and visible image fusion,’’ IEEE Trans. Instrum. Meas., vol. 70, China, in 2015, and the Master of Science degree
pp. 1–14, 2021. in statistics from North China Electric Power
[49] J. Ma, L. Tang, F. Fan, J. Huang, X. Mei, and Y. Ma, ‘‘SwinFusion: University, in 2019. She is currently pursuing the
Cross-domain long-range learning for general image fusion via Swin degree in computational mathematics with China
transformer,’’ IEEE/CAA J. Autom. Sinica, vol. 9, no. 7, pp. 1200–1217, Academy of Engineering Physics. Her research
Jul. 2022.
interest includes multimodal information fusion.
[50] X. Xie, X. Zhang, S. Ye, D. Xiong, L. Ouyang, B. Yang, H. Zhou, and
Y. Wan, ‘‘MRSCFusion: Joint residual Swin transformer and multiscale
CNN for unsupervised multimodal medical image fusion,’’ IEEE Trans.
Instrum. Meas., vol. 72, pp. 1–17, 2023.
[51] Y. Song, Y. Dai, W. Liu, Y. Liu, X. Liu, Q. Yu, X. Liu, N. Que, and
M. Li, ‘‘DesTrans: A medical image fusion method based on transformer
and improved DenseNet,’’ Comput. Biol. Med., vol. 174, May 2024,
Art. no. 108463.
[52] G. Cui, H. Feng, Z. Xu, Q. Li, and Y. Chen, ‘‘Detail preserved fusion
of visible and infrared images using regional saliency extraction and SHIQI LIU received the Bachelor of Science
multi-scale image decomposition,’’ Opt. Commun., vol. 341, pp. 199–209, degree in mathematics and applied mathematics
Apr. 2015. from Beihang University, in 2021. She is cur-
[53] N. Yu, T. Qiu, F. Bi, and A. Wang, ‘‘Image features extraction and fusion rently pursuing the Ph.D. degree in computational
based on joint sparse representation,’’ IEEE J. Sel. Topics Signal Process., mathematics with China Academy of Engineering
vol. 5, no. 5, pp. 1074–1082, Sep. 2011. Physics. Her research interests include biomedical
[54] J. Van Aardt, ‘‘Assessment of image fusion procedures using entropy, signal processing, time series analysis, and deep
image quality, and multispectral classification,’’ J. Appl. Remote Sens., learning.
vol. 2, no. 1, May 2008, Art. no. 023522.
[55] G. Qu, D. Zhang, and P. Yan, ‘‘Information measure for performance of
image fusion,’’ Electron. Lett., vol. 38, no. 7, p. 313, 2002.
[56] A. R. Alankrita, A. Shrivastava, and V. Bhateja, ‘‘Contrast improvement
of cerebral MRI features using combination of non-linear enhancement
operator and morphological filter,’’ IEEE J. Sel. Topics Signal Process.,
vol. 4, pp. 182–187, 2011.
[57] C. S. Xydeas and V. Petrovic, ‘‘Objective image fusion performance
measure,’’ Electron. Lett., vol. 36, no. 4, pp. 308–309, 2000. JUNPING YIN received the Bachelor of Science
and the Master of Science degrees from the School
of Mathematics and Statistics, Northeast Normal
University, in 2002 and 2005, respectively, and
the Doctor of Science degree from the School
YUTING ZHOU received the bachelor’s degree of Mathematics, Xiamen University, in 2008.
in applied statistics from Southern Medical Uni- He is currently a Researcher with the Institute of
versity, in 2019, and the master’s degree in Applied Physics and Computational Mathematics,
applied statistics from Northeast Normal Univer- Beijing, and the President of Shanghai Zhangjiang
sity, in 2021. She is currently pursuing the Ph.D. Institute of Mathematics. He has been engaged in
degree in computational mathematics with China data science and applied mathematics research for a long time and presided
Academy of Engineering Physics. Her current over more than 20 major projects of the National Natural Science Foundation.
research interests include image processing, com- He has published more than 30 papers, more than ten patents and soft books,
puter vision, and information fusion. and one monograph.

VOLUME 12, 2024 70869

You might also like