6328 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 72, NO. 10, OCTOBER 2024

Contrastive Learning-Based Semantic Communications

Shunpu Tang, Qianqian Yang, Member, IEEE, Lisheng Fan, Xianfu Lei, Member, IEEE, Arumugam Nallanathan, Fellow, IEEE, and George K. Karagiannidis, Fellow, IEEE

Abstract: Recently, there has been a growing interest in learning-based semantic communication because it can prioritize the preservation of meaningful semantic information over the accuracy of the transmitted symbols, resulting in improved communication efficiency. However, existing learning-based approaches still face limitations in defining a semantic-level loss and often struggle to find a good trade-off between preserving semantic information and preserving intricate details. In addition, the existing semantic communication approaches cannot effectively train semantic encoders and decoders without the support of downstream models. To address these limitations, this paper proposes a contrastive learning (CL)-based semantic communication system. First, inspired by practical observations, we introduce the concept of semantic contrastive loss and propose a semantic contrastive coding (SemCC) approach that treats data corruption during transmission as a form of data augmentation within the CL framework. Moreover, we propose a semantic re-encoding (SemRE) operation, which uses a duplicate of the semantic encoder deployed at the receiver to guide the entire training process when the downstream model is inaccessible. Further, we design the training procedure for the SemCC and SemRE approaches, respectively, to balance the semantic information and intricate details. Finally, simulations are performed to demonstrate the superiority of the proposed approaches over competing approaches. In particular, our approaches achieve a significant accuracy improvement of up to 53% on the CIFAR-10 dataset with a bandwidth compression ratio of 1/24, and also obtain comparable image reconstruction quality as the bandwidth compression ratio is improved.

Index Terms: Semantic communication, contrastive learning, joint source-channel coding, image transmission.

Manuscript received 9 November 2023; revised 8 March 2024 and 3 May 2024; accepted 4 May 2024. Date of publication 14 May 2024; date of current version 18 October 2024. This work was supported by the NSFC (No. U23A20273). An earlier version of this paper was presented in part at the IEEE 98th Vehicular Technology Conference (VTC-Fall), Hong Kong, October 2023 [DOI: 10.1109/VTC2023-Fall60731.2023.10333392]. The associate editor coordinating the review of this article and approving it for publication was C. Shen. (Corresponding author: Lisheng Fan.)
Shunpu Tang is with the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou 510006, China, and also with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China (e-mail: [email protected]).
Qianqian Yang is with the College of Information Science and Electronic Engineering and the Key Laboratory of Collaborative Sensing and Autonomous Unmanned Systems of Zhejiang Province, Zhejiang University, Hangzhou 310027, China (e-mail: [email protected]).
Lisheng Fan is with the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou 510006, China (e-mail: [email protected]).
Xianfu Lei is with the Institute of Mobile Communications, Southwest Jiaotong University, Chengdu 610031, China (e-mail: [email protected]).
Arumugam Nallanathan is with the School of Electronic Engineering and Computer Science, Queen Mary University of London, E1 4NS London, U.K. (e-mail: [email protected]).
George K. Karagiannidis is with the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece, and also with the Artificial Intelligence and Cyber Systems Research Center, Lebanese American University (LAU), Beirut 03797751, Lebanon (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCOMM.2024.3400912.
Digital Object Identifier 10.1109/TCOMM.2024.3400912
0090-6778 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

A. Backgrounds

THE goal of digital communication systems has been to reliably transmit bits through noisy channels, which is typically categorized as the technical level of communication. The classical information theory proposed by Shannon [2] provided the fundamental principle for achieving this goal and introduced the concept of channel capacity to provide a theoretical upper bound on the data rate that ensures error-free transmission. Researchers have made considerable efforts to approximate the channel capacity by developing advanced channel coding techniques such as low-density parity check (LDPC) [3] and polar codes [4] in the 5G New Radio (NR). While modern digital communication systems based on these approaches have achieved remarkable progress, they do not explicitly consider the underlying meaning of the transmitted data and therefore treat all bits as equal, which may lead to challenges in future applications.

In the upcoming Beyond 5G (B5G) and 6G networks, a large number of Internet of Things (IoT) devices will be deployed and various types of multimedia data will be transmitted to support novel applications such as smart cities, automated driving, virtual reality (VR), and augmented reality (AR) [5], [6], [7]. However, the large number of connections, data transfer requirements, and ultra-low latency demands will place a significant burden on the network infrastructure. To address these challenges, researchers are shifting their focus to improving communication efficiency within the constraints of available channel capacity. In this direction, the importance and meaning behind the transmitted data are taken into account in the system design, and the concept of semantic communication has attracted increasing attention [8], [9], [10], [11]. Semantic communication operates at the semantic and effectiveness levels of communication, and aims to prioritize the preservation of meaningful semantic information over the accuracy of transmitted symbols, leading to improved communication efficiency by transmitting only necessary information relevant to the specific task at the receiver. These characteristics of semantic communication


can also better meet the requirements of the aforementioned applications in B5G and 6G networks.

However, a major challenge in semantic communication is how to effectively extract semantic information at the transmitter while accurately reconstructing it at the receiver under constrained communication conditions. While recent efforts used advanced deep learning technologies in semantic communication systems, there are still some issues that need to be addressed, which are discussed below.

1) How to Evaluate the Loss of Semantic Level During the Training Process: While the performance of a semantic communication system can be effectively evaluated by the downstream task, it is crucial to note that the direct use of the loss functions of the downstream task, such as the cross-entropy loss [12], may not fully match the intrinsic characteristics of semantic information, and may not guide the training of the semantic encoder and decoder well, which could lead to decreased effectiveness and robustness. Since semantic information is closely related to the meaningful content and contextual aspects of the data, these may be overlooked by the specific labels predicted by the downstream network and the use of cross-entropy loss. Therefore, it is necessary to integrate the inherent properties of semantic information into the semantic-level loss.

2) How to Train Semantic Encoders and Decoders Without the Help of a Pre-Trained Downstream Network: The semantic communication system faces significant challenges in scenarios where the receiver is prohibited from accessing not only the weights but also the architecture of the downstream model (i.e., the pre-trained downstream model is a black box). This limitation is due to encryption and other security measures, and it severely hinders the simultaneous training of the semantic encoder and decoder in [12]. One possible solution is to retrain a deep neural network (DNN) that can act as a guide for training the semantic encoder and decoder. However, it is important to note that this solution introduces additional system cost and complexity, which should be carefully considered in a practical system. Moreover, if the chosen architecture of the DNN differs from that of the pre-trained downstream model, performance degradation may occur, which poses a challenge to this solution.

3) How to Strike a Good Balance Between Preserving Semantic Information and Preserving Intricate Details: It is vital to keep a good balance between preserving important semantic information and retaining intricate details for a well-designed semantic communication system. When the bandwidth resource is limited, the system may prefer to transmit the semantic information over intricate and fine-grained details, and should progressively increase the level of detail as the bandwidth availability improves. However, existing approaches overemphasize semantic information, which results in a loss of detail when the bandwidth resource is sufficient.

B. Motivation and Contributions

To address the above questions, we introduce the contrastive learning (CL)-based semantic communication system. We start with the first question by taking into account the inherent properties of semantic information. In particular, when we compare two unrelated entities, it becomes clear that their semantic information has significant differences. In contrast, the semantic information may not change much when a data augmentation operation is performed on the original entity. These observations motivate us to incorporate the popular CL approach into semantic communication, since the principles of CL align closely with those of the semantic communication system and CL has demonstrated significant achievements across various domains, including computer vision [13], [14], [15], [16], [17], natural language processing (NLP) [18], and multimodal applications [19], [20], [21]. More importantly, CL can help improve the generalization ability and robustness of deep models [22], [23], [24], [25]. The works in [26] and [27] first introduced the concept into semantic communication systems to help extract useful semantic information. However, these works still face limitations in evaluating the semantic-level loss when these systems work in a noisy channel, since they ignore the fact that in an ideal semantic communication system, the semantic information at the receiver should remain basically unchanged from its state before transmission, while still being distinguishable from unrelated entities.

To this end, we propose the semantic contrastive coding (SemCC) approach, which deeply explores the introduction of the CL process into the semantic communication system. Specifically, we replace the conventional data augmentation procedure with a wireless transmission process. This change is based on the idea that the distortion caused by the noise and fading characteristics of the wireless channel during transmission can be considered as a form of data augmentation. Therefore, we design the semantic-level loss for SemCC to ensure that the semantic distance between the original and reconstructed images is small enough while maintaining a considerable semantic distance between the reconstructed and irrelevant images for better discrimination in the downstream task.

To address the second question, we introduce the concept of semantic re-encoding (SemRE), which is inspired by the information bottleneck theory that only useful semantic information is allowed to pass through the semantic encoder. When the semantic information is initially used for data reconstruction, the reconstructed data, when passed through the semantic encoder again, should ideally acquire the same semantic information as the initial one, which can be defined as idempotence in semantic communication. Therefore, the key design in SemRE is to deploy a semantic encoder at the receiver, which is copied from the one in the transmitter, and use it to guide the training of the semantic encoder and decoder.

Furthermore, we introduce training strategies to address the third question in the context of our semantic communication system. In particular, inspired by the approach presented in [12], we introduce a loss function that includes both observation loss and semantic-level loss, with a hyper-parameter that controls the trade-off between these two components. We also design a fine-tuning approach for situations with


an available downstream model, which aims to improve the inference performance of the downstream task.

Finally, simulations are performed to demonstrate the superiority of the proposed approaches over competing approaches. Without loss of generality, we follow the concept of semantic communication in [9] and [10] and focus on the specific tasks of image reconstruction and image classification at the receiver, as in [12]. In this context, we no longer pay attention to the typical metrics of the technical level of communication such as bit error ratio (BER) and symbol error rate (SER). Instead, we evaluate the system performance based on the effectiveness of the received semantic information, using intrinsic task-related metrics such as image quality and inference accuracy. We compare the proposed approaches with the advanced semantic communication systems in [12] and [28], as well as the classical digital communication system. Simulation results show that the proposed approaches can achieve leading accuracy performance in the downstream task under a range of bandwidth compression ratios, demonstrate remarkable adaptability to both AWGN and Rayleigh fading channels with different noise levels, and also make a good trade-off between the image reconstruction quality and inference performance.

The main contributions of this paper are summarized as follows.
• We propose the SemCC approach, which integrates the concept of CL into semantic communication. By utilizing wireless transmission as a form of data augmentation in CL, SemCC ensures minimal semantic distance between original and reconstructed images while maintaining discrimination against irrelevant images.
• We introduce the SemRE approach, which uses a copy of the semantic encoder deployed at the receiver to guide the entire training process when the downstream model is inaccessible.
• We design the training procedure for the SemCC and SemRE approaches, respectively, to balance the semantic information and intricate details.
• We conduct simulations to demonstrate the superiority of our approaches over existing methods in terms of inference accuracy, across various bandwidth compression ratios and channel conditions, and also obtain comparable image reconstruction quality as the bandwidth compression ratio is improved. In particular, our approaches achieve a significant accuracy improvement of up to 53% on the CIFAR-10 dataset with a bandwidth compression ratio of 1/24.

C. Structure

The rest of this paper is organized as follows. In Section II, we provide an overview of related work on semantic communication, including its theoretical foundations and practical applications. Section III introduces the system model of semantic communication. In Sections IV and V, we present the implementation details of the proposed SemCC and SemRE approaches, respectively. Simulation results are provided in Section VI. Finally, we conclude this work in Section VII.

II. RELATED WORKS

A. Basics on Theory of Semantic Communication

The authors in [29] defined the semantic information carried by a sentence in terms of logical probability during transmission. Building on Shannon and Weaver's theory, the authors in [8] introduced the concept of a semantic channel and proposed a model-theoretic approach to reliable semantic communication. In [30], the communication scenario between two intelligent beings was discussed, and the theoretical formulation of the goals of semantic communication was presented to demonstrate the necessity of semantic communication. Subsequently, G. Guler et al. investigated a semantic communication framework by considering the meanings of transmitted codewords over a noisy channel, and optimized the end-to-end average semantic error using a Bayesian approach [31]. Based on the aforementioned works, the concept of semantic information theory [32], [33], [34], [35], [36] has attracted increasing research interest in recent years, providing the theoretical foundation for the development of semantic communication in various directions.

B. Transmission Strategy of Semantic Communication

With the rapid growth of deep learning technology, researchers have started to explore the deployment of semantic communication systems with the help of powerful semantic extraction provided by deep learning. In this direction, the authors in [28], [37], and [38] proposed a deep learning-based joint source-channel coding (DeepJSCC) for image data, where the encoder and decoder were designed based on an autoencoder and jointly optimized for semantic information transmission to achieve a good image reconstruction quality. Then, the works in [39], [40], and [41] extended DeepJSCC to different channel conditions and improved the image reconstruction quality under noisy channels. In addition, motivated by generative models, some works incorporated generative adversarial networks (GANs) to further reduce bandwidth consumption. For example, the authors in [42] and [43] applied the GAN inversion methods [44] to regenerate the image at the receiver, which leads to significant improvements in communication efficiency. In [45], a joint semantic encoding-modulation system has been explored to facilitate the deployment of semantic communication in practical networks.

For text and speech data, the work in [46] extended DeepJSCC to reduce the BER while preserving the semantic information in sentences. Leveraging the transformer architecture, the authors in [47] and [48] proposed a semantic communication approach for text, achieving a high semantic similarity between transmitted and received sentences. Guo et al. [49] explored the ability of pre-trained large language models (LLMs) such as ChatGPT to extract semantic information by introducing a cross-layer manager, thus achieving lower semantic loss under limited bandwidth. In addition, the works in [50] and [51] explored the semantic communication system for speech signals to reduce perceptual distortion.


C. Application of Semantic Communication

Not limited to data reconstruction at the receiver, semantic communication has been applied to various application scenarios to support the downstream task. In the works of [52], [53], and [54], semantic communication is used to transmit the output of the mid-layer of a neural network (NN) to reduce the inference latency with the help of an edge server. The authors in [12] proposed a collaborative training framework for semantic communication, where users could train their semantic encoder to improve the performance of downstream vision inference tasks under limited bandwidth. Moreover, in [55] and [56], the authors used semantic communication to support the Visual Question Answering (VQA) task by extracting and transferring the semantic information from the correlated multimodal data. The authors in [57] applied semantic communication in the UAV network, which enables efficient on-the-fly scene classification. Yang et al. [58] also introduced semantic communication into complex vehicular networks, and jointly optimized the energy efficiency and semantic transmission reliability to support green V2V communication. The authors in [59] integrated semantic communication into a mobile edge computing (MEC) network to support efficient communication between the edge server and the user equipment (UE), which helps reduce the energy consumption.

III. SYSTEM MODEL

This paper investigates a semantic communication system, where an NN-based semantic encoder and decoder are deployed in the transmitter and receiver, respectively. More specifically, we focus on wireless image transmission in this paper, and use $\mathbf{x} \in \mathbb{R}^{n_c \times n_h \times n_w}$ to denote the transmitted source image, where $n_c$, $n_h$, and $n_w$ correspond to the number of channels, height, and width of the image, respectively. To simplify, let $n = n_c \times n_h \times n_w$ stand for the input dimension of $\mathbf{x}$.

The transmission process begins with the semantic encoding, which is used to extract the semantic information of $\mathbf{x}$ and directly realize the non-linear mapping from semantic information into the $k$-dim complex-valued vector $\tilde{\mathbf{s}} \in \mathbb{C}^{k}$, given by

$$\tilde{\mathbf{s}} = E_{\theta_1}(\mathbf{x}), \tag{1}$$

where $E_{\theta_1}(\cdot)$ represents the semantic encoding operation with parameter $\theta_1$. At this stage, it is important to consider the relationship between the output dimension $k$ and the input dimension $n$ in the context of the bandwidth constraint. Typically, $k < n$ should hold to satisfy the bandwidth constraint, where $k/n$ is referred to as the bandwidth compression ratio. In particular, a large bandwidth compression ratio indicates a favorable communication condition, while a small one indicates a limited use of bandwidth. In addition, a power normalization layer [28] is used at the end of the semantic coding network to satisfy the average power constraint of $P$ at the transmitter, given by

$$\mathbf{s} = \sqrt{kP}\,\frac{\tilde{\mathbf{s}}}{\sqrt{\tilde{\mathbf{s}}^{*}\tilde{\mathbf{s}}}}, \tag{2}$$

where $\mathbf{s}$ is the channel input signal that meets the power constraint, and $*$ denotes the conjugate transpose. Next, $\mathbf{s}$ is transmitted over the noisy channel, where both the additive white Gaussian noise (AWGN) channel and Rayleigh fading channels are considered in this paper. Specifically, for the AWGN channel, the received signal can be expressed as

$$\hat{\mathbf{s}} = \mathbf{s} + \boldsymbol{\epsilon}, \tag{3}$$

where $\hat{\mathbf{s}}$ is the received signal, and $\boldsymbol{\epsilon} \sim \mathcal{CN}(0, \sigma^{2}\mathbf{I})$ denotes the additive noise sample. In the case of Rayleigh fading channels, the received signal $\hat{\mathbf{s}}$ is given by

$$\hat{\mathbf{s}} = \mathbf{H} \cdot \mathbf{s} + \boldsymbol{\epsilon}, \tag{4}$$

where $\mathbf{H}$ is the channel parameter and we assume that $\mathbf{H}$ can be perfectly estimated through some pilot signals.

At the receiver, the semantic decoder will be used to reconstruct the original image as $\hat{\mathbf{x}} \in \mathbb{R}^{n_c \times n_h \times n_w}$ from the corrupted $\hat{\mathbf{s}}$ according to

$$\hat{\mathbf{x}} = D_{\theta_2}(\hat{\mathbf{s}}), \tag{5}$$

where $D_{\theta_2}(\cdot)$ denotes the semantic decoding operation parameterized by $\theta_2$. Subsequently, $\hat{\mathbf{x}}$ will be used to perform the downstream task and obtain the inference results through the following process

$$\mathbf{f}_{\mathbf{x}} = F^{b}_{\phi_1}(\hat{\mathbf{x}}), \tag{6}$$

where $F^{b}_{\phi_1}(\cdot)$, characterized by parameter $\phi_1$, denotes the feature extraction operation performed by the backbone of the pretrained downstream model, and $\mathbf{f}_{\mathbf{x}} = \{f^{(1)}, f^{(2)}, \cdots, f^{(C)}\}$ is the output feature map with $C$ channels. The inference result $\hat{y}$ can be obtained by passing $\mathbf{f}_{\mathbf{x}}$ to the classifier $F^{cls}_{\phi_2}(\cdot)$ with parameter $\phi_2$, which can be expressed as

$$\hat{y} = F^{cls}_{\phi_2}(\mathbf{f}_{\mathbf{x}}). \tag{7}$$

From the above description, we can see that the semantic encoder and decoder play a key role in semantic communication. Moreover, preserving the semantic information in the reconstructed image is crucial for the inference performance, especially when the channel bandwidth is limited. Therefore, the architecture and training procedure of the semantic encoder and decoder require careful design.

IV. CL BASED SEMANTIC COMMUNICATION

In this section, we will introduce the proposed CL-based semantic communication framework. Specifically, we will first present the architecture of the semantic encoder and decoder, and then provide the details of SemCC and its training procedure.

A. Architecture of Semantic Encoder and Decoder

The architecture of the proposed semantic encoder and decoder is presented in Fig. 1. The semantic encoder consists of a 5 × 5 head convolution, two downsampling modules, and a channel coding module. Each downsampling module contains a basic block in ResNet [60] (we call it ResBlock) to capture


the spatial feature of the image, and a 4 × 4 convolution with stride 2 to downsample the image. The channel coding module is used to mitigate channel corruption and output the $k$-dim complex-valued channel input that satisfies the bandwidth and power constraints.

Fig. 1. Network architecture of the semantic encoder and decoder.

Furthermore, we adopt a symmetric architecture in the decoder, which consists of a 5 × 5 head convolution, two upsampling modules, and an image coding module. In the upsampling module, ResBlocks are also used as in the encoder, and we adopt the pixel shuffle technique [61] to upsample the image, as it can provide a more efficient computing paradigm and better reconstruction performance compared to the transposed convolution used in [28]. The image coding module consists of a 3 × 3 convolution followed by the sigmoid activation function to produce the reconstructed image. Notably, batch normalization and the parametric rectified linear unit (PReLU) activation function follow all convolutions unless otherwise specified.

B. Semantic Contrastive Coding

The details of the proposed SemCC are shown in Fig. 2. The process begins with the semantic encoding and decoding for a typical image $\mathbf{x}$ in a training batch $\mathcal{B}$, where we can obtain the reconstructed $\hat{\mathbf{x}}$. We use this process to replace the conventional data augmentation procedure in CL, and we regard $\hat{\mathbf{x}}$ as an augmented sample of $\mathbf{x}$. The backbone of the pretrained downstream model $F^{b}_{\phi_1}(\cdot)$ is applied to $\mathbf{x}$ and $\hat{\mathbf{x}}$, which generates the feature maps $\mathbf{f}_{\mathbf{x}} = F^{b}_{\phi_1}(\mathbf{x})$ and $\mathbf{f}_{\hat{\mathbf{x}}} = F^{b}_{\phi_1}(\hat{\mathbf{x}})$, respectively. Next, a fully connected projection network $P_{\psi}(\cdot)$ with learnable parameter $\psi$ followed by a normalization operation maps the features into the semantic space defined as a hypersphere,¹ where samples are represented as tensors based on their semantic information in this space. As mentioned earlier, samples with similar semantic information are close together, while those with different semantic information are farther apart in this space.

¹ The output of the projection network is typically represented as tensors, which can be straightforwardly normalized into a unit hypersphere. This approach is widely used in the domain of representation learning, as it can help improve training stability. More details about this can be found in [62].

Fig. 2. Illustration of the proposed SemCC.

During the training stage, $P_{\psi}(\cdot)$ can be updated to enhance the understanding of features, thereby learning the mapping from features to semantics. Specifically, the projected results of $\mathbf{f}_{\mathbf{x}}$ and $\mathbf{f}_{\hat{\mathbf{x}}}$ can be represented as $\mathbf{q}_{\mathbf{x}} = P_{\psi}(\mathbf{f}_{\mathbf{x}})$ and $\mathbf{q}_{\hat{\mathbf{x}}} = P_{\psi}(\mathbf{f}_{\hat{\mathbf{x}}})$, respectively, where $\mathbf{q}_{\mathbf{x}}$ is referred to as the anchor, and $\mathbf{q}_{\hat{\mathbf{x}}}$ is called the positive. We can apply the widely used cosine similarity between anchor and positive to define the semantic distance between $\mathbf{x}$ and $\hat{\mathbf{x}}$, since it is suitable for comparing the similarity between points in such a high-dimensional space. It is notable that when we focus on $\hat{\mathbf{x}}$, we can regard it as the anchor, and $\mathbf{x}$ as the positive instead.

For the remaining samples $\mathbf{m} \in \mathcal{B} \setminus \{\mathbf{x}\}$ within the training batch $\mathcal{B}$, the same procedure will be followed. Specifically, we can obtain the feature maps $\mathbf{f}_{\mathbf{m}} = F^{b}_{\phi_1}(\mathbf{m})$ and $\mathbf{f}_{\hat{\mathbf{m}}} = F^{b}_{\phi_1}(\hat{\mathbf{m}})$ by feeding $\mathbf{m}$ and $\hat{\mathbf{m}}$ into the backbone of the pretrained downstream model, respectively. Then, we project them into the semantic space using $P_{\psi}(\cdot)$, where $\mathbf{q}_{\mathbf{m}}$ and $\mathbf{q}_{\hat{\mathbf{m}}}$ are referred to as the negatives of $\mathbf{x}$ and $\hat{\mathbf{x}}$, respectively. Similarly, the semantic distance among them can be defined as the cosine similarity between anchor and negative.

To simplify the expression, we define $\mathcal{B}^{*}$ as the augmented version of $\mathcal{B}$, which comprises both the original samples from $\mathcal{B}$ and the reconstructed ones, so that $|\mathcal{B}^{*}| = 2|\mathcal{B}|$ is satisfied. We also define $\mathbf{x}^{*}$ as the positive of $\mathbf{x} \in \mathcal{B}^{*}$. The objective of SemCC is to minimize the semantic distance between the original and reconstructed images while maximizing the semantic distance between the original image and the irrelevant images. Therefore, we can use the InfoNCE function [13] to define the semantic contrastive loss, which can be expressed as

$$\mathcal{L}_{sem} = \mathbb{E}_{\mathbf{x} \in \mathcal{B}^{*}}\left[-\log \frac{\exp(\mathbf{q}_{\mathbf{x}} \cdot \mathbf{q}_{\mathbf{x}^{*}}/\tau)}{\sum_{\mathbf{m} \in \mathcal{B}^{*} \setminus \{\mathbf{x}\}} \exp(\mathbf{q}_{\mathbf{x}} \cdot \mathbf{q}_{\mathbf{m}}/\tau)}\right], \tag{8}$$

where $\tau > 0$ is the temperature coefficient used to smooth the probability distribution. Next, we will introduce how to take into account the SemCC and semantic contrastive loss to design the loss function and training procedure.

C. Loss Function and Training Procedure

Based on the SemCC, we design a two-stage training strategy to optimize the semantic encoder and decoder. The first stage is pre-training, where we use the SemCC approach to train the weights of the encoder $\theta_1$, the decoder $\theta_2$, and the projection network $\psi$ simultaneously. Since it is difficult to achieve a fast convergence speed when we only optimize the semantic contrastive loss, we combine the semantic contrastive loss with the reconstruction loss between $\mathbf{x}$ and $\hat{\mathbf{x}}$, since reducing the reconstruction loss can help improve the convergence speed in the early training rounds. Specifically, we use the Mean Square Error (MSE) function to evaluate the reconstruction loss for the training batch $\mathcal{B}$, which can be expressed as

$$\mathcal{L}_{rec} = \mathbb{E}_{\mathbf{x} \in \mathcal{B}}\left[\frac{1}{n}\,\|\mathbf{x} - \hat{\mathbf{x}}\|_2^2\right]. \tag{9}$$

Then, the loss function in the first training stage can be summarized as the linear combination, given by

$$\mathcal{L}_1 = \alpha_1 \mathcal{L}_{rec} + (1 - \alpha_1)\mathcal{L}_{sem}. \tag{10}$$
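As a rough numerical illustration of Eqs. (8)-(10), the following pure-Python sketch evaluates the semantic contrastive loss, the reconstruction loss, and their linear combination on toy data. It is not the authors' implementation: the function names, the hand-picked 2-D embeddings standing in for the projected features, and the temperature value 0.5 are assumptions made here for illustration only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (projection onto the unit hypersphere)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(u, v):
    """Inner product; equals cosine similarity for unit-norm vectors."""
    return sum(a * b for a, b in zip(u, v))


def semantic_contrastive_loss(q_orig, q_recon, tau=0.5):
    """InfoNCE over the augmented batch B* in the form of Eq. (8).

    q_orig[i] and q_recon[i] are unit-norm embeddings of an image and of its
    reconstruction, so they form an anchor/positive pair. tau is an assumed
    temperature, not a value taken from the paper.
    """
    batch = len(q_orig)
    q_all = q_orig + q_recon                     # B*, with |B*| = 2|B|
    total = 0.0
    for i, q_x in enumerate(q_all):
        pos = (i + batch) % (2 * batch)          # index of the positive x*
        numerator = math.exp(dot(q_x, q_all[pos]) / tau)
        # Sum over every sample of B* except the anchor itself.
        denominator = sum(math.exp(dot(q_x, q_m) / tau)
                          for j, q_m in enumerate(q_all) if j != i)
        total += -math.log(numerator / denominator)
    return total / len(q_all)


def reconstruction_loss(images, recons):
    """Per-pixel MSE averaged over the batch, as in Eq. (9)."""
    per_image = [sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
                 for x, x_hat in zip(images, recons)]
    return sum(per_image) / len(per_image)


def stage_one_loss(l_rec, l_sem, alpha1):
    """Linear combination of Eq. (10)."""
    return alpha1 * l_rec + (1.0 - alpha1) * l_sem


# Toy data (all hypothetical): two flattened images, their reconstructions,
# and 2-D embeddings standing in for P_psi(F^b(.)).
images = [[0.0, 0.5, 1.0, 0.5], [1.0, 1.0, 0.0, 0.0]]
recons = [[0.1, 0.5, 0.9, 0.5], [0.9, 1.0, 0.1, 0.0]]
q_orig = [l2_normalize([1.0, 0.1]), l2_normalize([-1.0, 0.2])]
q_good = [l2_normalize([0.9, 0.2]), l2_normalize([-0.9, 0.1])]  # semantics kept
q_bad = [l2_normalize([-0.9, 0.1]), l2_normalize([0.9, 0.2])]   # semantics swapped

l_rec = reconstruction_loss(images, recons)
l_sem_good = semantic_contrastive_loss(q_orig, q_good)
l_sem_bad = semantic_contrastive_loss(q_orig, q_bad)
alpha1 = 1.0 / 24.0  # alpha_1 = k/n, using the 1/24 ratio quoted in the abstract
print(l_rec, l_sem_good, l_sem_bad, stage_one_loss(l_rec, l_sem_good, alpha1))
```

With these toy embeddings the contrastive term is smaller when each reconstruction stays close to its own original than when the pairs are swapped, which is the behaviour Eq. (8) is meant to reward.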

where $\alpha_1 \in [0, 1]$ is a hyperparameter that controls the tradeoff between the two parts of the loss function. For example, we can set $\alpha_1 = k/n$ in the practical semantic communication system. In this context, the system prioritizes the preservation of semantic information over the reconstructed quality when the bandwidth compression ratio is small. In contrast, as the bandwidth compression ratio increases, the system shifts its focus to preserving the reconstructed quality.
In the second training stage, we aim to further optimize the performance of the semantic communication system by jointly fine-tuning the encoder and decoder with a small learning rate to achieve significant inference performance and reconstructed image quality, especially when the bandwidth compression ratio is low. The loss function of this stage can be expressed as

$$L_2 = \alpha_2 L_{rec} + (1 - \alpha_2) L_{Task}, \tag{11}$$

where $\alpha_2 \in [0, 1]$ is a hyper-parameter like $\alpha_1$ and $L_{Task}$ is the loss function of the downstream task. Specifically, when the downstream task is a classification problem, the cross-entropy function can be employed to model the loss, given by

$$L_{Task} = \mathbb{E}_{x \in B}\left[-\frac{1}{N_{cls}} \sum_{i=1}^{N_{cls}} y_{x,i} \log(\hat{y}_{x,i})\right], \tag{12}$$

where $y_{x,i}$ and $\hat{y}_{x,i}$ represent the ground-truth and the predicted probability of the i-th class, respectively, and notation $N_{cls}$ denotes the number of classes in the dataset. The whole training procedure is summarized in Algorithm 1, where $N_{epochs}$ and $N_{Fine-tuning}$ represent the number of training and fine-tuning epochs, respectively.

Algorithm 1 SemCC Training Procedure
// Training Stage 1: Pre-training
for epoch ← 1 to $N_{epochs}$ do
    Sample a batch B from the dataset;
    The transmitter encodes each x ∈ B with $E_{\theta_1}(\cdot)$;
    The receiver decodes and obtains x̂ with $D_{\theta_2}(\cdot)$;
    Extract feature maps $f_x$ and $f_{\hat{x}}$ using $F^{b}_{\phi_1}(\cdot)$ for x ∈ B;
    Project feature maps to semantic space using $p_\psi(\cdot)$;
    Calculate reconstruction loss $L_{rec}$ using (9);
    Calculate semantic contrastive loss $L_{sem}$ based on (8);
    Calculate combined loss $L_1$ based on (10);
    Update $\theta_1$, $\theta_2$, and $\psi$.
end
// Training Stage 2: Fine-tuning
for epoch ← 1 to $N_{Fine-tuning}$ do
    Sample a batch B from the dataset;
    The transmitter encodes each x ∈ B with $E_{\theta_1}(\cdot)$;
    The receiver decodes and obtains x̂ with $D_{\theta_2}(\cdot)$;
    Extract feature maps $f_{\hat{x}}$ using $F^{b}_{\phi_1}(\cdot)$ for x̂;
    Send the feature map to the classifier $F^{cls}_{\phi_2}(\cdot)$;
    Calculate reconstruction loss $L_{rec}$ using (9);
    Calculate loss of the downstream task $L_{Task}$ based on (12);
    Calculate combined loss $L_2$ based on (11);
    Update $\theta_1$ and $\theta_2$.
end
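As a rough illustration of how the two training objectives in (9)-(12) combine, the following Python sketch evaluates them on a toy batch. It is not the authors' implementation: the helper names (`mse_loss`, `cross_entropy`, `stage1_loss`, `stage2_loss`), the toy values, the placeholder value used for the semantic contrastive loss in (8), and the choice of α2 = 0.5 are assumptions made only for illustration.

```python
import math


def mse_loss(x, x_hat):
    """Per-sample reconstruction loss (1/n) * ||x - x_hat||_2^2, as in (9)."""
    n = len(x)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / n


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Per-sample task loss -(1/N_cls) * sum_i y_i * log(y_hat_i), as in (12)."""
    n_cls = len(y_true)
    return -sum(y * math.log(p + eps) for y, p in zip(y_true, y_prob)) / n_cls


def batch_mean(values):
    """Empirical expectation over the training batch B."""
    return sum(values) / len(values)


def stage1_loss(l_rec, l_sem, alpha1):
    """L1 = alpha1 * L_rec + (1 - alpha1) * L_sem, as in (10)."""
    return alpha1 * l_rec + (1.0 - alpha1) * l_sem


def stage2_loss(l_rec, l_task, alpha2):
    """L2 = alpha2 * L_rec + (1 - alpha2) * L_Task, as in (11)."""
    return alpha2 * l_rec + (1.0 - alpha2) * l_task


# Toy batch of two flattened "images" and their reconstructions (hypothetical values).
batch_x = [[0.0, 1.0, 0.5, 0.25], [1.0, 1.0, 0.0, 0.0]]
batch_x_hat = [[0.0, 0.5, 0.5, 0.25], [1.0, 0.5, 0.0, 0.5]]
l_rec = batch_mean([mse_loss(x, xh) for x, xh in zip(batch_x, batch_x_hat)])

# One-hot labels and predicted class probabilities (hypothetical values).
labels = [[1.0, 0.0], [0.0, 1.0]]
probs = [[0.5, 0.5], [0.25, 0.75]]
l_task = batch_mean([cross_entropy(y, p) for y, p in zip(labels, probs)])

l_sem = 0.8      # placeholder standing in for the semantic contrastive loss in (8)
alpha1 = 1 / 24  # example setting alpha1 = k/n mentioned in the text
l1 = stage1_loss(l_rec, l_sem, alpha1)
l2 = stage2_loss(l_rec, l_task, alpha2=0.5)  # alpha2 = 0.5 is an assumed value
```

With α1 = k/n = 1/24, almost all of the weight in L1 falls on the semantic term, which matches the behavior described above for small bandwidth compression ratios.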
V. SEMANTIC RE-ENCODING WITH INACCESSIBLE DOWNSTREAM MODEL
In this section, we will discuss a more general scenario where the architecture and weights of the downstream network are not accessible. Specifically, we will introduce an alternative approach, namely SemRE, to address this issue, and then present a soft update paradigm for the semantic encoder. After that, we will provide the updated loss function and the training procedure.


Fig. 3. Illustration of the proposed SemRE.

A. Semantic Re-Encoding
When the weights of the pre-trained downstream model are not accessible (i.e., the pre-trained downstream model is a black box), we cannot use its pre-trained backbone to extract features and subsequently map them to the semantic space, as backpropagation cannot be performed. A straightforward solution is to initialize a DNN model randomly and pre-train it using the label information. This pre-trained model can then guide the training of the proposed SemCC or DeepSC. However, this approach introduces additional system overhead and training latency. Moreover, if the chosen architecture of the random DNN differs from that of the pre-trained downstream model, it may result in performance degradation. Therefore, we propose to use only the label information to train the semantic encoder and decoder, which provides a complementary technique between DeepSC and the proposed SemCC. Specifically, we propose to use the semantic encoder to re-encode the reconstructed image instead of the feature extraction operation. This is motivated by the concept of information bottleneck [63] in deep learning, where the network acts as a bottleneck and only the useful information from the input is allowed to pass through. In other words, the unimportant information is filtered out in the process. In essence, the ideal semantic encoder should play such a role, i.e., only the semantic information is retained after semantic encoding, and the task-irrelevant information is removed. Therefore, when the reconstructed sample at the receiver is fed back to the semantic encoder, the output should resemble the previous encoding results for an ideal semantic communication system, which can be characterized by idempotence in semantic communication, and ensures consistency in semantic information across encoding and decoding stages.
We then provide a detailed description of the SemRE and the modification of the semantic contrastive loss. As shown in Fig. 3, in contrast to SemCC, where both the original x and x̂ are fed into the backbone of the pre-trained downstream model, the proposed SemRE approach only needs to perform re-encoding at the receiver, since we have already obtained the encoding results at the sender. Let $\tilde{s}_r = E_{\theta_1^{r}}(\hat{x})$ denote the re-encoding operation at the receiver, where $\theta_1^{r}$ represents the parameters of the re-encoder. In particular, $\theta_1^{r}$ is updated based on $\theta_1$ and we will introduce how this update is achieved. After that, the power normalization in (2) is used to obtain the results $s_r$.
Next, we can use $s$ and $s_r$ to perform CL. Specifically, similar to SemCC, SemRE still uses projection networks. For simplicity, we use $q_x$ and $q_{x^*}$ to denote the anchor and positive projection results. To obtain the negatives, we select samples from the same batch and use cosine similarity to evaluate the semantic distance. It is important to note that we do not consider all remaining samples within the same batch as negatives, because the ability of semantic extraction in this scenario is limited and we cannot obtain rich and fine-grained semantic information due to the lack of the pre-trained downstream model. Blindly pulling samples away from each other within the same class would degrade the system performance. In other words, it is advisable to consider the semantic similarities between different samples belonging to

 
$$L_{sem,R} = \mathbb{E}_{x \in B^{*}}\left[-\frac{1}{|S_x|} \sum_{x^{*} \in S_x} \log \frac{\exp(q_x \cdot q_{x^{*}}/\tau)}{\sum_{m \in B^{*}/\{S_x\}} \exp(q_x \cdot q_m/\tau)}\right]. \tag{13}$$


the same class. For these reasons, we adopt the supervised method in [17]. Specifically, the label of each sample is used to facilitate CL, and we define $S_x$ as the positive set containing samples belonging to the same class as x. Thus, we can derive the semantic contrastive loss of the SemRE approach as in (13). Then we can use (13) in place of the semantic contrastive loss in (10) to perform the gradient descent.

B. Soft Update Approach
In the training process of the proposed SemRE approach, the semantic encoder plays a key role. It first extracts the semantic information and encodes it before transmission. Then, at the receiver, it extracts the semantic information again to evaluate the quality of the received semantic information. However, training such a semantic encoder is challenging in practice because it is a self-guided process, i.e., the semantic encoder evaluates its own performance while its weights are also updated dynamically. This prevents the semantic encoder at the receiver from providing a stable evaluation, which complicates the optimization of the entire process.
Inspired by weight update strategies in deep reinforcement learning, as exemplified by deep Q-Networks (DQN) [64] and deep deterministic policy gradients (DDPG) [65], we propose a soft update approach for the semantic encoder to address these challenges. This approach decouples the evaluation and update steps in the training process. Specifically, the semantic encoder at the receiver does not update its weights after each training batch, as it does at the transmitter. Instead, its weights are updated periodically to achieve better training stability. The detailed soft-update approach can be expressed as

$$\theta_1^{r} \leftarrow \beta\,\theta_1^{r} + (1 - \beta)\,\theta_1, \tag{14}$$

where $\beta \in [0, 1]$ is a hyper-parameter which controls the update magnitude. We finally summarize the whole training process of SemRE in Algorithm 2, where $N_{update}$ is the update interval.

Algorithm 2 Semantic Re-Coding Training Procedure
for epoch ← 1 to $N_{epochs}$ do
    Sample a batch B from the dataset;
    The transmitter encodes each x ∈ B with $E_{\theta_1}(\cdot)$;
    The receiver decodes and obtains x̂ with $D_{\theta_2}(\cdot)$;
    The receiver re-encodes x̂ with $E_{\theta_1^{r}}(\cdot)$;
    Project the encoded results from both the transmitter and the receiver to semantic space using $p_\psi(\cdot)$;
    Calculate reconstruction loss $L_{rec}$ based on (9);
    Calculate semantic contrastive loss $L_{sem,R}$ based on (13);
    Calculate combined loss $L_1$ based on (10);
    Update $\theta_1$, $\theta_2$ and $\psi$ using SGD;
    if epoch mod $N_{update}$ = 0 then
        Update $\theta_1^{r} \leftarrow \beta\,\theta_1^{r} + (1 - \beta)\,\theta_1$
    end
end

VI. SIMULATIONS
A. Simulation Settings
To verify the effectiveness of the proposed framework, we conduct experiments on CIFAR-10, which contains 60,000 32 × 32 color images divided into 10 classes. The training set contains 50,000 images, while the test set contains 10,000 images. A pre-trained ResNet-20 [60] is used as the backbone and classifier of the downstream model for inference.²
The projection network adopts a two-layer fully connected structure with an output dimension of 32. The number of training epochs for the pre-training and fine-tuning is set to 300 and 50, respectively, with a batch size of 128. We also use the Adam optimizer with a learning rate of 0.001 for the first pre-training stage and 0.0001 for the second fine-tuning stage. These learning rates are adjusted every 50 epochs with a decay factor of 0.5.

² The pre-trained weights can be found at https://github.com/chenyaofo/pytorch-cifar-models.

For the network environment, we set the transmit power to unity and the transmit SNR to 20 dB and 5 dB for normal and noisy environments, respectively. In addition, we assume that the receiver can estimate the channel parameters perfectly in the case of the Rayleigh channel. The compared approaches are listed as follows:
• SemCC: The proposed CL-based semantic communication approach, where the pre-trained backbone of the downstream task is used in the training process.
• SemRE: The proposed SemRE strategy, where no pre-trained backbone is adopted.
• DeepJSCC [28]: Deep learning-based joint source-channel coding that maps the original input to the channel input through the structure of an autoencoder.
• DeepSC [12]: The state-of-the-art (SOTA) deep learning-based semantic communication framework to support the downstream inference task. DeepSC trains the semantic encoder and decoder with both the semantic loss provided by the whole pre-trained ResNet-20 and the observation loss in (11) to achieve efficient semantic information transmission. Note that the hyper-parameter $\alpha_2$ is set to the same value as in the fine-tuning stage of the proposed approaches.
For fair comparison, the architectures of the encoders and decoders in these approaches are set to be the same, and the network environment settings are kept consistent across all experiments unless otherwise specified.

³ https://bellard.org/bpg/

Moreover, we also compare the performance of the proposed approaches with conventional digital communication using separate source and channel coding under the same bandwidth compression ratio. For the source coding, we leverage the SOTA image compression algorithm named better portable graphics (BPG),³ which is based on the intra-frame encoding approach of the high-efficiency video coding (HEVC, aka H.265) standard. As for the channel coding, we integrate


LDPC code configured according to the IEEE 802.16e standard (Mobile WiMAX), where the block length of 2304 and rates of 1/2, 2/3 and 3/4 are adopted in our simulations. In addition, we use quadrature amplitude modulation (QAM) with orders of 4, 16 and 64. Notably, we only report the results of the optimal combination of LDPC rates and modulation schemes for simplicity.
Furthermore, we present the upper bound performance of the digital communication approach, denoted as BPG+Capacity, which realizes capacity-achieving transmission based on Shannon's theorem for a given transmit SNR, with the assumption of error-free transmission. Hence, practical digital transmission schemes incorporating channel coding and modulation cannot outperform this upper bound.

B. Effectiveness
Fig. 4 compares the accuracy performance of DeepJSCC, DeepSC, the conventional digital communication and the proposed approaches including SemCC and SemRE under the AWGN channel, where the SNR is set to 20 dB and the bandwidth compression ratio k/n varies from 1/24 to 1/2.5. For the digital communication system, the combination of 3/4 rate LDPC and 64QAM is used. This figure clearly shows that the proposed SemCC consistently outperforms the compared ones in terms of accuracy. In particular, when the compression ratio is 1/2.5, all approaches can transmit rich semantic information to support the downstream task, resulting in high accuracy levels of about 92.3%. As the bandwidth compression ratio decreases, the proposed SemCC still maintains a comparable accuracy performance. For example, the proposed SemCC can achieve accuracy levels of 89.85% and 88.81% at bandwidth compression ratios of 1/12 and 1/24, respectively, which outperforms DeepSC by about 2% at the corresponding bandwidth compression ratios and also shows an accuracy gain of up to 40% and 26% over DeepJSCC and BPG+Capacity, respectively. In addition, the SemRE approach is also superior to DeepJSCC, achieving an accuracy gain of up to 25% at a bandwidth compression ratio of 1/24. It is important to note that both approaches do not use a pre-trained backbone during the training process. These results suggest that the proposed approaches can effectively extract semantic information to meet the requirements of the downstream task and remove irrelevant redundant information to ensure that the semantic information can be successfully transmitted. This is particularly beneficial in scenarios where the channel bandwidth is limited.

Fig. 4. Test accuracy versus the bandwidth compression ratio under AWGN channel, where SNR is 20 dB.

Fig. 5 presents the peak signal-to-noise ratio (PSNR) comparison of the proposed SemCC and SemRE and the four competing approaches, where the SNR is set to 20 dB and the bandwidth compression ratio varies from 1/24 to 1/2.5. For the digital communication system, the combination of 3/4 rate LDPC and 64QAM is used. As shown in the figure, as the bandwidth compression ratio increases, the PSNRs of all the approaches improve and the conventional approach performs best when the bandwidth compression ratio is larger than 1/3. Although SemCC and SemRE sacrifice some image quality to prioritize semantic information when the bandwidth compression ratio is low, they can quickly catch up with the PSNR of DeepJSCC at higher compression ratios. Specifically, the proposed SemCC achieves a PSNR of 38.31 dB, which is close to the 39.07 dB of DeepJSCC, and outperforms DeepSC with 37.11 dB when the bandwidth compression ratio is 1/2.5. Moreover, the SemRE approach achieves the same PSNR performance as DeepJSCC when the bandwidth compression ratio is greater than 1/6. These results indicate that the proposed SemCC and SemRE approaches can prioritize the transmission of semantic information over irrelevant background information to ensure the performance of the downstream task in bandwidth-limited scenarios, and meanwhile transmit enough background information to obtain good image quality when the bandwidth is not a bottleneck. These results further demonstrate the effectiveness of the proposed approaches.
Fig. 6 and Fig. 7 show the performance comparison of several approaches under low SNR in terms of accuracy and PSNR, respectively. Specifically, both figures consider a low SNR of 5 dB, and the bandwidth compression ratio varies from 1/24 to 1/2.5. Moreover, 3/4 rate LDPC and 4QAM are used in this case. From Fig. 6, we can see that the proposed SemCC still shows superiority in terms of accuracy compared to the competing ones, indicating its robustness in low SNR scenarios. From Fig. 7, we can find that the proposed approaches can adaptively sacrifice the global information to obtain comparable semantic performance when the bandwidth compression ratio is low, and meanwhile obtain sufficient reconstructed quality in terms of PSNR as the bandwidth compression ratio increases. These results in both figures further verify the effectiveness and robustness of the proposed approaches in low SNR scenarios.
To further evaluate the performance of several approaches, we perform a comparison under Rayleigh fading channels. Fig. 8 shows the accuracy performance where the SNR is 20 dB and the bandwidth compression ratio ranges from 1/24


to 1/2.5. For the digital communication system, we utilize a combination of 2/3 rate LDPC and 16QAM. From this figure, we can see that all approaches experience performance degradation under Rayleigh fading channels compared to the AWGN channel. However, the proposed SemCC still achieves a leading level of accuracy. Specifically, at a bandwidth compression ratio of 1/2.5, the proposed SemCC achieves an accuracy of 91.21%, which is approximately 1% higher than DeepJSCC and 0.3% higher than DeepSC. As the bandwidth compression ratio decreases, the proposed SemCC demonstrates its adaptability to Rayleigh fading channels and still achieves an accuracy of about 90% when the bandwidth compression ratio ranges from 1/4 to 1/24, which achieves accuracy gains of up to 46.08% and 2.62% over DeepJSCC and DeepSC, respectively, and even outperforms the upper bound performance of the digital communication approach. Moreover, the SemRE also shows its superiority in exploiting the Rayleigh fading characteristics compared to DeepJSCC and BPG+Capacity, achieving accuracy gains of up to 30.31% and 10.27%, respectively. These results further demonstrate the effectiveness of the proposed SemCC and SemRE approaches under Rayleigh fading channels.

Fig. 5. PSNR versus the bandwidth compression ratio under AWGN channel, where SNR is 20 dB.

Fig. 6. Test accuracy versus the bandwidth compression ratio under AWGN channel, where SNR is 5 dB.

Fig. 7. PSNR versus the bandwidth compression ratio under AWGN channel, where SNR is 5 dB.

Fig. 8. Test accuracy versus the bandwidth compression ratio under Rayleigh fading channels, where SNR is 20 dB.

Fig. 9 shows the PSNR comparison under Rayleigh fading channels with an SNR of 20 dB. For the digital communication system, a combination of 2/3 rate LDPC and 16QAM is used. From this figure, we can see that both the proposed SemCC and SemRE approaches prioritize the transmission of semantic information at low bandwidth compression ratios


while preserving enough detail to improve image quality as the bandwidth compression ratio increases. It is noteworthy that the SemRE approach achieves a superior PSNR performance compared to DeepJSCC when the bandwidth compression ratio is greater than 1/6, and the proposed SemCC also outperforms DeepJSCC and the conventional digital communication approach when the bandwidth compression ratio is 1/2.5. This can be attributed to the introduced CL and the approach of replacing the data augmentation with a practical wireless channel, which helps to mitigate the effect of Rayleigh fading channels. In addition, the presence of rich semantic information plays a crucial role in image reconstruction at the receiver.

Fig. 9. PSNR versus the bandwidth compression ratio under Rayleigh fading channels, where SNR is 20 dB.

Fig. 10 and Fig. 11 show the performance comparison under Rayleigh fading, where the SNR is set to 5 dB. In this scenario, the characteristics of fading and channel noise pose even greater challenges to the semantic communication system. From Fig. 10, we can see that the test accuracy of all approaches deteriorates significantly. Compared to the competing approaches, the accuracy gains of the proposed SemCC increase with a smaller bandwidth compression ratio, and it outperforms DeepSC and DeepJSCC by up to 18.65% and 57%, respectively. Moving to Fig. 11, we find that the SemRE approach still achieves superior image quality in terms of PSNR compared to other approaches when the bandwidth compression ratio is larger than 1/6. It achieves a gain of up to 1.5 dB over DeepJSCC. Moreover, the proposed SemCC also outperforms DeepJSCC when the bandwidth compression ratio is 1/2.5. These results further demonstrate the effectiveness of the proposed approaches in overcoming the challenges of channel environments and successfully balancing the semantic information and image details according to the bandwidth compression ratio.

Fig. 10. Test accuracy versus the bandwidth compression ratio under Rayleigh fading channels, where SNR is 5 dB.

Fig. 11. PSNR versus the bandwidth compression ratio under Rayleigh fading channels, where SNR is 5 dB.

Next, we vary the trade-off parameters in (10) and (11) across a broad range to adjust the PSNR value and observe the corresponding performance in terms of accuracy under AWGN channels, where the bandwidth compression ratio is set to 1/24 and the SNR is set to 5 dB. As shown in Fig. 12, the test accuracy of the proposed SemCC and DeepSC decreases as the PSNR increases, which indicates that both the proposed SemCC and DeepSC can balance the trade-off between the semantic information and image quality. However, the test accuracy of DeepSC is more sensitive to the trade-off parameter, while the proposed SemCC can maintain a higher accuracy level across a broad range of trade-off parameters and the corresponding PSNR values. Specifically, the proposed SemCC achieves test accuracies of 87.13% and 86.71% when the PSNR is about 15.87 dB and 16.38 dB, respectively, while DeepSC achieves the same level of accuracy with a lower PSNR value of 12-13 dB. These results further


demonstrate the effectiveness of the proposed SemCC in the trade-off between semantic information and image quality.

Fig. 12. Test accuracy on CIFAR-10 versus PSNR under AWGN channel with SNR of 5 dB, where the bandwidth compression ratio is set to 1/24.

We also conduct ablation studies, as shown in Table I and Table II. We consider our proposed SemCC with two-stage training as the baseline. We then remove the first stage of CL pre-training (denoted as w/o. CL) and the second stage of fine-tuning (denoted as w/o. FT), respectively, to evaluate their individual impact. It is notable that for w/o. CL, we employ an initial learning rate of 1×10⁻³ and then reduce it by a factor of 0.1 every 80 epochs to train the semantic encoder and decoder from scratch, which can ensure the training performance. In fact, SemCC degrades to DeepSC in this case. Specifically, Table I provides the performance comparison, where the AWGN channel with an SNR of 5 dB is set and the bandwidth compression ratio k/n is set to 1/24, 1/12 and 1/6, respectively. From this table, we can find that the baseline achieves the highest accuracy and PSNR, while the baseline without fine-tuning can still outperform the one without CL pre-training in most cases. These results indicate that the gains of the proposed SemCC mainly come from the CL pre-training, and the fine-tuning also contributes to improved accuracy performance, especially when the bandwidth compression ratio is 1/24.

TABLE I: ABLATION STUDY ON AWGN CHANNELS (SNR = 5 dB)

Fig. 13. Test accuracy on CIFAR-10 versus test SNR under Rayleigh channel. The semantic encoder and decoder are trained at an SNR of 5 dB, and the high-performance RepVGG16 is utilized as the downstream model.

Fig. 14. Test accuracy on CIFAR-10 versus test SNR under Rayleigh channel. The semantic encoder and decoder are trained at an SNR of 5 dB, and the lightweight ShuffleNet-V2 is utilized as the downstream model.

Similar results of ablation studies on Rayleigh fading channels are presented in Table II, where the SNR is set to 5 dB. From this table, we can observe that a larger gain is obtained by the CL pre-training over training from scratch compared to that under the AWGN channel, which further demonstrates the effectiveness of the proposed CL-based pre-training, as well as the benefits of fine-tuning.
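The step-wise learning-rate adjustments described for the two training stages and for the w/o. CL ablation can be read as step-decay schedules. The following Python sketch shows one such reading. It assumes that the stated factor is applied multiplicatively every fixed number of epochs, that epochs are counted from zero, and that training from scratch runs for 300 epochs (the text does not state this number); the function name `step_decay_lr` is hypothetical, and the optimizer itself is not modeled.

```python
def step_decay_lr(initial_lr, epoch, step_size, factor):
    """Learning rate at a zero-based epoch: multiplied by `factor` every `step_size` epochs."""
    return initial_lr * factor ** (epoch // step_size)


# Pre-training stage: 0.001, multiplied by 0.5 every 50 epochs, 300 epochs in total.
pretrain_lrs = [step_decay_lr(1e-3, e, step_size=50, factor=0.5) for e in range(300)]

# Fine-tuning stage: 0.0001 with the same rule, 50 epochs in total,
# so under this reading the rate never changes within the stage.
finetune_lrs = [step_decay_lr(1e-4, e, step_size=50, factor=0.5) for e in range(50)]

# w/o. CL ablation: 0.001, multiplied by 0.1 every 80 epochs.
# The total of 300 epochs is an assumption for illustration only.
scratch_lrs = [step_decay_lr(1e-3, e, step_size=80, factor=0.1) for e in range(300)]
```

Under these assumptions the pre-training rate is halved five times over 300 epochs, whereas the from-scratch schedule drops by a factor of ten at epochs 80, 160 and 240.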


TABLE II: ABLATION STUDY ON RAYLEIGH FADING CHANNELS (SNR = 5 dB)

Fig. 15. Visual comparison of the reconstructed image under AWGN channel with an SNR of 20 dB and a bandwidth compression ratio of 1/48. The proposed approaches effectively preserve the semantic information of the colorful macaws and remove the irrelevant background.

C. Robustness
To verify the robustness of the proposed SemCC and SemRE approaches, we consider a more challenging scenario where there is a mismatch between the training SNR and the test SNR. In addition, the model architecture for the downstream task is different from the one used in the training process. In particular, we consider the state-of-the-art (SOTA) RepVGG16 [66] and the lightweight ShuffleNet [67] as the downstream models. RepVGG16 and ShuffleNet are more powerful and less powerful, respectively, compared to the ResNet-20 used in the training phase. Therefore, we can evaluate the robustness of the proposed approaches by assessing whether the preserved semantic information is general enough to work properly with a more powerful pre-trained downstream model and whether it is sufficient and appropriate for the lightweight model. We also provide the performance of the conventional digital communication system under the same condition, where we use several combinations of LDPC rates and modulation schemes to achieve the best performance.
In Fig. 13, we present the test accuracy comparison under Rayleigh fading channels for different SNRs, where the semantic encoder and the semantic decoder are both trained at an SNR of 5 dB, and the bandwidth compression ratio is set to 1/6. The model architecture for the downstream task is ResNet-20 during the training phase, while in the test phase, we use RepVGG16 [66]. This change may correspond to an upgrade in the GPU device of the receiver, which allows the use of a more powerful deep model. From this figure, we can observe that despite the mismatched SNR and the use of a more complex downstream model, the proposed approaches still demonstrate competitive test accuracy levels by providing enough semantic information to the downstream model and protecting it from fading and noise. It is also important to note that the proposed SemRE, which utilizes only label information, demonstrates better performance compared to DeepSC when the downstream model is unknown. These results suggest that our semantic communication system maintains its robustness in the face of real-world variations.
In Fig. 14, we continue to evaluate the test accuracy under Rayleigh fading channels, where we employ a lightweight model architecture. Specifically, ShuffleNet [67] is employed for the downstream model, and the bandwidth compression ratio is set to 1/12 to simulate computational resource and bandwidth constraints. In this scenario, we observe a significant degradation in the accuracy performance of DeepSC, as it primarily emphasizes specific semantic information required by the pre-trained backbone and is therefore sensitive to different model architectures of the downstream task. In contrast, our proposed approaches not only depend on the output of the pre-trained backbone, but also take into account the intrinsic relationship among various samples, which provides more general semantic information. As a result, the proposed approaches maintain competitive and robust test accuracy and achieve gains of up to 40% over DeepSC. Furthermore, the proposed approaches can effectively mitigate the cliff effect


under various channel conditions. These results highlight the robustness of our semantic communication system in scenarios with limited computational and bandwidth resources.

D. Visualization
We also provide visual comparisons of different approaches using the Kodak dataset in Fig. 15, where the encoder and decoder are trained on the STL10 dataset, the SNR is 20 dB, and the bandwidth compression ratio is 1/48. From this figure, we can find that the quality of the reconstructed images for DeepJSCC, DeepSC, and the proposed SemCC all deteriorates at this low bandwidth compression ratio, but the proposed SemCC can effectively preserve the semantic information. For example, the proposed SemCC effectively preserves the semantic information of the colorful macaws, people, and rafters, and removes the irrelevant background. This is particularly beneficial in scenarios where the channel bandwidth is limited and can explain the reasons for the superior performance of the proposed SemCC in the downstream task. On the other hand, DeepJSCC treats all information as important and attempts to reconstruct the background, leading to the loss of semantic information about colorful macaws and rafters. Although DeepSC can extract textual information, it still fails to preserve the colorful macaws and rafters, significantly deteriorating image quality. These results further demonstrate the effectiveness of the proposed approaches in preserving semantic information and removing irrelevant background information.

[4] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.
[5] K. B. Letaief, Y. Shi, J. Lu, and J. Lu, “Edge artificial intelligence for 6G: Vision, enabling technologies, and applications,” IEEE J. Sel. Areas Commun., vol. 40, no. 1, pp. 5–36, Jan. 2022, doi: 10.1109/jsac.2021.3126076.
[6] X. D. Duan et al., “6G architecture design: From overall, logical and networking perspective,” IEEE Commun. Mag., vol. 61, no. 7, pp. 158–164, Jul. 2023.
[7] C.-X. Wang et al., “On the road to 6G: Visions, requirements, key technologies, and testbeds,” IEEE Commun. Surveys Tuts., vol. 25, no. 2, pp. 905–974, 2nd Quart. 2023.
[8] J. Bao et al., “Towards a theory of semantic communication,” in Proc. IEEE Netw. Sci. Workshop, Jun. 2011, pp. 110–117.
[9] D. Gündüz et al., “Beyond transmitting bits: Context, semantics, and task-oriented communications,” IEEE J. Sel. Areas Commun., vol. 41, no. 1, pp. 5–41, Jan. 2023.
[10] Z. Qin, X. Tao, J. Lu, W. Tong, and G. Ye Li, “Semantic communications: Principles and challenges,” 2021, arXiv:2201.01389.
[11] Y. Shi, Y. Zhou, D. Wen, Y. Wu, C. Jiang, and K. B. Letaief, “Task-oriented communications for 6G: Vision, principles, and technologies,” IEEE Wireless Commun., vol. 30, no. 3, pp. 78–85, Jun. 2023, doi: 10.1109/MWC.002.2200468.
[12] H. Zhang, S. Shao, M. Tao, X. Bi, and K. B. Letaief, “Deep learning-enabled semantic communication systems with task-unaware transmitter and dynamic data,” IEEE J. Sel. Areas Commun., vol. 41, no. 1, pp. 170–185, Jan. 2023.
[13] T. Chen, S. Kornblith, M. Norouzi, and G. E. Hinton, “A simple framework for contrastive learning of visual representations,” in Proc. 37th Int. Conf. Mach. Learn., vol. 119, 2020, pp. 1597–1607.
[14] X. Chen, H. Fan, R. Girshick, and K. He, “Improved baselines with momentum contrastive learning,” 2020, arXiv:2003.04297.
[15] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9726–9735.
[16] T. Chen, S. Kornblith, K. Swersky, M. Norouzi, and G. E. Hinton, “Big self-supervised models are strong semi-supervised learners,” in Proc. Adv. Neural Inf. Process. Syst. (Neural-IPS), 2020, pp. 1–13.
[17] P. Khosla et al., “Supervised contrastive learning,” in Proc. NIPS, 2020, pp. 18661–18673.
[18] T. Gao, X. Yao, and D. Chen, “SimCSE: Simple contrastive learning of sentence embeddings,” in Empirical Methods in Natural Language Processing. Punta Cana, Dominican Republic: Assoc. Comput. Linguistics, Nov. 2021, pp. 6894–6910.

VII. CONCLUSION
In this paper, we investigated a CL-based semantic communication system. Our contribution was to introduce the concept of semantic contrastive loss, which provides a more reasonable evaluation of semantic-level aspects during the
training process. Moreover, we modified the CL procedure [19] A. Radford et al., “Learning transferable visual models from natural
by replacing the traditional data augmentation with a prac- language supervision,” in Proc. Int. Conf. Mach. Learn., vol. 139, 2021,
tical wireless channel and proposed the SemCC approach, pp. 8748–8763.
[20] Y. Zeng et al., “CLIP2: Contrastive language-image-point pretraining
which allows us to comprehensively exploit the impact of from real-world point cloud data,” in Proc. IEEE/CVF Conf. Comput.
the channel on the transmission of semantic information. Vis. Pattern Recognit. (CVPR), Jun. 2023, pp. 15244–15253.
We also proposed the SemRE approach, which uses a copy [21] Q. Zhou, G. Pang, Y. Tian, S. He, and J. Chen, “AnomalyCLIP: Object-
agnostic prompt learning for zero-shot anomaly detection,” in Proc. Int.
of the semantic encoder to guide the whole training process, Conf. Learn. Represent. (ICLR), 2024, pp. 1–13.
to address the problem of an inaccessible downstream model. [22] Y. Tan, G. Long, J. Ma, L. Liu, T. Zhou, and J. Jiang, “Federated learning
Further, we designed training procedures for SemCC and from pre-trained models: A contrastive learning approach,” in Proc. Adv.
SemRE, respectively, which achieved a good trade-off between Neural Inf. Process. Syst. (NeurIPS), 2022, pp. 1–13.
[23] Z. Wang and W. Liu, “Robustness verification for contrastive learn-
preserving semantic information and retaining intricate details. ing,” in Proc. Int. Conf. Mach. Learn. (ICML), vol. 162, 2022,
Finally, we conducted simulations under various conditions, pp. 22865–22883.
including different bandwidth compression ratios, SNRs, and [24] Y. Zhong, H. Tang, J. Chen, J. Peng, and Y.-X. Wang, “Is self-
supervised learning more robust than supervised learning?” 2022,
downstream model configurations, to demonstrate the effec- arXiv:2206.05259.
tiveness and robustness of the proposed approaches. [25] Z. Wang and W. Liu, “RVCL: Evaluating the robustness of contrastive
learning via verification,” J. Mach Learn. Res. (JMLR), vol. 24, no. 396,
pp. 1–43, 2023.
R EFERENCES
[26] C. Chaccour and W. Saad, “Disentangling learnable and memorizable
[1] S. Tang, Q. Yang, L. Fan, X. Lei, Y. Deng, and A. Nallanathan, data via contrastive learning for semantic communications,” in Proc.
“Contrastive learning based semantic communication for wireless image 56th Asilomar Conf. Signals, Syst., Comput., Oct. 2022, pp. 1175–1179.
transmission,” in Proc. IEEE 98th Veh. Technol. Conf. (VTC-Fall), [27] Z. Tian, H. Vo, C. Zhang, G. Min, and S. Yu, “An asynchronous
Oct. 2023, pp. 1–6. multi-task semantic communication method,” IEEE Netw., early access,
[2] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Jun. 2024, doi: 10.1109/MNET.2023.3321547.
Tech. J., vol. 27, no. 3, pp. 379–423, 1948. [28] E. Bourtsoulatze, D. B. Kurka, and D. Gündüz, “Deep joint source-
[3] R. G. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. channel coding for wireless image transmission,” IEEE Trans. Cogn.
Theory, vol. 8, no. 1, pp. 21–28, Jan. 1962. Commun. Netw., vol. 5, no. 3, pp. 567–579, Sep. 2019.
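As a rough illustration of the idea summarized in the conclusion, namely that the wireless channel can play the role that data augmentation plays in ordinary contrastive learning, the following NumPy sketch forms positive pairs from a feature vector and its noisy received version. It is a minimal sketch under stated assumptions and not the authors' implementation: the random features stand in for semantic encoder outputs, the channel is a real-valued AWGN model, the loss is the generic InfoNCE form, and the names `awgn_channel` and `info_nce` as well as the temperature value are hypothetical.

```python
import numpy as np


def awgn_channel(z, snr_db, rng):
    """Add real-valued Gaussian noise so that the average SNR equals snr_db."""
    signal_power = np.mean(z ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return z + rng.normal(0.0, np.sqrt(noise_power), size=z.shape)


def info_nce(anchor, positive, temperature=0.1):
    """Generic InfoNCE loss.

    Row i of `positive` is treated as the positive sample for row i of
    `anchor`; all other rows in the batch act as negatives.
    """
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature                   # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 64))   # assumed stand-in for encoder outputs
received = awgn_channel(features, snr_db=20.0, rng=rng)

# Matched pairs (feature, its own received version) versus mismatched pairs.
loss_matched = info_nce(features, received)
loss_shuffled = info_nce(features, received[::-1])
print(loss_matched, loss_shuffled)
```

In this toy setting the loss is small when each feature is paired with its own channel output and large when the pairs are mismatched, which is the behavior a contrastive objective is expected to reward. A learned encoder, a fading channel model, or a different similarity measure could be substituted without changing the overall structure.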


[29] R. Carnap and Y. Bar-Hillel, “An outline of a theory of semantic [54] J. Shao, Y. Mao, and J. Zhang, “Task-oriented communication for mul-
information,” Res. Lab. Electron., Tech. Rep., 1952. tidevice cooperative edge inference,” IEEE Trans. Wireless Commun.,
[30] B. Juba and M. Sudan, “Universal semantic communication I,” in Proc. vol. 22, no. 1, pp. 73–87, Jan. 2023.
Fortieth Annu. ACM Symp. Theory Comput., 2008, pp. 123–132. [55] H. Xie, Z. Qin, and G. Y. Li, “Task-oriented multi-user semantic
[31] B. Güler, A. Yener, and A. Swami, “The semantic communication communications for VQA,” IEEE Wireless Commun. Lett., vol. 11, no. 3,
game,” in Proc. IEEE Trans. Cogn. Commun. Netw., Dec. 2018, vol. 4, pp. 553–557, Mar. 2022.
no. 4, pp. 787–802. [56] H. Xie, Z. Qin, X. Tao, and K. B. Letaief, “Task-oriented multi-user
[32] Y. Zhong, “A theory of semantic information,” China Commun., vol. 14, semantic communications,” IEEE J. Sel. Areas Commun., vol. 40, no. 9,
no. 1, pp. 1–17, Jan. 2017. pp. 2584–2597, Sep. 2022.
[33] Y. Shao, Q. Cao, and D. Gunduz, “A theory of semantic communication,” [57] X. Kang, B. Song, J. Guo, Z. Qin, and F. R. Yu, “Task-oriented image
2022, arXiv:2212.01485. transmission for scene classification in unmanned aerial systems,” IEEE
[34] J. Tang, Q. Yang, and Z. Zhang, “Information-theoretic limits on Trans. Commun., vol. 70, no. 8, pp. 5181–5192, Aug. 2022.
compression of semantic information,” China Commun., pp. 1–16, 2024. [58] W. Yang, X. Chi, L. Zhao, Z. Xiong, and W. Jiang, “Task-driven
[35] S. Kobus, T.-Y. Tung, and D. Gündüz, “Goal-oriented compression with semantic-aware green cooperative transmission strategy for vehicular
a constrained decoder,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), networks,” IEEE Trans. Commun., vol. 71, no. 10, pp. 5783–5798,
Jun. 2023, pp. 868–873. Oct. 2023.
[59] Y. Wu, S. Tang, L. Zhang, L. Fan, X. Lei, and X. Chen, “Resilient
[36] F. Pase, S. Kobus, D. Gündüz, and M. Zorzi, “Semantic communication machine learning-based semantic-aware MEC networks for sustainable
of learnable concepts,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Next-G consumer electronics,” IEEE Trans. Consum. Electron., vol. 70,
Jun. 2023, pp. 731–736. no. 1, pp. 2188–2199, Feb. 2024.
[37] D. B. Kurka and D. Gündüz, “DeepJSCC-f: Deep joint source-channel [60] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
coding of images with feedback,” IEEE J. Sel. Areas Commun., vol. 1, image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
no. 1, pp. 178–193, May 2020. (CVPR), Jun. 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[38] T. Tung, D. B. Kurka, M. Jankowski, and D. Gündüz, “DeepJSCC- [61] W. Shi et al., “Real-time single image and video super-resolution using
Q: Constellation constrained deep joint source-channel coding,” IEEE an efficient sub-pixel convolutional neural network,” in Proc. IEEE Conf.
J. Sel. Areas Inf. Theory, vol. 3, no. 4, pp. 720–731, Dec. 2022. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1874–1883.
[39] M. Ding, J. Li, M. Ma, and X. Fan, “SNR-adaptive deep joint source- [62] T. Wang and P. Isola, “Understanding contrastive representation learning
channel coding for wireless image transmission,” in Proc. IEEE Int. through alignment and uniformity on the hypersphere,” in Proc. 37th Int.
Conf. Acoust., Speech Signal Process. (ICASSP), Toronto, ON, Canada, Conf. Mach. Learn., vol. 119, Jul. 2020, pp. 9929–9939.
Jun. 2021, pp. 1555–1559. [63] N. Tishby and N. Zaslavsky, “Deep learning and the information bottle-
[40] M. Yang and H.-S. Kim, “Deep joint source-channel coding for wireless neck principle,” in Proc. IEEE Inf. Theory Workshop (ITW), Apr. 2015,
image transmission with adaptive rate control,” in Proc. IEEE Int. Conf. pp. 1–5.
Acoust., Speech Signal Process. (ICASSP), May 2022, pp. 5193–5197. [64] V. Mnih, “Human-level control through deep reinforcement learning,”
[41] J. Xu, B. Ai, W. Chen, A. Yang, P. Sun, and M. Rodrigues, “Wireless Nature, vol. 518, pp. 529–533, Feb. 2015.
image transmission using deep source channel coding with attention [65] T. P. Lillicrap et al., “Continuous control with deep reinforcement
modules,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 4, learning,” in Proc. Int. Conf. Learn. Represent., 2016, pp. 1–56.
pp. 2315–2328, Apr. 2022. [66] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, “RepVGG:
[42] E. Erdemir, T.-Y. Tung, P. L. Dragotti, and D. Gündüz, “Generative joint Making VGG-style ConvNets great again,” in Proc. IEEE/CVF Conf.
source-channel coding for semantic image transmission,” IEEE J. Sel. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 13728–13737.
Areas Commun., vol. 41, no. 8, pp. 2645–2657, Aug. 2023. [67] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, “ShuffleNet V2: Practical
[43] T. Han, J. Tang, Q. Yang, Y. Duan, Z. Zhang, and Z. Shi, “Generative guidelines for efficient CNN architecture design,” in Proc. Eur. Conf.
model based highly efficient semantic communication approach for Comput. Vis. (ECCV), 2018, pp. 1–11.
image transmission,” in Proc. IEEE Int. Conf. Acoust., Speech Signal
Process. (ICASSP), Jun. 2023, pp. 1–5.
Shunpu Tang received the B.E. degree in computer science and the master’s degree in cyberspace security from Guangzhou University, Guangzhou, China, in 2020 and 2023, respectively. He is currently pursuing the Ph.D. degree with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China. His current research interests include semantic communication and edge intelligence.

Qianqian Yang (Member, IEEE) received the B.Sc. degree in automation from Chongqing University, Chongqing, China, in 2011, the M.S. degree in control engineering from Zhejiang University, Hangzhou, China, in 2014, and the Ph.D. degree in electrical and electronic engineering from Imperial College London, U.K. She has held visiting positions at CentraleSupelec in 2016 and New York University Tandon School of Engineering from 2017 to 2018. After the Ph.D. studies, she was a Post-Doctoral Research Associate with Imperial College London and a Machine Learning Researcher with Sensyne Health Plc. She is currently a tenure-tracked Professor with the Department of Information Science and Electronic Engineering, Zhejiang University. Her main research interests include wireless communications, information theory, and semantic communications. She serves as a reviewer for IEEE Transactions on Information Theory, IEEE Transactions on Communications, and IEEE Transactions on Wireless Communications. She has organized several workshops at conferences, such as IEEE ICC 2023, IEEE WCNC 2022, IEEE VTC 2022, and IEEE HPCC 2021.


Lisheng Fan received the bachelor’s degree from the Department of Electronic Engineering, Fudan University, in 2002, the master’s degree from the Department of Electronic Engineering, Tsinghua University, China, in 2005, and the Ph.D. degree from the Department of Communications and Integrated Systems, Tokyo Institute of Technology, Japan, in 2008. He is currently a Professor with the School of Computer Science, Guangzhou University. He has published many articles in international journals, such as IEEE Transactions on Wireless Communications, IEEE Transactions on Communications, and IEEE Transactions on Information Theory, and papers in conferences, such as IEEE ICC, IEEE Globecom, and IEEE WCNC. His research interests include wireless cooperative communications, physical-layer secure communications, intelligent communications, and system performance evaluation. He was awarded as an exemplary reviewer by IEEE Transactions on Communications and IEEE Communications Letters. He was a Guest Editor of many journals, such as Physical Communication, EURASIP Journal on Wireless Communications and Networking, and Wireless Communications and Mobile Computing, and is an Editor of China Communications.

Xianfu Lei (Member, IEEE) received the Ph.D. degree from Southwest Jiaotong University in 2012. From 2012 to 2014, he was a Research Fellow with the Department of Electrical and Computer Engineering, Utah State University. He is currently a Professor with the School of Information Science and Technology, Southwest Jiaotong University, since 2015. His research interests include 5G/6G networks, cooperative and energy harvesting networks, and physical-layer security. He received the Best Paper Award from IEEE/CIC ICCC2020, the Best Paper Award from WCSP2018, the WCSP 10th Anniversary Excellent Paper Award, the IEEE Communications Letters Exemplary Editor 2019, and the Natural Science Award of China Institute of Communications in 2019. He served as a Senior/an Associate Editor for IEEE Communications Letters from 2014 to 2019. He is an Area Editor of IEEE Communications Letters and an Associate Editor of IEEE Wireless Communications Letters and IEEE Transactions on Communications.

Arumugam Nallanathan (Fellow, IEEE) has been a Professor in wireless communications and the Head of the Communication Systems Research (CSR) Group, School of Electronic Engineering and Computer Science, Queen Mary University of London, since September 2017. He was with the Department of Informatics, King’s College London, from December 2007 to August 2017, where he was a Professor in wireless communications from April 2013 to August 2017 and a Visiting Professor since September 2017. He was an Assistant Professor with the Department of Electrical and Computer Engineering, National University of Singapore, from August 2000 to December 2007. He has published nearly 500 technical papers in scientific journals and international conferences. His research interests include artificial intelligence for wireless systems, beyond 5G wireless networks, and the Internet of Things (IoT). He was a co-recipient of the Best Paper Awards from the IEEE International Conference on Communications 2016 (ICC’2016), IEEE Global Communications Conference 2017 (GLOBECOM’2017), and IEEE Vehicular Technology Conference 2018 (VTC’2018). He is an IEEE Distinguished Lecturer. He received the IEEE Communications Society SPCE Outstanding Service Award in 2012 and the IEEE Communications Society RCC Outstanding Service Award in 2014. He was an Editor of IEEE Transactions on Wireless Communications from 2006 to 2011, IEEE Transactions on Vehicular Technology from 2006 to 2017, IEEE Wireless Communications Letters, and IEEE Signal Processing Letters. He served as the Chair of the Signal Processing and Communication Electronics Technical Committee of the IEEE Communications Society and the Technical Program Chair and member of Technical Program Committees at numerous IEEE conferences. He has been selected as a Web of Science Highly Cited Researcher in 2016. He is an Editor of IEEE Transactions on Communications.

George K. Karagiannidis (Fellow, IEEE) is currently a Professor with the Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Greece, and the Head of Wireless Communications and Information Processing (WCIP) Group. He is also a Faculty Fellow with the Cyber Security Systems and Applied AI Research Center, Lebanese American University. His research interests include wireless communications systems and networks, signal processing, optical wireless communications, wireless power transfer and applications, and communications and signal processing for biomedical engineering. He received three prestigious awards, such as the 2021 IEEE ComSoc RCC Technical Recognition Award, the 2018 IEEE ComSoc SPCE Technical Recognition Award, and the 2022 Humboldt Research Award from the Alexander von Humboldt Foundation. He is one of the highly-cited authors across all areas of electrical engineering, recognized by Clarivate Analytics as a Web of Science Highly Cited Researcher for nine consecutive years, from 2015 to 2023. He was a past editor in several IEEE journals and from 2012 to 2015, he was the Editor-in-Chief of IEEE Communications Letters. Since January 2024, he has been the Editor-in-Chief of IEEE Transactions on Communications.
