A Lite Distributed Semantic Communication System for Internet of Things
Abstract—The rapid development of deep learning (DL) and widespread applications of Internet-of-Things (IoT) have made the devices smarter than before, and enabled them to perform [...]
[...] to IoT devices. However, data transmitted over the air could be distorted by wireless channels, which may cause improper trained results, i.e., local optimum. Moreover, the large number [...]
Fig. 1. Illustration of three communication levels at the receiver: (a) transmission level (traditional receiver, followed by a feature extractor network and an effect network), (b) semantic level (semantic receiver followed by an effect network), and (c) effectiveness level (effect receiver).

Missing channel gradients become the bottleneck of training E2E communication systems. There are several works for mitigating this problem [15]–[17]. Dörner et al. proposed a two-phase training process [15], first training the transceiver with a stochastic channel model and then fine-tuning the receiver over real channels. Aoudia et al. estimated the channel gradients by sampling from a relaxed distribution based on a stochastic reinforcement learning policy [16], where the transmitter and receiver are trained separately. Ye et al. proposed a generative adversarial network (GAN) to approximate the unknown channel model [17] so that the channel gradients can be estimated by the GAN.

There have been some initial works related to deep semantic communications [18]–[22]. Bourtsoulatze et al. [18] proposed joint source-channel coding for wireless image transmission based on the convolutional neural network (CNN), where the peak signal-to-noise ratio (PSNR) is used to measure the accuracy of image recovery at the receiver. Taking image classification tasks into consideration, Lee et al. [19] developed a transmission-recognition communication system by merging wireless image transmission with an effect network, i.e., image classification, into DNNs, which achieves higher image classification accuracy than performing them separately. For texts, Farsad et al. [21] designed joint source-channel coding for the erasure channel by using a recurrent neural network (RNN) and a fully-connected neural network (FCN), where the system recovers the text directly rather than performing channel and source decoding separately. In order to understand texts better and adapt to dynamic environments, Xie et al. [22] developed a semantic communication system based on the Transformer, named DeepSC, which clarifies the concepts of semantic information and semantic error at the sentence level for the first time. In brief, compared with traditional approaches, semantic communication systems are more robust to channel variations and achieve better performance in terms of source recovery and image classification, especially in the low signal-to-noise ratio (SNR) regime.

To deal with the second problem in reducing the number of parameters, network slimming has attracted extensive attention as a way to compress DL models without degrading performance, since neural networks are usually over-sized [23]. Parameter pruning and quantization are the two main approaches for DL model compression. Parameter pruning removes unnecessary connections between neurons or unimportant neurons. Han et al. [24] proposed an iterative pruning approach, where the model is trained first, then pruned by a given threshold, and fine-tuned to recover the performance in terms of image classification. This approach could reduce the connections without losing accuracy. Liu et al. [25] proposed to prune the filters in CNNs by training the model with L1 regularization so that redundant weights converge to zero directly without sacrificing the performance. By analyzing the connection sensitivity among neurons and layers, Li et al. [26] remove the insensitive layers, which further increases the inference speed. By applying these pruning approaches, DL models can be compressed by 13 to 20 times. Quantization aims to represent a weight parameter with lower precision (fewer bits), which reduces the required bitwidth of the data flowing through the neural network model in order to shrink the model size for memory saving and to simplify the operations for computing acceleration [27]. With vector quantization, Gong et al. [28] quantize the DL models. Similarly, Zhou et al. [29] investigated an iterative quantization, which starts with a trained full-resolution model and then quantizes only a portion of the model, followed by several epochs of re-training to recover the accuracy loss from quantization. A mixed-precision quantization by Li et al. [30] quantizes weights while keeping the activations at full resolution. The training algorithm by Jacob et al. [31] preserves the model accuracy after post-quantization. With quantization, the weights can generally be compressed from 32-bit to 8-bit without performance loss. Similarly, pruning and quantization can also be used in DL-enabled communication systems. For example, Guo et al. [32] have shown that model compression can accelerate the processing of channel state information (CSI) acquisition and signal detection in massive multiple-input multiple-output (MIMO) systems without performance degradation.

By applying network slimming to our existing work, DeepSC, the aforementioned two challenges in IoT networks can be effectively addressed. Although the above works validate the feasibility, we still face the following issues to make it affordable for IoT devices:

• Question 1: How to design semantic communication systems over wireless fading channels?
• Question 2: How to form the constellation to make it affordable for capacity-limited IoT devices?
• Question 3: How to compress semantic models for fast model transmission and low-cost implementation on IoT devices?

In this paper, we design a distributed semantic communication system for IoT networks. Especially, a lite DeepSC (L-DeepSC) is proposed to address the above questions. Different from our previous work [22], this work solves the problem of training DeepSC over fading channels with imperfect CSI and considers different wireless channel models to show the generalization of our method. Moreover, this work extends [22] to a more practical IoT scenario, where two key problems, model updating and broadcasting, are solved. This work also addresses the issue of finite constellation sizes for capacity-constrained IoT devices, while [22] assumes infinite constellations. The main contributions of this paper are summarized as follows.
• We design a distributed semantic communication network under power and latency constraints, in which the receiver and feature extractor networks are jointly optimized to overcome fading channels.
• By identifying the impacts of CSI on DL model training over fading channels, we propose a CSI-aided semantic communication system to speed up convergence, where the CSI is refined by a de-noise neural network. This addresses the aforementioned Question 1.
• To make data transmission and reception affordable for capacity-constrained devices, we design a finite-bits constellation to solve Question 2.
• Due to over-parameterization, we propose a model compression algorithm, including network sparsification and quantization, to reduce the size of DL models by pruning the redundant connections and quantizing the weights, which addresses the aforementioned Question 3.

The rest of this paper is organized as follows. The distributed semantic communication system model is introduced and the corresponding problems are identified in Section II. Section III presents the proposed L-DeepSC. Numerical results are used to verify the performance of the proposed L-DeepSC in Section IV. Finally, Section V concludes this paper.

Notation: C^{n×m} and R^{n×m} represent the sets of complex and real matrices of size n × m, respectively. Bold-font variables denote matrices or vectors. x ∼ CN(µ, σ²) means that variable x follows the circularly-symmetric complex Gaussian distribution with mean µ and covariance σ². (·)^T and (·)^H denote the transpose and Hermitian of a vector or a matrix, respectively. ℜ{·} and ℑ{·} refer to the real and the imaginary parts of a complex number.

II. SYSTEM MODEL AND PROBLEM FORMULATION

Text is an important type of source data, which can be sensed from speaking and typing, environmental monitoring, etc. By training DL models with these text data at the cloud/edge platform, the DL-model-based IoT devices have the capability to understand text data and generate semantic features to be transmitted to the center to perform intelligent tasks, i.e., intelligent assistants, human emotion understanding, and environment humidity and temperature adjustment based on human preference [33].

As shown in Fig. 2(a), we focus on distributed semantic communications for IoT networks. The considered system is composed of various IoT networks with two layers: the cloud/edge platform and the distributed IoT devices. The cloud/edge platform is equipped with huge computation power and big memory, which can be used to train the DL model with the received semantic features. The semantic-communication-enabled IoT devices, which have limited memory and power but an expected long lifetime, i.e., up to 10 years, perform intelligent tasks by understanding the sensed texts. Particularly, our considered distributed semantic communication system consists of the following three steps:

1) Model Initialization/Update: The cloud/edge platform first trains the semantic communication model with an initial dataset. The trained model is updated in the subsequent iterations by the received semantic features from the IoT devices.
2) Model Broadcasting: The cloud/edge platform broadcasts the trained DL model to each IoT device.
3) Semantic Features Upload: The IoT devices constantly capture the text data, which are encoded by the proposed semantic transmitter shown in Fig. 2(b). The extracted semantic features are then transmitted to the cloud/edge for model update and subsequent processing.

The aforementioned Questions 1-3 correspond to model initialization/update, semantic features uploading, and model broadcasting, respectively. Different from traditional information transmission, semantic features can not only be used for accurately recovering the text at the semantic level, but also be exploited as the input of other modules, i.e., emotion classification, dialog systems, and human-robot interaction, to train effect networks and perform various intelligent tasks directly. The devices can also exchange semantic features, which has been previously discussed in our work in [22]. We focus on the communication between the cloud/edge platform and the local IoT devices to make the semantic communication model affordable.

A. Semantic Communication System

The DeepSC shown in Fig. 2(b) can be mainly divided into three parts: the transmitter network, the physical channel, and the receiver network, where the transmitter network includes the semantic encoder and the channel encoder, and the receiver network consists of the semantic decoder and the channel decoder.

We assume that the input of the DeepSC is a sentence, s = [w1, w2, · · · , wN], where wn represents the n-th word in the sentence. The encoded symbol stream can be represented as

X = Cα(Sβ(s)),  (1)

where Sβ(·) is the semantic encoder network with parameter set β and Cα(·) is the channel encoder with parameter set α. If X is sent through a wireless fading channel, the signal received at the receiver can be given by

Y = fH(X) = HX + N,  (2)

where H¹ represents the channel gain between the transmitter and the receiver, and N ∼ N(0, σn²) is the additive white Gaussian noise (AWGN).

The decoded signal can be represented as

ŝ = Sχ^−1(Cδ^−1(Y)),  (3)

where ŝ is the recovered sentence, Cδ^−1(·) is the channel decoder with parameter set δ, Sχ^−1(·) is the semantic decoder network with parameter set χ, and the superscript −1 represents the decoding operation.

¹ Here, we have omitted the discussion of complex channels. If the complex channel is H̄, then H̄ = [ℜ(H), −ℑ(H); ℑ(H), ℜ(H)].
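To make the mappings in (1)–(3) concrete, the short sketch below runs a toy end-to-end pass with small dense layers standing in for the semantic/channel encoder and decoder networks; the layer sizes, tanh activations, and the Gaussian fading realization are illustrative assumptions only, not the actual Transformer-based DeepSC architecture.

import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b):
    # One toy fully-connected layer standing in for S_beta, C_alpha, etc.
    return np.tanh(W @ x + b)

# Illustrative dimensions: an embedded sentence of d_s features is
# compressed to d_x channel symbols (all sizes are assumptions).
d_s, d_x = 16, 8
s = rng.standard_normal(d_s)                     # embedded sentence s

# Transmitter: X = C_alpha(S_beta(s)), cf. Eq. (1)
W_sb, b_sb = rng.standard_normal((d_s, d_s)), np.zeros(d_s)
W_ca, b_ca = rng.standard_normal((d_x, d_s)), np.zeros(d_x)
X = dense(dense(s, W_sb, b_sb), W_ca, b_ca)

# Fading channel: Y = HX + N, cf. Eq. (2)
H = rng.standard_normal((d_x, d_x)) / np.sqrt(d_x)
N = 0.1 * rng.standard_normal(d_x)
Y = H @ X + N

# Receiver: s_hat = S_chi^{-1}(C_delta^{-1}(Y)), cf. Eq. (3)
W_cd, b_cd = rng.standard_normal((d_s, d_x)), np.zeros(d_s)
W_sx, b_sx = rng.standard_normal((d_s, d_s)), np.zeros(d_s)
s_hat = dense(dense(Y, W_cd, b_cd), W_sx, b_sx)
print(s_hat.shape)   # (16,) -- recovered feature vector before de-embedding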
Fig. 2. (a) The proposed distributed semantic communication network: cloud/edge computing performs model initialization/update, and the devices upload semantic features. (b) The semantic communication system: the transmitter maps the semantic source s to symbols X, which pass through the physical channel, and the receiver recovers ŝ from Y.
The whole semantic communication network can be trained by the [...]
[...] where η is the learning rate and ∂LCE/∂β is the gradient, computed [...]
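The truncated passage above refers to the usual gradient-descent update of the semantic encoder parameters; assuming LCE denotes the cross-entropy loss used to train the network (the loss definition itself is not shown in this excerpt), the update reads

β ← β − η · ∂LCE(ŝ, s)/∂β.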
[...] are with finite storage and computation capability, which limits the size of DeepSC. Therefore, compressing DeepSC not only reduces the latency of model transmission between the cloud/edge platform and the local devices but also makes it possible to run the DL model on the local devices.

III. PROPOSED LITE DISTRIBUTED SEMANTIC COMMUNICATION SYSTEM

To address the identified challenges in Section II, we propose a lite distributed semantic communication system, named L-DeepSC. We analyze the effects of CSI on model training under fading channels and design a CSI-aided training process to overcome the fading effects, which successfully deals with Question 1. Besides, our finite-points constellation design solves Question 2. Finally, weight pruning and quantization are investigated to address Question 3.

A. Deep De-noise Network based CSI Refinement and Cancellation

The most common method to reduce the effects of fading channels in wireless communication is to use the known channel properties of a communication link, i.e., the CSI. Similarly, CSI can also reduce the channel impacts when training L-DeepSC. Next, we will first analyze the role of CSI in L-DeepSC training.

In order to simplify the analysis, we assume that the transmitter and the receiver each consist of a one-layer dense network with sigmoid activation, where the transmitter has an additional untrainable embedding layer and the receiver has an untrainable de-embedding layer. The IoT devices are equipped with the trained transmitter model and the cloud/edge platform works as the receiver, as shown in the system model in Fig. 2. The IoT devices and the cloud/edge platform are equipped with the same number of antennas. After the embedding layer, the source message, s, is embedded into S. Then, S is encoded into

X = σ(WT S + bT),  (8)

where X² is the semantic features transmitted from the IoT devices to the cloud/edge platform, WT and bT are the trainable parameters to extract the features from the source message s, and σ(·) is the sigmoid activation function.

² Here, we have avoided the discussion of complex signals. If the complex signal is X̄, then X̄ = [ℜ(X), ℑ(X)].

The received symbol at the cloud/edge platform is affected by the channel H and AWGN as in (2). From the received symbol, the cloud/edge platform recovers the embedding matrix by

Ŝ = σ(WR Y + bR),  (9)

where the estimated source message, ŝ, can be obtained after the de-embedding layer, and WR and bR can learn to recover s. The L-DeepSC can be optimized by the loss function in (4). The fading channels not only contaminate the gradients in the back-propagation, but also restrict the representation power in the forward-propagation.

Back-propagation: It updates the parameter WT by its gradient,

∂LCE(ŝ, s)/∂WT = (FR WR H FT)^T ∇ŝ LCE(ŝ, s) s^T,  (10)

where FR = diag(σ′(WR y + bR)) and FT = diag(σ′(WT s + bT)). In (10), H is untrainable and random; therefore, it causes perturbations to the weight update, i.e., a weight update with higher variance. If the transmitter consists of very deep neural networks, the perturbation will affect the back-propagation of the whole transmitter network, since it propagates through the whole transmitter network by the chain rule.

Forward-propagation: With the received signal Y, the source messages can be recovered by

Ŝ = σ(WR Y + bR) = σ(WR HX + WR N + bR).  (11)

In (11), WR has to learn how to deal with the channel effects and to decode at the same time, which increases the training burden and reduces the network expression capability. Meanwhile, the errors caused by the channel effects also propagate to the subsequent layers for an L-DeepSC receiver with multiple layers.

The impacts of the channel can be mitigated by exploiting CSI at the cloud/edge. If the channel H is known, then the received symbol can be processed by

Ỹ = (H^H H)^−1 H^H Y = X + Ñ,  (12)

where Ñ = (H^H H)^−1 H^H N. In (12), the channel effect is transferred from multiplicative noise to additive noise, Ñ, which provides the possibility of stable back-propagation as well as a stronger capability of network representation. With (12), back-propagation and forward-propagation can be performed by setting H = I in (10) and (11), respectively. Therefore, the channel effects can be completely removed.

The above discussion shows the importance of CSI in model training. However, CSI generally can only be estimated, i.e., by the least-squares (LS), linear minimum mean-squared error (LMMSE), or minimum mean-squared error (MMSE) estimators. By exploiting prior channel statistics, the LMMSE and MMSE estimators usually perform better than the LS estimator; however, they are sensitive to the accuracy of the channel statistics, while the LS estimator requires no prior channel information. Meanwhile, DL techniques can also be used to improve the performance of channel estimation [36], [37].

For simplicity, we initially use the LS estimator. Then, we adopt a deep de-noise network to increase the resolution of the LS estimate, as in [38], shown in Fig. 3. Particularly, the rough CSI is first estimated by the LS estimator with few pilots, denoted by

Hrough = Yp Xp^H = H + N Xp^H,  (13)

where Yp = HXp + N is the received pilot signal and Xp is the transmitted pilot signal. Then, (13) can be represented as

Hrough = H + N̂,  (14)
Fig. 5. The comparison between the full-resolution constellation and the 4-bits constellation.
Fig. 6. The BLEU scores of different constellation sizes versus SNR under AWGN.
Fig. 7. The MSE for the MMSE estimator, the LS estimator, and the proposed ADNet based LS estimator.
[...] region. As a result, the MSE of the ADNet based LS estimator is significantly lower than that of the LS and MMSE estimators when SNR is low. With increasing SNR, the MSE of the ADNet based LS estimator approaches that of the LS and MMSE estimators. Therefore, the ADNet based LS estimator can be substituted by the LS estimator to reduce the complexity in the high SNR region.

Fig. 8 and Fig. 9 illustrate the relationship between the BLEU score and SNR with the 4-bits constellation over the Rician and the Rayleigh fading channels, respectively, where DeepSC is trained with perfect CSI and L-DeepSC is trained with perfect CSI, rough CSI by (14), refined CSI by (15), and without CSI, respectively. The traditional approaches are Huffman coding with (5,7) RS coding and with turbo coding (rate 1/2), both with 64-QAM. We observe that all DL-enabled approaches are more competitive under the fading channels. RS coding is better than turbo coding in terms of the BLEU score. This is because RS coding is linear block coding with a long block-length, which can correct long bit sequences; however, turbo coding is convolutional coding with a short block-length, where the coded bits are only related to the previous m bits, i.e., m = 3, so that adjacent words result in a higher error rate. The performance of L-DeepSC is very close to that of DeepSC in terms of the BLEU score, but requires much less bandwidth for communications. The system trained without CSI performs worse than those trained with CSI, especially under the Rayleigh fading channels, which also confirms the analysis of (10) and (11). Without CSI, the performance difference between the Rayleigh channels and the Rician channels is caused by the line-of-sight (LOS) component, which can help the systems recognize the semantic information during training. Besides, with the aid of CSI, the effects of the fading channels are mitigated significantly, as we have analyzed before. When SNR is low, the system with perfect CSI or refined CSI outperforms that with rough CSI.
Fig. 8. The BLEU scores versus SNR under Rician fading channels, with perfect CSI, rough CSI, refined CSI, and no CSI.
Fig. 9. The BLEU scores versus SNR under Rayleigh fading channels, with perfect CSI, rough CSI, refined CSI, and no CSI.
Fig. 10. The BLEU scores of different SNRs versus the sparsity ratio, γ, under Rician fading channels with the refined CSI.
Fig. 11. The BLEU scores of different SNRs versus the quantization level, m, under Rician fading channels with the refined CSI.
As SNR increases, all these systems, i.e., L-DeepSC with perfect CSI, refined CSI, and rough CSI, gradually converge to similar performance.

C. Model Compression

In this experiment, we investigate the performance of network slimming, including network sparsification, network quantization, and the combination of both. The pre-trained model used for pruning and quantization is trained with the 4-bits constellation under the Rician fading channels.

Fig. 10 shows the influence of the network sparsity ratio, γ, on the BLEU scores at different SNRs under the Rician fading channels, where the system is pruned directly when γ increases from 0 to 0.9 and is pruned with fine-tuning when γ increases further to 0.99. The proposed L-DeepSC achieves almost the same BLEU scores when γ increases from 0 to 0.9, which shows that there exists a mass of weight redundancy in the trained DeepSC model. When γ increases to 0.99, the BLEU scores drop only slightly due to the fine-tuning, where the performance loss at 0 dB and 6 dB is larger than that at 12 dB and 18 dB. Thus, for the high SNR cases, the model can be pruned directly with only slight performance degradation. For the low SNR region, it is possible to prune 99% of the weights without significant performance degradation when the system is sensitive to power consumption.

Fig. 11 demonstrates the relationship between the BLEU score and the quantization bit number, m, under the Rician fading channels, where m is defined in (19) and the system is quantized with QAT when m is smaller than 2. The performance with m = 8 to m = 20 is similar, which indicates the effectiveness of low-resolution neural networks. If the system is more sensitive to power consumption and can tolerate a certain performance degradation, the resolution of the neural networks can be further reduced to the 4-bits level. However, the BLEU score decreases dramatically from m = 4 to m = 2 over the whole SNR range since most of the key information is removed in the low-resolution neural network.
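The sparsification and quantization swept in Fig. 10, Fig. 11, and Table II can be summarized by a short sketch: prune the smallest-magnitude fraction γ of the weights, quantize the survivors to m bits, and compute the resulting compression ratio (as later defined in (25)). The uniform quantizer, the pruning threshold rule, and the random stand-in weight matrix are simplifying assumptions, not the exact L-DeepSC scheme.

import numpy as np

def prune_and_quantize(W, gamma=0.6, m=8):
    """Magnitude-prune a fraction gamma of the weights, then uniformly
    quantize the remaining weights to m bits (illustrative scheme only)."""
    thresh = np.quantile(np.abs(W), gamma)     # gamma of the weights fall below this
    mask = np.abs(W) > thresh                  # surviving connections
    Wp = W * mask

    # Uniform m-bit quantization of the surviving weights.
    levels = 2 ** m - 1
    w_max = np.abs(Wp).max()
    step = 2 * w_max / levels if w_max > 0 else 1.0
    Wq = np.round(Wp / step) * step

    # Compression ratio psi = (M * 32) / (M_pruned * m), cf. Eq. (25).
    M, M_pruned = W.size, int(mask.sum())
    psi = (M * 32) / (M_pruned * m)
    return Wq * mask, psi

rng = np.random.default_rng(2)
W = rng.standard_normal((512, 512))            # a stand-in weight matrix
Wq, psi = prune_and_quantize(W, gamma=0.6, m=8)
print(f"weights kept: {np.count_nonzero(Wq) / W.size:.2f}, psi = {psi:.1f}")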
TABLE II
THE BLEU SCORE AND COMPRESSION RATIO, ψ, COMPARISONS VERSUS DIFFERENT SPARSITY RATIOS, γ, AND QUANTIZATION LEVELS, m, AT SNR = 12 dB.

Pruned model | m = 4 (BLEU / ψ)  | m = 8 (BLEU / ψ)  | m = 12 (BLEU / ψ)  | m = 16 (BLEU / ψ)  | m = 32 (BLEU / ψ)
γ = 0        | 0.811194 / 8      | 0.906763 / 4      | 0.902354 / 2.667   | 0.903089 / 2       | 0.895602 / 1
γ = 0.3      | 0.838967 / 11.429 | 0.892745 / 5.714  | 0.908537 / 3.81    | 0.910184 / 2.857   | 0.89851 / 1.429
γ = 0.6      | 0.835863 / 20.0   | 0.897143 / 10.0   | 0.90815 / 6.667    | 0.900468 / 5.0     | 0.9093 / 2.5
γ = 0.9      | 0.810322 / 80.0   | 0.895306 / 40.0   | 0.898784 / 26.667  | 0.910554 / 20.0    | 0.89515 / 10
γ = 0.95     | 0.779685 / 160.0  | 0.875814 / 80.0   | 0.873426 / 53.333  | 0.877221 / 40.0    | 0.87653 / 20
TABLE III
THE COMPARISON BETWEEN THE L-DEEPSC AND DEEPSC TRANSCEIVERS IN PARAMETERS, SIZE, RUNTIME, AND BLEU SCORE.

Model                       | Parameters | Size    | Runtime | BLEU score
γ = 0, m = 32 (DeepSC)      | 3,333,120  | 12.3 MB | 20 ms   | 0.895602
γ = 0.6, m = 8 (L-DeepSC)   | 1,333,247  | 1.28 MB | 18 ms   | 0.897143

Table II compares the BLEU scores and compression ratios under different combinations of weight pruning and weight quantization at SNR = 12 dB, where the compression ratio is computed by

ψ = (M × 32) / (Mpruned × m),  (25)

where M is the number of weights before pruning, Mpruned is the number of weights remaining after pruning, 32 is the number of bits required for FP32, and m is the number of bits required after quantization. The performance decreases when γ increases or m decreases, which is consistent with Fig. 10 and Fig. 11. From the table, different compression ratios could lead to similar performance. For example, the BLEU score with γ = 30% and m = 8 is similar to that with γ = 90% and m = 12, but the compression ratios differ by about five times, i.e., 5.714 versus 26.667. By properly choosing a suitable sparsity ratio and quantization level, the same performance can be achieved with a higher compression ratio.

Table III compares DeepSC and L-DeepSC with 60% weight sparsity and 8-bit quantization when SNR is 12 dB, where we mainly consider the transmission of the weights. The simulation is performed on the CPU of a computer with an Intel Core i7-6700HQ@2.6GHz. After network slimming, the model size is reduced from 12.3 MB to 1.28 MB while achieving a similar BLEU score, which means that the bandwidth resource can be saved significantly without degrading the performance. Besides, the runtime only decreases slightly, from 20 ms to 18 ms, since the unstructured pruning method is employed, and there exist the communication time with the flash memory and some operations that cannot be optimized. If the model size were bigger, L-DeepSC could save more runtime.
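As a quick sanity check of Table III (our own back-of-the-envelope computation, assuming FP32 weights before compression and 1 MB = 2^20 bytes), the listed parameter counts, sizes, and the compression ratio in (25) are mutually consistent:

# Rough consistency check for Table III (assumes FP32 weights; MB = 2**20 bytes).
M_full, M_pruned = 3_333_120, 1_333_247       # parameters before / after 60% pruning
size_full_MB = M_full * 32 / 8 / 2**20        # ~12.7 MB, close to the reported 12.3 MB
size_slim_MB = M_pruned * 8 / 8 / 2**20       # ~1.27 MB, close to the reported 1.28 MB
psi = (M_full * 32) / (M_pruned * 8)          # ~10, matching gamma = 0.6, m = 8 in Table II
print(round(size_full_MB, 2), round(size_slim_MB, 2), round(psi, 2))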
V. CONCLUSION

In this paper, we proposed a lite distributed semantic communication system, named L-DeepSC, for Internet of Things (IoT) networks, where the participating devices are usually with limited power and computing capabilities. Specially, the receiver and the feature extractor were designed jointly for text transmission. Firstly, we analyzed the effectiveness of CSI in the forward-propagation and back-propagation during system training over the fading channels. The analytical results reveal that the fading channels contaminate the weight updates and restrict the model representation capability. Thus, a refined LS estimator with fewer pilot overheads was developed to eliminate the effects of fading channels. Besides, we map the full-resolution constellation into a finite-bits constellation to lower the cost of IoT devices, which was verified by simulation results. Finally, due to the narrow bandwidth and limited computational capability in IoT networks, two model compression approaches have been proposed: 1) network sparsification to prune the unnecessary weights, and 2) network quantization to reduce the weight resolution. The simulation results validated that the proposed L-DeepSC outperforms the traditional methods, especially in the low SNR regime, and provided insights into the balance among the compression ratio, sparsity ratio, and quantization level. Therefore, our proposed L-DeepSC is a promising candidate for intelligent IoT networks, especially in the low SNR regime.

REFERENCES

[1] L. Atzori, A. Iera, and G. Morabito, “The Internet of Things: a survey,” Computer Netw., vol. 54, no. 15, pp. 2787–2805, Oct. 2010.
[2] T. Qiu, N. Chen, K. Li, M. Atiquzzaman, and W. Zhao, “How can heterogeneous Internet of Things build our future: A survey,” IEEE Commun. Surv. Tutorials, vol. 20, no. 3, pp. 2011–2027, Feb. 2018.
[3] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[4] M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani, “Deep learning for IoT big data and streaming analytics: A survey,” IEEE Commun. Surv. Tutorials, vol. 20, no. 4, pp. 2923–2960, Jun. 2018.
[5] H. Li, K. Ota, and M. Dong, “Learning IoT in edge: Deep learning for the Internet of Things with edge computing,” IEEE Network, vol. 32, no. 1, pp. 96–101, Jan. 2018.
[6] R. Carnap, Y. Bar-Hillel et al., An Outline of a Theory of Semantic Information. RLE Technical Report 247, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, Oct. 1952.
[7] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2005.
[8] I. Guyon, S. Gunn, M. Nikravesh, and L. A. Zadeh, Feature Extraction: Foundations and Applications. Springer, 2008, vol. 207.
[9] R. Szeliski, Computer Vision: Algorithms and Applications. Springer Science & Business Media, 2010.
[10] N. Indurkhya and F. J. Damerau, Handbook of Natural Language Processing. CRC Press, 2010, vol. 2.
[11] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. The University of Illinois Press, 1949.
[12] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2005.
[13] Z. Qin, H. Ye, G. Y. Li, and B.-H. F. Juang, “Deep learning in physical layer communications,” IEEE Wireless Commun., vol. 26, no. 2, pp. 93–99, Apr. 2019.
[14] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Oct. 2017.
[15] S. Dörner, S. Cammerer, J. Hoydis, and S. ten Brink, “Deep learning based communication over the air,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 132–143, Dec. 2018.
[16] F. A. Aoudia and J. Hoydis, “Model-free training of end-to-end communication systems,” IEEE J. Sel. Areas Commun., vol. 37, no. 11, pp. 2503–2516, Aug. 2019.
[17] H. Ye, L. Liang, G. Y. Li, and B.-H. Juang, “Deep learning-based end-to-end wireless communication systems with conditional GANs as unknown channels,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3133–3143, Feb. 2020.
[18] E. Bourtsoulatze, D. B. Kurka, and D. Gündüz, “Deep joint source-channel coding for wireless image transmission,” IEEE Trans. Cogn. Commun. Netw., vol. 5, no. 3, pp. 567–579, May 2019.
[19] C. Lee, J. Lin, P. Chen, and Y. Chang, “Deep learning-constructed joint transmission-recognition for Internet of Things,” IEEE Access, vol. 7, pp. 76547–76561, Jun. 2019.
[20] M. Jankowski, D. Gündüz, and K. Mikolajczyk, “Joint device-edge inference over wireless links with pruning,” in Proc. IEEE Int’l Workshop Signal Process. Advances Wireless Commun. (SPAWC), Atlanta, GA, USA, Aug. 2020, pp. 1–5.
[21] N. Farsad, M. Rao, and A. Goldsmith, “Deep learning for joint source-channel coding of text,” in Proc. IEEE Int’l Conf. Acoustics Speech Signal Process. (ICASSP), Calgary, AB, Canada, Apr. 2018, pp. 2326–2330.
[22] H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” arXiv:2006.10685, 2020. [Online]. Available: https://arxiv.org/abs/2006.10685
[23] E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus, “Exploiting linear structure within convolutional networks for efficient evaluation,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), Montreal, Quebec, Canada, Dec. 2014, pp. 1269–1277.
[24] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and connections for efficient neural network,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), Montreal, Quebec, Canada, Dec. 2015, pp. 1135–1143.
[25] Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, and C. Zhang, “Learning efficient convolutional networks through network slimming,” in Proc. IEEE Int’l Conf. on Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 2755–2763.
[26] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, “Pruning filters for efficient convnets,” in Proc. IEEE Int’l Conf. on Learning Representations (ICLR), Toulon, France, Apr. 2017.
[27] R. Krishnamoorthi, “Quantizing deep convolutional networks for efficient inference: A whitepaper,” arXiv:1806.08342, 2018. [Online]. Available: http://arxiv.org/abs/1806.08342
[28] Y. Gong, L. Liu, M. Yang, and L. Bourdev, “Compressing deep convolutional networks using vector quantization,” arXiv:1412.6115, 2014. [Online]. Available: http://arxiv.org/abs/1412.6115
[29] A. Zhou, A. Yao, Y. Guo, L. Xu, and Y. Chen, “Incremental network quantization: Towards lossless CNNs with low-precision weights,” in Proc. IEEE Int’l Conf. on Learning Representations (ICLR), Toulon, France, Apr. 2017.
[30] F. Li, B. Zhang, and B. Liu, “Ternary weight networks,” arXiv:1605.04711, 2016. [Online]. Available: http://arxiv.org/abs/1605.04711
[31] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. G. Howard, H. Adam, and D. Kalenichenko, “Quantization and training of neural networks for efficient integer-arithmetic-only inference,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, Jun. 2018, pp. 2704–2713.
[32] J. Guo, J. Wang, C.-K. Wen, S. Jin, and G. Y. Li, “Compression and acceleration of neural networks for communications,” IEEE Wireless Commun., vol. 27, no. 4, pp. 110–117, Jul. 2020.
[33] D. Gil, A. Ferrández, H. Mora-Mora, and J. Peral, “Internet of Things: A review of surveys based on context aware intelligent services,” Sensors, vol. 16, no. 7, p. 1069, Jul. 2016.
[34] “IEEE standard for floating-point arithmetic,” IEEE Std 754-2008, pp. 1–70, 2008.
[35] B. Zhu, J. Wang, L. He, and J. Song, “Joint transceiver optimization for wireless communication PHY using neural network,” IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1364–1373, Mar. 2019.
[36] K. Thakkar, A. Goyal, and B. Bhattacharyya, “Deep learning and channel estimation,” in Proc. Int’l Conf. on Adv. Comput. and Commun. Systems (ICACCS), Coimbatore, India, Mar. 2020, pp. 745–751.
[37] E. Balevi, A. Doshi, and J. G. Andrews, “Massive MIMO channel estimation with an untrained deep neural network,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2079–2090, Jan. 2020.
[38] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Feb. 2017.
[39] C. Tian, Y. Xu, Z. Li, W. Zuo, L. Fei, and H. Liu, “Attention-guided CNN for image denoising,” Neural Netw., vol. 124, pp. 117–129, Apr. 2020.
[40] R. Dorrance, F. Ren, and D. Marković, “A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-BLAS on FPGAs,” in Proc. ACM/SIGDA Int’l Symp. Field-Programmable Gate Arrays, Feb. 2014, pp. 161–170.
[41] L. Zhuo and V. K. Prasanna, “Sparse matrix-vector multiplication on FPGAs,” in Proc. ACM/SIGDA Int’l Symp. Field-Programmable Gate Arrays, Feb. 2005, pp. 63–74.
[42] Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv:1308.3432, 2013. [Online]. Available: http://arxiv.org/abs/1308.3432
[43] P. Koehn, “Europarl: A parallel corpus for statistical machine translation,” in Proc. MT Summit, vol. 5, 2005, pp. 79–86.
[44] I. S. Reed and G. Solomon, “Polynomial codes over certain finite fields,” J. Soc. Ind. Appl. Math., vol. 8, no. 2, pp. 300–304, Jan. 1960.
[45] K. Papineni, S. Roukos, T. Ward, and W. Zhu, “BLEU: a method for automatic evaluation of machine translation,” in Proc. Annual Meeting Assoc. Comput. Linguistics (ACL), Philadelphia, PA, USA, Jul. 2002, pp. 311–318.