Two End-To-End Quantum-Inspired Deep Neural Networks for Text Classification
Abstract—In linguistics, the uncertainty of context due to polysemy is widespread and attracts much attention. Quantum-inspired complex word embedding based on Hilbert space plays an important role in natural language processing (NLP), as it fully leverages the similarity between quantum states and word tokens: a word with multiple meanings can correspond to a single quantum particle that may exist in several possible states, and a sentence can be analogized to a quantum system in which particles interfere with each other. Motivated by quantum-inspired complex word embedding, interpretable complex-valued word embedding (ICWE) is proposed to design two end-to-end quantum-inspired deep neural networks (ICWE-QNN and CICWE-QNN, the latter denoting the convolutional complex-valued neural network based on ICWE) for binary text classification. They are shown to be feasible and effective in NLP applications and can solve the problem of text information loss in the CE-Mix [1] model caused by neglecting important linguistic features of text, since linguistic feature extraction is incorporated in our models with deep learning algorithms: a gated recurrent unit (GRU) extracts the sequence information of sentences, an attention mechanism makes the model focus on important words in sentences, and a convolutional layer captures the local features of the projected matrix. The ICWE-QNN model can avoid the random combination of word tokens, and CICWE-QNN fully considers the textual features of the projected matrix. Experiments conducted on five benchmarking classification datasets demonstrate that our proposed models achieve higher accuracy than the compared traditional models, including CaptionRep BOW, DictRep BOW and Paragram-Phrase, and they also perform well on F1-score. Especially, the CICWE-QNN model achieves higher accuracy than the quantum-inspired model CE-Mix as well on four datasets: SST, SUBJ, CR and MPQA. It is a meaningful and effective exploration to design quantum-inspired deep neural networks to promote the performance of text classification.
Index Terms—Complex-valued word embedding, text classification, deep neural network, deep learning, natural language processing
TABLE 1
Notations

Notation        Description
$|\varphi\rangle$       Quantum superposition state
$D$             Density matrix
$M$             Projection matrix
$\bar{M}$       Complex conjugate of matrix $M$
$M^{\dagger}$   Hermitian conjugate of projection matrix $M$, $M^{\dagger} = \bar{M}^{T}$
$R$             Result matrix or projected matrix
$I$             Identity matrix
$x_t$           Word vector as the first input of the GRU at time $t$
$h_{t-1}$       Input vector containing the text information before time $t$
$h_t$           Hidden vector

Fig. 1. The similarity between quantum superposition state and polysemy.

Java framework in another specific context corresponding to the state $|1\rangle$. In addition, a phrase, sentence or document composed of words can be analogized to a quantum mixed system containing multiple particles, which can also be represented in the form of a density matrix. In 2018, motivated by the similarity between polysemy and the superposition of quantum states [13], quantum-like and quantum-inspired models such as NNQLM [14] and the complex neural network [1] were proposed to implement common tasks such as text classification and question answering [15], [16], which created a precedent for enhancing model interpretability with quantum physics. Two end-to-end neural networks based on quantum theory, named CE-Sup and CE-Mix, were proposed in Ref. [1] to realize a combination of NLP with quantum computing; they obtain superior accuracy on the text classification task [10], [17] in comparison with several non-quantum models like DictRep BOW [18], since they improve model interpretability to a certain extent.

However, the sequence information among word-level tokens is ignored in their networks, where the density matrix representation of the sentence-level text is just formed by a linear combination of several matrices for word representation. This may affect the learning ability and accuracy for lack of consideration of the characteristics of human language. Therefore, we introduce deep learning methods, including RNN and GRU, into the quantum theory-based neural network, aiming to obtain the positional information of sentences as textual features, which may enhance the learning ability of the model while ensuring interpretability.

RNN, which has excellent performance in dealing with sequential data such as sentence-level text composed of a series of words appearing in order, is applied to capture sequence information or textual features. GRU, as a variant of the standard RNN [19], is employed to solve the long-term dependency problems that arise when a normal RNN receives and handles long text data. Two end-to-end quantum-inspired deep neural networks for text classification are proposed in this paper, and experiments are conducted on five benchmarking datasets for binary text classification. The major work we have achieved can be summarized as follows:

- A novel word embedding method named interpretable complex-valued word embedding (ICWE) is proposed to improve the model interpretability of text classification. We specifically use a GRU and a self-attention layer [20] to update the amplitude word vectors for extracting more semantic features [21] and position information, where the updated amplitude word vectors and phase vectors together form ICWE.

- On the basis of ICWE, we design and construct the first quantum-inspired deep neural network, named ICWE-QNN, for binary text classification. Compared with the models completely based on quantum theory, such as CE-Sup and CE-Mix [1], it performs better on classification accuracy thanks to the injected sequence information.

- The second model we propose, called CICWE-QNN, is presented for more remarkable performance on the classification task by applying a convolutional layer [22] to the projected matrix, thereby considering the complete information and capturing local textual features, motivated by the extraction of the joint representation of question-answer pairs [14].

The effectiveness of the methods is further illustrated and verified in the following parts, and the rest of our paper is organized as follows: related basic theory about quantum mechanics and deep learning algorithms is introduced in Section 2. We describe the details of complex-valued word embedding and propose two quantum-inspired deep neural networks in Section 3. Experimental results are shown in Section 4. In Section 5, we draw a conclusion on the achieved work and introduce prospects for future research.

2 RELATED WORK

In this section, we briefly review the related work on NLP based on quantum theory or quantum-like theory. Elemental notations used in our paper are shown in Table 1 at the beginning of this section.

2.1 Preliminary

In quantum mechanics, a single particle is often represented by a superposition state $|\varphi\rangle = \alpha|0\rangle + \beta|1\rangle$, where $|\alpha|^2 + |\beta|^2 = 1$ and $|\alpha|^2$ ($|\beta|^2$) denotes the probability of the state $|0\rangle$ ($|1\rangle$) with $0 \le |\alpha|^2 \le 1$, $0 \le |\beta|^2 \le 1$, while a quantum physical system containing multiple particles can be represented as a mixed state. In addition, quantum states, including superposition states and mixed states, can be observed or determined as a concrete state by projection measurement, where a density matrix is exploited for representing a mixed state and a projection matrix is applied in projection measurement.

Density Matrix. In order to describe the amplitude and phase of quantum particles, a quantum superposition state can be denoted as follows:
$$|\varphi\rangle = \sum_{j=1}^{n} r_j e^{i\chi_j} |e_j\rangle, \qquad (1)$$

where $i$ denotes the imaginary unit, $r_j$ and $\chi_j$ stand for the amplitude and phase of a single particle respectively, and $|e_j\rangle$ represents a basis state of the Hilbert space. Then the density matrix representing a quantum physical system can be described as

$$D = \sum_{i=1}^{m} p_i |\varphi_i\rangle\langle\varphi_i|, \qquad (2)$$

where $p_i$, satisfying $\sum_{i=1}^{m} p_i = 1$, represents the probability property, $|\varphi_i\rangle$ is the superposition state in Eq. (1) and $\langle\varphi_i|$ is the conjugate transpose of $|\varphi_i\rangle$.

Projection Measurement. According to Gleason's theorem [23], a matrix $R$ as the measured result can be obtained with the equation $R = DM$, where $D$ and $M$ stand for the density and projection matrix respectively. A projection matrix can be described as $M_m = |x\rangle\langle x|$, where $M_m$ satisfies $\sum_{m} M_m^{\dagger} M_m = I$, $|x\rangle$ comes from the orthonormal basis states in the space of the observed system, $M_m^{\dagger}$ represents the Hermitian conjugate [24] of $M_m$, and $I$ stands for an identity matrix.

Activation Functions. Activation functions such as ReLU and sigmoid are commonly used to enhance the nonlinearity of neural networks. In this paper, we exploit the sigmoid function $\sigma(x) = \frac{1}{1+e^{-x}}$ for activation and for obtaining output values between 0 and 1.
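As a concrete illustration of Eqs. (1)-(2) and the measurement $R = DM$, the following minimal NumPy sketch (our own illustration with toy dimensions and variable names, not the authors' code) builds a mixed-state density matrix from complex word states and reads out a measurement probability:

```python
import numpy as np

def word_state(amplitudes, phases):
    """Normalized complex word state |phi> = sum_j r_j * exp(i*chi_j) |e_j> (Eq. (1))."""
    state = amplitudes * np.exp(1j * phases)
    return state / np.linalg.norm(state)

def mixed_density_matrix(states, probs):
    """Mixed-state density matrix D = sum_i p_i |phi_i><phi_i| (Eq. (2))."""
    return sum(p * np.outer(s, s.conj()) for p, s in zip(probs, states))

# toy example: a two-word "sentence" in a 4-dimensional Hilbert space
rng = np.random.default_rng(0)
states = [word_state(rng.random(4), rng.random(4)) for _ in range(2)]
D = mixed_density_matrix(states, probs=[0.5, 0.5])

# projection measurement: M = |x><x| built from a unit vector x, result R = DM
x = word_state(rng.random(4), rng.random(4))
M = np.outer(x, x.conj())
R = D @ M                        # projected (result) matrix
print(np.trace(R).real)          # measurement probability <x|D|x>
```

The trace of the result matrix recovers the usual measurement probability, while the full matrix $R$ retains the off-diagonal information that the later sections of the paper exploit.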
Fig. 2. NNQLM-I. Density matrices, including $\rho_q$ and $\rho_a$, are obtained by the outer product of real word vectors and the composition of word matrices. The similarity of a question-answer pair is then computed from the diagonal entries of the result matrix $\rho_q \rho_a$.

Fig. 3. NNQLM-II. It applies a convolution layer on the result matrix to extract more abstract features on the basis of NNQLM-I.

2.2 Neural Network Based Quantum-Like Language Model (NNQLM)

The Neural Network based Quantum-like Language Model (NNQLM), as a cornerstone of the combination of NLP and quantum mechanics theory, has been proposed for researching the linguistic subtask of question answering [25]. Question answering, as a basic task in NLP, is aimed at selecting the most accurate answer to the proposed question from the candidates. The study of NNQLM, which applies quantum theory to NLP for solving this fundamental task, provides an original perspective. In Ref. [14], Zhang et al. build a close connection between quantum particles and word tokens by expressing the text sentences of question-answer pairs as density matrices $\rho_q$ and $\rho_a$, which are embedded into the end-to-end neural networks NNQLM-I (Fig. 2) and NNQLM-II (Fig. 3).
The most significant difference between the two models is that NNQLM-II applies a CNN on the result matrix for local feature extraction [26]. With the assistance of the CNN, NNQLM-II performs better than NNQLM-I on two benchmarking QA datasets, TREC-QA and WikiQA. The study of NNQLM is novel and pioneering, but it only depends on quantum-like theory and cannot represent the complete application of quantum theory in NLP. Therefore, it is necessary to apply complex-valued word embedding to represent the amplitude and phase of quantum particles for the sake of truly simulating quantum states.

Fig. 4. CE-Sup. Complex word vectors representing amplitude and phase are mapped through the complex-valued word embedding vocabulary. The classification label, as a prediction for the input sentence, is obtained from the diagonal values of the projected matrix.

Fig. 5. CE-Mix. Different from CE-Sup, CE-Mix adopts the mixed-state manner for representing the sentence density matrix.

2.3 Complex Embedding Network for Text Classification

Compared with NNQLM, complex embedding networks including Complex Embedding Superposition (CE-Sup) and Complex Embedding Mixture (CE-Mix) [1] are proposed on the basis of complex word vectors for text classification. The two models preserve the quantum properties with complex-valued word vectors whose real and imaginary parts correspond to the amplitude and phase of quantum particles respectively [1]. Therefore, the construction of the density matrix for sentence-level text is also based on complex word vectors. From Figs. 4 and 5, we can observe that the only difference between the two models is the construction of the sentence density matrix.

CE-Sup. According to the concept of superposition state in quantum mechanics, CE-Sup establishes the density matrix $D_{sup} = |S\rangle\langle S|$, which is the outer product of a sentence-level vector generated by the linear combination of word vectors, where

$$|S\rangle = \frac{\sum_{i=1}^{m} \lambda_i |t_i\rangle}{\left\| \sum_{i=1}^{m} \lambda_i |t_i\rangle \right\|}, \qquad \sum_{i=1}^{m} \lambda_i = 1.$$

CE-Mix. The density matrix $D_{mix}$ of CE-Mix is constructed on the basis of a mixed state instead, whose building method is described in Eq. (2).
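To make the two constructions concrete, here is a small PyTorch sketch (our own illustration, not the authors' released code) of $D_{sup}$ and $D_{mix}$ built from unit-norm complex word states; the uniform mixing weights are an assumption, since the exact weighting scheme is not restated here.

```python
import torch

def ce_sup_density(word_states, weights):
    """CE-Sup: D_sup = |S><S|, with |S> the normalized weighted sum of word states."""
    s = (weights.unsqueeze(-1) * word_states).sum(dim=0)   # sum_i lambda_i |t_i>
    s = s / s.norm()                                        # normalize the sentence vector
    return torch.outer(s, s.conj())

def ce_mix_density(word_states, weights):
    """CE-Mix: mixed-state density matrix D_mix = sum_i p_i |t_i><t_i| (Eq. (2))."""
    outers = torch.einsum('id,ie->ide', word_states, word_states.conj())
    return (weights.view(-1, 1, 1) * outers).sum(dim=0)

# toy sentence of 3 words embedded in a 5-dimensional complex space
words = torch.randn(3, 5, dtype=torch.cfloat)
words = words / words.norm(dim=-1, keepdim=True)           # unit-norm word states
w = torch.full((3,), 1.0 / 3)                               # uniform mixing weights (assumption)
D_sup, D_mix = ce_sup_density(words, w), ce_mix_density(words, w)
print(torch.trace(D_mix).real)                              # trace 1 for a valid density matrix
```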
Experiments show that CE-Mix performs better than CE-Sup on the classification task. Specifically, in CE-Mix, a projection matrix initialized from the Hilbert space [11] is used for measuring the sentence density matrix, viewed as a mixed state, and determining the polarity of the sentence text, which describes the overall mathematical structure of the CE-Mix model shown in Fig. 5. CE-Mix completely fits quantum theory by applying complex word vectors to simulate quantum states and performs better than CE-Sup, which may result from the construction of the sentence density matrix. However, there are still two defects existing in CE-Mix:

Firstly, there is no feature extraction before constructing the complex density matrix for the input sentence, where the sentence density matrix is just the sum of several word matrices and lacks the sequence information of language. We can use a standard or variant RNN to capture
In this section, we present the interpretable complex-valued word embedding method based on quantum states and propose two end-to-end quantum-inspired deep neural networks.

3.1 Interpretable Complex-Valued Word Embedding (ICWE)

ICWE is a quantum-inspired complex-valued word embedding method based on a GRU-attention manner, which can dramatically enhance the feature-extraction ability of neural network models while guaranteeing model interpretability. Word embedding techniques are accepted methods used in NLP tasks [30]. Simultaneously, amplitude and phase are both required to describe quantum states in quantum mechanics. We adopt two embedding layers, where one is used for generating amplitude vectors representing the amplitude information and the other for generating phase vectors.

The amplitude word vectors are updated by the GRU as follows:

$$r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}), \qquad (3)$$
$$z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}), \qquad (4)$$
$$n_t = \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn})), \qquad (5)$$
$$h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}, \qquad (6)$$

where $r_t$ and $z_t$ in Eqs. (3) and (4), corresponding to the reset and update gates respectively, serve as the major functional units, $W_{ir}, W_{iz}, W_{in} \in \mathbb{R}^{h \times d}$ and $W_{hr}, W_{hz}, W_{hn} \in \mathbb{R}^{h \times h}$ are trainable weight matrices ($h$ denotes the hidden size and $d$ denotes the embedding size), and $b_{ir}, b_{iz}, b_{in}, b_{hr}, b_{hz}, b_{hn} \in \mathbb{R}^{h}$ are trainable bias vectors. $\sigma$ and $\tanh$ represent different activation functions, and $\odot$ stands for the Hadamard product. From Eq. (6), we know that $h_t$, the updated word vector at time $t$, combines the previous information $h_{t-1}$ with the current information $n_t$ derived from the reset gate $r_t$. Thus we can use the GRU outputs as the updated amplitude word vectors.
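A minimal PyTorch sketch of the ICWE idea described above is given below. It is our own illustration (written against a recent PyTorch release rather than the version used in the paper), and the way the GRU/attention outputs are mapped back to non-negative amplitudes is an assumption.

```python
import torch
import torch.nn as nn

class ICWE(nn.Module):
    """Sketch of interpretable complex-valued word embedding: amplitude vectors are
    refined by a GRU (Eqs. (3)-(6)) and a self-attention layer, phase vectors come
    from a second embedding table, and both define complex word states r*exp(i*phase)."""

    def __init__(self, vocab_size, dim):
        super().__init__()
        self.amplitude = nn.Embedding(vocab_size, dim)    # amplitude (real) embedding
        self.phase = nn.Embedding(vocab_size, dim)        # phase embedding
        self.gru = nn.GRU(dim, dim, batch_first=True)     # injects sequence information
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, token_ids):
        r = self.amplitude(token_ids)                     # (batch, seq, dim)
        h, _ = self.gru(r)                                # updated amplitude vectors h_t
        h, _ = self.attn(h, h, h)                         # focus on important words
        # map to non-negative amplitudes (assumption) and combine with the phases
        return torch.polar(h.abs(), self.phase(token_ids))

tokens = torch.randint(0, 1000, (2, 7))                   # two toy sentences of 7 tokens
print(ICWE(vocab_size=1000, dim=64)(tokens).shape)        # torch.Size([2, 7, 64]), complex64
```

The returned complex word states can then be turned into a sentence density matrix and measured exactly as in Section 2.1.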
Fig. 9. The structure of ICWE-QNN. The GRU and self-attention layers are targeted at extracting semantic features together. The projected matrix DM is the result matrix, which consists of two matrices (its real and imaginary parts).
Therefore, we make the best of the strengths of both deep learning techniques and exploit the GRU-attention manner to extract the linguistic features of sentences in the complex-valued neural network. Two max-pooling layers for obtaining the main features of each column are conducted on the real part and the imaginary part of the projected matrix respectively. Thus the concatenated vector composed of the real- and imaginary-part information of the projected matrix can be fed into a two-layer perceptron regarded as a sentence classifier for computing the classified label.
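The following sketch illustrates this classification head under our own assumptions about tensor shapes and hidden sizes; it is not the authors' released code.

```python
import torch
import torch.nn as nn

class ICWEQNNHead(nn.Module):
    """Sketch of the head described above: column-wise max-pooling over the real and
    imaginary parts of the projected matrix R = DM, concatenation, and a two-layer
    perceptron acting as the sentence classifier (layer sizes are our assumptions)."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, R):                        # R: (batch, dim, dim), complex-valued
        real = R.real.max(dim=1).values          # main feature of each column, real part
        imag = R.imag.max(dim=1).values          # main feature of each column, imaginary part
        feats = torch.cat([real, imag], dim=-1)  # concatenated real/imaginary feature vector
        return torch.sigmoid(self.mlp(feats)).squeeze(-1)

R = torch.randn(4, 32, 32, dtype=torch.cfloat)   # toy batch of projected matrices
print(ICWEQNNHead(dim=32)(R).shape)              # torch.Size([4])
```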
3.3 Convolutional Complex-Valued Neural Network Based on ICWE (CICWE-QNN)

In this section, we introduce the second proposed model, the Convolutional Complex-valued Neural Network based on ICWE (CICWE-QNN), with a convolutional structure capturing local textual features of the projected matrix.

3.3.1 Feature Extraction for Projected Matrix

In order to solve the problem of text-related information loss in the CE-Mix model caused by neglecting the non-diagonal elements of the projected matrix, we try to collect the ignored useful text features [33] of the projected matrix as much as possible by involving a convolutional structure in our model.

Convolutional Structure. Convolutional structures have been applied in a large number of research fields of artificial intelligence (AI) and in deep models such as VGG [34] and ResNet [35]. Various convolution-based models have been universally exploited for feature extraction and have achieved remarkable experimental results in the AI field, especially in computer vision, due to their unique and strong computational characteristics, including parameter sharing and sparsity of connections. Eq. (7) stands for the computation in a convolutional layer and demonstrates that the convolutional structure is essentially a special fully connected layer, where $w$ and $b$ still represent the related trainable weights, i.e.,

$$C_i = \sigma(w \cdot x_i + b). \qquad (7)$$

Different from a normal fully connected layer, a convolutional layer needs far fewer parameters to obtain the same number of output units, which contributes to suppressing overfitting and speeding up training.
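As a toy illustration of this parameter economy (our own example, not taken from the paper), compare a 3×3 convolution with a fully connected layer producing the same number of outputs from a 32×32 input:

```python
import torch.nn as nn

# Parameter sharing in Eq. (7): a 3x3 convolution that turns a 1x32x32 input into a
# 32x30x30 feature map needs only 32*(3*3*1) + 32 = 320 parameters, while a fully
# connected layer with the same number of output units needs over 29 million.
conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
fc = nn.Linear(32 * 32, 32 * 30 * 30)
print(sum(p.numel() for p in conv.parameters()))   # 320
print(sum(p.numel() for p in fc.parameters()))     # 29520000
```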
In our opinion, extracting more textual features could make the model more robust and achieve more remarkable performance on text classification. We adopt a CNN with strong feature-extraction ability, where we use a 3×3 convolutional kernel and a pooling operation after the projected matrix, as shown in Fig. 10, and propose the novel model in Section 3.3.2.

Fig. 10. Extraction of local textual features of the projected matrix with a CNN.

3.3.2 Convolutional Complex-Valued Neural Network

Another model proposed in this paper, the Convolutional Complex-valued Neural Network based on ICWE (CICWE-QNN), can further improve the performance of ICWE-QNN on the classification task. The 2-D convolutional structure in CICWE-QNN is exploited for capturing the local features of the projected matrix, where the non-diagonal values of the matrix are involved in the calculation, leading to full consideration of the real-part and imaginary-part textual features of the quantum mixed system. Compared with CE-Mix, our proposed model considers the non-diagonal values of the projected matrix and therefore decreases the loss of text information. We verify the superior performance of CICWE-QNN in Section 4.

The series of operations including convolution on the projected matrix is shown in Fig. 11, where two 2-D convolution kernels are employed side by side in the original model. The result matrix in fact consists of real-part and imaginary-part matrices. Finally, the same follow-up layers as in ICWE-QNN, including max-pooling and the two-layer fully connected classifier, are applied to the two feature maps obtained from the convolutional layer. A 2-D CNN is applied rather than a 1-D convolutional structure for textual information extraction, since the word information is encoded by a word-level density matrix, which is an outer product of 1-D tensors in our work, leading to a two-dimensional distribution of word features on the projected matrix. Therefore, it is more reasonable to adopt a 2-D convolutional kernel as the local structural part of CICWE-QNN, compared with the common practice of employing a 1-D CNN for capturing semantic features of word vectors.
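A hedged sketch of this convolutional head is shown below; the kernel counts, padding and pooling choice are our assumptions, with the two 2-D kernel banks applied side by side to the real-part and imaginary-part matrices as described above.

```python
import torch
import torch.nn as nn

class CICWEQNNHead(nn.Module):
    """Sketch of the convolutional head described above: one 2-D kernel bank for the
    real-part matrix and one for the imaginary-part matrix, applied side by side,
    followed by max-pooling and a two-layer fully connected classifier."""

    def __init__(self, channels=4, hidden=64):
        super().__init__()
        self.conv_re = nn.Conv2d(1, channels, kernel_size=3, padding=1)  # kernels for the real part
        self.conv_im = nn.Conv2d(1, channels, kernel_size=3, padding=1)  # kernels for the imaginary part
        self.pool = nn.AdaptiveMaxPool2d(1)
        self.mlp = nn.Sequential(nn.Linear(2 * channels, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, R):                                    # R: (batch, dim, dim), complex-valued
        re = self.pool(self.conv_re(R.real.unsqueeze(1))).flatten(1)
        im = self.pool(self.conv_im(R.imag.unsqueeze(1))).flatten(1)
        return torch.sigmoid(self.mlp(torch.cat([re, im], dim=-1))).squeeze(-1)

R = torch.randn(4, 32, 32, dtype=torch.cfloat)
print(CICWEQNNHead()(R).shape)                               # torch.Size([4])
```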
4 EXPERIMENT AND DISCUSSION
We carry out experiments on the two proposed models, ICWE-QNN and CICWE-QNN, implemented in the PyTorch framework 1.4 with an RTX 2080 Ti, and verify the superior performance of the proposed models in comparison with the complex neural network models proposed in Ref. [1]. Accuracy, as the common metric in classification tasks, is analyzed for evaluating the compared models. We directly take the experimental results of CE-Sup, CE-Mix and several general supervised learning models from Ref. [1] for comparison.

4.1 Datasets and Settings

TABLE 2
Dataset Information After Split Into Train, Validation and Test Data

Dataset  Train   Validation  Test   Total   Labels
MR       8530    1065        1067   10662   pos/neg
SST      67349   872         1821   70042   pos/neg
SUBJ     8000    1000        1000   10000   sub/obj
CR       3024    364         384    3772    pos/neg
MPQA     8496    1035        1072   10603   pos/neg

Numbers in the table indicate the size of each set. All of the datasets except SST are divided 8:1:1.

Five binary classification (2-class) benchmarking datasets are carefully selected and preprocessed for validating the performance of the proposed models. Concrete information about the chosen datasets is shown in Table 2, where the division proportion of each dataset into training, validation and testing data is the same as in Ref. [1] for a fair comparison. There are two categories of datasets used in the experimental phase: the Movie Review dataset (MR) [36], the Stanford Sentiment Treebank dataset (SST), the Customer Review dataset (CR) [37] and the Opinion polarity dataset (MPQA) [38] are used to predict positive or negative sentences, while the Subjectivity dataset (SUBJ) [36] includes subjective or objective sentences. For model training, we adopt binary cross entropy as the loss function and Adam as the optimizer for back propagation. Pre-trained GloVe word vectors [39] with dimension 100 are used as initial parameters of the embedding layer.
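A minimal sketch of this training setup is given below, assuming a generic `model` with an `amplitude` embedding attribute; the attribute path and learning rate are hypothetical, since the paper specifies only the loss, the optimizer and the GloVe initialization.

```python
import torch
import torch.nn as nn

def train(model, glove_weights, train_loader, lr=1e-3):
    """One training pass with binary cross entropy and Adam; the amplitude embedding is
    first initialized from 100-dimensional pre-trained GloVe vectors (Section 4.1)."""
    model.amplitude.weight.data.copy_(glove_weights)          # glove_weights: (vocab, 100); path is hypothetical
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # lr is an assumption
    for token_ids, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(token_ids), labels.float())    # model outputs probabilities in (0, 1)
        loss.backward()
        optimizer.step()
```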
4.2 Results and Comparisons

We compare both proposed models with the quantum theory-based models CE-Sup and CE-Mix, which differ in the construction method for the density matrix representation of the input sentence. Our models ICWE-QNN and CICWE-QNN are both built on a density matrix in the form of a mixed state, as CE-Mix outperforms CE-Sup on all of the datasets.

TABLE 3
Accuracy of Seven Models on Five Benchmarking Datasets

Model            MR    SST   SUBJ  CR    MPQA
CaptionRep BOW   61.9  -     77.4  69.3  70.8
DictRep BOW      76.7  -     90.7  78.7  87.2
Paragram-Phrase  -     79.7  -     -     -
CE-Sup           78.4  82.6  92.6  80.0  85.7
CE-Mix           79.8  83.3  92.8  81.1  86.6
ICWE-QNN         78.6  84.2  92.6  82.6  86.8
CICWE-QNN        78.3  85.0  93.2  83.3  87.2

The best score of each dataset is in bold.

From Table 3 on accuracy, we observe that the first proposed model, ICWE-QNN, surpasses CE-Mix on three datasets, which shows that a sentence-level density matrix integrated with the positional information provided by the GRU can help improve on the performance of CE-Mix. However, ICWE-QNN is inferior to CE-Mix on the other two datasets, resulting from underutilizing the textual features of the projected matrix. Thus we apply a convolutional layer with strong extraction ability to ICWE-QNN for obtaining information maps, and the resulting model CICWE-QNN obtains superior performance on four out of five datasets in comparison with CE-Mix and the two traditional supervised learning models CaptionRep BOW and DictRep BOW, which verifies the effectiveness of the convolutional layer for textual feature extraction. In a word, ICWE-QNN and CICWE-QNN validate that it is feasible to combine deep learning algorithms with quantum theory for completing the text classification task. We guarantee that both models are designed and implemented in an identical process and that the experimental results are reliable and valid. In the end, we show the F1 scores of the two proposed models in Table 4.

TABLE 4
F1 Score of Two Proposed Models on Five Benchmarking Datasets

Model      MR    SST   SUBJ  CR    MPQA
ICWE-QNN   76.6  83.9  90.9  86.5  76.0
CICWE-QNN  76.1  83.6  92.0  86.7  78.2

The best score of each dataset is in bold.
Fig. 12. F1 Score of two end-to-end classification models on three benchmarking datasets.
TABLE 5
Recall of two Proposed Models on Five Benchmarking Datasets
[18] F. Hill, K. Cho, A. Korhonen, and Y. Bengio, "Learning to understand phrases by embedding the dictionary," Trans. Assoc. Comput. Linguistics, vol. 4, pp. 17–30, 2016.
[19] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," 2014, arXiv:1412.3555.
[20] A. Vaswani et al., "Attention is all you need," in Proc. Adv. Neural Inform. Process. Syst., 2017, pp. 5998–6008.
[21] K. S. Tai, R. Socher, and C. D. Manning, "Improved semantic representations from tree-structured long short-term memory networks," 2015, arXiv:1503.00075.
[22] S. Lai, L. Xu, K. Liu, and J. Zhao, "Recurrent convolutional neural networks for text classification," in Proc. 29th AAAI Conf. Artif. Intell., 2015, pp. 2267–2273.
[23] A. M. Gleason, "Measures on the closed subspaces of a Hilbert space," J. Math. Mech., vol. 6, no. 6, pp. 885–893, 1957.
[24] R. De Wolf, "Quantum computing: Lecture notes," 2019, arXiv:1907.09415.
[25] M. Tan, C. Dos Santos, B. Xiang, and B. Zhou, "Improved representation learning for question answer matching," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, 2016, pp. 464–473.
[26] X. Zhu, S. Zhang, R. Hu, Y. Zhu, and J. Song, "Local and global structure preservation for robust unsupervised spectral feature selection," IEEE Trans. Knowl. Data Eng., vol. 30, no. 3, pp. 517–529, Mar. 2018.
[27] Y. Zhang et al., "CFN: A complex-valued fuzzy network for sarcasm detection in conversations," IEEE Trans. Fuzzy Syst., early access, Apr. 12, 2021, doi: 10.1109/TFUZZ.2021.3072492.
[28] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," 2018, arXiv:1810.04805.
[29] R. A. Potamias, G. Siolas, and A.-G. Stafylopatis, "A transformer-based approach to irony and sarcasm detection," Neural Comput. Appl., vol. 32, no. 23, pp. 17309–17320, 2020.
[30] X. Mao, S. Chang, J. Shi, F. Li, and R. Shi, "Sentiment-aware word embedding for emotion classification," Appl. Sci., vol. 9, no. 7, 2019, Art. no. 1334.
[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[32] R. Rana, "Gated recurrent unit (GRU) for emotion classification from noisy speech," 2016, arXiv:1612.07778.
[33] S. Zhang, Z. Qin, C. X. Ling, and S. Sheng, "'Missing is useful': Missing values in cost-sensitive decision trees," IEEE Trans. Knowl. Data Eng., vol. 17, no. 12, pp. 1689–1693, Dec. 2005.
[34] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv:1409.1556.
[35] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[36] B. Pang and L. Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales," 2005, arXiv:cs/0506075.
[37] M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2004, pp. 168–177.
[38] J. Wiebe, T. Wilson, and C. Cardie, "Annotating expressions of opinions and emotions in language," Lang. Resour. Eval., vol. 39, no. 2, pp. 165–210, 2005.
[39] J. Pennington, R. Socher, and C. D. Manning, "GloVe: Global vectors for word representation," in Proc. Conf. Empirical Methods Nat. Lang. Process., 2014, pp. 1532–1543.
[40] W. Meng, W. Fu, S. Hao, H. Liu, and X. Wu, "Learning on big graph: Label inference and regularization with anchor hierarchy," IEEE Trans. Knowl. Data Eng., vol. 29, no. 5, pp. 1101–1114, May 2017.
[41] M. Wang, W. Fu, X. He, S. Hao, and X. Wu, "A survey on large-scale machine learning," IEEE Trans. Knowl. Data Eng., early access, Aug. 11, 2020, doi: 10.1109/TKDE.2020.3015777.

Jinjing Shi received the BS and PhD degrees from the School of Information Science and Engineering, Central South University, Changsha, China, in 2008 and 2013, respectively. She is currently an associate professor with the School of Computer Science and Engineering, Central South University. She was selected in the Shenghua lieying talent program of Central South University and the Special Foundation for Distinguished Young Scientists of Changsha in 2013 and 2019, respectively. Her research interests include quantum computation and quantum cryptography. She has presided over the National Natural Science Foundation Project of China and that of Hunan Province. There are 50 academic papers published in important international academic journals and conferences. She was the recipient of the second prize of natural science and the outstanding doctoral dissertation of Hunan Province in 2015, the Best Paper Award in the international academic conference MSPT2011, and the Outstanding Paper Award in IEEE ICACT2012.

Zhenhuan Li received the BS degree from Xiangtan University. He is currently working toward the master's degree with the School of Computer Science and Engineering, Central South University, Hunan, China. His research interests mainly include quantum computing, deep learning, and quantum neural networks.

Wei Lai received the BS degree from the East China University of Technology. He is currently working toward the master's degree with the School of Computer Science and Engineering, Central South University, Hunan, China. His research interests mainly include quantum computing, deep learning, and quantum neural networks.

Fangfang Li received the MS degree in geographic information system and the PhD degree in photogrammetry and remote sensing from Wuhan University, China, in 2006 and 2009, respectively. Her research interests include machine learning and textual mining.
Ronghua Shi received the BS, MS, and PhD degrees in computer application technology from Central South University (CSU) in 1986, 1989, and 2007, respectively. He is currently the supervisor of PhD students and the team leader of the communication system and network security group with the School of Computer Science and Engineering, Central South University, Changsha 410083, China. He is also the executive director of the Railways Specialty Committee, chairman of the Hunan Internet of Things Committee, vice-chairman of the Hunan Higher Education Computer Society Professional Committee, and the executive director of the Provincial Communication Society. He has received the State Council Special Allowance. His professional field covers computer science and technology, information and communication engineering, etc. He has authored or coauthored more than 80 articles in domestic and foreign academic journals, including about 40 SCI or EI articles. His research interests include network and information security, quantum cryptography, quantum secure communications, etc. His teaching curriculums contain special topics of information security technology, modern cryptography theory and application, computer network communications, introductions to information science, etc. In recent years, he has hosted more than 10 research projects such as the National 863 Project, the Natural Science Foundation Project, and the Ministry of Education Doctoral Fund Project. He was the recipient of two Provincial and Ministerial Appraisal projects, two Second Prizes and one Third Prize of Provincial Science and Technology Progress, and two Second Prizes of Outstanding Papers of Natural Science in Hunan Province.

Shichao Zhang (Senior Member, IEEE) received the PhD degree in computer science from Deakin University, Australia. He is currently a China National-Title professor with the School of Computer Science and Technology, Central South University, China. His research interests include information quality and pattern discovery. He was/is an associate editor for the ACM Transactions on Knowledge Discovery from Data, IEEE Transactions on Knowledge and Data Engineering, Knowledge and Information Systems, and the IEEE Intelligent Informatics Bulletin. He is a senior member of the IEEE Computer Society and a member of the ACM.