Deep Finesse Network Model With Multichannel Syntactic and Contextual Features For Target-Specific Sentiment Classification
https://doi.org/10.1007/s10489-021-02692-w
Abstract
Target-specific sentiment classification (TSC) depends on target term extraction. The majority of current studies on sentiment classification do not exploit the available linguistic and sentiment knowledge in full. Consequently, considerable effort is required to express the implications of each word in sentences that carry a significant amount of contextual dependency, which leads to problems such as loss of semantics, missing context-dependent information, and ultimately poor classification performance. In this paper, we propose a Deep Finesse Network (DFN) to address these limitations and enhance classification accuracy. The DFN employs a multichannel paradigm to exploit multi-grained sentiment features by leveraging existing linguistic and sentiment knowledge more effectively, without any human involvement. In each channel, the model first extracts local features from the multi-grained sentiment features and then captures the global and spatial information of the identified local features. Second, it directly models the contextual relationships with enriched semantic information from the global features. Subsequently, intra-sequence relations are modeled among the contextual features to identify the target features and to understand and predict the sentiments of the identified contextual features. Finally, the effectiveness of the DFN is evaluated on different datasets. The results show that DFN outperforms current and advanced state-of-the-art models in classification accuracy in most cases.
Keywords Multichannel features · Contextual features · Semantic and syntactic features · Deep learning · Deep finesse network · Target-specific sentiment classification
(SSC) by determining the polarity of each aspect instead of the entire sentence. The task of TSC includes two stages: Target Extraction (TE) and SC. Both the target and contextual features are extracted by identifying the syntactic and semantic relations of each aspect appearing in sentences. Later, SC classifies the sentiments based on these target and contextual aspects. The main drawback of the traditional SA system is SC. Many of the conventional SC models focus on rule-based (RL) and ML methods. The RL methods entail human intervention, effectively utilizing various lexicons, patterns, and statistical features with sentiment information to perform SC on text data [11]. SA is a classification problem, and the ML methods construct a classifier by manually labeling the training portion of the data and learning the patterns determined from that training set. The classifier then predicts the class of the test set, which consists of unlabeled instances. However, these models depend on complex feature engineering and on the characteristics of the annotated data [12]. With the advancement of DL methods, neural network structures have become a major part of SA. Compared to conventional ML models, DL models perform extremely well in SA because they do not involve syntax analysis or the construction of sentiment dictionaries. A DL model produces higher classification accuracy when the training part of the data set attains an assured level of generalization ability. In the field of SA, Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) [13, 14] are the most popular DL methods. CNN automatically learns high-dimensional features among neighboring words as the filter moves over the sentence vectors, but it cannot capture syntactic and semantic relationships because of its poor ability to model long-term sequences; this is the main limitation of CNN. In contrast, RNN preserves the order of long-term sequences by capturing the contextual semantic relationships, which has carried memory-based modeling into present-day DL. However, RNN may suffer from vanishing gradients when dealing with long-term sequences, a problem that can be alleviated by two variants: the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks [15–17]. Even though these methods attain precision in SC, there are still concerns regarding further performance enhancement for the SC task.

At present, there are numerous lexical resources with different kinds of sentiment words, semantic vocabulary, degree adverbs, etc. Such resources play an important role in conventional SC models. To date, LSTM and CNN models have not used such unique sentiment information effectively in target-level SC. In addition, many of the current studies have used sentiment lexicons only partially for the task of SA. Moreover, it is more complex to demonstrate the significance of each word in sentences with long-range dependencies. Hence, this may result in the loss of context-dependent information and poor classification performance. SC is viewed as a sequence modeling task. In the present scenario, the most significant information from the input text is identified and captured by attention mechanisms. However, the same features may exhibit various levels of sentiment at various positions in different subspaces. Therefore, all of this information shapes the long-term semantics of the entire input sequence. However, if the input text exceeds the vector length, loss of semantics arises, which leads to misapprehension of the text. In this study, we aimed to concentrate on the aforementioned problems as described below:

a) We proposed a multitasking DL framework called "Deep Finesse Network" (DFN) to tackle the shortcomings of both the TSA and TSC tasks. To the best of our knowledge, we are the first to explore the DFN architecture.
b) We modeled the existing sentiment and linguistic resources into the TSA task. Next, we shaped multiple channel inputs through SA by augmenting the contextual and target vectors of the input sentence with a semantic feature vector, a contextual feature vector, and syntactic feature vectors.
c) We combined the entire set of sentiment target and contextual features and learned them with the proposed DFN model to perform the TSC task. We also evaluated the proposed model on several benchmark datasets and achieved considerable results, outperforming various models from the literature.
d) We realized that modeling both syntactic and contextual knowledge can improve the result of TSC. Therefore, our model achieves a greater level of efficiency by extracting and classifying the important sentiment target features based on both syntactic and semantic relations, through the effective utilization of existing linguistic and sentiment resources.

The rest of the manuscript is organized as follows: Section 2 presents an extensive literature survey of the latest trends in DL for ASA. Section 3 focuses on the proposed methodology. Section 4 explains the experimental analysis. Section 5 presents the results of the proposed method. Section 6 presents the discussion. Section 7 presents the conclusion and future scope of this work.

2 Related work

There exists an extensive literature on the topic of SA. Several studies have explored the effects of performing aspect-level sentiment classification (ASC) tasks. Generally, traditional models for ASC train an ML model with handcrafted features like lexical aspects, parse aspects, and bag-of-words aspects. Yet, the efficacy of these models depends on the quality of the aspects, and hence they require an arduous effort to obtain
those aspects. Similar work has been proposed in this paper for performing TSC. Many studies on SA have used recent novel DL models [18, 19], ConvLSTMConv [20], and Capsule Networks [21], which have attracted particular interest in recent days. In earlier studies, the main challenge of TSC was modeling the semantic relations of the contextual target aspects in a sentence. This is because the diverse context features of the input sentences have diverse effects on shaping the sentiment polarity towards the target feature. Hence, it is desirable to incorporate the connections between the context words and target words while developing a learning system. The authors of [20] proposed two target-dependent LSTM networks for automatically capturing the sentiment information from the prior and subsequent contexts of an aspect. They also utilized a standard back-propagation algorithm to train the model in an end-to-end way and then performed SC on the benchmark data set. This model enhanced the classification performance and predicted the sentiment polarity of each aspect more effectively than the other baseline models.

Recently, several studies have identified that DL models automatically capture the significant aspects from the input for ASC. However, most of these models only judge the target information rather than the aspect information. In contrast, AT-LSTM and ATAE-LSTM [21] consider both target and aspect information by using an attention mechanism. The SC model with an attention mechanism mainly focuses on the important part of a sentence and improves the ASC task. The experimental results revealed that both AT-LSTM and ATAE-LSTM outperform the other baseline models by effectively discovering the relations between the sentiment aspect and the context of a sentence for ASC. Ma et al. [22] argued that both context and target features should be learned through their representations interactively and treated uniquely. Thus, they proposed the IAN model to identify the interaction among the aspect and context features. IAN feeds both the target and contextual features independently and combines them as the final features to be passed into the SoftMax layer. Further, this model obtained better SC accuracy due to its better performance in representing both target and context. The experimental results showed that the IAN model performed effectively on the SemEval 2014 data sets, confirming that the model can effectively learn significant features for both target- and context-based ASA. Hence, a model with an attention mechanism can judge the importance of the sentiment polarity of target features. Li et al. [23] developed the TNet model to conquer this issue by modeling the connection between each word and the context words within the aspect. TNet used a CNN model to resolve the drawbacks of the simple attention mechanism when identifying local features. TSC was performed by incorporating the target information into the word representations using a deep transformation architecture. Hence, the model learns more conceptual contextual aspects and consistently dominates the classification performance of various state-of-the-art models. ReMemNN [24] is an end-to-end network that can tackle the limitation of weak interaction in attention mechanisms between particular aspects and contextual words. A multi-element attention mechanism was developed to produce a more accurate sentiment representation of aspects by obtaining dominant attention weights, which were then stored in an explicit memory module. Moreover, the weakness of pre-trained embeddings was also addressed by designing an embedding adjustment module. In experiments on various English data sets, ReMemNN was found to be language-independent and more efficient than other baseline models. Nowadays, most studies focus on SA by identifying explicit sentiments, because sentiments are conveyed either implicitly or explicitly. Explicit sentiments have gained considerable attention and attained significant results from both industry and academia [25]. Implicit sentiments are still a challenge due to the lack of explicit sentiment words. In this situation, the proposed Bi-LSTM with multi-polarity orthogonal attention [26] can tackle the problems of implicit SA. When compared to the traditional attention mechanism, this Bi-LSTM identifies the important features with sentiment information for implicit SA. Aspect embedding has been widely used for characterizing the aspect categories in the ASA task. However, such embeddings fail to represent the relation between aspect terms and aspect categories. This problem can be effectively alleviated by the AAE method [27], which tackles the misalignment constraint in aspect embedding by closely representing the aspect categories to their connected aspect terms in the subspace. The results stated that the AAE method can enhance the performance of the ASA task. Similarly, the proposed HieNN-DWE [28] can handle the extraction of long-range dependencies that fail to develop the deep semantic information from the input text. HieNN-DWE is a two-layer network model consisting of a BiGRU with an attention mechanism in the first layer and both a BiGRU and a CNN in the second layer. The first layer encodes the sentences, while the second layer captures the intrinsic features from the input representations. The experimental results showed that the HieNN-DWE approach outperforms all the existing models on the Yelp 2015, Yelp 2014, Yelp 2013, and IMDB data sets.

Similarly, long-range dependencies were also handled with AttDR-2DCNN [29] by exploring the dependencies between features and semantics for sentences in a document. AttDR-2DCNN is also designed with two layers: the first layer feeds the sentence feature vector through a BiGRU, and the second layer is fed with the feature dimensions obtained from the document matrix. Moreover, convolution and max pooling
operations were also applied to extract more dependencies, and attention mechanisms were utilized to differentiate word significance in the document. Extensive experiments were conducted on the IMDB and Yelp 2018, 2014, 2018 data sets to achieve better SC performance by obtaining the compositional semantics of the document. Knowledge-Enhanced Neural Networks [30] were proposed to identify the aspect-opinion pairs for the ASC task based on the context features extracted from Chinese review sentences. The sentiment polarity of each aspect pair was determined by the sentiment knowledge graph, which offered more comprehensive SA results and attained better efficiency than conventional models on a Chinese car review data set. To attain the target-dependent representation, TD-LSTM [31] was proposed by employing two LSTMs on the right and left contexts of an aspect. TC-LSTM [31] was also proposed to capture the semantic relation between the target aspect and context features implicitly from the sentence in the ASA task. For this, the target aspect word embeddings are appended to the general word embeddings based on TD-LSTM. An attention mechanism was incorporated to capture the interaction between the aspect and its context. MemNet [33], a deep memory network, was proposed to perform the SC task based on position encoding and a content attention mechanism. The proposed TNet-LF [34] entailed target-specific representations by incorporating one LSTM, a CNN, and a distinct component known as the CPT layer. CPT entailed target-specific representations by preserving the context to discover the conjoint information between the aspect and context features. MGAN [35] was implemented with coarse- and fine-grained attention mechanisms incorporated on the LSTM hidden representations. Most DL methods focus on employing attention over the representation of each aspect, rather than on the position information. Modeling the position knowledge at various stages enhances the performance of ASC by carrying the hierarchical information from SSC. PAHT [36] was proposed to capture the most important information toward a given aspect. The experiments conducted on four benchmark data sets proved that PAHT outperforms other models in enhancing ASC efficiency. TSN [37] is a two-stage model developed for improving the ASA task by incorporating a position attention mechanism that is used with a penalized feature to improve the divergence of the attention weights closer to the aspect in a single sentence. To capture the inter-aspect dependencies and predict the sentiment polarity of an aspect, SDLSTM [38] was proposed by learning temporal dependencies based on the corresponding representations of an aspect. AEN [39] was proposed to capture the relations among the aspect terms and their context by using attention-based encoders. The significance between each word and its context was captured by the RAN [40] model, which utilized a multiple attention mechanism for the ASC task. Table 1 describes the recent developments in various ASA tasks.

3 Deep finesse network (DFN) model

In this paper, we propose a novel deep learning architecture that effectively utilizes the existing linguistic and sentiment information in target-level sentiment analysis tasks. This paper extracts three kinds of input features: position features, parts-of-speech (POS) features, and dependency syntax features. These input features are separately modeled into three different input channels to form the novel Deep Finesse Network (DFN) model. Subsection 3.1 describes the multi-channel features in detail. In general, DFN is a multi-channel text-based sentiment analysis model combining a CNN, a Capsule Network, and a Bidirectional GRU with a self-attention mechanism. The detailed description of the layers designed in the DFN model is given in subsection 3.2. The detailed structure of the proposed DFN is depicted in Fig. 1. The structure of DFN is composed of multiple input channels where the features (wi) of each input channel are concatenated with the sentiment features. The first layer of the DFN is the embedding layer, which maps each sentiment feature into a vector representation. Here, the dimensions of the word embedding layer are set to 300 for 20,000 features with an input length of 150. The rest of the layers are considered the main body of DFN, which contains, in each channel, a CNN block, a Capsule block with global max-pooling activation, and a Bidirectional GRU block with a self-attention mechanism. The three channels with CNN are mainly used to identify and capture the local patterns from the embedded feature vectors (ei). Here, a dropout condition is added to prevent the overfitting problem. The three capsule blocks extract the dependent-feature vectors with spatial relations from the previous local patterns with the help of the "dynamic routing" process. A global pooling layer is devised to extract more significant patterns from the dependent-feature vectors through the max-pooling operation. The bidirectional GRU blocks in each input channel capture the contextual and semantic information in both forward and backward contexts among the significant features identified by the previous layer, while the self-attention layer extracts the target-based sentiment features from the contextual semantic informative features. Finally, these target-based sentiment features are fed into the sigmoid-weighted linear classifier to obtain the final classification result. To facilitate reading, the descriptions of the notations are given in Table 2.
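Before describing each component in detail, the pipeline summarized above can be pictured as a small Keras-style sketch. This is only an illustrative skeleton under stated assumptions, not the authors' implementation: the capsule block with dynamic routing and the customized self-attention layer are left as placeholder comments (they are covered in the remainder of Section 3.2), and only the sizes taken from the text (20,000 features, 300-dimensional embeddings, input length 150, 128 convolution filters, kernel sizes 3/6/9, BiGRU with 32 units) come from the paper; everything else is an assumption.

# Minimal Keras sketch of the multichannel DFN skeleton in Fig. 1 (illustrative only).
from tensorflow.keras import layers, models

def build_channel(name, kernel_size):
    inp = layers.Input(shape=(150,), name=name)                   # S_P, S_R or S_D channel
    x = layers.Embedding(20000, 300)(inp)                         # embedding layer
    x = layers.SpatialDropout1D(0.3)(x)                           # dropout condition D
    x = layers.Conv1D(128, kernel_size, activation="relu")(x)     # local n-gram patterns
    # -- capsule block with dynamic routing would transform x here (see Fig. 3) --
    x = layers.MaxPooling1D(pool_size=2)(x)                       # 1D max-pooling
    x = layers.Bidirectional(layers.GRU(32, return_sequences=True))(x)  # BiGRU block
    # -- the customized self-attention of Eqs. (24)-(26) would weight the states here --
    x = layers.GlobalMaxPooling1D()(x)                            # placeholder pooling
    return inp, x

inputs, feats = zip(*[build_channel(n, k) for n, k in
                      [("S_P", 3), ("S_R", 6), ("S_D", 9)]])
merged = layers.Concatenate()(list(feats))
out = layers.Dense(1, activation="sigmoid")(merged)               # sigmoid-weighted output
dfn_sketch = models.Model(list(inputs), out)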
Fig. 1 Overview of the DFN model. SP, SR and SD are the multiple input feature vectors; w1, w2, …, wi indicate the words with embedding vector representation ei; the Conv layers identify the local and temporal features with the dropout condition D; the Capsule blocks capture the global features with spatial relations; Global Max-Pooling identifies the significant features; the BiGRU blocks capture the contextual semantic relations; Self-Attention identifies the target-specific features from the prior layer; and the sigmoid-weighted linear unit classifies and outputs the target-specific sentiment features.

3.1 Multi-channel input features

3.1.1 Position ranked feature vector (P)
Generally, the essential knowledge is hidden in the words and is often identified based on their positions. It is observed that the same word appears in various positions and expresses different sentiment knowledge. Using the TextRank [48–52] algorithm, the position of each sentiment aspect is obtained based on its co-occurrence relations. The following equation represents the value of each position-ranked feature vector P^d that is encoded into a multidimensional vector d:

p_k ∈ P^d    (1)
where d represents the dimensions and p_k denotes the kth position-ranked feature vector.

3.1.2 Rule based parts-of-speech feature vector (R)

Rule-based POS (R-POS) tagging is used to capture the contextual features from an input sentence [53]. Usually, POS tagging is an annotation process for determining the context in a sentence. By tagging the input sentence, the words with more sentiment weight can be learned by a model for SC. The following equation denotes the operation of obtaining R^d that is encoded into a multi-dimensional vector d:

t_k ∈ R^d    (2)

where d represents the dimension of the R-POS feature vector and t_k denotes the kth R-POS feature vector.

3.1.3 Syntactical dependent feature vector (S)

Syntactical dependency mainly captures word dependencies and relations based on their syntactic structure. This facilitates the model in determining the hidden sentiment knowledge and learning the existing knowledge for the SA task by identifying the syntactic relations. The following equation represents the operation of obtaining S^d that is encoded into a multi-dimensional vector d:

s_k ∈ S^d    (3)

where d represents the dimensions and s_k denotes the kth syntactically parsed feature vector.

In addition to this, the polarity is also computed with the VADER sentiment lexicon [54] to obtain positive, negative, and neutral sentiment features for the P, R, and S input channels. So, the model can be trained with a variety of sentiment features from different perspectives, and the hidden contextual knowledge of the whole data set can be discovered. The following equations represent the operation of obtaining the multichannel inputs along with the sentiment vectors S_1, S_2 and S_3 that are encoded into a multidimensional vector d:

S_P = P^d + S_1    (4)

S_R = R^d + S_2    (5)

S_D = S^d + S_3    (6)

3.2 Model architectures

3.2.1 Convolutional layer

To capture the local patterns from the embedding vectors, a one-dimensional convolutional neural network (1D-ConvNet) is designed and connected to the embedding layer of each input channel. The local patterns from the embedding features are captured based on the n-grams of the given input. This helps to capture both semantic and grammatical information from the features of the multiple channels at different positions through the filter and kernel sizes. At this point, we use three filter sizes for the multiple input channels S_P, S_R and S_D accordingly. As represented in Fig. 2, each row denotes one local feature. We apply the convolution process to the whole column, i.e., a sliding window of kernel size h × l. Here, h denotes the total number of words incorporated into each convolution step (n-grams). This process benefits the model in two ways: 1) it reduces the total number of parameters in the convolution process; 2) the sequential order of words is well maintained, and the performance is improved. The convolution operation performed by the convolution layer to produce the features for a given filter F_{h,l} ∈ ℝ^{h×l} is as follows:

f_i = f( ∑_{h=1}^{N} ∑_{l=1}^{l} F_{h,l} · P_{i+h,l} + b_i )    (7)

Here, f_i represents the features produced for i = 1, 2, 3, …, N; f denotes the non-linear activation function (ReLU); F_{h,l} represents the filter; P_i denotes the input feature vector for i = 1, 2, 3, …, N; and b_i is the bias term for i = 1, 2, 3, …, N. Finally, the concatenation of the features f_i develops a column feature vector F in the first convolution stage. As observed from Fig. 2, equivalent convolution operations with different kernel sizes are incorporated. The core idea of this incorporation is to capture the specific local features based on their potential word dependencies. The output of the second-stage convolution operation is then linked together to generate m column vectors, where m represents the filter size. Here, we use three filter sizes for S_P, S_R and S_D. Let us assume that X ∈ S_P, Y ∈ S_R and Z ∈ S_D, where S_P ∈ F_{h,l}, S_R ∈ F_{h,l} and S_D ∈ F_{h,l} are the filters applied to the n-grams of the matrices X, Y, and Z respectively. Therefore, this is applied to each window [X_{1:h}, X_{1:h+1}, …, X_{N−h:N}]. For the matrices X, Y and Z with X ∈ F_{h,l}, Y ∈ F_{h,l} and Z ∈ F_{h,l}, the produced features for the matrices X, Y, and Z are as follows:

f_X = [f_{X1}, f_{X2}, …, f_{XN}]    (8)

f_Y = [f_{Y1}, f_{Y2}, …, f_{YN}]    (9)

f_Z = [f_{Z1}, f_{Z2}, …, f_{ZN}]    (10)

Moreover, these outputs are then transferred through the other two stages of the convolution layer with the dropout condition (D) between them. These convolution stages effectively handle the high-dimensionality problem by extracting the most important patterns from the given input.
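To make Eq. (7) concrete, the sliding-window computation can be written out directly in NumPy. This is a minimal sketch of the first convolution stage for a single channel; the sizes (150 tokens, 300-dimensional embeddings, kernel size 3) follow the settings reported in Section 4.1.2, and the random filter values are purely illustrative.

# NumPy sketch of Eq. (7): a filter F of shape (h, l) slides over the embedded
# channel matrix P and produces one local feature per window (f = ReLU).
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv_features(P, F, b):
    """P: (N, l) embedded channel matrix, F: (h, l) filter, b: per-window bias."""
    N, l = P.shape
    h = F.shape[0]
    windows = N - h + 1
    feats = np.empty(windows)
    for i in range(windows):
        # f_i = f( sum_{h,l} F[h,l] * P[i+h,l] + b_i )
        feats[i] = relu(np.sum(F * P[i:i + h, :]) + b[i])
    return feats

P = np.random.randn(150, 300)          # one embedded input channel (e.g. S_P)
F = np.random.randn(3, 300) * 0.01     # kernel size h = 3, as in the first channel
b = np.zeros(150 - 3 + 1)
column_vector = conv_features(P, F, b)  # concatenated f_i of the first stage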
Fig. 3 Dynamic routing process in the Capsule Network. Li denotes the number of input vectors output by the convolution layer; ui represents the number of feature capsules; Mij indicates the correlation weighted matrix; Cij represents the coupling coefficient; Sj denotes the total input of the SentiCap layer; Rj is the output capsule; and Rout is the final global output feature vector output by the SentiCap layer.
S_j = ∑_i C_ij · û_{j|i}    (14)

R_j = (‖S_j‖² / (1 + ‖S_j‖²)) · (S_j / ‖S_j‖)    (15)

R_j = (‖S_j‖² / (1 + exp(−‖S_j‖))) · (S_j / ‖S_j‖)    (16)

R_j = (‖S_j‖ / (1 + exp(−‖S_j‖))) · (S_j / ‖S_j‖)    (17)

The goal of this layer is to reduce the feature resolution maps by performing the pooling operation on various units in a local feature space based on its pooling size. In our work, the max-pooling layer is incorporated into the capsule layer to determine the most prominent feature pairs based on their position in each input channel. The main reason for utilizing this one-dimensional max-pooling (1D-MP) layer with the pooling operation is to enhance the routing process by eliminating the instabilities of some noisy capsules. Thus, the pooling layer obtains the most prominent features after a specific convolution operation is performed on the capsule layer. The following eq. (18) represents the output (v̂) obtained from the max-pooling layer:

v̂ = max{R_out}    (18)

where R_out is the output from the capsule layer.

To acquire the contextual dependencies between the semantically oriented features, a BiGRU layer is connected to the max-pooling layer as shown in Fig. 4. The main reason for using the BiGRU layer is to capture the context information based on sequence modeling over the sequences of feature maps. GRU is a unique type of RNN that has similar properties as
z_t = σ(W^T_{v̂z} · v̂_t + W^T_{oz} · o_{t−1} + b_z)    (19)

r_t = σ(W^T_{v̂r} · v̂_t + W^T_{or} · o_{t−1} + b_r)    (20)

õ_t = tanh(W^T_{v̂õ} · v̂_t + W^T_{oõ} · (r_t ∗ o_{t−1}) + b_õ)    (21)
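The gate computations of Eqs. (19)–(21) can be read as the following NumPy sketch of a single GRU step. Because the final state interpolation is not reproduced above, the last line follows the conventional GRU formulation, and all weight shapes are illustrative assumptions.

# NumPy sketch of one GRU step as in Eqs. (19)-(21); shapes and the final mix are assumed.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(v_t, o_prev, W, U, b):
    z = sigmoid(W["z"].T @ v_t + U["z"].T @ o_prev + b["z"])            # Eq. (19) update gate
    r = sigmoid(W["r"].T @ v_t + U["r"].T @ o_prev + b["r"])            # Eq. (20) reset gate
    o_cand = np.tanh(W["o"].T @ v_t + U["o"].T @ (r * o_prev) + b["o"])  # Eq. (21) candidate
    return (1.0 - z) * o_prev + z * o_cand                              # conventional state mix

d_in, d_hid = 32, 32
W = {k: np.random.randn(d_in, d_hid) * 0.1 for k in "zro"}
U = {k: np.random.randn(d_hid, d_hid) * 0.1 for k in "zro"}
b = {k: np.zeros(d_hid) for k in "zro"}
o_t = gru_step(np.random.randn(d_in), np.zeros(d_hid), W, U, b)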
In this case, we noticed that the performance was slightly improved with a faster training time when compared to LSTM.

3.2.5 Self-attention layer

Initially, the conventional attention mechanism was introduced in the field of image processing for training a specific model based on the exact feature information [56–58]. In addition to the contextual information, the polarity of the text is strongly correlated with the sentiment features and aspect features. However, not all the context words have an equal correlation with the semantics of a text. To solve this issue, a customized self-attention mechanism with a joint learning process is designed to capture the intra-sequence relations of the identified contextual features by assigning them a higher weight to improve their significance. Figure 5 describes the model of the self-attention mechanism with the joint learning process. Generally, the BiGRU networks output a hidden vector (f_t). Initially, the hidden vector (f_t) is fed as an input into a Multi-Layer Perceptron [59, 60] to obtain a new hidden representation (u_t). Then, the significance of the aspects from this representation is computed based on the higher weight for (f_t) and (u_t) along with the aspect-level context vector (u_w). Here, (u_w) is randomly initialized and treated as a high-dimensional representation to assess the significance of different aspects in the text, which is learned jointly during the training process. Finally, the weighted mean of the hidden vector (f_t) is calculated by an activation function. Here, we utilize the SoftPlus, SoftSign and Leaky-SoftMax activation functions for the multiple channels. Every step of the self-attention process is described in the following eqs. (24–26):

u_t = tanh(W_w f_t + b_w)    (24)

∂_t = exp(u_t^T u_w) / ∑_t exp(u_t^T u_w)    (25)

s = ∑_t ∂_t h_t    (26)
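Eqs. (24)–(26) amount to a small attention-pooling routine over the BiGRU states, sketched below in NumPy with the plain exponential (SoftMax) scoring of Eq. (25); the SoftPlus, SoftSign and Leaky-SoftMax variants used for the three channels would replace that scoring, and all shapes and initializations are illustrative assumptions.

# NumPy sketch of the attention pooling in Eqs. (24)-(26); shapes are illustrative.
import numpy as np

def attention_pool(F, W_w, b_w, u_w):
    """F: (T, d) BiGRU hidden states f_t; returns the weighted summary s of size d."""
    U = np.tanh(F @ W_w + b_w)          # Eq. (24): u_t = tanh(W_w f_t + b_w)
    scores = np.exp(U @ u_w)            # numerator of Eq. (25)
    alpha = scores / scores.sum()       # Eq. (25): attention weights
    return alpha @ F                    # Eq. (26): s = sum_t alpha_t f_t

T, d = 75, 64
F = np.random.randn(T, d)               # hidden vectors from the BiGRU block
W_w, b_w, u_w = np.random.randn(d, d) * 0.1, np.zeros(d), np.random.randn(d) * 0.1
s = attention_pool(F, W_w, b_w, u_w)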
3.2.6 Output layer

Finally, we designed an output layer with a sigmoid-weighted linear unit (SiLU) to produce the higher-level representation of the semantic features from each input channel [61]. The SiLU is a reinforcement-learning-based activation function whose sigmoid weighting maps the input values between 0 and 1. The advantage of utilizing the sigmoid activation function is that it has a limited output range and can be optimized for constancy. To classify the obtained sequence based on long-range dependencies, the target-specific sentiment vector of each input channel is obtained by computing its weight in the final hidden vector. The following eqs. (27–28) show the calculation of the sigmoid-weighted linear unit (SiLU) activation function:

z_k = ∑_i w_ik s_i + b_k    (27)

a_k(x) = z_k σ(z_k)    (28)

where z_k is the input to the hidden unit, w_ik denotes the weight connecting state s_i to hidden unit k, b_k is the bias, and σ is the sigmoid function.

4 Experimental analysis

4.1 Experimental setup

The overall framework in this paper was implemented in a Python environment with the Anaconda framework, considering the following system specifications: Intel Xeon Platinum @ 2.50 GHz processor, 64 GB memory, 32 GB NVIDIA V100 graphics memory unit, and Windows 10 operating system. The average execution time for every iteration was 8 min, 43 s.

4.1.1 Data collection

Table 3 presents the detailed statistics of the datasets used in this study. The performance of any model can be tested with a custom dataset or a benchmark dataset. We conducted a series of experiments on ten widely used benchmark datasets of different domains to evaluate the aspect modeling and classification performance of DFN. We used the VADER sentiment lexicon for identifying the positive, negative and neutral aspects. The description of each dataset is presented as follows.

1. Restaurant-14 (R14)1: This dataset includes a total of 4722 (2892 positive, 829 neutral and 1001 negative) reviews of the restaurant domain from SemEval 2014 Task 4. It contains a total of 3602 train and 1120 test samples.
2. Restaurant-15 (R15)2: It contains a total of 2405 (1615 positive, 82 neutral and 708 negative) restaurant reviews from SemEval 2015 Task 12. It includes a total of 1606 train and 799 test samples.
3. Restaurant-16 (R16)3: This dataset consists of 4572 restaurant reviews (3002 positive, 697 neutral and 873 negative) from SemEval 2016 Task 5. It contains a total of 2065 train and 2507 test samples.
4. Laptop-14 (L14)1: It contains a total of 2951 (1328 positive, 629 neutral and 994 negative) reviews of the laptop domain from SemEval 2014 Task 4. It includes a total of 2313 train and 638 test samples.
5. Laptop-15 (L15)2: This dataset includes a total of 2923 (1644 positive, 185 neutral and 1094 negative) reviews of the laptop domain from SemEval 2015 Task 12. It contains a total of 1974 train and 949 test samples.
6. Laptop-16 (L16)3: It contains 3605 (2115 positive, 357 neutral and 1133 negative) reviews of the laptop domain from SemEval 2016 Task 5. It includes a total of 2909 train and 696 test samples.
7. Twitter Reviews (TR)4: This dataset consists of 6940 (1734 positive, 3473 neutral and 1733 negative) tweets collected from the Twitter domain. It contains a total of 6248 train and 692 test samples.
8. Amazon Consumer Reviews (ACR)5: It contains 22,771 (8142 positive, 1172 neutral and 6626 negative) consumer product reviews scraped from the Amazon website. It includes a total of 15,940 train and 6831 test samples.
9. Movie Reviews (MR)6: This is a sentence polarity dataset containing a total of 10,662 (5331 positive and 5331 negative) movie reviews. We manually combined, shuffled and divided the entire dataset into 70% for training (7463) and 30% for testing (3199).
10. Stanford Sentiment Treebank-2 (SST-2)7: It includes a total of 9613 (3610 positive, 3310 negative) reviews with the neutral sentiment removed. Here also, we manually combined, shuffled and divided the complete dataset into 70% for training (4844) and 30% for testing (2076).
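The VADER tagging mentioned above can be reproduced with the publicly available vaderSentiment package, as in the following sketch; the 0.05 compound-score threshold is VADER's usual convention and is assumed here rather than taken from the paper.

# Sketch of VADER-based polarity tagging for the review datasets (illustrative).
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def polarity_label(text):
    compound = analyzer.polarity_scores(text)["compound"]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

print(polarity_label("The food was great but the service was painfully slow."))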
4.1.2 Parameter settings

In this experiment, we transformed the text into a vector representation using the one-hot encoding approach and fed it into a word embedding layer with 300 dimensions. We obtained better parameter values by performing repeated training. For the convolution layer in the first channel, the filter size was 128, the activation function was ReLU, the kernel size was 3, and the spatial dropout was 0.3. For the capsule layer, the number of capsules was set to 64, the capsule dimension was 50, and the number of DR iterations was 5. The pool size was set to 2 in the one-dimensional max-pooling layer. In the BiGRU layer, the number of neurons was set to 32, and the recurrent dropout was 0.3. In the self-attention layer, the SoftPlus function was used as the attention activation. Similarly, for the convolution layer in the second input channel, the filter size was set to 128, the activation function was PReLU, the kernel size was 6, and the spatial dropout was 0.3. For the capsule layer, the number of capsules was 64, the capsule dimension was 50, and the number of DR iterations was 4. The pool size was set to 2 in the one-dimensional max-pooling layer. In the BiGRU layer, the number of neurons was 32, and the recurrent dropout was 0.3. In the self-attention layer, SoftSign was used as the attention activation function. Further, the convolution layer in the third channel also used 128 filters. In addition, the kernel size was 9, the LReLU activation function was used, and the spatial dropout was set to 0.3. For the capsule layer, the number of capsules was set to 64, the number of DR iterations was 3, and the capsule dimension was 50. The pool size was set to 2 in the one-dimensional max-pooling layer. In the BiGRU layer, the number of neurons was set to 32 with a recurrent dropout of 0.3. In the self-attention layer, we used the Leaky-SoftMax function for the attention activation. Further, all the layers of the three input channels were flattened using a flatten layer and merged by layer concatenation. The output layer was added to the model with a sigmoid activation function, an L2 kernel regularizer, and kernel and bias initializers. Furthermore, we chose the Adam optimizer with a 0.001 learning rate to reduce the training loss. We also selected binary cross-entropy and categorical cross-entropy for performing binary and ternary classification, respectively. Based on the capacity of the provided resources, we set the number of epochs to 20 with an initial batch size of 10. Moreover, we also utilized an early stopping condition to ensure that training is halted in the case of overfitting: the iteration is stopped when there is no progress in the validation loss after 12 epochs. Table 4 shows the total number of learned parameters of the DFN model during its training procedure on the datasets used in this work.

Table 4 Comparison of learnable parameters of the DFN and other baseline models

Model   Learnable parameters
CNN   6,077,057
LSTM   6,022,689
GRU   6,017,617
CNN-LSTM   6,164,209
BiLSTM   6,045,377
BiGRU   6,035,233
CapsNet   6,960,051
BiCapsNet   6,146,177
DFN (First channel)   6,498,242
DFN (Second channel)   6,534,498
DFN (Third channel)   6,572,898
DFN (First, second channels)   13,032,739
DFN (First, third channels)   13,071,139
DFN (Second, third channels)   13,107,395
DFN (Proposed)*   19,605,636
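The training configuration described in Section 4.1.2 (Adam with a 0.001 learning rate, binary or categorical cross-entropy, 20 epochs, batch size 10, early stopping after 12 epochs without validation-loss improvement) can be summarized in the following sketch; the stand-in model and random data are placeholders so the snippet runs on its own, and they are not the actual DFN or datasets.

# Sketch of the reported training configuration (illustrative stand-in model and data).
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

inp = layers.Input(shape=(150,))
x = layers.Embedding(20000, 300)(inp)
x = layers.GlobalAveragePooling1D()(x)
out = layers.Dense(1, activation="sigmoid")(x)
model = models.Model(inp, out)                      # placeholder for the multichannel DFN

X = np.random.randint(0, 20000, size=(200, 150))    # placeholder encoded reviews
y = np.random.randint(0, 2, size=(200, 1))          # placeholder binary labels

model.compile(optimizer=Adam(learning_rate=0.001),
              loss="binary_crossentropy",           # categorical cross-entropy for 3 classes
              metrics=["accuracy"])
model.fit(X, y, validation_split=0.1, epochs=20, batch_size=10,
          callbacks=[EarlyStopping(monitor="val_loss", patience=12)])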
5 Results

5.1 Model comparison

In this section, we compare the performance of our DFN model with some state-of-the-art models, namely ATAE-LSTM [21], IAN [22], TNet [23], TD-LSTM [31], TC-LSTM [31], MemNet [33], MGAN [34], PAHT [35], TSN [37], SD-LSTM [38], AEN [39], RAN [40], MAN [42], PBAN [43], GCN [44], AE-LSTM [45], GCAE [46], and AE-DLSTM [47], which have been identified and reported in the literature. Additionally, we also developed some standard deep-learning models, namely CNN, LSTM, GRU, CNN-LSTM, BiLSTM, BiGRU, CapsNet and BiCapsNet with multi-channel inputs, to assess the quality of aspect modeling in this work. We found that fine-tuning the multi-channel features plays an important role in the TSC task. The experimental results are shown in Table 5, where the best classification accuracy is given in bold and "-" denotes that there is no significant literature and that the dataset is not utilized by the model. It can be observed that the proposed model outperforms the baseline and advanced models in all the cases. Specifically, its accuracy demonstrates a great improvement with the effective utilization of multi-channel linguistic features. Among the advanced models, TC-LSTM, TD-LSTM, and AE-LSTM exhibit poor classification accuracy. GCAE shows a slight improvement over the LSTM-based models due to the feature extraction incorporated with a non-linear gating mechanism; however, its performance remains below the mark as it fails to capture long-range dependencies. IAN performs better than the LSTM-based models in terms of accuracy, as it captures the interactions between the target and contextual aspects. PAHT shows improved performance over IAN, as it models the position information by transferring the hierarchical knowledge from the resource-rich sentences. SD-LSTM outperforms PAHT by capturing temporal dependencies from the sentence representations. MemNet also achieves better classification performance than PAHT but cannot perform as efficiently as TSN. RAM overcomes TSN with better classification accuracy, because RAM adopts multiple attention mechanisms for capturing the important sentiment features from difficult sentence structures. TNet produces better classification results than RAM on the Restaurant 2014, 2015, 2016 and Laptop 2014 datasets. MGAN shows superior performance to TNet, as it leverages fine-grained and coarse-grained attention mechanisms to capture the aspect-level interactions between aspects with the same
context. AEN outperforms MGAN on the restaurant and laptop domains of the SemEval 2014 dataset, as it incorporates an attention mechanism and a BERT transformer to capture the introspective and interactive semantics between target and context words. GCN performs better than AEN in terms of classification accuracy by modeling the semantic and syntactic information of word dependencies through a dependency tree built on the sentence. PBAN also performs better than GCN, as it employs a bidirectional attention network with position information to capture the aspect position information and to model the relation between the aspect and the sentence mutually. Specifically, on the Twitter domain it shows a minor improvement, as it incorporates the convolution component to capture the non-sequential informative features. No experiments were conducted with these state-of-the-art models on the ACR, MR and SST-2 datasets, as they were designed only for the aspect-level sentiment analysis task.

Specifically, we can also see that the models utilizing multi-channel features show improved classification accuracy. Among all the baseline models, CNN performs poorly on all the datasets. Recurrent network variants like LSTM and GRU perform better than CNN as they capture the contextual information among the aspects. The ensemble CNN-LSTM model outperforms the CNN, LSTM and GRU models in all the cases, as it captures the long-range dependent information among the local order features captured by CNN. BiLSTM and BiGRU are the bidirectional variants of the LSTM and GRU networks, and they perform better than LSTM and GRU by capturing the semantic relationships between aspects based on forward and backward contexts. However, these models suffer from modeling the spatial and hierarchical relations of the aspects in the text. CapsNet achieves better classification accuracy than the above models as it can model the local and spatial hierarchical relationships from the contextual representations. BiCapsNet also performs better than the conventional CapsNet by incorporating the bidirectional mechanism, which helps the model to capture the semantic and global aspect-related representations of the text. However, these models suffer from poor generalization abilities and therefore fail to capture the aspect-level contextual information for target-specific sentiment classification. In this scenario, our DFN model overcomes the above challenges and achieves higher accuracies on different datasets of various domains. The results also prove that the efficiency can be increased by effectively utilizing various language and sentiment resources.

The DFN also efficiently models open language knowledge to produce various feature inputs. Further, the model learns the sentiment information of each aspect from various angles based on importance. The DR and self-attention mechanisms incorporated in this model help to gain better classification performance than the MAN, IAN, MGAN and MemNet models, which employ a single attention mechanism. Additionally, AE-DLSTM and PBAN were the only models that obtained the highest accuracy on the laptop and restaurant reviews of SemEval 2014 when compared to the models other than DFN. MAN was the only model that obtained 82.65% accuracy on the restaurant domain of the SemEval 2015 database. TNet, MGAN, and MAN obtained 85.03%, 85.12%, and 85.87% on the restaurant domain of the SemEval 2016 database. Moreover, we also noticed that the accuracy on the Twitter domain remains poorer than that on the laptop and restaurant domains in the majority of the cases. This may be because the text on Twitter conveys complex emotions, and the large number of unlisted words that emerge on Twitter can cause problems while classifying the polarities. Our DFN also achieved superior results on the Twitter reviews dataset, with 92.73% accuracy, by capturing the important information from the complex sentences. On the other hand, we also aimed to perform target-specific sentiment classification on the rest of the datasets, i.e., ACR, MR and SST-2, and achieved superior results in terms of accuracy when compared to the baseline models developed with multi-channel features. Moreover, at the beginning of the experimentation, we found that fine-tuning the multi-channel features plays an important role in the TSC task. We also tested our approach with the most commonly used pre-trained word embeddings. Initially, we considered the GloVe, FastText, and ULMFiT pre-trained models with different dimensional sizes from the set {100, 200, 300}. Table 5 shows that the DFN with pre-trained word embeddings is inferior to the DFN model with one-hot encoding vectors on all the datasets. The performance of the DFN model is increased by using one-hot encoding compared to the results with pre-trained word embeddings presented in Table 6. The pre-trained word embeddings fail to capture the important semantic relatedness of words. They also fail to analyze the words that lie outside their large vocabulary, and they increase the time complexity of the model. The DFN model using the one-hot encoding approach reduces the encoding time needed to obtain the embedded feature vectors when vectorization is performed. This improvement can be achieved only when effective feature extraction is performed at the initial stages of model development.

5.2 Impact of each module of the DFN model

The DFN model consists of three parts: multichannel features, dynamic routing, and self-attention mechanisms. Every module of DFN contributes to the final result. Therefore, a series of experiments is carried out in this section to assess the performance impact of the multichannel features and the self-attention mechanism of the DFN model.
5.2.1 Influence of multichannel linguistic features

The multichannel linguistic features in this study include S_P, S_R and S_D, as shown in Fig. 1. Further, we performed a few fine-tuning experiments on the DFN model to expose the weight of the linguistic features on the three datasets. Table 7 shows that the performance of the model changes as the complexity rises with the addition of linguistic aspects. However, the performance of DFN improves on adding some of the linguistic aspects to the approach. With the incorporation of the multichannel features, the effectiveness of the DFN model is raised by 5–10% compared to models employing only word features. We also noticed that S_P and S_D played an important role in improving efficiency. This also reveals that better performance can only be obtained by utilizing the multichannel features.

Table 7 Performance of DFN with different combinations of the multichannel linguistic features (accuracy, %)

S_P S_R S_D
✓ ✗ ✗ 86.22 89.37 88.65 89.14 86.94 86.16 91.64 91.46 90.12 86.91
✗ ✓ ✗ 88.69 91.69 90.64 90.23 87.26 88.54 92.87 93.18 92.41 87.66
✗ ✗ ✓ 90.11 94.12 88.16 91.47 88.45 87.46 90.33 92.74 93.64 88.10
✓ ✓ ✗ 89.32 92.54 89.78 92.64 87.14 91.08 93.22 94.81 92.16 87.82
✓ ✗ ✓ 92.14 93.78 89.74 92.32 90.10 92.82 94.98 95.52 93.28 89.54
✗ ✓ ✓ 93.72 95.31 92.10 93.65 91.66 93.20 95.80 96.07 95.87 91.17
✓ ✓ ✓ 96.84 97.46 94.92 95.34 95.22 94.18 96.48 97.25 96.50 92.73

routing functions defined in each module. This is because the DR results in an overfitting problem when more than 5 iterations are used. Table 8 presents the results obtained with the DFN model incorporating different routing mechanisms in the capsule layer of each feature channel. We considered the Swish and Swift activation functions combined with the traditional Squash routing function. DFN works better when the same routing function is used in every capsule layer, although it then does not capture the semantic characteristics of the features and results in poor generalization ability. So, we chose the Swish and Swift activation functions and incorporated them with the Squash routing function to enhance the capturing capability of DFN in each input channel. With this, the capturing capability of DFN was improved, and the accuracy increased by 2.33% to 4.11% when three different routing mechanisms were used in the capsule layers.
Table 8 Performance of DFN with different DR Mechanisms on SemEval 2014, 2015, & 2016 datasets
Squash Squash Squash 93.54 95.24 92.62 93.25 93 92.58 95.24 95.48 94.74 89.53
Swish Swish Swish 91.69 92.16 90.35 89.33 90.74 90.74 92.16 93.21 91.36 88.12
Swift Swift Swift 90.45 92.34 91.86 90.14 91.35 89.87 92.34 92.60 90.14 87.68
Squash Swish Swift 96.84 97.46 94.92 95.34 95.22 94.18 96.48 97.25 96.50 92.73
the weight of self-attention. On the three datasets, we fine-tuned the weights of self-attention in each module of the DFN model. This indicated that the SoftPlus, SoftSign, and Leaky-SoftMax activations, which compute the weight of self-attention, greatly affect the performance of DFN. Additionally, DFN without self-attention is considerably less efficient than DFN with the self-attention mechanism, as it fails to capture the sequence of the target features. Table 9 presents the results obtained by DFN with and without the self-attention mechanism. Table 10 compares the self-attention of DFN with the SoftPlus, SoftSign, and Leaky-SoftMax activation functions against the self-attention with traditional activation functions. Therefore, the self-attention mechanism has a certain influence on the DFN model, and the proposed DFN with self-attention weights computed based on various activation functions offers the best results. Further, this also reveals that all the factors in self-attention are helpful for the outcome of the DFN model.

6 Discussions

For DFN, the convolution and capsule blocks can access the local and global context information more effectively by capturing the semantic relations from the text. With the effective utilization of the existing linguistic and sentiment resources, the multi-channel input features enable DFN to learn the sentiment knowledge of the sentences from different perspectives. The current state-of-the-art models separately identify the aspects and contexts within the sentence structure, whereas DFN replaces this by identifying the interactions between target aspects and contexts through sequence modeling. The multiple self-attention mechanisms in DFN grasp the interactions between the concatenated local and global context aspects and compute the direct relations based on the weight distribution of each aspect on the sentiment tendency of the text. Therefore, DFN learns more hidden feature information with the emotion-strengthening features. The experimental results show that the multichannel features, convolution, dynamic routing, BiGRU and self-attention mechanisms play a significant role in improving the performance of DFN. At the same time, the size of the position, POS and dependency features from the multiple input channels also affects the performance of DFN. Moreover, effective pre-processing and feature extraction also affect the classification accuracy. Furthermore, the way of producing the embedding features also influences the classification accuracy. Here, the features from the multiple input channels were vectorized using a multi-hot encoding approach [62]. Many studies [63, 64] have revealed that one-hot encoding has limited ability to capture morphological and semantic information from text, which can be resolved by pre-trained word embedding models [65–67]. However, these pre-trained models suffer from high time complexity in word encoding due to their large vocabulary. In comparison to the pre-trained models, the results prove that the word embedding vectors with multi-hot encoding generate better results with less encoding time. Therefore, the process of generating word embedding vectors with multi-hot encoding is suitable for converting words into vector representations. Figures 7 and 8 present the performance of the DFN model using multi-hot encoding with {100, 200, 300} dimensions on all the datasets. All the experimental results prove that the combination of multi-channel features, dynamic routing and self-attention mechanisms can significantly enhance the performance of the model with stronger generalization and classification capabilities. On all the benchmark datasets, DFN outperforms the current state-of-the-art models and the baseline models with multi-channel features, with the best classification accuracy. Finally, although DFN obtains better modeling and classification performance, some errors have been analyzed
Table 10 Performance of DFN with and without Activations in Self-Attention Mechanisms on various datasets
Self-Attention Activations (SM SP SS L-SM)   R14 L14 R15 L15 R16 L16 TR ACR MR SST-2
✓ ✗ ✗ ✗ 89.56 90.58 88.45 89.25 88.36 90.88 88.63 89.06 91.33 92.48
✗ ✓ ✗ ✗ 87.33 88.27 86.22 87.14 86.92 88.36 87.56 87.84 89.42 89.16
✗ ✗ ✓ ✗ 87.64 88.86 86.57 87.72 87.10 88.93 87.14 88 88.71 91.32
✗ ✗ ✗ ✓ 88.10 89.32 88.10 89.32 88.10 89.32 89.71 90.12 92.10 93.54
✓ ✓ ✓ ✗ 94.12 95.78 93.08 94.25 92.18 95.48 90 93.36 93.10 94.17
✓ ✓ ✗ ✓ 94.46 93.32 93.45 90.16 93.43 94.88 90.56 92.82 94.32 95.23
✓ ✗ ✓ ✓ 94.74 95.33 93.63 94.48 94.36 95.54 91.22 93.67 95 96.78
✗ ✓ ✓ ✓ 96.84 97.46 94.92 95.34 95.22 96.50 92.73 94.18 96.48 97.25
for better understanding. In particular, we randomly picked 100 instances of DFN error predictions from the test sets of the Restaurant 2016 and SST-2 datasets and noticed some classification errors made by DFN. With multi-channel inputs containing sparse features among a large number of target words, DFN misjudges the input text and therefore ignores the local and global information about the aspect and context sequences. As a result, the weights of the self-attention are exaggerated and the classification accuracy is affected. Moreover, the DFN requires more training time than other models, as it uses multiple input channels with more layers and therefore computes significantly more weight parameters during training. Figure 8 represents the training time of the proposed DFN model while it converges to the solution.

7 Conclusion

In this paper, we propose the Deep Finesse Network model, which is an integrated aspect-context interactive sequence
The achieved results show that our DFN model outperforms all the state-of-the-art models identified from the literature. Moreover, it achieves better classification performance than the standard deep learning models with multichannel features, exhibiting stronger generalization and classification capabilities.
In the immediate future, we will focus on improving the DFN model and applying it to sentence-level and document-level sentiment analysis tasks. Furthermore, training with enriched aspects can help DFN capture more important semantic relations from multiple perspectives without losing the order information. Concurrently, we will also apply our model to several practical scenarios and investigate cross-domain and multilingual cognitive analysis.

Declarations

Conflict of interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Mr. Deepak Chowdary Edara is pursuing his doctoral work in the Department of Computer Science and Engineering, Vignan's Foundation for Science, Technology & Research (VFSTR). He is currently working as an Assistant Professor in the Department of Information, VFSTR. He has published more than 15 research articles in the areas of Natural Language Processing, Data Mining, etc.

Dr. Venkatramaphanikumar Sistla received his Doctoral degree in Computer Science and Engineering from JNTU, Hyderabad, Telangana. He is currently working as an Associate Professor in the Department of Computer Science & Engineering. He has more than 11 years of teaching experience and has published more than 40 research articles in reputed national and international conferences and journals. His current research interests include digital image processing, text analytics, pattern recognition, and medical imaging.

Prof. Venkata Krishna Kishore Kolli received his Doctoral degree in Computer Science and Engineering from Acharya Nagarjuna University, Guntur, Andhra Pradesh. He is currently working as Professor and Head of Information Technology. He has more than 23 years of teaching and research experience. His current research interests include digital image processing, text analytics, pattern recognition, and medical imaging.
Publisher’s note Springer Nature remains neutral with regard to jurisdic-
tional claims in published maps and institutional affiliations.