Keywords
Data Pre-Processing, Machine Learning Algorithms, Emotion Deduction,
Sentiment Analysis
1. Introduction
A significant amount of text data has accumulated as a result of the post-COVID
spike in social media. This textual information reservoir has the capacity to
2. Literature Survey
The significance of extracting emotions from text by using Natural Language
Processing (NLP) has kindled the research interests of many researchers in this
domain. While it’s impractical to analyse deep into every research study com-
prehensively in this section, some of the related works on the related area has
been discussed in this section. The main innovation in a study by Kantrowitz [4]
is the recommendation to use a dictionary-based stemmer, which is effectively a
perfect stemmer to analyse its impact on data retrieval. Its performance can be
selectively changed in terms of coverage and accuracy. The system designers can
more accurately evaluate the relative trade-offs between desired levels and in-
crease stemming accuracy by using this stemmer.
Another research work by Sridevi et al., titled “Impact of Preprocessing on
Twitter Based Covid-19 Vaccination Text Data by Classification Techniques” [5],
takes up a Twitter dataset and performs pre-processing on the data. It uses the
classification algorithms LIBLINEAR and Bayes Net to determine the most
effective pre-processing techniques for the data. It concludes that pre-processed
data yields better performance and precision in the analysis than the raw data.
The subfield of Sentiment Analysis and Emotion Detection, with a focus on
text-based emotion detection, is covered in detail in the article by Acheampong
et al. [6]. It begins by outlining the fundamental ideas of text-based emotion
detection and the emotion models used, and it emphasises the availability of the
large datasets the field requires. The article then describes the three main
strategies frequently used in the creation of text-based emotion detection
systems, outlining their advantages and disadvantages. It concludes by outlining
current challenges and prospective avenues of future study for academics and
researchers working with text-based data.
A research work titled “Hierarchical Bi-LSTM based emotion analysis of textual
data” by Mahto et al. [7] suggests an improved deep neural network (EDNN) based
on a hierarchical Bidirectional Long Short-Term Memory (Bi-LSTM) model for
emotion analysis. The findings show that, compared with the existing CNN-LSTM
model, the suggested hierarchical Bi-LSTM technique achieves an average accuracy
of 89% for emotion analysis. Another work, carried out by Kumar et al. [8], puts
forward the Emotion-Cause Pair Extraction (ECPE) technique to pre-process text
data at the clause level. To create sets of emotion-cause pairs for a document,
it isolates cause clauses from emotion clauses, pairs them, and filters the
pairs. These pre-processed data are fed to a BERT model, and the resulting
classifier performs at the state of the art on a benchmark corpus for emotion
analysis. The ECPE-BERT emotion classifier beats previous models on English
sentences, obtaining a remarkable accuracy of 98%.
In an article by Rashid et al. [9], the researchers describe the Aimens system,
which analyses textual dialogue to identify emotions. The system employs the
deep-learning-based Long Short-Term Memory (LSTM) model to identify emotions
such as happiness, sadness, and anger in context-sensitive conversation. The
system’s primary input is a mixture of word2vec and doc2vec embeddings. The
results exhibit a significant F-score improvement over the baseline model, with
the Aimens system scoring 0.7185. In the research article titled “An effective
approach for emotion detection in multimedia text data using sequence based
convolutional neural network” by Shrivastava et al. [10], the authors offer a
framework built upon Deep Neural Networks (DNN) for the problem of emotion
identification in multimedia text data. A new dataset, carefully curated for the
emotion recognition task, was created from the transcript of a TV show. A CNN
model with an attention mechanism was trained on this data to extract the
pertinent characteristics from the text. The effectiveness of the suggested
model was assessed and compared with benchmark models such as LSTM and Random
Forest classifiers.
Table 1. Sample comments from the dataset taken for the research.

COMMENTS
I AM FEELING QUITE SAD AND SORRY FOR MYSELF BUT I WILL SNAP OUT OF IT SOON!!!
I feel like I am still looking at a blank canvas blank pieces of paper
I feel like a faithful servant
I am just feeling cranky and blue
I can have for a treat or if I’m feeling festive!!
I start to feel more appreciative of what god has done for me
I am feeling more confident that we will be able to take care of this baby
I feel incredibly lucky just to be able to talk to her
I feel less keen about the army every day
The pre-processing stages applied to the dataset are as follows. First, all the
text involved in the research is converted to lower case. Any punctuation in the
text data is unnecessary, so it is removed next. The stop words, such as “in”,
“to” and “is”, are then removed. After that, each sentence is broken into
tokens, the tokenized words are stemmed (e.g., “running” becomes “run”), and
finally the words are lemmatized, which reduces each word to its base form
without changing the meaning of the sentence.
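The following is a minimal sketch of such a pipeline in Python using NLTK; the
library choice, function name and exact ordering of the steps are illustrative
assumptions rather than the authors' actual implementation.

# Minimal sketch of the described pre-processing pipeline using NLTK.
# The function name and choice of library are illustrative assumptions.
import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads of the required NLTK resources
# (newer NLTK versions may additionally need the "punkt_tab" resource).
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()
LEMMATIZER = WordNetLemmatizer()


def preprocess(comment: str) -> list[str]:
    """Lower-case, strip punctuation, tokenize, drop stop words, stem and lemmatize."""
    text = comment.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = word_tokenize(text)
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [STEMMER.stem(t) for t in tokens]
    return [LEMMATIZER.lemmatize(t) for t in tokens]


print(preprocess("I am just feeling cranky and blue"))
# expected output along the lines of: ['feel', 'cranki', 'blue']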
Table 2 depicts the pre-processed text data obtained from Table 1.
Pre-processing may change the actual spelling of a word, but its meaning remains
the same, and the cleaned form contributes greatly to the analysis of the given
dataset. The resultant data is then subjected to various machine learning
techniques for the identification of emotions. These pre-processing methods play
an essential role in extracting the exact content needed from social media
data [12].
The research also produces another file that compares the word count between
the original text and the pre-processed text.
Table 3 illustrates the reduction in the number of words throughout the
different stages of pre-processing for the research. It is discovered that the
total word count in the original text has been reduced to approximately
one-third of its original size after completing the pre-processing steps.

Table 3. Word counts of the comments before and after pre-processing.

S. No.   Comments (word count)   Pre-processed (word count)
1        25                      13
2        16                      8
3        23                      11
4        9                       5
5        13                      6
6        26                      8
7        8                       5
8        18                      11
9        24                      11
10       43                      22
Table 3 compares the number of words in each original comment with the number of
words produced after pre-processing. It is clearly evident that in each case the
comment is reduced to roughly one-third of the words in the original data taken
for the research. This makes the further work of prediction on the chosen
dataset simpler and makes it easy to apply any of the algorithms to the dataset
for the subsequent analysis of predicting the emotion.
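Assuming the preprocess helper from the earlier sketch, the word-count
comparison reported in Table 3 can be reproduced along the following lines; the
comment strings are taken from Table 1 and the rest is illustrative.

# Sketch of the word-count comparison reported in Table 3, reusing the
# preprocess() helper from the previous sketch (an illustrative assumption).
comments = [
    "I am just feeling cranky and blue",
    "I feel like a faithful servant",
]

for i, comment in enumerate(comments, start=1):
    original_count = len(comment.split())
    processed_count = len(preprocess(comment))
    print(f"{i}\t{original_count}\t{processed_count}")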
The data from various social network platforms are useful for the real-world
language by using more training data and longer sequences than BERT [19].
3) No Next Sentence Prediction (NSP): Unlike BERT, RoBERTa does not
use the Next Sentence Prediction task during pre-training; instead, it focuses
entirely on masked language modelling (MLM). This modification has been shown to
boost performance in downstream NLP tasks [20] (see the short sketch after this
list).
4) Hyperparameter Tuning: RoBERTa optimises its hyperparameters and
training procedures in order to improve model performance [21].
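To make the MLM objective concrete, the short sketch below queries a pretrained
RoBERTa model for masked-token predictions using the Hugging Face transformers
library; the tooling is our assumption, as the paper does not name the software
it uses.

# Illustration of the masked-language-modelling objective RoBERTa is trained
# with, using the Hugging Face transformers library (our assumption).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# RoBERTa uses "<mask>" as its mask token.
for prediction in fill_mask("I am feeling very <mask> today."):
    print(prediction["token_str"], round(prediction["score"], 3))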
4. Experimental Results
The dataset involved in this research work consists of individual persons’
comments taken from a social media post. These comments are all in direct
speech, with each individual expressing his or her emotion at that moment in
time; they depict the person’s mood and thoughts while commenting. Such
emotional text data helps us to know the person’s current emotion at that
particular time, although it does not serve as a measure of the person as a
whole. The given dataset is in an uncleaned format. In order to process the data
for better understanding and to draw insights from it by applying data mining
algorithms, the gathered data should be cleaned and pre-processed. The dataset
contains missing words, wrong spellings, punctuation, conjunctions, prepositions
and more. This noisy information accumulated along with the data should be
removed for a better understanding of the textual data before any algorithm is
applied to it for analysis.
After pre-processing, the text involved in this research is passed to the
fine-grained RoBERTa model. Being a fine-grained model, it extracts around 28
emotions in total, a considerably larger number than traditional methods, which
extract only a few emotions from text data.
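A sketch of this kind of fine-grained emotion extraction is shown below. The
specific checkpoint name is an assumption (a publicly shared RoBERTa model
fine-tuned on the 28-label GoEmotions taxonomy); the paper does not state which
weights it fine-tuned or loaded.

# Sketch of fine-grained emotion extraction with a RoBERTa classifier.
# The checkpoint name is an assumption; the paper does not name its weights.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="SamLowe/roberta-base-go_emotions",
    top_k=1,
)

comments = [
    "I feel bitchy but not defeated yet",
    "I never stop feeling thankful as to compare with others",
]

for comment, result in zip(comments, classifier(comments)):
    label = result[0]  # with top_k=1 each result is a one-element list
    print(f"{comment!r} -> {label['label']} ({label['score']:.2f})")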
The result obtained is shown in Table 4, which lists the emotion extracted from
the text data taken for the research. Column 1, “Comments”, contains the
original uncleaned text. Column 2 is the result of pre-processing: after passing
through the pre-processing pipeline, the text is transformed into cleaned data
with the unnecessary words eliminated, without changing the actual meaning,
which makes the extraction of emotion from the text more efficient. Column 3 is
the result of applying the fine-grained RoBERTa model to the text to extract the
emotion hidden in it. The result obtained may serve as a key to disclosing the
emotion hidden in an individual simply by analysing that person’s textual
content. This is an essential factor in understanding the person and their
attitude, and it is of great value to stakeholders in various fields, since in
today’s scenario it has become crucial to understand an individual’s attitude
for many reasons.

Table 4. Sample comments, their pre-processed form and the extracted emotion.

Comments | Pre-processed | Emotion
I feel bitchy but not defeated yet | [“feel”, “bitchi”, “defeat”, “yet”] | Anger
I was dribbling on mums coffee table looking out of the window and feeling very happy | [“dribbl”, “mum”, “coffe”, “tabl”, “look”, “window”, “feel”, “happi”] | Joy
I woke up often got up around am feeling pukey radiation and groggy | [“woke”, “often”, “got”, “around”, “feel”, “pukey”, “radiat”, “groggi”] | Neutral
I walked out of there an hour and fifteen minutes later feeling like I had been beaten with a stick and then placed on the rack and stretched | [“walk”, “hour”, “fifteen”, “minut”, “later”, “feel”, “like”, “beaten”, “stick”, “place”, “rack”, “stretch”] | Sadness
I never stop feeling thankful as to compare with others I considered myself lucky because I did not encounter ruthless pirates and I did not have to witness the slaughter of others | [“never”, “stop”, “feel”, “thank”, “compar”, “other”, “consid”, “lucki”, “encount”, “ruthless”, “pirat”, “wit”, “slaughter”, “other”] | Gratitude
I didn’t feel abused and quite honestly it made my day a little better | [“feel”, “abus”, “quit”, “honestli”, “made”, “day”, “littl”, “better”] | Joy
I know what it feels like he stressed glaring down at her as she squeezed more soap onto her sponge | [“know”, “feel”, “like”, “stress”, “glare”, “squeez”, “soap”, “onto”, “spong”] | Neutral
I also loved that you could really feel the desperation in these sequences and I especially liked the emotion between knight and squire as they’ve been together in a similar fashion to batman and robin for a long time now | [“also”, “love”, “could”, “realli”, “feel”, “desper”, “sequenc”, “especi”, “like”, “emot”, “knight”, “squir”, “theyv”, “togeth”, “similar”, “fashion”, “batman”, “robin”, “long”, “time”] | Love
I had lunch with an old friend and it was nice but in general I’m not feeling energetic | [“lunch”, “old”, “friend”, “nice”, “gener”, “im”, “feel”, “energet”] | Joy
Table 5 contains a sample of the various emotions extracted from the text data
of the dataset taken for this research. Column 1 shows around 15 emotions
obtained from the emotion-extraction process performed by the RoBERTa model, and
column 2 shows the number of comments classified under each extracted emotion.

The research uncovers a number of emotions from text data. Traditional models
and algorithms are able to produce only a few emotions, which is not sufficient
to know the attitude of the person who commented. The result of the research
would therefore be of immense help to various stakeholders in knowing and
understanding the attitude of an individual. It serves not only as a factor to
know about the person but also to understand the person’s inner feeling or
emotion at the particular moment of commenting.
Table 5. Emotions extracted from the sample and their counts.

Emotions          Count
Remorse           2
Neutral           4
Approval          1
Annoyance         2
Joy               5
Admiration        1
Caring            1
Realization       2
Embarrassment     3
Anger             1
Sadness           3
Gratitude         1
Love              2
Optimism          1
Disappointment    1
Figure 3 depicts a sample of the various emotions extracted from the dataset
taken for this research. It shows emotions such as joy, anger, caring and
sadness, along with the count of the number of people expressing each particular
emotion.
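The per-emotion counts shown in Table 5 and Figure 3 can be aggregated and
plotted with a few lines of Python; the predicted_emotions list below is a
placeholder standing in for the labels produced by the classifier.

# Sketch of how the per-emotion counts in Table 5 / Figure 3 can be produced,
# assuming predicted_emotions holds one predicted label per comment.
from collections import Counter

import matplotlib.pyplot as plt

predicted_emotions = ["joy", "anger", "joy", "neutral", "sadness", "joy", "gratitude"]

counts = Counter(predicted_emotions)
print(counts)  # e.g. Counter({'joy': 3, ...})

plt.bar(list(counts.keys()), list(counts.values()))
plt.xlabel("Emotion")
plt.ylabel("Number of comments")
plt.title("Emotions extracted from the sample comments")
plt.tight_layout()
plt.show()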
Unlike sentiment analysis, which predicts only whether a given text is positive,
negative or neutral, this emotion extraction model goes deeper and assesses the
emotion of the person. The result obtained would serve as a boon for various
stakeholders to know the attitude of a person.
5. Conclusion
Finding and analysing the emotions in social media text data is not an easy task.
A crucial analysis is required to find the emotions hidden in text data. This re-
search work is carried out to extract the hidden emotions from textual data and
has yielded remarkable results, enabling us to identify approximately 28 distinct
emotions within the text. These findings hold great promise for a wide range of
applications. These applications include enhancing our understanding of an in-
dividual’s personality and attitude. It also provides valuable insights for various
stakeholders. Educators can utilize this information to better comprehend the
attitudes of their students. This can enable more enhanced and effective teaching
strategies to improve the understanding of the student’s community. Parents can
gain insight of the emotional states of their children, which would aid in the
prevention of mental health problems, attitudes and suicidal thoughts in chil-
dren. Additionally, interviewers can use this knowledge to gain a deeper under-
standing of the mental and emotional position of potential and eminent candi-
dates. This would improve the selection and placement process. In essence, the
Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this pa-
per.
References
[1] Mason, A.N., Narcum, J. and Mason, K. (2021) Social Media Marketing Gains Im-
portance after Covid-19. Cogent Business & Management, 8, Article ID: 1870797.
https://doi.org/10.1080/23311975.2020.1870797
[2] Feldkamp, J. (2021) The Rise of TikTok: The Evolution of a Social Media Platform
during COVID-19. In: Hovestadt, C., Recker, J., Richter, J. and Werder, K., eds.,
Digital Responses to Covid-19: Digital Innovation, Transformation, and Entrepre-
neurship during Pandemic Outbreaks, Springer, Cham, 73-85.
https://doi.org/10.1007/978-3-030-66611-8_6
[3] Chong, W.Y., Selvaretnam, B. and Soon, L.-K. (2014) Natural Language Processing
for Sentiment Analysis: An Exploratory Analysis on Tweets. 2014 4th International
Conference on Artificial Intelligence with Applications in Engineering and Tech-
nology, Kota Kinabalu, Malaysia, 03-05 December 2014.
https://doi.org/10.1109/ICAIET.2014.43
[4] Kantrowitz, M., Mohit, B. and Mittal, V. (2000) Stemming and Its Effects on TFIDF
Ranking. Proceedings of the 23rd Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval, Athens, Greece, 24-28 July
2000, 357-359. https://doi.org/10.1145/345508.345650
[5] Sridevi, P.C. and Velmurugan, T. (2022) Impact of Preprocessing on Twitter Based
Covid-19 Vaccination Text Data by Classification Techniques. 2022 International
Conference on Applied Artificial Intelligence and Computing (ICAAIC), Salem, In-
dia, 9-11 May 2022. https://doi.org/10.1109/ICAAIC53929.2022.9792768
[6] Acheampong, F.A., Chen, W.Y. and Nunoo-Mensah, H. (2020) Text-Based Emotion
Detection: Advances, Challenges and Opportunities. Engineering Reports, 2, Article
ID: e12189. https://doi.org/10.1002/eng2.12189
[7] Mahto, D. and Yadav, S.C. (2022) Hierarchical Bi-LSTM Based Emotion
Analysis of Textual Data. Bulletin of the Polish Academy of Sciences, Technical
Sciences, 70, Article No. e141001.
[8] Kumar, A. and Jain, A.K. (2022) Emotion Detection in Psychological Texts by Fine-
Tuning BERT Using Emotion-Cause Pair Extraction. International Journal of Speech
Technology, 25, 727-743. https://doi.org/10.1007/s10772-022-09982-9
[9] Rashid, U., Iqbal, M.W., Skiandar, M.A., Raiz, M.Q., Naqvi, M.R. and Shahzad, S.K.
(2020) Emotion Detection of Contextual Text Using Deep Learning. 2020 4th Inter-
national Symposium on Multidisciplinary Studies and Innovative Technologies
(ISMSIT), Istanbul, Turkey, 22-24 October 2020, 1-5.
https://doi.org/10.1109/ISMSIT50672.2020.9255279
[10] Shrivastava, K., Kumar, S. and Jain, D.K. (2019) An Effective Approach for Emotion
Detection in Multimedia Text Data Using Sequence Based Convolutional Neural
Network. Multimedia Tools and Applications, 78, 29607-29639.
https://doi.org/10.1007/s11042-019-07813-9
[11] Jayapradha, B. and Velmurugan, T. (2003) Pre-Processing Emotional Text Data for