Effective Prediction of Fake News Using A Learning Vector Quantization
Measurement: Sensors
journal homepage: www.sciencedirect.com/journal/measurement-sensors
ARTICLE INFO
Keywords: Fake news; Passive aggressive classifier; LS-SVM; Novel learning vector quantization (LVQ); LSTM

ABSTRACT
Fake news has never before spread on the scale seen today: the development of the World Wide Web and the adoption of social media have given people a pathway to spread misinformation to the world. Everyone is using the Internet, creating and sharing content on social media, but not all of that information is valid, and no one is verifying the originality of the content. It is sometimes difficult for researchers and the intelligence community to identify the essence of the content. For example, during Covid-19, misinformation about the outbreak spread worldwide, and much false information spread faster than the virus. This misinformation creates problems for the public and misleads people about taking the proper medicine. This work aims to improve the prediction rate for fake news. The proposed algorithm is compared with three existing algorithms and outperforms all three. The proposed algorithm achieves a prediction rate of 93.54%.
1. Introduction

With the development of the World Wide Web, everyone has access to social media, but not all the information on social media is accurate; some of it is valid, and telling the two apart is challenging. Covid-19 started in China in 2019 and gradually spread worldwide. By August 2022, the whole world had been affected by Covid-19; many people were infected, and it caused many deaths worldwide [WHO] [1]. The pandemic caused many problems worldwide: many institutions were closed, and many countries were affected economically and implemented social distancing and lockdowns. Because of the pandemic, many people, including students, started using the Internet more. Everyone creates and shares content on social media, and sometimes that information is not accurate. Many users share content without checking its originality.

Today, social media is a powerful medium for sharing information and ideas, allowing everyone to share their thoughts, but some use it negatively. For example, during the pandemic, some users spread the claim that Covid-19 was caused by 5G technology, and others claimed that the vaccines would kill people rather than save them. This type of false information causes many problems in society and confuses people who are ready to take medicine [2].

The main aim of this misinformation is to displace the original information and mislead the people who encounter it. This research will help the community detect such misinformation on social media. The auxiliary information produced by fake news is also challenging to exploit, since the data arising from users' engagement with the stories is noisy, incomplete, and unstructured. A false statement can negatively affect individuals and society. Fake news also aims to persuade consumers to believe inaccurate, biased information. It is often used by politicians to further their political agendas.

A machine learning ensemble model based on the Decision Tree (DT) classifier, the Random Forest (RF) algorithm, and the Extra Tree (ET) algorithm has been used to classify extracted features. Using feature extraction, the most critical features for fake news classification were identified, and ensemble models were selected to optimize classification accuracy; the ensemble classifier also trained more quickly. Other authors used computational models to verify news extracted from Twitter as a demonstration of fake news recognition. A deep learning-based model for identifying false information has been derived from supervised models [13]. A study of basic algorithms showed that, even for a problem as important as the worldwide spread of fake news, the simplest algorithms can achieve decent results. Its results further indicate that such systems could be very useful in dealing with this critical issue. An algorithm has also been presented for identifying fake news based on well-known Twitter strings. By expanding their credibility decisions, such a model
* Corresponding author.
E-mail addresses: [email protected] (S. M), [email protected] (K.P. Kaliyamurthie).
https://doi.org/10.1016/j.measen.2022.100601
Received 30 September 2022; Received in revised form 16 November 2022; Accepted 21 November 2022
Available online 5 December 2022
2665-9174/© 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
S. M and K.P. Kaliyamurthie Measurement: Sensors 25 (2023) 100601
could be valuable to many social media users [3]. Another study used a tree-based ensemble machine learning framework (Gradient Boosting) to detect fake news by combining content-based and context-based features. Gradient descent algorithms have recently been derived for adaptive boosting methods, in which a single objective function is optimized; this is the rationale for the key elements of the procedure. Various machine learning models were applied to classify data from a multiclass dataset (FNC). According to the experimental results, the ensemble framework is more effective than existing benchmarks: for the multiclass classification of fake news with four classes, the Gradient Boosting algorithm achieved an accuracy of 86% [14]. This demonstrates how fake news can be detected and categorized. In other work, an SVM was used to aggregate information and compare the outcomes to establish whether a communication was authentic or fabricated. That model's results are accurate up to 93.6%, but it may make restrictive assumptions about particular cases, which limits its applicability [4]. The main aim of this study is to propose an efficient neural network model for effectively predicting fake news.

2. Materials and methods

In this phase, we preprocess the data, select the features, select a new model, tune the hyperparameters, and train the model. In the proposed approach, the ISOT dataset, which is commonly used among fake news datasets, was identified and analyzed. Tokenization is performed once preprocessing is complete; it reduces the text to small word units.

The main objective of this research is to extract features from the text and then use these features, instead of the raw text, for fake news detection. Once the features are extracted, we train models using machine learning algorithms: LVQ, LSTM, LS-SVM, and the Passive Aggressive classifier. The proposed model is evaluated using several evaluation metrics.

2.1. Dataset

The University of Victoria created the ISOT dataset [5]. It contains 23,481 fake news articles and 21,417 accurate news articles.

2.2. Methodologies

Tokenization is the first step. It is the process of splitting a string into a list of tokens [5]. After tokenization, the stop words are removed from the text. When used as features in text classification, stop words create noise because they are short, very frequent words that carry little meaning. Common words are removed, such as a, about, an, are, as, at, be, by, for, from, how, in, is, of, on, or, that, the, these, this, too, was, what, when, where, who, will, and so on. Stemming is then applied to reduce the words to their roots. Finally, feature extraction is used to obtain the more relevant features; less valuable features are discarded to improve performance.

2.3. Algorithms

Proposed Learning Vector Quantization (LVQ).
Our proposed LVQ is an AI-based neural network algorithm. The algorithm uses a distance measure to find the neuron closest to each input; here, the Hamming distance is used instead of the Euclidean distance. The proposed algorithm consists of three significant steps (Fig. 1).
Input:
A number of neurons, the weights of each, and the corresponding labels.

2.4. Passive Aggressive classifier

Passive-aggressive techniques are commonly used for large-scale learning. Machine learning algorithms that can learn online receive data
sequentially and update the model step by step, instead of learning in batch from the entire training dataset at once. Passive-aggressive algorithms resemble Perceptron models in that they do not require a learning rate.
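The online update described above can be sketched as follows. This is a minimal illustration of the standard PA-I rule for binary labels, not the authors' implementation: the aggressiveness parameter `C`, the toy two-feature stream, and the label encoding (+1 for fake, -1 for real) are all assumptions made for the example.

```python
from typing import Iterable, List, Sequence, Tuple


def pa_update(w: List[float], x: Sequence[float], y: int, C: float = 1.0) -> List[float]:
    """One online PA-I step for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss on the current sample
    if loss == 0.0:
        return w  # passive: the sample is already classified with enough margin
    sq_norm = sum(xi * xi for xi in x)
    if sq_norm == 0.0:
        return w  # an all-zero sample carries no information
    # Aggressive: the step size comes from the loss itself (capped by C),
    # so no separate learning rate is needed.
    tau = min(C, loss / sq_norm)
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


def train_online(stream: Iterable[Tuple[Sequence[float], int]],
                 n_features: int, C: float = 1.0) -> List[float]:
    """Consume samples one at a time, updating the weights after each."""
    w = [0.0] * n_features
    for x, y in stream:
        w = pa_update(w, x, y, C)
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Hypothetical stream: two count features per article, label +1 = fake, -1 = real.
stream = [([2.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 0.0], 1), ([0.0, 2.0], -1)]
w = train_online(stream, n_features=2)
print(w, predict(w, [3.0, 1.0]), predict(w, [0.0, 1.0]))
```

The last sample in the stream triggers no update because its margin already exceeds 1, which is the "passive" half of the name; the first three samples each force a correction, which is the "aggressive" half.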
Algorithm Steps:
Step 1: For each input, find the closest neuron (using the Hamming distance).
Step 2: Update the weights of the respective neuron.
Step 3: Label each neuron with the corresponding weights.
Step 4: Train the neural network until it reaches the optimized result.
Step 5: Evaluate the trained model.
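The steps above can be sketched in Python under several assumptions that the paper does not spell out: features are binary (word present or absent), prototype weights are real-valued and thresholded at 0.5 before the Hamming comparison, neuron labels are fixed at initialization, and the update is the classic LVQ1 rule. The toy data, the learning rate, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Prototype = Tuple[List[float], str]  # (weights, class label)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(1 for ai, bi in zip(a, b) if ai != bi)


def binarize(w: Sequence[float]) -> List[int]:
    """Assumption: threshold real-valued weights at 0.5 so Hamming distance applies."""
    return [1 if wi >= 0.5 else 0 for wi in w]


def closest(prototypes: List[Prototype], x: Sequence[int]) -> int:
    """Step 1: index of the neuron nearest to x in Hamming distance."""
    return min(range(len(prototypes)),
               key=lambda i: hamming(binarize(prototypes[i][0]), x))


def train_lvq(data: List[Tuple[List[int], str]], prototypes: List[Prototype],
              lr: float = 0.3, epochs: int = 10) -> List[Prototype]:
    """Steps 2-4: pull the winner toward x if the labels agree, push it away otherwise."""
    for _ in range(epochs):
        for x, label in data:
            i = closest(prototypes, x)
            w, w_label = prototypes[i]
            sign = 1.0 if w_label == label else -1.0
            prototypes[i] = ([wi + sign * lr * (xi - wi) for wi, xi in zip(w, x)], w_label)
    return prototypes


def predict(prototypes: List[Prototype], x: Sequence[int]) -> str:
    """Step 5 helper: the label of the nearest neuron."""
    return prototypes[closest(prototypes, x)][1]


# Hypothetical binary features for four short articles.
data = [([1, 1, 0, 0], "fake"), ([1, 0, 0, 0], "fake"),
        ([0, 0, 1, 1], "real"), ([0, 0, 0, 1], "real")]
initial = [([1.0, 1.0, 0.0, 0.0], "fake"), ([0.0, 0.0, 1.0, 1.0], "real")]
trained = train_lvq(data, initial)
print(predict(trained, [1, 1, 1, 0]), predict(trained, [0, 1, 1, 1]))
```

Thresholding is only one way to reconcile real-valued weight updates with a distance defined on discrete vectors; keeping the prototypes binary and flipping bits would be another.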
Table 4
Multiple comparison test.
Multiple Comparisons
Dependent Variable:
Bonferroni

(I) ALGORITHM                  (J) ALGORITHM                  Mean Difference (I–J)  Std. Error  Sig.   95% Confidence Interval
                                                                                                        Lower     Upper
LEARNING VECTOR QUANTIZATION   PASSIVE AGGRESSIVE CLASSIFIER   31.130*               4.397       .000    18.85     43.41
                               LS-SVM                          21.940*               4.397       .000     9.66     34.22
                               LSTM                            38.216*               4.397       .000    25.94     50.49
PASSIVE AGGRESSIVE CLASSIFIER  LEARNING VECTOR QUANTIZATION   −31.130*               4.397       .000   −43.41    −18.85
                               LS-SVM                          −9.190                4.397       .262   −21.47      3.09
                               LSTM                             7.086                4.397       .695    −5.19     19.36
LS-SVM                         LEARNING VECTOR QUANTIZATION   −21.940*               4.397       .000   −34.22     −9.66
                               PASSIVE AGGRESSIVE CLASSIFIER    9.190                4.397       .262    −3.09     21.47
                               LSTM                            16.276*               4.397       .004     4.00     28.55
LSTM                           LEARNING VECTOR QUANTIZATION   −38.216*               4.397       .000   −50.49    −25.94
                               PASSIVE AGGRESSIVE CLASSIFIER   −7.086                4.397       .695   −19.36      5.19
                               LS-SVM                         −16.276*               4.397       .004   −28.55     −4.00
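As a reading aid for Table 4, the short check below recomputes, from the tabulated values only, the midpoint and half-width of each confidence interval and the multiplier of the standard error that the half-width implies. The multiplier of roughly 2.79 is inferred from the table and is not stated in the paper; the row selection (the six distinct pairs) and all names are choices made for this sketch.

```python
# (I, J, mean difference, std. error, CI lower, CI upper), copied from Table 4.
rows = [
    ("LVQ", "Passive Aggressive", 31.130, 4.397, 18.85, 43.41),
    ("LVQ", "LS-SVM", 21.940, 4.397, 9.66, 34.22),
    ("LVQ", "LSTM", 38.216, 4.397, 25.94, 50.49),
    ("Passive Aggressive", "LS-SVM", -9.190, 4.397, -21.47, 3.09),
    ("Passive Aggressive", "LSTM", 7.086, 4.397, -5.19, 19.36),
    ("LS-SVM", "LSTM", 16.276, 4.397, 4.00, 28.55),
]


def check_row(diff, se, lower, upper):
    """Return the interval midpoint, the implied multiplier of SE, and a significance flag."""
    midpoint = (lower + upper) / 2
    multiplier = (upper - lower) / 2 / se
    # A difference is significant at the adjusted level when the interval excludes zero.
    significant = lower > 0 or upper < 0
    return midpoint, multiplier, significant


results = [check_row(d, se, lo, hi) for _, _, d, se, lo, hi in rows]
for (i, j, d, *_), (mid, mult, sig) in zip(rows, results):
    print(f"{i} vs {j}: diff={d:.3f} midpoint={mid:.3f} multiplier={mult:.2f} significant={sig}")
```

Each interval is centred on its mean difference, and the intervals that exclude zero are exactly the rows marked with an asterisk: LVQ differs significantly from all three baselines, whereas the Passive Aggressive classifier does not differ significantly from LS-SVM or LSTM.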
4. Conclusion
[2] S. Yang, J. Jiang, A. Pal, K. Yu, F. Chen, S. Yu, Analysis and insights for myths circulating on twitter during the covid-19 pandemic, IEEE Open J. Comp Soc. 1 (2020) 209–219.
[3] Xin Li, Peixin Lu, Lianting Hu, Xiaoguang Wang, Long Lu, A novel self-learning semi-supervised deep learning network to detect fake news on social media, Multimed. Tool. Appl. (June) (2021) 1–9.
[4] Karishma Sharma, Feng Qian, He Jiang, Natali Ruchansky, Ming Zhang, Yan Liu, Combating fake news, ACM Transactions on Intelligent Systems and Technology, 2019, https://doi.org/10.1145/3305260.
[5] Muhammad Zubair Asghar, Fazli Subhan, Hussain Ahmad, Wazir Zada Khan, Saqib Hakak, Thippa Reddy Gadekallu, Mamoun Alazab, Senti-eSystem: a sentiment-based eSystem using hybridized fuzzy and deep neural network for measuring customer satisfaction, Software Pract. Ex. (2021), https://doi.org/10.1002/spe.2853.
[6] Naresh Manwani, Mohit Chandra, Exact passive-aggressive algorithms for ordinal regression using interval labels, IEEE Transact. Neural Networks Learn. Syst. 31 (9) (2020) 3259–3268.
[7] David M.Q. Nelson, Adriano C.M. Pereira, Renato A. de Oliveira, Stock market's price movement prediction with LSTM neural networks, in: 2017 International Joint Conference on Neural Networks, IJCNN, 2017, https://doi.org/10.1109/ijcnn.2017.7966019.
[8] Pritika Bahad, Preeti Saxena, Raj Kamal, Fake news detection using Bi-directional LSTM-recurrent neural network, Procedia Comput. Sci. (2019), https://doi.org/10.1016/j.procs.2020.01.072.
[9] Yunrong Xiang, Liangzhong Jiang, Water quality prediction using LS-SVM and particle swarm optimization, in: 2009 Second International Workshop on Knowledge Discovery and Data Mining, 2009, https://doi.org/10.1109/wkdd.2009.217.
[10] Kai Shu, Huan Liu, Detecting Fake News on Social Media, Morgan & Claypool Publishers, 2019.
[11] Raed Alharbi, Minh N. Vu, My T. Thai, Evaluating fake news detection models from explainable machine learning perspectives, in: ICC 2021-IEEE International Conference on Communications, 2021, https://doi.org/10.1109/icc42927.2021.9500467.
[12] Dibyajyoti Baishya, Joon Jyoti Deka, Gaurav Dey, Pranav Kumar Singh, SAFER: sentiment analysis-based FakE review detection in E-commerce using deep learning, SN Comp. Sci. (2021), https://doi.org/10.1007/s42979-021-00918-9.
[13] Nello Cristianini, John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
[14] Rohit Kumar Kaliyar, Anurag Goswami, Pratik Narang, Multiclass fake news detection using ensemble machine learning, in: 2019 IEEE 9th International Conference on Advanced Computing, IACC, 2019, https://doi.org/10.1109/iacc48062.2019.8971579.