
Effective Prediction of Fake News Using A Learning Vector Quantization

The document discusses methods for effectively predicting fake news using machine learning techniques. It proposes a novel learning vector quantization algorithm using hamming distance measures and compares its performance to existing algorithms, achieving a prediction rate of 93.54%.

Measurement: Sensors 25 (2023) 100601

Contents lists available at ScienceDirect

Measurement: Sensors
journal homepage: www.sciencedirect.com/journal/measurement-sensors

Effective prediction of fake news using a learning vector quantization with hamming distance measure
Sudhakar M *, K.P. Kaliyamurthie
Department of Computer Science and Engineering, Bharath Institute of Higher Education and Research, Chennai, India

A R T I C L E I N F O

Keywords:
Fake news
Passive aggressive classifier
LS-SVM
Novel learning vector quantization (LVQ)
LSTM

A B S T R A C T

The spread of fake news on today's scale has never happened before in human history; the development of the World Wide Web and the adoption of social media have given people a pathway to spread misinformation to the world. Everyone is using the Internet, creating and sharing content on social media, but not all of the information is valid, and no one is verifying the originality of the content. It is sometimes complicated for researchers and intelligence agencies to identify the essence of the content. For example, during Covid-19, misinformation about the outbreak spread worldwide, and much false information spread faster than the virus. This misinformation creates problems for the public and misleads people about taking the proper medicine. This work aims to improve the prediction rate for fake news. The proposed algorithm is compared with three existing algorithms, and its results are better than those of all three. The prediction rate of the proposed algorithm is 93.54%.

1. Introduction

With the development of the World Wide Web, everyone has access to social media, but not all information on social media is accurate; some of it is valid, but it is challenging to tell which. Covid-19 started in China in 2019 and gradually spread worldwide. By August 2022, the whole world had been affected by Covid-19; many people were infected, and it caused many deaths worldwide [WHO] [1]. The pandemic caused many problems: many institutions were closed, and many countries were affected economically and implemented social distancing and lockdowns. Because of the pandemic, many people, including students, started using the Internet more. Everyone can create and share content on social media, and sometimes such information is not accurate. Many users share content without checking its originality.

Today, social media is a powerful medium for sharing information and ideas, allowing everyone to share their thoughts, but some use it negatively. For example, during the pandemic, some people spread the claim that Covid-19 was caused by 5G technology, and others claimed that the vaccines would kill people rather than save them. This type of false information causes many problems in society and confuses people who are ready to take medicine [2].

The main aim of this misinformation is to destroy the original information and mislead the people who encounter it. This research will help the community detect this misinformation on social media. The auxiliary information produced by fake news is also challenging to exploit, since the data is noisy, incomplete, and unstructured due to users' engagement with the stories. A false statement can negatively affect individuals and society. Fake news also aims to persuade consumers to believe inaccurate, biased information. It is often used by politicians to further their political agendas.

A machine learning ensemble model based on the Decision Tree (DT) classifier, the Random Forest (RF) algorithm, and the Extra Tree (ET) algorithm has been used to classify the extracted features. As a result of that work, the most critical features for fake news classification were identified using feature extraction. Ensemble models were selected to optimize classification accuracy, and the ensemble classifier trained more quickly. In another article, the authors used computational models to analyze the verification of news extracted from Twitter, which is considered as an expository demonstration of fake news recognition. The deep learning-based model that identifies phoney information is derived from the supervised models [13]. A study of basic algorithms showed that, even on an issue as important as the worldwide spread of fake news, the simplest algorithms can achieve a decent outcome. As a consequence, the results of that study further support that systems like these might come in very handy and be used effectively to deal with this critical issue. We present an algorithm for identifying fake news based on well-known Twitter strings. By expanding their credibility decisions, such a model

* Corresponding author.
E-mail addresses: [email protected] (S. M), [email protected] (K.P. Kaliyamurthie).

https://doi.org/10.1016/j.measen.2022.100601
Received 30 September 2022; Received in revised form 16 November 2022; Accepted 21 November 2022
Available online 5 December 2022
2665-9174/© 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

could be precious for many social media users [3]. A tree-based ensemble machine learning framework (Gradient Boosting) has been used to discover fake news by combining content-based and context-based features. A recent study has derived gradient descent algorithms for adaptive boosting methods, in which a single objective function is optimized, which is the rationale for crucial elements in the processes. Various machine learning models were applied to classify data using a multiclass dataset (FNC). Based on the experimental results, the ensemble framework is more effective than existing benchmarks. For the multiclass classification of fake news with four classes, an accuracy of 86% was achieved using the Gradient Boosting algorithm, an ensemble machine learning framework [14]. It demonstrates how fake news can be detected and categorized. In another work, an SVM was used to aggregate information and compare the proposed outcomes to establish whether a communication was authentic or pure fabrication. The proposed model's results are accurate up to 93.6%, but such models might also make restrictive assumptions about particular cases, limiting their applicability [4]. The main aim of this study is to propose an efficient neural network model for effectively predicting fake news.

2. Materials and methods

In this phase, we preprocess the data, select the features, select a model, tune the hyperparameters, and train the model. In the proposed approach, the ISOT dataset, which is common among fake news datasets, was identified and analyzed. Tokenization is done once preprocessing is complete; it helps us reduce the text to small words.
The main aim of this research is to extract features from the text and then use these features, instead of the raw text, for fake news detection. Once the features are extracted, we train the model using machine learning algorithms such as LVQ, LSTM, LS-SVM, and the Passive Aggressive classifier. The proposed model is evaluated using different types of evaluation metrics.

2.1. Dataset

The University of Victoria created the ISOT dataset [5]. It contains 23,481 fake news articles and 21,417 accurate news articles.

2.2. Methodologies

Tokenization is the first step. It is the process of splitting a string into a list of tokens [5]. After tokenizing, the stop words are removed from the text. When used as features in text classification, stop words create noise, as they are short, very common words in the language. Common words are removed, such as a, about, an, are, as, at, be, by, for, from, how, in, is, of, on, or, that, the, these, this, too, was, what, when, where, who, will, and so on. Then stemming is applied to reduce the words to their roots, and feature extraction is used to extract the more relevant features. Hence, less valuable features are left out to improve the performance.

2.3. Algorithms

Proposed Learning Vector Quantization (LVQ).
Our proposed LVQ is an AI-based neural network algorithm. The proposed algorithm uses a distance measure to find the closest neuron; in this algorithm, the Hamming distance is used instead of the Euclidean distance. The proposed algorithm consists of three significant steps (Fig. 1).
Input:
Many neurons, weights for each, and the corresponding labels.

Fig. 1. Proposed architecture.

2.4. Passive Aggressive classifier

Passive-aggressive techniques are commonly used for large-scale learning. They belong to the family of online machine learning algorithms, which receive data sequentially and update the model step by step, instead of batch learning using the entire training dataset at once. As passive-aggressive algorithms don't require a learning rate, they are somewhat similar to Perceptron models. Regularizers are
Algorithm Steps:

Step 1: For each input, find the closest neuron (using hamming distance algorithm)
Step 2: Update the respective weights for the neurons.
Step 3: Label each neuron with the corresponding weights.
Step 4: Train the neural network until it gets the optimized result.
Step 5: Evaluate the trained model.
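
The algorithm steps above can be sketched in code. The following is a minimal illustration and not the authors' implementation: it assumes binary (0/1) feature vectors, keeps real-valued neuron weights that are thresholded at 0.5 before the Hamming distance is computed, and uses the classical LVQ1 update (move the winning neuron toward the input when the labels match, away from it otherwise). The update rule, the threshold, the learning rate, the toy data, and all function names are assumptions made for illustration; the paper does not specify them.

```python
from typing import List, Sequence, Tuple

Neuron = Tuple[List[float], str]  # (weights, label)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def binarize(weights: Sequence[float]) -> List[int]:
    """Threshold real-valued weights at 0.5 (an assumption of this sketch)."""
    return [1 if w >= 0.5 else 0 for w in weights]


def closest(neurons: Sequence[Neuron], x: Sequence[int]) -> int:
    """Step 1: index of the neuron with the smallest Hamming distance to x."""
    distances = [hamming(binarize(w), x) for w, _ in neurons]
    return distances.index(min(distances))


def train(neurons: Sequence[Neuron], X, y, lr: float = 0.3, epochs: int = 10) -> List[Neuron]:
    """Steps 2-4: repeatedly update the winning neuron (LVQ1-style rule)."""
    neurons = [(list(w), label) for w, label in neurons]  # work on a copy
    for _ in range(epochs):
        for x, target in zip(X, y):
            w, label = neurons[closest(neurons, x)]
            sign = 1.0 if label == target else -1.0  # attract or repel
            for j, xj in enumerate(x):
                # Keep every weight inside [0, 1] so thresholding stays meaningful.
                w[j] = min(1.0, max(0.0, w[j] + sign * lr * (xj - w[j])))
    return neurons


def predict(neurons: Sequence[Neuron], x: Sequence[int]) -> str:
    """Step 5 helper: label of the closest neuron."""
    return neurons[closest(neurons, x)][1]


# Hypothetical toy data: each vector marks presence (1) or absence (0) of six terms.
X = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0],
     [0, 0, 0, 1, 1, 0], [1, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
y = ["fake", "real", "fake", "real", "fake", "real"]
initial = [([1.0, 1.0, 0.0, 0.0, 0.0, 0.0], "fake"),
           ([0.0, 0.0, 0.0, 1.0, 1.0, 0.0], "real")]
model = train(initial, X, y)
```

With this toy data, `predict(model, [1, 1, 0, 0, 1, 0])` returns `"fake"`, because the thresholded "fake" neuron differs from that vector in fewer positions than the "real" neuron does. Real text features would first have to be converted to binary vectors, for example by marking which vocabulary terms occur in an article.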

incorporated into these algorithms [6].

The algorithm consists of two steps. Step 1 is passive: if the prediction is correct, the model does not need to change. Step 2 is aggressive: if the prediction is incorrect, the model is changed.

2.5. LSTM (Long Short-Term Memory) [7]

LSTMs are compelling models for time series. They can predict an arbitrary number of steps into the future ("Long Short-Term Memory", n.d.). An LSTM module (or cell) contains five essential components that allow it to model both long-term and short-term data. The cell state (c_t) stores both short-term and long-term memories. The hidden state (h_t) is computed from the previous hidden state and the current cell input and is used to make the next prediction; the hidden state can retrieve either the short-term or the long-term memory stored in the cell state. The input gate (i_t) determines how much information from the current input flows into the cell state. The forget gate (f_t) determines how much information from the previous cell state is retained in the current cell state. The output gate (o_t) determines how much information from the cell state flows into the hidden state, so that, if necessary, the LSTM can access only the long-term memories or both the short-term and long-term memories [8].

2.6. LS-SVM (Least squares support vector machines) [9]

Least-squares support vector machines (LS-SVM) are least-squares versions of support vector machines (SVM), a set of supervised learning methods for classifying and predicting data and recognizing patterns. As opposed to the convex quadratic programming (QP) problem for classical SVMs, this version finds the solution by solving a set of linear equations.
A typical algorithm step involves selecting the proper inputs and layers and covering the minimum and maximum range displacement. This is repeated until the results are optimized. The model is then tested.

3. Results and discussions

Table 1 shows the classification report for the proposed learning vector quantization algorithm. The precision is 93% and 90% for fake news and trustworthy news, respectively; similarly, the recall is 92% and 91%, and the f1-score is 93% and 91%, with supports of 3389 and 2646 samples, respectively.

Table 1
Classification report of proposed model.

Class   Precision   Recall   f1-Score   Support
0       0.93        0.92     0.93       3389
1       0.90        0.91     0.91       2646

Table 2 compares the precision rates obtained by the four algorithms for fake news prediction. The experiments use a sample size of N = 10; therefore, ten tests are conducted for each algorithm. The proposed Learning Vector Quantization algorithm gives the highest precision rate of 93.54%, while the best rates of the other algorithms are 73.25% for LS-SVM, 58.81% for LSTM, and 72.15% for the Passive Aggressive classifier.

Table 2
Comparison of precision for fake news prediction (precision in percentage).

Sample   Proposed Learning      Passive Aggressive   LS-SVM      LSTM
         Vector Quantization    Classifier           Algorithm   Algorithm
1        92.68                  65.74                72.23       55.93
2        92.71                  67.67                71.23       50.17
3        93.04                  69.27                70.20       51.20
4        93.04                  72.1                 69.27       54.34
5        92.4                   72.03                72.11       54.00
6        93.01                  64.00                71.00       56.73
7        93.01                  65.22                72.00       58.67
8        92.1                   72.15                69.00       53.45
9        92.36                  65.74                68.20       58.81
10       93.54                  67.67                73.25       52.43

Table 3 shows the significance of the four groups, demonstrating that the proposed Learning Vector Quantization algorithm performs better, with a significance value of 0.000, which is less than 0.05 (p < 0.05). The one-way ANOVA test is used to find the statistical difference between the means of the proposed Learning Vector Quantization algorithm, the Passive Aggressive classifier, the LS-SVM algorithm, and the LSTM algorithm.

Table 3
ANOVA test (dependent variable: PRECISION).

                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   8276.197         3    2758.732      28.539   .000
Within Groups    3479.917         36   96.664
Total            11756.115        39

Table 4 presents the results of the Bonferroni comparison for the four groups, based on the precision rate obtained for each sample. It also shows that the performance of the proposed model is significantly better than that of the other three groups considered in this experiment.

Fig. 2 shows a graphical comparison of the mode precision obtained by the four algorithm groups. The precision rate is 68% for LR, 50% for LSTM, 66% for the Passive Aggressive classifier, and 93% for LVQ. It shows that the proposed Learning Vector Quantization technique performs better, having the highest mode precision rate.

Compared with other research, this work gives encouraging results with respect to sample size and accuracy. This comparison demonstrates how different algorithms perform in detecting fake news as datasets keep growing over time. With a 92.8% accuracy rate, SVMs in conjunction with a TF-IDF vectorizer proved to be the most effective combination. Logistic regression also performed very well, with a result of 91%. As the sample size increased, neither Naive Bayes nor Decision Tree improved significantly, resulting in accuracy rates of 85% and 81%, respectively. In contrast, neural networks performed worst and consistently yielded the lowest accuracy of 49%

Table 4
Multiple comparison test.

Multiple Comparisons
Dependent Variable:
Bonferroni

(I) ALGORITHM                   (J) ALGORITHM                   Mean Difference (I–J)   Std. Error   Sig.   95% Confidence Interval
                                                                                                            Lower Bound   Upper Bound
LEARNING VECTOR QUANTIZATION    PASSIVE AGGRESSIVE CLASSIFIER   31.130*                 4.397        .000   18.85         43.41
                                LS-SVM                          21.940*                 4.397        .000   9.66          34.22
                                LSTM                            38.216*                 4.397        .000   25.94         50.49
PASSIVE AGGRESSIVE CLASSIFIER   LEARNING VECTOR QUANTIZATION    −31.130*                4.397        .000   −43.41        −18.85
                                LS-SVM                          −9.190                  4.397        .262   −21.47        3.09
                                LSTM                            7.086                   4.397        .695   −5.19         19.36
LS-SVM                          LEARNING VECTOR QUANTIZATION    −21.940*                4.397        .000   −34.22        −9.66
                                PASSIVE AGGRESSIVE CLASSIFIER   9.190                   4.397        .262   −3.09         21.47
                                LSTM                            16.276*                 4.397        .004   4.00          28.55
LSTM                            LEARNING VECTOR QUANTIZATION    −38.216*                4.397        .000   −50.49        −25.94
                                PASSIVE AGGRESSIVE CLASSIFIER   −7.086                  4.397        .695   −19.36        5.19
                                LS-SVM                          −16.276*                4.397        .004   −28.55        −4.00

*. The mean difference is significant at the 0.05 level.
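
As a consistency check on the reported statistics, the F value of the ANOVA table and the standard error of the Bonferroni table can be recomputed from the published sums of squares. The sketch below only does this arithmetic; it assumes equal group sizes of 10 samples per algorithm (N = 10 in the text), and the variable names are illustrative rather than taken from the paper.

```python
import math

# Values copied from the ANOVA table (dependent variable: precision).
ss_between, df_between = 8276.197, 3
ss_within, df_within = 3479.917, 36
n_per_group = 10  # assumed: 10 precision samples per algorithm

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# One-way ANOVA F statistic: between-group over within-group mean square.
f_stat = ms_between / ms_within

# Standard error of the difference between two group means (equal group sizes).
std_error = math.sqrt(2.0 * ms_within / n_per_group)

# Bonferroni correction: divide the significance level by the number of pairs.
n_groups = df_between + 1
n_pairs = n_groups * (n_groups - 1) // 2
alpha_per_comparison = 0.05 / n_pairs

print(f"F = {f_stat:.3f}, std. error = {std_error:.3f}, "
      f"pairs = {n_pairs}, alpha per comparison = {alpha_per_comparison:.4f}")
```

Under these assumptions the recomputed F statistic and standard error agree with the tabulated 28.539 and 4.397 to three decimal places, and the four algorithms give six pairwise comparisons.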

[10]. The proposed framework reveals that only a tiny fraction of the models achieve an AUC greater than 0.85. Based on these results, models with specific combinations of features are usually indicative of fake news detection. There can be no single solution to all types of fake news stories, and analysis like this shows the complexity of the problem. Various models of clusters based on random combinations of features are presented in this work. The ensemble technique that combines models from different collections appears to be a promising method of investigation [11]. A performance analysis was carried out on three datasets to determine the effectiveness of different approaches. On a dataset comprising fewer than 100k news articles with a precision rate of, Naive Bayes with n-gram can achieve similar results to neural network-based models. Data size and the information provided greatly influence the performance of LSTM-based models. Given adequate information in a news article, LSTM-based models are more likely to overcome overfitting. Furthermore, advanced models such as C-LSTM, Conv-HAN, and character-level C-LSTM have shown high promise in detecting fake news and require further attention [10]. According to an evaluation of the Bidirectional LSTM model, it is trained with 10,235,010 trainable parameters. The training results for three epochs with a batch size of 32 are considered. Initial results show 95.69% accuracy and a loss of 13.49% for this model. In epoch two, accuracy increased to 95.94%, with a loss of 10.53%. In epoch three, accuracy increased to 96.30%, and the loss was 9.65%. Therefore, the Bidirectional LSTM model gave its best results at 96.30% accuracy and a loss of 9.65% [12]. After parameter tuning, the TF-IDF dense neural network (DNN) model achieves an accuracy of 94.21%. The prediction accuracy of the model is good when a news article is unrelated to a headline, agrees with it, or disagrees with it, but low when the stance is opposed (44%). The result of the second model compared with BoW-DNN is very surprising, since pre-trained word embeddings always yielded low accuracy scores. The size of the article can be a contributing factor to this phenomenon: the Word2Vec model may struggle to capture word-level semantic importance if the news article is exceptionally long [4].

Fig. 2. Graph represents the comparison of mode precision rate.

4. Conclusion

It is challenging to manually classify or identify whether news is fake or real. In this research, we discussed classifying fake news articles using machine learning models, and we proposed a supervised machine learning-based algorithm to organize the information. Fake news detection has many open issues that require the attention of researchers. In order to reduce the spread of fake news, we need to identify the key elements involved in the spread of news. The proposed LVQ achieved a precision rate of 93.54%. In the future, we can research real-time fake news identification in videos.

CRediT authorship contribution statement

Sudhakar M: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Supervision, Project administration.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: M. Sudhakar reports was provided by Bharath Institute of Higher Education and Research. M. Sudhakar reports a relationship with Bharath Institute of Higher Education and Research that includes.

Data availability

The authors do not have permission to share data.

References

[1] World Health Organization, Weekly epidemiological update on COVID-19 - 10 August 2022 [Online]. Available, https://www.who.int/publications/m/item/weekly-epidemiological-update-on-covid-19---10-august-2022, 2022.


[2] S. Yang, J. Jiang, A. Pal, K. Yu, F. Chen, S. Yu, Analysis and insights for myths circulating on twitter during the covid-19 pandemic, IEEE Open J. Comp Soc. 1 (2020) 209–219.
[3] Xin Li, Peixin Lu, Lianting Hu, Xiaoguang Wang, Long Lu, A novel self-learning semi-supervised deep learning network to detect fake news on social media, Multimed. Tool. Appl. (June) (2021) 1–9.
[4] Karishma Sharma, Feng Qian, He Jiang, Natali Ruchansky, Ming Zhang, Yan Liu, Combating fake news, ACM Transactions on Intelligent Systems and Technology, 2019, https://doi.org/10.1145/3305260.
[5] Muhammad Zubair Asghar, Fazli Subhan, Hussain Ahmad, Wazir Zada Khan, Saqib Hakak, Thippa Reddy Gadekallu, Mamoun Alazab, Senti-eSystem: a sentiment-based eSystem using hybridized fuzzy and deep neural network for measuring customer satisfaction, Software Pract. Ex. (2021), https://doi.org/10.1002/spe.2853.
[6] Naresh Manwani, Mohit Chandra, Exact passive-aggressive algorithms for ordinal regression using interval labels, IEEE Transact. Neural Networks Learn. Syst. 31 (9) (2020) 3259–3268.
[7] David M.Q. Nelson, Adriano C.M. Pereira, Renato A. de Oliveira, Stock market's price movement prediction with LSTM neural networks, in: 2017 International Joint Conference on Neural Networks, IJCNN, 2017, https://doi.org/10.1109/ijcnn.2017.7966019.
[8] Pritika Bahad, Preeti Saxena, Raj Kamal, Fake news detection using Bi-directional LSTM-recurrent neural network, Procedia Comput. Sci. (2019), https://doi.org/10.1016/j.procs.2020.01.072.
[9] Yunrong Xiang, Liangzhong Jiang, Water quality prediction using LS-SVM and particle swarm optimization, in: 2009 Second International Workshop on Knowledge Discovery and Data Mining, 2009, https://doi.org/10.1109/wkdd.2009.217.
[10] Kai Shu, Huan Liu, Detecting Fake News on Social Media, Morgan & Claypool Publishers, 2019.
[11] Raed Alharbi, Minh N. Vu, My T. Thai, Evaluating fake news detection models from explainable machine learning perspectives, in: ICC 2021-IEEE International Conference on Communications, 2021, https://doi.org/10.1109/icc42927.2021.9500467.
[12] Dibyajyoti Baishya, Joon Jyoti Deka, Gaurav Dey, Pranav Kumar Singh, SAFER: sentiment analysis-based FakE review detection in E-commerce using deep learning, SN Comp. Sci. (2021), https://doi.org/10.1007/s42979-021-00918-9.
[13] Nello Cristianini, John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
[14] Rohit Kumar Kaliyar, Anurag Goswami, Pratik Narang, Multiclass fake news detection using ensemble machine learning, in: 2019 IEEE 9th International Conference on Advanced Computing, IACC, 2019, https://doi.org/10.1109/iacc48062.2019.8971579.
