
A Hybrid Transformer Model for Fake News Detection: Leveraging Bayesian Optimization and Bidirectional Recurrent Unit

1st Tianyi Huang∗, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, [email protected]
1st Zeqiu Xu, Information Networking Institute, Carnegie Mellon University, Pittsburgh, PA 15213, [email protected]
2nd Peiyang Yu, Information Networking Institute, Carnegie Mellon University, Pittsburgh, PA 15213, [email protected]
2nd Jingyuan Yi, Information Networking Institute, Carnegie Mellon University, Pittsburgh, PA 15213, [email protected]
3rd Xiaochuan Xu, Information Networking Institute, Carnegie Mellon University, Pittsburgh, PA 15213, [email protected]

∗ Corresponding author: [email protected]

arXiv:2502.09097v2 [cs.CL] 2 Mar 2025
Abstract—In this paper, we propose an optimized Transformer model that integrates Bayesian algorithms with a Bidirectional Gated Recurrent Unit (BiGRU), and apply it to fake news classification for the first time. First, we employ the TF-IDF method to extract features from news texts and transform them into numeric representations to facilitate subsequent machine learning tasks. Two sets of experiments are then conducted for fake news detection and classification: one using a Transformer model optimized only with BiGRU, and the other incorporating Bayesian algorithms into the BiGRU-based Transformer. Experimental results show that the BiGRU-optimized Transformer achieves 100% accuracy on the training set and 99.67% on the test set, while the addition of the Bayesian algorithm maintains 100% accuracy on the training set and slightly improves test-set accuracy to 99.73%. This indicates that the Bayesian algorithm boosts model accuracy by 0.06 percentage points, further enhancing the detection capability for fake news. Moreover, the proposed algorithm converges rapidly at around the 10th training epoch with accuracy nearing 100%, demonstrating both its effectiveness and its fast classification ability. Overall, the optimized Transformer model, enhanced by the Bayesian algorithm and BiGRU, exhibits excellent continuous learning and detection performance, offering a robust technical means to combat the spread of fake news in the current era of information overload.

Index Terms—Bayesian algorithm; fake news detection; Transformer; BiGRU.

I. INTRODUCTION

The rapid expansion of the Internet and social media has significantly accelerated the spread of fake news, posing serious challenges across social, political, and economic domains [1]. Defined as misleading or fabricated content designed to attract attention, manipulate opinions, or serve specific agendas, fake news propagates rapidly through digital communication networks, often leading to misinformation crises. For instance, during elections, fake news influences voter decision-making and undermines democratic integrity. Consequently, automated fake news detection has become a critical research focus in information science, data science, and computational social science [2].

Among the various approaches explored, machine learning (ML) has emerged as a key solution, enabling automated analysis and classification of misinformation. ML models extract distinguishing features from historical and spatiotemporal data [3], including linguistic patterns (e.g., sentiment analysis, word frequency), user interaction metrics (e.g., engagement levels, virality), and source credibility [4]. Common ML-based classifiers include support vector machines (SVM), decision trees, random forests, and neural networks, while deep learning architectures such as convolutional neural networks (CNN), recurrent neural networks (RNN), RoBERTa, DeBERTa, and T5 have demonstrated superior performance in handling complex textual data [5]. Compared to traditional approaches, these models offer enhanced accuracy and robustness in identifying misinformation.

To address the data scarcity challenge in fake news detection, semi-supervised learning and transfer learning techniques have been employed to leverage both labeled and unlabeled data. Additionally, recent advancements in large language models (LLMs) have significantly improved detection capabilities by integrating multimodal learning, adversarial training, and chain-of-reasoning methods [6]–[9]. Retrieval-Augmented Generation (RAG) has been proposed to further improve the performance of LLMs [10]. Pre-trained LLMs such as GreenPLM not only deliver strong performance but also keep training costs low [11]. However, challenges remain in adapting to evolving misinformation trends and in handling the uncertainty of large language models [12].

In this paper, we propose an optimized Transformer-based model, incorporating Bayesian inference and bidirectional gated recurrent units (Bi-GRUs), to enhance fake news classification accuracy. To the best of our knowledge, this is the first application of this approach to misinformation detection [13].
II. DATA SOURCES

The dataset used in this paper is an open-source Kaggle dataset containing 5,000 rows of news items in two categories, real news and fake news. The dataset has been tested by numerous experimenters on Kaggle and is well suited for distinguishing and comparing the strengths and weaknesses of algorithms. A sample of the data is shown in Table I.

TABLE I
SOME OF THE DATA

Text: Trump says healthcare reform push may need additional money. WASHINGTON (Reuters) - President Donald Trump on Tuesday said that the Republican push to repeal Obamacare may require additional money for healthcare, but he did not specify how much more funding would be needed or how it might be used. Trump told Republican Senators joining him for lunch at the White House that their planned healthcare reform bill would need to be “generous” and “kind.” “That may be adding additional money into it,” Trump said, without offering further details. [14]
Type: Real

Text: China’s Xi, Trump discuss ‘global hot-spot issues’: Xinhua. BEIJING (Reuters) - Chinese President Xi Jinping and U.S. President Donald Trump on Saturday discussed “global hot-spot issues” on the sidelines of the G20 summit in the German city of Hamburg, state news agency Xinhua said. It did not immediately give any other details.
Type: Real

Text: Trump has talked to top lawmakers about immigration reform: White House. WASHINGTON (Reuters) - U.S. President Donald Trump has spoken to congressional leaders about immigration reform and is confident that Congress will take action to deal with the status of illegal immigrants who have grown up in the United States, the White House said on Tuesday. “We have confidence that Congress is going to step up and do their job,” White House spokeswoman Sarah Sanders told a briefing shortly after the administration scrapped a program that protected from deportation some 800,000 young people who grew up in the United States. “This is something that needs to be fixed legislatively and we have confidence that they’re going to do that,” Sanders said, adding that Trump was willing to work with lawmakers on immigration reform, which she said should include several “big fixes,” not just one tweak to the system.
Type: Real
III. TEXT FEATURE EXTRACTION

Term Frequency-Inverse Document Frequency (TF-IDF) is probably the most common feature extraction method applied to text in both natural language processing and information retrieval. TF-IDF measures the importance of a word in a particular document relative to how widespread the word is across the document set. More precisely, TF (term frequency) counts how often a word occurs within a document, while IDF (inverse document frequency) measures the rarity of a word across the total set of documents [15]. The TF-IDF score of a word is simply the product of the two and indicates how relevant the word is to the given document. The process involves data preprocessing (such as word segmentation, stop-word removal, and stemming), then computing the term frequency and inverse document frequency for each document, and finally obtaining a sparse matrix of the TF-IDF values of every word across all documents [16]. These values, once converted to numeric form, serve as input features for the subsequent machine learning classification.
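As a concrete illustration, the snippet below builds such a sparse TF-IDF matrix. This is a minimal Python sketch using scikit-learn; the paper's own experiments were run in Matlab, so the library, example texts, and parameters here are illustrative assumptions rather than the authors' implementation.

```python
# Minimal TF-IDF feature extraction sketch (illustrative; not the authors' Matlab pipeline).
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for rows of the Kaggle news dataset.
texts = [
    "Trump says healthcare reform push may need additional money",
    "Xi and Trump discussed global hot-spot issues at the G20 summit",
]

# Tokenization, stop-word removal, and TF-IDF weighting in one step.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)  # sparse matrix: one row per document, one column per term

print(X.shape)                            # (n_documents, n_terms)
print(vectorizer.get_feature_names_out()) # the vocabulary behind the columns
```

The resulting sparse matrix `X` is exactly the kind of numeric feature representation that is fed into the classifiers described in the next section.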
IV. METHOD

A. Bayesian algorithm

The Bayesian algorithm is a statistical inference method based on Bayes’ theorem, used for classification and for building probabilistic models. Its core idea is to evaluate the probability of an event by combining prior information with newly observed data. The principle of the Bayes algorithm is illustrated in Fig. 1. The Bayesian approach, which adjusts beliefs as new evidence is observed during reasoning, is both flexible and efficient.

Fig. 1. The principle diagram of the Bayes algorithm.

In a Bayesian framework, we usually start from prior knowledge, which may be derived from historical data or the experience of domain experts. A prior probability quantifies our initial belief that an event will occur in the absence of observational data. When new observations arrive, Bayesian algorithms update this belief, producing a posterior probability. Because the update weighs the information provided by the data, the predictive ability of the model gradually improves as more data are added, even when the prior knowledge is poor [17].

Bayesian reasoning is particularly suited to problems involving uncertainty and complexity. Most practical problems come with incomplete knowledge of both existing and newly observed data. Bayesian algorithms remain strong in such settings because well-chosen prior distributions allow a model to maintain accurate performance even when data are scarce. This flexibility makes Bayesian methods applicable across widely varying domains, including but not limited to medicine, finance, machine learning, and natural language processing.
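The update at the core of this procedure is Bayes’ theorem. In standard notation (stated here for reference; the paper itself does not spell out the formula), for model parameters θ and observed data D:

```latex
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)},
\qquad
P(D) = \int P(D \mid \theta)\, P(\theta)\, d\theta
```

Here P(θ) is the prior, P(D | θ) the likelihood of the observed data, and P(θ | D) the posterior that replaces the prior once the evidence has been taken into account.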
B. Bidirectional gated recurrent unit

The Bidirectional Gated Recurrent Unit (Bi-GRU) is an improved recurrent neural network (RNN) architecture for processing sequence data in tasks such as natural language processing and time-series prediction. Fig. 2 shows the schematic diagram of the bidirectional GRU. Unlike a traditional one-way recurrent neural network, the bidirectional gated recurrent structure improves the model’s understanding of context by considering both the forward and reverse information of the sequence [18].

Fig. 2. The schematic diagram of the bidirectional gated recurrent unit.

The working mechanism of a bidirectional GRU consists of two parts: forward and backward. In the forward part, the model consumes the input data step by step in normal chronological order, passing information from beginning to end. In the backward part, it models the information from the end to the beginning, capturing how the ending influences the overall context. This bidirectional flow of information improves the model’s ability to capture deeper contextual relationships and enriches the features extracted from a sequence.

The GRU itself is a variant of the RNN, proposed to alleviate the vanishing-gradient problem that traditional RNNs suffer from when learning long-term dependencies. Unlike traditional RNNs, GRUs guide the flow of information with a gating mechanism that decides what to retain and what to forget, which makes the GRU markedly more efficient and robust when learning long sequences. At each step, a bidirectional GRU runs both forward and backward to produce the final output state, giving it an advantage over a standard GRU when decisions depend on a larger context [19].
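For reference, the gating mechanism described above is usually written as follows; this is the standard GRU formulation from the literature (the paper does not give the equations), with update gate z_t, reset gate r_t, candidate state, and sigmoid σ:

```latex
z_t = \sigma(W_z x_t + U_z h_{t-1}), \qquad
r_t = \sigma(W_r x_t + U_r h_{t-1})
```

```latex
\tilde{h}_t = \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1})\bigr), \qquad
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
```

A bidirectional GRU then simply concatenates the forward and backward hidden states at each step, $h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$.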
Bidirectional GRUs find extremely important applications in NLP tasks such as language modeling, sentiment analysis, and machine translation, where it is standard to use the context on both sides of a phrase to decide its meaning within the sentence as a whole. By modeling contextual relationships in both directions simultaneously, the Bi-GRU allows for more accurate modeling of text, which benefits the performance of such tasks.
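The bidirectional reading and the concatenated forward/backward states described above can be seen in the following minimal PyTorch sketch; the layer sizes are illustrative assumptions, and this is not the authors' Matlab implementation.

```python
# Minimal Bi-GRU encoder sketch (illustrative sizes; not the authors' implementation).
import torch
import torch.nn as nn

batch, seq_len, embed_dim, hidden = 4, 32, 128, 64
x = torch.randn(batch, seq_len, embed_dim)  # e.g., embedded token or TF-IDF-derived features

bigru = nn.GRU(input_size=embed_dim, hidden_size=hidden,
               bidirectional=True, batch_first=True)

out, h_n = bigru(x)
# `out` concatenates forward and backward hidden states at every time step:
print(out.shape)   # torch.Size([4, 32, 128])  -> 2 * hidden
# `h_n` holds the final forward state and the final backward state separately:
print(h_n.shape)   # torch.Size([2, 4, 64])
```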
V. TRANSFORMER

The Transformer is a deep learning model for processing sequence data, first proposed in 2017 by Vaswani et al. Since its introduction, it has thoroughly changed model design in natural language processing (NLP), especially for machine translation. Unlike traditional recurrent neural networks (RNNs), the Transformer is based entirely on self-attention mechanisms and abandons sequential-dependency limitations [13], [20]. This makes parallel processing possible and significantly improves training efficiency. A schematic diagram of the Transformer is presented in Fig. 3.

Fig. 3. A schematic diagram of the Transformer.

The self-attention mechanism forms the core of the Transformer model through its ability to capture correlations between positions in an input sequence. By computing similarities between a word and all other words in the sequence, it assigns attention weights accordingly, helping the model capture contextual information effectively. This approach is effective at capturing long-range dependencies, because every word can interact directly with every other word in the sequence without relying on sequential layers to pass information along.
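Concretely, this similarity computation is the standard scaled dot-product attention of Vaswani et al., reproduced here for reference in the usual notation (not taken from the paper), where Q, K, and V are the query, key, and value matrices and d_k is the key dimension:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```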
The two major components of the Transformer model are the encoder and the decoder. The encoder translates the input sequence into higher-level, context-aware representations, which the decoder then uses to generate the target output sequence. The encoder consists of a stack of identical layers, each with two major components: a multi-head self-attention mechanism and a feed-forward neural network. This architecture enables parallel processing across multiple representation subspaces, which intrinsically improves generalization capability and expressiveness.

The decoder’s architecture resembles that of the encoder, with the addition of autoregressive generation: each step it generates depends on the previously generated ones. Multi-head self-attention plays an essential role in achieving this by attending over all previously generated words and incorporating information from the encoder outputs at each generation step.

Another important feature of the Transformer model is positional encoding. Since the self-attention mechanism by itself does not capture the positions of words in a sequence, positional encoding injects relative or absolute position information. This lets the model know that the input forms an ordered sequence and preserve word order in its representations.
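The original Transformer uses sinusoidal absolute encodings; the textbook formulas are given below for reference (pos is the position, i the dimension index, and d_model the embedding size; this standard form is not reproduced from the paper):

```latex
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right)
```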

A. Transformer algorithm based on Bayesian algorithm and bidirectional gated recurrent unit optimization

The Transformer algorithm that integrates Bayesian inference and Bidirectional Gated Recurrent Unit (Bi-GRU) optimization enhances the performance and uncertainty quantification of models processing sequential data by combining the self-attention mechanisms of deep learning with Bayesian inference. By incorporating the Bi-GRU into the Transformer’s encoding and decoding modules, the model can capture contextual information from both preceding and succeeding elements. Concurrently, applying Bayesian algorithms for parameter inference allows the model to manage uncertainty effectively and improve its generalization capability. This results in higher accuracy and robustness in tasks such as natural language processing and sequence prediction. The workflow of the algorithm is illustrated in Fig. 4.

Fig. 4. The working flow chart of the algorithm.

The Bayesian algorithm is mainly used to optimize the uncertainty estimation of the model parameters. By introducing Bayesian inference, the model represents the parameters as a probability distribution rather than a single point estimate. Specifically, the Bayesian algorithm computes the posterior distribution of the parameters from the prior distribution and the likelihood of the observed data, thereby dynamically adjusting parameter uncertainty during training. This uncertainty estimation helps improve the robustness of the model, especially when the data are sparse or noisy. Combined with Bayesian optimization, the Transformer can better capture latent patterns in input sequences while reducing the risk of overfitting and improving generalization. The overall procedure consists of the following steps; a minimal code sketch of the resulting architecture follows the list.
1) Data preprocessing: The input sequence is preprocessed as necessary, including word segmentation, conversion to vector representations, and padding or truncation to a fixed length.

2) Positional encoding: Since the Transformer cannot handle sequence positions directly, positional encodings are added to retain order information.

3) Bidirectional GRU encoder: The input sequence is encoded with a bidirectional GRU, which reads the entire sequence, builds context representations in both the forward and backward directions, and generates hidden states.

4) Bayesian reasoning: Based on the generated hidden states, Bayesian inference is used to update the model parameters or to produce a probability distribution over the sequence. With a Bayesian approach, the prior distribution can be integrated and the posterior distribution updated from the input data, which gives the model greater reasoning power.

5) Self-attention mechanism: In the Transformer, the self-attention mechanism computes attention weights over the input sequence. This step combines the contextual information obtained by the BiGRU with the representation of each word to produce richer information.

6) Multiple layers of attention: On top of self-attention, multiple attention heads are constructed to focus on different parts of the sequence at the same time through independent attention mechanisms. Each head learns a different representation, enhancing the expressive power of the model.

7) Feed-forward neural network: The output of multi-head attention is passed into a feed-forward neural network for further processing, with nonlinearity introduced through the activation function.

8) Decoding: In the decoding phase, the output is generated sequentially, following the autoregressive model structure, while taking into account the contextual information provided by the encoder and the previously generated output.

9) Training and optimization: The model is trained with a gradient-descent optimization algorithm to minimize the loss function. At the same time, the parameters are updated by Bayesian inference to ensure that the model can still make correct predictions under high uncertainty.
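The sketch below shows one plausible way to wire these pieces together in PyTorch: a Bi-GRU encoder feeding a Transformer encoder layer, followed by a classification head. The paper's experiments were run in Matlab and the exact architecture and sizes are not specified, so every layer size and design choice here is an illustrative assumption, and the Bayesian treatment of the parameters (steps 4 and 9) is omitted.

```python
# Illustrative BiGRU + Transformer classifier sketch (assumed sizes; the Bayesian
# parameter inference of steps 4 and 9 is omitted for brevity).
import torch
import torch.nn as nn

class BiGRUTransformerClassifier(nn.Module):
    def __init__(self, input_dim=300, hidden=64, heads=4, num_classes=2):
        super().__init__()
        # Step 3: bidirectional GRU encoder over the input features.
        self.bigru = nn.GRU(input_dim, hidden, bidirectional=True, batch_first=True)
        # Steps 5-7: self-attention + feed-forward via one Transformer encoder layer.
        self.encoder = nn.TransformerEncoderLayer(
            d_model=2 * hidden, nhead=heads, dim_feedforward=4 * hidden,
            batch_first=True)
        # Classification head over the pooled sequence representation.
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                 # x: (batch, seq_len, input_dim)
        h, _ = self.bigru(x)              # (batch, seq_len, 2 * hidden)
        h = self.encoder(h)               # contextualized by self-attention
        return self.head(h.mean(dim=1))   # mean-pool, then classify real vs. fake

model = BiGRUTransformerClassifier()
logits = model(torch.randn(8, 32, 300))  # e.g., 8 articles, 32 steps of features
print(logits.shape)                      # torch.Size([8, 2])
```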

VI. RESULT

For the parameter settings, the Adam optimizer was used in the experiment; the maximum number of training epochs was set to 200, the batch size to 256, the initial learning rate to 0.001, the learning-rate decay factor to 0.1, and the gradient-clipping threshold to 10. The experiments were run on an Nvidia 4090 GPU with Matlab R2024a.

The data were divided into a training set and a testing set at a ratio of 7:3. For the binary classification task, the class proportions in the dataset are balanced.
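As a rough translation of these settings outside Matlab, the snippet below expresses the same hyperparameters in PyTorch for a model like the sketch above; it is an assumption-laden rendering of the reported settings, not the authors' training code, and the decay schedule in particular is a placeholder because the paper gives only the decay factor.

```python
# Hyperparameters as reported in the paper, expressed in PyTorch (illustrative only).
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in; substitute the BiGRU+Transformer classifier above

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # initial learning rate 0.001
# The paper states a decay factor of 0.1 but not when it fires; step_size=50 is assumed.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

max_epochs = 200
batch_size = 256
clip_threshold = 10.0
# Inside the training loop, clipping would be applied as:
#   torch.nn.utils.clip_grad_norm_(model.parameters(), clip_threshold)
```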
This paper presents experiments using two variations of the Transformer algorithm: one optimized with a bidirectional gated recurrent unit, and another optimized with a bidirectional gated recurrent unit together with Bayesian optimization. To compare their performance, we analyze the confusion matrices generated for both the training and testing datasets. Fig. 5 shows the confusion matrix of the Transformer model optimized with a bidirectional gated recurrent unit, while Fig. 6 shows the confusion matrix of the model with Bayesian optimization and the bidirectional gated recurrent unit.

Fig. 5. The experiment of fake news detection and classification with the Transformer algorithm based on bidirectional gated recurrent unit optimization.

Fig. 6. The experiment of fake news detection and classification based on the Bayesian algorithm and bidirectional gated recurrent unit optimized Transformer algorithm.

In the experiment on fake news detection with the Transformer algorithm based on bidirectional gated recurrent unit optimization, the accuracy on the training set is 100% and the accuracy on the testing set is 99.67%. In the experiment based on the Bayesian algorithm and bidirectional gated recurrent unit optimized Transformer algorithm, the accuracy on the training set is 100% and the accuracy on the test set is 99.73%.

The accuracy of both algorithms on the training and testing sets is compared in Table II.

TABLE II
THE ACCURACY OF THE TWO ALGORITHMS ON THE TRAINING AND TESTING SETS

Bidirectional gated recurrent unit optimized Transformer: training accuracy 100%, testing accuracy 99.67%.
Bidirectional gated recurrent unit with Bayesian algorithm optimized Transformer: training accuracy 100%, testing accuracy 99.73%.

According to the experimental results, both algorithms reach 100% accuracy on the training set, and both achieve above 99% classification accuracy on the testing set, showing strong fake news detection and classification performance. In addition, after adding Bayesian algorithm optimization, the accuracy of the Transformer algorithm based on bidirectional gated recurrent unit optimization increases by 0.06 percentage points, further improving the prediction accuracy.

Fig. 7 shows the loss and accuracy curves of the algorithm. From the curves, we find that the algorithm used in this experiment converges around the 10th epoch, reaching nearly 100% accuracy. This indicates that the algorithm has excellent classification speed, allowing it to detect and classify fake news quickly and accurately.

Fig. 7. The change curve of loss and accuracy of the algorithm.
VII. CONCLUSION

This study is the first to apply a Bayesian algorithm and Bidirectional Gated Recurrent Unit (Bi-GRU) optimization settings to the classification and prediction of fake news. We used the TF-IDF method to extract features from news texts and convert them into numeric features for the subsequent machine learning models. In the experiments, we designed two different models: a Transformer algorithm based on Bi-GRU optimization, and a Transformer algorithm combining the Bayesian algorithm with Bi-GRU optimization. The results show that both algorithms achieve excellent performance in detecting fake news. With the Bi-GRU-optimized Transformer algorithm alone, accuracy reached 100% on the training set and 99.67% on the testing set. The version augmented with the Bayesian algorithm also achieved 100% accuracy on the training set and raised testing-set accuracy to 99.73%. This gain of 0.06 percentage points demonstrates that the Bayesian algorithm is an important component for optimizing the Transformer model.

Analysis of the training accuracy and loss curves shows that the model is close to convergence around the 10th epoch, with accuracy close to 100%. This indicates that the proposed optimization algorithm not only classifies quickly but also achieves high-precision fake news detection in a short time, which has important practical value for the early warning and accurate identification of fake news.

In conclusion, this study demonstrates the effectiveness and performance of the Transformer algorithm based on Bayesian algorithm and Bi-GRU optimization for the fake news classification problem. Such combinations not only improve classification accuracy but also provide a new solution to the growing challenge of false information, offering useful reference value and inspiration for future related research.

REFERENCES

[1] C.-M. Lai, M.-H. Chen, E. Kristiani, V. K. Verma, and C.-T. Yang, "Fake news classification based on content level features," Applied Sciences, vol. 12, no. 3, p. 1116, 2022.
[2] D. Rohera, H. Shethna, K. Patel, U. Thakker, S. Tanwar, R. Gupta, W.-C. Hong, and R. Sharma, "A taxonomy of fake news classification techniques: Survey and implementation aspects," IEEE Access, vol. 10, pp. 30367-30394, 2022.
[3] Z. Zhang, X. Wang, X. Zhang, and J. Zhang, "Simultaneously detecting spatiotemporal changes with penalized Poisson regression models," arXiv preprint arXiv:2405.06613, 2024.
[4] X. Xu, P. Yu, Z. Xu, and J. Wang, "A hybrid attention framework for fake news detection with large language models," arXiv preprint arXiv:2501.11967, 2025.
[5] X. Huang, Y. Wu, D. Zhang, J. Hu, and Y. Long, "Improving academic skills assessment with NLP and ensemble learning," in 2024 IEEE 7th International Conference on Information Systems and Computer Aided Education (ICISCAE). IEEE, 2024, pp. 37-41.
[6] J. Yi, Z. Xu, T. Huang, and P. Yu, "Challenges and innovations in LLM-powered fake news detection: A synthesis of approaches and future directions," arXiv preprint arXiv:2502.00339, 2025.
[7] J. He, M. D. Ma, J. Fan, D. Roth, W. Wang, and A. Ribeiro, "GIVE: Structured reasoning with knowledge graph inspired veracity extrapolation," arXiv preprint arXiv:2410.08475, 2024.
[8] H. Guo, T. Huang, H. Huang, M. Fan, and G. Friedland, "A systematic review of multimodal approaches to online misinformation detection," in 2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR). IEEE, 2022, pp. 312-317.
[9] P. Yu, X. Xu, and J. Wang, "Applications of large language models in multimodal learning," Journal of Computer Technology and Applied Mathematics, vol. 1, no. 4, pp. 108-116, 2024.
[10] W. Liu, J. Chen, K. Ji, L. Zhou, W. Chen, and B. Wang, "RAG-Instruct: Boosting LLMs with diverse retrieval-augmented instructions," arXiv preprint arXiv:2501.00353, 2024.
[11] Q. Zeng, L. Garay, P. Zhou, D. Chong, Y. Hua, J. Wu, Y. Pan, H. Zhou, R. Voigt, and J. Yang, "GreenPLM: Cross-lingual transfer of monolingual pre-trained language models at almost no cost," arXiv preprint arXiv:2211.06993, 2022.
[12] Q. Zeng, M. Jin, Q. Yu, Z. Wang, W. Hua, Z. Zhou, G. Sun, Y. Meng, S. Ma, Q. Wang et al., "Uncertainty is fragile: Manipulating uncertainty in large language models," arXiv preprint arXiv:2407.11282, 2024.
[13] H. Guo, T. Huang, H. Huang, M. Fan, and G. Friedland, "Detecting COVID-19 conspiracy theories with transformers and TF-IDF," arXiv preprint arXiv:2205.00377, 2022.
[14] W. H. Bangyal, R. Qasim, N. U. Rehman, Z. Ahmad, H. Dar, L. Rukhsar, Z. Aman, and J. Ahmad, "Detection of fake news text classification on COVID-19 using deep learning approaches," Computational and Mathematical Methods in Medicine, vol. 2021, no. 1, p. 5514220, 2021.
[15] L. Bozarth and C. Budak, "Toward a better performance evaluation framework for fake news classification," in Proceedings of the International AAAI Conference on Web and Social Media, vol. 14, 2020, pp. 60-71.
[16] W. Liu, S. Cheng, D. Zeng, and H. Qu, "Enhancing document-level event argument extraction with contextual clues and role relevance," arXiv preprint arXiv:2310.05991, 2023.
[17] M. Z. Nawaz, M. S. Nawaz, P. Fournier-Viger, and Y. He, "Analysis and classification of fake news using sequential pattern mining," Big Data Mining and Analytics, vol. 7, no. 3, pp. 942-963, 2024.
[18] N. Rai, D. Kumar, N. Kaushik, C. Raj, and A. Ali, "Fake news classification using transformer based enhanced LSTM and BERT," International Journal of Cognitive Computing in Engineering, vol. 3, pp. 98-105, 2022.
[19] M. Fayaz, A. Khan, M. Bilal, and S. U. Khan, "Machine learning for fake news classification with optimal feature selection," Soft Computing, vol. 26, no. 16, pp. 7763-7771, 2022.
[20] P. Yu, J. Yi, T. Huang, Z. Xu, and X. Xu, "Optimization of Transformer heart disease prediction model based on particle swarm optimization algorithm," arXiv preprint arXiv:2412.02801, 2024.
