Paper 4: Text Classification Based on Machine Learning

Uploaded by

sasobaid

ABSTRACT

The paper focuses on improving text classification methods using machine learning (ML) and
natural language processing (NLP) technologies. It introduces the Trusted Platform Module
(TPM) algorithm, which combines ML and NLP techniques for better classification performance.
In experiments distinguishing spam from legitimate emails across different datasets, the TPM
algorithm achieves over 95% accuracy.

INTRODUCTION

Text is a crucial way to share and store information, especially in today's digital world of
paperless offices, online libraries, and e-commerce. It serves as a universal medium for
communication and data management.
The paper highlights the challenges posed by the rapid increase in digital text data that
accompanies the global growth of the Internet.
It also points out the need for effective text classification technologies that can process
massive amounts of data and meet user-specific requirements, with applications in sentiment
analysis, opinion mining, and domain-specific recognition.

TECHNIQUES USED

The paper uses several techniques:

1. Text formatting and cleaning operations: tokenization and word segmentation
2. Feature Extraction: BERT-based semantic feature extraction for NLP tasks
3. Classification Framework: Use of active learning methods, including Edge MS sampling
for data annotation
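
The first step in the list above, text cleaning and tokenization, can be sketched as follows. This is a minimal illustration and not the paper's actual preprocessing: the function name `clean_and_tokenize`, the stop-word list, and the sample message are all assumptions made for the example.

```python
import re

# Hypothetical stop-word list, chosen only for illustration.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "is", "in"}

def clean_and_tokenize(text):
    """Lowercase the text, strip HTML-style tags, and split it into word tokens."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)       # replace tags such as <p> with a space
    tokens = re.findall(r"[a-z0-9']+", text)   # keep runs of letters, digits, apostrophes
    return [t for t in tokens if t not in STOP_WORDS]

# Made-up sample message, not taken from the paper's datasets.
tokens = clean_and_tokenize("<p>WIN a FREE prize!!! Reply to claim the offer.</p>")
print(tokens)  # ['win', 'free', 'prize', 'reply', 'claim', 'offer']
```

The token list produced here is the kind of input a feature extractor (BERT-based in the paper) or a bag-of-words classifier would consume.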

MODELS USED

The paper uses the following models:

1. Trusted Platform Module (TPM): Combines LSTM, CNN, and Bi-GRU layers.
2. Compared Models: k-Nearest Neighbors (KNN), Support Vector Machines (SVM), Naive
Bayes (NB)
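
To make one of the compared baselines concrete, here is a minimal sketch of a multinomial Naive Bayes spam classifier with Laplace smoothing. It is a generic textbook formulation, not the configuration used in the paper; the function names `train_nb` and `predict_nb` and the four training messages are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs is a list of (tokens, label) pairs; returns the counts used at prediction time."""
    class_counts = Counter()             # number of documents per class
    word_counts = defaultdict(Counter)   # per-class word frequencies
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log-posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)   # log prior
        total_words = sum(word_counts[label].values())
        for t in tokens:
            # Laplace (add-one) smoothing avoids log(0) for unseen words.
            score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up toy training data, not drawn from TREC2007 or Enron-spam.
train = [
    (["win", "free", "prize"], "spam"),
    (["free", "offer", "claim"], "spam"),
    (["meeting", "agenda", "monday"], "ham"),
    (["project", "report", "attached"], "ham"),
]
model = train_nb(train)
print(predict_nb(model, ["free", "prize", "offer"]))       # spam
print(predict_nb(model, ["monday", "project", "meeting"])) # ham
```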

RESULTS

The results were as follows:

1. TPM outperformed the other models, reaching the highest accuracy (over 95%) on two
benchmark datasets, TREC2007 and Enron-spam. It also achieved a more balanced trade-off
between precision and recall than the other models.
2. NB: moderate accuracy.
3. SVM: accuracy lower than TPM but higher than KNN.
4. KNN: the worst accuracy and computational efficiency.
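
The metrics named in the results (accuracy, precision, recall) can be computed as in the sketch below. The label lists are made up for illustration and are not results from the paper; the function name `binary_metrics` is likewise an assumption.

```python
def binary_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of predicted spam that is spam
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of actual spam that was caught
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative labels only: 3 spam and 5 legitimate ("ham") messages.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "ham", "ham", "spam", "ham"]
metrics = binary_metrics(y_true, y_pred)
print(metrics["accuracy"])  # 0.75
```

A model with a balanced precision-recall trade-off, as reported for TPM, keeps both values close to each other rather than maximizing one at the expense of the other.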

USE OF AI FOR ARTICLE CLASSIFICATION

The use of AI in the paper can be summarized as follows:

1. Combines NLP and ML technologies to enhance text feature extraction and classification
accuracy.
2. TPM uses deep learning techniques (LSTM and CNN) for semantic understanding,
augmented by Bi-GRU for dataset-specific feature extraction.
3. Graph-based semi-supervised learning algorithms support label propagation to minimize
manual data labeling efforts.
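
The label propagation idea in the last item can be sketched as follows. This is a generic, simplified version (iterative neighbor averaging with the labeled nodes held fixed), not the specific algorithm from the paper. The function name `propagate_labels`, the graph, its edge weights, and the +1/-1 score encoding are all assumptions made for the example.

```python
def propagate_labels(edges, seeds, n_nodes, n_iter=50):
    """Spread labels from a few seed nodes over a weighted similarity graph.

    edges: dict mapping (i, j) to a similarity weight (undirected).
    seeds: dict mapping a labeled node to +1.0 (spam) or -1.0 (ham).
    """
    neighbors = {i: [] for i in range(n_nodes)}
    for (i, j), w in edges.items():
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))

    scores = [seeds.get(i, 0.0) for i in range(n_nodes)]
    for _ in range(n_iter):
        new_scores = []
        for i in range(n_nodes):
            if i in seeds:
                new_scores.append(seeds[i])   # manually labeled nodes stay fixed
                continue
            total_w = sum(w for j, w in neighbors[i])
            if total_w == 0:
                new_scores.append(scores[i])  # isolated node: nothing to propagate
            else:
                # Weighted average of the neighbors' current scores.
                new_scores.append(sum(w * scores[j] for j, w in neighbors[i]) / total_w)
        scores = new_scores
    # Nodes whose score is not positive (including unreachable ones) default to "ham".
    return ["spam" if s > 0 else "ham" for s in scores]

# Hypothetical graph: six messages in a chain, with a weak link between nodes 2 and 3.
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 0.1, (3, 4): 1.0, (4, 5): 1.0}
seeds = {0: 1.0, 5: -1.0}   # only two messages are labeled by hand
labels = propagate_labels(edges, seeds, n_nodes=6)
print(labels)  # ['spam', 'spam', 'spam', 'ham', 'ham', 'ham']
```

Only two of the six messages carry manual labels here; the rest inherit a label from their most similar neighbors, which is how this family of methods reduces annotation effort.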

CONCLUSION

The TPM algorithm significantly improves text classification performance compared to
traditional methods such as SVM, NB, and KNN.
Future improvements may include adapting the methodology to large-scale, real-time text
classification tasks.