
Project Report: Fake News Detector

1. Introduction

In recent years, the spread of fake news has become a significant problem in media and communication. This project aims to build a Fake News Detector using machine learning techniques. The model classifies news articles as either "Fake" or "Real" based on their textual content. The solution leverages Natural Language Processing (NLP) techniques and machine learning algorithms to achieve this.

2. Objectives

- To preprocess and analyze news article text data.
- To build a machine learning model for classifying news as "Fake" or "Real."
- To evaluate the performance of the model and test it on new data.

3. Tools and Technologies

- Programming Language: Python
- Libraries Used:
  - pandas for data manipulation
  - nltk for text preprocessing
  - scikit-learn for machine learning
  - TfidfVectorizer (from scikit-learn) for feature extraction
- Dataset: Fake and Real News Dataset (available on Kaggle)

4. Methodology
Step 1: Data Collection

The dataset used for this project consists of news articles labeled as "Fake" or "Real." It contains two columns:
- text: The content of the article.
- label: The classification label ("Fake" or "Real").
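
A minimal loading sketch is shown below. It assumes the dataset has been saved as a single CSV file with the two columns described above; the file name and the exact label strings are illustrative assumptions, not details taken from the original report.

```python
import pandas as pd

# Assumption: a single CSV file with 'text' and 'label' columns.
# The file name 'news.csv' is illustrative.
df = pd.read_csv('news.csv')

# Map the string labels to integers for the classifier (1 = Real, 0 = Fake),
# matching the prediction function shown in Section 6.
df['label'] = df['label'].map({'Real': 1, 'Fake': 0})
```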

Step 2: Data Preprocessing

The text data is preprocessed to remove noise and improve model performance:
- Removal of HTML tags and special characters.
- Conversion of text to lowercase.
- Tokenization and removal of stopwords using NLTK.

Step 3: Feature Extraction

The TfidfVectorizer is used to convert the text data into numerical features. This technique (term frequency-inverse document frequency) weights each word by how often it appears in an article and how rare it is across the rest of the dataset.
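
A short sketch of this step, assuming the cleaned articles are held in a pandas DataFrame column df['text'] and the labels in df['label']; the max_features value is an illustrative choice, not a setting from the original report.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Assumption: df['text'] already contains articles cleaned by preprocess_text.
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(df['text'])  # Sparse TF-IDF feature matrix
y = df['label']
```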

Step 4: Model Training

A Logistic Regression model is trained on the preprocessed data to classify news articles. The dataset is split into training and testing sets in an 80-20 ratio.
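
A sketch of the training step under the same assumptions, using the 80-20 split described above; the random_state value is an illustrative choice.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# 80% of the data for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
```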

Step 5: Model Evaluation


The model is evaluated using metrics such as accuracy, precision, recall, and F1-score. These metrics help in understanding the model's performance.
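
These metrics can be computed with scikit-learn, as in the sketch below, which assumes the model and test split from the previous step.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_pred = model.predict(X_test)

print(f"Accuracy : {accuracy_score(y_test, y_pred):.2f}")
print(f"Precision: {precision_score(y_test, y_pred):.2f}")
print(f"Recall   : {recall_score(y_test, y_pred):.2f}")
print(f"F1-score : {f1_score(y_test, y_pred):.2f}")
```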

Step 6: Prediction Function

A function is implemented to predict whether a new article is "Fake" or "Real."

5. Results

The model achieved the following results on the test dataset:
- Accuracy: 93%
- Precision: 92%
- Recall: 94%
- F1-score: 93%

These results indicate that the model is effective in distinguishing between fake and real news articles.

6. Key Code Snippets

Preprocessing Function

```python
import re

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Requires the NLTK 'punkt' and 'stopwords' resources (via nltk.download).

def preprocess_text(text):
    text = re.sub(r'<.*?>', '', text)    # Remove HTML tags
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation and special characters
    text = text.lower()                  # Convert to lowercase
    tokens = word_tokenize(text)         # Tokenize
    tokens = [word for word in tokens if word not in stopwords.words('english')]  # Remove stopwords
    return ' '.join(tokens)
```

Prediction Function

```python
def predict_news(article):
    # 'vectorizer' and 'model' are the fitted TfidfVectorizer and the trained
    # Logistic Regression classifier from the steps above.
    processed_article = preprocess_text(article)
    article_vectorized = vectorizer.transform([processed_article])
    prediction = model.predict(article_vectorized)
    return "Real" if prediction[0] == 1 else "Fake"
```
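
A short usage example; the article text here is purely hypothetical.

```python
sample_article = "Breaking: celebrity endorses miracle cure, doctors stunned"  # hypothetical text
print(predict_news(sample_article))  # prints "Fake" or "Real"
```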

7. Conclusion

This project successfully implemented a machine learning-based Fake News Detector. The model demonstrated high accuracy and can be further enhanced by:
- Using advanced deep learning models such as BERT.
- Expanding the dataset to include more diverse articles.
- Deploying the model as a web application using Flask or Streamlit (a minimal sketch follows below).
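
As an illustration of the deployment idea, the sketch below shows a minimal Streamlit app. It is not part of the original project code; it assumes the trained model and vectorizer have been saved with joblib under the hypothetical names model.joblib and vectorizer.joblib.

```python
# streamlit_app.py -- deployment sketch only; file names are assumptions.
import joblib
import streamlit as st

model = joblib.load('model.joblib')            # trained Logistic Regression
vectorizer = joblib.load('vectorizer.joblib')  # fitted TfidfVectorizer

st.title("Fake News Detector")
article = st.text_area("Paste a news article:")

if st.button("Classify") and article:
    # In the full pipeline, the same preprocess_text step would be applied first.
    features = vectorizer.transform([article])
    label = model.predict(features)[0]
    st.write("Prediction:", "Real" if label == 1 else "Fake")
```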

8. References
- Dataset: Fake and Real News Dataset on Kaggle
- Libraries: Official documentation of pandas, nltk, and scikit-learn.
