
Fake News Detectio3

Uploaded by

addalatejasri123
Copyright
© © All Rights Reserved

FAKE NEWS DETECTION : using machine learning

submitted by: Ms. Addala Teja sri seethalu


(2302001)
AIMLDS
KGRL COLLEGE OF PG COURSES (A)
BHIMAVARAM
M.C.A
JULY - 2024

PROJECT: FAKE NEWS DETECTION USING MACHINE LEARNING

Uncovering the truth has never been easier! Learn how machine learning algorithms can help combat fake news with our fake news detection project tutorial.

Imagine a scenario where a false news story spreads rapidly on social media, claiming that a particular medication is a cure for a deadly disease. People start hoarding the medication, causing scarcity and preventing those who need it from accessing it. This scenario illustrates one of the many real-world risks of fake news.

The rapid spread of fake news has become a major issue worldwide. False and misleading news has had significant social and economic consequences, affecting everything from finance to healthcare. For example, in 2020, during the COVID-19 pandemic, several countries witnessed a spike in false news about the virus, leading to confusion and panic among people. Misinformation can have a long-term impact, especially when people rely on accurate information to make critical decisions, so detecting fake news has never been more important. Machine learning can help detect it efficiently. Using natural language processing techniques, ML systems can learn to categorize news as true or false by analyzing patterns in the language and sources of news reports.

This blog explores a fake news detection project using machine learning and discusses how machine learning algorithms can distinguish false news from real news. We will also cover the key machine learning algorithms used to identify false and true news, and real-world use cases of fake news detection.

Table of Contents

- What is a fake news detection using machine learning project?
- Advantages and disadvantages of fake news detection using machine learning
- Top 5 machine learning algorithms for fake news detection
- Top 5 fake news detection project datasets
- Fake news detection real-world use cases / applications
- Fake news detection is very useful to us
- Top 3 fake news detection projects on GitHub
- Build a fake news detection project in Python with source code: a step-by-step approach
- Boost your career with fake news detection project by
- FAQs

What is a fake news detection using machine learning project?

Detecting fake news with machine learning means building an automatic detection system that looks at a piece of text (a tweet, news article, or WhatsApp message) and estimates how likely it is to be false news. The system is a machine learning model trained on a sufficiently large dataset containing examples of real and false news from various sources and styles. However, since machine learning models only operate on numerical features, we must first apply natural language processing to this text corpus (collection of text samples).

Natural language processing covers data cleaning, stemming, lemmatization, and vectorization using one of the many available techniques, converting each sentence into a vector of numbers that machine learning models can interpret. Once this is done, we can train models like naïve Bayes, logistic regression, and random forests and compare their results.
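
The preprocessing described above can be sketched in plain Python. This is only a minimal illustration, not the project's actual pipeline: the tiny stop-word set, the `crude_stem` suffix stripper, and the two sample sentences are assumptions made for the example, and a real project would normally rely on NLTK or scikit-learn for these steps.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; NLTK's English list is much longer.
STOP_WORDS = {"a", "an", "the", "is", "for", "of", "and", "this"}


def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, keep alphabetic tokens only, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def vectorize(docs):
    """Return a sorted vocabulary and one word-count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


docs = [
    "This pill cures the disease!",
    "Doctors warned the pill is unproven.",
]
vocab, vectors = vectorize(docs)
print(vocab)    # ['cure', 'disease', 'doctor', 'pill', 'unproven', 'warn']
print(vectors)  # [[1, 1, 0, 1, 0, 0], [0, 0, 1, 1, 1, 1]]
```

Each document becomes a row of numbers with one column per vocabulary term, which is the form a classifier such as logistic regression can consume.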

If these classical machine learning techniques perform poorly on the dataset, we can turn to deep learning and use LSTM or attention-based models for text classification.

But first, let us see why you should use machine learning for detecting false news and what drawbacks you should be aware of while doing so.

FAKE NEWS DETECTION


MADE BY : NAVYA SRI (22PD1A0552)
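Advantages and disadvantages of fake news detection using machine learning:

Machine learning has led to significant developments in fake news detection, but it brings both advantages and disadvantages. This section lists the pros and cons of fake news prediction using machine learning.

Advantages and disadvantages of detecting fake news using ML:

- Scalability
( an advantage: a trained model can screen large volumes of text automatically )
- Privacy concerns
( a disadvantage: features such as user profiles and engagement data raise privacy questions )
- Maintenance
( a disadvantage: the model must be monitored and retrained as fake news patterns change )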

Top 5 machine learning algorithms for a fake news detection data science project:

- GNN
( graph neural networks )
- Bi-LSTM + Attention
( bidirectional LSTM with attention, used for stance detection in the context of checking fake news )
- CNN + DNN
( a deep neural network that ends with a softmax layer )
- CNN + Boosted Trees
( combining convolutional neural networks with boosted trees allows more robust and accurate detection of false news )
- MLP
( the multilayer perceptron is a type of feed-forward neural network )
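
Several of the networks listed above end with a softmax layer, which turns the network's raw output scores into class probabilities. The sketch below shows that final step only, in plain Python; the two input scores and the class order [real, fake] are made-up values for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum keeps math.exp from overflowing on large scores.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the classes [real, fake]
probs = softmax([1.0, 3.0])
print(probs)
```

The larger score receives the larger probability, and the predicted class is simply the one with the highest probability.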


Top 5 fake news detection project datasets:

- FakeNewsNet: political and celebrity-gossip news with related tweets
- Fake News Corpus
- FakeHealth
- CONSTRAINT COVID-19 Fake News Dataset
- FNC-1 ( Fake News Challenge Stage 1 )

Fake news prediction using machine learning: real-world use cases / applications

Fake news detection has a wide range of applications across various industries. Let us explore some real-world applications of false news detection.

o Social media
( fake news spreads quickly on social media platforms )
o News / journalism
( news organizations use machine learning algorithms to verify information and sources )
o Politics
( fake news can significantly impact political campaigns and elections )
o Finance
( fake news can also significantly impact financial markets )
o Healthcare
( fake news can also seriously affect the healthcare industry )

Top 3 fake news detection projects on GitHub:

There are many research projects on false news that one can explore to understand the scope of the problem and the best available approaches. To get started, we list some of the better projects publicly available on GitHub for detecting fake news with Python.

Comprehensive project for fake news analysis using machine learning

Build a fake news detection project in Python with source code: a step-by-step approach

DATASET DESCRIPTION

train.csv: a full training dataset with the following attributes:

o id : unique id for a news article
o title : the title of a news article
o author : author of the news article
o text : the text of the article
o label : a label that marks the article as potentially unreliable
… 1 : unreliable
… 0 : reliable
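
A file with this layout could be loaded and prepared roughly as follows. This is a sketch under assumptions: the two in-memory rows stand in for the real train.csv, and joining title, author, and text into one `content` column is one possible feature choice, not something the dataset prescribes.

```python
import io

import pandas as pd

# Assumption: a two-row stand-in for train.csv with the columns listed above.
sample_csv = io.StringIO(
    "id,title,author,text,label\n"
    "0,Miracle cure found,,Doctors hate this trick,1\n"
    "1,Budget approved,Jane Doe,The council approved the budget,0\n"
)
df = pd.read_csv(sample_csv)

# Missing titles, authors, or bodies are read as NaN; replace them with empty strings.
text_columns = ["title", "author", "text"]
df[text_columns] = df[text_columns].fillna("")

# One possible feature: title, author, and body joined into a single string.
df["content"] = (
    (df["title"] + " " + df["author"] + " " + df["text"])
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)

print(df[["content", "label"]])
print(df["label"].value_counts())  # check how balanced the two classes are
```

Checking the label counts early is useful because a heavily imbalanced dataset makes plain accuracy a misleading metric.
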

Here is a basic example of fake news detection using machine learning with Python and scikit-learn. This example uses a logistic regression model, but you can experiment with other models for better results.

### Prerequisites

Make sure you have the following libraries installed:

- pandas
- scikit-learn
- nltk (for natural language processing tasks)

You can install them using pip:

```sh
pip install pandas scikit-learn nltk
```

### Step-by-Step Code


1. *Import necessary libraries:*

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import nltk
from nltk.corpus import stopwords
import re

# Download the NLTK stop-word list (needed once per environment)
nltk.download('stopwords')
```

2. *Load and preprocess the dataset:*

For this example, we will use a CSV file containing labeled news articles. You can use

any suitable dataset.

```python
# Load dataset (ensure your CSV file has columns like 'text' and 'label')
df = pd.read_csv('fake_news_dataset.csv')

# Basic text cleaning function
def clean_text(text):
    text = re.sub(r'\W', ' ', text)   # replace non-word characters with spaces
    text = re.sub(r'\s+', ' ', text)  # collapse repeated whitespace
    text = text.lower()
    return text

# Fill missing article bodies first, since re.sub fails on NaN values
df['text'] = df['text'].fillna('').apply(clean_text)

# Define features and labels
X = df['text']
y = df['label']
```

3. *Split the dataset into training and testing sets:*

```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

4. *Convert text data to numerical data using TF-IDF Vectorizer:*

```python
vectorizer = TfidfVectorizer(stop_words=stopwords.words('english'), max_df=0.7)
X_train_tfidf = vectorizer.fit_transform(X_train)  # learn the vocabulary on training data only
X_test_tfidf = vectorizer.transform(X_test)        # reuse the fitted vocabulary
```
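
To make the TF-IDF step less of a black box, the sketch below computes the weights by hand for a toy corpus. It follows the smoothed formula that scikit-learn documents as its default, idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 normalisation of each row, but the `tfidf` helper and the sample documents are illustrative assumptions, not scikit-learn's implementation. It does not apply the `max_df=0.7` filter, which drops terms that occur in more than 70% of the documents.

```python
import math
from collections import Counter


def tfidf(docs):
    """TF-IDF with smoothed IDF and L2 normalisation for pre-tokenised documents."""
    n_docs = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: the number of documents that contain each term.
    doc_freq = {term: sum(term in doc for doc in docs) for term in vocab}
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, idf, rows


# Assumption: two tiny, already tokenised documents.
docs = [["vaccine", "cures", "all"], ["vaccine", "trial", "results"]]
vocab, idf, rows = tfidf(docs)
for term in vocab:
    print(f"{term:8s} idf={idf[term]:.3f}")
```

A term that appears in every document ("vaccine" here) gets the minimum IDF of 1.0, so rarer and more distinctive terms carry more weight in each row.
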
5. *Train the machine learning model:*

```python
model = LogisticRegression()
model.fit(X_train_tfidf, y_train)
```

6. *Evaluate the model:*

```python
y_pred = model.predict(X_test_tfidf)
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')
print(classification_report(y_test, y_pred))
```

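### Complete Example

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import nltk
from nltk.corpus import stopwords
import re

nltk.download('stopwords')

# Load dataset (ensure your CSV file has columns like 'text' and 'label')
df = pd.read_csv('fake_news_dataset.csv')

# Basic text cleaning function
def clean_text(text):
    text = re.sub(r'\W', ' ', text)
    text = re.sub(r'\s+', ' ', text)
    text = text.lower()
    return text

df['text'] = df['text'].fillna('').apply(clean_text)

# Define features and labels
X = df['text']
y = df['label']

# Split, vectorize, train, and evaluate (steps 3 to 6 above)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
vectorizer = TfidfVectorizer(stop_words=stopwords.words('english'), max_df=0.7)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
model = LogisticRegression()
model.fit(X_train_tfidf, y_train)
y_pred = model.predict(X_test_tfidf)
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')
print(classification_report(y_test, y_pred))
```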

Detecting fake news using machine learning involves training algorithms to identify

patterns and features associated with false information. Here’s an overview of the

process:

1. *Data Collection*: Gather datasets containing labeled examples of fake and real

news. Commonly used datasets include the Fake News Challenge (FNC) and the LIAR

dataset.
2. *Data Preprocessing*: Clean and prepare the text data for analysis. This includes:

- Tokenization: Breaking down text into individual words or tokens.

- Stop Word Removal: Eliminating common words that do not carry significant

meaning.

- Lemmatization/Stemming: Reducing words to their base or root form.

- Vectorization: Converting text into numerical representations, such as TF-IDF

(Term Frequency-Inverse Document Frequency) or word embeddings (e.g.,

Word2Vec, GloVe).

3. *Feature Engineering*: Identify and create features that help distinguish fake

news from real news. These features can be:

- Text-based features: Word frequencies, n-grams, sentiment scores.

- Source-based features: Credibility and reputation of the news source.

- Social context features: Engagement metrics, user profiles, and propagation

patterns on social media.

4. *Model Selection*: Choose machine learning algorithms to train on the processed

data. Common models include:

- Logistic Regression

- Support Vector Machines (SVM)

- Decision Trees and Random Forests

- Gradient Boosting Machines (GBM)

- Neural Networks, especially Recurrent Neural Networks (RNN) and Transformers

for handling sequential text data.


5. *Training and Evaluation*: Split the data into training and testing sets. Train the

model on the training set and evaluate its performance on the testing set using

metrics like accuracy, precision, recall, and F1-score.

6. *Model Tuning*: Optimize hyperparameters and improve the model’s

performance through techniques like cross-validation, grid search, or random search.

7. *Deployment*: Integrate the trained model into applications or systems that can

automatically flag potential fake news articles. Continuous monitoring and retraining

of the model are necessary to adapt to new patterns and changes in the data.

By leveraging these steps, machine learning models can help automate and enhance

the detection of fake news, contributing to more reliable and trustworthy

information dissemination.
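
The evaluation metrics named in step 5 can be computed directly from the prediction counts, which helps when reading a classification report. The sketch below is a hand-rolled illustration; the `binary_metrics` helper and the eight example labels are assumptions, and in practice scikit-learn's metric functions would be used.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = fake (the positive class), 0 = real
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy is 0.75 while precision, recall, and F1 are each 2/3, which shows how accuracy alone can look better than the model's performance on the class of interest.
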
To implement a fake news detection system using machine learning, let's assume you

have a dataset with two CSV files: true.csv and fake.csv. Below is a step-by-step guide

to building a machine learning model for fake news detection:

### Step 1: Import Libraries

First, you'll need to import the necessary libraries for data manipulation,

visualization, and machine learning.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
```

### Step 2: Load the Data

Load the true.csv and fake.csv datasets.

```python
# Load the datasets
true_news = pd.read_csv('path/to/true.csv')
fake_news = pd.read_csv('path/to/fake.csv')

# Add a label column to each dataset
# Note: this convention (1 = true) is the reverse of the train.csv labels
# described earlier, where 1 marks an unreliable article
true_news['label'] = 1  # 1 indicates true news
fake_news['label'] = 0  # 0 indicates fake news

# Combine the datasets
news = pd.concat([true_news, fake_news]).reset_index(drop=True)
```


### Step 3: Preprocess the Data

Clean and preprocess the text data.

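```python
import re

import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # one-time download of the stop-word list

# Build the stop-word set once; calling stopwords.words() for every word is very slow
STOP_WORDS = set(stopwords.words('english'))

# Function to clean the text
def clean_text(text):
    text = re.sub(r'http\S+', '', text)       # Remove URLs
    text = re.sub(r'[^A-Za-z\s]', '', text)   # Remove non-alphabetic characters
    text = text.lower()                       # Convert to lowercase
    # Remove stopwords
    text = ' '.join(word for word in text.split() if word not in STOP_WORDS)
    return text

# Apply the function to the text column (missing values become empty strings)
news['text'] = news['text'].fillna('').apply(clean_text)
```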

### Step 4: Split the Data

Split the data into training and testing sets.

```python
X = news['text']
y = news['label']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

### Step 5: Feature Extraction

Convert the text data to numerical features using TF-IDF.

```python
tfidf_vectorizer = TfidfVectorizer(max_features=5000)  # keep the 5,000 most frequent terms
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)
```

### Step 6: Train the Model

Train a logistic regression model.

```python
model = LogisticRegression()
model.fit(X_train_tfidf, y_train)
```

### Step 7: Evaluate the Model

Evaluate the model on the test data.


```python
y_pred = model.predict(X_test_tfidf)

# Calculate evaluation metrics
conf_matrix = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", conf_matrix)

# Plot the confusion matrix (rows = actual, columns = predicted; 0 = fake, 1 = true)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Fake', 'True'], yticklabels=['Fake', 'True'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
```

### Step 8: Save and Deploy the Model

Save the trained model and the TF-IDF vectorizer for future use.

```python
import joblib

# Save the model
joblib.dump(model, 'fake_news_model.pkl')

# Save the vectorizer
joblib.dump(tfidf_vectorizer, 'tfidf_vectorizer.pkl')
```

You can now deploy this model to detect fake news in real-time by loading the saved

model and vectorizer, then using them to predict the labels for new news articles!
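
Loading the saved files and scoring a new article might look like the sketch below. To stay self-contained it trains a toy model on six made-up sentences and saves it to a temporary directory; in the real project the two `.pkl` files from Step 8 would be loaded instead. The `predict_article` helper is a hypothetical name, and new text should go through the same `clean_text` function used during training before it is vectorized.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Assumption: a toy corpus stands in for the real training data (1 = true, 0 = fake).
texts = [
    "government publishes official budget report",
    "scientists publish peer reviewed vaccine study",
    "city council approves new transport plan",
    "miracle pill cures every disease overnight",
    "shocking secret doctors do not want you to know",
    "aliens secretly control the stock market",
]
labels = [1, 1, 1, 0, 0, 0]

vectorizer = TfidfVectorizer()
model = LogisticRegression()
model.fit(vectorizer.fit_transform(texts), labels)

# Save both objects, then load them back as a deployed service would.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "fake_news_model.pkl")
    vectorizer_path = os.path.join(tmp, "tfidf_vectorizer.pkl")
    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vectorizer_path)
    loaded_model = joblib.load(model_path)
    loaded_vectorizer = joblib.load(vectorizer_path)


def predict_article(text):
    """Return the predicted label and the estimated probability of class 1 (true)."""
    features = loaded_vectorizer.transform([text])
    label = int(loaded_model.predict(features)[0])
    proba_true = float(loaded_model.predict_proba(features)[0][1])
    return label, proba_true


print(predict_article("miracle pill cures disease"))
```

The vectorizer must be saved and loaded together with the model, because the model's weights only make sense for the exact vocabulary the vectorizer learned during training.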

To work on fake news detection using machine learning, you'll need a suitable

dataset to train and test your models. Here are some popular datasets commonly

used for this purpose:

1. *LIAR Dataset*:

- Contains over 12,000 labeled short statements from PolitiFact, with the labels pants-fire, false, barely-true, half-true, mostly-true, and true.

- Available at: [LIAR Dataset](https://www.cs.ucsb.edu/~william/data/liar_dataset.zip)

2. *Fake News Challenge (FNC-1) Dataset*:

- Contains roughly 50,000 labeled headline and article-body pairs, with stance detection as the primary task.

- Available at: [FNC-1 Dataset](http://www.fakenewschallenge.org/)

3. *Kaggle Fake News Dataset*:

- Contains a mix of true and fake news articles.

- Available at: [Kaggle Fake News](https://www.kaggle.com/c/fake-news)

4. *BuzzFeed News Dataset*:

- Consists of political news articles labeled by BuzzFeed journalists as either true, false, or mixed.

- Available at: [BuzzFeed News Dataset](https://github.com/BuzzFeedNews/2016-10-facebook-fact-check)

5. *ISOT Fake News Dataset*:

- Contains two CSV files: one for fake news and one for true news.

- Available at: [ISOT Dataset](https://www.uvic.ca/engineering/ece/isot/datasets/fake-news/index.php)
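
The LIAR labels listed above are six-way, while the walkthroughs in this document train a binary classifier, so the labels have to be collapsed before they can be used the same way. The mapping below is only one possible choice and is an assumption of this sketch: it treats half-true and above as true (1) and everything else as fake (0), matching the 1 = true convention of the true.csv / fake.csv walkthrough.

```python
# Assumption: "half-true" and above count as true (1), the rest as fake (0).
LIAR_TO_BINARY = {
    "pants-fire": 0,
    "false": 0,
    "barely-true": 0,
    "half-true": 1,
    "mostly-true": 1,
    "true": 1,
}


def to_binary(label):
    """Map a six-way LIAR label to 0 (fake) or 1 (true)."""
    return LIAR_TO_BINARY[label.strip().lower()]


sample_labels = ["pants-fire", "half-true", "TRUE", " barely-true "]
print([to_binary(label) for label in sample_labels])  # [0, 1, 1, 0]
```

Where the cut-off is placed changes the class balance and the meaning of "fake", so the choice should be stated alongside any reported results.
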
To start working on fake news detection, follow these steps:

1. *Data Collection*:

- Choose a dataset from the above options and download it.

2. *Data Preprocessing*:

- Clean the text data (remove punctuation, stop words, etc.).

- Tokenize the text.

- Convert text to numerical representations (e.g., TF-IDF, word embeddings).

3. *Model Selection*:

- Choose a machine learning model (e.g., logistic regression, SVM, random forest,

deep learning models like LSTM, BERT).

- Split your data into training and testing sets.

4. *Model Training*:

- Train your chosen model on the training data.

- Evaluate the model on the test data using appropriate metrics (accuracy,

precision, recall, F1-score).

5. *Model Evaluation*:

- Fine-tune your model based on the evaluation metrics.

- Consider cross-validation for a more robust evaluation.

6. *Deployment*:
- Once satisfied with the model performance, deploy it for real-time fake news

detection.
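
The cross-validation mentioned in step 5 can be sketched with scikit-learn as follows. The twelve sentences are a made-up toy corpus (1 = true, 0 = fake), and the choice of three folds and of logistic regression is an assumption for illustration; with a real dataset, five or ten folds are more common.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Assumption: a toy corpus standing in for a real dataset.
texts = [
    "government publishes official budget report",
    "scientists publish peer reviewed vaccine study",
    "city council approves new transport plan",
    "central bank keeps interest rates unchanged",
    "court publishes ruling on election case",
    "health ministry reports seasonal flu numbers",
    "miracle pill cures every disease overnight",
    "shocking secret doctors do not want you to know",
    "aliens secretly control the stock market",
    "celebrity reveals one weird trick to get rich",
    "secret cure hidden by doctors shocks the world",
    "miracle trick makes you rich overnight",
]
labels = [1] * 6 + [0] * 6

# Keeping the vectorizer inside the pipeline means TF-IDF is refitted on each
# training fold, so no vocabulary statistics leak in from the held-out fold.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

The spread of the fold scores gives a rough sense of how much a single train/test split could over- or understate the model's performance.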

This project was made by a student of

KGRL COLLEGE OF PG COURSES (A)

BHIMAVARAM

REPORT: I AM Addala Teja sri seethalu (2302001), M.C.A. I AM VERY INTERESTED IN THIS FAKE NEWS DETECTION PROJECT, AND NOW I KNOW WHAT IS REAL NEWS AND WHAT IS FAKE NEWS.

…..~THANK YOU, SIR, FOR GIVING ME THIS VALUABLE OPPORTUNITY~…..
