Internship Report


A Summer Industry Internship – II Report

On

FAKE NEWS DETECTION


during
IV Year I Semester Summer
submitted to
The Department of Information Technology

In partial fulfillment of the academic requirements of


Jawaharlal Nehru Technological University
for
The award of the degree of

Bachelor of Technology
in
Information Technology

By
J. Poornima 20311A1201
P. Hasmita Reddy 20311A1202
Srinidhi Yedulla 20311A1232

Sreenidhi Institute of Science and Technology


Yamnampet, Ghatkesar, R.R. District, Hyderabad - 501301

An Autonomous Institution

Affiliated to
Jawaharlal Nehru Technological University
Hyderabad - 500085
Department of Information Technology

DEPARTMENT OF INFORMATION TECHNOLOGY


SREENIDHI INSTITUTE OF SCIENCE AND TECHNOLOGY
(An Autonomous Institution)
(Affiliated to JNT University; ISO Certified 9001:2000)
Yamnampet, Ghatkesar, Hyderabad-501301
Ph.Nos : 08415-200597, 08415-325444, 9395533303

CERTIFICATE
This is to certify that the dissertation entitled “FAKE NEWS DETECTION” is bona fide
work done and submitted by Jonnalagadda Poornima (20311A1201), P. Hasmita Reddy
(20311A1202), and Srinidhi Yedulla (20311A1232) in partial fulfillment of the requirements
for the award of the degree of Bachelor of Technology in Information Technology, SREENIDHI
INSTITUTE OF SCIENCE AND TECHNOLOGY, affiliated to Jawaharlal Nehru
Technological University, Hyderabad, and is a record of bona fide work carried out by them
under our guidance and supervision.

The results presented in this dissertation have been verified and are found to be satisfactory. The
results embodied in this dissertation have not been submitted to any other university for the award
of any other degree or diploma.

Project Internal Guide Project Co-ordinator Head of the Department


Dr. Rohita Y Dr. Rohita Y Dr. SUNIL BHUTADA
Associate professor Associate Professor Professor

External Examiner
Date:-
DECLARATION

We, J. Poornima (20311A1201), P. Hasmita Reddy (20311A1202), Srinidhi Yedulla
(20311A1232), students of SREENIDHI INSTITUTE OF SCIENCE AND
TECHNOLOGY, YAMNAMPET, GHATKESAR, studying IV year I semester,
INFORMATION TECHNOLOGY solemnly declare that the Summer Industry Internship-II
Report, titled “Fake News Detection” is submitted to SREENIDHI INSTITUTE OF
SCIENCE AND TECHNOLOGY for partial fulfillment for the award of degree of Bachelor
of Technology in INFORMATION TECHNOLOGY.

It is declared to the best of our knowledge that the work reported does not form part of
any dissertation submitted to any other University or Institute for award of any degree.

Jonnalagadda Poornima(20311A1201)
P. Hasmita Reddy (20311A1202)
Srinidhi Yedulla (20311A1232)
ACKNOWLEDGEMENT

We would like to express our gratitude to all the people behind the scenes who helped us
transform an idea into a real application. We would like to thank our internship coordinator,
Dr. Rohita Y, for their technical guidance, constant encouragement and support in carrying
out our project at college.

We profoundly thank Dr. Sunil Bhutada, Head of the Department of Information Technology,
who has been an excellent guide and a great source of inspiration for our work.

We would like to express our heartfelt gratitude to our parents, without whom we would not
have been privileged to achieve and fulfill our dreams. We are grateful to our principal, Dr.
T. Ch. Siva Reddy, who most ably runs the institution and has had a major hand in enabling
us to do our project.

The satisfaction and euphoria that accompany the successful completion of a task would be
incomplete without mention of the people who made it possible, whose constant guidance
and encouragement crowned all our efforts with success. In this context, we would like to
thank all the other staff members, both teaching and non-teaching, who have extended their
timely help and eased our task.

Jonnalagadda Poornima(20311A1201)
P. Hasmita Reddy (20311A1202)
Srinidhi Yedulla (20311A1232)
FAKE NEWS DETECTION

ABSTRACT

Fake news on social media and other media is spreading widely and is a matter of
serious concern because of its capacity to cause social and national damage. A great
deal of research already focuses on detecting it. This project analyses the fake news
detection problem and explores traditional machine learning models in order to
choose the best one and build a supervised machine learning model that can classify
news as real or fake, using tools such as Python scikit-learn and NLP for textual
analysis. The process involves feature extraction and vectorization; we propose
using the Python scikit-learn library to perform tokenization and feature extraction
of text data, because this library contains useful tools such as CountVectorizer and
TfidfVectorizer. We then apply feature selection methods to experiment with and
choose the best-fit features and obtain the highest precision.
Through this project, we aim to contribute to the ongoing efforts in combating the
spread of misinformation and promoting digital media literacy. The successful
development of an accurate fake news detection system will not only aid
individuals in making informed decisions but also assist content platforms and
social media networks in implementing measures to curb the dissemination of fake
news, thereby fostering a more trustworthy information ecosystem.

Preface
Table of Contents

1 INTRODUCTION
1.1 INTRODUCTION TO DATA ANALYTICS
1.2 INTRODUCTION TO PROJECT
1.3 PURPOSE OF THE PROJECT
1.4 EXISTING SYSTEM
1.5 PROPOSED SYSTEM
1.6 MOTIVATION

2 SYSTEM ANALYSIS
2.1 INTRODUCTION
2.2 ANALYSIS MODEL

3 SOFTWARE REQUIREMENTS SPECIFICATIONS
3.1 FUNCTIONAL REQUIREMENTS
3.2 NON-FUNCTIONAL REQUIREMENTS
3.3 HARDWARE REQUIREMENTS
3.4 SOFTWARE REQUIREMENTS

4 SYSTEM DESIGN AND IMPLEMENTATION
4.1 SYSTEM ARCHITECTURE
4.2 IMPLEMENTATION
4.3 UML DIAGRAMS

5 OUTPUT SCREENS

6 INTERNSHIP FEEDBACK
6.1 CHALLENGES FACED

7 CONCLUSION AND FUTURE SCOPE

8 REFERENCES
INTRODUCTION

1.1 INTRODUCTION TO DATA ANALYTICS


Data analytics converts raw data into actionable insights. It includes a range
of tools, technologies, and processes used to find trends and solve problems
by using data. Data analytics can shape business processes, improve decision-
making, and foster business growth.

Data analytics helps companies gain more visibility and a deeper
understanding of their processes and services. It gives them detailed insights
into the customer experience and customer problems. By shifting the
paradigm beyond data to connect insights with action, companies can create
personalized customer experiences, build related digital products, optimize
operations, and increase employee productivity.

Many computing techniques are used in data analytics. The following are
some of the most common ones:

Natural language processing is the technology used to make computers
understand and respond to spoken and written human language. Data analysts
use this technique to process data like dictated notes, voice commands, and
chat messages.

1.2 INTRODUCTION TO PROJECT

In today's information-rich world, where news and information spread
rapidly across digital platforms, the rise of fake news has emerged as a
significant concern. Fake news refers to false or misleading information
presented as genuine news, often with the intent to deceive readers,
manipulate public opinion, or achieve specific agendas. The proliferation of
fake news poses serious threats to society, as it can lead to misinformation,
social discord, and erosion of trust in reliable sources of information.

To combat the dissemination of fake news and preserve the credibility of
digital media, researchers and technologists have turned to the power of
artificial intelligence and machine learning. Fake news detection has evolved
as a cutting-edge field that harnesses advanced algorithms and natural
language processing techniques to automatically identify and classify
deceptive content from authentic news.

The primary objective of fake news detection is to build robust and accurate
systems capable of distinguishing between real and fake news articles. These
systems leverage vast amounts of data, comprising both legitimate and
deceptive news, to train machine learning models.

1.3 PURPOSE OF THE PROJECT

The objective of this project is to examine the problems and possible
consequences associated with the spread of fake news. We work with
different fake news datasets, applying different machine learning algorithms
to train on the data and test which news items are real and which are fake.
Fake news is a problem that heavily affects society and our perception not
only of the media but also of facts and opinions themselves. Artificial
intelligence and machine learning can help address the problem by mining
patterns from the data to maximize well-defined objectives. Our focus is
therefore to find which machine learning algorithm is best suited to which
kind of text dataset, and which dataset gives better accuracy, since accuracy
depends directly on the type and amount of data.

1.4 EXISTING SYSTEM

There exists a large body of research on machine learning methods for
deception detection, most of it focused on classifying online reviews and
publicly available social media posts. Since late 2016, during the American
presidential election, the question of identifying 'fake news' has also
received particular attention in the literature. Conroy, Rubin, and Chen
outline several approaches that seem promising for accurately classifying
misleading articles. They note that simple content-related n-grams and
shallow part-of-speech tagging have proven insufficient for the classification
task, often failing to account for important context information. Rather, these
methods have been shown to be useful only in tandem with more complex
methods of analysis. Deep syntax analysis using Probabilistic Context-Free
Grammars has been shown to be particularly valuable in combination with
n-gram methods. Feng, Banerjee, and Choi are able to achieve 85%-91%
accuracy in deception-related classification tasks using online review corpora.

1.5 PROPOSED SYSTEM

The proposed system aims to develop a machine learning-based solution
for distinguishing fake news from real news articles. The process starts with
data collection, where a dataset containing labeled news articles (real or fake)
is gathered from various sources. Next, data preprocessing techniques are
applied to clean and prepare the text data, including tokenization,
lowercasing, stemming, and stopword removal. Feature extraction follows,
using methods like TF-IDF or word embeddings to convert the text data into
numerical features. A machine learning algorithm, such as Logistic
Regression, is chosen for model training using the preprocessed and
feature-extracted data. The model's performance is evaluated using metrics
like accuracy, precision, recall, and F1-score on a test dataset. Fine-tuning and
hyperparameter optimization are carried out to enhance the model's accuracy.
The trained model is deployed for real-time prediction, where new news
articles are preprocessed, and the model predicts whether they are real or fake.
Regular monitoring and maintenance ensure the model's effectiveness in
detecting fake news. The report concludes with insights into the system's
performance, limitations, ethical considerations, and suggestions for future
improvements.
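
The evaluation metrics named above can all be derived from the four
confusion-matrix counts. The following is a minimal sketch in plain Python, not
the project's own code: the function name classification_metrics, the toy labels,
and the choice of 1 (fake) as the positive class are assumptions made for
illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, made up for illustration: 1 = fake, 0 = real.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision and recall matter here because a dataset with far more real than fake
articles can yield a high accuracy even when most fake articles are missed.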

1.6 MOTIVATION

We train and test on labeled data, since supervised learning requires each
example to carry a label. Once the training and testing data and their labels
are available, we can apply different machine learning algorithms. Before
making predictions and measuring accuracy, however, the data needs to be
preprocessed: null values, which are not readable, must be removed from the
dataset, and the text must be converted into vectors by normalizing and
tokenizing it so that the machine can work with it. The next step is to produce
visual reports from this data using the Matplotlib and scikit-learn libraries of
Python, which help us present the results as histograms, pie charts, or bar
charts.

2. SYSTEM ANALYSIS

2.1 INTRODUCTION

System analysis of fake news detection involves a comprehensive
examination and evaluation of the techniques, algorithms, and methodologies
employed in the process of identifying deceptive content in news articles. This
analytical approach aims to assess the efficiency, accuracy, scalability, and
ethical considerations of fake news detection systems. By studying the
strengths and limitations of various components, such as data preprocessing,
feature engineering, and machine learning models, system analysis plays a
vital role in advancing the effectiveness and reliability of fake news detection
technologies. The insights gained from system analysis contribute to the
development of more robust and trustworthy solutions to combat the
dissemination of fake news, fostering a more informed and credible digital
information ecosystem.

2.2 ANALYSIS MODEL

TECHNOLOGIES
PYTHON:

Python is a high-level, interpreted, interactive and object-oriented scripting
language. Python is designed to be highly readable. It frequently uses English
keywords whereas other languages use punctuation, and it has fewer
syntactical constructions than other languages.

➢ Python is Interpreted − Python is processed at runtime by the
interpreter. You do not need to compile your program before executing
it.

➢ Python is Interactive − You can actually sit at a Python prompt and
interact with the interpreter directly to write your programs.

➢ Python is Object-Oriented − Python supports Object-Oriented style or
technique of programming that encapsulates code within objects.

➢ Python is a Beginner's Language − Python is a great language for
beginner-level programmers and supports the development of a wide
range of applications from simple text processing to WWW browsers to
games.

MACHINE LEARNING:

Machine learning is a transformative field of artificial intelligence that
empowers computers to learn and improve from experience without being
explicitly programmed. By utilizing algorithms and statistical techniques,
machine learning enables systems to identify patterns, make predictions, and
gain insights from vast amounts of data.

This technology has revolutionized various industries, from healthcare and
finance to autonomous vehicles and natural language processing. With its
ability to handle complex tasks and adapt to new information, machine
learning continues to drive innovation, making it a cornerstone of the AI
revolution and shaping the future of problem-solving and decision-making.

ALGORITHMS:

TF-IDF:

TF-IDF stands for Term Frequency-Inverse Document Frequency. It measures
how relevant a word is to a document within a collection (corpus) of
documents. The weight increases proportionally to the number of times a
word appears in the document but is offset by the frequency of the word in
the corpus (dataset).

Term Frequency: In document d, the term frequency is the number of
occurrences of a given word t. A word therefore becomes more relevant to a
document the more often it appears in it. Since the ordering of terms is not
significant, we can use a vector to describe the text in the bag-of-words
model. For each distinct term in the document, there is an entry whose value
is the term frequency.

tf(t,d) = count of t in d / number of words in d

Document Frequency: This measures how common a term is across the whole
corpus. It is very similar to TF; the only difference is that TF counts the
occurrences of a term t within a single document d, while DF counts the
documents, out of the N documents in the corpus, that contain t. In other
words, DF is the number of documents in which the word is present.

df(t) = number of documents containing t

Inverse Document Frequency: This measures how informative the word is.
In retrieval, the key aim is to locate the documents that best fit the query.
First, find the document frequency of a term t by counting the number of
documents containing the term:

df(t) = N(t)

where

df(t) = Document frequency of a term t

N(t) = Number of documents containing the term t

The raw inverse document frequency of a word is the number of documents
in the corpus, N, divided by the document frequency of the word.

idf(t) = N/ df(t) = N/N(t)

A more common word should be considered less significant, but this raw
ratio seems too harsh a factor. We therefore take the logarithm (with base 2)
of the inverse document frequency. So the idf of the term t becomes:

idf(t) = log(N/ df(t))

Usually, the tf-idf weight consists of two terms:

1. Normalized Term Frequency (tf)

2. Inverse Document Frequency (idf)

tf-idf(t, d) = tf(t, d) * idf(t)
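
As a worked illustration of the formulas above, the sketch below computes
tf, idf (base-2 logarithm) and tf-idf by hand in plain Python. The function
names and the four-sentence toy corpus are assumptions made for the example.

```python
import math


def tf(term, doc_tokens):
    """Normalized term frequency: count of term in doc / number of words in doc."""
    return doc_tokens.count(term) / len(doc_tokens)


def idf(term, corpus):
    """log2(N / df(t)), where df(t) is the number of documents containing term."""
    df = sum(1 for doc in corpus if term in doc)
    # Assumes the term occurs in at least one document (df > 0).
    return math.log2(len(corpus) / df)


def tf_idf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)


# Toy corpus, made up for illustration; each document is a list of tokens.
corpus = [
    "the minister denied the report".split(),
    "the report was fake".split(),
    "the match ended in a draw".split(),
    "officials confirmed the story".split(),
]
doc = corpus[1]

score_the = tf_idf("the", doc, corpus)    # "the" occurs in every document
score_fake = tf_idf("fake", doc, corpus)  # "fake" occurs in one document only
print(score_the, score_fake)
```

Because "the" appears in all four documents, its idf is log2(4/4) = 0 and its
tf-idf weight vanishes, while "fake" gets tf = 1/4 and idf = log2(4/1) = 2,
giving a weight of 0.5. Note that scikit-learn's TfidfVectorizer uses a smoothed
natural-log idf and L2 row normalization by default, so its values differ from
this textbook form.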

LOGISTIC REGRESSION:

Logistic Regression is a supervised learning algorithm used for binary
classification tasks, where the goal is to predict the probability of an input
data point belonging to one of two classes (e.g., yes/no, spam/not spam). The
algorithm models the relationship between the input features and the binary
outcome using the logistic function (also known as the sigmoid function).

1.Sigmoid Function (σ(z)):

The sigmoid function takes any real-valued number as input and maps it to a
value between 0 and 1, representing the probability of the data point belonging
to the positive class (class 1). It is defined as follows:

σ(z) = 1 / (1 + e^(-z))

where:

z is the linear combination of input features and their corresponding weights:

z = β₀ + β₁x₁ + β₂x₂ + ... + βᵣxᵣ

β₀, β₁, β₂, ..., βᵣ are the coefficients (weights) associated with each feature.

x₁, x₂, ..., xᵣ are the input features.

e is Euler's number (approximately 2.71828).
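
The sigmoid and the linear combination z defined above can be sketched in a
few lines of Python. The function names and the example coefficients are
assumptions chosen for illustration; the branch on the sign of z is only a
numerical safeguard and does not change the formula.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # algebraically identical form, safe for very negative z
    return ez / (1.0 + ez)


def linear_score(beta0, betas, xs):
    """z = beta0 + beta1*x1 + ... + betar*xr."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))


# Example coefficients and features, made up for illustration.
z = linear_score(-1.0, [2.0, 0.5], [1.0, -2.0])  # -1 + 2*1 + 0.5*(-2) = 0
p = sigmoid(z)
print(z, p)
```

A score of z = 0 maps to a probability of 0.5, which is the usual decision
threshold between the two classes.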

2.Logistic Regression Model:

The logistic regression model predicts the probability (p) of the data point
belonging to the positive class (class 1) using the sigmoid function. The
probability of belonging to the negative class (class 0) can be calculated as (1
- p).

The logistic regression model can be written as:

p(y=1 | x) = σ(z)

where:

p(y=1 | x) is the probability of the data point belonging to class 1 (positive
class) given the input features x.

σ(z) is the sigmoid function defined above.

Training the logistic regression model involves finding the optimal values of
the coefficients (β₀, β₁, β₂, ..., βᵣ) that best fit the training data. This is typically
done using optimization techniques like gradient descent, which minimizes a
cost function that quantifies the difference between the predicted probabilities
and the true class labels in the training data.
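
To make the training step concrete, here is a minimal sketch of batch gradient
descent on the mean cross-entropy cost, in plain Python. It is an illustration
under stated assumptions, not the project's implementation: the learning rate,
the number of epochs, the function names and the one-feature toy data are all
made up, and scikit-learn's LogisticRegression uses a different solver and adds
regularization by default.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=5000):
    """Fit beta0 and betas by batch gradient descent on mean cross-entropy."""
    n_features = len(X[0])
    beta0, betas = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad0, grads = 0.0, [0.0] * n_features
        for xi, yi in zip(X, y):
            p = sigmoid(beta0 + sum(b * x for b, x in zip(betas, xi)))
            err = p - yi                      # derivative of the loss w.r.t. z
            grad0 += err
            for j, x in enumerate(xi):
                grads[j] += err * x
        # Step against the average gradient.
        beta0 -= lr * grad0 / len(X)
        betas = [b - lr * g / len(X) for b, g in zip(betas, grads)]
    return beta0, betas


def predict_proba(beta0, betas, xi):
    """Probability of class 1 for one feature vector."""
    return sigmoid(beta0 + sum(b * x for b, x in zip(betas, xi)))


# Toy data, made up for illustration: small x -> class 0, large x -> class 1.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
beta0, betas = train_logistic(X, y)
print(predict_proba(beta0, betas, [1.0]), predict_proba(beta0, betas, [4.0]))
```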

3.SOFTWARE REQUIREMENTS SPECIFICATIONS

3.1 FUNCTIONAL REQUIREMENTS

1. Data Preprocessing:

• Text data is preprocessed to remove special characters and irrelevant
information using regular expressions ('re' library).

• Stopwords are removed using 'nltk.corpus.stopwords'.

2. Feature Extraction and Representation:

• The TfidfVectorizer from scikit-learn is used to convert the preprocessed
text data into numerical features using TF-IDF representation.

3. Model Training and Evaluation:

• A machine learning algorithm (Logistic Regression) is selected and trained
using the 'fit()' method from scikit-learn.

• The model's performance is evaluated using the accuracy_score function
from scikit-learn on both the training and test data.

4. Real-time Prediction:

• A new data point ('X_test[3]') is used for real-time prediction to check if
the news is real or fake.

• The trained model's 'predict()' method is used to predict the label for the
new data point.

5. Monitoring and Maintenance:

• The model's accuracy on the training and test data is computed to monitor
its performance.
3.2 NON-FUNCTIONAL REQUIREMENTS

1. Portability:

Portability is the usability of the same software in different environments.
The project can be run on any operating system.

2. Performance:

These requirements determine the resources required, time interval,
throughput and everything else that deals with the performance of the system.

3. Accuracy:

The results returned for a query are accurate, and information is retrieved at
high speed. The degree of security provided by the system is high and
effective.

4. Maintainability:

The project is simple, so further updates can be made easily without affecting
its stability. Maintainability defines how easy it is to maintain the system,
that is, to analyse, change and test the application.

3.3 HARDWARE REQUIREMENTS:

• System - Pentium-IV

• Speed - 2.4GHZ

• Hard disk - 40GB

• Monitor - 15VGA color

• RAM - 512MB

3.4 SOFTWARE REQUIREMENTS:

• Operating System - Windows XP

• Coding languag

4.SYSTEM DESIGN AND IMPLEMENTATION

4.1 SYSTEM ARCHITECTURE

The system architecture for fake news detection involves data collection from
a CSV file containing labeled news articles. The text data is preprocessed to
remove special characters, tokenized into words, and stemmed. TF-IDF
vectorization is applied for feature extraction. The Logistic Regression model
is trained on the feature-extracted data and evaluated for performance using
accuracy metrics on test data. The system is capable of making real-time
predictions for new news articles. Logging mechanisms handle errors and
system activities for debugging. The architecture aims to efficiently detect
fake news from real news articles using machine learning and natural
language processing techniques.

4.2 IMPLEMENTATION

A. Data Use - Using pandas, we read the .csv file, display the shape of the
dataset, and display the dataset in tabular form. We train and test on labeled
data, since supervised learning requires each example to carry a label.
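
The loading step can be sketched as follows. This is an illustration under
assumptions: the column names (title, author, text, label) and the three rows
are hypothetical, and an in-memory CSV stands in for the real dataset file so
the snippet runs on its own.

```python
import io

import pandas as pd

# A tiny in-memory CSV stands in for the real dataset file.
# Column names and rows are hypothetical.
csv_data = io.StringIO(
    "title,author,text,label\n"
    "Budget passed,A. Rao,Parliament approved the budget today.,0\n"
    "Aliens land,,Aliens landed in the city centre last night.,1\n"
    "Rain expected,S. Khan,,0\n"
)

news = pd.read_csv(csv_data)   # with a real file, pass its path instead
print(news.shape)              # (number of rows, number of columns)
print(news.isnull().sum())     # missing values per column

# Replace missing values with empty strings so later text processing does not fail.
news = news.fillna("")

X = news["text"]    # article text
y = news["label"]   # 0 = real, 1 = fake
```

Filling missing values with an empty string keeps every row; dropping the
affected rows with dropna() is an alternative when enough data remains.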

B. Data Preprocessing - Data preprocessing involves several essential steps to
prepare the news articles' raw text data for further analysis and model training.
First, special characters, numbers, and punctuation marks are removed using
regular expressions. Next, all words in the text are converted to lowercase to
ensure case-insensitive processing. Tokenization is then applied to split the
text into individual words or tokens. The Porter Stemmer from
'nltk.stem.porter' is utilized for stemming words to their base form. Lastly,
common stopwords are removed using the 'nltk.corpus.stopwords' module.
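
The steps above can be sketched with the standard library alone. The snippet
below is not the project's code: the function name preprocess, the small
stopword set (a stand-in for 'nltk.corpus.stopwords') and the optional stem
parameter are assumptions made so that the example runs without NLTK. In this
sketch stopwords are filtered before stemming, so that stemmed forms cannot
slip past the stopword list; that ordering is also an assumption.

```python
import re

# Small illustrative stopword set; a stand-in for nltk.corpus.stopwords.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "in", "on", "of", "to", "and"}


def preprocess(text, stopwords=STOPWORDS, stem=None):
    """Clean, lowercase, tokenize, drop stopwords, and optionally stem."""
    text = re.sub(r"[^a-zA-Z]", " ", text)   # replace everything but letters
    tokens = text.lower().split()            # lowercase, split on whitespace
    tokens = [t for t in tokens if t not in stopwords]
    if stem is not None:                     # e.g. a Porter stemmer's stem method
        tokens = [stem(t) for t in tokens]
    return " ".join(tokens)


cleaned = preprocess("BREAKING: The Senate PASSED 3 new bills in 2023!!!")
print(cleaned)
```

With NLTK installed and its stopwords corpus downloaded, the same function
could be called with stem=PorterStemmer().stem (from 'nltk.stem.porter') and
stopwords=set(stopwords.words("english")) (from 'nltk.corpus').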

C. Feature Extraction - Feature extraction is performed using the TF-IDF
(Term Frequency-Inverse Document Frequency) representation. TF-IDF is a
numerical statistic that evaluates the importance of a word in a document
relative to a corpus of documents. The TfidfVectorizer from
'sklearn.feature_extraction.text' is utilized to convert the preprocessed text data
into a numerical feature matrix. The TF-IDF scores are computed for each word
in the corpus of news articles, taking into account its frequency in the current
document (news article) and its rarity across the entire corpus.
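
A minimal sketch of this step with TfidfVectorizer, using default settings, is
shown below. The three short documents are made-up stand-ins for preprocessed
articles; the parameters used in the project are not stated in the report.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy preprocessed articles, made up for illustration.
docs = [
    "senate pass new budget bill",
    "alien land city centre",
    "budget bill face delay senate",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)     # sparse matrix, one row per document

print(X.shape)                         # (number of documents, vocabulary size)
print(sorted(vectorizer.vocabulary_))  # terms learned from the corpus

# Within one document, a term that is rare across the corpus gets a higher weight.
vocab = vectorizer.vocabulary_
weight_rare = X[0, vocab["new"]]       # "new" occurs in one document
weight_common = X[0, vocab["budget"]]  # "budget" occurs in two documents
```

The fitted vectorizer should be kept alongside the model, because new articles
must be transformed with the same vocabulary and idf values.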

D. Model Training - Model training involves utilizing the Logistic Regression
algorithm for fake news detection. After preprocessing the text data and
extracting features using TF-IDF, the data is split into training and test sets.
The Logistic Regression model is then selected and initialized using the
'sklearn.linear_model.LogisticRegression()' function. The training data
('X_train' and 'Y_train') is used to fit the model using the 'fit()' method, which
enables the model to learn patterns and relationships between the TF-IDF
features and the corresponding news article labels (0 for real and 1 for fake
news). The accuracy of the trained model is evaluated on the test data using the
'accuracy_score()' function from 'sklearn.metrics', providing a measure of the
model's performance in distinguishing between real and fake news articles.
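
The training and evaluation step might look roughly like the sketch below. The
eight-sentence corpus, the 75/25 split, the stratified sampling and the
random_state value are assumptions for illustration; the report does not state
the split used in the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy corpus, made up for illustration: label 0 = real, 1 = fake.
texts = [
    "parliament approves annual budget",
    "court upholds election result",
    "city council opens new library",
    "central bank holds interest rate",
    "aliens secretly control parliament",
    "miracle pill cures all disease overnight",
    "celebrity clone replaces president",
    "secret moon base found by hacker",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Features are extracted first and then split, following the order described above.
X = TfidfVectorizer().fit_transform(texts)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=2
)

model = LogisticRegression()
model.fit(X_train, Y_train)

train_acc = accuracy_score(Y_train, model.predict(X_train))
test_acc = accuracy_score(Y_test, model.predict(X_test))
print(train_acc, test_acc)
```

A stricter variant fits the vectorizer on the training split only and applies
transform() to the test split, so that no vocabulary or idf information from the
test articles reaches the model.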

E. Model Prediction - Model prediction refers to the process of using the
trained Logistic Regression model to predict the authenticity (real or fake) of
new, unseen news articles in real-time. After preprocessing the text data and
extracting TF-IDF features for the test set, the model is utilized to make
predictions on these features using the 'predict()' method. The model outputs
predicted labels for the test data, which are then compared with the true labels
(ground truth) to evaluate the model's accuracy on the test data.
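
Predicting the label of a single new article can be sketched as follows. The
helper name classify, the four training sentences and the sample headline are
hypothetical; the point illustrated is that a new article is passed through the
already fitted vectorizer with transform() before 'predict()' is called.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data, made up for illustration: label 0 = real, 1 = fake.
train_texts = [
    "parliament approves annual budget",
    "court upholds election result",
    "miracle pill cures all disease",
    "aliens secretly control parliament",
]
train_labels = [0, 0, 1, 1]

vectorizer = TfidfVectorizer()
model = LogisticRegression()
model.fit(vectorizer.fit_transform(train_texts), train_labels)


def classify(article_text):
    """Return 'fake' or 'real' for one preprocessed article."""
    features = vectorizer.transform([article_text])  # reuse the fitted vocabulary
    label = model.predict(features)[0]
    return "fake" if label == 1 else "real"


result = classify("miracle pill cures disease")
print(result)
```

Words that never appeared in the training data are ignored by transform(), so
an article made up entirely of unseen words is classified from the model's
intercept alone.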

4.3 UML DIAGRAMS

1. USECASE DIAGRAM:

The purpose of a use case diagram is to capture the dynamic aspect of a system.
It is used to gather the requirements of a system, including internal and
external influences. The main purpose of a use case diagram is to show which
system functions are performed for which actor. The roles of the actors in the
system can be depicted. The UML is a very important part of developing
object-oriented software and of the software development process. The UML
uses mostly graphical notations to express the design of software projects.
2.CLASS DIAGRAM:

The class diagram describes the structure of a system by showing the system's
classes, their attributes, operations, and the relationships among the classes. It
shows which classes make up the system and the responsibilities of each. It is
also known as a structural diagram.
3.SEQUENCE DIAGRAM:

A sequence diagram details the interaction between objects in a sequential
order, i.e. the order in which these interactions take place.

These diagrams are sometimes known as event diagrams or event scenarios.
They help in understanding how the objects and components interact to execute
the process. A sequence diagram has two dimensions, representing time
(vertical) and the different objects (horizontal).
5.OUTPUT SCREENS

6. INTERNSHIP FEEDBACK

During the internship focused on fake news detection using data analytics,
there were notable positive aspects to commend. The project's choice aligned
with the timely concern of misinformation, demonstrating a keen
understanding of current issues. Proficiency in utilizing data analytics
techniques to discern patterns and trends within the dataset was evident, and
the successful implementation of algorithms for fake news detection
showcased technical competence. Effective communication of findings,
making the complexities of fake news detection accessible to a non-technical
audience, was a noteworthy skill.

6.1 CHALLENGES FACED

However, the internship also presented its set of challenges. Dealing with data
quality issues, including noise and unreliability, required thoughtful strategies.
Addressing the imbalance between genuine and fake news instances in the
dataset posed a common challenge, and difficulties in feature engineering were
encountered in selecting informative features for distinguishing between real
and fake news. Model tuning, with potential overfitting or underfitting issues,
added another layer of complexity. Ethical considerations, such as avoiding
biases in the model, were also navigated.

From a professional development perspective, the internship provided a
valuable learning curve, showcasing adaptability in the face of challenges.
Reflecting on new skills or tools acquired in the field of data analytics further
demonstrated growth. Collaborative efforts within the team highlighted
effective teamwork and contributions to the project's success.
7.CONCLUSION AND FUTURE SCOPE

In conclusion, the fake news detection project successfully implemented a
machine learning-based approach to identify and classify fake news articles
from real ones. The system utilized the Logistic Regression algorithm and
TF-IDF feature extraction to preprocess and analyze news articles' content
effectively. The model demonstrated promising accuracy in distinguishing
between fake and real news, with its performance evaluated on both training
and test datasets. By achieving high accuracy on the test data, the system
showcased its ability to generalize to unseen articles, which is essential for
practical application.

The project's results indicated that the combination of natural language
processing techniques and machine learning algorithms can be a valuable tool
in combating misinformation and promoting reliable information
dissemination.

Future improvements may include incorporating additional advanced
machine learning algorithms, fine-tuning hyperparameters, and exploring
ensemble methods to further enhance accuracy and robustness. With
continued monitoring and maintenance, the fake news detection system has
the potential to play a crucial role in ensuring information credibility and
fostering a more informed society.

We can also implement the following:

• Continuous Model Monitoring: Develop a monitoring system to
regularly evaluate the model's performance in real-world scenarios,
allowing for timely updates and improvements as new data becomes
available.

• User Interface: Build a user-friendly interface to interact with the
system, allowing users to input and verify news articles' authenticity,
making it accessible and practical for end-users.

• Real-time Prediction Pipeline: Create a real-time prediction pipeline to
handle incoming news articles and provide rapid predictions, ensuring
the system's practical usability for real-world applications.
REFERENCES

[1] https://www.researchgate.net/publication/339022255_A_smart_System_for_Fake_News_Detection_Using_Machine_Learning

[2] https://link.springer.com/article/10.1007/s41060-021-00302-z#Sec4

[3] https://iopscience.iop.org/article/10.1088/1757-899X/1099/1/012040/pdf

[4] https://www.researchgate.net/publication/343916946_Fake_News_Detection_Using_Machine_Learning_Algorithms

[5] https://www.hindawi.com/journals/wcmc/2022/1575365/
APPENDIX A

Sreenidhi Institute of Science and Technology


Department Of INFORMATION TECHNOLOGY

SUMMER INDUSTRY INTERNSHIP - II

Batch No:1

Roll No       Name                Title

20311A1201    J. POORNIMA         FAKE NEWS DETECTION
20311A1202    P. HASMITA REDDY    FAKE NEWS DETECTION
20311A1232    SRINIDHI YEDULLA    FAKE NEWS DETECTION

ABSTRACT
Fake news on social media and other media is spreading widely and is a matter of
serious concern because of its capacity to cause social and national damage. A great
deal of research already focuses on detecting it. This project analyses the fake news
detection problem and explores traditional machine learning models in order to
choose the best one and build a supervised machine learning model that can classify
news as real or fake, using tools such as Python scikit-learn and NLP for textual
analysis. The process involves feature extraction and vectorization; we propose
using the Python scikit-learn library to perform tokenization and feature extraction
of text data, because this library contains useful tools such as CountVectorizer and
TfidfVectorizer.

Student 1: J. Poornima Student 2: P. Hasmita Reddy Student 3: Srinidhi Yedulla

Project Coordinator Internal Guide HOD-IT


Dr. ROHITA Y Dr. ROHITA Y Dr. SUNIL BHUTADA
Associate Professor Associate Professor Professor

APPENDIX B
CORRELATION BETWEEN THE GROUP
PROJECT AND THE PROGRAM
OUTCOMES (POS), PROGRAM SPECIFIC
OUTCOMES (PSOS)

Batch No: 1
Roll No       Name                Title

20311A1201    J. POORNIMA         FAKE NEWS DETECTION
20311A1202    P. HASMITA REDDY    FAKE NEWS DETECTION
20311A1232    SRINIDHI YEDULLA    FAKE NEWS DETECTION

Table 1: Project/Internship correlation with appropriate POs/PSOs (Please


specify level of Correlation, H/M/L against POs/PSOs)

H High M Moderate L Low

SREENIDHI INSTITUTE OF SCIENCE AND TECHNOLOGY


DEPARTMENT OF INFORMATION TECHNOLOGY
Projects Correlation with POs/PSOs

PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1  PSO2  PSO3
M    L    L    H    H    L    M    H    M    H     H     H     H     H     M

Student 1: J. Poornima Student 2: P. Hasmita Reddy Student 3: Srinidhi Yedulla

Project Coordinator Internal Guide HOD-IT


Dr. ROHITA Y Dr. ROHITA Y Dr. SUNIL BHUTADA
Associate Professor Associate Professor Professor

APPENDIX C

DOMAIN OF THE PROJECT AND NATURE OF THE PROJECT

Batch No: 1
Roll No       Name                Title

20311A1201    J. POORNIMA         FAKE NEWS DETECTION
20311A1202    P. HASMITA REDDY    FAKE NEWS DETECTION
20311A1232    SRINIDHI YEDULLA    FAKE NEWS DETECTION

Table 2: Nature of the Project/Internship work (Please tick √ Appropriate for your project)

Batch No.    Title                  Nature of project
                                    Product | Application | Research | Others (specify)

1            FAKE NEWS DETECTION

Student 1: J. Poornima Student 2: P. Hasmita Reddy Student 3: Srinidhi Yedulla

Project Coordinator Internal Guide HOD-IT


Dr. ROHITA Y Dr. ROHITA Y Dr. SUNIL BHUTADA
Associate Professor Associate Professor Professor

Table 3: Domain of the Project/ Internship work (Please tick √ Appropriate for your project)

Domain Of The Project

Batch No: 1
Title: FAKE NEWS DETECTION

Domain columns:

• Artificial Intelligence, Machine Learning and Deep Learning

• Computer Networks, Information Security, Cyber Security

• Data Warehousing, Data Mining, Big Data Analytics

• Cloud Computing, Internet of Things

• Software Engineering, Image Processing

Student 1: J. Poornima Student 2: P. Hasmita Reddy Student 3: Srinidhi Yedulla

Project Coordinator Internal Guide HOD-IT


Dr. ROHITA Y Dr. ROHITA Y Dr. SUNIL BHUTADA
Associate Professor Associate Professor Professor
