Internship Report
On
Bachelor of Technology
in
Information Technology
By
J. Poornima 20311A1201
P. Hasmita Reddy 20311A1202
Srinidhi Yedulla 20311A1232
An Autonomous Institution
Affiliated to
Jawaharlal Nehru Technological University
Hyderabad - 500085
Department of Information Technology
CERTIFICATE
This is to certify that the dissertation entitled “FAKE NEWS DETECTION” is bona fide
work done and submitted by Jonnalagadda Poornima (20311A1201), P. Hasmita Reddy
(20311A1202), and Srinidhi Yedulla (20311A1232), in partial fulfillment of the requirements for
the award of the degree of Bachelor of Technology in Information Technology, SREENIDHI
INSTITUTE OF SCIENCE AND TECHNOLOGY, affiliated to Jawaharlal Nehru
Technological University, Hyderabad, and is a record of bona fide work carried out by them
under guidance and supervision.
The results presented in this dissertation have been verified and are found to be satisfactory. The
results embodied in this dissertation have not been submitted to any other university for the award
of any other degree or diploma.
External Examiner
Date:-
DECLARATION
It is declared to the best of our knowledge that the work reported does not form part of
any dissertation submitted to any other University or Institute for award of any degree.
Jonnalagadda Poornima (20311A1201)
P. Hasmita Reddy (20311A1202)
Srinidhi Yedulla (20311A1232)
ACKNOWLEDGEMENT
I would like to express my gratitude to all the people behind the scenes who helped me to
transform an idea into a real application. I would like to thank my internship coordinator,
Dr. Rohita Y, for technical guidance, constant encouragement and support in carrying
out my project at college.
I profoundly thank Dr. Sunil Bhutada, Head of the Department of Information Technology,
who has been an excellent guide and a great source of inspiration for my work.
I would like to express my heartfelt gratitude to my parents, without whom I would not have
been privileged to achieve and fulfill my dreams. I am grateful to our principal, Dr. T. Ch.
Siva Reddy, who most ably runs the institution and has had a major hand in enabling me to
do my project.
The satisfaction and euphoria that accompany the successful completion of the task would be
great but incomplete without the mention of the people who made it possible, whose constant
guidance and encouragement crown all the efforts with success. In this context, I would like
to thank all the other staff members, both teaching and non-teaching, who have extended their
timely help and eased my task.
Jonnalagadda Poornima (20311A1201)
P. Hasmita Reddy (20311A1202)
Srinidhi Yedulla (20311A1232)
FAKE NEWS DETECTION
ABSTRACT
Fake news is spreading widely on social media and other media, and it is a
matter of serious concern because it can cause considerable social and national
damage. A great deal of research already focuses on detecting it. This project
analyses fake news detection and explores traditional machine learning models
in order to choose the best one and build a supervised machine learning model
that can classify news as real or fake, using tools such as Python scikit-learn
and NLP for textual analysis. This process involves feature extraction and
vectorization; we propose using the Python scikit-learn library to perform
tokenization and feature extraction on text data, because this library contains
useful tools such as CountVectorizer and TfidfVectorizer. We then apply feature
selection methods to experiment with and choose the best-fit features to obtain
the highest precision.
Through this project, we aim to contribute to the ongoing efforts in combatting the
spread of misinformation and promoting digital media literacy. The successful
development of an accurate fake news detection system will not only aid
individuals in making informed decisions but also assist content platforms and
social media networks in implementing measures to curb the dissemination of fake
news, thereby fostering a more trustworthy information ecosystem.
Preface
Table of Contents
1 INTRODUCTION
2.1 INTRODUCTION
2.2 ANALYSIS MODEL
3 SOFTWARE REQUIREMENTS SPECIFICATIONS
4.2 IMPLEMENTATION
5 OUTPUT SCREENS
6 INTERNSHIP FEEDBACK
6.1 CHALLENGES FACED
8 REFERENCES
1. INTRODUCTION
language processing techniques to automatically identify and classify
deceptive content from authentic news.
The primary objective of fake news detection is to build robust and accurate
systems capable of distinguishing between real and fake news articles. These
systems leverage vast amounts of data, comprising both legitimate and
deceptive news, to train machine learning models.
and Choi are able to achieve 85%-91% accuracy in deception related
classification tasks using online review corpora.
1.6 MOTIVATION
2. SYSTEM ANALYSIS
2.1 INTRODUCTION
TECHNOLOGIES
PYTHON:
beginner-level programmers and supports the development of a wide
range of applications from simple text processing to WWW browsers to
games.
MACHINE LEARNING:
ALGORITHMS:
TF-IDF:
of occurrences in the document set N of the term t. In other words, DF is the
number of documents in which the word is present.
Inverse Document Frequency: This measures how relevant a word is. The key aim
of a search is to locate the records that fit the query. First, find the
document frequency of a term t by counting the number of documents containing
the term:
df(t) = N(t)
where N(t) is the number of documents in which the term t occurs.
The IDF of a term is the number of documents in the corpus divided by the
document frequency of the term:
idf(t) = N / df(t)
where N is the number of documents in the corpus.
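As a rough illustration of the quantities described above, the following Python sketch counts the document frequency of each term and computes the plain ratio N / df(t). The three-document corpus and the function names are made up for this example and are not taken from the project. Note that scikit-learn's TfidfVectorizer applies a log-scaled, smoothed variant of IDF rather than this plain ratio.

```python
from collections import Counter

def document_frequency(docs):
    """Return df(t): the number of documents that contain each term t."""
    df = Counter()
    for doc in docs:
        # set() ensures a term repeated inside one document is counted once
        df.update(set(doc.lower().split()))
    return df

def inverse_document_frequency(docs):
    """Return idf(t) = N / df(t), the plain ratio described in the text."""
    n_docs = len(docs)
    return {term: n_docs / count
            for term, count in document_frequency(docs).items()}

# Hypothetical toy corpus, not taken from the project's dataset.
corpus = [
    "government confirms new policy",
    "shocking secret the government hides",
    "new study confirms findings",
]
df = document_frequency(corpus)
idf = inverse_document_frequency(corpus)
print(df["government"], idf["government"])  # 2 1.5
print(df["shocking"], idf["shocking"])      # 1 3.0
```

A term that appears in fewer documents ("shocking") receives a higher IDF than one that appears in many ("government"), which is why rare terms carry more weight in the final feature vector.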
LOGISTIC REGRESSION:
The sigmoid function takes any real-valued number as input and maps it to a
value between 0 and 1, representing the probability of the data point belonging
to the positive class (class 1). It is defined as follows:
σ(z) = 1 / (1 + e^(-z))
where:
z = β₀ + β₁x₁ + β₂x₂ + ... + βᵣxᵣ is the linear combination of the input features x₁, x₂, ..., xᵣ.
β₀, β₁, β₂, ..., βᵣ are the coefficients (weights) associated with each feature.
The logistic regression model predicts the probability (p) of the data point
belonging to the positive class (class 1) using the sigmoid function. The
probability of belonging to the negative class (class 0) can be calculated as (1
- p).
p(y=1 | x) = σ(z)
where x is the feature vector of the data point and z is the weighted sum of its features.
Training the logistic regression model involves finding the optimal values of
the coefficients (β₀, β₁, β₂, ..., βᵣ) that best fit the training data. This is typically
done using optimization techniques like gradient descent, which minimizes a
cost function that quantifies the difference between the predicted probabilities
and the true class labels in the training data.
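The training procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's implementation (which relies on scikit-learn): the one-feature toy dataset, the learning rate, the number of epochs, and the function names are all chosen for this example. The cost function being minimized is the average log loss, whose gradient with respect to each coefficient is the mean of (p - y) times the corresponding feature value.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(beta, x):
    """Return p(y=1 | x) for coefficients beta = [b0, b1, ..., br]."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return sigmoid(z)

def fit(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the average log loss."""
    n, r = len(X), len(X[0])
    beta = [0.0] * (r + 1)
    for _ in range(epochs):
        grad = [0.0] * (r + 1)
        for x, label in zip(X, y):
            err = predict_proba(beta, x) - label  # (p - y)
            grad[0] += err                        # intercept term
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        # step against the mean gradient
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical one-feature dataset: small values are class 0, large are class 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
beta = fit(X, y)
print(predict_proba(beta, [0.0]), predict_proba(beta, [3.0]))
```

After training, the predicted probability for the smallest input falls below 0.5 and the one for the largest input rises above it, so thresholding at 0.5 separates the two classes.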
1. Data Preprocessing:
4. Real-time Prediction:
• The trained model's 'predict()' method is used to predict the label for the
new data point.
• The model's accuracy on the training and test data is computed to monitor
its performance.
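The accuracy check mentioned above amounts to counting how often the predicted label matches the true one. The sketch below shows that computation in plain Python; the label lists are invented for illustration, and the convention 1 = fake, 0 = real is an assumption. In scikit-learn the same figure is returned by `accuracy_score` from `sklearn.metrics`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels (assumed convention: 1 = fake, 0 = real).
train_true = [1, 0, 1, 1, 0, 0, 1, 0]
train_pred = [1, 0, 1, 1, 0, 0, 1, 1]
test_true = [1, 0, 0, 1]
test_pred = [1, 0, 1, 0]

train_acc = accuracy(train_true, train_pred)
test_acc = accuracy(test_true, test_pred)
print(train_acc, test_acc)  # 0.875 0.5
```

Comparing the two numbers is the point of computing both: a training accuracy far above the test accuracy, as in this made-up case, is a common sign of overfitting.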
3.2 NON-FUNCTIONAL REQUIREMENTS
1. Portability:
2. Performance:
3. Accuracy:
The system returns accurate results for each query and retrieves information
quickly. It also provides a high and effective degree of security.
4. Maintainability:
The project is simple, so further updates can be made easily without affecting
its stability. Maintainability describes how easy it is to analyse, change, and
test the application.
• System - Pentium-IV
• Speed - 2.4 GHz
• RAM - 512 MB
• Operating System - Windows XP
• Coding language
The system architecture for fake news detection involves data collection from
a CSV file containing labeled news articles. The text data is preprocessed to
remove special characters, tokenized into words, and stemmed. TF-IDF
vectorization is applied for feature extraction. The Logistic Regression model
is trained on the feature-extracted data and evaluated for performance using
accuracy metrics on test data. The system is capable of making real-time
predictions for new news articles. Logging mechanisms handle errors and
system activities for debugging. The architecture aims to efficiently detect
fake news from real news articles using machine learning and natural
language processing techniques.
4.2 IMPLEMENTATION
A. Data Use - Using pandas, we read the .csv file, display the shape of the
dataset, and display the dataset itself in tabular form. The data is used for
both training and testing the model. Because we use supervised learning, every
article in the dataset carries a label.
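The loading and splitting step can be sketched as follows. The project reads the file with pandas (`pandas.read_csv` and the `shape` attribute); to keep this illustration dependency-free it uses the standard `csv` module instead. The column names, the inline sample rows, the 75/25 split, and the helper names are all assumptions made for the example.

```python
import csv
import io
import random

# Hypothetical stand-in for the CSV file; the real column names may differ.
SAMPLE_CSV = """title,text,label
Budget passed,Parliament approved the annual budget today,0
Miracle cure,Doctors hate this one weird trick,1
Rain expected,The weather office forecasts rain this week,0
Aliens land,Secret sources say aliens met world leaders,1
"""

def load_rows(handle):
    """Read labeled articles into a list of dicts (one per CSV row)."""
    return list(csv.DictReader(handle))

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

rows = load_rows(io.StringIO(SAMPLE_CSV))
print((len(rows), len(rows[0])))        # shape as (rows, columns): (4, 3)
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))  # 3 1
```

Fixing the seed makes the split reproducible, so the accuracy figures reported for the training and test sets refer to the same partition on every run.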
B. Data Preprocessing - Data preprocessing in the project code involves several
essential steps to prepare the news articles' raw text data for further analysis
and model training. First, special characters, numbers, and punctuation marks
are removed using regular expressions. Next, all words in the text are converted
to lowercase to ensure case-insensitive processing. Tokenization is then
applied to split the text into individual words or tokens. The PorterStemmer
class from 'nltk.stem.porter' is used to stem words to their root form. Lastly,
common stopwords are removed using the 'nltk.corpus.stopwords' module.
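The steps above can be sketched as a single function. This is a hedged illustration rather than the project's code: the stopword set is a small hand-written stand-in for 'nltk.corpus.stopwords', the stemmer is passed in as a parameter and defaults to leaving words unchanged, and the function name and sample sentence are invented. Stopwords are filtered before stemming here so that they are matched in their original form.

```python
import re

# Small hypothetical stopword list; the report uses nltk.corpus.stopwords.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "this"}

def preprocess(text, stem=lambda word: word, stopwords=STOPWORDS):
    """Clean one article and return a space-joined string of tokens."""
    # Replace digits, punctuation, and other symbols with spaces
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    # Lowercase, then tokenize on whitespace
    tokens = letters_only.lower().split()
    # Drop stopwords and stem whatever remains
    kept = [stem(tok) for tok in tokens if tok not in stopwords]
    return " ".join(kept)

print(preprocess("BREAKING: 5 Senators are HIDING the truth!!!"))
# breaking senators hiding truth
```

To reproduce the stemming step from the report, the `stem` method of an `nltk.stem.porter.PorterStemmer` instance can be passed as the `stem` argument, provided NLTK is installed.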
4.3 UML DIAGRAMS
1. USE CASE DIAGRAM:
The purpose of a use case diagram is to capture the dynamic aspect of a system.
It is used to gather the requirements of a system, including internal and
external influences. The main purpose of a use case diagram is to show what
system functions are performed for which actor. Roles of the actors in the
system can be depicted. The UML is a very important part of developing
object-oriented software and the software development process. The UML
uses mostly graphical notations to express the design of software projects.
2. CLASS DIAGRAM:
The class diagram describes the structure of a system by showing the system's
classes, their attributes, operations, and the relationships among the classes. It
shows which classes make up the system and what each of them is responsible for.
It is also known as a structural diagram.
3. SEQUENCE DIAGRAM:
5. OUTPUT SCREENS
6. INTERNSHIP FEEDBACK
During the internship focused on fake news detection using data analytics,
there were notable positive aspects to commend. The project's choice aligned
with the timely concern of misinformation, demonstrating a keen
understanding of current issues. Proficiency in utilizing data analytics
techniques to discern patterns and trends within the dataset was evident, and
the successful implementation of algorithms for fake news detection
showcased technical competence. Effective communication of findings,
making the complexities of fake news detection accessible to a non-technical
audience, was a noteworthy skill.
However, the internship also presented its set of challenges. Dealing with data
quality issues, including noise and unreliability, required thoughtful strategies.
Addressing the imbalance between genuine and fake news instances in the
dataset posed a common challenge, and difficulties in feature engineering were
encountered in selecting informative features for distinguishing between real
and fake news. Model tuning, with potential overfitting or underfitting issues,
added another layer of complexity. Ethical considerations, such as avoiding
biases in the model, were also navigated.
7. CONCLUSION AND FUTURE SCOPE
APPENDIX A
Batch No: 1
Title: FAKE NEWS DETECTION
Roll No       Name
20311A1201    J. POORNIMA
20311A1202    P. HASMITA REDDY
20311A1232    SRINIDHI YEDULLA
APPENDIX B
CORRELATION BETWEEN THE GROUP PROJECT AND THE PROGRAM OUTCOMES (POs), PROGRAM SPECIFIC OUTCOMES (PSOs)
Batch No: 1
Title
Roll No Name
20311A1201 J. POORNIMA
PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1  PSO2  PSO3
M    L    L    H    H    L    M    H    M    H     H     H     H     H     M
APPENDIX C
Batch No: 1
Title: FAKE NEWS DETECTION
Roll No       Name
20311A1201    J. POORNIMA
20311A1202    P. HASMITA REDDY
Table 2: Nature of the Project/Internship work (Please tick √ Appropriate for your project)
Batch No    Title
1           FAKE NEWS DETECTION
Table 3: Domain of the Project/ Internship work (Please tick √ Appropriate for your project)
Batch No: 1
Title: FAKE NEWS DETECTION
Domains:
• Artificial Intelligence, Machine Learning and Deep Learning
• Computer Networks, Information Security, Cyber Security
• Data Warehousing, Data Mining, Big Data Analytics
• Cloud Computing, Internet of Things
• Software Engineering, Image Processing
√