PLAGIARISM CHECKER
TUMAKURU-572105
(A Constituent College of Sri Siddhartha Academy of Higher Education)
Mini-Project (CS6MP1) Report On
THEJAS K U (21CS122)
MONISH S (21CS123)
CERTIFICATE
Certified that the mini project work entitled “SIDE BY SIDE PLAGIARISM CHECKER”
is a bona fide work carried out by THEJAS K U (21CS122) and MONISH S (21CS123) in
partial fulfillment of the requirements of the VI Semester of the Bachelor of Engineering in the
Department of Computer Science & Engineering at Sri Siddhartha Institute of Technology, a
Constituent College of Sri Siddhartha Academy of Higher Education, during the academic
year 2024-25. It is certified that all corrections/suggestions indicated for internal assessment
have been incorporated in the report deposited in the department library. The mini
project report has been approved as it satisfies the academic requirements in respect of
mini project work prescribed for the Bachelor of Engineering degree.
Contents
Abstract
1 Introduction
1.1 Problem Statement
1.2 Aim
1.3 Objectives
2 Literature Survey
2.1 Existing System
2.2 Proposed System
3 Requirements
3.1 System Hardware
3.2 Software Requirements
4 Design
4.1 ER Diagram
4.2 Data Flow Diagram
5 Source Code
6 Snapshots
6.1 Home
6.2 Importing Text file
6.3 Importing Data file
6.4 Result
6.5 Searching a word
Conclusion
Bibliography
Abstract
This Python-based plagiarism checker is a software application that helps users
check any document or text file they upload to the system for plagiarism. It is particularly
useful for academic institutions, research firms, and writers who want to ensure that
their work is original and free from plagiarism. The project is important because
plagiarism is a major issue in academia and other fields, where it can lead to academic
misconduct, professional repercussions, and legal issues. The application addresses this
problem by making it easier for users to check their work for originality before submitting
it. The app uses Selenium and Beautiful Soup, driven by a Python script, to automate the
whole process from submitting written content to retrieving the plagiarism report. The
application is useful to anyone who needs to submit written work, including students,
academics, and professionals, and particularly to educational institutions that need to
check large volumes of written work for plagiarism. The application supports plain-text
(TXT) files. It is released under an open-source license, meaning it is freely available to
anyone who wishes to use it.
Chapter 1
Introduction
Plagiarism, the unethical act of using someone else’s work or ideas without giving them
credit, is a growing problem in many fields. In the digital age, with vast amounts of
information available online, plagiarism has become increasingly prevalent. Whether in
academia or content creation, ensuring originality is crucial. To tackle this challenge, we
introduce a Python-based plagiarism checker. Using text-similarity techniques, specifically
matching sequences of N consecutive words between documents, our tool provides an
efficient and reliable way to detect potential plagiarism. With this plagiarism checker, users
can compare texts side by side, identifying similarities and potential instances of plagiarism.
This introduction serves as a gateway to the functionality and implementation of our
plagiarism checker, which helps users uphold integrity and originality in their work.
1.2 Aim
1. Prevention of Academic Fraud: Stop students from cheating by copying others’
work.
2. Quality Assurance in Writing: Check that written work is original and of good quality.
Side By Side Plagiarism Checker 2024-25
3. Continuous Improvement: Keep improving the plagiarism checker so that it catches new
forms of copying.
4. Scalability Optimization: Optimize the implementation to handle large data efficiently,
possibly by leveraging parallel processing or distributed computing techniques.
5. Improved User Experience: Enhance user satisfaction by offering intuitive interfaces,
efficient search functionality, and personalized services.
6. Protection of Intellectual Property: Make sure authors get credit for their own
work.
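Aim 4 calls for scalability. The check() routine in the Source Code chapter compares every N-word window of the input against every N-word window of every data file, so its work grows with the product of the two text lengths. A common alternative, shown here only as a sketch and not as the project's implementation, is to index the data files' N-word sequences in a hash map once, so that each window of the input needs a single lookup that takes constant time on average. The function names `build_ngram_index` and `matched_positions` are hypothetical.

```python
from collections import defaultdict


def build_ngram_index(corpus, n=3):
    """Map every n-word tuple to the (file_id, start) positions where it occurs.

    `corpus` is a list of data files, each given as a list of normalized words.
    """
    index = defaultdict(list)
    for file_id, words in enumerate(corpus):
        for start in range(len(words) - n + 1):
            index[tuple(words[start:start + n])].append((file_id, start))
    return index


def matched_positions(text, index, n=3):
    """Return the set of word positions in `text` covered by an indexed n-gram."""
    hits = set()
    for start in range(len(text) - n + 1):
        # `in` does not create entries, unlike index[key] on a defaultdict.
        if tuple(text[start:start + n]) in index:
            hits.update(range(start, start + n))
    return hits


corpus = [["the", "quick", "brown", "fox", "jumps"], ["lorem", "ipsum", "dolor"]]
index = build_ngram_index(corpus)
print(sorted(matched_positions(["a", "quick", "brown", "fox", "ran"], index)))  # [1, 2, 3]
```

Building the index costs one pass over the data files; after that, checking an input costs one pass over the input. Parallel or distributed processing, also mentioned in Aim 4, is not shown.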
1.3 Objectives
1. Implement algorithms for text analysis.
2. Utilize techniques for similarity detection.
3. Ensure scalability for handling large datasets efficiently.
4. Develop an intuitive interface for user interaction.
5. Support analysis of diverse content types.
6. Strive for high accuracy and reliability.
7. Provide comprehensive documentation and support resources.
8. Provide clear results for actionable insights.
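Objectives 1 and 2 start with turning raw text into comparable units. The Source Code chapter does this per word with `word.translate(str.maketrans('', '', string.punctuation)).lower()`; the sketch below wraps that same normalization in a hypothetical helper, `normalize`, so its effect is easy to see.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Split on whitespace, strip ASCII punctuation, and lowercase each word."""
    return [word.translate(_STRIP_PUNCT).lower() for word in text.split()]


print(normalize("Hello, World! It's 2024."))  # ['hello', 'world', 'its', '2024']
```

Because only ASCII punctuation is removed, a token made entirely of punctuation (such as a lone hyphen) becomes an empty string rather than disappearing, which matches the behavior of the source code.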
Chapter 2
Literature Survey
2.1 Existing System
The existing approach to plagiarism checking typically relies on manually comparing text
against existing sources. Educators or individuals read through written work and run
searches on search engines or online databases to check for similarities. However, this
manual approach is time-consuming, labor-intensive, and prone to human error, and it
lacks efficiency and accuracy. It also cannot handle large volumes of text efficiently, which
makes it impractical in academic or professional settings with high demand for plagiarism
checking.
2.2 Proposed System
The proposed plagiarism checker aims to improve accuracy, efficiency, and user experience
through several key features. It would incorporate automated detection algorithms, such as
the N-consecutive-word matching implemented in this project. Integration with learning
platforms and writing tools would streamline access for users, while customization options
would let users tailor settings to specific needs and policies. Real-time analysis and detailed
reports would provide immediate feedback and actionable insights, helping users understand
and address plagiarism issues more effectively. Security measures would protect user data,
and scalability would let the system handle large volumes of text and a growing user base
without compromising performance. Continuous improvement and user education would
further enhance the project’s effectiveness in promoting academic integrity and originality.
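The proposed system promises detailed reports. One minimal way to get there from per-word match flags, like those the source code stores in `text_gui`, is to merge consecutive flagged positions into phrases. The helpers `to_spans` and `report` below are hypothetical and only sketch the idea.

```python
def to_spans(positions):
    """Group word positions into inclusive (start, end) runs of consecutive indices."""
    spans = []
    for pos in sorted(set(positions)):
        if spans and pos == spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], pos)  # extend the current run
        else:
            spans.append((pos, pos))         # start a new run
    return spans


def report(words, positions):
    """Render each contiguous matched run as a phrase of the original words."""
    return [" ".join(words[start:end + 1]) for start, end in to_spans(positions)]


words = "to be or not to be that is the question".split()
print(report(words, {0, 1, 2, 7, 8, 9}))  # ['to be or', 'is the question']
```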
2. Object-Oriented: Python supports object-oriented programming (OOP) principles,
allowing you to create and manipulate objects with associated attributes and methods.
3. High-Level: Python abstracts away low-level details, making it easier for developers to
express their ideas concisely and readably. It also provides high-level built-in data
structures such as lists, dictionaries, and sets, which simplify common programming tasks.
4. Dynamic Typing and Dynamic Binding: A Python variable can be rebound to a value of a
different type at runtime, which makes the language flexible. Dynamic binding means that
method calls are resolved at runtime, which enhances code modularity.
5. Rapid Application Development (RAD): Python’s simplicity and expressiveness make
it ideal for RAD; applications can be prototyped and developed quickly.
6. Scripting and Glue Language: Python is commonly used for scripting tasks (automating
repetitive processes) and as a glue language (connecting existing components together).
7. Readability: Python emphasizes readability with its clean and straightforward syntax,
which reduces the cost of program maintenance.
8. Debugging: Python’s error handling is robust. Instead of causing segmentation faults,
errors raise exceptions. Debugging is straightforward, and Python’s introspection
supports effective debugging tools.
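Items 4 and 8 can be illustrated with a few lines of plain Python: the same name is rebound to values of different types, `len()` calls the `__len__` method of whatever object it receives (resolved at runtime), and an out-of-range index raises a catchable `IndexError` instead of crashing the interpreter. The function `describe` is a hypothetical example, not part of the project.

```python
def describe(value):
    # len() calls value.__len__(), looked up on the object's type at runtime.
    return f"{type(value).__name__} of length {len(value)}"


item = [1, 2, 3]
print(describe(item))      # list of length 3
item = {"a": 1}            # the same name now refers to a dict (dynamic typing)
print(describe(item))      # dict of length 1

try:
    [1, 2, 3][10]          # out-of-range access raises an exception
except IndexError as exc:
    print("caught:", exc)  # caught: list index out of range
```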
3. Internet Connection: Optional; needed only for installing dependencies and accessing
online resources. The plagiarism checker itself does not require an internet connection
and works offline.
Chapter 5
Source Code
import tkinter as tk
from tkinter import ttk
from tkinter.font import Font
from tkinter.filedialog import askopenfilename, askopenfilenames
from tkinter import messagebox
import string

def open_file():
    # Only plain-text files can be read: the file is opened as UTF-8 text.
    filepath = askopenfilename(filetypes=[("Text Files", "*.txt"),
                                          ("All Files", "*.*")])
    if not filepath:
        return
    text.clear()
    text_gui.clear()
    with open(filepath, 'r', encoding="utf-8") as text_file:
        text_lines = text_file.readlines()
    word_id = 0
    for line in text_lines:
        for word in line.split():
            # Per-word bookkeeping, mirroring open_files():
            # text_gui holds [word as displayed, plagiarized flag, ends-a-line flag];
            # text holds the punctuation-free, lowercased word used for comparison.
            text_gui.append([word, False, False])
            text.append(word.translate(str.maketrans('', '', string.punctuation)).lower())
            word_id += 1
        if word_id > 0:
            text_gui[word_id - 1][2] = True  # printer() starts a new line after this word
def open_files():
    filepaths = askopenfilenames(filetypes=[("Text Files", "*.txt"),
                                            ("All Files", "*.*")])
    if not filepaths:
        return
    db_gui.clear()
    db_text.clear()
    file_id = 0
    file_dict.clear()
    for filepath in filepaths:
        db_text.append([])
        db_gui.append([])
        filename = filepath.split('/')[-1]  # last path component
        file_dict[filename] = file_id
        with open(filepath, 'r', encoding='utf-8') as data_file:
            text_lines = data_file.readlines()
        word_id = 0
        for line in text_lines:
            for word in line.split():
                # [word as displayed, match state (0/1/2), ends-a-line flag]
                db_gui[file_id].append([word, 0, False])
                # Normalized form used for comparison
                db_text[file_id].append(
                    word.translate(str.maketrans('', '', string.punctuation)).lower())
                word_id += 1
            if word_id > 0:  # guard against a blank first line
                db_gui[file_id][word_id - 1][2] = True
        file_id += 1
    file_menu['values'] = [*file_dict]
    btn_go.config(state="normal")
    file_menu.set([*file_dict][0])
    printer(0)
def check():
    window.config(cursor="watch")
    if not text or not db_text:
        messagebox.showerror("Error", "Select required files to compare")
        window.config(cursor="")
        return 1
    N = int(N_menu.get())  # number of consecutive words compared at a time
    n = N - 1
    text_len = len(text)
    check_text = text[:n]  # first N-1 words; the loop appends the N-th
    #### Searches db_text for every N consecutive words in text ####
    for pos in range(text_len - n):
        check_text.append(text[pos + n])
        for file_id in range(len(db_text)):
            for j in range(len(db_text[file_id]) - n):
                if check_text == db_text[file_id][j:j + N]:
                    for k in range(N):
                        text_gui[pos + k][1] = True
                        db_gui[file_id][j + k][1] = 1
        check_text.pop(0)
    btn_search.config(state="normal")
    printer(0)
    printer(-1)
    window.config(cursor="")
    ##### Cleaning GUI values #####
    for pos in range(text_len):  # reset every word, including the last N-1
        text_gui[pos][1] = False
    for file_id in range(len(db_gui)):
        for j in range(len(db_gui[file_id])):
            db_gui[file_id][j][1] = 0
    return 0
def search():
    # Assumed source of the search phrase: the text currently selected
    # in the main text widget.
    try:
        selected_text = main_text_widget.get(tk.SEL_FIRST, tk.SEL_LAST)
    except tk.TclError:  # raised when nothing is selected
        selected_text = ""
    selected_text_list = [x.translate(str.maketrans('', '', string.punctuation)).lower()
                          for x in selected_text.split()]
    n = len(selected_text_list)
    if n > 0:
        for file_id in range(len(db_text)):
            for j in range(len(db_text[file_id]) - (n - 1)):
                if selected_text_list == db_text[file_id][j:j + n]:
                    for k in range(n):
                        db_gui[file_id][j + k][1] = 2  # mark as search hit
                    printer(file_id)
                    return 0
    messagebox.showerror("Error: Not Found!", "Please select different text")
    return 1
def printer(file_id):
    # Tag names other than 'normal' are truncated in the source listing;
    # 'plagiarized' and 'search' are assumed names.
    if file_id == -1:
        # Redraw the suspect text in the main widget and update the ratio.
        main_text_widget.delete("1.0", tk.END)
        text_len = len(text)
        plag_count = 0
        ## *** Printing the text to main text widget *** ##
        for word in range(text_len):
            if not text_gui[word][1]:
                main_text_widget.insert(tk.END, text_gui[word][0] + " ", 'normal')
            else:
                main_text_widget.insert(tk.END, text_gui[word][0] + " ", 'plagiarized')
                plag_count += 1
            if text_gui[word][2]:
                main_text_widget.insert(tk.END, "\n", 'normal')
        p_ratio = (plag_count / text_len) * 100 if text_len else 0.0
        ratio['value'] = p_ratio
        ratio_label.config(text=f"Plagiarism Ratio: {p_ratio:.2f}%")
    else:
        # Redraw data file `file_id` in the comparison widget.
        comp_text_widget.delete("1.0", tk.END)
        len_n = len(db_gui[file_id])
        for word in range(len_n):
            state = db_gui[file_id][word][1]  # 0 = no match, 1 = plagiarized, 2 = search hit
            if state == 0:
                comp_text_widget.insert(tk.END, db_gui[file_id][word][0] + " ", 'normal')
            elif state == 1:
                comp_text_widget.insert(tk.END, db_gui[file_id][word][0] + " ", 'plagiarized')
            else:
                comp_text_widget.insert(tk.END, db_gui[file_id][word][0] + " ", 'search')
            if db_gui[file_id][word][2]:
                comp_text_widget.insert(tk.END, "\n", 'normal')
    return 0
N = 3
text = []        # normalized words of the suspect file
text_gui = []    # [word, plagiarized flag, ends-a-line flag] per word
db_text = []     # normalized words, one list per data file
db_gui = []      # [word, match state, ends-a-line flag] per word, per data file
file_dict = {}   # data file name -> index into db_text / db_gui
# Main window
window = tk.Tk()
window.title("Plagiarism Check")  # Set window's title
## Main window conf.
window.rowconfigure(0, minsize=800, weight=1)
## Left side (Button Frame)
btn_frame = tk.Frame(window, relief=tk.RAISED, bd=4)
## Button widgets
btn_file = tk.Button(btn_frame, text="Import Text File", command=open_file)
btn_files = tk.Button(btn_frame, text="Import Data Files", command=open_files)
### N Selecting Label
N_label = tk.Label(btn_frame, text="Select Level")
### N Selecting default int variable
default_N = tk.IntVar(window)
default_N.set(3)
### N Selecting Combobox
N_menu = ttk.Combobox(btn_frame, state="readonly", textvariable=default_N)
N_menu['values'] = [3]
btn_check = tk.Button(btn_frame, text="Check", command=check)
btn_search = tk.Button(btn_frame, state="disabled", text="Search",
                       command=search)
## Left text frame (Main text)
first_frame = tk.Frame(window, relief=tk.RAISED, bd=1)
### Frame's row-column config
first_frame.rowconfigure(2, weight=1)
first_frame.columnconfigure(0, weight=1)
### Plagiarism Ratio Label
ratio_label = tk.Label(first_frame, text="Plagiarism Ratio")
### Plagiarism Ratio Bar (default maximum is 100, i.e. a percentage)
ratio = ttk.Progressbar(first_frame, length=100, mode='determinate')
### Main text widget
#### Placement ####
# Main Frames
N_label.grid(row=4, column=0, sticky="ew", padx=5, pady=10)
N_menu.grid(row=5, column=0, sticky="ew", padx=5, pady=10)
btn_frame.grid(row=0, column=0, sticky="ns")
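The matching logic in check() is interleaved with Tkinter state (text_gui, db_gui, the cursor and the buttons). As a sketch based on that reading of the code, the same sliding-window comparison can be restated as GUI-free functions: every run of n consecutive normalized words in the input is compared with every run of n words in each data file, and the words inside a matching run are flagged on both sides. The names `tokenize`, `find_matches`, and `plagiarism_ratio` are hypothetical; `n=3` mirrors the only value offered by the Select Level combobox.

```python
import string

# Same normalization as open_file()/open_files(): strip ASCII punctuation, lowercase.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def tokenize(text):
    return [w.translate(_STRIP_PUNCT).lower() for w in text.split()]


def find_matches(text_words, db_files, n=3):
    """Flag words covered by any n-word window that also occurs in a data file.

    Returns (text_flags, db_flags), playing the role of the [1] slots
    of text_gui and db_gui in the source code.
    """
    text_flags = [False] * len(text_words)
    db_flags = [[False] * len(words) for words in db_files]
    for i in range(len(text_words) - n + 1):
        window = text_words[i:i + n]
        for file_id, words in enumerate(db_files):
            for j in range(len(words) - n + 1):
                if words[j:j + n] == window:
                    for k in range(n):
                        text_flags[i + k] = True
                        db_flags[file_id][j + k] = True
    return text_flags, db_flags


def plagiarism_ratio(text_flags):
    """Percentage of input words that fall inside at least one matching window."""
    return 100 * sum(text_flags) / len(text_flags) if text_flags else 0.0


suspect = tokenize("The quick brown fox jumps over the lazy dog.")
source = [tokenize("A quick brown fox was seen near a lazy dog.")]
flags, _ = find_matches(suspect, source)
print(f"Plagiarism Ratio: {plagiarism_ratio(flags):.2f}%")  # Plagiarism Ratio: 33.33%
```

Separating the comparison from the widgets in this way lets it be tested without opening a window; the GUI would then only translate the returned flags into text tags.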
Snapshots
6.1 Home
6.4 Result
Bibliography