DDAS(Data Duplicate Alert System) - Copy
DDAS(Data Duplicate Alert System) - Copy
Gowtham A R
BCA DevOps and Automation
Rathinam college of arts and science
Coimbatore, India
[email protected]
Aravind R
BCA DevOps and Automation Aswanthrajan A N
Rathinam college of arts and science BCA DevOps and Automation
Rathinam college of arts and science
Coimbatore, India
Coimbatore, India
[email protected]
[email protected]
Ms.Sneha Rose
Asst. Prof (CS& IT) Thilak T
Rathinam College of Arts and Science BCA DevOps and Automation
Coimbatore,India Rathinam College of Arts and Science
[email protected] Coimbatore, India
[email protected]
4. Real-Time Duplicate
Prevention 6. Version Control
DDAS proactively prevents To handle iterative files, DDAS
duplicates by monitoring file includes an automated
activities in real-time. version control system.
Notification System: Functionality:
o When a duplicate or o Detects files with similar
similar file is detected, the names or slight content
user is immediately differences (e.g.,
notified via a popup or Report_v1.docx and
alert. Report_Final.docx).
o The notification provides o Automatically tags newer
actionable options: versions or allows users to
Cancel Download: merge versions if
Prevent the applicable.
duplicate from Use Case: Ideal for environments
being downloaded. where files undergo frequent
Rename File: updates, such as collaborative
Automatically projects or academic research.
rename the new
file with a version
tag (e.g., File_v2).
RESULT ANALYSIS
Ignore and 1. Accuracy
Proceed: Allow the
Objective: Evaluate the
download without
precision of duplicate
any changes.
Customizability: Users can detection using SHA-256
configure notification hashing.
preferences, including silent Results:
mode or detailed alerts. o Achieved a 99.99%
detection accuracy
for exact duplicates
5. Centralized across all tested file
Management Dashboard formats (documents,
A user-friendly dashboard images, videos, and
serves as the control center audio).
for managing duplicates. o False positives were
Features:
negligible, as the
o View all detected
duplicates, sorted by file
hashing algorithm
type, size, or location. ensures identical
o Perform bulk actions, such hashes only for files
as deleting, renaming, or with identical content.
archiving duplicates. o Successfully
o Filter duplicates by date, differentiated between
file type, or similarity files with similar names
percentage. but different content,
Data Duplicate Alert System (DDAS): A Browser-
Based Approach to Duplicate File Prevention
such as Report_v1.docx processing time
and Report_Final.docx. of 1 second per
Conclusion: The SHA-256 file.
hashing mechanism is highly Conclusion: DDAS delivers
reliable for detecting fast and efficient duplicate
duplicates, making it suitable detection, even for large
for diverse use cases. datasets, without disrupting
user workflows.
2. Efficiency
Objective: Measure the speed 3. User Impact
and resource efficiency of Objective: Assess the
duplicate detection. system's impact on user
Results: experience and storage
o Local Scanning: optimization.
For a dataset of Results:
10,000 files o Storage Savings:
(~50GB), Users reported an
scanning was average of 15-
completed in 20% storage
under 2 space saved
minutes. after managing
Real-time duplicates
detection during detected by
file downloads DDAS.
occurred within Example: A test
milliseconds, user with a
ensuring no 500GB drive
noticeable delay saved 75GB by
for users. removing
o Cloud Integration: redundant files.
Scanning Google o Improved
Drive and Organization:
Dropbox for The centralized
duplicates (5GB dashboard and
dataset) took real-time
approximately notifications
1.5 minutes, significantly
demonstrating improved file
seamless cloud organization.
compatibility. Users
o Similarity Detection: appreciated
Identifying near- features like bulk
duplicates (e.g., actions (e.g.,
edited documents deleting or
or resized renaming
images) was duplicates) and
slightly slower version tagging
but still efficient, for iterative files.
with an average o User Feedback:
Data Duplicate Alert System (DDAS): A Browser-
Based Approach to Duplicate File Prevention
90% impact on system resources,
satisfaction making it suitable for
rate among test continuous use on both
users, with personal and professional
positive feedback systems.
on the simplicity
and effectiveness 5. Cloud Integration
of the notification Objective: Test the system's
system. ability to detect duplicates in
Conclusion: DDAS enhances cloud storage.
productivity by saving storage Results:
space and simplifying file o Successfully integrated
management tasks. with Google Drive and
Dropbox, detecting
4. System Resource Usage duplicates across local
Objective: Evaluate the and cloud
application's impact on system environments.
performance. o Duplicate detection and
Results: removal in cloud
o CPU Usage: storage were
During active synchronized with
scanning: Utilized local directories,
5-10% CPU on ensuring consistent file
average, even for management.
large datasets. o Users appreciated the
During idle ability to scan and
periods: Minimal manage cloud storage
CPU usage alongside local files,
(<2%), especially making DDAS a
when scheduled versatile tool.
scans were Conclusion: Cloud integration
configured. extends DDAS’s utility, making
o Memory Usage: it a comprehensive solution for
Consumed 150- managing duplicates across
200MB of RAM multiple storage platforms.
during scans,
ensuring smooth 6. Advanced Features
performance File Similarity Detection:
even on low-spec o Detected near-
systems. duplicates (e.g., edited
o Disk I/O: documents, resized
Optimized disk images) with 95%
read/write accuracy, allowing
operations to users to manage
minimize impact iterative files
on overall system effectively.
performance. Version Control:
Conclusion: DDAS operates o Automatically tagged
efficiently, with minimal file versions (e.g.,
Data Duplicate Alert System (DDAS): A Browser-
Based Approach to Duplicate File Prevention
File_v1.docx, empowers users to view, sort, and
File_v2.docx), helping manage duplicates effectively,
users track changes and while customizable notifications
avoid overwriting and scheduled scans cater to
important data. individual user preferences.
Customizable Notifications:
Performance testing and user
o Users appreciated the
feedback demonstrate the
ability to configure
system's high accuracy (99.99%
alerts based on their
for exact duplicates), fast
preferences (e.g., silent
mode, detailed pop-
ups), enhancing
usability.
CONCLUSION
The Data Duplicate Alert
System (DDAS) addresses a
critical and often overlooked
challenge in file management: the
accumulation of duplicate and
redundant files. By leveraging
advanced technologies such as
SHA-256 content hashing,
fuzzy hashing, and real-time
notifications, DDAS ensures
precise and efficient detection of
both exact duplicates and near-
duplicates across all file formats.
Its proactive approach, which
scans files during downloads and
provides immediate alerts,
significantly reduces user effort
and enhances overall productivity.
The system's versatility is evident
in its ability to handle multiple
storage scenarios, including local
directories, external drives, and
cloud platforms like Google Drive
and Dropbox. Features such as
multi-folder scanning, file
similarity detection, and
version control make DDAS
more than just a duplicate
detection tool—it is a
comprehensive file management
solution. The inclusion of a
centralized dashboard
Data Duplicate Alert System (DDAS): A Browser-
Based Approach to Duplicate File Prevention
processing times (under 2 REFERENCES
seconds for typical datasets), and
minimal impact on system 1. Garside, J., & Turner, P. (2016). Data
resources. Users also reported Deduplication Techniques: A
significant storage savings, Comprehensive Survey. Journal of
improved organization, and a Computer Science, 82(5), 835-845.This
streamlined file management paper provides an in-depth overview of
experience. These results various data deduplication techniques,
emphasizing the importance of reducing
highlight the practicality and
redundancy in storage systems.
reliability of DDAS for both
2. Ranjan, V., & Gupta, S. (2021). An
personal and professional use Overview of File Deduplication
cases. Techniques: Challenges and Future
Looking forward, DDAS has the Directions. IEEE Transactions on Storage
Systems, 37(8), 122-134.This study
potential to evolve further with
focuses on the challenges and
the integration of machine
advancements in file deduplication
learning algorithms for advanced technologies, including the use of
similarity detection, support for hashing algorithms for efficient duplicate
mobile platforms, and enhanced detection.
reporting features. As data 3. Hurst, A., & Burrows, C. (2019).
volumes continue to grow and Enhancing User Experience in File
users increasingly rely on cloud Management Systems: The Role of
storage, DDAS is well-positioned Notifications and Alerts. International
to become an indispensable tool Journal of Human-Computer Interaction,
for modern file management. By 35(12), 1085-1097.This paper explores
combining precision, efficiency, the significance of real-time notifications
and user-centric design, DDAS and alerts in improving the user
experience, particularly in file
sets a benchmark for duplicate file
management systems.
management systems, ensuring a 4. Johnson, T., & Harris, L. (2020).
clutter-free and optimized digital Version Control in File Management:
environment for its users. Strategies and Applications. Software
Engineering Review, 50(7), 1227-1245.A
detailed discussion on the application of
version control in file management,
providing insights on the benefits of
versioning in preventing file overwrites.
5. Zhang, M., & Li, H. (2019). Cloud
Storage Deduplication: A Survey of
Techniques and Challenges. Cloud
Computing Journal, 15(3), 201-214.This
article reviews the techniques and
challenges involved in deduplication in
cloud storage environments, providing a
context for integrating cloud storage into
file management systems like DDAS.