Documentation - Secure Distributed Deduplication
Report on
<<>>
Thesis submitted in partial fulfillment of the requirement for the award of the degree of
Master of Technology
In
<<Branch Name>>
Submitted By
<< Name of the Student >>
<< Hall Ticket No >>
CERTIFICATE
name>> bearing <<Roll number>> in partial fulfillment of the requirement for the award of the
degree of M.Tech in << Branch Name >> to the <<College Name>>, <<University Name >> is a
record of bonafide work carried out by him/her under my guidance. The results presented in this
thesis have been verified and are found to be satisfactory. The results embodied in this thesis have not
been submitted to any other University for the award of any other degree or diploma.
EXTERNAL PRINCIPAL
Next page: This page is required for the candidates who pursue the project work outside (other
organization), under the supervision of an EXTERNAL GUIDE. This page is another certificate
given on the organization letterhead where the project work is pursued, which should be certified
and duly signed by the External guide about work done by the candidate and clearly mentioning
the duration of the project work. The format for this page will be almost the same as the previous
page, with double spacing between the lines.
(or)
Include the Declaration page in the below specified Format:
DECLARATION
<<College Name >>, would like to declare that the project titled “<<project title>>”, submitted in partial
fulfillment of the M.Tech Degree course of << University Name >>, is my original work in the year
2015 under the guidance of << Guide Name>>, <<Designation>> of the << Department
(Roll Number)
Acknowledgement
I thank the H.O.D <<HOD Name>> for his effort and guidance, and all senior faculty
members of the CSE Department for their help during my course. Thanks to the programmers and
non-teaching staff of the C.S.E Department of VITS.
Finally, special thanks to my parents for their support and encouragement throughout my
life and this course. Thanks to all my friends and well-wishers for their constant support.
Seventh page: The seventh page may contain an ABSTRACT of the Thesis. The candidate
may emphasize his/her contributions here.
Page 8th, 9th, …: In these pages the candidate must provide a TABLE OF CONTENTS, a list of
tables, a list of figures, screens, and notation.
NOTE: All the above pages are to be numbered in Roman numerals of lower case.
Arrangement of chapters: The following is the suggested format for arranging the Thesis
matter into various chapters:
1) Introduction (Do not copy as-is from the Base Paper)
2) Literature Survey ( Quantity should be 10 – 15 Pages)
3) SRS (Preferably IEEE Format - Do not include Technology Description)
4) Analysis and Design (Preferably ER-Diagrams & DFD)
5) Implementation ( Architecture, Algorithms used, Module detailed desc)
6) Testing (Min. 10 test cases)
7) Results (Navigation should be provided)
8) Conclusion and Future Scope (In two paragraphs)
9) Reference / Bibliography (Format is given below)
10) Appendices (if any)
a) Standards (if any)
b) Sample Code(Optional)
Graphs: The graph should clearly indicate the points which are used for drawing the curve or
curves. All the letters in the graph should be written with stencils.
Bibliography or References: The following format may be used for writing the
Bibliography /References.
At the end of the Thesis, where the listing of references is done, the list should be made
strictly in alphabetic order of the names of the authors. The references/websites have to be listed
in the following format:
[S.No] Author, Paper/Book, Publisher/Magazine/Conference, Volume/Edition, page numbers, Year
Examples:
[1] Bruce, Cryptography, Tata McGraw Hill, 1978
[2] R. R. Duncan, "Remediation of Lead in Water Supplies," IEEE Trans. Microwave Theory
Tech., vol. 99, no. 18, pp. 257-278, Nov. 1986.
[3] http://www.google.com
Paper, Typing and Format: Bond paper should be used for the preparation of the Thesis.
Typing should be done in 12-point size letters for the running text, 14-point size for sub-headings
and 16-point size for main headings/titles/chapter names, etc. The font type should preferably be
TIMES NEW ROMAN.
The layout margins to be provided are 1.5" on the left, 1" on the top and bottom, and 1" on the right.
Each fresh paragraph should commence after a tab space. 1.5 line spacing shall be provided
throughout the Thesis.
The page number shall be indicated at the bottom right of each page: front pages in small
Roman numerals (excluding the title page, certificate page and acknowledgement page), body pages
as 1, 2, 3, …, and annexures as 1, 2, 3, … (separate numbering for each Annexure).
Binding: The Thesis shall be properly bound, using resin of cement gray color (cement
color) for M.Tech. The bound front cover should indicate the following in golden embossed
letters:
1. INTRODUCTION
With the explosive growth of digital data, deduplication techniques are widely employed to
back up data and to minimize network and storage overhead by detecting and eliminating
redundancy among data. Instead of keeping multiple data copies with the same content,
deduplication eliminates redundant data by keeping only one physical copy and referring other
redundant data to that copy. Deduplication has received much attention from both academia and
industry because it can greatly improve storage utilization and save storage space, especially for
applications with a high deduplication ratio such as archival storage systems. A number of
deduplication systems have been proposed based on various deduplication strategies, such as
client-side or server-side deduplication and file-level or block-level deduplication. A brief review is
given in Section 6. In particular, with the advent of cloud storage, data deduplication techniques
become more attractive and critical for the management of ever-increasing volumes of data in
cloud storage services, which motivates enterprises and organizations to outsource data storage to
third-party cloud providers, as evidenced by many real-life case studies. According to an
analysis report of IDC, the volume of data in the world is expected to reach 40 trillion gigabytes
in 2020. Today's commercial cloud storage services, such as Dropbox, Google Drive and Mozy,
have been applying client-side deduplication to save network bandwidth and storage cost.

There are two types of deduplication in terms of the deduplication unit: (i) file-level
deduplication, which discovers redundancies between different files and removes these
redundancies to reduce capacity demands, and (ii) block-level deduplication, which discovers and
removes redundancies between data blocks. A file can be divided into smaller fixed-size or
variable-size blocks. Using fixed-size blocks simplifies the computation of block boundaries,
while using variable-size blocks (e.g., based on Rabin fingerprinting) provides better
deduplication efficiency.

Though deduplication can save storage space for cloud storage service providers, it reduces
the reliability of the system. Data reliability is a very critical issue in a deduplication storage
system because there is only one copy of each file stored in the server, shared by all of its owners.
If such a shared file/chunk is lost, a disproportionately large amount of data becomes inaccessible
because of the unavailability of all the files that share this file/chunk. If the value of a chunk were
measured in terms of the amount of file data that would be lost in case of losing a single chunk,
then the amount of user data lost when a chunk in the storage system is corrupted grows with the
commonality of the chunk. Thus, how to guarantee high data reliability in a deduplication system
is a critical problem. Most previous deduplication systems have only been considered in a
single-server setting. However, many deduplication systems and cloud storage systems are relied
upon by users and applications for higher reliability, especially in archival storage systems where
data are critical and should be preserved over long time periods. This requires that deduplication
storage systems provide reliability comparable to other highly available systems.

Furthermore, the challenge for data privacy also arises as more and more sensitive data are
outsourced by users to the cloud. Encryption mechanisms have usually been utilized to protect
confidentiality before outsourcing data to the cloud. Most commercial storage service providers
are reluctant to apply encryption over the data because it makes deduplication impossible: the
traditional encryption mechanisms, including public key encryption and symmetric key
encryption, require different users to encrypt their data with their own keys, so identical data
copies of different users will lead to different ciphertexts. To solve the problems of
confidentiality and deduplication, the notion of convergent encryption has been proposed and
widely adopted to enforce data confidentiality while realizing deduplication. However, these
systems achieve confidentiality of outsourced data at the cost of decreased error resilience.
Therefore, how to protect both confidentiality and reliability while achieving deduplication in a
cloud storage system is still a challenge.
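To make the convergent-encryption idea concrete, the following is a minimal sketch of the notion: the key is derived from the file content itself, so identical plaintexts yield identical ciphertexts and identical deduplication tags. It is an illustration under assumptions of ours, not the exact scheme of any system surveyed here; the class and method names are ours, and deterministic AES-ECB stands in for a purpose-built deterministic cipher.

import java.security.MessageDigest;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Minimal convergent-encryption sketch (illustrative only).
public class ConvergentEncryption {

    // Convergent key: K = H(M), derived deterministically from the message.
    static byte[] convergentKey(byte[] message) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(message);
    }

    // Ciphertext: C = E_K(M); ECB is used only because convergent
    // encryption needs deterministic ciphertexts.
    static byte[] encrypt(byte[] message, byte[] key) throws Exception {
        SecretKeySpec k = new SecretKeySpec(key, 0, 16, "AES");
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, k);
        return c.doFinal(message);
    }

    // Deduplication tag: T = H(C); equal files yield equal tags.
    static byte[] tag(byte[] ciphertext) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(ciphertext);
    }

    public static void main(String[] args) throws Exception {
        byte[] file = "the same content".getBytes("UTF-8");
        byte[] key = convergentKey(file);
        byte[] ct = encrypt(file, key);
        // Two users encrypting identical content obtain identical tags,
        // so the server can deduplicate without learning the plaintext.
        System.out.println("tag = " + new java.math.BigInteger(1, tag(ct)).toString(16));
    }
}

Because the key depends only on the content, two users who upload the same file produce the same tag, which is exactly what lets the server deduplicate encrypted data.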
2. LITERATURE SURVEY

The Farsite distributed file system provides availability by replicating each file onto
multiple desktop computers. Since this replication consumes significant storage space, it is
important to reclaim used space where possible. Measurement of over 500 desktop file systems
shows that nearly half of all consumed space is occupied by duplicate files. We present a
mechanism to reclaim space from this incidental duplication to make it available for controlled
file replication. Our mechanism includes 1) convergent encryption, which enables duplicate files
to be coalesced into the space of a single file, even if the files are encrypted with different users'
keys, and 2) SALAD, a Self-Arranging, Lossy, Associative Database for aggregating file content
and location information in a decentralized, scalable, fault-tolerant manner. Large-scale
simulation experiments show that the duplicate-file coalescing system is scalable, highly
effective, and fault-tolerant.
Farsite is a distributed file system that provides security and reliability by storing
encrypted replicas of each file on multiple desktop machines. To free space for storing these
replicas, the system coalesces incidentally duplicated files, such as shared documents among
workgroups or multiple users’ copies of common application programs. This involves a
cryptosystem that enables identical files to be coalesced even if encrypted with different keys, a
scalable distributed database to identify identical files, a file-relocation system that co-locates
identical files on the same machines, and a single-instance store that coalesces identical files
while retaining separate-file semantics. Simulation using file content data from 585 desktop file
systems shows that the duplicate-file coalescing system is scalable, highly effective, and fault-
tolerant.
Cloud storage service providers such as Dropbox, Mozy, and others perform
deduplication to save space by only storing one copy of each file uploaded. Should clients
conventionally encrypt their files, however, savings are lost. Message-locked encryption (the
most prominent manifestation of which is convergent encryption) resolves this tension. However,
it is inherently subject to brute-force attacks that can recover files falling into a known set. We
propose an architecture that provides secure deduplicated storage resisting brute-force attacks,
and realize it in a system called DupLESS. In DupLESS, clients encrypt under message-based
keys obtained from a key-server via an oblivious PRF protocol. It enables clients to store
encrypted data with an existing service, have the service perform deduplication on their behalf,
and yet achieves strong confidentiality guarantees. We show that encryption for deduplicated
storage can achieve performance and space savings close to that of using the storage service with
plaintext data.
We studied the problem of providing secure outsourced storage that both supports
deduplication and resists brute-force attacks. We design a system, DupLESS, that combines a
CE-type base MLE scheme with the ability to obtain message-derived keys with the help of a
key server (KS) shared amongst a group of clients. The clients interact with the KS by a protocol
for oblivious PRFs, ensuring that the KS can cryptographically mix in secret material to the per-
message keys while learning nothing about files stored by clients. These mechanisms ensure that
DupLESS provides strong security against external attacks which compromise the SS and
communication channels (nothing is leaked beyond file lengths, equality, and access patterns),
and that the security of DupLESS gracefully degrades in the face of compromised systems. Should a
client be compromised, learning the plaintext underlying another client's ciphertext requires
mounting an online brute-force attack (which can be slowed by a rate-limited KS). Should the
KS be compromised, the attacker must still attempt an offline brute-force attack, matching the
guarantees of traditional MLE schemes. The substantial increase in security comes at a modest
price in terms of performance, and a small increase in storage requirements relative to the base
system. The low performance overhead results in part from optimizing the client-to-KS OPRF
protocol, and also from ensuring DupLESS uses a low number of interactions with the SS. We
show that DupLESS is easy to deploy: it can work transparently on top of any SS implementing
a simple storage interface, as shown by our prototype for Dropbox and Google Drive.
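The oblivious PRF at the core of DupLESS can be instantiated from blinded RSA signatures. The sketch below simulates both parties in a single process under simplifying assumptions of ours (SHA-256 as the hash, no rate limiting, illustrative names); it demonstrates only the blinding idea, not the hardened DupLESS protocol.

import java.math.BigInteger;
import java.security.*;
import java.security.interfaces.*;

// Blinded-RSA OPRF sketch: the key server computes H(M)^d mod N
// without ever seeing H(M).
public class BlindRsaOprf {
    public static void main(String[] args) throws Exception {
        // Key server's RSA key pair (N, e, d).
        KeyPairGenerator g = KeyPairGenerator.getInstance("RSA");
        g.initialize(2048);
        KeyPair kp = g.generateKeyPair();
        BigInteger n = ((RSAPublicKey) kp.getPublic()).getModulus();
        BigInteger e = ((RSAPublicKey) kp.getPublic()).getPublicExponent();
        BigInteger d = ((RSAPrivateKey) kp.getPrivate()).getPrivateExponent();

        // Client: hash the file and blind the hash with r^e.
        byte[] file = "file contents".getBytes("UTF-8");
        BigInteger m = new BigInteger(1,
                MessageDigest.getInstance("SHA-256").digest(file)).mod(n);
        SecureRandom rnd = new SecureRandom();
        BigInteger r;
        do {
            r = new BigInteger(n.bitLength() - 1, rnd);
        } while (!r.gcd(n).equals(BigInteger.ONE));
        BigInteger blinded = m.multiply(r.modPow(e, n)).mod(n);

        // Key server: exponentiates the blinded value; learns nothing about m.
        BigInteger signedBlinded = blinded.modPow(d, n);

        // Client: unblind to obtain m^d mod N, the message-derived key material.
        BigInteger key = signedBlinded.multiply(r.modInverse(n)).mod(n);

        // Sanity check: key^e mod N must recover H(M).
        System.out.println("valid = " + key.modPow(e, n).equals(m));
    }
}

The client then hashes the unblinded value into its encryption key. Because the secret exponent d never leaves the key server, a compromised client can only mount online brute-force attacks, which a rate-limited KS can slow down.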
Prior definitions and schemes for message-locked encryption (MLE) admit only an
adversary who is oblivious to the scheme’s public parameters during the initial interaction. We
explore two avenues for extending security guarantees of MLE towards a more powerful
adversarial model, where the distribution of plaintexts can be correlated with the scheme’s
parameters (lock-dependent messages). In our first construction we augment the definition of
MLE to allow fully random ciphertexts by supporting equality-testing functionality. One
challenging aspect of the construction is ensuring ciphertext consistency in the presence of
random oracles without inflating the length of the ciphertext. We achieve this goal via a
combination of a cut-and-choose technique and NIZKs. The resulting scheme is secure against a
fully adaptive adversary. Our second construction assumes a predetermined bound on the
complexity of distributions specified by the adversary. It fits the original framework of
deterministic MLE while satisfying a stronger security notion.
Ramp secret sharing (SS) schemes can be classified into strong ramp SS schemes and
weak ramp SS schemes. The strong ramp SS schemes do not leak out any part of a secret
explicitly even in the case where some information about the secret leaks from a non-qualified
set of shares, and hence, they are more desirable than weak ramp SS schemes. However, it is not
known how to construct the strong ramp SS schemes in the case of general access structures. In
this paper, it is shown that a strong ramp SS scheme can always be constructed from a SS
scheme with plural secrets for any feasible general access structure. As a byproduct, it is pointed
out that threshold ramp SS schemes based on Shamir’s polynomial interpolation method are not
always strong.
In this paper, we defined PD ramp SS schemes with general access structures, in addition
to weak and strong ramp SS schemes. From the viewpoint of strong ramp SS schemes, it is
pointed out that (k, L, n)-threshold ramp SS schemes based on Shamir’s interpolation method are
not always secure. We also proposed how to construct strong ramp SS schemes with general
access structures. It is shown by our method that strong ramp SS schemes can always be
constructed by a transform matrix T from PD ramp SS schemes without loss of coding rates. We
also clarified the necessary and sufficient condition of the matrix T to realize strong ramp SS
schemes.
Fault-Tolerance and Load-Balance Tradeoff in a Distributed Storage System
In recent years distributed storage systems have been the object of increasing interest by
the research community. They promise improvements on information availability, security and
integrity. Nevertheless, at this point in time, there is no predominant approach, but a wide
spectrum of proposals in the literature. In this paper we report our findings with a combination of
redundancy techniques intended to simultaneously provide fault tolerance and load balance in a
small-scale distributed storage system. Based on our analysis, we provide general guidelines for
system designers and developers under similar conditions.
In this work we presented a performance study intended to evaluate the mean time to
failure of a distributed storage system. We tested a particular approach that makes use of both
space and information redundancy. An advantage of this combination stands on the fact that both
are parameterized techniques, therefore, they allow us to experiment with different amounts of
redundant resources. System operation can be briefly described as follows. A set of autonomous
stations with storage capacities, called storage nodes, is connected through a fast Ethernet switch.
Initially nodes are classified in active or spare nodes. Subsets of active nodes, called committees,
are scheduled to work according to a distributed procedure called storage scheme. The
committees that make up our proposal are all the subsets of active nodes having a fixed size.
When a storage request arrives at the system, a given committee is called according to a fixed
and cyclic order. A file to be stored is transformed into a certain number of dispersals using
Rabin's IDA. Each member of the selected committee is in charge of storing one of the resulting
dispersals. Recall that the original file, or any of its dispersals, can be rebuilt provided that a
given amount of redundant information remains available. If an active node crashes, two actions
take place. First, a distributed control starts the recovery procedure using the surviving active
nodes; one spare then replaces the missing element and stores the recovered information.
Second, the missing element undergoes a repair procedure. Once it becomes operational again, it
is regarded as a spare node. We tested the effect of 4 different parameters on the system’s
performance. These parameters are the number of active nodes, the number of spares, the
individual node’s lifetime and repair time. Our study mainly shows that the number of active
components defines a compromise between load balance and the overall failure rate. As for
spares, they are important up to a certain operational point where their availability compensates
the repair procedure. Beyond this point, an excess of spares does not pay back any further
improvement. Nevertheless, the most influential parameter turned out to be the node’s lifetime.
Also, it is worth mentioning that even under the worst combination of parameters, our design
renders a mean time to failure longer than the summation of individual lifetimes.
Dekey: Secure Deduplication with Efficient and Reliable Convergent Key Management
Using Blowfish Algorithm
Data deduplication is a technique for eliminating duplicate copies of data, and has been
extensively used in cloud storage to reduce storage space and upload bandwidth. Promising as it
is, an arising challenge is to perform secure deduplication in cloud storage. Although
convergent encryption has been extensively adopted for secure deduplication, a critical
problem of making convergent encryption practical is to efficiently and reliably manage a
huge number of convergent keys. This paper makes the first attempt to formally address the
problem of achieving efficient and reliable key management in secure deduplication. We first
introduce a baseline approach in which each user holds an independent master key for encrypting
the convergent keys and outsourcing them to the cloud. However, such a baseline key
management scheme generates a huge number of keys with the growing number of users and
requires users to dedicatedly protect the master keys. We propose Dekey, a new construction in
which users do not need to manage any keys on their own but instead securely
distribute the convergent key shares across multiple servers. Security analysis
demonstrates that Dekey is secure in terms of the definitions specified in the proposed
security model. As a proof of concept, we implement Dekey using the secret sharing scheme and
demonstrate that Dekey incurs limited overhead in realistic environments.
The duplicate copies of the original data are avoided using this secure deduplication with
efficient and reliable convergent key management, which is achieved using the Blowfish algorithm.
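As a sketch of the Blowfish step mentioned above, the snippet below encrypts a convergent key with Blowfish via the standard javax.crypto API. The key sizes and variable names are illustrative assumptions of ours, not the Dekey construction itself.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Illustrative sketch: protecting a convergent key with Blowfish.
public class BlowfishKeyWrap {
    public static void main(String[] args) throws Exception {
        // Per-user master key for the Blowfish cipher (128-bit here).
        KeyGenerator kg = KeyGenerator.getInstance("Blowfish");
        kg.init(128);
        SecretKey masterKey = kg.generateKey();

        byte[] convergentKey = new byte[32]; // e.g., a SHA-256 convergent key
        new java.security.SecureRandom().nextBytes(convergentKey);

        // Encrypt the convergent key before outsourcing it.
        Cipher c = Cipher.getInstance("Blowfish");
        c.init(Cipher.ENCRYPT_MODE, masterKey);
        byte[] wrapped = c.doFinal(convergentKey);

        // Decrypt when the owner needs to recover the file.
        c.init(Cipher.DECRYPT_MODE, masterKey);
        byte[] unwrapped = c.doFinal(wrapped);
        System.out.println("recovered = " + java.util.Arrays.equals(convergentKey, unwrapped));
    }
}

Dekey's point is precisely to avoid each user guarding such a master key, by secret-sharing the convergent keys across servers instead.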
We make a simple and yet important observation that common XOR operations should be
computed first in XOR-based coding. We describe the OXC problem and make a conjecture
about its complexity. Two greedy approaches are proposed, which effectively show that XOR-
based Reed-Solomon codes with optimization can be as efficient as, and sometimes even more
efficient than, the best known specifically designed XOR-based codes. Moreover, XOR-based
Reed-Solomon codes with optimization are likely suitable for large-scale production storage
systems with higher redundancy requirements.
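For readers unfamiliar with XOR-based coding, here is a minimal example of the underlying principle under assumptions of ours: a single XOR parity fragment lets any one lost fragment be rebuilt. It illustrates only the XOR-recovery idea, not the optimized Reed-Solomon codes discussed above.

// Minimal XOR-parity illustration: k data fragments plus one parity fragment;
// any single lost fragment is the XOR of all the surviving ones.
public class XorParity {
    public static void main(String[] args) {
        byte[][] data = {
            "fragment-0".getBytes(), "fragment-1".getBytes(), "fragment-2".getBytes()
        };
        int len = data[0].length; // assume equal-length fragments

        // Encode: parity = d0 XOR d1 XOR d2.
        byte[] parity = new byte[len];
        for (byte[] d : data)
            for (int i = 0; i < len; i++) parity[i] ^= d[i];

        // Simulate losing fragment 1, then rebuild it from the survivors.
        byte[] rebuilt = parity.clone();
        for (int j = 0; j < data.length; j++) {
            if (j == 1) continue;
            for (int i = 0; i < len; i++) rebuilt[i] ^= data[j][i];
        }
        System.out.println("rebuilt = " + new String(rebuilt)); // prints "fragment-1"
    }
}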
Today’s cloud storage services must offer storage reliability and fast data retrieval for
large amounts of data without sacrificing storage cost. We present SEARS, a cloud-based storage
system which integrates erasure coding and data deduplication to support efficient and reliable
data storage with fast user response time. With proper association of data to storage server
clusters, SEARS provides flexible mixing of different configurations, suitable for real-time and
archival applications. Our prototype implementation of SEARS over Amazon EC2 shows that it
outperforms existing storage systems in storage efficiency and file retrieval time. For 3 MB files,
SEARS delivers retrieval time of 2.5 s compared to 7 s with existing systems.
We describe the design and implementation of a space efficient, data reliable and fast
retrieving cloud-based storage system SEARS which integrates data deduplication and erasure
coding. SEARS provides a flexible combination of various binding schemes to associate server
nodes with data to be stored at different level based on application needs. Evaluation over
Amazon EC2 shows that SEARS outperforms related systems with lower storage usage while
ensuring fast and reliable data access.
We present convergent dispersal, which supports keyless security and deduplication for
cloud-of-clouds storage. Its main idea is to replace random information with deterministic
cryptographic hash information derived from original data. We construct two convergent
dispersal algorithms, namely CRSSS and CAONT-RS. We analyze their deduplication
efficiencies and evaluate their performance via various parameter choices. In future work, we
plan to fully implement and evaluate convergent dispersal in a real-life dispersed setting.
A Secure Cloud Backup System with Assured Deletion and Version Control
Cloud storage is an emerging service model that enables individuals and enterprises to
outsource the storage of data backups to remote cloud providers at a low cost. However, cloud
clients must enforce security guarantees of their outsourced data backups. We present
FadeVersion, a secure cloud backup system that serves as a security layer on top of today’s cloud
storage services. FadeVersion follows the standard version-controlled backup design, which
eliminates the storage of redundant data across different versions of backups. On top of this,
FadeVersion applies cryptographic protection to data backups. Specifically, it enables fine-
grained assured deletion, that is, cloud clients can assuredly delete particular backup versions or
files on the cloud and make them permanently inaccessible to anyone, while other versions that
share the common data of the deleted versions or files will remain unaffected. We implement a
proof-of-concept prototype of FadeVersion and conduct empirical evaluation atop Amazon S3.
We show that FadeVersion only adds minimal performance overhead over a traditional cloud
backup service that does not support assured deletion.
We present the design and implementation of FadeVersion, a system that provides secure
and cost effective backup services on the cloud. FadeVersion is designed for providing assured
deletion for remote cloud backup applications, while allowing version control of data backups.
We use a layered encryption approach to integrate both version control and assured deletion into
one design. Through system prototyping and extensive experiments, we justify the performance
overhead of FadeVersion in terms of time performance, storage space, and monetary cost.
3. ANALYSIS
Introduction
The Systems Development Life Cycle (SDLC), or Software Development Life Cycle in
systems engineering, information systems and software engineering, is the process of creating or
altering systems, and the models and methodologies that people use to develop these systems. In
software engineering, the SDLC concept underpins many kinds of software development
methodologies. These methodologies form the framework for planning and controlling the
creation of an information system: the software development process.
Existing system
The challenge for data privacy also arises as more and more sensitive data are being
outsourced by users to cloud. Encryption mechanisms have usually been utilized to protect the
confidentiality before outsourcing data into cloud. Most commercial storage service providers
are reluctant to apply encryption over the data because it makes deduplication impossible. The
reason is that the traditional encryption mechanisms, including public key encryption and
symmetric key encryption, require different users to encrypt their data with their own keys. As a
result, identical data copies of different users will lead to different ciphertexts. To solve the
problems of confidentiality and deduplication, the notion of convergent encryption has been
proposed and widely adopted to enforce data confidentiality while realizing deduplication.
However, these systems achieve confidentiality of outsourced data at the cost of decreased error
resilience. Therefore, how to protect both confidentiality and reliability while achieving
deduplication in a cloud storage system is still a challenge.
• Only one copy of each file is stored in the cloud, even if such a file is owned by a huge number
of users.
Proposed System
We introduce the distributed cloud storage servers into deduplication systems to provide
better fault tolerance. To further protect data confidentiality, the secret sharing technique is
utilized, which is also compatible with the distributed storage systems. In more detail, a file is
first split and encoded into fragments by using the technique of secret sharing, instead of
encryption mechanisms. These shares will be distributed across multiple independent storage
servers. Furthermore, to support deduplication, a short cryptographic hash value of the content
will also be computed and sent to each storage server as the fingerprint of the fragment stored at
each server. Only the data owner who first uploads the data is required to compute and distribute
such secret shares, while all following users who own the same data copy do not need to
compute and store these shares any more. To recover data copies, users must access a minimum
number of storage servers through authentication and obtain the secret shares to reconstruct the
data. In other words, the secret shares of data will only be accessible by the authorized users who
own the corresponding data copy. Another distinguishing feature of our proposal is that data
integrity, including tag consistency, can be achieved.
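The sketch below illustrates the upload flow just described under assumptions of ours: a trivial n-of-n XOR-style Share/Recover stands in for the actual threshold secret sharing scheme (so all shares are needed here, whereas the real system needs only a minimum number), fingerprints are SHA-256, and the "servers" are simulated in memory.

import java.security.MessageDigest;
import java.security.SecureRandom;

// Upload-flow sketch: n-of-n XOR secret sharing plus per-share fingerprints.
public class ShareAndFingerprint {
    static byte[][] share(byte[] file, int n) {
        SecureRandom rnd = new SecureRandom();
        byte[][] shares = new byte[n][file.length];
        byte[] last = file.clone();
        for (int s = 0; s < n - 1; s++) {
            rnd.nextBytes(shares[s]);                 // random share
            for (int i = 0; i < file.length; i++) last[i] ^= shares[s][i];
        }
        shares[n - 1] = last;                         // final share completes the XOR
        return shares;
    }

    static byte[] fingerprint(byte[] share) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(share);
    }

    public static void main(String[] args) throws Exception {
        byte[] file = "outsourced file".getBytes("UTF-8");
        byte[][] shares = share(file, 4);
        for (int s = 0; s < shares.length; s++) {
            // Send (share, fingerprint) to storage server s; the server compares
            // the fingerprint against stored ones to deduplicate the share.
            System.out.printf("server %d gets tag %s%n", s,
                    new java.math.BigInteger(1, fingerprint(shares[s])).toString(16));
        }
        // Recover: XOR of all n shares reconstructs the file.
        byte[] recovered = new byte[file.length];
        for (byte[] s : shares)
            for (int i = 0; i < file.length; i++) recovered[i] ^= s[i];
        System.out.println("recovered = " + new String(recovered, "UTF-8"));
    }
}

Only the first uploader runs share( ) and distributes the shares; later owners of the same data merely prove ownership of the matching fingerprints.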
Advantages of Proposed System:
• Distributed storage servers provide better fault tolerance: data remain accessible even if a
bounded subset of servers fails.
• Secret sharing protects data confidentiality without requiring per-user encryption keys.
• Deduplication is still supported, since each share carries a short cryptographic fingerprint.
• Data integrity, including tag consistency, can be achieved.
SDLC stands for Software Development Life Cycle. It is a standard used by the
software industry to develop good software.
Stages in SDLC:
• Requirement Gathering
• Analysis
• Designing
• Coding
• Testing
• Maintenance
Requirements Gathering stage:
The requirements gathering process takes as its input the goals identified in the high-level
requirements section of the project plan. Each goal will be refined into a set of one or more
requirements. These requirements define the major functions of the intended application, define
operational data areas and reference data areas, and define the initial data entities. Major
functions include critical processes to be managed, as well as mission critical inputs, outputs and
reports. A user class hierarchy is developed and associated with these major functions, data areas,
and data entities. Each of these definitions is termed a Requirement. Requirements are identified
by unique requirement identifiers and, at minimum, contain a requirement title and
textual description.
These requirements are fully described in the primary deliverables for this stage: the
Requirements Document and the Requirements Traceability Matrix (RTM). The requirements
document contains complete descriptions of each requirement, including diagrams and
references to external documents as necessary. Note that detailed listings of database tables and
fields are not included in the requirements document.
The title of each requirement is also placed into the first version of the RTM, along with the
title of each goal from the project plan. The purpose of the RTM is to show that the product
components developed during each stage of the software development lifecycle are formally
connected to the components developed in prior stages.
In the requirements stage, the RTM consists of a list of high-level requirements, or goals, by title,
with a listing of associated requirements for each goal, listed by requirement title. In this
hierarchical listing, the RTM shows that each requirement developed during this stage is
formally linked to a specific product goal. In this format, each requirement can be traced to a
specific product goal, hence the term requirements traceability.
The outputs of the requirements definition stage include the requirements document, the
RTM, and an updated project plan.
Analysis Stage:
The planning stage establishes a bird's eye view of the intended software product, and uses this
to establish the basic project structure, evaluate feasibility and risks associated with the project,
and describe appropriate management and technical approaches.
The most critical section of the project plan is a listing of high-level product requirements, also
referred to as goals. All of the software product requirements to be developed during the
requirements definition stage flow from one or more of these goals. The minimum information
for each goal consists of a title and textual description, although additional information and
references to external documents may be included. The outputs of the project planning stage are
the configuration management plan, the quality assurance plan, and the project plan and
schedule, with a detailed listing of scheduled activities for the upcoming Requirements stage,
and high-level estimates of effort for the remaining stages.
Designing Stage:
The design stage takes as its initial input the requirements identified in the approved
requirements document. For each requirement, a set of one or more design elements will be
produced as a result of interviews, workshops, and/or prototype efforts. Design elements describe
the desired software features in detail, and generally include functional hierarchy diagrams,
screen layout diagrams, tables of business rules, business process diagrams, pseudo code, and a
complete entity-relationship diagram with a full data dictionary. These design elements are
intended to describe the software in sufficient detail that skilled programmers may develop the
software with minimal additional input.
When the design document is finalized and accepted, the RTM is updated to show that each
design element is formally associated with a specific requirement. The outputs of the design
stage are the design document, an updated RTM, and an updated project plan.
Coding Stage:
The development stage takes as its primary input the design elements described in the
approved design document. For each design element, a set of one or more software artifacts will
be produced. Software artifacts include but are not limited to menus, dialogs, data management
forms, data reporting formats, and specialized procedures and functions. Appropriate test cases
will be developed for each set of functionally related software artifacts, and an online help
system will be developed to guide users in their interactions with the software.
The RTM will be updated to show that each developed artifact is linked to a specific design
element, and that each developed artifact has one or more corresponding test case items. At this
point, the RTM is in its final configuration. The outputs of the development stage include a fully
functional set of software that satisfies the requirements and design elements previously
documented, an online help system that describes the operation of the software, an
implementation map that identifies the primary code entry points for all major system functions,
a test plan that describes the test cases to be used to validate the correctness and completeness of
the software, an updated RTM, and an updated project plan.
Integration and Test Stage:
During the integration and test stage, the software artifacts, online help, and test data are
migrated from the development environment to a separate test environment. At this point, all test
cases are run to verify the correctness and completeness of the software. Successful execution of
the test suite confirms a robust and complete migration capability. During this stage, reference
data is finalized for production use and production users are identified and linked to their
appropriate roles. The final reference data (or links to reference data source files) and production
user list are compiled into the Production Initiation Plan.
The outputs of the integration and test stage include an integrated set of software, an online help
system, an implementation map, a production initiation plan that describes reference data and
production users, an acceptance plan which contains the final suite of test cases, and an updated
project plan.
Installation and Acceptance Stage:
During the installation and acceptance stage, the software artifacts, online help, and initial
production data are loaded onto the production server. At this point, all test cases are run to
verify the correctness and completeness of the software. Successful execution of the test suite is
a prerequisite to acceptance of the software by the customer.
After customer personnel have verified that the initial production data load is correct and
the test suite has been executed with satisfactory results, the customer formally accepts the
delivery of the software.
The primary outputs of the installation and acceptance stage include a production
application, a completed acceptance test suite, and a memorandum of customer acceptance of the
software. Finally, the PDR enters the last of the actual labor data into the project schedule and
locks the project as a permanent project record. At this point the PDR "locks" the project by
archiving all software items, the implementation map, the source code, and the documentation
for future reference.
Maintenance:
The outer rectangle represents maintenance of a project. The maintenance team will start with
requirement study and understanding of the documentation; later, employees will be assigned work
and they will undergo training on that particular assigned category.
For this life cycle there is no end; it continues on like an umbrella (there is no ending point to
the umbrella sticks).
3.5. Software Requirement Specification
3.5.1. Overall Description
• ECONOMIC FEASIBILITY
A system that can be developed technically, and that will be used if installed, must still be a
good investment for the organization. In economic feasibility, the development cost of
creating the system is evaluated against the ultimate benefit derived from the new system.
Financial benefits must equal or exceed the costs. The system is economically feasible: it does
not require any additional hardware or software. Since the interface for this system is developed
using the existing resources and technologies available at NIC, there is nominal expenditure, and
economic feasibility is certain.
• OPERATIONAL FEASIBILITY
Proposed projects are beneficial only if they can be turned into information systems
that meet the organization’s operating requirements. Operational feasibility aspects of the
project are to be taken as an important part of the project implementation. This system is targeted
to be in accordance with the above-mentioned issues. Beforehand, the management issues and
user requirements have been taken into consideration. So there is no question of resistance from
the users that can undermine the possible application benefits. The well-planned design would
ensure the optimal utilization of the computer resources and would help in the improvement of
performance status.
• TECHNICAL FEASIBILITY
User Interface
The user interface of this system is a user-friendly Java Graphical User Interface.
Hardware Interfaces
The interaction between the user and the console is achieved through Java capabilities.
Software Interfaces
Operating Environment
HARDWARE REQUIREMENTS:
• Hard Disk - 20 GB
• Monitor - SVGA
SOFTWARE REQUIREMENTS:
4. DESIGN
UML diagrams
The Unified Modeling Language allows the software engineer to express an analysis model
using a modeling notation that is governed by a set of syntactic, semantic and pragmatic rules.
A UML system is represented using five different views that describe the system from distinctly
different perspectives. Each view is defined by a set of diagrams, which is as follows.
Class diagram:
A class diagram shows the static structure of the system as a set of classes. Each class is drawn
as a rectangle with three compartments:
• The top part gives the name of the class
• The middle part gives the attributes of the class
• The bottom part gives the methods or operations the class can take or undertake
Sequence diagram:
A sequence diagram is a kind of interaction diagram that shows how processes operate with one
another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram
shows object interactions arranged in time sequence. It depicts the objects and classes involved
in the scenario and the sequence of messages exchanged between the objects needed to carry out
the functionality of the scenario. Sequence diagrams are typically associated with use case
realizations in the Logical View of the system under development. Sequence diagrams are
sometimes called event diagrams, event scenarios, and timing diagrams.
Component diagram:
Components are wired together by using an assembly connector to connect the required interface
of one component with the provided interface of another component. This illustrates the service
consumer - service provider relationship between the two components.
Deployment diagram:
The nodes appear as boxes, and the artifacts allocated to each node appear as rectangles within
the boxes. Nodes may have sub nodes, which appear as nested boxes. A single node in a
deployment diagram may conceptually represent multiple physical nodes, such as a cluster of
database servers.
Activity diagram is another important diagram in UML to describe the dynamic aspects of the
system. It is basically a flow chart representing the flow from one activity to another. An
activity can be described as an operation of the system.
So the control flow is drawn from one operation to another. This flow can be sequential,
branched or concurrent.
4.8.1 Activity diagram:
(Activity diagram figure: a Login activity with a Yes/No decision branch.)
4.9 Data Flow Diagram:
Data flow diagrams illustrate how data is processed by a system in terms of inputs and outputs.
Data flow diagrams can be used to provide a clear representation of any business function. The
technique starts with an overall picture of the business and continues by analyzing each of the
functional areas of interest. This analysis can be carried out in precisely the level of detail
required. The technique exploits a method called top-down expansion to conduct the analysis in
a targeted way.
As the name suggests, Data Flow Diagram (DFD) is an illustration that explicates the passage of
information in a process. A DFD can be easily drawn using simple symbols. Additionally,
complicated processes can be easily automated by creating DFDs using easy-to-use, free
downloadable diagramming tools. A DFD is a model for constructing and analyzing information
processes. DFD illustrates the flow of information in a process depending upon the inputs and
outputs. A DFD can also be referred to as a Process Model. A DFD demonstrates a business or
technical process with the support of the data saved outside the process, the data flowing from
one process to another, and the end results.
(Data flow diagram figure: the user sends file data (step 4) to the Cloud Server, which saves the
file information (step 6).)
5. IMPLEMENTATION
5.1. Modules
• Secret Sharing Scheme module
Module Description:
There are two algorithms in a secret sharing scheme, which are Share and Recover. The
secret is divided and shared by using Share. With enough shares, the secret can be extracted and
recovered with the algorithm of Recover.
This approach provides fault tolerance and allows the data to remain accessible even if
a limited subset of the storage servers fails.
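As a concrete sketch of the Share and Recover algorithms, here is a minimal (k, n) threshold scheme in the style of Shamir's polynomial interpolation over a prime field. The parameters and class names are illustrative assumptions of ours; the deployed scheme is a ramp variant, as surveyed earlier.

import java.math.BigInteger;
import java.security.SecureRandom;

// Minimal Shamir (k, n) secret sharing sketch over a prime field.
public class ShamirSketch {
    static final BigInteger P = BigInteger.probablePrime(256, new SecureRandom());

    // Share: random polynomial f of degree k-1 with f(0) = secret; share x is (x, f(x)).
    static BigInteger[] share(BigInteger secret, int n, int k) {
        SecureRandom rnd = new SecureRandom();
        BigInteger[] coeff = new BigInteger[k];
        coeff[0] = secret;
        for (int j = 1; j < k; j++) coeff[j] = new BigInteger(255, rnd);
        BigInteger[] shares = new BigInteger[n + 1]; // indices 1..n used
        for (int x = 1; x <= n; x++) {
            BigInteger y = BigInteger.ZERO, xp = BigInteger.ONE;
            for (int j = 0; j < k; j++) {
                y = y.add(coeff[j].multiply(xp)).mod(P);
                xp = xp.multiply(BigInteger.valueOf(x)).mod(P);
            }
            shares[x] = y;
        }
        return shares;
    }

    // Recover: Lagrange interpolation at x = 0 from any k shares (xs[i], ys[i]).
    static BigInteger recover(int[] xs, BigInteger[] ys) {
        BigInteger secret = BigInteger.ZERO;
        for (int i = 0; i < xs.length; i++) {
            BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
            for (int j = 0; j < xs.length; j++) {
                if (j == i) continue;
                num = num.multiply(BigInteger.valueOf(-xs[j])).mod(P);
                den = den.multiply(BigInteger.valueOf(xs[i] - xs[j])).mod(P);
            }
            secret = secret.add(ys[i].multiply(num).multiply(den.modInverse(P))).mod(P);
        }
        return secret;
    }

    public static void main(String[] args) {
        BigInteger secret = new BigInteger("123456789");
        BigInteger[] s = share(secret, 5, 3); // n = 5 servers, any k = 3 can recover
        System.out.println(recover(new int[]{1, 3, 5},
                new BigInteger[]{s[1], s[3], s[5]})); // prints 123456789
    }
}

A ramp variant packs several secret symbols into one polynomial, trading some secrecy for a better coding rate, which is what makes the approach attractive for deduplicated storage.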
In a block-level deduplication system, the user also needs to first perform file-level
deduplication before uploading his file. If no duplicate is found, the user divides this file into
blocks and performs block-level deduplication.
File Upload
To upload a file, the user first performs file-level deduplication by sending the file
fingerprint to the storage servers. If a duplicate is found, the file content need not be uploaded
again. Otherwise, if no duplicate is found, the user performs block-level deduplication as follows.
File Download
To download a file, the user first downloads the secret shares of all the blocks of the file.
5.2. Introduction of technologies used
About Java:
Initially the language was called "Oak", but it was renamed "Java" in 1995. The primary
motivation of this language was the need for a platform-independent (i.e., architecture-
neutral) language that could be used to create software to be embedded in various consumer
electronic devices.
Java has had a profound effect on the Internet, because Java expands the universe of
objects that can move about freely in cyberspace. In a network, two categories of objects are
transmitted between the server and the personal computer: passive information and dynamic,
active programs. Dynamic, self-executing programs raise serious concerns in the areas of
security and portability. Java addresses these concerns and, by doing so, has opened the door to
an exciting new form of program called the applet.
An application is a program that runs on our computer under the operating system of that
computer. It is more or less like one created using C or C++. Java's ability to create applets
makes it important. An applet is an application designed to be transmitted over the Internet
and executed by a Java-compatible web browser. An applet is actually a tiny Java program,
dynamically downloaded across the network, just like an image. But the difference is that it is an
intelligent program, not just a media file. It can react to user input and dynamically
change.
Java Architecture
Compilation of code
When you compile the code, the Java compiler creates machine code (called byte code) for a
hypothetical machine called the Java Virtual Machine (JVM). The JVM is supposed to execute
the byte code. The JVM was created to overcome the issue of portability: the code is
written and compiled for one machine and interpreted on all machines.
During run time, the Java interpreter tricks the byte code file into thinking that it is running
on a Java Virtual Machine. In reality this could be an Intel Pentium running Windows 95, a Sun
SPARCstation running Solaris, or an Apple Macintosh running its own system, and all could
receive code from any computer through the Internet and run the applets.
Simple:
Java was designed to be easy for the professional programmer to learn and to use effectively.
If you are an experienced C++ programmer, learning Java requires little effort, because Java
inherits the syntax and many of the object-oriented features of C++.
Most of the confusing concepts from C++ are either left out of Java or implemented in a
cleaner, more approachable manner. In Java there are a small number of clearly defined ways
to accomplish a given task.
Object oriented
Java was not designed to be source-code compatible with any other language. This allowed
the Java team the freedom to design with a clean slate. One outcome of this was a clean,
usable, pragmatic approach to objects. The object model in Java is simple and easy to extend,
while simple types, such as integers, are kept as high-performance non-objects.
Robust
The multi-platform environment of the web places extraordinary demands on a program,
because the program must execute reliably in a variety of systems. The ability to create
robust programs was given a high priority in the design of Java. Java is a strictly typed
language; it checks your code at compile time and at run time.
Java virtually eliminates the problems of memory management and deallocation, which is
completely automatic. In a well-written Java program, all run-time errors can and should be
managed by your program.
AWT:
The user interface is that part of a program that interacts with the user of the program. GUI is a
type of user interface that allows users to interact with electronic devices with images rather than
text commands. A class library is provided by the Java programming language which is known as
Abstract Window Toolkit (AWT) for writing graphical programs. The Abstract Window Toolkit
(AWT) contains several graphical widgets which can be added and positioned to the display area
with a layout manager.
Unlike the Java programming language, the AWT is not platform-independent. AWT uses system
peer objects for constructing graphical widgets. A common set of tools is provided by the
AWT for graphical user interface design. The implementation of the user interface elements
provided by the AWT is done using each platform's native GUI toolkit. One significant feature
of the AWT is that the look and feel of each platform can be preserved.
Components:
Types of Components:
Before proceeding ahead, first we need to know what containers are. After learning containers
we learn all components in detail.
Containers:
Components do not stand alone, but rather are found within containers. In order to make
components visible, we need to add all components to the container. Containers contain and
control the layout of components. In the AWT, all containers are instances of class Container or
one of its subtypes. Components must fit completely within the container that contains them. For
adding components to the container we will use the add() method.
Types of containers:
The AWT provides four main types of containers: Window, Frame, Dialog, and Panel. Once the
GUI is built, you need to set up event handlers for the user's interaction with it.
A new thread is started by the interpreter for user interaction when an AWT GUI is displayed.
When any event is received by this new thread, such as a click of the mouse or the press of a
key, one of the event handlers set up for the GUI is called by the new thread. One important
point to note here is that the event handler code is executed within that thread.
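As an example of such an event handler (the class and label names are ours), the following sketch registers an ActionListener whose actionPerformed method runs on the AWT event thread whenever the button is clicked:

import java.awt.*;
import java.awt.event.*;

// Simple AWT event-handling sketch: a button click updates a label.
public class EventDemo extends Frame {
    EventDemo() {
        setLayout(new FlowLayout());
        Label status = new Label("Waiting...");
        Button b = new Button("Click me");
        // The event thread invokes this handler on every button click.
        b.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                status.setText("Button clicked");
            }
        });
        add(b);
        add(status);
        setSize(300, 100);
        setVisible(true);
    }
    public static void main(String[] args) {
        new EventDemo();
    }
}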
Creating a Frame:
Method1:
In the first method we create a frame by extending the Frame class, which is defined in the
java.awt package. The following program demonstrates the creation of a frame.
import java.awt.*;
class FrameDemo1 extends Frame {
    FrameDemo1() {
        setTitle("Label Frame");
        setSize(500, 500);
        setVisible(true);
    }
    public static void main(String[] args) {
        new FrameDemo1();
    }
}
setTitle: For setting the title of the frame we use this method. It takes a String as an argument,
which will be the title name.
setVisible: For making our frame visible we use this method. It takes a boolean value as an
argument. If we pass true then the window will be visible; otherwise the window will not be
visible.
setSize: For setting the size of the window we use this method. The first argument is the width
of the frame and the second argument is the height of the frame.
Method 2:
In this method we create a Frame class instance directly to create the frame window.
The following program demonstrates Method 2.
import java.awt.*;
class FrameDemo2 {
    public static void main(String[] args) {
        Frame f = new Frame();
        f.setSize(500, 500);
        f.setVisible(true);
    }
}
Types of Components:
• Labels :
This is the simplest component of the Java Abstract Window Toolkit. This component is generally
used to show text or a string in your application, and a label never performs any type of action.
Syntax for creating a label is as follows:
Label l1 = new Label("One");
Label l2 = new Label("Two");
Label l3 = new Label("Three", Label.CENTER);
In the above three lines we have created three labels with the names "One", "Two" and "Three".
In the third label we are passing two arguments; the second argument is the justification of the
label. Now, after creating the components, we add them to the container:
add(l1);
add(l2);
add(l3);
We can set or change the text in a label by using the setText( ) method. You can obtain the
current text by calling getText( ). These methods are shown here:
void setText(String str)
String getText( )
• Buttons :
This is the component of Java Abstract Window Toolkit and is used to trigger actions and other
events required for your application. The syntax of defining the button is as follows :
Button l1 = new Button("One");
We can change the Button's label or get the label's text by using the Button.setLabel(String) and
Button.getLabel() method.
• CheckBox:
A check box is a control that is used to turn an option on or off. It consists of a small box
that can either contain a check mark or not. There is a label associated with each check box that
describes what option the box represents. You change the state of a check box by clicking on it.
The syntax of the definition of Checkbox is as follows:
Checkbox cb1 = new Checkbox("One", cbGroup, true);
Checkbox cb2 = new Checkbox("Two");
The first form creates a check box whose label is specified in the first argument and whose group
is specified in the second argument. If this check box is not part of a group, then cbGroup must be
null. (Check box groups are described in the next section.) The value true determines that the
initial state of the check box is checked. The second form creates a check box with only one
parameter.
To retrieve the current state of a check box, call getState( ). To set its state, call setState( ). You
can obtain the current label associated with a check box by calling getLabel( ). To set the label,
call setLabel( ). These methods are as follows:
boolean getState( )
void setState(boolean on)
String getLabel( )
void setLabel(String str)
Here, if on is true, the box is checked. If it is false, the box is cleared. The string passed in str
becomes the new label associated with the invoking check box.
• Radio Button:
This is the special case of the Checkbox component of Java AWT package. This is used as a
group of checkboxes which group name is same. Only one Checkbox from a Checkbox Group
can be selected at a time. Syntax for creating radio buttons is as follows:
CheckboxGroup cbg = new CheckboxGroup();
Checkbox rb1 = new Checkbox("One", cbg, true);
Checkbox rb2 = new Checkbox("Two", cbg, false);
For radio buttons we use the Checkbox class. The only difference between check boxes and
radio buttons is that for plain check boxes we specify null for the CheckboxGroup, whereas for
radio buttons we specify the CheckboxGroup object as the second parameter.
• Choice:
The Choice class is used to create a pop-up list of items from which the user may choose. Thus, a
Choice control is a form of menu. Syntax for creating a choice is as follows:
Choice os = new Choice();
os.add("Windows 98/XP");
os.add("Windows NT/2000");
os.add("Solaris");
os.add("MacOS");
We create a choice with the help of the Choice class. The pop-up list is created when the
object is created, but it does not yet contain any items. For adding items we use the add() method
defined in the Choice class. To determine which item is currently selected, you may call either
getSelectedItem( ) or getSelectedIndex( ). These methods are shown here:
String getSelectedItem( )
int getSelectedIndex( )
The getSelectedItem( ) method returns a string containing the name of the item.
getSelectedIndex( ) returns the index of the item. The first item is at index 0. By default, the first
item added to the list is selected.
• List:
The List class is similar to Choice, but the difference between them is that in a Choice the user
can select only one item, whereas in a List the user can select more than one item. Syntax for
creating a list is as follows:
List os = new List(4, true);
The first argument in the List constructor specifies the number of visible rows in the list. The
second argument specifies whether multiple selections are allowed or not.
os.add("Windows 98/XP");
os.add("Windows NT/2000");
os.add("Solaris");
os.add("MacOS");
From a list we can retrieve the items which are selected by the user. With multiple selection the
user may select several values; for retrieving all of them we have a method called
getSelectedItems(), whose return type is a String array. For retrieving a single value we can
again use the method defined in Choice, i.e. getSelectedItem().
• TextField:
Text fields allow the user to enter strings and to edit the text using the arrow keys, cut and paste
keys. TextField is a subclass of TextComponent. Syntax for creating text fields is as follows:
TextField tf1 = new TextField(12);
TextField tf2 = new TextField("default text");
For the first text field we specify the size of the text field, and the second text field is
created with a default value. TextField (and its superclass TextComponent) provides several
methods that allow you to utilize a text field. To obtain the string currently contained in the text
field, call getText( ). To set the text, call setText( ). These methods are as follows:
String getText( )
void setText(String str)
boolean isEditable( )
void setEditable(boolean canEdit)
isEditable( ) returns true if the text may be changed and false if not. In setEditable( ), if
canEdit is true, the text may be changed. If it is false, the text cannot be altered.
There may be times when we will want the user to enter text that is not displayed, such as
a password. We can disable the echoing of the characters as they are typed by calling
setEchoChar( ).
• TextArea:
TextArea ta = new TextArea(20, 30);
The above code will create one text area with 20 rows and 30 columns. TextArea is a subclass of
TextComponent. Therefore, it supports the getText( ), setText( ), getSelectedText( ), select( ),
isEditable( ), and setEditable( ) methods described in the preceding section.
The append( ) method appends the string specified by str to the end of the current text. insert( )
inserts the string passed in str at the specified index. To replace text, call replaceRange( ). It
replaces the characters from startIndex to endIndex–1, with the replacement text passed in str.
Layout Managers:
A layout manager automatically arranges controls within a window by using some type of
algorithm. Each Container object has a layout manager associated with it. A layout manager is
an instance of any class that implements the LayoutManager interface. The layout manager is
set by the setLayout( ) method. If no call to setLayout( ) is made, then the default layout
manager is used. Whenever a container is resized (or sized for the first time), the layout manager
is used to position each of the components within it. The setLayout( ) method has the following
general form:
void setLayout(LayoutManager layoutObj)
Here, layoutObj is a reference to the desired layout manager. If you wish to disable the layout
manager and position components manually, pass null for layoutObj. If you do this, you will need
to determine the shape and position of each component manually, using the setBounds( ) method
defined by Component:
void setBounds(int x, int y, int width, int height)
Here the first two arguments are the x and y coordinates, the third argument is the width, and the
fourth argument is the height of the component.
Java has several predefined LayoutManager classes, several of which are described
next. You can use the layout manager that best fits your application.
FlowLayout:
FlowLayout is the default layout manager. This is the layout manager that the preceding
examples have used. FlowLayout implements a simple layout style, which is similar to how
words flow in a text editor. Components are laid out from the upper-left corner, left to right and
top to bottom. When no more components fit on a line, the next one appears on the next line. A
small space is left between each component, above and below, as well as left and right. Here are
the constructors for FlowLayout:
FlowLayout( )
FlowLayout(int how)
The first form creates the default layout, which centers components and leaves five pixels of
space between each component. The second form lets you specify how each line is aligned. Valid
values for how are as follows:
FlowLayout.LEFT
FlowLayout.CENTER
FlowLayout.RIGHT
These values specify left, center, and right alignment, respectively. The third form allows you to
specify the horizontal and vertical space left between components in horz and vert, respectively.
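For example, the following call (the gap values are illustrative) right-aligns components and
leaves 10 pixels of space between them:
setLayout(new FlowLayout(FlowLayout.RIGHT, 10, 10));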
BorderLayout:
The BorderLayout class implements a common layout style for top-level windows. It has four
narrow, fixed-width components at the edges and one large area in the center. The four sides are
referred to as north, south, east, and west. The middle area is called the center. Here are the
constructors defined by BorderLayout:
BorderLayout( )
BorderLayout(int horz, int vert)
The first form creates a default border layout. The second allows you to specify the horizontal
and vertical space left between components in horz and vert, respectively. BorderLayout defines
the following constants that specify the regions:
BorderLayout.CENTER
BorderLayout.EAST
BorderLayout.NORTH
BorderLayout.SOUTH
BorderLayout.WEST
When adding components, you will use these constants with the following form of add( ), which
is defined by Container:
void add(Component compObj, Object region)
Here, compObj is the component to be added, and region specifies where the component will be
added.
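For example (the button labels are illustrative):
add(new Button("Top"), BorderLayout.NORTH);
add(new Button("Middle"), BorderLayout.CENTER);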
GridLayout:
GridLayout( )
GridLayout(int numRows, int numColumns)
GridLayout(int numRows, int numColumns, int horz, int vert)
The first form creates a grid layout that places one component per column, in a single row. The
second form creates a grid layout with the
specified number of rows and columns. The third form allows you to specify the horizontal and
vertical space left between components in horz and vert, respectively. Either numRows or
numColumns can be zero. Specifying numRows as zero allows for unlimited-length columns.
Specifying numColumns as zero allows for unlimited-length rows.
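For example, a short sketch that fills a 4 x 4 grid with numbered buttons:
setLayout(new GridLayout(4, 4));
for(int i = 0; i < 16; i++)
add(new Button("" + i));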
Swings:
About Swings:
Swing is used to develop Java programs with a graphical user interface (GUI). The Swing toolkit
consists of many components for building GUIs, and these components also help provide
interactivity in Java applications. The following components are included in the Swing toolkit:
• list controls
• buttons
• labels
• tree controls
• table controls
Java Swing can handle all of the flexible AWT components. The Swing toolkit contains far
more components than the simple AWT component toolkit. It is unlike other toolkits in that it
supports integrated internationalization, a highly customizable text package, rich undo support,
and more. Beyond the look and feels it supplies, you can also create your own look and feel
using Synth, which is specially designed for this purpose. Swing also provides the basic
user-interface facilities such as customizable painting, event handling, and drag and drop.
The Java Foundation Classes (JFC), of which Swing is a part, support many more features
important to a GUI program, such as the ability to create a program that can work in different
languages and the ability to add rich graphics functionality.
The Swing toolkit contains several components such as check boxes, buttons, tables, and text
components. Even some very simple components provide sophisticated functionality. For
instance, text fields provide formatted text input or password-field behavior. Furthermore, the
file browsers and dialogs can be used according to one's needs and can even be customized.
Swing vs AWT:
• Swing components are lightweight, whereas AWT components are heavyweight.
• Swing is developed in pure Java, whereas AWT is developed using C and C++.
• Swing supports different look and feels; this feature is not available in AWT.
• Swing has many advanced components such as JTable, JTabbedPane and JTree; these are
not available in AWT.
Swing Components:
All the components supported in AWT are also supported in Swing, with a slight change in
their class names:
Label → JLabel
TextField → JTextField
TextArea → JTextArea
Choice → JComboBox
Checkbox → JCheckBox
List → JList
Button → JButton
(none) → JRadioButton
(none) → JPasswordField
(none) → JTable
(none) → JTree
(none) → JTabbedPane
MenuBar → JMenuBar
Menu → JMenu
MenuItem → JMenuItem
(none) → JFileChooser
(none) → JOptionPane
We will discuss only those components that were not discussed in the AWT chapter.
JTabbedPane class:
The JTabbedPane container allows many panels to occupy the same area of the interface, and the
user may select which to show by clicking on a tab.
Constructor:
JTabbedPane( )
Example program:
import javax.swing.*;
import java.awt.*;
public class TabbedPaneDemo extends JFrame{
JTabbedPane pane = new JTabbedPane();
TabbedPaneDemo(){
setLayout(new FlowLayout(FlowLayout.LEFT));
setTitle("Tabbed Demo");
setSize(500,500);
setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
pane.addTab("Countries",new Count());
pane.addTab("Cities",new Cit());
add(pane);
setVisible(true);
}
public static void main(String args[]){
new TabbedPaneDemo();
}
}
// Panel shown under the "Countries" tab; the button labels are illustrative.
class Count extends JPanel{
JButton b1 = new JButton("India"), b2 = new JButton("USA"), b3 = new JButton("Japan");
Count(){
add(b1);
add(b2);
add(b3);
}
}
// Panel shown under the "Cities" tab; the checkbox labels are illustrative.
class Cit extends JPanel{
JCheckBox cb1 = new JCheckBox("Delhi"), cb2 = new JCheckBox("New York"), cb3 = new JCheckBox("Tokyo");
Cit(){
add(cb1);
add(cb2);
add(cb3);
}
}
JMenuBar, JMenu and JMenuItem classes:
A top-level window can have a menu bar associated with it. A menu bar displays a list of top-
level menu choices. Each choice is associated with a drop-down menu. This concept is
implemented in Java by the following classes: JMenuBar, JMenu, and JMenuItem. In general, a
menu bar contains one or more JMenu objects. Each JMenu object contains a list of JMenuItem
objects. Each JMenuItem object represents something that can be selected by the user. To create
a menu bar, first create an instance of JMenuBar. This class only defines the default constructor.
Next, create instances of JMenu that will define the selections displayed on the bar. Following
are the constructors for JMenu:
JMenu( )
JMenu(String optionName)
Here, optionName specifies the name of the menu selection. The first form creates an empty
menu. Individual menu items are of type JMenuItem, which defines these constructors:
JMenuItem( )
JMenuItem(String itemName)
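For example, a menu bar holding a single File menu could be attached to a frame as follows
(the menu and item names are illustrative):
JMenuBar mbar = new JMenuBar();
JMenu file = new JMenu("File");
file.add(new JMenuItem("Open"));
file.add(new JMenuItem("Exit"));
mbar.add(file);
setJMenuBar(mbar); // inside a JFrame subclass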
CloudServer.java
package deduplicate;
import java.awt.BorderLayout;
import java.awt.Color;
import java.awt.Container;
import java.awt.Font;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import java.awt.event.ActionListener;
import java.awt.event.ActionEvent;
import java.util.ArrayList;
import javax.swing.JOptionPane;
import javax.swing.JTextArea;
import javax.swing.JScrollPane;
import javax.swing.UIManager;
import java.net.Socket;
import java.net.ServerSocket;
import java.net.InetAddress;
public class CloudServer extends JFrame implements Runnable{
JLabel l1;
Font f1,f2;
JPanel p1,p2;
Thread thread;
JScrollPane jsp;
JTextArea area;
ServerSocket server;
RequestHandler rh;
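// start( ) opens a ServerSocket on port 6666 and hands each incoming
// connection off to a RequestHandler thread for processing.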
public void start(){
try{
server = new ServerSocket(6666);
area.append("Secure Cloud Server1 Started\n\n");
while(true){
Socket socket = server.accept();
socket.setKeepAlive(true);
InetAddress address=socket.getInetAddress();
String ipadd=address.toString();
area.append("Connected Computers :"+ipadd.substring(1,ipadd.length())
+"\n");
rh = new RequestHandler(socket,area);
rh.start();
}
}catch(Exception e){
e.printStackTrace();
}
}
public CloudServer(){
setTitle("Secure Cloud Server1");
getContentPane().setLayout(new BorderLayout());
f1 = new Font("Monospaced",Font.BOLD,22);
f2 = new Font("Monospaced",Font.PLAIN,14); // assumed: f2 is used below but was never initialized in the original listing
p1 = new JPanel();
l1 = new JLabel("<HTML><BODY><CENTER>SECURE DISTRIBUTED
DEDUPLICATION SYSTEMS WITH IMPROVED
RELIABILITY</CENTER></BODY></HTML>");
l1.setFont(this.f1);
l1.setForeground(new Color(125,54,2));
p1.setBackground(Color.pink);
p1.add(l1);
p2 = new JPanel();
p2.setLayout(new BorderLayout());
area = new JTextArea();
area.setEditable(false);
area.setFont(f2);
jsp = new JScrollPane(area);
p2.add(jsp,BorderLayout.CENTER);
getContentPane().add(p1, BorderLayout.NORTH);
getContentPane().add(p2, BorderLayout.CENTER);
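// The original listing breaks off here; the remaining lines of the class are
// reconstructed below as a plausible completion (the window size, thread
// wiring and main( ) are assumptions, not the author's original code).
setSize(800,600);
setVisible(true);
setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
thread = new Thread(this);
thread.start();
}
// The worker thread simply runs the blocking accept loop in start( ).
public void run(){
start();
}
public static void main(String args[]){
new CloudServer();
}
}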
ClientScreen.java
package deduplicate;
import java.awt.BorderLayout;
import java.awt.Color;
import java.awt.Container;
import java.awt.Font;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JButton;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;
import java.awt.event.ActionListener;
import java.awt.event.ActionEvent;
import javax.swing.JOptionPane;
import javax.swing.JFileChooser;
import java.io.File;
import java.io.BufferedReader;
import java.io.FileReader;
import java.net.Socket;
import java.io.ObjectOutputStream;
import java.io.ObjectInputStream;
import java.util.ArrayList;
import javax.swing.JTextField;
import java.io.FileInputStream;
import java.util.Random;
import java.io.RandomAccessFile;
import org.jfree.ui.RefineryUtilities;
public class ClientScreen extends JFrame{
JPanel p1,p2;
JLabel l1,l2,l3;
JTextField tf1,tf2;
JButton b1,b2,b3,b4,b5,b6,b7;
Font f1,f2;
User user;
JFileChooser chooser;
int port;
byte filedata[];
boolean file_level,block_level;
String username;
File file;
RandomAccessFile raf;
int tot_blocks;
static long total_time,rsss_time;
public long calculateBlock(){
long length = file.length();
tot_blocks=0;
long size = 0;
if(length >= 1000){
size = length/10;
tot_blocks = 10;
}
if(length < 1000 && length >= 500){ // >= closes the gap at exactly 500 bytes
size = length/5;
tot_blocks = 5;
}
if(length < 500 && length >= 1){ // >= closes the gap at exactly 1 byte
size = length/3;
tot_blocks = 3;
}
return size;
}
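// Worked example of the rules above: a 1200-byte file gives size = 120 and
// tot_blocks = 10; a 700-byte file gives size = 140 and tot_blocks = 5; a
// 300-byte file gives size = 100 and tot_blocks = 3.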
public Object[][] getBlocks(long size){
Object row[][]=new Object[tot_blocks][2];
try{
raf = new RandomAccessFile(file,"r");
for(int i=0;i<tot_blocks;i++){
byte b[]=new byte[(int)size];
raf.read(b); // read( ) advances the file pointer, so no explicit seek is required
row[i][0]=file.getName()+"_b"+i;
row[i][1]=b;
}
raf.close();
}catch(Exception e){
e.printStackTrace();
}
return row;
}
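// send( ) computes a SHA tag for the whole file and one per block, then
// uploads the blocks in alternation between ports 6666 and 7777, so that
// consecutive blocks are spread across the two cloud servers.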
public void send(){
try{
int port = 6666;
long size = calculateBlock();
String tag = SHA.sha(filedata);
Object[][] blocks = getBlocks(size);
StringBuilder sb = new StringBuilder();
for(int i=0;i<blocks.length;i++){
String name = (String)blocks[i][0];
byte b[] = (byte[])blocks[i][1];
String block_tag = SHA.sha(b);
Socket socket = new Socket("localhost",port);
ObjectOutputStream out = new
ObjectOutputStream(socket.getOutputStream());
Object
req[]={"blocks",username,file.getName(),tag,name,block_tag,b,Integer.toString(i)};
out.writeObject(req);
out.flush();
ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
Object res[] = (Object[])in.readObject();
String response = (String)res[0];
sb.append(response+"\n");
if(port == 6666)
port = 7777;
else if(port == 7777)
port = 6666;
}
JOptionPane.showMessageDialog(ClientScreen.this,sb.toString());
}catch(Exception e){
e.printStackTrace();
}
}
public ClientScreen(User usr,String uname){
user = usr;
username = uname;
setTitle("Cloud User Screen");
f1 = new Font("Courier New",Font.BOLD+Font.ITALIC,18);
f2 = new Font("Courier New",Font.PLAIN,14); // assumed: f2 is used for the buttons but was never initialized in the original listing
chooser = new JFileChooser(); // must be created before the Upload handler below uses it
p1 = new JPanel();
l1 = new JLabel("<HTML><BODY><CENTER>SECURE DISTRIBUTED
DEDUPLICATION SYSTEMS WITH IMPROVED
RELIABILITY</CENTER></BODY></HTML>");
l1.setFont(this.f1);
l1.setForeground(new Color(125,254,120));
p1.add(l1);
p1.setBackground(new Color(100,30,40));
p2 = new JPanel();
p2.setLayout(null);
b1 = new JButton("Upload File");
b1.setFont(f2);
b1.setBounds(200,50,400,30);
p2.add(b1);
b1.addActionListener(new ActionListener(){
public void actionPerformed(ActionEvent ae){
int option = chooser.showOpenDialog(ClientScreen.this);
if(option == JFileChooser.APPROVE_OPTION){
file = chooser.getSelectedFile();
try{
file_level = false;
FileInputStream fin = new FileInputStream(file);
filedata = new byte[fin.available()];
fin.read(filedata,0,filedata.length);
fin.close();
}catch(Exception e){
e.printStackTrace();
}
}
}
});
b5 = new JButton("Logout");
b5.setFont(f2);
b5.setBounds(200,350,400,30);
p2.add(b5);
b5.addActionListener(new ActionListener(){
public void actionPerformed(ActionEvent ae){
setVisible(false);
user.setVisible(true);
}
});
getContentPane().add(p1, "North");
getContentPane().add(p2, "Center");
}
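// fileLevel( ) asks one randomly chosen cloud server whether a file with the
// same SHA tag already exists (the file-level deduplication check).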
public void fileLevel(){
try{
String tag = SHA.sha(filedata);
String cname = "";
int random = getRandom();
int port = 0;
if(random == 0){
port = 6666;
cname = "Cloud Server1";
}
if(random == 1){
port = 7777;
cname = "Cloud Server2";
}
Socket socket = new Socket("localhost",port);
ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
Object req[]={"filelevel",username,file.getName(),tag};
out.writeObject(req);
out.flush();
ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
Object res[] = (Object[])in.readObject();
String response = (String)res[0];
String str[] = response.split("#");
if(str[1].equals("true"))
file_level = true;
JOptionPane.showMessageDialog(ClientScreen.this,str[0]);
}catch(Exception e){
e.printStackTrace();
}
}
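// blockLevel( ) splits the file into blocks; the overload below asks a
// randomly chosen server whether each block's SHA tag is already stored
// (the block-level deduplication check).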
public void blockLevel(){
long size = calculateBlock();
Object[][] blocks = getBlocks(size);
StringBuilder sb = new StringBuilder();
for(int i=0;i<blocks.length;i++){
String name = (String)blocks[i][0];
byte b[] = (byte[])blocks[i][1];
blockLevel(b,sb);
}
JOptionPane.showMessageDialog(ClientScreen.this,sb.toString());
}
public void blockLevel(byte block[],StringBuilder sb){
try{
String tag = SHA.sha(block);
String cname = "";
int random = getRandom();
int port = 0;
if(random == 0){
port = 6666;
cname = "Cloud Server1";
}
if(random == 1){
port = 7777;
cname = "Cloud Server2";
}
Socket socket = new Socket("localhost",port);
ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
Object req[]={"blocklevel",username,file.getName(),tag};
out.writeObject(req);
out.flush();
ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
Object res[] = (Object[])in.readObject();
String response = (String)res[0];
sb.append(response+"\n");
}catch(Exception e){
e.printStackTrace();
}
}
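// getRandom( ) returns 0 or 1, picking one of the two cloud servers at random.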
public int getRandom(){
Random r = new Random();
return r.nextInt(2);
}
}
6. TESTING
Implementation and Testing:
Implementation is one of the most important phases of a project, and one in which care must be
taken, because all of the effort undertaken during the project comes together here.
Implementation is the most crucial stage in achieving a successful system and in giving the users
confidence that the new system is workable and effective. Each program was tested individually
at the time of development using sample data, and it was verified that the programs link together
in the way specified in the program specification. The computer system and its environment were
tested to the satisfaction of the user.
Implementation
The implementation phase is less creative than system design. It is primarily concerned with user
training and file conversion. The system may require extensive user training, and the initial
parameters of the system may need to be modified as a result of programming. A simple operating
procedure is provided so that the user can understand the different functions clearly and quickly.
The different reports can be printed on either an inkjet or a dot-matrix printer, whichever is
at the disposal of the user. The proposed system is very easy to implement. In general,
implementation means the process of converting a new or revised system design into an
operational one.
Testing
Testing is the process in which test data is prepared and used to test the modules
individually, after which the validations given for the fields are checked. System testing then
takes place, which makes sure that all components of the system function properly as a unit. The
test data should be chosen such that it exercises all possible conditions. Testing is the stage
of implementation aimed at ensuring that the system works accurately and efficiently
before live operation commences. The following is a description of the testing strategies
that were carried out during the testing period.
System Testing
Testing has become an integral part of any system or project, especially in the field of
information technology. The importance of testing as a method of judging whether one is ready to
move further, and whether the system can withstand the rigors of a particular situation,
cannot be underplayed, and that is why testing before deployment is so critical. After the
software is developed, and before it is handed over to the user, it must be tested to confirm
that it solves the purpose for which it was developed. This testing involves various types of
checks through which one can ensure the software is reliable. The program was tested logically,
and the pattern of execution of the program was repeated for several sets of data. Thus the code
was exhaustively checked for all possible correct data, and the outcomes were also checked.
Module Testing
To locate errors, each module is tested individually. This enables us to detect an error and
correct it without affecting other modules. Whenever a program does not satisfy the required
function, it is corrected until it gives the required result. Thus all the modules are individually
tested bottom-up, starting with the smallest and lowest-level modules and proceeding to the next
level. Each module in the system is tested separately. For example, the job classification module
is tested separately with different jobs and their approximate execution times, and the test
results are compared with results prepared manually. The comparison shows that the proposed
system works more efficiently than the existing system. In this system, the resource
classification and job scheduling modules are tested separately, and their corresponding results
are obtained, which reduces the process waiting time.
Integration Testing
After module testing, integration testing is applied. When the modules are linked, there is a
chance for errors to occur; such errors are caught and corrected by this testing. In this system,
all modules were connected and tested together, and the test results were correct. Thus the
mapping of jobs to resources is done correctly by the system.
Acceptance Testing
When the user finds no major problems with its accuracy, the system passes through a final
acceptance test. This test confirms that the system meets the original goals, objectives, and
requirements established during analysis. Once the acceptance tests performed by the users and
management succeed, the system is considered acceptable and ready for operation.
7. SCREEN SHOTS
To execute the application, open cloud1 and start the server by double-clicking "run".
Click "check file level deduplication"; the entire file is then checked to see whether a duplicate
copy already exists.
Click "encoding & decoding rsss parameters" to see the execution time of the algorithm and the
total time.
Now log in as another user and upload a different file with the same content to check whether the
data is detected as a duplicate.
8. CONCLUSION
We proposed distributed deduplication systems to improve the reliability of data while
achieving the confidentiality of users' outsourced data without an encryption mechanism.
Four constructions were proposed to support file-level and fine-grained block-level data
deduplication. The security properties of tag consistency and integrity were achieved. We
implemented our deduplication systems using the Ramp secret sharing scheme and demonstrated
that they incur small encoding/decoding overhead compared to the network transmission overhead
of regular upload/download operations.
9. BIBLIOGRAPHY
• Amazon, “Case Studies,” https://aws.amazon.com/solutions/casestudies/#backup.
• J. Gantz and D. Reinsel, “The digital universe in 2020: Big data, bigger digital shadows,
and biggest growth in the far east,” http://www.emc.com/collateral/analyst-reports/idc-the-
digital-universe-in-2020.pdf, Dec. 2012.
• M. O. Rabin, “Efficient dispersal of information for security, load balancing, and fault
tolerance,” Journal of the ACM, vol. 36, no. 2, pp. 335–348, Apr. 1989.
• A. Shamir, “How to share a secret,” Commun. ACM, vol. 22, no. 11, pp. 612–613, 1979.