
IEEE Transactions on Information Forensics and Security, Vol. 10, No. 3, March 2015

Provable Multicopy Dynamic Data Possession in Cloud Computing Systems

Ayad F. Barsoum and M. Anwar Hasan

Abstract: Increasingly, organizations are opting to outsource data to remote cloud service providers (CSPs). Customers can rent the CSPs' storage infrastructure to store and retrieve an almost unlimited amount of data by paying fees metered in gigabytes per month. For an increased level of scalability, availability, and durability, some customers may want their data to be replicated on multiple servers across multiple data centers. The more copies the CSP is asked to store, the higher the fees charged to the customers. Therefore, customers need a strong guarantee that the CSP is storing all data copies agreed upon in the service contract, and that all these copies are consistent with the most recent modifications issued by the customers. In this paper, we propose a map-based provable multicopy dynamic data possession (MB-PMDDP) scheme that has the following features: 1) it provides evidence to the customers that the CSP is not cheating by storing fewer copies; 2) it supports outsourcing of dynamic data, i.e., it supports block-level operations such as block modification, insertion, deletion, and append; and 3) it allows authorized users to seamlessly access the file copies stored by the CSP. We give a comparative analysis of the proposed MB-PMDDP scheme with a reference model obtained by extending existing provable possession schemes for dynamic single-copy data. The theoretical analysis is validated through experimental results on a commercial cloud platform. In addition, we show the security of the scheme against colluding servers, and discuss how to identify corrupted copies by slightly modifying the proposed scheme.

Index Terms: Cloud computing, data replication, outsourcing data storage, dynamic environment.

I. INTRODUCTION

Outsourcing data to a remote cloud service provider (CSP) allows organizations to store more data on the CSP than on private computer systems. Such outsourcing of data storage enables organizations to concentrate on
innovations and relieves the burden of constant server updates
and other computing issues. Moreover, many authorized users can access the remotely stored data from different geographic locations, making it more convenient for them.

Manuscript received October 18, 2013; revised May 22, 2014 and December 11, 2014; accepted December 11, 2014. Date of publication December 18, 2014; date of current version January 22, 2015. This work was supported by the Natural Sciences and Engineering Research Council of Canada. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. C.-C. Jay Kuo.

A. F. Barsoum is with the Department of Computer Science, St. Mary's University at Texas, San Antonio, TX 78228 USA (e-mail: [email protected]). M. A. Hasan is with the Department of Electrical and Computer Engineering, University of Waterloo, ON N2L 3G1, Canada (e-mail: [email protected]).

This paper has supplementary downloadable material at http://ieeexplore.ieee.org, provided by the authors. The file consists of Appendices A, B, C, and D. The material is 1 MB in size. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIFS.2014.2384391
Once the data has been outsourced to a remote CSP, which may not be trustworthy, the data owners lose direct control over their sensitive data. This lack of control raises new formidable and challenging tasks related to data confidentiality and integrity protection in cloud computing. The confidentiality issue can be handled by encrypting sensitive data before outsourcing it to remote servers. For integrity, however, customers need strong evidence that the cloud servers still possess their data and that it is not being tampered with or partially deleted over time. Consequently, many researchers have focused on the problem of provable data possession (PDP) and proposed different schemes to audit the data stored on remote servers.
PDP is a technique for validating data integrity over remote servers. In a typical PDP model, the data owner generates some metadata for a data file to be used later for verification purposes through a challenge-response protocol with the remote/cloud server. The owner sends the file to be stored on a remote server, which may be untrusted, and deletes the local copy of the file. As proof that the server still possesses the data file in its original form, it must correctly compute a response to a challenge vector sent from a verifier, who can be the original data owner or a trusted entity that shares some information with the owner. Researchers have proposed different variations of PDP schemes under different cryptographic assumptions; for example, see [1]-[9].
One of the core design principles of outsourcing data is to provide dynamic behavior of data for various applications. This means that the remotely stored data can be not only accessed by the authorized users, but also updated and scaled (through block-level operations) by the data owner. The PDP schemes presented in [1]-[9] focus only on static or warehoused data, where the outsourced data is kept unchanged over remote servers. Examples of PDP constructions that deal with dynamic data are [10]-[14]. The latter are, however, for a single copy of the data file. Although PDP schemes have been presented for multiple copies of static data, see [15]-[17], to the best of our knowledge, this work is the first PDP scheme directly dealing with multiple copies of dynamic data. In Appendix A, we provide a summary of related work.
When verifying multiple data copies, the overall system integrity check fails if there are one or more corrupted copies. To address this issue and recognize which copies have been


corrupted, we discuss a slight modification to be applied to


the proposed scheme.
A. Main Contributions
Our contributions can be summarized as follows:
- We propose a map-based provable multi-copy dynamic data possession (MB-PMDDP) scheme. This scheme provides an adequate guarantee that the CSP stores all copies that are agreed upon in the service contract. Moreover, the scheme supports outsourcing of dynamic data, i.e., it supports block-level operations such as block modification, insertion, deletion, and append. The authorized users, who have the right to access the owner's file, can seamlessly access the copies received from the CSP.
- We give a thorough comparison of MB-PMDDP with a reference scheme, which one can obtain by extending existing PDP models for dynamic single-copy data. We also report our implementation and experiments using the Amazon cloud platform.
- We show the security of our scheme against colluding servers, and discuss a slight modification of the proposed scheme to identify corrupted copies.
Remark 1: Proof of retrievability (POR) is a complementary approach to PDP, and is stronger than PDP in the sense that the verifier can reconstruct the entire file from responses that are reliably transmitted from the server. This is due to encoding of the data file, for example using erasure codes, before outsourcing it to remote servers. Various POR schemes can be found in the literature, for example [18]-[23], which focus on static data.
In this work, we do not encode the data to be outsourced, for the following reasons. First, we are dealing with dynamic data, and hence if the data file is encoded before outsourcing, modifying a portion of the file requires re-encoding the data file, which may not be acceptable in practical applications due to the high computation overhead. Second, we are considering economically motivated CSPs that may attempt to use less storage than required by the service contract through deletion of a few copies of the file. The CSP has almost no financial benefit from deleting only a small portion of a copy of the file. Third, and more importantly, unlike erasure codes, duplicating data files across multiple servers achieves scalability, which is a fundamental customer requirement in cloud computing systems. A file that is duplicated and stored strategically on multiple servers located at various geographic locations can help reduce access time and communication cost for users. Besides, a server's copy can be reconstructed even after complete damage using duplicated copies on other servers.
B. Paper Organization
The remainder of the paper is organized as follows. Our system and assumptions are presented in Section II. The proposed scheme is elaborated in Section III. The performance analysis is given in Section IV. Section V presents the implementation and experimental results using the Amazon cloud platform. How to identify the corrupted copies is discussed in Section VI. Concluding remarks are given in Section VII.

Fig. 1. Cloud computing data storage system model.

II. OUR SYSTEM AND ASSUMPTIONS

A. System Components
The cloud computing storage model considered in this work consists of three main components, as illustrated in Fig. 1: (i) a data owner that can be an organization originally possessing sensitive data to be stored in the cloud; (ii) a CSP who manages cloud servers (CSs) and provides paid storage space on its infrastructure to store the owner's files; and (iii) authorized users, a set of the owner's clients who have the right to access the remote data.
The storage model used in this work can be adopted by many practical applications. For example, e-Health applications fit this model: the patients' database, which contains large amounts of sensitive information, can be stored on the cloud servers. In these types of applications, the e-Health organization can be considered as the data owner, and the physicians as the authorized users who have the right to access the patients' medical history. Many other practical applications, such as financial, scientific, and educational applications, can be viewed in similar settings.
B. Outsourcing, Updating, and Accessing
The data owner has a file $F$ consisting of $m$ blocks, and the CSP offers to store $n$ copies $\tilde{F} = \{\tilde{F}_1, \tilde{F}_2, \ldots, \tilde{F}_n\}$ of the owner's file on different servers, to prevent simultaneous failure of all copies, in exchange for pre-specified fees metered in GB/month. The number of copies depends on the nature of the data; more copies are needed for critical data that cannot easily be reproduced, and to achieve a higher level of scalability. Such critical data should be replicated on multiple servers across multiple data centers. On the other hand, non-critical, reproducible data are stored at reduced levels of redundancy. The CSP pricing model is related to the number of data copies.
For data confidentiality, the owner encrypts his data before outsourcing it to the CSP. After outsourcing all $n$ copies of the file, the owner may interact with the CSP to perform block-level operations on all copies. These operations include modifying, inserting, appending, and deleting specific blocks of the outsourced data copies.
An authorized user of the outsourced data sends a data-access request to the CSP and receives a file copy in an encrypted form that can be decrypted using a secret key shared with the owner. According to the load-balancing mechanism used by the CSP to organize the work of the servers, the data-access request is directed to the server with the lowest congestion, and thus the user is not aware of which copy has been received.
We assume that the interaction between the owner and the authorized users to authenticate their identities and share the secret key has already been completed, and it is not considered in this work.
C. Threat Model
The integrity of customers' data in the cloud may be at risk for the following reasons. First, the CSP, whose goal is likely to make a profit and maintain a reputation, has an incentive to hide data loss (due to hardware failure, management errors, or various attacks) or to reclaim storage by discarding data that has not been accessed or is rarely accessed. Second, a dishonest CSP may store fewer copies than what has been agreed upon in the service contract with the data owner, and try to convince the owner that all copies are correctly stored intact. Third, to save computational resources, the CSP may totally ignore the data-update requests issued by the owner, or not execute them on all copies, leading to inconsistency between the file copies. The goal of the proposed scheme is to detect (with high probability) CSP misbehavior by validating the number and integrity of the file copies.

D. Underlying Algorithms
The proposed scheme consists of seven polynomial-time algorithms: KeyGen, CopyGen, TagGen, PrepareUpdate, ExecUpdate, Prove, and Verify. The data owner runs the algorithms KeyGen, CopyGen, TagGen, and PrepareUpdate. The CSP runs the algorithms ExecUpdate and Prove, while a verifier runs the Verify algorithm.
- $(pk, sk) \leftarrow$ KeyGen(). This algorithm is run by the data owner to generate a public key $pk$ and a private key $sk$. The private key $sk$ is kept secret by the owner, while $pk$ is publicly known.
- $\tilde{F} \leftarrow$ CopyGen$(CN_i, F)_{1 \le i \le n}$. This algorithm is run by the data owner. It takes as input a copy number $CN_i$ and a file $F$, and generates $n$ copies $\tilde{F} = \{\tilde{F}_i\}_{1 \le i \le n}$. The owner sends the copies $\tilde{F}$ to the CSP to be stored on cloud servers.
- $\Phi \leftarrow$ TagGen$(sk, \tilde{F})$. This algorithm is run by the data owner. It takes as input the private key $sk$ and the file copies $\tilde{F}$, and outputs a tags/authenticators set $\Phi$, which is an ordered collection of tags for the data blocks. The owner sends $\Phi$ to the CSP to be stored along with the copies $\tilde{F}$.
- $(D', \text{UpdateReq}) \leftarrow$ PrepareUpdate$(D, \text{UpdateInfo})$. This algorithm is run by the data owner to update the outsourced file copies stored by the remote CSP. The input parameters are the previous metadata $D$ stored on the owner side and some information UpdateInfo about the dynamic operation to be performed on a specific block. The outputs of this algorithm are modified metadata $D'$ and an update request UpdateReq. This request may contain a modified version of a previously stored block, a new block to be inserted, or a command to delete a specific block from the file copies. UpdateReq also contains updated (or new) tags for modified (or inserted/appended) blocks, and it is sent from the data owner to the CSP in order to perform the requested update.
- $(\tilde{F}', \Phi') \leftarrow$ ExecUpdate$(\tilde{F}, \Phi, \text{UpdateReq})$. This algorithm is run by the CSP, where the input parameters are the file copies $\tilde{F}$, the tags set $\Phi$, and the request UpdateReq. It outputs an updated version of the file copies $\tilde{F}'$ along with an updated tags set $\Phi'$. The latter does not require the private key to be generated; the CSP just replaces/inserts/deletes one item of $\Phi$ using a new item sent from the owner.
- $P \leftarrow$ Prove$(\tilde{F}, \Phi, \text{chal})$. This algorithm is run by the CSP. It takes as input the file copies $\tilde{F}$, the tags set $\Phi$, and a challenge chal (sent from a verifier). It returns a proof $P$ which guarantees that the CSP is actually storing $n$ copies and that all these copies are intact, updated, and consistent.
- $\{1, 0\} \leftarrow$ Verify$(pk, P, D)$. This algorithm is run by a verifier (the original owner or any other trusted auditor). It takes as input the public key $pk$, the proof $P$ returned from the CSP, and the most recent metadata $D$. The output is 1 if the integrity of all file copies is correctly verified, and 0 otherwise.
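To make the division of labor among these seven algorithms concrete, the following is a minimal Python sketch of the scheme's interface. The class and method names and the placeholder types are our own illustrative assumptions; they are not part of the paper and omit all cryptographic details, which are specified in Section III.

```python
from typing import List, Tuple

# Hypothetical placeholder types; the real scheme uses group elements,
# encrypted blocks, and a map-version table as described in Section III.
Block = bytes
Tag = bytes
Metadata = dict
Proof = dict

class Owner:
    """Runs KeyGen, CopyGen, TagGen, and PrepareUpdate."""
    def key_gen(self) -> Tuple[bytes, bytes]:
        """Return (pk, sk); sk never leaves the owner."""
        raise NotImplementedError

    def copy_gen(self, file_blocks: List[Block], n: int) -> List[List[Block]]:
        """Create n differentiable copies of the file."""
        raise NotImplementedError

    def tag_gen(self, sk: bytes, copies: List[List[Block]]) -> List[Tag]:
        """Create the ordered tag set (one aggregated tag per block index)."""
        raise NotImplementedError

    def prepare_update(self, metadata: Metadata, update_info: dict) -> Tuple[Metadata, dict]:
        """Return updated metadata D' and an UpdateReq for the CSP."""
        raise NotImplementedError

class CSP:
    """Runs ExecUpdate and Prove."""
    def exec_update(self, copies, tags, update_req):
        """Apply the requested block-level operation to all copies and tags."""
        raise NotImplementedError

    def prove(self, copies, tags, chal) -> Proof:
        """Return a proof that all n copies are stored, intact, and consistent."""
        raise NotImplementedError

class Verifier:
    """Runs Verify (the owner or any trusted auditor)."""
    def verify(self, pk: bytes, proof: Proof, metadata: Metadata) -> bool:
        raise NotImplementedError
```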

E. Security Requirements

The security of the proposed scheme can be stated using a game that captures the data possession property [1], [12], [18]. The data possession game between an adversary $\mathcal{A}$ (acting as a malicious CSP) and a challenger $\mathcal{C}$ (playing the roles of a data owner and a verifier) consists of the following:
- Setup. $\mathcal{C}$ runs the KeyGen algorithm to generate a key pair $(pk, sk)$, and sends $pk$ to $\mathcal{A}$.
- Interact. $\mathcal{A}$ interacts with $\mathcal{C}$ to get the file copies and the verification tags set $\Phi$. $\mathcal{A}$ adaptively selects a file $F$ and sends it to $\mathcal{C}$. $\mathcal{C}$ divides the file into $m$ blocks, runs the two algorithms CopyGen and TagGen to create $n$ distinct copies $\tilde{F}$ along with the tags set $\Phi$, and returns both $\tilde{F}$ and $\Phi$ to $\mathcal{A}$.
Moreover, $\mathcal{A}$ can interact with $\mathcal{C}$ to perform dynamic operations on $\tilde{F}$. $\mathcal{A}$ specifies a block to be updated, inserted, or deleted, and sends the block to $\mathcal{C}$. $\mathcal{C}$ runs the PrepareUpdate algorithm, sends the UpdateReq to $\mathcal{A}$, and updates the local metadata $D$. $\mathcal{A}$ can further request challenges $\{\text{chal}_i\}_{1 \le i \le L}$ for some parameter $L \ge 1$ of $\mathcal{A}$'s choice, and return proofs $\{P_i\}_{1 \le i \le L}$ to $\mathcal{C}$. $\mathcal{C}$ runs the Verify algorithm and provides the verification results to $\mathcal{A}$. The Interact step can be repeated polynomially many times.
- Challenge. $\mathcal{A}$ decides on a file $F$ previously used during the Interact step, requests a challenge chal from $\mathcal{C}$, and generates a proof $P \leftarrow$ Prove$(\hat{F}, \Phi, \text{chal})$, where $\hat{F}$ is $\tilde{F}$ except that at least one of its file copies (or a portion of it) is missing or tampered with. Upon receiving the proof $P$, $\mathcal{C}$ runs the Verify algorithm, and if Verify$(pk, P, D)$ returns 1, then $\mathcal{A}$ has won the game.

Note that $D$ is the latest metadata held by $\mathcal{C}$ corresponding to the file $F$. The Challenge step can be repeated polynomially many times for the purpose of data extraction.
The proposed scheme is secure if the probability that any probabilistic polynomial-time (PPT) adversary $\mathcal{A}$ wins the game is negligible. In other words, if a PPT adversary $\mathcal{A}$ can win the game with non-negligible probability, then there exists a polynomial-time extractor that can repeatedly execute the Challenge step until it extracts the blocks of the data copies.
III. PROPOSED MB-PMDDP SCHEME
A. Overview and Rationale
Generating unique differentiable copies of the data file is at the core of designing a provable multi-copy data possession scheme. Identical copies enable the CSP to simply deceive the owner by storing only one copy and pretending that it stores multiple copies. Using a simple yet efficient approach, the proposed scheme generates distinct copies by utilizing the diffusion property of any secure encryption scheme. The diffusion property ensures that the output bits of the ciphertext depend on the input bits of the plaintext in a very complex way, i.e., there will be an unpredictable complete change in the ciphertext if there is a single-bit change in the plaintext [24]. The interaction between the authorized users and the CSP is considered in this methodology of generating distinct copies, where the former can decrypt/access a file copy received from the CSP. In the proposed scheme, the authorized users need only to keep a single secret key (shared with the data owner) to decrypt the file copy, and they do not need to recognize the index of the received copy.
In this work, we propose an MB-PMDDP scheme allowing the data owner to update and scale the blocks of file copies outsourced to cloud servers which may be untrusted. Validating such copies of dynamic data requires knowledge of the block versions to ensure that the data blocks in all copies are consistent with the most recent modifications issued by the owner. Moreover, the verifier should be aware of the block indices to guarantee that the CSP has inserted or added the new blocks at the requested positions in all copies. To this end, the proposed scheme is based on a small data structure (metadata), which we call a map-version table.
B. Map-Version Table
The map-version table (MVT) is a small dynamic data structure stored on the verifier side to validate the integrity and consistency of all file copies outsourced to the CSP. The MVT consists of three columns: serial number ($SN$), block number ($BN$), and block version ($BV$). The $SN$ is an index to the file blocks: it indicates the physical position of a block in the data file. The $BN$ is a counter used to give a logical numbering/indexing to the file blocks. Thus, the relation between $BN$ and $SN$ can be viewed as a mapping between the logical number $BN$ and the physical position $SN$. The $BV$ indicates the current version of a file block. When a data file is initially created, the $BV$ of each block is 1. If a specific block is updated, its $BV$ is incremented by 1.
Remark 2: It is important to note that the verifier keeps only one table for an unlimited number of file copies, i.e., the storage requirement on the verifier side does not depend on the number of file copies on the cloud servers. For $n$ copies of a data file of size $|F|$, the storage requirement on the CSP side is $O(n|F|)$, while the verifier's overhead is $O(m)$ for all file copies ($m$ is the number of file blocks).
Remark 3: The MVT is implemented as a linked list to simplify the insertion and deletion of table entries. In an actual implementation, the $SN$ does not need to be stored in the table; $SN$ is simply the entry/table index, i.e., each table entry contains just two integers $BN$ and $BV$ (8 bytes). Thus, the total table size is $8m$ bytes for all file copies. We further note that although the table size is linear in the file size, in practice the former is smaller by several orders of magnitude. For example, outsourcing an unlimited number of copies of a 1-GB file with a 16-KB block size requires the verifier to keep an MVT of only 512 KB (< 0.05% of the file size). More details on the MVT and how it works are given later.
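As an illustration of how the MVT evolves under block-level operations, the following is a minimal Python sketch written under our own assumptions about the interface (the paper describes the table abstractly and implements it as a linked list; a plain Python list is used here purely for readability).

```python
class MapVersionTable:
    """Verifier-side map-version table: one (BN, BV) entry per block.

    The serial number SN is implicit: it is the position of the entry
    in the list (1-based in the paper's description).
    """

    def __init__(self, num_blocks: int):
        # On file creation, BN_j = SN_j and BV_j = 1 for every block.
        self.entries = [[j + 1, 1] for j in range(num_blocks)]
        self.max_bn = num_blocks  # highest logical block number used so far

    def modify(self, sn: int):
        """The block at physical position sn (1-based) is modified: bump its version."""
        self.entries[sn - 1][1] += 1

    def insert_after(self, sn: int):
        """A new block is inserted after physical position sn:
        the new entry gets a fresh logical number and version 1."""
        self.max_bn += 1
        self.entries.insert(sn, [self.max_bn, 1])

    def append(self):
        """Append is an insert after the last block."""
        self.insert_after(len(self.entries))

    def delete(self, sn: int):
        """The block at physical position sn is deleted; later entries shift up."""
        del self.entries[sn - 1]

    def lookup(self, sn: int):
        """Return (BN, BV) for the block currently at physical position sn."""
        bn, bv = self.entries[sn - 1]
        return bn, bv
```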
C. Notations
- $F$ is a data file to be outsourced, and is composed of a sequence of $m$ blocks, i.e., $F = \{b_1, b_2, \ldots, b_m\}$.
- $\pi_{key}(\cdot)$ is a pseudo-random permutation (PRP): $key \times \{0, 1\}^{\log_2(m)} \rightarrow \{0, 1\}^{\log_2(m)}$.¹
- $\psi_{key}(\cdot)$ is a pseudo-random function (PRF): $key \times \{0, 1\}^{*} \rightarrow \mathbb{Z}_p$ ($p$ is a large prime).
- Bilinear map/pairing: Let $G_1$, $G_2$, and $G_T$ be cyclic groups of prime order $p$. Let $g_1$ and $g_2$ be generators of $G_1$ and $G_2$, respectively. A bilinear pairing is a map $e: G_1 \times G_2 \rightarrow G_T$ with the following properties [25]:
1) Bilinear: $e(u^a, v^b) = e(u, v)^{ab}$ for all $u \in G_1$, $v \in G_2$, and $a, b \in \mathbb{Z}_p$
2) Non-degenerate: $e(g_1, g_2) \neq 1$
3) Computable: there exists an efficient algorithm for computing $e$
- $H(\cdot)$ is a map-to-point hash function: $\{0, 1\}^{*} \rightarrow G_1$.
- $E_K$ is an encryption algorithm with a strong diffusion property, e.g., AES.

¹The number of file blocks ($m$) will change due to dynamic operations on the file. We use HMAC-SHA-1 with a 160-bit output to allow up to $2^{160}$ blocks in the file.
Remark 4: Homomorphic linear authenticators (HLAs) [18], [22], [26] are basic building blocks in the proposed scheme. Informally, an HLA is a fingerprint/tag computed by the owner for each block $b_j$ that enables a verifier to validate data possession on remote servers by sending a challenge vector $C = \{c_1, c_2, \ldots, c_r\}$. As a response, the servers can homomorphically construct a tag authenticating the value $\sum_{j=1}^{r} c_j b_j$. The response is validated by the verifier, and accepted only if the servers honestly compute it using the owner's file blocks. The proposed scheme utilizes BLS-based HLAs [18].
D. MB-PMDDP Procedural Steps
- Key Generation: Let $e: G_1 \times G_2 \rightarrow G_T$ be a bilinear map and $g$ a generator of $G_2$. The data owner runs the KeyGen algorithm to generate a private key $x \in \mathbb{Z}_p$ and a public key $y = g^x \in G_2$.
- Generation of Distinct Copies: For a file $F = \{b_j\}_{1 \le j \le m}$, the owner runs the CopyGen algorithm to create $n$ differentiable copies $\tilde{F} = \{\tilde{F}_i\}_{1 \le i \le n}$, where copy $\tilde{F}_i = \{\tilde{b}_{ij}\}_{1 \le j \le m}$. The block $\tilde{b}_{ij}$ is generated by concatenating a copy number $i$ with the block $b_j$, then encrypting the result using an encryption scheme $E_K$, i.e., $\tilde{b}_{ij} = E_K(i \,\|\, b_j)$. The encrypted block $\tilde{b}_{ij}$ is fragmented into $s$ sectors $\{b_{ij1}, b_{ij2}, \ldots, b_{ijs}\}$, i.e., the copy $\tilde{F}_i = \{b_{ijk}\}_{1 \le j \le m,\, 1 \le k \le s}$, where each sector $b_{ijk} \in \mathbb{Z}_p$ for some large prime $p$. The number of block sectors $s$ depends on the block size and the prime $p$, where $s = |\text{block size}| / |p|$ ($|\cdot|$ denotes the bit length).
The authorized users need only to keep a single secret key $K$. Later, when an authorized user receives a file copy from the CSP, he decrypts the copy blocks, removes the copy index from the block headers, and then recombines the decrypted blocks to reconstruct the plain form of the received file copy.
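The following is a small sketch of this copy-generation step, assuming AES-CBC from the third-party `cryptography` package as a stand-in for the diffusing cipher $E_K$; the helper names, padding choice, and per-block IV handling are our own illustrative assumptions, not the paper's specification.

```python
import os
import struct
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

SECTOR_BITS = 256          # |p| = 256 bits in the paper's parameter choice

def generate_copy(key: bytes, copy_index: int, blocks: list[bytes]) -> list[bytes]:
    """Create copy i by encrypting (i || b_j) for every plaintext block b_j.

    key must be a valid AES key (16/24/32 bytes). Each encrypted block is
    returned as iv || ciphertext so it can be decrypted later.
    """
    encrypted = []
    for block in blocks:
        plaintext = struct.pack(">I", copy_index) + block   # i || b_j
        pad = 16 - (len(plaintext) % 16)                     # PKCS#7-style padding
        plaintext += bytes([pad]) * pad
        iv = os.urandom(16)
        enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
        encrypted.append(iv + enc.update(plaintext) + enc.finalize())
    return encrypted

def fragment_into_sectors(enc_block: bytes, sector_bits: int = SECTOR_BITS) -> list[int]:
    """Split an encrypted block into s sectors read as integers.

    In the real scheme each sector is an element of Z_p, so a chunk would be
    reduced modulo p (or sized to |p| - 1 bits); this is a simplification.
    """
    sector_bytes = sector_bits // 8
    return [int.from_bytes(enc_block[o:o + sector_bytes], "big")
            for o in range(0, len(enc_block), sector_bytes)]
```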
- Generation of Tags: Given the distinct file copies $\tilde{F} = \{\tilde{F}_i\}$, where $\tilde{F}_i = \{b_{ijk}\}$, the data owner chooses $s$ random elements $(u_1, u_2, \ldots, u_s) \xleftarrow{R} G_1$ and runs the TagGen algorithm to generate a tag $\sigma_{ij}$ for each block $\tilde{b}_{ij}$ as
$$\sigma_{ij} = \Big(H(ID_F \| BN_j \| BV_j) \cdot \prod_{k=1}^{s} u_k^{\,b_{ijk}}\Big)^{x} \in G_1$$
($i: 1 \rightarrow n$, $j: 1 \rightarrow m$, $k: 1 \rightarrow s$). In the tag computation, $BN_j$ is the logical number of the block at physical position $j$, $BV_j$ is the current version of that block, and $ID_F = \text{Filename} \,\|\, n \,\|\, u_1 \,\|\, \ldots \,\|\, u_s$ is a unique fingerprint for each file $F$ comprising the file name, the number of copies for this file, and the random values $\{u_k\}_{1 \le k \le s}$. Note that if the data owner decides to use a different block size (or a different $p$) for his different files, the number of block sectors $s$ and hence $\{u_k\}_{1 \le k \le s}$ will change. We assume that $ID_F$ is signed with the owner's signing secret key (different from $x$), and the CSP verifies this signature during the different scheme operations to validate the owner's identity.
In order to reduce the storage overhead on the cloud servers and lower the communication cost, the data owner generates an aggregated tag $\tau_j$ for the blocks at the same index $j$ in each copy $\tilde{F}_i$ as $\tau_j = \prod_{i=1}^{n} \sigma_{ij} \in G_1$. Hence, instead of storing $mn$ tags, the proposed scheme requires the CSP to store only $m$ tags for the copies $\tilde{F}$. Let us denote the set of aggregated tags as $\Phi = \{\tau_j\}_{1 \le j \le m}$. The data owner sends $\{\tilde{F}, \Phi, ID_F\}$ to the CSP, and deletes the copies and the tags from its local storage. The MVT is stored on the local storage of the owner (or any trusted verifier).
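To see why a single aggregated tag per block index suffices, note that multiplying the per-copy tags collapses the per-copy sector values into sums, which is exactly what the verification equation of the response phase exploits. Writing $H_j = H(ID_F \| BN_j \| BV_j)$, a short expansion (our own restatement of the formulas above) is:
$$\tau_j = \prod_{i=1}^{n} \sigma_{ij} = \prod_{i=1}^{n} \Big( H_j \cdot \prod_{k=1}^{s} u_k^{\,b_{ijk}} \Big)^{x} = \Big( H_j^{\,n} \cdot \prod_{k=1}^{s} u_k^{\,\sum_{i=1}^{n} b_{ijk}} \Big)^{x}.$$
Thus the CSP keeps only $m$ aggregated tags, yet a response built from $\tau_j$ still binds all $n$ copies through the sector sums $\sum_{i} b_{ijk}$.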
- Dynamic Operations on the Data Copies: The dynamic operations are performed at the block level via a request in the general form $\langle ID_F, \text{BlockOp}, j, \{b'_i\}_{1 \le i \le n}, \tau' \rangle$, where $ID_F$ is the file identifier and BlockOp corresponds to block modification (denoted by BM), block insertion (BI), or block deletion (BD). The parameter $j$ indicates the index of the block to be updated, $\{b'_i\}_{1 \le i \le n}$ are the new block values for all copies, and $\tau'$ is the new aggregated tag for the new blocks.
- Modification: For a file $F = \{b_1, b_2, \ldots, b_m\}$, suppose the owner wants to modify a block $b_j$, replacing it with $b'_j$ in all file copies $\tilde{F}$. The owner runs the PrepareUpdate algorithm to do the following:
1) Updates $BV_j = BV_j + 1$ in the MVT
2) Creates $n$ distinct blocks $\{\tilde{b}'_{ij}\}_{1 \le i \le n}$, where $\tilde{b}'_{ij} = E_K(i \,\|\, b'_j)$ is fragmented into $s$ sectors $\{b'_{ij1}, b'_{ij2}, \ldots, b'_{ijs}\}$
3) Creates a new tag $\sigma'_{ij}$ for each block $\tilde{b}'_{ij}$ as $\sigma'_{ij} = \big(H(ID_F \| BN_j \| BV_j) \cdot \prod_{k=1}^{s} u_k^{\,b'_{ijk}}\big)^{x} \in G_1$, then generates an aggregated tag $\tau'_j = \prod_{i=1}^{n} \sigma'_{ij} \in G_1$
4) Sends a modify request $\langle ID_F, \text{BM}, j, \{\tilde{b}'_{ij}\}_{1 \le i \le n}, \tau'_j \rangle$ to the CSP
Upon receiving the request, the CSP runs the ExecUpdate algorithm to do the following:
1) Replaces the block $\tilde{b}_{ij}$ with $\tilde{b}'_{ij}$ in each copy $\tilde{F}_i$, and constructs the updated file copies $\tilde{F}' = \{\tilde{F}'_i\}_{1 \le i \le n}$
2) Replaces $\tau_j$ with $\tau'_j$ in the set $\Phi$, and outputs $\Phi' = \{\tau_1, \tau_2, \ldots, \tau'_j, \ldots, \tau_m\}$.
- Insertion: Suppose the owner wants to insert a new block $\hat{b}$ after position $j$ in a file $F = \{b_1, b_2, \ldots, b_m\}$, i.e., the newly constructed file is $F' = \{b_1, b_2, \ldots, b_j, \hat{b}, \ldots, b_{m+1}\}$, where $b_{j+1} = \hat{b}$.
In the proposed MB-PMDDP scheme, the physical block index $SN$ is not included in the block tag. Thus, the insertion operation can be performed without recomputing the tags of all blocks that have been shifted after inserting the new block. Embedding the physical index in the tag would result in unacceptable computation overhead, especially for large data files.
To perform the insertion of a new block $\hat{b}$ after position $j$ in all file copies $\tilde{F}$, the owner runs the PrepareUpdate algorithm to do the following:
1) Constructs a new table entry $\langle SN, BN, BV \rangle = \langle j+1, (\text{Max}\{BN_j\}_{1 \le j \le m}) + 1, 1 \rangle$, and inserts this entry in the MVT after position $j$
2) Creates $n$ distinct blocks $\{\hat{b}_i\}_{1 \le i \le n}$, where $\hat{b}_i = E_K(i \,\|\, \hat{b})$ is fragmented into $s$ sectors $\{\hat{b}_{i1}, \hat{b}_{i2}, \ldots, \hat{b}_{is}\}$
3) Creates a new tag $\hat{\sigma}_i$ for each block $\hat{b}_i$ as $\hat{\sigma}_i = \big(H(ID_F \| BN_{j+1} \| BV_{j+1}) \cdot \prod_{k=1}^{s} u_k^{\,\hat{b}_{ik}}\big)^{x} \in G_1$, then generates an aggregated tag $\hat{\tau} = \prod_{i=1}^{n} \hat{\sigma}_i \in G_1$. Note that $BN_{j+1}$ is the logical number of the new block, with current version $BV_{j+1} = 1$
4) Sends an insert request $\langle ID_F, \text{BI}, j, \{\hat{b}_i\}_{1 \le i \le n}, \hat{\tau} \rangle$ to the CSP
Upon receiving the insert request, the CSP runs the ExecUpdate algorithm to do the following:
1) Inserts the block $\hat{b}_i$ after position $j$ in the file copy $\tilde{F}_i$ for each $i$, and constructs a new version of the file copies $\tilde{F}' = \{\tilde{F}'_i\}_{1 \le i \le n}$
2) Inserts $\hat{\tau}$ after index $j$ in $\Phi$, and outputs $\Phi' = \{\tau_1, \ldots, \tau_j, \hat{\tau}, \ldots, \tau_{m+1}\}$, i.e., $\tau_{j+1} = \hat{\tau}$.

Fig. 2. Changes in the MVT due to different dynamic operations on copies of a file $F = \{b_j\}_{1 \le j \le 8}$.
Remark 5: To prevent the CSP from cheating and using less storage, the modified or inserted blocks for the outsourced copies cannot be identical. To this end, the proposed scheme leaves the control of creating such distinct blocks in the owner's hands. This explains the linear relation between the work done by the owner during dynamic operations and the number of copies. The proposed scheme assumes that the CSP stores the outsourced copies on different servers to avoid simultaneous failure and achieve a higher level of availability. Therefore, even if the CSP were honest enough to perform part of the owner's work, this would be unlikely to significantly reduce the communication overhead, since the distinct blocks are sent to different servers for updating the copies. The experimental results show that the computation overhead on the owner side due to dynamic block operations is practical.
- Append: The block append operation means adding a new block at the end of the outsourced data. It can simply be implemented via an insert operation after the last block of the data file.
- Deletion: When one block is deleted, all subsequent blocks are moved one step forward. To delete a specific data block at position $j$ from all copies, the owner deletes the entry at position $j$ from the MVT and sends a delete request $\langle ID_F, \text{BD}, j, \text{null}, \text{null} \rangle$ to the CSP. Upon receiving this request, the CSP runs the ExecUpdate algorithm to do the following:
1) Deletes the blocks $\{\tilde{b}_{ij}\}_{1 \le i \le n}$, and outputs a new version of the file copies $\tilde{F}' = \{\tilde{F}'_i\}_{1 \le i \le n}$
2) Deletes $\tau_j$ from $\Phi$ and outputs $\Phi' = \{\tau_1, \tau_2, \ldots, \tau_{j-1}, \tau_{j+1}, \ldots, \tau_{m-1}\}$.
Fig. 2 shows the changes in the MVT due to dynamic operations on the copies $\tilde{F}$ of a file $F = \{b_j\}_{1 \le j \le 8}$. When the copies are initially created (Fig. 2a), $SN_j = BN_j$ and $BV_j = 1$ for $1 \le j \le 8$. Fig. 2b shows that $BV_5$ is incremented by 1 after updating the block at position 5 in all copies. To insert a new block after position 3 in $\tilde{F}$, Fig. 2c shows that a new entry $\langle 4, 9, 1 \rangle$ is inserted in the MVT after $SN_3$, where 4 is the physical position of the newly inserted block, 9 is the new logical block number computed by incrementing the maximum of all previous logical block numbers, and 1 is the version of the new block. Deleting a block at position 2 from all copies requires deleting the table entry at $SN_2$ and shifting all subsequent entries one position up (Fig. 2d). Note that during all dynamic operations, the $SN$ indicates the actual physical positions of the data blocks in the file copies $\tilde{F}$.
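Using the MapVersionTable sketch from Section III-B (an illustrative class of our own, not part of the paper), the sequence of operations described for Fig. 2 can be reproduced as follows.

```python
# Initial state (Fig. 2a): 8 blocks, BN_j = SN_j, BV_j = 1.
mvt = MapVersionTable(num_blocks=8)

# Fig. 2b: modify the block at physical position 5 -> BV_5 becomes 2.
mvt.modify(sn=5)

# Fig. 2c: insert a new block after position 3 -> new entry <SN=4, BN=9, BV=1>.
mvt.insert_after(sn=3)

# Fig. 2d: delete the block at position 2 -> later entries shift one position up.
mvt.delete(sn=2)

# Cumulative state after the three operations:
print(mvt.entries)  # [[1, 1], [3, 1], [9, 1], [4, 1], [5, 2], [6, 1], [7, 1], [8, 1]]
```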
- Challenge: To challenge the CSP and validate the integrity and consistency of all copies, the verifier sends $c$ (the number of blocks to be challenged) and two fresh keys at each challenge: a PRP key $k_1$ and a PRF key $k_2$. Both the verifier and the CSP use the PRP $\pi$ keyed with $k_1$ and the PRF $\psi$ keyed with $k_2$ to generate a set $Q = \{(j, r_j)\}$ of $c$ pairs of random indices and random values, where $\{j\} = \pi_{k_1}(l)_{1 \le l \le c}$ and $\{r_j\} = \psi_{k_2}(l)_{1 \le l \le c}$. The set of random indices $\{j\}$ gives the physical positions (serial numbers $SN$) of the blocks to be challenged.
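A small sketch of this challenge derivation is given below, assuming HMAC-SHA-1 (the primitive mentioned in the footnote of Section III-C) stands in for the PRF and, reduced modulo the current number of blocks, for the PRP as well; a real implementation would use a true pseudo-random permutation, so the duplicate-skipping step here is our own simplification.

```python
import hashlib
import hmac

def generate_challenge_set(k1: bytes, k2: bytes, c: int, m: int, p: int) -> list[tuple[int, int]]:
    """Derive Q = {(j, r_j)}: c pseudo-random block positions with random coefficients.

    k1 drives the index derivation (PRP role), k2 the coefficients (PRF role),
    m is the current number of blocks, p the group order. Assumes c <= m.
    Both verifier and CSP run this with the same keys and obtain the same Q.
    """
    Q, seen = [], set()
    counter = 0
    while len(Q) < c:
        counter += 1
        msg = counter.to_bytes(8, "big")
        j = int.from_bytes(hmac.new(k1, msg, hashlib.sha1).digest(), "big") % m + 1
        if j in seen:          # crude stand-in for a permutation: skip repeats
            continue
        seen.add(j)
        r_j = int.from_bytes(hmac.new(k2, msg, hashlib.sha1).digest(), "big") % p
        Q.append((j, r_j))
    return Q
```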
- Response: The CSP runs the Prove algorithm to generate the same set $Q = \{(j, r_j)\}$ of random indices and values, and to provide evidence that it is still correctly possessing the $n$ copies in an updated and consistent state. The CSP responds with a proof $P = \{\sigma, \mu\}$, where
$$\sigma = \prod_{(j, r_j) \in Q} \tau_j^{\,r_j} \in G_1, \qquad \mu_{ik} = \sum_{(j, r_j) \in Q} r_j\, b_{ijk} \in \mathbb{Z}_p, \qquad \mu = \{\mu_{ik}\}_{1 \le i \le n,\; 1 \le k \le s}.$$

- Verify Response: Upon receiving the proof $P = \{\sigma, \mu\}$ from the CSP, the verifier runs the Verify algorithm to check the following verification equation:
$$e(\sigma, g) \stackrel{?}{=} e\Bigg( \Big( \prod_{(j, r_j) \in Q} H(ID_F \| BN_j \| BV_j)^{\,r_j} \Big)^{n} \cdot \prod_{k=1}^{s} u_k^{\,\sum_{i=1}^{n} \mu_{ik}},\; y \Bigg) \qquad (1)$$
The verifier uses the set of random indices $\{j\}$ (generated from $\pi_{k_1}$) and the MVT to obtain the logical block number $BN_j$ and the block version $BV_j$ of each block being challenged. If the verification equation holds, the Verify algorithm returns 1, otherwise 0.
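Equation (1) follows directly from the structure of the aggregated tags; the expansion below is our own restatement of why an honest CSP always passes (again writing $H_j = H(ID_F \| BN_j \| BV_j)$):
$$\sigma = \prod_{(j, r_j) \in Q} \tau_j^{\,r_j} = \Bigg( \Big(\prod_{(j, r_j) \in Q} H_j^{\,r_j}\Big)^{n} \cdot \prod_{k=1}^{s} u_k^{\,\sum_{(j, r_j) \in Q} r_j \sum_{i=1}^{n} b_{ijk}} \Bigg)^{x},$$
and since $\sum_{(j, r_j) \in Q} r_j \sum_{i=1}^{n} b_{ijk} = \sum_{i=1}^{n} \mu_{ik}$ and $y = g^x$, the bilinearity of $e$ yields exactly the right-hand side of (1).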
One could attempt to slightly modify the MB-PMDDP scheme to reduce the communication overhead by a factor of $n$ by allowing the CSP to compute and send $\mu = \{\mu_k\}_{1 \le k \le s}$, where $\mu_k = \sum_{i=1}^{n} \mu_{ik}$. However, this modification would enable the CSP to simply cheat the verifier as follows:
$$\mu_k = \sum_{i=1}^{n} \mu_{ik} = \sum_{i=1}^{n} \sum_{(j, r_j) \in Q} r_j\, b_{ijk} = \sum_{(j, r_j) \in Q} r_j \sum_{i=1}^{n} b_{ijk}.$$
Thus, the CSP could keep just the sector sums $\sum_{i=1}^{n} b_{ijk}$ rather than the sectors themselves. Moreover, the CSP could corrupt individual sectors while the sums remain valid. Therefore, we require the CSP to send $\mu = \{\mu_{ik}\}_{1 \le i \le n, 1 \le k \le s}$, and the summation $\sum_{i=1}^{n} \mu_{ik}$ is done on the verifier side. The challenge-response protocol of the MB-PMDDP scheme is summarized in Fig. 3.
Fig. 3. Challenge-response protocol in the MB-PMDDP scheme.

Remark 6: The proposed MB-PMDDP scheme supports public verifiability, where anyone who knows the owner's public key, but is not necessarily the data owner, can send

a challenge vector to the CSP and verify the response. Public verifiability can resolve disputes that may occur between the data owner and the CSP regarding data integrity. If such a dispute occurs, a trusted third-party auditor (TPA) can determine whether the data integrity is maintained or not. Since only the owner's public key is needed to perform the verification step, the owner is not required to reveal his secret key to the TPA. The security analysis of the MB-PMDDP scheme is given in Appendix B (included in the accompanying supplementary materials).
IV. REFERENCE MODEL AND PERFORMANCE ANALYSIS
A. Reference Model
It is possible to obtain a provable multi-copy dynamic data possession scheme by extending existing PDP models for single-copy dynamic data. PDP schemes selected for such an extension must meet the following conditions: (i) support for full dynamic operations (modify, insert, append, and delete); (ii) support for public verifiability; (iii) use of pairing-based cryptography in creating the block tags (homomorphic authenticators); and (iv) block tags outsourced along with the data blocks to the CSP (i.e., tags are not stored on the local storage of the data owner). Meeting these conditions allows us to construct a PDP reference model that has features similar to the proposed MB-PMDDP scheme. Therefore, we can establish a fair comparison between the two schemes and evaluate the performance of our proposed approach.
Below we derive a scheme by extending PDP models that are based on authenticated data structures, see [12], [13]. Using Merkle hash trees (MHTs) [27], we construct a scheme labelled TB-PMDDP (tree-based provable multi-copy dynamic data possession), but it could also be designed using authenticated skip lists [12] or other authenticated data structures. The TB-PMDDP is used as a reference model for comparison with the proposed MB-PMDDP scheme.
1) Merkle Hash Tree: An MHT [27] is a binary tree structure used to efficiently verify the integrity of data. The MHT is a tree of hashes where the leaves of the tree are the hashes of the data blocks. Let $h$ denote a cryptographic hash function (e.g., SHA-2). Fig. 4a shows an example of an MHT used for verifying the integrity of a file $F$ consisting of 8 blocks: $h_j = h(b_j)$ $(1 \le j \le 8)$, $h_A = h(h_1 \| h_2)$, $h_B = h(h_3 \| h_4)$, and so on. Finally, $h_R = h(h_E \| h_F)$ is the hash of the root node, which is used to authenticate the integrity

of all data blocks. The data blocks $\{b_1, b_2, \ldots, b_8\}$ are stored


on a remote server, and only the authentic value $h_R$ is stored locally on the verifier side. For example, if the verifier requests to check the integrity of the blocks $b_2$ and $b_6$, the server will send these two blocks along with the authentication paths $A_2 = \{h_1, h_B\}$ and $A_6 = \{h_5, h_D\}$, which are used to reconstruct the root of the MHT. $A_j$, the authentication path of $b_j$, is the set of node siblings (grey-shaded circles in Fig. 4a) on the path from $h_j$ to the root of the MHT. The verifier uses the received blocks and the authentication paths to recompute the root in the following manner. The verifier constructs $h_2 = h(b_2)$, $h_6 = h(b_6)$, $h_A = h(h_1 \| h_2)$, $h_C = h(h_5 \| h_6)$, $h_E = h(h_A \| h_B)$, $h_F = h(h_C \| h_D)$, and $h_R = h(h_E \| h_F)$. After computing $h_R$, it is compared with the authentic value stored locally on the verifier side.
The MHT is commonly used to authenticate the values of the data blocks. For the dynamic behavior of outsourced data, we need to authenticate both the values and the positions of the data blocks, i.e., we need an assurance that a specific value is stored at a specific leaf node. For example, if a data owner requires inserting a new block after position $j$, the verifier needs to make sure that the server has inserted the new block at the requested position. To validate the positions of the blocks, the leaf nodes of the MHT are treated in a specific sequence, e.g., a left-to-right sequence [28]. So, the hash of any internal node $= h(\text{left child} \,\|\, \text{right child})$, e.g., $h_A = h(h_1 \| h_2) \neq h(h_2 \| h_1)$. Besides, the authentication path $A_j$ is viewed as an ordered set, and thus any leaf node is uniquely specified by following the sequence used in constructing the root of the MHT.
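The following is a small sketch of this root-recomputation check, assuming SHA-256 and the ordered sibling-path representation described above (each path element carries a sibling hash plus a flag saying whether that sibling sits on the left); the function names and path encoding are our own illustration, not the TB-PMDDP specification.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks: list[bytes]) -> bytes:
    """Build the MHT bottom-up and return the root hash (len(blocks) a power of two)."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_block(block: bytes, auth_path: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from a block and its ordered authentication path.

    Each path entry is (sibling_hash, sibling_is_left); keeping the order fixed
    is what binds the block to a specific leaf position.
    """
    node = h(block)
    for sibling, sibling_is_left in auth_path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

# Example with 8 blocks, checking b_2 (index 1) as in Fig. 4a:
blocks = [bytes([j]) * 4 for j in range(8)]
root = merkle_root(blocks)
h1 = h(blocks[0])
hB = h(h(blocks[2]) + h(blocks[3]))
hF = h(h(h(blocks[4]) + h(blocks[5])) + h(h(blocks[6]) + h(blocks[7])))
assert verify_block(blocks[1], [(h1, True), (hB, False), (hF, False)], root)
```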
Fig. 4. Hashing trees for outsourced data. (a) Merkle hash tree. (b) Directory tree.

2) Directory MHT for File Copies: In the TB-PMDDP scheme, an MHT is constructed for each file copy, and then the roots of the individual trees are used to build a hash tree which we call a directory MHT. The key idea is to make the root node of each copy's MHT a leaf node in a directory MHT used to authenticate the integrity of all file copies in a hierarchical manner. The directory tree is depicted in Fig. 4b. The verifier needs to keep only one hash value (metadata) $M = h(ID_F \| h_{DR})$, where $ID_F$ is a unique file identifier for a file $F$, and $h_{DR}$ is the authenticated directory root value that can be used to periodically check the integrity of all file copies. Appendix C contains the procedural steps of the derived TB-PMDDP scheme.

TABLE I
Notation of Cryptographic Operations

B. Performance Analysis
Here we evaluate the performance of the two presented schemes, MB-PMDDP and TB-PMDDP. The file $F$ used in our performance analysis is of size 64 MB with a 4-KB block size. Without loss of generality, we assume that the desired security level is 128 bits. Thus, we utilize an elliptic curve defined over a Galois field $GF(p)$ with $|p| = 256$ bits (a point on this curve can be represented by 257 bits using compressed representation [29]), and a cryptographic hash of size 256 bits (e.g., SHA-256).
Similar to [5], [10], and [17], the computation cost is estimated in terms of the crypto-operations used, which are notated in Table I. $G$ indicates a group of points over a suitable elliptic curve in the bilinear pairing.
Let $n$, $m$, and $s$ denote the number of copies, the number of blocks per copy, and the number of sectors per block, respectively. Let $c$ denote the number of blocks to be challenged, and $|F|$ the size of a file copy. Let the keys used with the PRP and the PRF be of size 128 bits. Table II presents a theoretical analysis of the setup, storage, communication, computation, and dynamic-operation costs of the two schemes: MB-PMDDP and TB-PMDDP.

TABLE II
Performance of the MB-PMDDP and TB-PMDDP Schemes
C. Comments
1) System Setup: Table II shows that the setup cost of the MB-PMDDP scheme is less than that of the TB-PMDDP scheme. The TB-PMDDP scheme takes some extra cryptographic hash operations to prepare the MHTs for the file copies and to generate the metadata $M$.
2) Storage Overhead: The storage overhead on the CSP for the MB-PMDDP scheme is much less than that of the TB-PMDDP model. Storage overhead is the additional space used to store information other than the outsourced file copies $\tilde{F}$ ($n|F|$ bits are used to store $\tilde{F}$). Both schemes need some additional space to store the aggregated block tags $\Phi = \{\tau_j\}_{1 \le j \le m}$, where $\tau_j$ is a group element that can be represented by 257 bits. Besides $\Phi$, the TB-PMDDP scheme needs to store an MHT for each file copy, which costs additional storage space on the cloud servers. The MHTs could instead be computed on the fly during the operations of the TB-PMDDP scheme; this slight modification would reduce the storage overhead on the remote servers, but it would negatively affect the overall system performance, since the MHTs are needed during each dynamic operation on the file blocks and during the verification phase of the system.
The CSP storage overhead of the MB-PMDDP scheme is independent of the number of copies $n$, while it is linear in $n$ for the TB-PMDDP scheme. For 20 copies of the file $F$,
the overheads on the CSP are 0.50 MB and 20.50 MB for the MB-PMDDP and TB-PMDDP schemes, respectively (about a 97% reduction). Reducing the storage overhead on the CSP side is economically a key feature, as it reduces the fees paid by the customers.
On the other hand, the MB-PMDDP scheme keeps a map-version table on the verifier side, compared with $M$ (one hash value) for the TB-PMDDP. An entry of the map-version table is of size 8 bytes (two integers), and the total number of entries equals the number of file blocks. It is important to note that during implementation the $SN$ does not need to be stored in the table; $SN$ is simply the entry/table index (the map-version table is implemented as a linked list). Moreover, there is only one table for all file copies, which mitigates the storage overhead on the verifier side. The size of the map-version table for the file $F$ is only 128 KB for an unlimited number of copies.
3) Communication Cost: From Table II, the communication cost of the MB-PMDDP scheme is much less than that of the TB-PMDDP. During the response phase, the map-based scheme sends one element $\sigma$ (257 bits) and $\mu = \{\mu_{ik}\}_{1 \le i \le n, 1 \le k \le s}$, where each $\mu_{ik}$ is represented by 256 bits. On the other hand, the tree-based approach sends $\langle \sigma, \mu, \{H(\tilde{b}_{ij})\}, \{A_{ij}\} \rangle$ for $1 \le i \le n$ and $(j, \cdot) \in Q$, where each $H(\tilde{b}_{ij})$ is represented by 257 bits, and $A_{ij}$ is an authentication path of length $O(\log_2 m)$. Each node along $A_{ij}$ is a cryptographic hash of size 256 bits. The response of the MB-PMDDP scheme for 20 copies of $F$ is 0.078 MB, while it is 4.29 MB for the TB-PMDDP (about a 98% reduction). The challenge for both schemes is about 34 bytes.
4) Computation Cost: For the two schemes, the cost is estimated in terms of the crypto-operations (see Table I) needed to generate the proof $P$ and to check the verification equation that validates $P$. As observed from Table II, the cost expression of the proof for the MB-PMDDP scheme has two terms linear in the number of copies $n$, while the TB-PMDDP scheme has three terms linear in $n$. Moreover, the MB-PMDDP scheme contains only one term linear in $n$ in the verification cost expression, while there are three terms linear in $n$ in the verification cost expression of the TB-PMDDP scheme. These terms affect the total computation time when dealing with a large number of copies in practical applications.
5) Dynamic Operations Cost: Table II also presents the cost of dynamic operations for both schemes. The communication cost of the MB-PMDDP scheme due to dynamic operations is less than that of the TB-PMDDP scheme, because the owner sends a request $\langle ID_F, \text{BlockOp}, j, \{b'_i\}_{1 \le i \le n}, \tau' \rangle$ to the CSP and receives no information back. During the dynamic operations of the TB-PMDDP scheme, the owner sends a request to the CSP and receives the authentication paths, which are of order $O(n \log_2(m))$. The authentication paths for updating 20 copies of $F$ amount to 8.75 KB.
The owner in both schemes uses $n$ $E_K$ operations to create the distinct blocks $\{b'_i\}_{1 \le i \le n}$, and $(s + 1)n\, E_G + n\, H_G + (sn + n - 1)\, M_G$ to generate the aggregated tag $\tau'$ (the delete operation does not require these computations). For the MB-PMDDP scheme, the owner updates the state (the map-version table) without using any cryptographic operations (adding, removing, or modifying a table entry). On the other hand, updating the state (the MHTs on the CSP and $M$ on the owner) of the TB-PMDDP scheme costs $n\, H_G + (2n \log_2(m) + 3n)\, h_{SHA}$ to update the MHTs of the file copies according to the required dynamic operations, and to regenerate the new directory root that constructs a new $M$. The experimental results show that updating the state of the TB-PMDDP scheme has an insignificant effect on the total computation time of the dynamic operations.
V. IMPLEMENTATION AND EXPERIMENTAL EVALUATION
A. Implementation
We have implemented the proposed MB-PMDDP scheme and the TB-PMDDP reference model on top of the Amazon Elastic Compute Cloud (Amazon EC2) [30] and Amazon Simple Storage Service (Amazon S3) [31] cloud platforms. Through Amazon EC2, customers can launch and manage Linux/Unix/Windows server instances (virtual servers) in Amazon's infrastructure. The number of EC2 instances can be automatically scaled up and down according to customers' needs. Amazon S3 is a web storage service to store and retrieve an almost unlimited amount of data. Moreover, it enables customers to specify geographic locations for storing their data.
Our implementation of the presented schemes consists of three modules: OModule (owner module), CModule (CSP module), and VModule (verifier module). OModule, which runs on the owner side, is a library that includes the KeyGen, CopyGen, TagGen, and PrepareUpdate algorithms. CModule is a library that runs on Amazon EC2 and includes the ExecUpdate and Prove algorithms. VModule is a library to be run on the verifier side and includes the Verify algorithm.
In the experiments, we do not consider the system pre-processing time to prepare the different file copies and generate the tags set. This pre-processing is done only once during the lifetime of the system, which may be for tens of years. Moreover, in the implementation we do not consider the time to access the file blocks, as state-of-the-art hard drive technology allows as much as 1 MB to be read in just a few nanoseconds [5]. Hence, the total access time is unlikely to have a substantial impact on the overall system performance.
1) Implementation Settings: A large Amazon EC2 instance is used to run CModule. Through this instance, a customer gets a total memory of 7.5 GB and 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each). One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor [32]. The OModule and VModule are executed on a desktop computer with an Intel(R) Xeon(R) 2 GHz processor and 3 GB RAM running Windows XP. We outsource copies of a data file of size 64 MB to Amazon S3. Algorithms (encryption, pairing, hashing, etc.) are implemented using MIRACL library version 5.4.2. For a 128-bit security level, the elliptic curve group we work on has a 256-bit group order. In the experiments, we utilize the Barreto-Naehrig (BN) [33] curve defined over the prime
field $GF(p)$ with $|p| = 256$ bits and embedding degree 12 (the BN curve with these parameters is provided by the MIRACL library).
B. Experimental Evaluation
We compare the two presented schemes from different perspectives: proof computation times, verification times, and the cost of dynamic operations. It has been reported in [1] that if the remote server is missing a fraction of the data, then the number of blocks that need to be checked in order to detect server misbehavior with high probability is constant, independent of the total number of file blocks. For example, if the server deletes 1% of the data file, the verifier only needs to check $c = 460$ randomly chosen blocks of the file to detect this misbehavior with probability larger than 99%. Therefore, in our experiments, we use $c = 460$ to achieve a high probability of assurance.
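As a quick sanity check of the $c = 460$ figure (our own arithmetic, following the standard sampling argument of [1]): if a fraction $t/m = 0.01$ of the blocks is corrupted and $c$ blocks are sampled essentially at random, the probability of hitting at least one bad block is
$$P_{detect} = 1 - \Big(1 - \frac{t}{m}\Big)^{c} \approx 1 - 0.99^{460} \approx 1 - e^{-4.6} \approx 0.99.$$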
Fig. 5. Computation costs of the MB-PMDDP and TB-PMDDP schemes. (a) CSP computation times (sec). (b) Verifier computation times (sec).

1) Proof Computation Time: For different numbers of copies, Fig. 5a presents the proof computation times (in seconds) needed to provide evidence that the file copies are actually stored on the cloud servers in an updated, uncorrupted, and consistent state. The timing curve of the MB-PMDDP scheme lies well below that of the TB-PMDDP. For 20 copies, the proof computation times for the MB-PMDDP and the TB-PMDDP schemes are 1.51 and 5.58 seconds, respectively (about a 73% reduction in the computation time). As observed from Fig. 5a, the timing curve of the TB-PMDDP scheme grows with an increasing number of copies at a rate higher than that of the MB-PMDDP. That is because the proof cost expression of the TB-PMDDP scheme contains more terms that are linear in the number of copies $n$ (Table II).
2) Verification Time: Fig. 5b presents the verification times (in seconds) needed to check the responses/proofs received from the CSP. The MB-PMDDP scheme has verification times lower than those of the TB-PMDDP scheme. For 20 copies, the verification times for the MB-PMDDP and the TB-PMDDP schemes are 1.58 and 3.13 seconds, respectively (about a 49% reduction in
the verification time). The verification timing curve of the MB-PMDDP scheme is almost constant: there is only a very small increase in the verification time as the number of copies grows. This is due to the fact that although the term $s(n-1)\, A_{\mathbb{Z}_p}$ in the verification cost of the MB-PMDDP scheme is linear in $n$ (Table II), in our experiments its numerical value is quite small compared to those of the other terms in the cost expression. This feature makes the MB-PMDDP scheme computationally cost-effective and more efficient when verifying a large number of file copies.
TABLE III
Owner Computation Times (sec) Due to Dynamic Operations on a Single Block

3) Dynamic Operations Cost: For different numbers of copies, Table III presents the computation times (in seconds) on the owner side for the two schemes due to dynamic operations on a single block.


Algorithm 1 BS($\sigma$List, $\mu$List, start, end)
begin
  len $\leftarrow$ (end $-$ start) $+$ 1    /* length of the current sublists */
  if len $=$ 1 then
    $\sigma \leftarrow \sigma$List[start]
    $\{\mu_k\}_{1 \le k \le s} \leftarrow \mu$List[start]
    check $e(\sigma, g) \stackrel{?}{=} e\big(\prod_{(j,r_j)\in Q} H(ID_F \| BN_j \| BV_j)^{r_j} \cdot \prod_{k=1}^{s} u_k^{\,\mu_k},\; y\big)$
    if NOT verified then
      invalidList.Add(start)
    end
  else
    $\sigma \leftarrow \prod_{i=1}^{\text{len}} \sigma$List[start $+\, i - 1$]
    $\{\mu_{ik}\}_{1 \le i \le \text{len},\, 1 \le k \le s} \leftarrow \mu$List[start $+\, i - 1$][$k$]
    check $e(\sigma, g) \stackrel{?}{=} e\big(\big[\prod_{(j,r_j)\in Q} H(ID_F \| BN_j \| BV_j)^{r_j}\big]^{\text{len}} \cdot \prod_{k=1}^{s} u_k^{\,\sum_{i=1}^{\text{len}} \mu_{ik}},\; y\big)$
    if NOT verified then
      /* recurse on the left and right halves of $\sigma$List and $\mu$List */
      mid $\leftarrow \lfloor$(start $+$ end)$/2\rfloor$    /* middle of the lists */
      BS($\sigma$List, $\mu$List, start, mid)    /* left part */
      BS($\sigma$List, $\mu$List, mid $+$ 1, end)    /* right part */
    end
  end
end

The owner computation times for both schemes are approximately equal. The slight increase for the TB-PMDDP scheme is due to some additional hash operations required to regenerate a new directory root that constructs a new $M$ (Table II). As noted, the computation overhead on the owner side is practical: it takes about 5 seconds to modify/insert/append a block of size 4 KB on 20 copies (less than 1 minute for 200 copies). In the experiments, we use only one desktop computer to carry out the organization's (data owner's) work. In practice, when updating the outsourced copies, the owner may choose to split the work among a few devices inside the organization, or use a single device with a multi-core processor, which is becoming prevalent these days; thus the computation time on the owner side can be significantly reduced in many applications.
VI. IDENTIFYING CORRUPTED COPIES
Here, we show how the proposed MB-PMDDP scheme can be slightly modified to identify the indices of corrupted copies. The proof $P = \{\sigma, \mu\}$ generated by the CSP will be valid and will pass the verification equation (1) only if all copies are intact and consistent. Thus, when there are one or more corrupted copies, the whole auditing procedure fails. To handle this situation and identify the corrupted copies, a slightly modified version of the MB-PMDDP scheme can be used. In this version, the data owner generates a tag $\sigma_{ij}$ for each block $\tilde{b}_{ij}$, but does not aggregate the tags for the blocks at the same indices in each copy, i.e., $\Phi = \{\sigma_{ij}\}_{1 \le i \le n,\, 1 \le j \le m}$. During the response phase, the CSP computes $\mu = \{\mu_{ik}\}_{1 \le i \le n,\, 1 \le k \le s}$ as before, but $\sigma = \prod_{(j, r_j) \in Q} \big[\prod_{i=1}^{n} \sigma_{ij}\big]^{r_j} \in G_1$. Upon receiving the proof $P = \{\sigma, \mu\}$, the verifier first validates $P$ using equation (1). If the verification fails, the verifier asks the CSP to send $\sigma = \{\sigma_i\}_{1 \le i \le n}$, where $\sigma_i = \prod_{(j, r_j) \in Q} \sigma_{ij}^{\,r_j}$. Thus, the verifier has two lists: $\sigma$List $= \{\sigma_i\}_{1 \le i \le n}$ and $\mu$List $= \{\mu_{ik}\}_{1 \le i \le n,\, 1 \le k \le s}$ ($\mu$List is a two-dimensional list).

Utilizing a recursive divide-and-conquer approach (binary search) [34], the verifier can identify the indices of the corrupted copies. Specifically, $\sigma$List and $\mu$List are divided into halves: $\sigma$List (Left : Right) and $\mu$List (Left : Right). The verification equation (1) is applied recursively on $\sigma$Left with $\mu$Left and on $\sigma$Right with $\mu$Right. Note that the individual tags in $\sigma$Left or $\sigma$Right are aggregated via multiplication to generate one $\sigma$ that is used during the recursive application of equation (1). The procedural steps for identifying the indices of corrupted copies are given in Algorithm 1.
The BS (binary search) algorithm takes four parameters: $\sigma$List, $\mu$List, start (the start index of the currently processed sublists), and end (the last index of these sublists). The initial call to the BS algorithm is BS($\sigma$List, $\mu$List, 1, $n$). The invalid indices are stored in invalidList (a global data structure).
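The following is a small runnable sketch of this divide-and-conquer identification, with the pairing check abstracted behind a caller-supplied predicate verify_subset(start, end) that returns True when the aggregated proof for copies start..end passes equation (1); the function and parameter names are our own, and the cryptographic check itself is outside the sketch.

```python
def find_corrupted_copies(verify_subset, n: int) -> list[int]:
    """Return the (1-based) indices of corrupted copies.

    verify_subset(start, end) must aggregate the per-copy values sigma_i and
    mu_ik for copies start..end and apply the scheme's verification equation.
    """
    invalid = []

    def bs(start: int, end: int):
        if verify_subset(start, end):      # whole range verifies: nothing corrupted here
            return
        if start == end:                   # single failing copy: it is corrupted
            invalid.append(start)
            return
        mid = (start + end) // 2           # otherwise recurse on the two halves
        bs(start, mid)
        bs(mid + 1, end)

    bs(1, n)
    return invalid

# Illustrative use with a mock verifier: copies 3 and 7 of 8 are "corrupted".
bad = {3, 7}
mock_verify = lambda start, end: not any(start <= i <= end for i in bad)
print(find_corrupted_copies(mock_verify, 8))   # -> [3, 7]
```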
This slight modification to identify the corrupted copies is associated with some extra storage overhead on the cloud servers, where the CSP has to store $mn$ tags for the file copies $\tilde{F}$ ($m$ tags in the original version). Moreover, the challenge-response phase may take two rounds if the initial round that verifies all copies together fails.

Fig. 6. Verification times with different percentages of corrupted copies.

We designed experiments (using the same file and parameters as in Section V) to show the effect of identifying the corrupted copies on the verification time. We generate 100 copies, which are verified in 1.584 seconds when all copies are intact. A percentage ranging from 1% to 20% of the file copies is randomly corrupted. Fig. 6 shows the verification time (in seconds) for different corrupted percentages. The verification time is about 20.58 seconds when 1% of
the copies are invalid. As observed from Fig. 6, when the


percentages of corrupted copies are up to 15% of the
total copies, the performance of using the BS algorithm in
the verification is more efficient than individual verification
for each copy. It takes about 1.58 seconds to verify one
copy, and thus individual verifications of 100 copies requires
100 1.58 = 158 seconds.
In short, the proposed scheme can be slightly modified
to support the feature of identifying the corrupted copies at
the cost of some extra storage, communication, and computation overhead. For the CSP to remain in business and maintain a good reputation, invalid responses to verifiers' challenges are sent only in very rare situations, and thus the original version of the proposed scheme is used most of the time.
VII. SUMMARY AND CONCLUDING REMARKS
Outsourcing data to remote servers has become a growing
trend for many organizations to alleviate the burden of local
data storage and maintenance. In this work, we have studied the problem of creating multiple copies of a dynamic data file and verifying those copies stored on untrusted cloud servers.
We have proposed a new PDP scheme (referred to as
MB-PMDDP), which supports outsourcing of multi-copy
dynamic data, where the data owner is capable of not only
archiving and accessing the data copies stored by the CSP, but
also updating and scaling these copies on the remote servers.
To the best of our knowledge, the proposed scheme is the first
to address multiple copies of dynamic data. The interaction
between the authorized users and the CSP is considered in
our scheme, where the authorized users can seamlessly access
a data copy received from the CSP using a single secret
key shared with the data owner. Moreover, the proposed
scheme supports public verifiability, enables an arbitrary number of audits, and allows possession-free verification, where the verifier is able to check the data integrity even though it neither possesses nor retrieves the file blocks from the server.
Through performance analysis and experimental results, we
have demonstrated that the proposed MB-PMDDP scheme
outperforms the TB-PMDDP approach derived from a class
of dynamic single-copy PDP models. The TB-PMDDP approach leads to high storage overhead on the remote servers and heavy computation on both the CSP and verifier sides. The MB-PMDDP scheme significantly reduces the computation time during the challenge-response phase, which makes it more practical for applications where a large number of verifiers are connected to the CSP, causing a huge computation overhead on the servers. In addition, it has lower storage overhead on the CSP, and thus reduces the fees paid by the cloud customers. The dynamic block operations of the map-based approach are done with lower communication cost than those of the tree-based approach.
A slight modification of the proposed scheme supports identifying the indices of corrupted copies. A corrupted data copy can then be reconstructed, even after complete damage, using the duplicated copies on other servers. Through security analysis, we have shown that the proposed scheme is provably secure.
REFERENCES
[1] G. Ateniese et al., "Provable data possession at untrusted stores," in Proc. 14th ACM Conf. Comput. Commun. Secur. (CCS), New York, NY, USA, 2007, pp. 598–609.
[2] K. Zeng, "Publicly verifiable remote data integrity," in Proc. 10th Int. Conf. Inf. Commun. Secur. (ICICS), 2008, pp. 419–434.
[3] Y. Deswarte, J.-J. Quisquater, and A. Saïdane, "Remote integrity checking," in Proc. 6th Working Conf. Integr. Internal Control Inf. Syst. (IICIS), 2003, pp. 1–11.
[4] D. L. G. Filho and P. S. L. M. Barreto, "Demonstrating data possession and uncheatable data transfer," IACR (International Association for Cryptologic Research) ePrint Archive, Tech. Rep. 2006/150, 2006.
[5] F. Sebé, J. Domingo-Ferrer, A. Martinez-Balleste, Y. Deswarte, and J.-J. Quisquater, "Efficient remote data possession checking in critical information infrastructures," IEEE Trans. Knowl. Data Eng., vol. 20, no. 8, pp. 1034–1038, Aug. 2008.
[6] P. Golle, S. Jarecki, and I. Mironov, "Cryptographic primitives enforcing communication and storage complexity," in Proc. 6th Int. Conf. Financial Cryptograph. (FC), Berlin, Germany, 2003, pp. 120–135.
[7] M. A. Shah, M. Baker, J. C. Mogul, and R. Swaminathan, "Auditing to keep online storage services honest," in Proc. 11th USENIX Workshop Hot Topics Oper. Syst. (HOTOS), Berkeley, CA, USA, 2007, pp. 1–6.
[8] M. A. Shah, R. Swaminathan, and M. Baker, "Privacy-preserving audit and extraction of digital contents," IACR Cryptology ePrint Archive, Tech. Rep. 2008/186, 2008.
[9] E. Mykletun, M. Narasimha, and G. Tsudik, "Authentication and integrity in outsourced databases," ACM Trans. Storage, vol. 2, no. 2, pp. 107–138, 2006.
[10] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, "Scalable and efficient provable data possession," in Proc. 4th Int. Conf. Secur. Privacy Commun. Netw. (SecureComm), New York, NY, USA, 2008, Art. ID 9.
[11] C. Wang, Q. Wang, K. Ren, and W. Lou. (2009). "Ensuring data storage security in cloud computing," IACR Cryptology ePrint Archive, Tech. Rep. 2009/081. [Online]. Available: http://eprint.iacr.org/
[12] C. Erway, A. Küpçü, C. Papamanthou, and R. Tamassia, "Dynamic provable data possession," in Proc. 16th ACM Conf. Comput. Commun. Secur. (CCS), New York, NY, USA, 2009, pp. 213–222.
[13] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, "Enabling public verifiability and data dynamics for storage security in cloud computing," in Proc. 14th Eur. Symp. Res. Comput. Secur. (ESORICS), Berlin, Germany, 2009, pp. 355–370.
[14] Z. Hao, S. Zhong, and N. Yu, "A privacy-preserving remote data integrity checking protocol with data dynamics and public verifiability," IEEE Trans. Knowl. Data Eng., vol. 23, no. 9, pp. 1432–1437, Sep. 2011.
[15] A. F. Barsoum and M. A. Hasan. (2010). "Provable possession and replication of data over cloud servers," Centre Appl. Cryptograph. Res., Univ. Waterloo, Waterloo, ON, Canada, Tech. Rep. 2010/32. [Online]. Available: http://www.cacr.math.uwaterloo.ca/techreports/2010/cacr2010-32.pdf
[16] R. Curtmola, O. Khan, R. Burns, and G. Ateniese, "MR-PDP: Multiple-replica provable data possession," in Proc. 28th IEEE ICDCS, Jun. 2008, pp. 411–420.
[17] Z. Hao and N. Yu, "A multiple-replica remote data possession checking protocol with public verifiability," in Proc. 2nd Int. Symp. Data, Privacy, E-Commerce, Sep. 2010, pp. 84–89.
[18] H. Shacham and B. Waters, "Compact proofs of retrievability," in Proc. 14th Int. Conf. Theory Appl. Cryptol. Inf. Secur., 2008, pp. 90–107.
[19] A. Juels and B. S. Kaliski, Jr., "PORs: Proofs of retrievability for large files," in Proc. 14th ACM Conf. Comput. Commun. Secur. (CCS), 2007, pp. 584–597.
[20] R. Curtmola, O. Khan, and R. Burns, "Robust remote data checking," in Proc. 4th ACM Int. Workshop Storage Secur. Survivability, 2008, pp. 63–68.
[21] K. D. Bowers, A. Juels, and A. Oprea, "Proofs of retrievability: Theory and implementation," in Proc. ACM Workshop Cloud Comput. Secur. (CCSW), 2009, pp. 43–54.
[22] Y. Dodis, S. Vadhan, and D. Wichs, "Proofs of retrievability via hardness amplification," in Proc. 6th Theory Cryptograph. Conf. (TCC), 2009, pp. 109–127.
[23] K. D. Bowers, A. Juels, and A. Oprea, "HAIL: A high-availability and integrity layer for cloud storage," in Proc. 16th ACM Conf. Comput. Commun. Secur. (CCS), New York, NY, USA, 2009, pp. 187–198.
[24] C. E. Shannon, "Communication theory of secrecy systems," Bell Syst. Tech. J., vol. 28, no. 4, pp. 656–715, 1949.
[25] D. Boneh, B. Lynn, and H. Shacham, "Short signatures from the Weil pairing," in Proc. 7th Int. Conf. Theory Appl. Cryptol. Inf. Secur. (ASIACRYPT), London, U.K., 2001, pp. 514–532.
[26] G. Ateniese, S. Kamara, and J. Katz, "Proofs of storage from homomorphic identification protocols," in Proc. 15th Int. Conf. Theory Appl. Cryptol. Inf. Secur. (ASIACRYPT), Berlin, Germany, 2009, pp. 319–333.
[27] R. C. Merkle, "Protocols for public key cryptosystems," in Proc. IEEE Symp. Secur. Privacy, Apr. 1980, p. 122.
[28] C. Martel, G. Nuckolls, P. Devanbu, M. Gertz, A. Kwong, and S. G. Stubblebine, "A general model for authenticated data structures," Algorithmica, vol. 39, no. 1, pp. 21–41, Jan. 2004.
[29] P. S. L. M. Barreto and M. Naehrig, Pairing-Friendly Elliptic Curves of Prime Order With Embedding Degree 12, IEEE Standard P1363.3, 2006.
[30] Amazon Elastic Compute Cloud (Amazon EC2). [Online]. Available: http://aws.amazon.com/ec2/, accessed Aug. 2013.
[31] Amazon Simple Storage Service (Amazon S3). [Online]. Available: http://aws.amazon.com/s3/, accessed Aug. 2013.
[32] Amazon EC2 Instance Types. [Online]. Available: http://aws.amazon.com/ec2/, accessed Aug. 2013.
[33] P. S. L. M. Barreto and M. Naehrig, "Pairing-friendly elliptic curves of prime order," in Proc. 12th Int. Workshop SAC, 2005, pp. 319–331.
[34] A. L. Ferrara, M. Green, S. Hohenberger, and M. Ø. Pedersen, "Practical short signature batch verification," in Proc. Cryptograph. Track RSA Conf., 2009, pp. 309–324.
[35] A. F. Barsoum and M. A. Hasan. (2011). "On verifying dynamic multiple data copies over cloud servers," IACR Cryptology ePrint Archive, Tech. Rep. 2011/447. [Online]. Available: http://eprint.iacr.org/
[36] Y. Zhu, H. Wang, Z. Hu, G.-J. Ahn, H. Hu, and S. S. Yau, "Efficient provable data possession for hybrid clouds," in Proc. 17th ACM Conf. Comput. Commun. Secur. (CCS), 2010, pp. 756–758.


Ayad F. Barsoum is currently an Assistant Professor with the Department of Computer Science, St. Mary's University, San Antonio,
TX, USA. He received the Ph.D. degree from
the Department of Electrical and Computer
Engineering, University of Waterloo (UW),
Waterloo, ON, Canada, in 2013, where he is a
member of the Centre for Applied Cryptographic
Research.
He has received the Graduate Research
Studentship, the International Doctoral Award, and
the University of Waterloo Graduate Scholarship at UW. He received the
B.Sc. and M.Sc. degrees in computer science from Ain Shams University,
Cairo, Egypt.

M. Anwar Hasan received the B.Sc. degree in electrical and electronic engineering and the
M.Sc. degree in computer engineering from
the Bangladesh University of Engineering and
Technology, Dhaka, Bangladesh, in 1986 and 1988,
respectively, and the Ph.D. degree in electrical engineering from the University of Victoria, Victoria,
BC, Canada, in 1992.
He joined the Department of Electrical and Computer Engineering, University of Waterloo (UW),
Waterloo, ON, Canada, in 1993, where he has been a
Full Professor since 2002. At UW, he is currently a member of the Centre for
Applied Cryptographic Research, the Center for Wireless Communications,
and the VLSI Research Group.
