
A SECURE AND DYNAMIC MULTI-KEYWORD RANKED SEARCH SCHEME OVER ENCRYPTED CLOUD DATA

ABSTRACT:
Due to the increasing popularity of cloud computing, more and more
data owners are motivated to outsource their data to cloud servers for
great convenience and reduced cost in data management. However,
sensitive data should be encrypted before outsourcing for privacy
requirements, which obsoletes data utilization like keyword-based
document retrieval. In this paper, we present a secure multi-keyword
ranked search scheme over encrypted cloud data, which simultaneously
supports dynamic update operations like deletion and insertion of
documents. Specifically, the vector space model and the widely-used TF×IDF model are combined in the index construction and query
generation. We construct a special tree-based index structure and
propose a Greedy Depth-first Search algorithm to provide efficient
multi-keyword ranked search. The secure kNN algorithm is utilized to
encrypt the index and query vectors, and meanwhile ensure accurate
relevance score calculation between encrypted index and query vectors.
In order to resist statistical attacks, phantom terms are added to the index
vector for blinding search results. Due to the use of our special tree-based index structure, the proposed scheme can achieve sub-linear search time and deal with the deletion and insertion of documents flexibly. Extensive experiments are conducted to demonstrate the efficiency of the proposed scheme.

INTRODUCTION:
CLOUD computing has been considered as a new model of enterprise IT
infrastructure, which can organize huge resources of computing, storage and applications, and enable users to enjoy ubiquitous, convenient and on-demand network access to a shared pool of configurable computing resources with great efficiency and minimal economic overhead. Attracted by these appealing features, both individuals and enterprises are motivated to outsource their data to the cloud, instead of purchasing software and hardware to manage the data themselves. Despite the various advantages of cloud services, outsourcing sensitive information (such as e-mails, personal health records, company finance data, government documents, etc.) to remote servers brings privacy concerns. The cloud service providers (CSPs) that keep the data for users may access users' sensitive information without authorization. A general approach to protect the data confidentiality is to encrypt the data before outsourcing. However, this will cause a huge cost in terms of data
usability. For example, the existing techniques on keyword-based information retrieval, which are widely used on the plaintext data, cannot be directly applied on the encrypted data. Downloading all the data from the cloud and decrypting it locally is obviously impractical. In order to address the above problem, researchers have designed some general-purpose solutions with fully homomorphic encryption or oblivious RAMs. However, these methods are not practical due to their high computational overhead for both the cloud server and user. On the
contrary, more practical special purpose solutions, such as searchable
encryption (SE) schemes, have made specific contributions in terms of efficiency, functionality and security. Searchable encryption schemes enable the client to store encrypted data in the cloud and execute keyword search over the ciphertext domain. So far, abundant works have been proposed under different threat models to achieve various search functionality, such as single keyword search, similarity search, multi-keyword Boolean search, ranked search, multi-keyword ranked search, etc. Among them, multi-keyword ranked search has attracted more and more attention for its practical applicability. Recently, some dynamic schemes have been proposed to support inserting and deleting operations on the document collection. These are significant works, as it is highly possible that the data owners need to update their data on the cloud server. But few of the dynamic schemes support efficient multi-keyword ranked search. This paper proposes a secure tree-based search scheme over the encrypted cloud data, which supports multi-keyword ranked search and

dynamic operation on the document collection. Specifically, the vector space model and the widely-used term frequency-inverse document frequency (TF×IDF) model are combined in the index construction and query generation to provide multi-keyword ranked search. In order to obtain high search efficiency, we construct a tree-based index structure and
propose a Greedy Depth-first Search algorithm based on this index
tree. Due to the special structure of our tree-based index, the proposed
search scheme can flexibly achieve sub-linear search time and deal with
the deletion and insertion of documents. The secure kNN algorithm is
utilized to encrypt the index and query vectors, and meanwhile ensure
accurate relevance score calculation between encrypted index and query
vectors. To resist different attacks in different threat models, we
construct two secure search schemes: the basic dynamic multi-keyword
ranked search (BDMRS) scheme in the known ciphertext model, and
the enhanced dynamic multi-keyword ranked search (EDMRS) scheme
in the known background model. Our contributions are summarized as
follows:
1) We design a searchable encryption scheme that supports both the
accurate multi-keyword ranked search and flexible dynamic operation
on document collection.
2) Due to the special structure of our tree-based index, the search complexity of the proposed scheme is fundamentally kept logarithmic. In practice, the proposed scheme can achieve higher search efficiency by executing our Greedy Depth-first Search algorithm. Moreover, parallel search can be flexibly performed to further reduce the time cost of the search process. The remainder of this paper is organized as follows. Related work is discussed in Section 2, and Section 3 gives a brief introduction to the system model, threat model, the design goals, and the preliminaries. Section 4 describes the schemes in detail. Section 5 presents the experiments and performance analysis, and Section 6 concludes the paper.

WORKING MODULE:
Searchable encryption schemes enable the clients to store encrypted data on the cloud and execute keyword search over the ciphertext domain. Depending on the cryptographic primitives used, searchable encryption schemes can be constructed with public-key cryptography or symmetric-key cryptography. Song et al. proposed the first symmetric searchable encryption (SSE) scheme, and the search time of their scheme is linear in the size of the data collection. Goh proposed formal security definitions for SSE and designed a scheme based on a Bloom filter. The search time of Goh's scheme is O(n), where n is the cardinality of the document collection. Curtmola et al. proposed two schemes which achieve the optimal search time. Their SSE-1 scheme is secure against chosen-keyword attacks and SSE-2 is secure against

adaptive chosen-keyword attacks. These early works are single-keyword Boolean search schemes, which are very simple in terms of functionality. Afterward, abundant works have been proposed under different threat models to achieve various search functionality, such as single keyword search, similarity search, multi-keyword Boolean search, ranked search, multi-keyword ranked search, etc. Multi-keyword
Boolean search allows the users to input multiple query keywords to
request suitable documents. Among these works, conjunctive keyword
search schemes only return the documents that contain all of the query
keywords. Disjunctive keyword search schemes return all of the
documents that contain a subset of the query keywords. Predicate search
schemes are proposed to support both conjunctive and disjunctive
search. All these multi-keyword search schemes retrieve search results based on the existence of keywords, which cannot provide acceptable result ranking functionality. Ranked search can enable quick search of the most relevant data. Sending back only the top-k most relevant documents can effectively decrease network traffic. Some early works have realized ranked search using order-preserving techniques, but they are designed only for single keyword search. Cao et al. realized the first privacy-preserving multi-keyword ranked search scheme, in which documents and queries are represented as vectors of dictionary size. With coordinate matching, the documents are ranked according to the number of matched query keywords. However, Cao et al.'s scheme

does not consider the importance of the different keywords, and thus is
not accurate enough. In addition, the search efficiency of the scheme is
linear with the cardinality of the document collection. Sun et al. presented a secure multi-keyword search scheme that supports similarity-based ranking. The authors constructed a searchable index tree based on the vector space model and adopted the cosine measure together with TF×IDF to provide ranked results. Sun et al.'s search algorithm achieves better-than-linear search efficiency but results in precision loss. Örencik et al. proposed a secure multi-keyword search method which utilizes locality-sensitive hashing (LSH) functions to cluster similar documents. The LSH algorithm is suitable for similarity search but cannot provide exact ranking. Zhang et al. proposed a scheme to deal with secure multi-keyword ranked search in a multi-owner model. In this scheme, different
data owners use different secret keys to encrypt their documents and
keywords while authorized data users can query without knowing keys
of these different data owners. The authors proposed an Additive Order
Preserving Function to retrieve the most relevant search results.
However, these works don't support dynamic operations. Practically, the data owner may need to update the document collection after he uploads the collection to the cloud server. Thus, the SE schemes are expected to support the insertion and deletion of documents. There are also several dynamic searchable encryption schemes. In the work of Song et al., each document is considered as a sequence of fixed-length words, and is individually indexed. This scheme supports straightforward update operations but with low efficiency. Goh proposed a
scheme to generate a sub-index for every document based on keywords.
Then the dynamic operations can be easily realized through updating of a Bloom filter along with the corresponding document. However, Goh's scheme has linear search time and suffers from false positives. In 2012, Kamara et al. constructed an encrypted inverted index that can handle dynamic data efficiently. However, this scheme is very complex to implement. Subsequently, as an improvement, Kamara et al. proposed a new search scheme based on a tree-based index, which can handle dynamic updates on document data stored in leaf nodes. However, their scheme is designed
only for single keyword Boolean search. Cash et al. presented a data structure for keyword/identity tuples named TSet. A document can then be represented by a series of independent TSets. Based on this structure, Cash et al. proposed a dynamic searchable encryption scheme. In their construction, newly added tuples are stored in another database in the cloud, and deleted tuples are recorded in a revocation list. The final search result is achieved through excluding the tuples in the revocation list from the ones retrieved from the original and newly added tuples. Yet, Cash et al.'s dynamic search scheme doesn't realize the multi-keyword ranked search functionality.

DATA OWNER, DATA USER AND CLOUD SERVER:

The system model in this paper involves three different entities: the data owner, the data user and the cloud server, as illustrated.

DATA OWNER:
Has a collection of documents F = {f1, f2, ..., fn} that he wants to outsource to the cloud server in encrypted form while still keeping the capability to search on them for effective utilization. In our scheme, the data owner first builds a secure searchable tree index I from the document collection F, and then generates an encrypted document collection C for F. Afterwards, the data owner outsources the encrypted collection C and the secure index I to the cloud server, and securely distributes the key information for trapdoor generation and document decryption to the authorized data users. Besides, the data owner is responsible for the
update operation of his documents stored in the cloud server. While

updating, the data owner generates the update information locally and
sends it to the server.

DATA USERS:
Are authorized to access the documents of the data owner. With t query keywords, the authorized user can generate a trapdoor TD according to search control mechanisms to fetch k encrypted documents from the cloud server. Then, the data user can decrypt the documents with the shared secret key.

CLOUD SERVER:
Stores the encrypted document collection C and the encrypted searchable tree index I for the data owner. Upon receiving the trapdoor TD from the data user, the cloud server executes the search over the index tree I, and finally returns the corresponding collection of top-k ranked encrypted documents. Besides,
upon receiving the update information from the data owner, the server needs to update the index I and the document collection C according to the received information. The cloud server in the proposed scheme is considered as honest-but-curious, which is employed by lots of works on secure cloud data search. Specifically, the cloud server honestly and correctly executes instructions in the designated protocol. Meanwhile, it is curious to infer and analyze the received data, which helps it acquire additional information. Depending on what information the cloud server knows, we adopt the two threat models proposed by Cao et al.

KNOWN CIPHERTEXT MODEL:
In this model, the cloud server only knows the encrypted document collection C, the searchable index tree I, and the search trapdoor TD submitted by the authorized user. That is to say, the cloud server can only conduct the ciphertext-only attack (COA) in this model.
KNOWN BACKGROUND MODEL:
Compared with the known ciphertext model, the cloud server in this stronger model is equipped with more knowledge, such as the term frequency (TF) statistics of the document collection. This statistical information records how many documents there are for each term frequency of a specific keyword in the whole document collection, which could be used as a keyword identity. Equipped with such statistical information, the cloud server can conduct a TF statistical attack to deduce or even identify certain keywords through analyzing the histogram and value range of the corresponding frequency distributions.

DESIGN GOALS:

To enable secure, efficient, accurate and dynamic multi-keyword ranked
search over outsourced encrypted cloud data under the above models,
our system has the following design goals.

DYNAMIC:
The proposed scheme is designed to provide not only multi-keyword
query and accurate result ranking, but also dynamic update on document
collections.
SEARCH EFFICIENCY:
The scheme aims to achieve sub-linear search efficiency by exploring a
special tree-based index and an efficient search algorithm.
PRIVACY-PRESERVING:
The scheme is designed to prevent the cloud server from learning
additional information about the document collection, the index tree, and
the query. The specific privacy requirements are summarized as follows:
INDEX CONFIDENTIALITY AND QUERY CONFIDENTIALITY:
The underlying plaintext information, including keywords in the index and query, TF values of keywords stored in the index, and IDF values of query keywords, should be protected from the cloud server;
TRAPDOOR UNLINKABILITY:

The cloud server should not be able to determine whether two encrypted
queries (trapdoors) are generated from the same search request;
KEYWORD PRIVACY:
The cloud server should not be able to identify the specific keyword in the query, index or document collection by analyzing statistical information like term frequency. Note that our proposed scheme is not designed to protect the
access pattern, i.e., the sequence of returned documents.

PROPOSED SCHEMES:
In this section, we first describe the unencrypted dynamic multi-keyword ranked search (UDMRS) scheme, which is constructed on the basis of the vector space model and the KBB (keyword balanced binary) tree. Based on the UDMRS scheme,
two secure search schemes (BDMRS and EDMRS schemes) are
constructed against two threat models, respectively.

INDEX CONSTRUCTION OF UDMRS SCHEME:


We have briefly introduced the KBB index tree structure, which assists
us in introducing the index construction. In the process of index
construction, we first generate a tree node for each document in the
collection. These nodes are the leaf nodes of the index tree. Then, the
internal tree nodes are generated based on these leaf nodes. The formal
construction process of the index is presented in Algorithm 1. An
example of our index tree is shown in Fig. 3. Note that the index tree T

built here is a plaintext. Following are some notations for Algorithm 1.


Besides, the data structure of the tree node is defined as ID;D; Pl; Pr;
FID, where the unique identity ID for each tree node is generated
through the function Gen ID().
CURRENT NODE SET The set of current processing nodes which
have no parents. If the number of nodes is even, the cardinality of the set
is denoted as 2h(h Z+), else the cardinality is denoted as (2h + 1).
TEMP NODE SET The set of the newly generated nodes. In the
index, if Du[i] = 0 for an internal node u, there is at least one path from
the node u to some leaf, which indicates a document containing the
keyword wi. In addition, Du[i] always stores the biggest normalized TF
value of wi among its child nodes. Thus, the possible largest relevance
score of its children can be easily estimated.
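As an illustration of the construction just described, the following is a minimal Python sketch (not the paper's Algorithm 1): leaves carry per-document normalized TF vectors, and every internal node stores the element-wise maximum of its children, so it upper-bounds the relevance score of any document below it. The names Node and build_index_tree are ours.

```python
# Minimal sketch of the keyword balanced binary (KBB) index tree: leaves hold
# per-document normalized TF vectors, and each internal node stores the
# element-wise maximum of its children, which upper-bounds the relevance
# score of every document in its subtree.
from dataclasses import dataclass, field
from typing import List, Optional
import uuid

@dataclass
class Node:
    D: List[float]                       # index vector (normalized TF values)
    Pl: Optional["Node"] = None          # left child
    Pr: Optional["Node"] = None          # right child
    FID: Optional[str] = None            # document identity (leaves only)
    ID: str = field(default_factory=lambda: uuid.uuid4().hex)  # GenID()

def build_index_tree(doc_vectors):
    """doc_vectors: dict mapping document id -> normalized TF vector of length m."""
    current = [Node(D=list(v), FID=fid) for fid, v in doc_vectors.items()]
    while len(current) > 1:
        temp = []
        for i in range(0, len(current) - 1, 2):     # pair up the current nodes
            l, r = current[i], current[i + 1]
            temp.append(Node(D=[max(a, b) for a, b in zip(l.D, r.D)], Pl=l, Pr=r))
        if len(current) % 2 == 1:                   # an odd node is carried upward
            temp.append(current[-1])
        current = temp
    return current[0]
```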
SEARCH PROCESS OF UDMRS SCHEME:
The search process of the UDMRS scheme is a recursive procedure upon the tree, called the Greedy Depth-first Search (GDFS) algorithm. We construct a result list denoted as RList, whose elements are defined as ⟨RScore, FID⟩. Here, RScore is the relevance score of the document f_FID to the query, which is calculated according to Formula (1). The RList stores the k accessed documents with the largest relevance scores to the query. The elements of the list are ranked in descending order according to the RScore, and are updated in a timely manner during the search process. The following are some other notations, and the GDFS algorithm is described in Algorithm 2.
RScore(Du, Q): the function to calculate the relevance score for the query vector Q and the index vector Du stored in node u, which is defined in Formula (1).
kth score: the smallest relevance score in the current RList, which is initialized as 0.
hchild: the child node of a tree node with the higher relevance score.
lchild: the child node of a tree node with the lower relevance score.
Since the possible largest relevance score of documents rooted at the node u can be predicted, only a part of the nodes in the tree are accessed during the search process. An example of the search process uses the document collection F = {fi | i = 1, ..., 6}, cardinality of the dictionary m = 4, and query vector Q = (0, 0.92, 0, 0.38).
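The GDFS idea can be sketched as follows, reusing the Node structure from the previous sketch; this is an illustrative approximation, not the paper's Algorithm 2. The relevance score is taken as the inner product of the index and query vectors, and a min-heap holds the current top-k results.

```python
# Illustrative sketch of Greedy Depth-first Search (GDFS): visit the child
# with the higher estimated score first, and prune a subtree when its
# upper-bound score cannot beat the current k-th best score.
import heapq

def r_score(D, Q):
    # relevance score as the inner product of index and query vectors
    return sum(d * q for d, q in zip(D, Q))

def gdfs(node, Q, k, r_list=None):
    """r_list is a min-heap of (score, doc_id) tuples holding the current top-k."""
    if r_list is None:
        r_list = []
    if node is None:
        return r_list
    score = r_score(node.D, Q)
    kth = r_list[0][0] if len(r_list) == k else 0.0  # k-th score, initialized as 0
    if node.FID is not None:                         # leaf node: try to enter RList
        if len(r_list) < k:
            heapq.heappush(r_list, (score, node.FID))
        elif score > kth:
            heapq.heapreplace(r_list, (score, node.FID))
        return r_list
    if score > kth:                                  # internal node: bound may still beat top-k
        children = sorted([node.Pl, node.Pr],
                          key=lambda c: r_score(c.D, Q) if c else -1.0,
                          reverse=True)              # hchild first, then lchild
        for child in children:
            r_list = gdfs(child, Q, k, r_list)
    return r_list

# Example with the query vector from the text:
# top = sorted(gdfs(root, [0, 0.92, 0, 0.38], k=3), reverse=True)
```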
BDMRS SCHEME:
Based on the UDMRS scheme, we construct the basic dynamic multi-keyword ranked search (BDMRS) scheme by using the secure kNN algorithm [38]. The BDMRS scheme is designed to achieve the goal of privacy preservation in the known ciphertext model, and the four algorithms included are described as follows:
SK ← Setup(): Initially, the data owner generates the secret key set SK, including 1) a randomly generated m-bit vector S, where m is equal to the cardinality of the dictionary, and 2) two (m × m) invertible matrices M1 and M2. Namely, SK = {S, M1, M2}.
I ← GenIndex(F, SK): First, the unencrypted index tree T is built on F with T ← BuildIndexTree(F). Secondly, the data owner generates two random vectors {D′u, D″u} for the index vector Du in each node u, according to the secret vector S. Specifically, if S[i] = 0, D′u[i] and D″u[i] will be set equal to Du[i]; if S[i] = 1, D′u[i] and D″u[i] will be set as two random values whose sum equals Du[i]. Finally, the encrypted index tree I is built, where the node u stores two encrypted index vectors Iu = {M1^T D′u, M2^T D″u}.
TD ← GenTrapdoor(Wq, SK): With the keyword set Wq, the unencrypted query vector Q with length m is generated. If wi ∈ Wq, Q[i] stores the normalized IDF value of wi; else Q[i] is set to 0. Similarly, the query vector Q is split into two random vectors Q′ and Q″. The difference is that if S[i] = 0, Q′[i] and Q″[i] are set to two random values whose sum equals Q[i]; else Q′[i] and Q″[i] are both set equal to Q[i]. Finally, the algorithm returns the trapdoor TD = {M1^(-1) Q′, M2^(-1) Q″}.
RelevanceScore ← SRScore(Iu, TD): With the trapdoor TD, the cloud server computes the relevance score of node u in the index tree I to the query. Note that the relevance score calculated from the encrypted vectors is equal to that calculated from the unencrypted vectors, as shown below.
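The equation referenced above is not reproduced in this copy of the document. A hedged reconstruction, following directly from the splitting rules of GenIndex and GenTrapdoor, is:

```latex
% Hedged reconstruction of the score-preservation property of the secure kNN
% encryption, derived from the splitting rules of GenIndex and GenTrapdoor.
\begin{align*}
SRScore(I_u, TD)
  &= \{M_1^{T} D_u',\ M_2^{T} D_u''\} \cdot \{M_1^{-1} Q',\ M_2^{-1} Q''\} \\
  &= (M_1^{T} D_u')^{T} (M_1^{-1} Q') + (M_2^{T} D_u'')^{T} (M_2^{-1} Q'') \\
  &= D_u'^{T} M_1 M_1^{-1} Q' + D_u''^{T} M_2 M_2^{-1} Q'' \\
  &= D_u' \cdot Q' + D_u'' \cdot Q'' \\
  &= D_u \cdot Q = RScore(D_u, Q).
\end{align*}
```

The last step holds because the splitting rules are complementary in every dimension: for S[i] = 0 the index entry is duplicated and the query entry is split into two random values summing to Q[i], while for S[i] = 1 the roles are reversed, so each term D′u[i]Q′[i] + D″u[i]Q″[i] equals Du[i]Q[i].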

SECURITY ANALYSIS:
We analyze the BDMRS scheme according to the three predefined privacy requirements in the design goals:
INDEX CONFIDENTIALITY AND QUERY CONFIDENTIALITY:
In the proposed BDMRS scheme, Iu and TD are obfuscated vectors, which means the cloud server cannot infer the original vectors Du and Q without the secret key set SK. The secret keys M1 and M2 are Gaussian random matrices, so a COA attacker (the cloud server) cannot calculate the matrices merely from the ciphertext. Thus, the BDMRS scheme is resilient against the ciphertext-only attack (COA), and the index confidentiality and the query confidentiality are well protected.
QUERY UNLINKABILITY:
The trapdoor of the query vector is generated from a random splitting operation, which means that the same search requests will be transformed into different query trapdoors, and thus the query unlinkability is protected. However, the cloud server is able to link the same
search requests according to the same visited path and the same
relevance scores.

KEYWORD PRIVACY:
In this scheme, the confidentiality of the index and query are well protected, so the original vectors are kept from the cloud server. The search process merely introduces inner product computations of encrypted vectors, which leak no information about any specific keyword. Thus, the keyword privacy is protected in the known ciphertext model. But in the known background model, the cloud server is supposed to have more knowledge, such as the term frequency statistics of keywords. This statistical information can be visualized as TF distribution histograms, which reveal how many documents there are for every TF value of a specific keyword in the document collection. Then, due to the specificity of the TF distribution histogram, such as the graph slope and value range, the cloud server could conduct a TF statistical attack to deduce or identify keywords. In the worst case, when there is only one keyword in the query vector, i.e., the normalized IDF value for the keyword is 1, the final relevance score distribution is exactly the normalized TF distribution of this keyword, which is directly exposed to the cloud server. Therefore, the BDMRS scheme cannot resist the TF statistical attack in the known background model.
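To make this threat concrete, the following is an illustrative sketch (our own, under the known background model assumption) of how a server could mount the TF statistical attack by matching an observed single-keyword score distribution against public TF histograms:

```python
# Illustrative TF statistical attack under the known background model: compare
# the observed (single-keyword) score distribution against public TF histograms
# and guess the keyword whose histogram is closest.
from collections import Counter

def histogram(values, bins=20):
    counts = Counter(min(int(v * bins), bins - 1) for v in values)
    return [counts.get(b, 0) for b in range(bins)]

def guess_keyword(observed_scores, background_tf):
    """background_tf maps keyword -> list of normalized TF values in [0, 1]."""
    target = histogram(observed_scores)
    def distance(kw):
        return sum((a - b) ** 2
                   for a, b in zip(target, histogram(background_tf[kw])))
    return min(background_tf, key=distance)
```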
EDMRS SCHEME:
The security analysis above shows that the BDMRS scheme can protect the index confidentiality and query confidentiality in the known ciphertext model. However, the cloud server is able to link the same search requests by tracking the path of visited nodes. In addition, in the known background model, it is possible for the cloud server to identify a keyword, as the normalized TF distribution of the keyword can be exactly obtained from the final calculated relevance scores. The primary cause is that the relevance score calculated from Iu and TD is exactly equal to that from Du and Q. A heuristic method to further improve the security is to break such exact equality. Thus, we can introduce some tunable randomness to disturb the relevance score calculation. In addition, to suit different users' preferences for more accurate ranked results or better-protected keyword privacy, the randomness is made adjustable. The enhanced EDMRS scheme is almost the same as the BDMRS scheme except that:
SK ← Setup(): In this algorithm, we set the secret vector S as a (m + m′)-bit vector, and set M1 and M2 as (m + m′) × (m + m′) invertible matrices, where m′ is the number of phantom terms.
I ← GenIndex(F, SK): Before encrypting the index vector Du, we extend the vector Du to be a (m + m′)-dimensional vector. Each extended element Du[m + j], j = 1, ..., m′, is set as a random number εj.
TD ← GenTrapdoor(Wq, SK): The query vector Q is extended to be a (m + m′)-dimensional vector. Among the extended elements, a number of them are randomly chosen and set as 1, and the rest are set as 0.
RelevanceScore ← SRScore(Iu, TD): After the execution of relevance evaluation by the cloud server, the final relevance score for the index vector Iu equals Du · Q + Σ εv, where v ∈ {j | Q[m + j] = 1}.
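A minimal sketch of the phantom-term extension is given below. It is illustrative only: the names are ours, and the distribution of the phantom values εj is a stand-in (the scheme tunes it so that the summed noise has an adjustable standard deviation σ, trading precision for privacy).

```python
# Illustrative sketch of the EDMRS phantom-term extension: index vectors gain
# m_prime random entries and the query vector activates a random subset of
# them, so the observable score becomes D.Q plus a per-query random offset.
import random

def extend_index_vector(D, m_prime, sigma):
    # Append m_prime phantom entries; here drawn from a zero-mean Gaussian as a
    # stand-in distribution, with sigma trading ranking precision for privacy.
    return list(D) + [random.gauss(0.0, sigma) for _ in range(m_prime)]

def extend_query_vector(Q, m_prime, num_active):
    # Randomly choose num_active of the phantom positions and set them to 1.
    chosen = set(random.sample(range(m_prime), num_active))
    return list(Q) + [1.0 if j in chosen else 0.0 for j in range(m_prime)]

def perturbed_score(D_ext, Q_ext):
    # Equals D.Q plus the sum of epsilon_v over the activated phantom positions.
    return sum(d * q for d, q in zip(D_ext, Q_ext))
```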

SECURITY ANALYSIS:
The security of the EDMRS scheme is also analyzed according to the three predefined privacy requirements in the design goals:
INDEX CONFIDENTIALITY AND QUERY CONFIDENTIALITY:
Inherited from the BDMRS scheme, the EDMRS scheme can protect index confidentiality and query confidentiality in the known background model. Due to the utilization of phantom terms, the confidentiality is further enhanced as the transformation matrices are harder to figure out.
QUERY UNLINKABILITY:
By introducing the random value ε, the same search requests will generate different query vectors and receive different relevance score distributions. Thus, the query unlinkability is better protected. However, since the proposed scheme is not designed to protect the access pattern for efficiency reasons, a motivated cloud server can analyze the similarity of search results to judge whether the retrieved results come from the same requests. In the proposed EDMRS scheme, the data user can control the level of unlinkability by adjusting the standard deviation σ of the random variable εv. This is a trade-off between accuracy and privacy, which is determined by the user.
KEYWORD PRIVACY:
The BDMRS scheme cannot resist the TF statistical attack in the known background model, as the cloud server is able to deduce or identify keywords through analyzing the TF distribution histogram. In the EDMRS scheme, the phantom terms obscure the relevance score distribution, which makes such identification much harder.

DYNAMIC UPDATE OPERATION OF DMRS:


After the insertion or deletion of a document, we need to update the index synchronously. Since the index of the DMRS scheme is designed as a balanced binary tree, the dynamic operation is carried out by updating nodes in the index tree. Note that the update on the index is merely based on document identifiers, and no access to the content of documents is required. The specific process is presented as follows:
{I′s, ci} ← GenUpdateInfo(SK, Ts, i, updtype): This algorithm generates the update information {I′s, ci}, which will be sent to the cloud server. In order to reduce the communication overhead, the data owner stores a copy of the unencrypted index tree. Here, the notation updtype ∈ {Ins, Del} denotes either an insertion or a deletion of the document fi. The notation Ts denotes the subtree consisting of the tree nodes that need to be changed during the update. For example, if we want to delete the document f4, the subtree Ts includes the set of nodes {r22, r11, r}. If updtype is equal to Del, the data owner deletes from the subtree the leaf node that stores the document identity i and updates the vectors D of the other nodes in the subtree Ts, so as to generate the updated subtree T′s. In particular, if the deletion of the leaf node breaks the balance of the binary index tree, we replace the deleted node with a fake node whose vector is padded with 0 and whose file identity is null. Then, the data owner encrypts the vectors stored in the subtree T′s with the key set SK to generate the encrypted subtree I′s, and sets the output ci as null. If updtype is equal to Ins, the data owner generates a tree node u = ⟨GenID(), D, null, null, i⟩ for the document fi, where D[j] = TF(fi, wj) for j = 1, ..., m. Then, the data owner inserts this new node into the subtree Ts as a leaf node and updates the vectors D of the other nodes in the subtree Ts accordingly, so as to generate the new subtree T′s. Here, the data owner always prefers to replace the fake leaf nodes generated by the Del operation with newly inserted nodes, instead of directly inserting new nodes. Next, the data owner encrypts the vectors stored in the subtree T′s with the key set SK as described above to generate the encrypted subtree I′s. Finally, the document fi is encrypted to ci.
{I′, C′} ← Update(I, C, updtype, I′s, ci): In this algorithm, the cloud server replaces the corresponding subtree Is (the encrypted form of Ts) with I′s, so as to generate a new index tree I′. If updtype is equal to Ins, the cloud server inserts the encrypted document ci into C, obtaining a new collection C′. If updtype is equal to Del, the cloud server deletes the encrypted document ci from C to obtain the new collection C′. Similar to Kamara et al.'s scheme, our scheme can also carry out the update operation without storing the index tree on the data owner side. We choose to store the unencrypted index tree on the data owner side to trade storage cost for a lower communication burden. In both Kamara et al.'s scheme and our design, a set of nodes needs to be changed to update a leaf node, because the vector data of an internal node is computed from its children. If the data owner does not store the unencrypted subtree, the whole update process needs two rounds of communication between the cloud server and the data owner. Specifically, the data owner should first download the involved subtree in encrypted form from the cloud server. Secondly, the data owner decrypts the subtree and updates it with the newly added or deleted leaf node. Thirdly, the data owner re-encrypts the subtree and uploads the encrypted subtree to the cloud server. Finally, the cloud server replaces the old subtree with the updated one. Thus, to reduce the communication cost, we store an unencrypted tree on the data owner side. Then, the data owner can update the subtree directly with the newly added or deleted leaf node, and encrypt and upload the updated subtree to the cloud server. In this case, the update operation can be finished with one round of communication between the cloud server and the data owner. As a dynamic scheme, it is not reasonable to fix the length of the vector to the size of the dictionary, because a newly-added document may contain keywords outside the dictionary. In the proposed scheme, we add some blank entries in the dictionary and set the corresponding entries in each index vector to 0. If new keywords appear while inserting documents, these blank entries are replaced with the new keywords. Then, the index vectors of newly added documents are generated based on the updated dictionary, while the other index vectors are not affected and remain the same as before. After several rounds of document updating, the real IDF values of some keywords in the present collection may have changed considerably. Therefore, as the distributor of the IDF data, the data owner needs to recalculate the IDF values for all keywords and distribute them to the authorized users. In Table 1, there are three classes of keywords with different IDF value ranges. A smaller IDF value means the keyword appears more frequently. Note that after adding or deleting 100 or 300 documents, the IDF values do not change much. Thus, it is unnecessary for the data owner to update the IDF values every time he executes an update operation on the dataset. The data owner can flexibly choose to check the change of the IDF values, and distribute the new IDF values when these values have changed significantly.
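The owner-side update flow described above can be sketched as follows (a simplified, unencrypted view: packaging the subtree Ts and re-encrypting it with SK are omitted, and the helper names are ours). Only the nodes on the path from the changed leaf to the root need their vectors recomputed, since each internal vector is the element-wise maximum of its children.

```python
# Simplified owner-side view of the update flow (encryption of the subtree
# with SK and the packaging of Ts are omitted; helper names are ours).
def recompute_path(path):
    """path: the ancestors of the changed leaf, ordered from deepest to root."""
    for node in path:
        l, r = node.Pl, node.Pr
        node.D = [max(a, b) for a, b in zip(l.D, r.D)]

def delete_leaf(leaf, path, m):
    # Replace the deleted leaf with a fake node (zero vector, null identity)
    # so the binary tree stays balanced, then refresh the ancestors.
    leaf.D = [0.0] * m
    leaf.FID = None
    recompute_path(path)

def insert_leaf(fake_leaf, doc_id, tf_vector, path):
    # Prefer reusing a fake leaf left over from an earlier deletion.
    fake_leaf.D = list(tf_vector)
    fake_leaf.FID = doc_id
    recompute_path(path)
```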
PARALLEL EXECUTION OF SEARCH:
Owing to the tree-based index structure, the proposed search scheme can be executed in parallel, which further improves the search efficiency. For example, we assume there is a set of processors P = {p1, ..., pl} available. Given a search request, an idle processor pi is used to query the root r. If the search could be continued on both children, and there is an idle processor pj, the processor pi continues to deal with one of the children while processor pj deals with the other one. If there is no idle processor, the current processor is used to deal with the child with the larger relevance score, and the other child is put into a waiting queue. Once a processor becomes idle, it takes the oldest node in the queue to continue the search. Note that all the processors share the same result list RList.
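A sketch of this parallel traversal is shown below (our own illustration, not the paper's implementation): idle workers pull pending subtrees from a shared queue, and all workers share one top-k result list protected by a lock.

```python
# Illustrative parallel GDFS: a shared work queue of pending nodes and a
# single top-k result list (a min-heap) shared by all workers under a lock.
# Nodes are assumed to have D, Pl, Pr, FID attributes as in the earlier sketch.
import heapq, queue, threading

def parallel_search(root, Q, k, num_workers=4):
    r_list, lock = [], threading.Lock()
    pending = queue.Queue()
    pending.put(root)

    def kth_score():
        with lock:
            return r_list[0][0] if len(r_list) == k else 0.0

    def worker():
        while True:
            try:
                node = pending.get(timeout=0.1)   # simplified termination: stop when idle
            except queue.Empty:
                return
            score = sum(d * q for d, q in zip(node.D, Q))
            if node.FID is not None:              # leaf: try to enter the shared RList
                with lock:
                    if len(r_list) < k:
                        heapq.heappush(r_list, (score, node.FID))
                    elif score > r_list[0][0]:
                        heapq.heapreplace(r_list, (score, node.FID))
            elif score > kth_score():             # internal: expand only if it can still matter
                pending.put(node.Pl)
                pending.put(node.Pr)
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(r_list, reverse=True)
```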
PRECISION AND PRIVACY:
The search precision of the scheme is affected by the dummy keywords in the EDMRS scheme. Here, the precision is defined as in [26]: Pk = k′/k, where k′ is the number of real top-k documents among the retrieved documents. If a smaller standard deviation σ is set for the random variable εv, the EDMRS scheme is supposed to obtain higher precision, and vice versa. In the EDMRS scheme, phantom terms are added to the index vector to obscure the relevance score calculation, so that the cloud server cannot identify keywords by analyzing the TF distributions of specific keywords. Here, we quantify the obscureness of the relevance score by the rank privacy, which is defined as Σ |ri − r′i| / k², where ri is the rank number of a document in the retrieved top-k documents, and r′i is its real rank number in the whole ranked results. A larger rank privacy denotes higher security of the scheme. In the proposed scheme, data users can accomplish different requirements on search precision and privacy by adjusting the standard deviation σ, which can be treated as a balance parameter. We compare our schemes with a recent work proposed by Sun et al., which achieves high search efficiency. Note that our BDMRS scheme retrieves the search results through exact calculation of the document vector and query vector. Thus, the top-k search precision of the BDMRS scheme is 100%. But as a similarity-based multi-keyword ranked search scheme, the basic scheme of Sun et al. suffers from precision loss due to the clustering of sub-vectors during index construction. The precision test of the basic scheme is presented as follows. In each test, 5 keywords are randomly chosen as input, and the precision of the returned top-100 results is observed. The test is repeated 16 times, and the average precision is 91%.
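The two evaluation metrics just defined can be computed as in the short sketch below (our own reading of the definitions: precision Pk = k′/k, and rank privacy as the normalized sum of rank differences, with documents missing from the true ranking penalized by k):

```python
# Illustrative implementation of the precision and rank-privacy metrics.
def precision_at_k(returned_ids, true_topk_ids):
    k = len(true_topk_ids)
    k_prime = len(set(returned_ids[:k]) & set(true_topk_ids))
    return k_prime / k

def rank_privacy_at_k(returned_ids, true_ranking):
    """true_ranking maps document id -> its rank in the exact ranked results."""
    k = len(returned_ids)
    total = 0
    for pos, doc_id in enumerate(returned_ids, start=1):
        # |r_i - r'_i|, with documents absent from the true ranking penalized by k
        total += abs(pos - true_ranking.get(doc_id, k))
    return total / (k * k)
```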
EFFICIENCY:
INDEX TREE CONSTRUCTION:
The process of index tree construction for the document collection F includes two main steps: 1) building an unencrypted KBB tree based on the document collection F, and 2) encrypting the index tree with the splitting operation and two multiplications of an (m × m) matrix. The index structure is constructed following a post-order traversal of the tree based on the document collection F, and O(n) nodes are generated during the traversal. For each node, generation of an index vector takes O(m) time, the vector splitting process takes O(m) time, and two multiplications of an (m × m) matrix take O(m²) time. As a whole, the time complexity for index tree construction is O(nm²). Apparently, the time cost for building the index tree mainly depends on the cardinality of the document collection F and the number of keywords in the dictionary W. The time cost of index tree construction is almost linear in the size of the document collection, and is proportional to the number of keywords in the dictionary. Due to the dimension extension, the index tree construction of the EDMRS scheme is slightly more time-consuming than that of the BDMRS scheme. Although the index tree construction consumes relatively much time on the data owner side, it is noteworthy that this is a one-time operation. On the other hand, since the underlying balanced binary tree has space complexity O(n) and every node stores two m-dimensional vectors, the space complexity of the index tree is O(nm). As listed in Table 3, when the document collection is fixed (n = 1000), the storage consumption of the index tree is determined by the size of the dictionary.
TRAPDOOR GENERATION:
The generation of a trapdoor incurs a vector splitting operation and two multiplications of an (m × m) matrix, thus the time complexity is O(m²). Typical search requests usually consist of just a few keywords, so the number of query keywords has little influence on the overhead of trapdoor generation when the dictionary size is fixed. Due to the dimension extension, the time cost of the EDMRS scheme is a little higher than that of the BDMRS scheme.
SEARCH EFFICIENCY:
During the search process, if the relevance score at node u is larger than the minimum relevance score in the result list RList, the cloud server examines the children of the node; otherwise it returns. Thus, many nodes are not accessed during a real search. We denote the number of leaf nodes that contain one or more keywords of the query as θ. Generally, θ is larger than the number of required documents k, but far less than the cardinality of the document collection n. As a balanced binary tree, the height of the index is maintained at log n, and the complexity of the relevance score calculation is O(m). Thus, the time complexity of the search is O(θm log n). Note that the real search time is less than θm log n. This is because 1) many leaf nodes that contain the queried keywords are not visited, according to our search algorithm, and 2) the access paths of some different leaf nodes share common traversed parts. In addition, the parallel execution of the search process can increase the efficiency considerably. We test the search efficiency of the proposed scheme on a server which supports 24 parallel threads. The search performance is tested respectively by starting 1, 4, 8 and 16 threads. We compare the search efficiency of our scheme with that of Sun et al. In the implementation of Sun's code, we divide 4000 keywords into 50 levels. Thus, each level contains 80 keywords. According to their design, the higher the level the query keywords reside in, the higher the search efficiency. In our experiment, we choose ten keywords from the 1st level for the search efficiency comparison. If the query keywords are chosen from the 1st level, our scheme obtains almost the same efficiency as theirs when we start 4 threads. The search efficiency of our scheme increases a lot when we increase the number of threads from 1 to 4. However, when we continue to increase the threads, the search efficiency does not increase remarkably. Our search algorithm can be executed in parallel to improve the search efficiency. But all the started threads share one result list RList in a mutually exclusive manner. When we start too many threads, the threads will spend a lot of time waiting to read and write the RList. An intuitive method to handle this problem is to construct multiple result lists. However, in our scheme, this does not help improve the search efficiency much. This is because we need to find k results for each result list, and the time complexity for retrieving each result list is O(θm log(n/l)).
In this case, the multiple threads will not save much time, and selecting
k results from the multiple result lists will further increase the time
consumption. In Fig. 8, we show the time consumption when we
start multiple threads with multiple result lists. The experimental results

prove that our scheme will obtain better search efficiency when we start
multiple threads with only one result list.
UPDATE EFFICIENCY:
In order to update a leaf node, the data owner needs to update log n nodes. Since this involves an encryption operation for the index vector at each node, which takes O(m²) time, the time complexity of the update operation is thus O(m² log n). We illustrate the time cost for the deletion of a document. Fig. 9(a) shows that when the size of the dictionary is fixed, the deletion of a document takes nearly logarithmic time in the size of the document collection. And Fig. 9(b) shows that the update time is proportional to the size of the dictionary when the document collection is fixed. In addition, the space complexity of each node is O(m). Thus, the space complexity of the communication package for updating a document is O(m log n).

CONCLUSION AND FUTURE WORK:


In this paper, a secure, efficient and dynamic search scheme is proposed,
which supports not only the accurate multi-keyword ranked search but
also the dynamic deletion and insertion of documents. We construct
a special keyword balanced binary tree as the index, and propose a
Greedy Depth-first Search algorithm to obtain better efficiency than
linear search. In addition, the parallel search process can be carried out

to further reduce the time cost. The security of the scheme is protected
against two threat models by using the secure kNN algorithm.
Experimental results demonstrate the efficiency of our proposed scheme.
There are still many challenging problems in symmetric SE schemes. In the proposed scheme, the data owner is responsible for generating update information and sending it to the cloud server. Thus, the data owner needs to store the unencrypted index tree and the information that is necessary to recalculate the IDF values. Such an active data owner may not be very suitable for the cloud computing model. It could be a meaningful but difficult future work to design a dynamic searchable encryption scheme whose update operation can be completed by the cloud server only, while preserving the ability to support multi-keyword ranked search. In addition, as with most works on searchable encryption, our scheme mainly considers the challenge from the cloud server. Actually, there are many security challenges in a multi-user scheme. Firstly, all the users usually keep the same secret key for trapdoor generation in a symmetric SE scheme. In this case, user revocation is a big challenge. If a user needs to be revoked in this scheme, we need to rebuild the index and distribute the new secret keys to all the authorized users. Secondly, symmetric SE schemes usually assume that all the data users are trustworthy. This is not practical, and a dishonest data user will lead to many security problems. For example, a dishonest data user may search the documents and distribute the decrypted documents to unauthorized users. Even worse, a dishonest data user may distribute his/her secret keys to unauthorized users. In future work, we will try to improve the SE scheme to handle these challenging problems.
