Towards Green IoT Networking: Performance Optimization of Network Coding Based Communication and Reliable Storage
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2017.2706328, IEEE Access

2169-3536 (c) 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Jian Li, Yun Liu, Zhenjiang Zhang are with the School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China. Email: {lijian,liuyun,zhjzhang1}@bjtu.edu.cn
Jian Ren is with the Department of Electrical & Computer Engineering, Michigan State University, East Lansing, MI 48824, United States. Email: [email protected]
Nan Zhao is with the School of Telecommunications Engineering, Xidian University, Xi’an, Shanxi 710071, China. Email: [email protected]

Abstract—Internet of things (IoT) is expanding its outreach to almost every aspect of our daily life. By utilizing network coding in Internet of things, the IoT energy consumption can be reduced. Thus it is worthwhile studying and improving the applications in Internet of things where network coding is incorporated. In this paper, we optimize the performance of network coding based communication and reliable storage in two important components of Internet of things: the IoT core network, where data is sensed and transmitted, and the distributed cloud storage, where the data generated by the IoT core network is stored. First, we propose an adaptive network coding (ANC) scheme in the IoT core network to improve the transmission efficiency. We demonstrate the efficacy of the scheme and its performance advantage over existing schemes through simulations. Next we introduce the optimal storage allocation problem in the network coding based distributed cloud storage, which aims at searching for the most reliable allocation that distributes the n data components into N data centers, given the failure probability p of each data center. Then we propose a polynomial-time optimal storage allocation (OSA) scheme to solve the problem. Both the theoretical analysis and the simulation results show that the storage reliability could be greatly improved by the OSA scheme.

Index Terms—Internet of things, wireless sensor networks, distributed cloud storage, green networking.

I. INTRODUCTION

INTERNET of things (IoT) [1], [2] is an integral part of today’s smart city development. People can remotely access and interact with a wide range of devices integrated with sensors, from home appliances and wearable electronics to environmental monitors. With such enormous coverage potential in our daily life, IoT with reduced energy consumption (the ‘green’ attribute) has attracted more and more attention. In recent years, energy-efficient networking and computing [3] have been extensively studied from many perspectives, such as framework design [4], algorithm design [5] and resource reuse [6].

The high level view of Internet of things is shown in Fig. 1. It includes the IoT core network for data sensing and transmission, distributed cloud storage [7] for storing the data generated by the core network, and cloud computing [8] for processing the data. On top of these components are various applications such as e-transportation, e-health, smart home and so on. Communication networks such as 4G and 5G networks [9] interconnect these major components.

Fig. 1. High level view of Internet of things (blocks: Applications: e-transportation, e-health, smart home...; IoT core network; Cloud computing; Distributed cloud storage; Communication networks)

The IoT core network is responsible for generating the data for Internet of things. Smart devices sense various data and send out the data through the networks constituted by these devices [10], [11]. Because the smart devices are mostly battery-driven, much research aims at devising energy-efficient schemes to prolong the operation time of the network, such as [12], [13], [14]. Moreover, since the communication of the IoT core network is largely wireless, the packet loss may be high due to fading and interference, which would bring unnecessary energy consumption. Thanks to the emergence of software defined wireless networking [15], we can apply sophisticated algorithms to improve the communication quality of the IoT core network and conserve energy.

During the operation of Internet of things, the data collected from a vast number of sensors in the IoT core network could explode. The distributed cloud storage is a natural candidate to safely and reliably store these data. The distributed data storage architecture model distributes the database to multiple servers in many locations across the participating network in the storage cloud. Each location is directly and independently plugged into the Internet. If something unexpected happens to the data in one location, generally only a small amount of backed up data is impacted. Thus the data could be recovered with much less energy consumption, using the data stored in the rest of the locations. Besides distributing the data, many studies also investigate proactive approaches to ensure data availability, such as [16].

Network coding provides a trade-off between communication capacity and computational complexity in network environment by enabling the intermediate relay nodes to encode
the incoming packets before forwarding them. Since network coding can improve the throughput and robustness of the network, the unnecessary energy consumption due to high loss rate communication or high failure rate storage can be saved. Application of network coding in Internet of things can contribute to the ‘green IoT networking’. With such desirable merit, in this paper we optimize the performance of network coding based communication and reliable storage in Internet of things. The main contributions of this paper are:
• We propose an adaptive network coding scheme (ANC scheme) for the IoT core network and demonstrate that the scheme can improve the transmission efficiency and that its performance is better than existing schemes.
• For the distributed cloud storage utilizing network coding that stores the data generated by the IoT core network, we introduce the optimal storage allocation problem and propose an optimal storage allocation (OSA) scheme. Simulation results show that the storage reliability can be greatly improved.

The paper is organized as follows: in Section II we briefly review the IoT core network and the distributed cloud storage. The concept of network coding and its advantages in communication and storage are also introduced in Section II. Next we propose and analyze our adaptive network coding scheme for the IoT core network in Section III. After that we study the optimal storage allocation problem in the distributed cloud storage utilizing network coding in Section IV. Finally, we present the conclusion.

II. PRELIMINARIES AND RELATED WORK

A. Internet of Things

The objective of Internet of things is to equip everything related to human beings with smart chips integrating sensors, actuators and transceivers. Smart devices equipped with smart chips within a certain range can communicate with each other and form networks. These networks can be further connected to the Internet through proper interconnection. In this paper, we will mainly focus on the IoT core network and the distributed cloud storage which stores the data generated by the IoT core network, as shown in Fig. 1.

1) IoT Core Network: The IoT core network consists of the smart devices mentioned above and the networks among these devices. Several challenges of the IoT core network exist. First, it lacks a unified infrastructure and protocol stack. Second, the monitoring and control of the network lack flexibility. Third, the functionality of the network cannot be changed without reprogramming the smart devices when the application environment changes.

To overcome these shortcomings, software defined wireless networking (SDWN) [15] was proposed based on the paradigm of software defined networking (SDN) [17]. In the context of SDWN, the network elements in the data plane are smart devices which act as both end users and switches. The data flow is separated from the control flow. We can easily change the network behaviors through exchange of the control flow among smart devices.

Thanks to the advantages of SDWN, it is much easier to implement algorithms which can improve network performance in the IoT core network. In [18] the authors propose to combine network coding and software defined networking, where the code rate of the network coding is fixed. Although this approach can improve the communication throughput, the strategy is not flexible enough to cope with the changing channel qualities in wireless environments. In this paper, we will show that the transmission efficiency of the IoT core network can be greatly improved through our adaptive network coding scheme, where the code rate of the network coding can be dynamically adjusted in a centralized manner with the global view of the whole network.

2) Distributed Cloud Storage: The global volume of data will grow dramatically with the development of Internet of things, where hundreds of thousands of sensors will be deployed to create more and more data. By the year 2020, the amount of data is expected to grow to 40 zettabytes. How to properly store the data has become a major challenge in Internet of things.

A traditional centralized data center is not suitable for the context of Internet of things. If something unexpected happens, such as a power outage or military actions, the precious data stored in the data center could be lost and unrecoverable. To ensure a high reliability of the data storage, a typical solution is to store the data across multiple servers in the distributed cloud storage. The main idea is that instead of storing the entire data in one server, we can split the data into n data components and store the components separately. The original data can be recovered only when the required (threshold) number of components, say k, are collected. The storage efficiency is much higher than simply replicating the data over multiple servers. The distributed cloud storage can also increase data availability while reducing network congestion, thus leading to increased resiliency. A popular approach is to employ an (n, k) maximum distance separable (MDS) code, such as the Reed-Solomon (RS) code in the Total Recall system [19]. In later sections, we will show that by applying network coding in the distributed cloud storage we can further improve the performance of data storage.

B. Network Coding

In this section, we will briefly introduce the concept of network coding and its advantages in improving the communication throughput and the distributed cloud storage reliability, which could eventually contribute to the ‘green IoT networking’. Network coding was first introduced in the seminal paper by Ahlswede et al. [20]. By allowing the intermediate relay nodes to encode the incoming packets, the network could achieve the maximum multicast capacity.

A network is equivalent to a directed graph $G = (V, E)$, where $V$ represents the set of vertices corresponding to the network nodes and $E$ represents all the directed edges between vertices corresponding to the communication links. The start vertex $v$ of an edge $e$ is called the tail of $e$ and written as $v = tail(e)$, while the end vertex $u$ of an edge $e$ is called the head of $e$ and written as $u = head(e)$. For a source node
$u$, there is a set of symbols $X(u) = (x_1, \ldots, x_k)$ to be sent. Each of the symbols is from the finite field $GF(2^m)$, where $m$ is a positive integer. For a link $e$ between intermediate nodes $r_1$ and $r_2$, written as $e = (r_1, r_2)$, the symbol $y_e$ transmitted on it is a function of all the $y_{e'}$ such that $head(e') = r_1$. And $y_e$ can be written as:

$$y_e = \sum_{e' : head(e') = r_1} \beta_{e',e} \cdot y_{e'}, \tag{1}$$

in which the encoding coefficients $\beta_{e',e} \in GF(2^m)$. For a sink node $v$, there is a set of incoming symbols $y_{e'}$ ($e' : head(e') = v$) to be decoded.

1) Network Coding in Communication: The main idea of network coding can be illustrated through Fig. 2. Assuming the capacity of every edge is $C$, the capacity of this network is $2C$ according to the max-flow min-cut theorem. Only by encoding the incoming packet symbols $x_1, x_2$ at node R3 can this network achieve the maximum capacity. Since node R3 only needs to send out one symbol $x_1 + x_2$ instead of two symbols, the energy consumption of R3 in data transmission could be reduced by 50%.

Fig. 2. A simple example of network coding (nodes: source node, relay nodes R1, R2, R3, two sink nodes; edge symbols: $x_1$, $x_2$, $x_1 + x_2$)

In [21], [22] the authors have shown that linear codes with randomly selected coefficients are sufficient to achieve the multicast capacity by coding on a large enough field. Sink nodes that have received at least as many linearly independent encoded symbols as the original symbols generated by the source nodes can easily decode the original symbols by solving a set of linear equations. Moreover, it has been demonstrated that network coding can improve the communication throughput. As an example, the authors in [23] have applied the principles of random network coding to the context of peer-to-peer (P2P) content distribution, and have shown that file downloading times can be reduced. Thus in this paper we propose to apply adaptive random linear network coding in the IoT core network to improve the network transmission efficiency. In [24], the authors develop an OpenCoding protocol to improve the network throughput through intra-flow network coding for wireless mesh networks.

2) Distributed Cloud Storage Utilizing Network Coding: When a storage node fails in a distributed cloud storage network that employs an (n, k) RS code (such as Total Recall [19]), the replacement node connects to k nodes and first downloads data of the same amount as the whole file to decode the original file. Then the replacement node encodes the original file using the same (n, k) code to recover the encoded part of the file stored in the failed node. This approach is a waste of bandwidth because the whole file has to be downloaded to recover a fraction of it.

To overcome this drawback, Dimakis et al. [25] introduced the concept of the $(n, k, d, \alpha, \beta, B)$ regenerating code based on network coding. In the context of regenerating code, the contents stored in a failed node can be regenerated by the replacement node through downloading $\beta$ help symbols from each of $d$ helper nodes. This regeneration is identical to the encoding process of the intermediate nodes in network coding. The bandwidth consumption for the failed node regeneration could be far less than the whole file. Thus the energy consumption in data regeneration could be greatly reduced.

In [25], a tradeoff between the regeneration bandwidth $\gamma = d\beta$ and the storage requirement $\alpha$ was derived based on network coding theory and two extreme points were found: the minimum storage regeneration (MSR) point, in which the storage parameter $\alpha$ is minimized:

$$(\alpha_{MSR}, \gamma_{MSR}) = \left( \frac{B}{k}, \frac{Bd}{k(d-k+1)} \right), \tag{2}$$

and the minimum bandwidth regeneration (MBR) point, in which the bandwidth $\gamma$ is minimized:

$$(\alpha_{MBR}, \gamma_{MBR}) = \left( \frac{2Bd}{2kd - k^2 + k}, \frac{2Bd}{2kd - k^2 + k} \right). \tag{3}$$

Fig. 3. Network coding based distributed cloud storage (the figure shows two placements of the four nodes into Data Center 1 and Data Center 2. Stored symbols: Node1: $u_1+3u_2$, $u_3+3u_4$; Node2: $2u_1+u_2$, $2u_3+u_4$; Node3: $5u_1+2u_2$, $5u_3+2u_4$; Node4: $u_1+4u_2$, $u_3+4u_4$. Downloaded help symbols: $u_1+3u_2+u_3+3u_4$, $2u_1+u_2+2u_3+u_4$, $u_1+4u_2+u_3+4u_4$. New Node3: $3u_1+4u_2+3u_3+4u_4$, $2u_1+7u_2+2u_3+7u_4$)

Fig. 3 is an illustrative example of regenerating code with parameters $n = 4$, $k = 2$, $d = 3$, $\alpha = 2$, $\beta = 1$, $B = 4$. The 4 symbols $u_1, u_2, u_3, u_4$ are stored in 4 storage nodes, and can be retrieved from any 2 of the storage nodes. A failed node can be regenerated by downloading 1 symbol each from the 3 remaining nodes. Here we suppose node 3 fails. For storage systems simply employing the RS code, 4 symbols have to be downloaded first to decode the original symbols. Then we have to encode the 4 decoded symbols again to regenerate the symbols in the failed node 3. So the bandwidth needed for repairing the failed node 3 is 4. For the regenerating code solution in Fig. 3, by linearly combining the 3 downloaded symbols $u_1 + 3u_2 + u_3 + 3u_4$, $2u_1 + u_2 + 2u_3 + u_4$ and $u_1 + 4u_2 + u_3 + 4u_4$ into 2 symbols $3u_1 + 4u_2 + 3u_3 + 4u_4$ and
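As a quick numerical check of equations (2) and (3), the following minimal Python sketch evaluates both extreme points of the storage-bandwidth tradeoff for the Fig. 3 parameters ($B = 4$, $k = 2$, $d = 3$). The function names `msr_point` and `mbr_point` are illustrative choices made here and do not come from the paper; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def msr_point(B, k, d):
    """Minimum storage regeneration point (alpha, gamma) from equation (2)."""
    alpha = Fraction(B, k)
    gamma = Fraction(B * d, k * (d - k + 1))
    return alpha, gamma


def mbr_point(B, k, d):
    """Minimum bandwidth regeneration point (alpha, gamma) from equation (3)."""
    denom = 2 * k * d - k * k + k
    value = Fraction(2 * B * d, denom)
    return value, value


if __name__ == "__main__":
    B, k, d = 4, 2, 3  # parameters of the Fig. 3 example
    alpha, gamma = msr_point(B, k, d)
    # gamma = d * beta, so the per-helper download is gamma / d
    print("MSR:", alpha, gamma, "beta =", gamma / d)
    print("MBR:", *mbr_point(B, k, d))
```

For the Fig. 3 parameters the MSR point evaluates to $\alpha = 2$ and $\gamma = 3$, i.e. $\beta = \gamma / d = 1$, which agrees with the parameters stated for that example and with the repair bandwidth of 3 symbols compared with 4 for the RS code.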
As an example, for the regenerating code in Fig. 3, $n = 4$ encoded parts are stored in $N = 2$ data centers. Suppose the failure probability of each data center is $p = 0.01$. Two storage allocation strategies are shown in the figure. For the first allocation strategy $S = \{3, 1\}$ (blue data centers with dashed lines), 3 encoded parts are stored in data center 1 and 1 encoded part is stored in data center 2. With $k = 2$, the data becomes unrecoverable once more than $n - k = 2$ encoded parts are lost, which here happens exactly when data center 1 fails, so the failure probability of this allocation strategy is 0.01. For the second allocation strategy $S = \{2, 2\}$ (orange data centers with solid lines), 2 encoded parts are stored in each of the two data centers. The data is lost only when both data centers fail, so the failure probability of this allocation strategy is 0.0001, which is much lower than that of the first strategy.

B. Optimal Storage Allocation Scheme

In this section, we will show our optimal storage allocation (OSA) scheme to solve the storage allocation problem. The OSA scheme includes two stages: the first stage is to find out all the possible valid allocations $S$ and the second stage is to calculate the failure probability $P$ for each $S$. Then we can output the allocation with the lowest failure probability through comparison.

1) Stage I: Find out All the Possible Valid Allocations S: The naive approach to find out all the possible valid $S$ is to search all the possible combinations of $n_1, n_2, \ldots, n_N$ such that $\sum_{i=1}^{N} n_i = n$. However, this approach will take exponential time and thus is not practical. In our OSA scheme, we first change this problem into an integer partition problem [27]: to allocate the $n$ encoded parts into $N$ storage centers is the same as to partition an integer $n$ into $N$ parts. Taking $n = 7$, $N = 3$ as an example, there are 4 ways to partition 7 into 3 parts: $\{1, 1, 5\}$, $\{1, 2, 4\}$, $\{1, 3, 3\}$ and $\{2, 2, 3\}$, which also constitute the possible valid allocations. Then we can solve the integer partition problem using dynamic programming based on the following recurrence equation:

$$P(n, N) = P(n-1, N-1) + P(n-N, N), \tag{7}$$

where $P(i, j)$ is the total number of ways of partitioning integer $i$ into $j$ parts. The first part of equation (7) is the subproblem where at least one 1 exists in the partition and the second part of the equation is the subproblem where no 1 exists in the partition. Thus the solution to the original problem perfectly incorporates these two subproblems, which makes it feasible to solve using dynamic programming. We propose Algorithm 4 to find out all the possible valid allocations $S$. In the algorithm, we use $S(i, j, k)$ to represent the $k$-th valid allocation out of the $P(i, j)$ allocations for $i$ encoded parts and $j$ storage centers. $\cup$ is the union operation between two sets. The addition between a set $S$ and a number $x$ is defined as the addition between every element of the set and the number:

$$S + x := \{ n_i + x \mid n_i \in S \text{ for } 1 \le i \le N \}. \tag{8}$$

After the execution of the algorithm, we can get all the possible valid allocations $S(n, N, k)$ $(1 \le k \le P(n, N))$. It is easy to see that the algorithm runs in polynomial time.

Algorithm 4 OSA scheme - stage I
Input: the number of encoded parts n and the number of storage centers N
Output: all the valid allocations S(n, N, l), (1 ≤ l ≤ P(n, N))
1: function FINDALLALLOCATIONS(n, N)
2:   for i = 1 → n do
3:     P(i, 1) ⇐ 1
4:     S(i, 1, 1) ⇐ i
5:     for j = 2 → N do
6:       if i ≥ j then
7:         if i − j < j then
8:           P(i, j) ⇐ P(i − 1, j − 1)
9:           S(i, j, l) ⇐ S(i − 1, j − 1, l) ∪ {1}, for 1 ≤ l ≤ P(i, j)
10:        else
11:          P(i, j) ⇐ P(i − 1, j − 1) + P(i − j, j)
12:          S(i, j, l) ⇐ S(i − 1, j − 1, l) ∪ {1}, for 1 ≤ l ≤ P(i − 1, j − 1)
13:          S(i, j, P(i − 1, j − 1) + l) ⇐ S(i − j, j, l) + 1, for all 1 ≤ l ≤ P(i − j, j)
14:        end if
15:      end if
16:    end for
17:  end for
18: end function

Theorem 1. Algorithm 4 can output all the valid allocations $S(n, N, l)$ for $1 \le l \le P(n, N)$, where $S(n, N, l)$ represents the $l$-th valid allocation out of the $P(n, N)$ allocations for $n$ encoded parts and $N$ storage centers.

Proof. Algorithm 4 calculates $S(i, j, l)$ $(1 \le l \le P(i, j))$ for all $1 \le j \le N$ from $i = 1$ to $i = n$ in a bottom-up manner, and we can get $S(n, N, l)$ $(1 \le l \le P(n, N))$ for $i = n$, $j = N$. For each $i$, line 3 to line 4 first calculate $P(i, 1) = 1$ and $S(i, 1, 1) = i$, corresponding to the case of allocating $i$ encoded data parts into one data center. Then for each $j = 2, \ldots, N$, there will be two cases:
• Line 8 to line 9 correspond to the case with $i - j < j$, where at least one storage node will be allocated only 1 encoded data part. The second part of equation (7) does not exist. So the number of ways of allocating $i$ encoded data parts into $j$ storage nodes will be equal to that of allocating $i - 1$ encoded data parts into $j - 1$ storage nodes: $P(i, j) = P(i - 1, j - 1)$. And each of the valid allocations $S(i, j, l)$ will be the union of each already calculated allocation $S(i - 1, j - 1, l)$ with the set $\{1\}$.
• Line 11 to line 13 correspond to the case with $i - j \ge j$, where $P(i, j)$ is the summation of two previously calculated parts as shown in equation (7). The computation of the first part and the corresponding valid allocations is the same as in line 8 to line 9. The second part is the number of ways of allocating $i - j$ encoded data parts into $j$ storage nodes $P(i - j, j)$, where each of the storage nodes will be allocated at least 2 encoded data parts. Thus each of the valid allocations $S(i, j, P(i - 1, j - 1) + l)$ will be each of the already calculated allocations $S(i - j, j, l)$ plus 1 as defined in equation (8).
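The recurrence in equation (7) and the two cases of Algorithm 4 can be sketched in Python as follows. This is a minimal illustration under assumptions made here, not the authors' implementation: the function name `find_all_allocations` is hypothetical, each allocation is represented as a tuple in non-descending order, and the counter P(i, j) is implicit as the length of each table entry.

```python
def find_all_allocations(n, N):
    """Enumerate every way to place n encoded parts into exactly N data centers.

    table[i][j] holds all partitions of the integer i into exactly j
    positive parts, filled bottom-up as in Algorithm 4.
    """
    table = [[[] for _ in range(N + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][1] = [(i,)]  # one data center holds all i parts
        for j in range(2, min(i, N) + 1):
            # First term of (7): at least one center holds exactly 1 part.
            with_one = [(1,) + s for s in table[i - 1][j - 1]]
            # Second term of (7): every center holds at least 2 parts,
            # obtained by adding 1 to each element as in equation (8).
            without_one = []
            if i - j >= j:
                without_one = [tuple(x + 1 for x in s) for s in table[i - j][j]]
            table[i][j] = with_one + without_one
    return table[n][N]


if __name__ == "__main__":
    for allocation in find_all_allocations(7, 3):
        print(allocation)
```

For $n = 7$ and $N = 3$ the sketch yields the four allocations listed in the text, namely $\{1, 1, 5\}$, $\{1, 2, 4\}$, $\{1, 3, 3\}$ and $\{2, 2, 3\}$.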
Fig. 7 illustrates the algorithm for $n = 7$ encoded data parts and $N = 3$ data centers. Each $(i, j)$ pair represents the calculation of $P(i, j)$ and $S(i, j, l)$. The pairs without shades are calculated using line 8 to line 9 (the first case) while the pairs in shades are calculated using line 11 to line 13 (the second case). The solid lines correspond to the first part of equation (7) and the dashed lines correspond to the second part. From the figure we can clearly see that $(7, 3)$ can be efficiently calculated using the results of $(6, 2)$ and $(4, 3)$, which have already been calculated the same way as illustrated in Fig. 7.

Fig. 7. The calculation of Algorithm 4 for n = 7, N = 3 (a grid of the pairs $(i, j)$ for $1 \le i \le 7$ and $1 \le j \le \min(i, 3)$)

2) Stage II: Calculate the Failure Probability P for Each Valid Allocation S: After we get all the possible valid allocations $S$, we can calculate the failure probability $P_S$ for each of them. The objective function of equation (6) can be further written as:

$$P_S = \sum_{\forall S_j \subseteq S} P\left( \sum_{n_i \in S_j} n_i > n - k \right) = \sum_{\forall S_j \subseteq S,\ \text{s.t.}\ \sum_{n_i \in S_j} n_i > n - k} p^{|S_j|} (1 - p)^{N - |S_j|}, \tag{9}$$

where $p$ is the failure probability of each storage center and $|S_j|$ is the number of elements in subset $S_j$. If we try to directly calculate $P_S$ for every subset $S_j \subseteq S$, the number of subsets to be calculated will be approximately $\sum_{|S_j|=1}^{N} \binom{N}{|S_j|} \approx 2^N$, where $\binom{N}{|S_j|}$ denotes the number of $|S_j|$-combinations of the set $S$, thus making it infeasible to calculate in practice.

In the second stage of the OSA scheme (Algorithm 5), we propose to change the exhaustive search problem into a number counting problem. More specifically, for each $i$ $(1 \le i \le N)$, we count the total number of subsets $S_j^{(i)}$, where $S_j^{(i)}$ denotes the subsets with exactly $i$ elements whose elements sum to more than $n - k$:

$$P_S = \sum_{i=1}^{N} \left| \left\{ S_j^{(i)} : \sum_{n_i \in S_j^{(i)}} n_i > n - k \right\} \right| \, p^i (1 - p)^{N - i}. \tag{10}$$

In Algorithm 5, we first calculate the summations of every subset, which can be viewed as a variant of the subset-sum problem [28]. For each $i$ $(1 \le i \le N)$, we merge the same-value summation results of the subsets $S_j^{(i)}$ and count the total number of $S_j^{(i)}$ which have that summation value. Then for the subsets that have summation results larger than $n - k$, we can calculate the corresponding failure probability according to equation (10). In the algorithm, $T$, $L$, $R$ represent three auxiliary lists for subset summation. For an auxiliary list $X$, we use $X.length$ to denote the number of elements of the list, $X.index$ to denote the current index number of the list, $V_X(j)$ to denote the value of the $j$-th element in $X$, and $C_X(i, j)$ to denote the total number of subsets that have the same element number $i$ and the same summation value $V_X(j)$. Although the total number of subsets is $2^N$, Algorithm 5 is a polynomial time algorithm:

Theorem 2. The complexity of Algorithm 5 is $O(nN)$.

Proof. Since the summation of a valid allocation $S$ itself is the largest among all the summations of the subsets of $S$, the element number $T.length$ in $T$ cannot exceed $n$. Through the merge of subsets with the same summation values, each of the $N$ for-loops has the complexity $O(n)$. So the total complexity is $O(nN)$.

Theorem 3. Algorithm 5 can output the failure probability $P_S$ for the input allocation.

Proof. In line 3 we initialize the auxiliary list $L$ with an empty element 0, representing the summation result of 0 elements of the input allocation $S$. Line 4 to line 31 calculate the summations of every subset of the input allocation. At the beginning of each round $i$ of the for loop $i = 1, \ldots, N$, the auxiliary list $L$ is the list containing the summation results of every subset of the first $l$ $(0 \le l < i)$ elements of the input allocation $S$. Line 7 to line 8 calculate the auxiliary list $R$ by adding the new element $n_i$ to $L$: $R = L + n_i$. Since the first element in $L$ is the empty 0, $C_R(1, 1)$ will be 1, indicating that the total number of subsets that have 1 element and summation value $n_i$ is 1. Then the remaining values of $C_R(l, j)$ will be $C_L(l - 1, j)$ for $2 \le j \le L.length$ because of the addition of $n_i$ to $L$. The elements of allocation $S$ are sorted in non-descending order, thus the elements in both $L$ and $R$ are also in non-descending order. From line 10 to line 29, we merge the elements of the auxiliary lists $L$ and $R$ into a temporary auxiliary list $T$ one by one, following the rules below:
• If the value of the current element $V_L(L.index)$ in $L$ is equal to the current element $V_R(R.index)$ in $R$, add the value into $T$. The corresponding counter $C_T(l, T.length)$ is equal to the sum of the two counters: $C_T(l, T.length) = C_L(l, L.index) + C_R(l, R.index)$ for $1 \le l \le i$.
• If the value of the current element $V_L(L.index)$ in $L$ is smaller than the current element in $R$, add the element $V_L(L.index)$ into $T$. Set the counter $C_T(l, T.length)$ to $C_L(l, L.index)$ for $1 \le l \le i$.
• If the value of the current element $V_R(R.index)$ in $R$ is smaller than the current element in $L$, add the element $V_R(R.index)$ into $T$. Set the counter $C_T(l, T.length)$ to $C_R(l, R.index)$ for $1 \le l \le i$.
• Since the last element in L is smaller than some elements in R, after merging L into T, we can directly merge the remaining elements of R into T through line 28 to line 29.

At the end of each for loop, the merged list T is assigned back to L for the next round of calculation. After the N-th round, list T holds the summation results of all the subsets in S.

Then the failure probability of S can be calculated from line 34 to line 42 by counting the number of subsets whose summation results are larger than n − k.

Algorithm 5 OSA scheme - stage II
Input: a valid allocation S(n, N, l), (1 ≤ l ≤ P(n, N))
Output: the failure probability P_S of the allocation
 1: function CALCULATEPROBABILITY(S(n, N, l))
 2:   {n_1, n_2, . . . , n_N} ⇐ sort the allocation S(n, N, l) in non-descending order
 3:   L ⇐ {0}
 4:   ▷ calculate summations of every subset
 5:   for i = 1 → N do
 6:     T ⇐ ∅
 7:     R ⇐ L + n_i, C_R(1, 1) ⇐ 1
 8:     C_R(l, j) ⇐ C_L(l − 1, j), for all nonzero C_L(l, j), 2 ≤ j ≤ L.length, 2 ≤ l ≤ i, i ≥ 2
 9:     L.index, R.index ⇐ 1
10:     while L.index ≤ L.length do
11:       if V_L(L.index) == V_R(R.index) then
12:         T ⇐ T ∪ V_R(R.index)
13:         for all 1 ≤ l ≤ i, C_T(l, T.length) ⇐ C_L(l, L.index) + C_R(l, R.index)
14:         increase L.index, R.index by 1
15:       else
16:         if V_L(L.index) < V_R(R.index) then
17:           T ⇐ T ∪ V_L(L.index)
18:           C_T(l, T.length) ⇐ C_L(l, L.index), for all 1 ≤ l ≤ i
19:           increase L.index by 1
20:         else
21:           T ⇐ T ∪ V_R(R.index)
22:           C_T(l, T.length) ⇐ C_R(l, R.index), for all 1 ≤ l ≤ i
23:           increase R.index by 1
24:         end if
25:       end if
26:     end while
27:     oldLn ⇐ T.length
28:     T ⇐ T ∪ {V_R(R.index), V_R(R.index + 1), . . . , V_R(R.length)}
29:     {C_T(l, oldLn + 1), . . . , C_T(l, T.length)} ⇐ {C_R(l, R.index), . . . , C_R(l, R.length)}, 1 ≤ l ≤ i
30:     L ⇐ T
31:   end for
32:   P_S ⇐ 0
33:   ▷ count the number of subsets with the summation results larger than n − k
34:   for i = 1 → N do
35:     sum ⇐ 0
36:     for j = 1 → T.length do
37:       if V_T(j) > n − k then
38:         sum ⇐ sum + C_T(i, j)
39:       end if
40:     end for
41:     P_S ⇐ P_S + sum × p^i (1 − p)^(N−i)
42:   end for
43: end function

Fig. 8 illustrates the summations for all the subsets of S = {1, 2, 2}. For i = 1, L = {0}, C_L(1, 1) = 0, R = {1}, C_R(1, 1) = 1. The merged list is T = {0, 1}, with C_T = {0, 1}. For the second round, L and C_L are assigned the values of T and C_T. According to line 7 and line 8 of Algorithm 5, R = L + n_2 = {2, 3} and C_R(2, 2) = C_L(1, 2) = 1. At the end of the third round, we get the summation results T = {1, 2, 3, 4, 5} (leaving out the sum 0 of the empty subset) and the counter matrix C_T, which correctly records the number of subsets that have the same summation value. As an example, C_T(2, 3) = 2, indicating that there are two 2-element subsets ({n_1 = 1, n_2 = 2}, {n_1 = 1, n_3 = 2}) that have the same summation value V_T(3) = 3.

Round 1
  L:   0          R:   1          T:   0 1
  C_L: 0          C_R: 1          C_T: 0 1

Round 2
  L:   0 1        R:   2 3        T:   0 1 2 3
  C_L: 0 1        C_R: 1 0        C_T: 0 1 1 0
       0 0             0 1             0 0 0 1

Round 3
  L:   0 1 2 3    R:   2 3 4 5    T:   0 1 2 3 4 5
  C_L: 0 1 1 0    C_R: 1 0 0 0    C_T: 0 1 2 0 0 0
       0 0 0 1         0 1 1 0         0 0 0 2 1 0
       0 0 0 0         0 0 0 1         0 0 0 0 0 1

Fig. 8. The calculation of Algorithm 5 for S = {1, 2, 2}

3) OSA Scheme: Based on the algorithms of the two stages, we can achieve the optimal storage allocation through Algorithm 6. It is straightforward to see:

Theorem 4. The OSA scheme is a polynomial time algorithm.

Algorithm 6 OSA scheme
Input: the number of encoded parts n and the number of storage centers N
Output: the allocation with the lowest failure probability
function OSA(n, N)
  S(n, N, l) ⇐ FINDALLALLOCATIONS(n, N) (1 ≤ l ≤ P(n, N))
  for l = 1 → P(n, N) do
    P_S ⇐ CALCULATEPROBABILITY(S(n, N, l))
  end for
  output the allocation with the lowest P_S
end function
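The two stages can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the function names subset_sum_counts, failure_probability, partitions and osa are hypothetical, the subset sums are counted with dictionaries keyed by sum rather than with the merged sorted lists L, R and T of Algorithm 5, and the allocation enumerator assumes that an allocation is a non-increasing tuple of N non-negative integers summing to n, which may differ from FINDALLALLOCATIONS. The parameters k and p are passed explicitly here.

```python
from collections import defaultdict


def subset_sum_counts(allocation):
    """counts[i][s] = number of i-element subsets of the allocation with sum s."""
    N = len(allocation)
    counts = [defaultdict(int) for _ in range(N + 1)]
    counts[0][0] = 1  # the empty subset has sum 0
    for n_i in allocation:
        # Visit sizes from large to small so that counts[size] still holds the
        # values from before this data center was considered.
        for size in range(N - 1, -1, -1):
            for s, c in list(counts[size].items()):
                counts[size + 1][s + n_i] += c
    return counts


def failure_probability(allocation, k, p):
    """Evaluate Eq. (10): i failed centers lose more than n - k components."""
    n, N = sum(allocation), len(allocation)
    counts = subset_sum_counts(allocation)
    total = 0.0
    for i in range(1, N + 1):
        bad = sum(c for s, c in counts[i].items() if s > n - k)
        total += bad * p**i * (1 - p) ** (N - i)
    return total


def partitions(n, N, max_part=None):
    """Yield non-increasing tuples of N non-negative integers summing to n
    (assumed stand-in for the allocation enumeration of stage I)."""
    if max_part is None:
        max_part = n
    if N == 0:
        if n == 0:
            yield ()
        return
    for first in range(min(n, max_part), -1, -1):
        if first * N < n:  # remaining parts (each <= first) cannot reach n
            break
        for rest in partitions(n - first, N - 1, first):
            yield (first,) + rest


def osa(n, N, k, p):
    """Return the allocation with the lowest failure probability and that value."""
    best = min(partitions(n, N), key=lambda a: failure_probability(a, k, p))
    return best, failure_probability(best, k, p)
```

For the allocation S = {1, 2, 2} of Fig. 8, subset_sum_counts([1, 2, 2])[2][3] evaluates to 2, matching the two 2-element subsets with summation value 3 discussed above.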
[Plot: failure probability (logarithmic scale) versus k, comparing the OSA scheme with even allocation]
Fig. 9. Performance of the optimal storage allocation for different k

[Plot: failure probability (logarithmic scale) versus N, comparing the OSA scheme with even allocation]
Fig. 10. Performance of the optimal storage allocation for different number of storage centers