International Journal of Distributed and Parallel Systems (IJDPS) Vol.10, No.2/3, May 2019
DOI: 10.5121/ijdps.2019.10301
ADVANCED DIFFUSION APPROACH TO DYNAMIC
LOAD-BALANCING FOR CLOUD STORAGE
Eman Daraghmi¹ and Yousef-Awwad Daraghmi²
¹Department of Applied Computing, Palestine Technical University Kadoori (PTUK), Tulkarm, Palestine
²Department of Computer Systems Engineering, Palestine Technical University Kadoori (PTUK), Tulkarm, Palestine
ABSTRACT
Load balancing has become a critical function in cloud storage systems, which consist of
complex heterogeneous networks of nodes with different capacities. However, the convergence rate and the
performance of any load-balancing algorithm deteriorate as the number of nodes in the system, the
diameter of the network and the communication overhead increase. This paper therefore presents an
approach that aims at scaling the system out rather than up: the system can be expanded by
adding more nodes, without increasing the power of each node, while the overall performance of the
system still improves. Our proposal improves performance not only by considering the parameters
that affect the algorithm but also by simplifying the structure of the network that executes it.
The proposal was evaluated through mathematical analysis and computer simulations, and it was
compared with the centralized approach and the original diffusion technique. Results show that our
solution outperforms both in terms of throughput and response time. Finally, we prove that our
proposal converges to a state of equilibrium in which the loads of all in-domain nodes are the same,
since each node receives an amount of load proportional to its capacity. We therefore conclude that
the approach is fair and simple, and that no node is privileged.
KEYWORDS
Load balancing, cloud storage, Heterogeneous, Simulation, Task assignment
1. INTRODUCTION
Load balancing has become a critical function in cloud storage systems that consist
of hundreds of independent storage nodes (or nodes for short). In such systems [1], nodes
simultaneously serve computing and storage functions: a file is partitioned into a large
number of disjoint, fixed-size pieces (file chunks), and each file chunk is assigned to a
different cloud storage node, so that the load of a node is typically proportional to the number of
file chunks it holds. It is therefore possible to improve the overall performance of a
cloud storage system by balancing the load among the distributed nodes. In general, load-balancing
algorithms are designed to distribute load over multiple nodes in a way that
increases resource utilization, maximizes throughput, minimizes response time, and
avoids the overload situation in which one node is heavily loaded while
another is lightly loaded or idle.
In practice, distributed file systems for clouds [2], such as GFS, use a centralized approach to
simplify the design and implementation of the file system, to manage its
metadata, and to balance the loads of storage nodes based on that
metadata. However, as the number of storage nodes, the number of stored files
and the number of file accesses grow, the central nodes become a bottleneck. Additionally, if
the central nodes fail, the whole file system fails as well. As a solution, many studies have
proposed dynamic load-balancing algorithms that eliminate the dependence on central
nodes by allowing the storage nodes to balance their loads spontaneously. The main objective of
these studies was to propose better algorithms and to remedy shortcomings of earlier efforts.
These algorithms are designed to be scalable, portable and easy to use. Improvements include
the derivation of a faster algorithm that transfers less work than other algorithms to reach a
balanced state, and a mechanism for selecting and transferring loads to other machines in order to
improve performance. These studies found that the performance and the convergence rate of any
load-balancing algorithm deteriorate as the number of nodes in the system, the diameter of the
network and the communication overhead increase. They concluded that increasing the number
of nodes, on the one hand, makes it infeasible for a node to collect load information
from all nodes in the system and, on the other hand, makes the collected load information hard to
use, since most of it will be out of date, which results in lower performance. In other words,
the more complex the system, the lower the performance achieved. Our proposed solution therefore
builds on the previous approaches by applying the going-vertical principle to the dynamic diffusion
load-balancing technique, so as to consider both the virtual structure of the system and the
variables of the load-balancing algorithm. This principle aims at scaling the system out rather
than up: the system can be expanded by adding more nodes without increasing its complexity or the
power of each node, while the overall performance still improves. The key idea is to simplify the
structure of the system by breaking the entire complex heterogeneous network down into simpler
domains, or clusters, of homogeneous nodes based on the properties of each node. As a result, the
number of nodes, the diameter of the network and the communication overhead within each domain
decrease, and the overall performance increases.
In summary, the objectives of this paper are: (1) to improve the performance of cloud storage
systems by applying a dynamic load-balancing technique that employs the going-vertical
principle; and (2) to propose an algorithm that rebalances load by allowing the
storage nodes to balance their loads spontaneously, such that each node obtains load proportional to
its capacity, thereby achieving fairness and eliminating the dependence on central
nodes. Our proposal was evaluated through computer simulations and mathematical
analysis, and it was compared with the centralized approach and the original diffusion technique.
Results show that our solution outperforms both in terms of throughput and response time.
Finally, we prove that our proposal converges to a state of equilibrium in which the loads of all
in-domain nodes are the same, since each node receives an amount of load proportional to its
capacity. We therefore conclude that the approach is fair and simple, and that no node is
privileged.
2. RELATED WORK
This section summarizes work related to the approaches and techniques used in this paper
[2, 4, 5, 6, 7]. In large, complex systems, it is not feasible for a node to collect the information of
all other nodes in the system. The going-vertical principle is a technique that converts
a complex heterogeneous system into several simple clusters of homogeneous
nodes. It considers both the properties of each node, such as its
functionality and/or capacity, and the objective of the system that will execute the load-balancing
algorithm, and it groups nodes with similar properties into clusters, or domains,
thereby virtually simplifying the structure of the complex system. As mentioned before,
load balancing becomes harder when more load needs to be balanced across a larger system.
Moreover, the performance (i.e. throughput) and the convergence rate of any load-balancing
algorithm deteriorate as the number of nodes, the diameter of the network and the
communication overhead increase. Applying this technique to a load-balancing
algorithm therefore aims at scaling the system out rather than up, that is, allowing the system to be expanded by
adding more nodes while increasing the performance of the load-balancing algorithm,
without the need to increase the power of each node, and while maintaining
homogeneity within each domain. Fig. 1 shows the concept of the principle.
Figure 1. The concept of the going-vertical principle
The diffusion approach is a dynamic load-balancing technique that allows nodes to
communicate and migrate tasks among themselves. Each node balances its load with its
neighbors in the expectation that, after a number of iterations, the whole system will approach a
balanced state. In the diffusion approach, each node simultaneously sends excess load to its
under-loaded neighbors and receives load from its more heavily loaded neighbors. Under the
synchronous assumption, the diffusion method has been proven to converge in polynomial time
for any initial load distribution, given the quiescent assumption that no new load is generated and
no existing load is completed during execution of the algorithm. Since no global coordinator is
needed, the diffusion approach is inherently local, fault tolerant and scalable, and is thus
a natural choice for load balancing in a highly dynamic environment [8]. In 1989,
Cybenko [9] proposed the first diffusion scheme for dynamic load balancing on message-passing
multiprocessor networks. In his method, the load distribution at time $t$ is
quantified by a vector $L^t = (l_1^t, \ldots, l_n^t)$, where $l_i^t$ is the load of node $i$ at time $t \ge 0$. In each
round $t$, nodes $i$ and $j$ compare their loads, and node $j$ sends tokens to node $i$ if node $j$ has
more load than node $i$. Cybenko's method requires $\mathrm{ld}(n)$ steps, where $n$ is the number of
processors and $\mathrm{ld}$ denotes the logarithm to base 2. The method exploits the topology of the
hypercube for efficiency but ignores any dependencies between the individual items
of data moved. In 1990, Boillat et al. [10] presented a fully distributed load-balancing algorithm
for parallel programs, consisting of the same process running in parallel on each processor of a
given network. No assumption is made about the structure of the underlying network. They show
that the number of iterations is, in several cases, of the form $O(n^2)$, where $n$ is the number of processors.
In practice, neighborhood load-balancing algorithms are diffusion algorithms with the
advantage that they are very simple and that the vertices do not need any global information on
which to base their balancing decisions. Another advantage is that balancing with neighbors
tends to keep load items initiated by one vertex in the neighborhood of that vertex.
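To make the diffusion idea concrete, the following is a minimal Python sketch of one synchronous round of a first-order diffusion scheme in the spirit of Cybenko [9]. The function name `diffusion_step`, the adjacency-list representation and the uniform diffusion coefficient `alpha` are assumptions made for illustration; they are not taken from the cited works.

```python
def diffusion_step(loads, neighbors, alpha):
    """One synchronous diffusion round.

    loads:     list of current loads; loads[i] is the load of node i
    neighbors: adjacency list; neighbors[i] lists the nodes adjacent to i
    alpha:     diffusion coefficient (assumed to be the same on every edge)
    """
    new_loads = []
    for i, load_i in enumerate(loads):
        # Node i gains load from heavier neighbors and loses load to lighter ones.
        flow = sum(alpha * (loads[j] - load_i) for j in neighbors[i])
        new_loads.append(load_i + flow)
    return new_loads


# Hypothetical example: four nodes on a ring (a 2-dimensional hypercube),
# with all of the load initially placed on node 0.
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
loads = [8.0, 0.0, 0.0, 0.0]
for _ in range(50):
    loads = diffusion_step(loads, ring, alpha=0.25)
```

In this sketch the total load is conserved in every round, and the loads approach the uniform value as the rounds proceed, using only information from direct neighbors.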
3. LOAD BALANCING PROBLEM FORMULATION
Formally, a large-scale cloud storage file system is modeled as an undirected
graph $G = (V, E)$, where $V$ represents the set of chunkservers, or nodes, and $E$ describes the
connections among them. The cardinality of $V$ is $|V| = n$, where $n$ can be one thousand, ten
thousand, or more. The $n$ chunkservers $v_i \in V$ store a set of
files $F$, where any file $f \in F$ is partitioned into a number of disjoint, fixed-size chunks denoted
by $C_f$. For example, in the Google File System (GFS), each chunk has a size of 64 Mbytes [11]. Since each
chunkserver $i$ hosts a number of fixed-size file chunks, the load $L_i$ of a chunkserver is proportional
to the number of chunks it hosts. To simulate the worst case, we assume that the
chunkservers are heterogeneous, i.e. the servers differ in capacity. Moreover, the files in
$F$ may be arbitrarily created, deleted and appended. The net effect is that file chunks are not
uniformly distributed over the chunkservers. Our objective in this paper is to design a load-balancing
algorithm that reallocates the file chunks such that the chunks are distributed over the
system as uniformly as possible and each chunkserver hosts a number of file chunks proportional
to its capacity.
Definition 1 (Going-Vertical Principle).
Given a network $G = (V, E)$ of heterogeneous chunkservers, or nodes, in which each node has its own
capacity and any number of assigned file chunks, the going-vertical principle is a relation $R$ that classifies
the nodes by capacity into domains, or clusters, of homogeneous nodes, such that load is
transferred only among nodes in the same domain.
More formally, the semantics of the relation is defined as follows:
where $[a_1, \ldots, a_n]$ is the set of attribute names unique to $N$ (such as capacity) and
$t[a_1, \ldots, a_n]$ is the restriction of $t$ to this set. It is usually required that the attribute
names in the header of $D$ are a subset of
those of $N$, because otherwise the result of the operation will always be empty.
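As an illustration of Definition 1, the classification of nodes into domains can be sketched as a simple grouping by capacity. This is a minimal sketch, not the authors' implementation: the function name `group_by_capacity`, the dictionary representation and the example capacities are all hypothetical.

```python
from collections import defaultdict


def group_by_capacity(capacities):
    """Partition node ids into domains of equal capacity.

    capacities: dict mapping node id -> capacity (maximum number of chunks)
    Returns a dict mapping capacity -> sorted list of node ids in that domain.
    """
    domains = defaultdict(list)
    for node_id, capacity in capacities.items():
        domains[capacity].append(node_id)
    return {cap: sorted(ids) for cap, ids in domains.items()}


# Hypothetical example: five chunkservers with three distinct capacities.
capacities = {"n1": 100, "n2": 200, "n3": 100, "n4": 400, "n5": 200}
domains = group_by_capacity(capacities)
```

Each resulting domain contains only nodes of equal capacity, so load is exchanged among homogeneous nodes only.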
Definition 2 (Dynamic Load-Balancing Problem).
Given a large-scale distributed file system $G = (V, E)$ of $|V| = n$ heterogeneous chunkservers and a
set of files $F$ such that each file is partitioned into fixed-size chunks, the dynamic load-balancing
problem is to employ the going-vertical principle to convert the set of heterogeneous
chunkservers into several clusters, or domains, of homogeneous chunkservers, so that the load-balancing
algorithm efficiently redistributes the file chunks among the in-domain chunkservers
such that, if $G$ is stable for a sufficient period of time, the allocation of file chunks to the chunkservers in
one domain is fair, that is, $L_1 = L_2 = \cdots = L_n$. When the loads of all nodes $v_i$ in one
domain are equal in this sense, the domain is said to have achieved
local fairness. Achieving local fairness in all domains means that the entire system
achieves the fairness state.
4. OUR PROPOSAL
Our proposed algorithm, NeighborLB, is shown in Algorithm 1. Each node $n_i$ in the system $G$
executes the same algorithm in parallel and has a unique node id and a capacity, defined as the
maximum number of file chunks that the chunkserver can host. First, the structure of the system is
simplified by applying the going-vertical principle: all chunkservers that have the same
capacity form one local domain, or cluster, such that chunkservers in one cluster are neighbors,
can exchange their load metadata, and can migrate a file chunk only to
another chunkserver in the same domain. As a result, the graph diameter, the number of nodes that
exchange load information and the communication overhead decrease. Because of space limitations,
this paper focuses only on the load-balancing algorithm and not on the management of the
domains. The following subsections describe the proposed algorithm in detail.
4.1. Initialization
Each chunkserver $v_i$ hosts a number of fixed-size file chunks. Each chunkserver $v_i$
initializes its state (initialization stage) in steps 1 through 3. First, by applying
the going-vertical principle, all nodes that have the same capacity form one domain. This
pre-initialization step converts the heterogeneous system into several clusters of
homogeneous nodes. Step 1: Each chunkserver $v_i$ defines a set info to store its own information and
that of its in-domain neighbors, $info = \{\langle v_i, L_i \rangle\}$, where $v_i$ is the node id and
$L_i$ is its load, i.e. the number of chunks the node hosts. Step 2: Each chunkserver $v_i$ defines an
array to.mig to store the amount of load that node $n_i$ will transfer to each of its in-domain
neighbors. Step 3: Each node $v_i$ computes its initial load $L_i = |Cf_i|$.
4.2. Information Broadcasting
Step 4: Each chunkserver $v_i$ then broadcasts its initial state to all its in-domain neighbors. Each
node maintains a FIFO message queue that holds the incoming messages. Each
message has the format $\langle v_f, L_f, T, g \rangle$, where $v_f$ is the node the message came from,
$L_f$ is its load, $T$ is the type of the message, and $g$ is the migration information. There are
three types of messages:
1. Request message (“R”): $v_i$ receives this message to be informed that additional load has been submitted to it.
2. Load migration message (“G”): $v_i$ sends a “G” message to $v_j$ to tell it that $v_i$ wants to migrate $g$ units of load to $v_j$.
3. Broadcast message (“B”): broadcasts the status (i.e. id and load) to all in-domain neighbors.
Step 5: The main part of the algorithm starts when the node takes the first message from the
queue and processes it according to its type. Initially, the first messages received by each
node $v_i$ are “B” messages.
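The message format and the FIFO queue described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class names `Message` and `Node`, the field names, and the choice to let an “R” message carry the amount of newly submitted load in its last field are hypothetical and are not specified in the text.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    sender: str      # v_f: id of the node the message came from
    load: int        # L_f: load reported by the sender
    kind: str        # T: "R" (request), "G" (load migration) or "B" (broadcast)
    amount: int = 0  # g: units of load involved (used by "G"; assumed for "R")


class Node:
    def __init__(self, node_id, load):
        self.node_id = node_id
        self.load = load
        self.info = {node_id: load}  # known loads of this node and its in-domain neighbors
        self.queue = deque()         # FIFO queue of incoming messages

    def receive(self, message):
        self.queue.append(message)

    def process_next(self):
        """Take the oldest message from the queue and handle it by type."""
        msg = self.queue.popleft()
        if msg.kind == "B":
            # Record the status broadcast by an in-domain neighbor.
            self.info[msg.sender] = msg.load
        elif msg.kind == "G":
            # The sender migrates `amount` units of load to this node.
            self.load += msg.amount
            self.info[self.node_id] = self.load
            self.info[msg.sender] = msg.load - msg.amount
        elif msg.kind == "R":
            # Assumption: additional load of size `amount` was submitted to this node.
            self.load += msg.amount
            self.info[self.node_id] = self.load
        return msg


# Hypothetical usage: n1 first learns the load of n2, then receives 5 units from it.
node = Node("n1", load=10)
node.receive(Message("n2", 30, "B"))
node.receive(Message("n2", 30, "G", 5))
node.process_next()
node.process_next()
```

Processing the messages strictly in arrival order keeps the local view in `info` consistent with the sequence of migrations that the neighbors announced.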
4.3. Computing the Average In-Domain-Load and Finding In-Domain Assistant
Neighbors
Step 6: After receiving the information of all in-domain neighbors, each node $n_i$ computes the
average load of the domain in which it is located. The average in-domain load is defined
as $L_{avg} = \frac{\sum_{i \in info} |Cf_i|}{|info|}$. Step 7: Using its set info
and the average in-domain load, node $n_i$ defines the set of assistant neighbors $N_{lower}$,
i.e. the neighbors whose loads are less than the average in-domain load.
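Steps 6 and 7 can be sketched in a few lines of Python. The function names `average_load` and `assistant_neighbors`, the dictionary form of `info` and the example loads are assumptions made for illustration only.

```python
def average_load(info):
    """Average in-domain load over every node recorded in `info`."""
    return sum(info.values()) / len(info)


def assistant_neighbors(info, node_id):
    """Ids of in-domain neighbors whose load is below the in-domain average."""
    avg = average_load(info)
    return [n for n, load in info.items() if n != node_id and load < avg]


# Hypothetical domain of four nodes, seen from node "n1".
info = {"n1": 40, "n2": 10, "n3": 25, "n4": 5}
avg = average_load(info)
lower = assistant_neighbors(info, "n1")
```

In this example the average is 20, so `n2` and `n4` are assistant neighbors of `n1`, while `n3` is itself above the average and receives nothing.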
The transferring strategy
Step 8: Whether the procedure LB is called to migrate the excess file chunks depends
on the difference between the current load of node $v_i$ and the average in-domain load:
load is migrated only if this load difference is positive. We show
later that the local domain rapidly converges to a state in which $L_i - L_{avg} = 0$ for all
in-domain nodes. The pseudo-code of the load-balancing procedure is given in Procedure LB.
Algorithm 1. NeighborLB
n_i: the node where the algorithm is executed.
Adj(n_i): the set of in-domain neighbors.
Cf_i = {f_1c_1, ..., f_mc_s}: the set of file chunks hosted by node i.
Begin
1. Let info = {<n_i, L_i>}
2. Let mig(n_j) = 0 for all n_j ∈ Adj(n_i)
3. Compute the initial load: L_i = |Cf_i|
4. For each node n_j ∈ Adj(n_i) do
     send message <n_i, L_i, "B", 0>
5. Read messages from the message queue
   a. if T = "B" then info = info ∪ {<n_f, L_f>}
   b. if T = "G" then
        info = info ∪ {<n_i, L_i + g>, <n_f, L_f − g>}
        For each node n_j ∈ Adj(n_i) do
          send message <n_i, L_i + g, "B", 0>
6. Compute the average in-domain load L_avg = (Σ_{j ∈ info} L_j) / |Adj(n_i)|
7. Define the set of assistant neighbors
   For each node n_j ∈ Adj(n_i) do
     if L_j < L_avg then N_lower = N_lower ∪ {n_j}
8. Let the load difference LD = (L_i − L_avg)
9. If LD ≤ 0 then exit;
   else LB(Cf_i, LD, N_lower)
End
4.4. Load-Balancing Mechanism
The input parameters of the procedure LB are the load difference LD, the set of in-domain assistant
neighbors and the set of hosted file chunks. In this step, all overloaded nodes call the procedure
LB to migrate their excess file chunks to the under-loaded nodes.
Each overloaded node sorts its set of assistant neighbors in ascending order of load. The number of
file chunks migrated from the overloaded node to an assistant neighbor is the smaller of two
quantities: the remaining difference between the load of the overloaded node and the average
in-domain load, and the difference between the average load and the load of the assistant neighbor.
Spreading the excess over the assistant neighbors in this way ensures that a node receiving file
chunks is not pushed above the average in-domain load. Figure 2 shows the load-balancing mechanism.
Procedure LB(Cf_i, LD_i, N_lower)
Begin
1. Sort the nodes in N_lower in ascending order
2. for nodes in N_lower
     compute δ = L_avg − L_j
     if LD_i ≤ δ then send message <n_i, L_i, "G", LD_i>
     else
       if LD_i > δ then send message <n_i, L_i, "G", δ>
     LD_i = LD_i − δ
   Endfor
End
Figure. 2. Load-balancing example
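The load-balancing mechanism of Section 4.4 can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name `plan_migrations`, its parameters and the early exit once the excess is used up are assumptions, and each returned pair stands for one “G” message.

```python
def plan_migrations(load_diff, lower_loads, avg):
    """Split the excess load of one overloaded node among its assistant neighbors.

    load_diff:   LD, the excess of the node over the in-domain average (> 0)
    lower_loads: dict mapping assistant-neighbor id -> its load (each below avg)
    avg:         the in-domain average load
    Returns a list of (neighbor id, units to migrate), lightest neighbor first.
    """
    plan = []
    remaining = load_diff
    # Visit the assistant neighbors in ascending order of load.
    for neighbor, load in sorted(lower_loads.items(), key=lambda item: item[1]):
        if remaining <= 0:
            break                          # assumption: stop once the excess is used up
        deficit = avg - load               # how far the neighbor is below the average
        amount = min(remaining, deficit)   # never push the neighbor above the average
        plan.append((neighbor, amount))    # corresponds to one "G" message
        remaining -= amount
    return plan


# Hypothetical example: a node with load 40 in a domain whose average is 20.
plan = plan_migrations(load_diff=20, lower_loads={"n2": 10, "n4": 5}, avg=20)
```

Here the lightest neighbor `n4` is filled up to the average first and the remainder goes to `n2`, so the sender ends exactly at the average and no receiver exceeds it.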
5. ALGORITHM ANALYSIS
5.1. Time Complexity Analysis
Most of the operations in the proposed algorithm take $O(1)$ time. The broadcasting
operations have time complexity $O(d_i)$, where $d_i = |Adj(n_i)|$ is the number of in-domain
neighbors of node $n_i$. The idea of our approach is to improve the performance
of the load-balancing algorithm by considering both the algorithm parameters and the virtual structure
of the system that executes the algorithm, so that $d_i$ is decreased for each node.
This reduces the communication delay and the amount of out-of-date information. Also, if the “info”
and “to.mig” objects are implemented as arrays, then steps 1 and 2 have time complexity $O(d_i)$
and each individual update takes $O(1)$ time. Moreover, the sorting step of the LB procedure has a
worst-case time complexity of $O(c \log c)$, where $c$ is the number of in-domain assistant neighbors,
assuming that merge sort is used. The for-loop takes only $O(c)$ time, since each entry of the sorted
set is referenced only once. Thus the NeighborLB algorithm runs in $O(c \log c)$ time.
5.2. The Convergence
In this section we prove that the NeighborLB algorithm converges to a state of fairness given
sufficient time.
Lemma 1. Let $L^t = (l_1^t, l_2^t, \ldots, l_n^t)$ be the array of loads of the in-domain nodes at time $t$,
sorted in ascending order, where $l_1^t$ is the load of node 1 at time $t$. If at time $t$ there is at least
one overloaded node (i.e. $LD > 0$), then $L^{t+1}$ is lexicographically greater than $L^t$, meaning that the
lightly loaded nodes at time $t$ will receive load at time $t+1$.
Proof:
Let $X^t$ be the set of overloaded nodes in the domain (i.e. nodes with $LD > 0$) that need to migrate
some load to other nodes at time $t$. In reality, a node $i \in X^t$ will also host additional load at time $t$.
Thus, the nodes that migrate load at time $t$ will reduce their load at time $t+1$. Let $v$ be the node
that has the lowest load at time $t+1$. Assume that $v$ occupies the $k$-th position of the array $L^{t+1}$,
where $L^{t+1}$ similarly is the load array of the in-domain nodes at time $t+1$, sorted in ascending order. Let
$Q^t = (l_1^t, l_2^t, \ldots, l_{k-1}^t)$ be the array of the loads in the first $k-1$ positions of $L^t$.
In order to prove this lemma we have to consider two cases.
Case 1: The set $Q^{t+1}$ contains a node $i$ that received load at time $t$. Node $i$ then belongs to both
$Q^t$ and $Q^{t+1}$, and its load value is increased at time $t+1$, since it receives some
migrated load. Therefore, there is at least one load value in $Q^{t+1}$ that is strictly greater than one
value in $Q^t$. Accordingly, $L^{t+1}$ is lexicographically greater than $L^t$.
Case 2: There are nodes located in $Q^{t+1}$ that did not migrate or receive load at time $t$. In this case,
the load value at the $k$-th position at time $t+1$ is strictly greater than the load value in the same
position at time $t$, since that position has received load from $X^t$, and therefore $L^{t+1}$ is
lexicographically greater than $L^t$.
Theorem 1 (Convergence). In a heterogeneous system, if each domain of nodes executes the
NeighborLB algorithm, then the system converges to a balanced state.
Proof: Consider a heterogeneous unbalanced system. Assume that the going-vertical principle is first
applied to this system to partition the nodes into several domains, some of which are
unbalanced. Each domain separately executes the NeighborLB algorithm in order to reach the
convergence state. Consider one domain $N$. Let $i$ be the most heavily loaded node in that domain,
i.e. $l_i - l_{avg} > 0$, and let all other in-domain nodes $j \in N$ with $l_j < l_i$ and $l_j < l_{avg}$
form the set of assistant neighbors of node $i$. Since $l_i > l_{avg}$, and thus $l_i - l_{avg} > 0$, the
result of Lemma 1 applies: it guarantees that the array of loads sorted in ascending order at the next
time step is lexicographically greater than the array at the current step. Suppose that the
NeighborLB algorithm is executed in some domains at a given time $t$. Let $S \subseteq N$ be the domain of
nodes that executed the algorithm at time $t$, and let $L^t$ be the array of loads in one domain sorted in
ascending order at time $t$. It has been proven in Lemma 1 that $L^{t+1}$ is lexicographically greater
than $L^t$. Let $S_{min}$ be the set of lightly loaded nodes at time $t$. There exists at least one node
$v \in S_{min}$ which is an in-domain neighbor of a node $k$ such that $l_k^t > l_v^t$. By the proposed
algorithm, node $k$ migrates a portion of its excess load to node $v$, but $v$ does not migrate any load at
time $t$, because $v$ is under-loaded compared with the average load of the domain. At time $t+1$ the
effective load of node $k$ decreases; however, its load value never becomes less than the load value of
node $v$, which is given by $l_k^{t+1} \ge l_{avg} > l_v^{t}$. Thus, $L^{t+1}$ is lexicographically greater
than $L^t$, meaning that the sorted array of load values of the nodes at time $t+1$ is lexicographically
greater than the sorted array of load values of the nodes at time $t$.
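The lexicographic argument of Lemma 1 and Theorem 1 can be illustrated numerically. The sketch below is only an illustration under simplifying assumptions that are not part of the proof: a single domain, integer chunk counts, migrations applied one after another with up-to-date load information, and the hypothetical function name `balance_domain`.

```python
def balance_domain(loads):
    """Balance one domain and return the sorted load array after every migration."""
    loads = list(loads)
    avg = sum(loads) / len(loads)
    history = [sorted(loads)]
    for i in range(len(loads)):
        excess = int(loads[i] - avg)           # whole chunks above the average
        if excess <= 0:
            continue                           # node i is not overloaded
        # Candidate receivers, lightest first, based on the current loads.
        for j in sorted(range(len(loads)), key=lambda k: loads[k]):
            deficit = int(avg - loads[j])      # whole chunks below the average
            if excess <= 0 or deficit <= 0:
                break
            amount = min(excess, deficit)      # never push a receiver above the average
            loads[i] -= amount
            loads[j] += amount
            excess -= amount
            history.append(sorted(loads))
    return history


# Hypothetical domain of five nodes with an average load of 30 chunks.
history = balance_domain([90, 5, 40, 0, 15])
```

In this run every recorded array is lexicographically greater than the one before it (Python compares lists lexicographically), and the final array has all loads equal to the in-domain average.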
6. SIMULATIONS
The performance of our proposed algorithm was examined through a computer simulation
implemented with CloudSim [12, 13] and was compared with the original diffusion
neighborhood method and the centralized approach. The tests were
based on two parameters: the number of file chunks and the number of storage nodes.
Performance was measured with two metrics: throughput
and response time. Only one parameter was changed at a time, so that any change in
performance could be attributed solely to that parameter. Two tests were run for
each parameter to allow a rough average and standard deviation to be obtained. The results
of these tests were used to study: (1) the behavior of different load-balancing
algorithms under the same conditions; (2) the behavior of the algorithms for random systems with
different numbers of storage nodes; and (3) the behavior of the algorithms for different load
distributions.
6.1. Changing the Number of File Chunks
To study the effect of the number of file chunks on the average response time and the
throughput, the number of file chunks was varied from 1,000 to 10,000, and the chunks were
distributed among the storage nodes in the following manner.
• Initial load distributions varying 25% from the in-domain average load, representing a
situation in which all nodes have similar loads at the beginning and those loads are close to
the in-domain average load; in other words, the initial situation is quite balanced.
• Initial load distributions varying 50% from the average load, representing the
intermediate situations.
• Initial load distributions varying 75% from the average load, representing the higher
intermediate situations.
• Initial load distributions varying 100% from the average load, representing the
highly unbalanced situations.
6.1.1. Average Response Time
The total time taken by all three algorithms increased as the number of file chunks increased,
as shown in Figure 3. This is expected: the more files there are to store, the longer it takes to
complete the storing tasks. However, our proposed method performed better
than the centralized scheme and the original diffusion algorithm. In addition, the gap between the
curves of our method and the centralized algorithm widened as the assigned load increased.
This shows that our method reduced the completion time by a considerable amount (greater speedup)
in comparison to the centralized algorithm as the amount of load increased.
Figure. 3. Number of File Chunks vs. Response Time
6.1.2. Throughput
As shown in Figure 4, our method outperformed the original diffusion neighborhood algorithm in
terms of system throughput in all load-distribution cases. The throughput of our method
was in the range of 89-98 percent, while the nearest-neighborhood algorithm only reached
80-94 percent.
6.2. Varying the Number of Storage Nodes
To study the effect of the number of storage nodes on the average response time and
the throughput, the number of nodes was varied from 10 to 100, and the overloaded nodes were
distributed in the following manner.
• 25% of storage nodes are idle, 75% of storage nodes are overloaded.
• 50% of storage nodes are idle, 50% of storage nodes are overloaded.
• 75% of storage nodes are idle, 25% of storage nodes are overloaded.
Figure. 4. Number of File Chunks vs. Throughput
6.2.1 Response Time
Figure 5 shows that the response time improved as the number of nodes increased.
This improvement was mainly due to the fact that more nodes were available in the larger
domain. Therefore, even though there was more load to be scheduled in each round, the extra
load was easily handled by the additional nodes.
Figure. 5. Number of nodes vs. Response time
6.2.2. Throughput
Figure 6 shows that the throughput of the original neighborhood algorithm decreased as the
number of nodes in the system increased, which shows that load balancing is harder when more
tasks are to be balanced across a larger system. In our proposed method, however, the nodes
are divided into several domains, which keeps the number of nodes in each domain reasonable.
Figure. 6. Number of Nodes vs. Throughput
6.3. Discussion
This section summarized the performance of the proposed solution as compared to the original
diffusion method and the centralized scheme (see Table 1 below). Each test used the
response time and the throughput as performance measures. The methods were compared in
many cases by varying the algorithm parameters one at a time: the assigned loads to be
executed and the number of nodes. In these simulations, our proposed method performed
better than the other approaches. The number of nodes, the network diameter and the
communication delay affect the convergence rate of any load-balancing algorithm as well
as its performance. Intuitively, a system with a larger diameter takes longer to converge,
because the number of iterations needed to propagate the loads to all assistant neighbors
is proportional to the network diameter. In addition, longer communication delays lead
to out-of-date load information. Our proposed method considers both the structure of the
network that executes the algorithm and the algorithm parameters. It works, first, by
simplifying the structure of the system. This decreases, on the one hand, the
communication overhead between the in-domain neighbors, which leads to faster response
time and, on the other hand, the time needed to choose the assistant neighbors and the
target node that will receive the migrated loads. This effect appears clearly when the
assigned loads and the number of nodes increase. Moreover, reducing the communication
delay improves the load evaluation, since the effect of out-of-date information is
decreased. Also, considering the processing speed, and thus the processing capacity, of
each node leads to a more accurate average load evaluation, which improves the algorithm
performance. The importance of the average load appears when deciding the amount of load
to be migrated: if the load migrated to one node is too small, the load distribution
takes longer and convergence slows. In contrast, if the load migrated to one node is too
large, the overloaded node may transfer too much load to that neighbor and will not have
sufficient load to transfer to the remaining neighbors. Thus, by using the in-domain
average load, each node obtains an amount of load proportional to its
capacity and thus no node is privileged. This indicates that the method performs reliably
as the assigned loads increase, which is valuable from a practical point of view.
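The two effects discussed above (migration amounts derived from a capacity-aware load, and convergence that slows as the diameter grows) can be illustrated with a minimal sketch. This is not the algorithm evaluated in this paper: it is a generic capacity-weighted first-order diffusion, and the names diffuse_in_domain, path_edges and the diffusion coefficient alpha are hypothetical choices made only for this illustration.

```python
from typing import List, Tuple


def path_edges(n: int) -> List[Tuple[int, int]]:
    """Edges of a path (chain) of n nodes; its diameter is n - 1."""
    return [(i, i + 1) for i in range(n - 1)]


def diffuse_in_domain(
    loads: List[float],
    capacities: List[float],
    edges: List[Tuple[int, int]],
    alpha: float = 0.25,       # assumed diffusion coefficient (illustrative)
    tol: float = 1e-6,
    max_iters: int = 10_000,
) -> Tuple[List[float], int]:
    """Capacity-weighted diffusion among the nodes of one domain.

    Each round, every pair of neighbors exchanges load proportional to the
    difference of their normalized loads (load / capacity). Returns the final
    loads and the number of rounds executed.
    """
    loads = list(loads)
    for iteration in range(max_iters):
        normalized = [l / c for l, c in zip(loads, capacities)]
        if max(normalized) - min(normalized) < tol:
            return loads, iteration
        delta = [0.0] * len(loads)
        for i, j in edges:
            # Positive flow moves load from node i to node j.
            flow = alpha * (normalized[i] - normalized[j])
            delta[i] -= flow
            delta[j] += flow
        loads = [l + d for l, d in zip(loads, delta)]
    return loads, max_iters


if __name__ == "__main__":
    # Four nodes in a chain with capacities 1, 2, 1, 4; all load starts on node 0.
    final, rounds = diffuse_in_domain(
        [80.0, 0.0, 0.0, 0.0], [1.0, 2.0, 1.0, 4.0], path_edges(4)
    )
    print("final loads:", [round(x, 3) for x in final], "rounds:", rounds)

    # Same scheme on a short and a long chain of identical nodes.
    _, short_rounds = diffuse_in_domain([40.0] + [0.0] * 3, [1.0] * 4, path_edges(4))
    _, long_rounds = diffuse_in_domain([160.0] + [0.0] * 15, [1.0] * 16, path_edges(16))
    print("rounds (4 nodes):", short_rounds, "rounds (16 nodes):", long_rounds)
```

In this sketch the total load is conserved in every round, and the fixed point is the state in which load divided by capacity is equal on all nodes, so the first example settles near loads of 10, 20, 10 and 40. The second comparison shows the longer chain needing more rounds, which is the intuition behind keeping domains small.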
Table 1. Comparison of our proposed solution with the centralized approach and the original diffusion method

     | Centralized | Original Diffusion | Our Method (diffusion + applying the GV principle)
Pros | Simple design, simple implementation, good performance. | Good performance; solves the bottleneck and failure problems of the centralized approach. | Good performance even with large systems; simplifies the network structure; solves the cons of the original diffusion method and the centralized approach.
Cons | Bottleneck problem, possibility of failure. | Communication overhead, low convergence rate and low performance for large heterogeneous systems. | Requires good skills to define the properties of each node.
7. CONCLUSION
This paper considered the load-balancing mechanism in cloud storage systems. The
convergence rate of any load-balancing algorithm, as well as its performance, deteriorates
as the number of nodes in the system, the diameter of the network and the communication
overhead increase. Our proposal, which employs the going-vertical principle, was very
effective in our evaluation, especially in the case of a large number of nodes and dense
loads. In fact, a going-vertical based scheme works better when the number of nodes is
large. The key idea of the proposed method is that communication occurs only between
in-domain nodes, which reduces the impact of communication delay on the freshness of the
load information. This in turn allows the method to handle all load-balancing information,
and thus all load-balancing decisions, with minimal inter-node communication. In other
words, we aimed at not only considering the parameters that affect the algorithm
performance but also simplifying the structure of the network that executes the algorithm.
Finally, we proved that the proposed algorithm under this approach converges to the state
of equilibrium where the load in all nodes is the same, since each node receives an amount
of load proportional to its capacity. Therefore, we conclude that this approach has the
advantage of being fair and simple, with no node privileged.
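The equilibrium described above can be written compactly. The notation here is an assumption made for illustration: $L_i$ is the load of node $i$, $C_i$ its capacity, and $D$ the domain that contains it; the symbols $C_i$ and $D$ may differ from those used in the formal proof.

```latex
% Equilibrium within a domain D: equal capacity-normalized load on every node.
\[
  \frac{L_i}{C_i} \;=\; \frac{\sum_{j \in D} L_j}{\sum_{j \in D} C_j}
  \qquad \text{for all } i \in D,
\]
% Equivalently, each node holds a share of the domain load proportional to its capacity.
\[
  L_i \;=\; C_i \cdot \frac{\sum_{j \in D} L_j}{\sum_{j \in D} C_j}.
\]
```

As a worked example under these assumptions, a domain with capacities 1, 2, 1 and 4 and a total load of 80 chunks has a normalized load of 80 / 8 = 10, so the nodes hold 10, 20, 10 and 40 chunks respectively.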
REFERENCES
[1] H.-C. Hsiao, H.-Y. Chung, H. Shen and Y.-C. Chao, "Load Rebalancing for Distributed File Systems
in Clouds," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 5, pp. 951-962, 2013.
[2] E. Y. Daraghmi and S. M. Yuan, "In-domain neighborhood approach to heterogeneous dynamic load
balancing in real world network," in 14th International Conference on Parallel and Distributed
Computing, Applications and Technologies (PDCAT'13), Taipei, Taiwan, 2013.
[3] C. P. Adolphs and P. Berenbrink, "Distributed selfish load balancing with weights and speeds," in The 2012
ACM symposium on Principles of distributed computing, New York, USA, 2012.
[4] J. Bahi, R. Couturier and F. Vernier, "Synchronous Distributed Load Balancing on Totally Dynamic
Networks," in Parallel and Distributed Processing Symposium, 2007.
[5] E. Luque, A. Ripoll, A. Cortés and T. Margalef, "A distributed diffusion method for dynamic load
balancing on parallel computers," in Euromicro Workshop on Parallel and Distributed Processing,
1995.
[6] P. Neelakantan, "Decentralized Load Balancing In Heterogeneous Systems Using Diffusion
Approach," International Journal of Distributed and Parallel Systems (IJDPS), vol. 3, no. 1,
pp. 229-239, 2012.
[7] C.-C. Hui and S. Chanson, "A hydro-dynamic approach to heterogeneous dynamic load balancing in a
network of computers," in Proceedings of the 1996 International Conference on Parallel Processing
Software, 1996.
[8] G. Cybenko, "Dynamic load balancing for distributed memory multiprocessors," Journal of Parallel
and Distributed Computing, vol. 7, no. 2, pp. 279-301, 1989.
[9] J. E. Boillat, "Load balancing and Poisson equation in a graph," Concurrency: Practice and
Experience, vol. 2, no. 4, pp. 289-313, 1990.
[10] "Google File System," [Online]. Available: http://en.wikipedia.org/wiki/Google_File_System.
[11] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose and R. Buyya, "CloudSim: a toolkit for
modeling and simulation of cloud," Software: Practice and Experience, vol. 41, no. 1,
pp. 23-50, 2010.
[12] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek and H. Balakrishnan,
"Chord: a Scalable Peer-to-Peer Lookup Protocol for Internet Applications," in Proceedings of the
2001 conference on Applications, technologies, architectures, and protocols, San Diego, California,
USA, 2001.
ADVANCED DIFFUSION APPROACH TO DYNAMIC LOAD-BALANCING FOR CLOUD STORAGE

  • 1. International Journal of Distributed and Parallel Systems (IJDPS) Vol.10, No.2/3, May 2019 DOI:10.5121/ijdps.2019.10301 1 ADVANCED DIFFUSION APPROACH TO DYNAMIC LOAD-BALANCING FOR CLOUD STORAGE Eman Daraghmi1 and Yousef-Awwad Daraghmi2 1 Department of Applied Computing, Palestine Technical University Kadoori (PTUK), Tulkarm, Palestine 2 Department of Computer Systems Engineering, Palestine Technical University Kadoori (PTUK), Tulkarm, Palestine ABSTRACT Load-balancing techniques have become a critical function in cloud storage systems that consist of complex heterogeneous networks of nodes with different capacities. However, the convergence rate of any load-balancing algorithm as well as its performance deteriorated as the number of nodes in the system, the diameter of the network and the communication overhead increased. Therefore, this paper presents an approach aims at scaling the system out not up - in other words, allowing the system to be expanded by adding more nodes without the need to increase the power of each node while at the same time increasing the overall performance of the system. Also, our proposal aims at improving the performance by not only considering the parameters that will affect the algorithm performance but also simplifying the structure of the network that will execute the algorithm. Our proposal was evaluated through mathematical analysis as well as computer simulations, and it was compared with the centralized approach and the original diffusion technique. Results show that our solution outperforms them in terms of throughput and response time. Finally, we proved that our proposal converges to the state of equilibrium where the loads in all in-domain nodes are the same since each node receives an amount of load proportional to its capacity. Therefore, we conclude that this approach would have an advantage of being fair, simple and no node is privileged. 
KEYWORDS Load balancing, cloud storage, Heterogeneous, Simulation, Task assignment 1. INTRODUCTION Load-balancing techniques have become a critical function in cloud storage systems that consist of hundreds of independent storage nodes (or nodes for short). In such systems [1], nodes simultaneously serve computing and storage functions where a file is partitioned into a large number of disjointed and fixed-size pieces (or file chunks), and each file chunk is assigned to a different cloud storage node so that the load of a node is typically proportional to the number of file chunks the node possesses. Thus, it is possible to improve the overall performance of the cloud storage system by balancing the load among the distributed nodes. In general, load- balancing algorithms are designed to distribute the loads over multiple nodes in a way that ensures expanding resource utilization, maximizing throughput, minimizing response time, and avoiding the overload situation where one node is heavily loaded with excess of loads while another node is lightly loaded or idle. Practically, distributed file systems for clouds [2], such as GFS, utilize the centralized approach to simplify the design as well as the implementation of a distributed file system, to manage the metadata information of the systems and to balance the loads of storage nodes based on that metadata. However, when increasing the number of storage nodes, the number of files to be stored and the number of files to be accessed, central nodes become a bottleneck. Additionally, if the central nodes fail, then the whole file system fails as well. As a solution, many studies have proposed a number of dynamic load-balancing algorithms to eliminate the dependence on central
  • 2. International Journal of Distributed and Parallel Systems (IJDPS) Vol.10, No.2/3, May 2019 2 nodes by allowing the storage nodes balance their loads spontaneously. The main objective of these previous studies was to propose a better algorithm and to develop a new approach to remedy shortcomings in previous efforts. In fact, previous algorithms are designed to be scalable, portable, easy to use and more improved. Improvements include the derivation of a faster algorithm that transfers less work to achieve a balanced state than other algorithms, and a mechanism for selecting and transferring the loads to other machines in order to improve the algorithm performance. They found that the performance of any load-balancing algorithm as well as its convergence rate deteriorated as the number of nodes in the system, the diameter of the network and the communication-overhead increased. They concluded that increasing the number of nodes in the system, from one hand; make it not feasible for a node to collect the load- information from all nodes in the system and, from the other hand, leads to difficulties in using the collected load-information as most of this information will be out of date which result in lower performance. In other word, the more complex the system is, the less performance achieved. Therefore, our proposed solution attempts to bolster the previous approaches efforts by applying the going-vertical principle for the dynamic diffusion load-balancing technique to consider both the virtual structure of the system and the variables of the load-balancing algorithm. In other words, this principle aims at scaling the system out not up – allowing the system to be expanded by adding more nodes without increasing the system complexity nor the need to increase the power of each node while at the same time increasing the overall performance of the system. 
The key idea is to simplify the structure of the system by breaking down the entire complex heterogeneous network into simpler domains or clusters of homogeneous nodes based on the property of each node in the system. As a result, the number of nodes, the diameter of the network as well as the communication overhead are decreased and thus the overall performance is increased. In summary, the objectives of this paper are: (1) to improve the performance of cloud storage systems by applying dynamic load-balancing technique that employs the going on vertical principle; (2) to propose an algorithm that re-balancing tasks to storage nodes by allowing the storage nodes balance their loads spontaneously such that each node obtains load proportional to its capacity and thus achieving the fairness state which in turn eliminates dependence on central nodes. Our proposal was evaluated through computer simulations as well as mathematical analysis, and it was compared with the centralized approach and the original diffusion technique. Results show that our solution outperforms them in terms of throughput and response time. Finally, we proved that our proposal converges to the state of equilibrium where the loads in all in-domain nodes are the same since each node receives an amount of load proportional to its capacity. Therefore, we conclude that this approach would have an advantage of being fair, simple and no node is privileged. 2. RELATED WORK Presented here is a summary of work related to the approaches and techniques used in this paper [2, 4, 5, 6, 7]. In complex huge systems, it is not feasible for a node to collect the information of all other nodes in the system. Thus, the Going-Vertical principle is a technique aims at converting the complex heterogeneous system, by breaking it out, to several simple clusters of homogeneous nodes. 
This technique considers both the properties of each node in the system, such as its functionalities and/or capacity, as well as the objective of the system that will execute the load- balancing algorithm to group those nodes who have similar properties into clusters or domains and thus virtually simplifying the structure of the complex system. As mentioned before, load- balancing becomes harder when more loads need to be balanced across a larger system. Moreover, the performance of any load-balancing algorithm (i.e. throughput) as well as the convergence rate of it deteriorated as the number of nodes, the diameter of the network and the communication overhead increased; thus, employing this technique to any load-balancing algorithm aims at scaling the system out not up means allowing the system to be expanded by
  • 3. International Journal of Distributed and Parallel Systems (IJDPS) Vol.10, No.2/3, May 2019 3 adding more nodes while at the same time increasing the performance of the system load- balancing algorithm without the need to increase the power of each node, and maintaining the homogeneity property in-domain. Fig.1 shows the concept of the principle. Figure.1. The concept of the going vertical principle The diffusion approach is a dynamic load balancing technique that allows the nodes to communicate and migrate tasks with other nodes. Each node balances the load among the other nodes in the hope that after a number of iterations the whole system will approach the balanced state. In the diffusion approach, each node simultaneously sends the excessive load to its under- loaded neighbor nodes and receives loads from its neighbor nodes with higher load. Under the synchronous assumption, the diffusion method has been proven to converge in polynomial time for any initial load distribution given the quiescent assumption that no new load is generated and no existing load is completed during execution of the algorithm. Since it is not necessary to have a global coordinator, the diffusion approach is inherently 1oca1, fault tolerant and scalable. Hence this approach is a natural choice for load balancing in a highly dynamic environment [8]. In 1989, Cybenko [9] proposed the first diffusion scheme for dynamic load balancing on a message passing multiprocessor networks. According to his method the load distribution at time t is quantified by a vector where t i l is the load of node i at time t ≥ 0. In each round t, node i and node j compare their load and node j sends tokens to node i if node j has more loads than node i. Cybenko method requires ld(n) steps, where n is the number of processors and ld denotes the logarithm to base 2. 
Cybenko's method utilizes the topology of the hypercube machine for its efficiency, but ignores any dependencies between the individual items of data moved. In 1990, Boillat et al. [10] presented an approach to solve the load-balancing problem for parallel programs. They presented a fully distributed load-balancing algorithm, consisting of the same process running in parallel on each processor of a given network. No assumption has to be made concerning the structure of the underlying network. They show the number of iterations in several cases to be of the form $O(n^2)$, where $n$ is the number of processors. In practice, neighborhood load-balancing algorithms are diffusion algorithms with the advantage that they are very simple and that the vertices do not need any global information on which to base their balancing decisions. Another advantage is that balancing with neighbors tends to keep load items initiated by one vertex in the neighborhood of that vertex.

3. LOAD BALANCING PROBLEM FORMULATION

Formally, a large-scale cloud storage file system is modeled as an undirected graph $G = (V, E)$, where $V$ represents the set of chunkservers or nodes and $E$ describes the connections among them. The cardinality of $V$ is $|V| = n$, where $n$ can be one thousand, ten thousand, or more. The $n$ chunkservers $v_i \in V$ store a set of files $F$, where any file $f \in F$ is partitioned into a number of disjoint, fixed-size chunks denoted by $C_f$. For example, in the Google File System (GFS), each chunk has 64 MB [11]. Since each chunkserver $i$ hosts a number of fixed-size file chunks, the load $L_i$ of a chunkserver is proportional to the number of chunks hosted by the server. To simulate the worst case, we assume that the chunkservers are heterogeneous, that is, each server has a different capacity. Moreover, the files in $F$ may be arbitrarily created, deleted, and appended. The net effect is that file chunks are not uniformly distributed over the chunkservers. Our objective in this paper is to design a load-balancing algorithm that reallocates the file chunks such that the chunks are distributed over the system as uniformly as possible and each chunkserver hosts a number of file chunks proportional to its capacity.

Definition 1 (Going Vertical Principle). Given a network of heterogeneous chunkservers or nodes $G = (V, E)$ in which each node has its own capacity and any assigned file chunks, the going vertical principle is a relation $R$ that classifies the nodes by capacity into domains, or clusters, of homogeneous nodes, such that load is then transferred only among nodes in the same domain. More formally, the semantics of the relation is defined as follows: where $[a_1, \ldots, a_n]$ is the set of attribute names unique to $N$ (such as capacity) and $t[a_1, \ldots, a_n]$ is the restriction of $t$ to this set. It is usually required that the attribute names in the header of $D$ are a subset of those of $N$, because otherwise the result of the operation will always be empty.

Definition 2 (Dynamic Load-Balancing Problem). Given a large-scale distributed file system $G = (V, E)$ of $|V| = n$ heterogeneous chunkservers and a set of files $F$ such that each file is partitioned into fixed-size chunks, the dynamic load-balancing problem is to employ the going vertical principle to convert the set of heterogeneous chunkservers into several clusters, or domains, of homogeneous chunkservers, so that the load-balancing algorithm efficiently redistributes the file chunks among the in-domain chunkservers. If $G$ is stable for a sufficient period of time, the allocation of file chunks to the chunkservers in one domain is fair, that is, $L_1 = L_2 = \cdots = L_n$ for all nodes $v_i$ of that domain. When this happens, the domain is said to have achieved local fairness. Achieving local fairness in all domains means that the entire system achieves the fairness state.

4. OUR PROPOSAL

Our proposed algorithm, NeighborLB, is shown in Algorithm 1. Each node $n_i$ in the system $G$ executes the same algorithm in parallel and has a unique node id and a capacity, defined as the maximum number of file chunks that the chunkserver can host. First, the structure of the system is simplified by applying the going vertical principle: all chunkservers that have the same capacity form one local domain, or cluster, such that the chunkservers in one cluster are neighbors, they can exchange their load metadata, and a file chunk can only be migrated to another chunkserver in the same domain. As a result, the graph diameter, the number of nodes that exchange load information and the communication overhead are all decreased. Because of size limitations, this paper focuses only on the load-balancing algorithm and not on the management of the domains. The following subsections describe the proposed algorithm in detail.
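The domain-formation step described above, in which chunkservers of equal capacity become in-domain neighbors, can be sketched as follows. This is only an illustration under stated assumptions: the helper names `form_domains` and `in_domain_neighbors`, the node ids and the capacity values are hypothetical, and the paper leaves domain management out of scope.

```python
from collections import defaultdict


def form_domains(capacities):
    """Group node ids into domains keyed by capacity.

    `capacities` maps node id -> capacity (maximum number of chunks)."""
    domains = defaultdict(list)
    for node_id, capacity in capacities.items():
        domains[capacity].append(node_id)
    return dict(domains)


def in_domain_neighbors(domains):
    """Map every node to the other members of its own domain."""
    adjacency = {}
    for members in domains.values():
        for node in members:
            adjacency[node] = [m for m in members if m != node]
    return adjacency


# Hypothetical heterogeneous system with two capacity classes.
capacities = {"n1": 100, "n2": 200, "n3": 100, "n4": 200, "n5": 100}
domains = form_domains(capacities)
adjacency = in_domain_neighbors(domains)
print(domains)
print(adjacency["n1"])
```

Each node then exchanges load information only with the nodes listed in its adjacency entry, which is what reduces the neighbor count and the effective diameter.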
4.1. Initialization

Each chunkserver $v_i$ hosts a number of fixed-size file chunks $CF_i$. Each chunkserver $v_i$ initializes its state (initialization stage) in Steps 1 through 3. First, by applying the going vertical principle, all nodes that have the same capacity form one domain. This pre-initialization step converts the heterogeneous system into several clusters of homogeneous nodes.

Step 1: Each chunkserver $v_i$ defines a set info to store its own information and that of its in-domain neighbors, $info = \{\langle v_i, L_i \rangle\}$, where $v_i$ is the node id and $L_i$ is its load, i.e. the number of chunks the node hosts.

Step 2: Each chunkserver $v_i$ defines an array to.mig to store the amount of load that node $n_i$ will transfer to its in-domain neighbors.

Step 3: Each node $v_i$ computes its initial load $L_i = |CF_i|$.

4.2. Information Broadcasting

Step 4: Each chunkserver $v_i$ then broadcasts its initial state to all its in-domain neighbors. Note that each node maintains a FIFO message queue that holds the incoming messages. Each message has the format $\langle v_f, L_f, \text{"T"}, g \rangle$, where $v_f$ is the node the message came from, $L_f$ is its load, $T$ is the type of the message, and $g$ is the migration information. There are three types of messages:

1. Request message ("R"): $n_i$ receives a message informing it that additional load has been submitted to it.
2. Load Migration message ("G"): $n_i$ sends a "G" message to $n_j$ to tell it that $n_i$ wants to migrate $g$ units of load to $n_j$.
3. Broadcast message ("B"): broadcasts the status (i.e. id and load) to all in-domain neighbors.

Step 5: The main part of the algorithm starts when the node takes the first message from the queue and processes it according to its type. Initially, the first messages received by each node $n_i$ are "B" type messages.

4.3. Computing the Average In-Domain-Load and Finding In-Domain Assistant Neighbors

Step 6: After receiving the information of all in-domain neighbors, each node $n_i$ computes the average load of the domain in which it is located. The average in-domain load is defined as $L_{avg} = \frac{\sum_{i \in info} |Cf_i|}{|info|}$.

Step 7: According to the set info of node $n_i$ and the average in-domain load, node $n_i$ defines a set of assistant neighbors $N_{lower}$ whose loads are less than the average in-domain load.

The transferring strategy

Step 8: The decision whether to call the procedure LB to migrate the excess file chunks depends on the difference between the current load of node $n_i$ and the average in-domain load. The excess chunks are migrated only if this load difference is positive. We will show later that the local domain rapidly converges to a state where $L_i - L_{avg} = 0$ for all edges. The pseudo-code of the procedure Load-Balance is given in Procedure LB.
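Steps 6 through 8 can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names and the dictionary representation of the set info (node id mapped to load, including the node itself) are assumptions, and the example loads are hypothetical.

```python
def average_in_domain_load(info):
    """Step 6: mean load over all entries of `info`
    (the node itself plus its in-domain neighbors)."""
    return sum(info.values()) / len(info)


def assistant_neighbors(info, self_id, l_avg):
    """Step 7: in-domain neighbors whose load is below the average."""
    return {node: load for node, load in info.items()
            if node != self_id and load < l_avg}


def load_difference(info, self_id, l_avg):
    """Step 8: positive when the node holds excess chunks."""
    return info[self_id] - l_avg


# Hypothetical domain of three chunkservers, seen from node "n1".
info = {"n1": 30, "n3": 12, "n5": 6}
l_avg = average_in_domain_load(info)
n_lower = assistant_neighbors(info, "n1", l_avg)
ld = load_difference(info, "n1", l_avg)
print(l_avg, n_lower, ld)
```

In this example only "n1" has a positive load difference, so it is the only node that would call the migration procedure.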
Algorithm 1. NeighborLB
  $n_i$: the node where the algorithm is executed.
  $Adj(n_i) = \{n_j\}$: the set of in-domain neighbors.
  $Cf_i = \{f_1c_1, \ldots, f_mc_s\}$: the set of file chunks hosted by node $n_i$.
  Begin
  1. Let $info = \{\langle n_i, L_i \rangle\}$
  2. Let $mig(n_j) = 0$ for all $n_j \in Adj(n_i)$
  3. Compute the initial load: $L_i = |Cf_i|$
  4. For each node $n_j \in Adj(n_i)$ do send message $\langle n_i, L_i, \text{"B"}, 0 \rangle$
  5. Read messages from the message queue
     a. if T = "B" then $info = info \cup \{\langle n_f, L_f \rangle\}$
     b. if T = "G" then $info = info \cup \{\langle n_f, L_f - g \rangle, \langle n_i, L_i + g \rangle\}$
        For each node $n_j \in Adj(n_i)$ do send message $\langle n_i, L_i + g, \text{"B"}, 0 \rangle$
  6. Compute the average in-domain load $L_{avg} = \frac{\sum_{n_j \in info} L_j}{|info|}$
  7. Define the set of assistant neighbors:
     For each node $n_j \in Adj(n_i)$ do if $L_j < L_{avg}$ then $N_{lower} = N_{lower} \cup \{n_j\}$
  8. Let load-difference $LD = (L_i - L_{avg})$
  9. If $LD \le 0$ then exit; else $LB(Cf_i, N_{lower}, LD)$
  End

4.4. Load-Balancing Mechanism

The input parameters of the procedure LB are the load difference LD, the set of in-domain assistant neighbors and the set of hosted file chunks. In this step, all overloaded nodes call the procedure LB to migrate their excess file chunks to the under-loaded nodes. Each overloaded node sorts its set of assistant neighbors in ascending order of load. The amount of file chunks migrated from the overloaded node to an assistant neighbor is either the load difference between the overloaded node and the average in-domain load, or the difference between the average load and the load of the assistant neighbor. Spreading the excess over the assistant neighbors in this way ensures that a node that receives file chunks does not itself become overloaded. Figure 2 shows an example of the load-balancing mechanism.

Procedure LB($Cf_i$, $LD_i$, $N_{lower}$)
  Begin
  1. Sort the nodes in $N_{lower}$ in ascending order
  2. for nodes in $N_{lower}$
       compute $\delta_j = L_{avg} - L_j$
       if $LD_i \le \delta_j$ then send message $\langle n_i, L_i, \text{"G"}, LD_i \rangle$
       else if $LD_i > \delta_j$ then send message $\langle n_i, L_i, \text{"G"}, \delta_j \rangle$
       $LD_i = LD_i - \delta_j$
     Endfor
  End
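The migration rule of Procedure LB can be sketched in Python as follows. This is an illustrative reading of the procedure rather than the authors' code: the function name `plan_migrations`, the return value (a list of planned transfers instead of sent "G" messages) and the early exit once the excess is used up are assumptions.

```python
def plan_migrations(load_diff, n_lower, l_avg):
    """Plan transfers from one overloaded node to its assistant neighbors.

    `load_diff` is the node's excess over the in-domain average,
    `n_lower` maps assistant neighbor id -> current load (all below l_avg).
    Returns a list of (neighbor_id, amount) pairs."""
    plan = []
    remaining = load_diff
    # Serve the least-loaded assistant neighbors first (ascending order).
    for node_id, load in sorted(n_lower.items(), key=lambda item: item[1]):
        if remaining <= 0:
            break  # assumption: stop once the excess has been handed out
        deficit = l_avg - load          # how much the neighbor can absorb
        amount = min(remaining, deficit)  # never push a neighbor above average
        plan.append((node_id, amount))
        remaining -= amount
    return plan


# Hypothetical domain: node "n1" holds 30 chunks, the in-domain average is 16.
plan = plan_migrations(14, {"n3": 12, "n5": 6}, 16)
print(plan)
```

After applying the plan, all three nodes of the example hold 16 chunks, which is the local fairness state of Definition 2.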
Figure 2. Load-balancing example

5. ALGORITHM ANALYSIS

5.1. Time Complexity Analysis

Most of the operations in the proposed algorithm take $O(1)$ time. The broadcasting operations have time complexity $O(d_i)$, where $d_i = |Adj(n_i)|$ is the number of in-domain neighbors of node $n_i$. Because our approach improves the performance of the load-balancing algorithm by considering both the algorithm parameters and the virtual structure of the system that executes the algorithm, $d_i$ is decreased for each node. This reduces the communication delay and the amount of out-of-date information. Also, if the "info" and "to.mig" objects are implemented as arrays, then Steps 1 and 2 have time complexity $O(d_i)$ and each individual update takes $O(1)$ time. Moreover, the sorting step of the LB procedure has a worst-case time complexity of $O(c \log c)$, where $c$ is the number of in-domain assistant neighbors, assuming that merge sort is used. The for-loop takes only $O(c)$, since each entry in the sorted set is referenced only once. So the NeighborLB algorithm runs in $O(c \log c)$ time.

5.2. The Convergence

In this section we prove that the NeighborLB algorithm converges to the state of fairness given sufficient time.

Lemma 1. Let $L^t = (l_1^t, l_2^t, \ldots, l_n^t)$ be the array of loads of the in-domain nodes at time $t$, where $l_1^t$ is the load of node 1 at time $t$. At time $t$, if there is at least one overloaded node (i.e. $LD > 0$), then $L^{t+1}$ is lexicographically greater than $L^t$, meaning that the lightly loaded nodes at time $t$ will receive load at time $t+1$.

Proof: Let $X^t$ be the set of overloaded nodes in the domain (i.e. nodes with $LD > 0$) that need to migrate some load to other nodes at time $t$. In reality, a node $i \in X^t$ will also host additional loads at time $t$. Thus, the nodes that migrate load at time $t$ will reduce their load at time $t+1$. Consider the node that has the lowest load at time $t+1$, and assume that it occupies the $k$-th position of the array $L^{t+1}$, where $L^{t+1}$ is likewise the array of loads of the in-domain nodes at time $t+1$ sorted in ascending order. Let $Q^t = \{l_1^t, l_2^t, \ldots, l_{k-1}^t\}$ be the array of the loads in the first $k-1$ positions of $L^t$. To prove this lemma we have to consider two cases:

Case 1: The set $Q^{t+1}$ contains a node $i$ that received load at time $t$. Node $i$ thus belongs to both $Q^t$ and $Q^{t+1}$, and its load value is increased at time $t+1$ since it has received some migrated load. Therefore, there is at least one load value in $Q^{t+1}$ strictly greater than one value in $Q^t$. Accordingly, $L^{t+1}$ is lexicographically greater than $L^t$.

Case 2: There are nodes located in $Q^{t+1}$ that did not migrate or receive load at time $t$. In this case, the load value at the $k$-th position at time $t+1$, which has received load from $X^t$, is strictly greater than the load value in the same position at time $t$, and therefore $L^{t+1}$ is lexicographically greater than $L^t$.

Theorem 1 (Convergence). In a heterogeneous system, if each domain of nodes executes the NeighborLB algorithm, then the system converges to a balanced state.

Proof: Consider a heterogeneous unbalanced system. Assume that the going vertical principle is first applied to this system to partition the nodes into several domains, and that some of these domains are unbalanced. Each domain separately executes the NeighborLB algorithm in order to reach the converged state. Consider one domain $N$. Let $i$ be the most heavily loaded node in that domain, i.e. $l_i - l_{avg} > 0$, and let all other in-domain nodes $j \in N$ with $l_j < l_i$ and $l_j < l_{avg}$ form the set of assistant neighbors of node $i$. Then $l_i > l_{avg}$ and thus $l_i - l_{avg} > 0$. The result of Lemma 1 can therefore be used, which guarantees that the array of loads sorted in ascending order at the next time step is lexicographically greater than the array at the current step. Suppose that the NeighborLB algorithm is executed in some domains at a given time $t$. Let $S \subseteq N$ be the domain of nodes that executed the algorithm at time $t$.
Let $L^t$ be the array of loads in one domain sorted in ascending order at time $t$. It has been proven in Lemma 1 that $L^{t+1}$ is lexicographically greater than $L^t$. Let $S_{min}$ be the set of lightly loaded nodes at time $t$. There exists at least one node $v \in S_{min}$ that is an in-domain neighbor of a node $k$ such that $l_k^t > l_v^t$. Using the proposed algorithm, node $k$ migrates a portion of its excess load to node $v$, but $v$ does not migrate any load at time $t$ because $v$ is under-loaded compared to the average load of the domain. At time $t+1$ the effective load of node $k$ decreases; however, its load value never becomes less than the load value of node $v$, as given by $l_k^{t+1} \ge l_{avg} > l_v^t$. Thus, $L^{t+1}$ is lexicographically greater than $L^t$, meaning that the sorted array of the load values of the nodes at time $t+1$ is lexicographically greater than the sorted array of the load values of the nodes at time $t$.

6. SIMULATIONS

The performance of our proposed algorithm was examined through a computer simulation implemented with CloudSim [12,13] and was compared to the original diffusion neighborhood method and the centralized approach. The tests of our proposed method were based on two parameters: the number of file chunks and the number of storage nodes. Performance was measured with two metrics: throughput and response time. Only one parameter was changed at a time, so that any change in performance could be attributed solely to that parameter. Two tests were run for each parameter to allow a rough average and standard deviation to be obtained. The results of these tests were used to study: (1) the behavior of different load-balancing algorithms under the same conditions; (2) the behavior of the algorithms for random systems with different numbers of storage nodes; (3) the behavior of the algorithms for different load distributions.

6.1. Changing the Number of File Chunks

To study the effect of changing the number of file chunks on the average response time and the throughput, the number of file chunks was varied from 1,000 to 10,000, and the chunks were distributed among the storage nodes in the following manner:

• Initial load distributions varying 25% from the in-domain average load, to represent a situation where all nodes have similar loads at the beginning and those loads are close to the in-domain average load; in other words, the initial situation is quite balanced.
• Initial load distributions varying 50% from the average load, to constitute the intermediate situations.
• Initial load distributions varying 75% from the average load, to constitute the higher intermediate situations.
• Initial load distributions varying 100% from the average load, to constitute the advanced unbalanced situations.

6.1.1. Average Response Time

The total time taken by all three algorithms increased as the number of file chunks increased, as shown in Figure 3. This is expected: the more files there are to store, the longer it takes to complete the storing tasks. However, our proposed method performed better than the centralized scheme and the original diffusion algorithm. In addition, when comparing the results of our method and the centralized algorithm, the gap between the two curves widened as the assigned load increased. This shows that our method reduced the completion time by a considerable amount (greater speedup) in comparison to the centralized algorithm as the amount of load increased.

Figure 3. Number of File Chunks vs. Response Time
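One possible way to generate the initial load distributions listed in Section 6.1 is sketched below. The paper does not specify how the distributions were drawn, so everything here is an assumption: the function name `initial_loads`, the use of a uniform distribution within the stated percentage of the average, the rounding to whole chunks and the fixed seed.

```python
import random


def initial_loads(num_nodes, avg_load, spread, seed=0):
    """Draw per-node chunk counts within +/- `spread` (a fraction such as
    0.25) of `avg_load`, assuming a uniform distribution."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    low = avg_load * (1 - spread)
    high = avg_load * (1 + spread)
    return [round(rng.uniform(low, high)) for _ in range(num_nodes)]


# Hypothetical domain of 10 nodes with an average of 100 chunks per node.
quite_balanced = initial_loads(10, 100, 0.25)
highly_unbalanced = initial_loads(10, 100, 1.00)
print(quite_balanced)
print(highly_unbalanced)
```

With a spread of 0.25 every node starts within 25 chunks of the average, while a spread of 1.00 allows nodes that start empty or with twice the average load.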
6.1.2. Throughput

As shown in Figure 4, our method outperformed the original diffusion neighborhood algorithm in terms of system throughput in all load distribution cases. The throughput using our method was in the range of 89-98 percent, while the nearest neighborhood algorithm only had a utilization of 80-94 percent.

Figure 4. Number of File Chunks vs. Throughput

6.2. Varying the Number of Storage Nodes

To study the effect of changing the number of storage nodes on the average response time and the throughput, the number of nodes was varied from 10 to 100, and the overloaded nodes were distributed in the following manner:

• 25% of storage nodes are idle, 75% of storage nodes are overloaded.
• 50% of storage nodes are idle, 50% of storage nodes are overloaded.
• 75% of storage nodes are idle, 25% of storage nodes are overloaded.

6.2.1. Response Time

Figure 5 shows that the response time improved when the number of nodes was increased. However, this improvement was mainly caused by the fact that more nodes were used for a larger domain. Therefore, even though there was more load to be scheduled in each round, the extra load was easily handled by the additional nodes.
Figure 5. Number of Nodes vs. Response Time

6.2.2. Throughput

Figure 6 shows that the throughput of the original neighborhood algorithm decreased as the number of nodes in the system increased. In our proposed method, however, the nodes are divided into several domains, which keeps the number of nodes in each domain reasonable. This shows that load balancing is harder when more tasks are to be balanced across a larger system.

Figure 6. Number of Nodes vs. Throughput

6.3. Discussion

This section summarizes the performance of the proposed solution compared to the original diffusion method and the centralized scheme (see Table 1 below). Each test used the response time and the throughput as performance measures. The performance of these methods was compared in many cases by changing the parameters of the algorithm. The parameters varied (one at a time) were the assigned load and the number of nodes. Our proposed method performed better than the other approaches.

The number of nodes, the network diameter and the communication delay affect the convergence rate of any load-balancing algorithm as well as its performance. Intuitively, a graph or system with a longer diameter takes longer to converge, since the number of iterations needed to propagate the load to all assistant neighbors is proportional to the network diameter. In addition, longer communication delays lead to out-of-date information. Our proposed method considers both the structure of the network that executes the algorithm and the algorithm parameters. It works, first, by simplifying the structure of the system. On the one hand, this decreases the communication overhead between the in-domain neighbors, which leads to a faster response time; on the other hand, it decreases the time needed to choose the assistant neighbors and the target node that will receive the migrated load. This effect appears clearly when the assigned load and the number of nodes increase. Moreover, reducing the communication delay improves the load evaluation, since the effect of out-of-date information is decreased. Also, considering the processing speed, and thus the processing capacity, of each node leads to a more accurate evaluation of the average load, which improves the algorithm's performance.

The importance of the average load appears when deciding the amount of load to be migrated. If the load migrated to one node is too small, the load distribution takes longer and the convergence rate drops. In contrast, if the load migrated to one node is too large, the overloaded node may transfer too much load to that neighbor and will not have sufficient load left to transfer to the remaining neighbors. Thus, by using the in-domain average load, each node obtains an amount of load proportional to its capacity, and no node is privileged. This indicates reliable performance of the method as the assigned load increases, which is very valuable from a practical point of view.

Table 1. A comparison between our proposed solution and both the centralized approach and the diffusion method

• Centralized. Pros: simple design, simple implementation, good performance. Cons: bottleneck problem, possibility of failure.
• Original Diffusion. Pros: good performance; solves the bottleneck and failure problems of the centralized approach. Cons: communication overhead, low convergence rate and low performance for large heterogeneous systems.
• Our Method (diffusion + applying the GV principle). Pros: good performance even with large systems; simplifies the network structure; solves the cons of the original diffusion method and the centralized approach. Cons: needs good skills to define the properties of each node.

7. CONCLUSION

This paper considered the load-balancing mechanism in cloud storage systems. Since the convergence rate of any load-balancing algorithm, as well as its performance, deteriorates as the number of nodes in the system, the diameter of the network and the communication overhead increase, our proposal, which employs the going vertical principle, has been very effective, especially for a large number of nodes and dense loads. In fact, a scheme based on the going vertical principle works better when the number of nodes is large, since the key idea of the proposed method is that communication occurs only between in-domain nodes. This reduces the impact of communication delay on the freshness of the load information, which in turn allows the method to handle all load-balancing information, and thus all load-balancing decisions, with minimal inter-node communication.
In other words, we aimed at not only considering the parameters that affect the algorithm's performance but also simplifying the structure of the network that executes the algorithm. Finally, we proved that the proposed algorithm under this approach converges to the state of equilibrium where the load in all in-domain nodes is the same, since each node receives an amount of load proportional to its capacity. Therefore, we conclude that this approach has the advantage of being fair and simple, with no node privileged.

REFERENCES

[1] H.-C. Hsiao, H.-Y. Chung, H. Shen and Y.-C. Chao, "Load Rebalancing for Distributed File Systems in Clouds," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 5, pp. 951-962, 2013.
[2] E. Y. Daraghmi and S. M. Yuan, "In-domain neighborhood approach to heterogeneous dynamic load balancing in real world network," in 14th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'13), Taipei, Taiwan, 2013.
[3] C. P. A. a. P. Berenbrink, "Distributed selfish load balancing with weights and speeds," in The 2012 ACM Symposium on Principles of Distributed Computing, New York, USA, 2012.
[4] J. Bahi, R. Couturier and F. Vernier, "Synchronous Distributed Load Balancing on Totally Dynamic Networks," in Parallel and Distributed Processing Symposium, 2007.
[5] E. Luque, A. Ripoll and A. C. a. T. Margalef, "A distributed diffusion method for dynamic load balancing on parallel computers," in Euromicro Workshop on Parallel and Distributed Processing, 1995.
[6] P. Neelakantan, "Decentralized Load Balancing In Heterogeneous Systems Using Diffusion Approach," International Journal of Distributed and Parallel Systems (IJDPS), vol. 3, no. 1, pp. 229-239, 2012.
[7] C.-C. Hui and S. Chanson, "A hydro-dynamic approach to heterogeneous dynamic load balancing in a network of computers," in Proceedings of the 1996 International Conference on Parallel Processing Software, 1996.
[8] G. Cybenko, "Dynamic load balancing for distributed memory multiprocessors," Journal of Parallel and Distributed Computing, vol. 7, no. 2, pp. 279-301, 1989.
[9] J. E. Boillat, "Load balancing and Poisson equation in a graph," Concurrency: Practice and Experience, vol. 2, no. 4, pp. 289-313, 1990.
[10] "Google File System," [Online]. Available: http://en.wikipedia.org/wiki/Google_File_System.
[11] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. D. Rose and R. Buyya, "CloudSim: a toolkit for modeling and simulation of cloud," Software: Practice and Experience, vol. 41, no. 1, pp. 23-50, 2010.
[12] R. M. I. Stoica, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek and H. Balakrishnan, "Chord: a Scalable Peer-to-Peer Lookup Protocol for Internet Applications," in Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols, San Diego, California, USA, 2001.