Demand-Aware Erasure Coding For Distributed Storage Systems
ABSTRACT:
However, with erasure coding, the overhead of reconstructing data after failures
also increases significantly. Under ever-changing workloads in which data
accesses can be highly skewed, it is challenging to deploy erasure coding with
parameter values that achieve a good trade-off between storage
overhead and reconstruction overhead.
In this paper, we propose Zebra, a framework that encodes data according to its demand
into multiple tiers, each deploying an erasure code with different parameter values.
Zebra automatically determines the number of such tiers and dynamically assigns
erasure codes with optimal parameter values to the corresponding tiers.
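
As a rough illustration of the trade-off described above, the sketch below compares Reed-Solomon codes with different parameters. It assumes an RS(k, r) code with k data blocks and r parity blocks, where the storage overhead is (k + r) / k and rebuilding one lost block reads k surviving blocks. The function names and the sample parameters are illustrative and are not taken from the paper.

```python
def storage_overhead(k: int, r: int) -> float:
    """Stored size divided by original size for an RS(k, r) code."""
    return (k + r) / k


def reconstruction_reads(k: int) -> int:
    """Number of surviving blocks read to rebuild one lost block."""
    return k


if __name__ == "__main__":
    # Hypothetical parameter choices: same parity count, growing k.
    for k, r in [(4, 2), (8, 2), (12, 2)]:
        print(
            f"RS({k},{r}): storage overhead {storage_overhead(k, r):.3f}x, "
            f"reads {reconstruction_reads(k)} blocks per lost block"
        )
```

With the parity count fixed, a larger k lowers the storage overhead but raises the amount of data read during reconstruction, which is the tension that a tiered design tries to balance.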
EXISTING SYSTEM:
Unlike the above works, which evaluate the reconstruction overhead
in terms of network traffic or disk I/O, other works create a tree-structured topology
that routes the traffic along the edges of the tree and alleviates the
bottleneck of sending data from existing servers to the replacement server.
The purpose of such works is instead to reduce the reconstruction time. Such
works can be applied to our framework without affecting the network overhead
during reconstruction, and thus we focus only on network overhead in this paper.
PROPOSED SYSTEM:
This scheme is similar to a previously proposed method that implements two tiers
with two other preconfigured erasure codes.
In our simulation, both tiers deploy RS codes or local reconstruction
codes for the purpose of fair comparison. For convenience, we call the two tiers
the hot tier and the cold tier: the hot tier stores hot data with low reconstruction
overhead, while the cold tier provides low storage overhead for cold data.
In this scheme, under the constraint on the overall storage overhead, we try to
assign as much hot data as possible to the hot tier and store the rest in the cold
tier.
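
The two-tier assignment described above can be sketched as a simple greedy procedure. This is a minimal sketch under stated assumptions, not the simulation code itself: it assumes the hot tier has a higher storage overhead than the cold tier, that "hot" is measured by a per-item demand value, and that items are considered in decreasing order of demand and skipped when they no longer fit. The `Item` type, the function name `assign_two_tiers`, and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    size: float    # size of the original data
    demand: float  # access demand (hypothetical hotness metric)


def assign_two_tiers(items, hot_overhead, cold_overhead, budget):
    """Greedily place the hottest data in the hot tier.

    budget is the allowed overall storage overhead
    (total stored size divided by total original size).
    Returns two lists of item names: (hot, cold).
    """
    total = sum(it.size for it in items)
    limit = budget * total
    # Start with everything in the cold tier; that alone must fit.
    used = cold_overhead * total
    if used > limit:
        raise ValueError("budget is below the cold tier's storage overhead")

    extra = hot_overhead - cold_overhead  # extra storage per unit moved to hot
    hot, cold = [], []
    for it in sorted(items, key=lambda it: it.demand, reverse=True):
        cost = extra * it.size
        if used + cost <= limit:
            hot.append(it.name)
            used += cost
        else:
            cold.append(it.name)
    return hot, cold


if __name__ == "__main__":
    # Illustrative values: hot tier at 1.5x (e.g. RS(4,2)),
    # cold tier at 1.25x (e.g. RS(8,2)), overall budget 1.375x.
    items = [Item("a", 1, 10), Item("b", 1, 5), Item("c", 2, 1)]
    print(assign_two_tiers(items, 1.5, 1.25, 1.375))
```

Starting from an all-cold placement makes the budget check a single running total: each item moved to the hot tier costs only the difference between the two tiers' storage overheads times its size.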