Erasure Coding

A GUIDE TO RAID PT. 4: RAID VS. ERASURE CODING.

WHAT’S THE DIFFERENCE?

Erasure coding and RAID are both data protection technologies that are
used to improve the reliability and performance of storage systems.
While erasure coding is sometimes seen as a more advanced
technology than RAID, it is important to understand that they are
actually two sides of the same coin.

In this blog post, we will take a closer look at erasure coding and RAID, and
we will show you how they are related.

What is Erasure Coding?


Erasure coding is a data protection technique that breaks data into fragments
and then encodes the fragments with redundant information. This redundancy
allows the data to be reconstructed even if some of the fragments are lost or
damaged.

The best-known erasure codes date back to 1960, when Irving Reed and Gustave
Solomon introduced a new type of error correction code, now called the
Reed-Solomon code. Reed-Solomon codes are widely used in a variety of
applications, including distributed storage systems, communication systems,
and aerospace systems.

Erasure coding is often used in storage systems to protect against disk
failures. For example, a system with four disks might use erasure coding to
break the data into 3 data fragments, compute 1 redundant fragment from them,
and store one fragment on each disk. This way, if one disk fails, the data can
still be reconstructed from the fragments stored on the other disks.
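
The idea can be sketched in a few lines of Python. This is a minimal illustration, assuming the simplest possible code: the data is split into k fragments and a single XOR parity fragment is added, with one fragment per disk. The function names `split_with_parity` and `rebuild` are hypothetical and not taken from any particular product.

```python
def split_with_parity(data, k):
    """Split data into k equal-size fragments plus one XOR parity fragment."""
    size = -(-len(data) // k)              # ceiling division
    padded = data.ljust(size * k, b"\0")   # zero-pad so fragments are equal
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytearray(size)
    for frag in fragments:
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return fragments + [bytes(parity)]


def rebuild(fragments, missing):
    """Recompute the fragment at index `missing` by XOR-ing all the others."""
    size = len(next(f for f in fragments if f is not None))
    out = bytearray(size)
    for idx, frag in enumerate(fragments):
        if idx != missing:
            for i, byte in enumerate(frag):
                out[i] ^= byte
    return bytes(out)


# Four "disks": three data fragments and one parity fragment.
stored = split_with_parity(b"erasure coding demo", 3)
damaged = list(stored)
damaged[1] = None                          # simulate losing disk 1
assert rebuild(damaged, 1) == stored[1]
```

A single parity fragment can only repair one loss at a time; surviving more simultaneous failures requires additional redundant fragments.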

There are many different erasure coding schemes, each with its own
advantages and disadvantages. Some of the most common schemes include:

• Reed-Solomon coding: This is a widely used and efficient erasure coding
  scheme that is well-suited for a wide range of applications.
• Low-density parity-check (LDPC) codes: These codes are more
  complex than Reed-Solomon codes, but they can offer better
  performance in some cases.
• Turbo codes: These codes are more complex again, but they can offer
  strong performance in terms of data protection and efficiency.

The choice of erasure coding scheme depends on the specific application.
For example, a system that needs to protect against a small number of disk
failures might use a well-established scheme like Reed-Solomon coding. A
system that needs to protect against a large number of disk failures might
consider a more complex scheme like LDPC or Turbo codes.

Comparing Erasure Coding to RAID


RAID, a technology that originated in the 1980s, has essentially evolved into
a form of erasure code technology in a broad sense. In simpler terms, RAID
can be thought of as an example of erasure coding because it employs a
specific kind of erasure code.

For example, take parity coding. Single parity is a straightforward erasure
code that can handle the failure of one drive. For instance, in a RAID 5 setup
over 4 drives, each stripe holds 3 data blocks and 1 parity block, and the
position of the parity block rotates from drive to drive across stripes. If
one drive fails, the lost data can be reconstructed from the remaining 3
drives using the parity information. Other RAID configurations, such as RAID 6
and RAID 7.3, use additional parity to survive more than one drive failure,
and they also fall under the category of erasure coding.
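
RAID 5 implementations usually rotate the parity block from stripe to stripe rather than pinning it to a single drive. The sketch below shows one possible rotation for a 4-drive array. The rotation rule and the function name `raid5_layout` are assumptions made for illustration; real implementations use several different layouts.

```python
def raid5_layout(num_drives, num_stripes):
    """Return one row per stripe; each row lists what every drive holds."""
    layout = []
    block = 0  # running index of data blocks
    for stripe in range(num_stripes):
        # Assumed rotation: parity starts on the last drive and moves left.
        parity_drive = (num_drives - 1 - stripe) % num_drives
        row = []
        for drive in range(num_drives):
            if drive == parity_drive:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        layout.append(row)
    return layout


# Print the placement for 4 drives and 4 stripes, one stripe per line.
for row in raid5_layout(4, 4):
    print(" ".join(f"{cell:>3}" for cell in row))
```

Every stripe still contains exactly one parity block, so any single drive can be rebuilt from the other three, while parity writes are spread over all drives.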

Erasure coding, in its narrower sense, has seen widespread use over the past
two decades, driven by the growth of cloud computing and distributed storage
systems. It began to be used to optimize storage in the early 2000s. One of
the earliest commercial storage products to adopt erasure coding was Amazon
S3, the cloud storage service introduced in 2006.

However, the fundamental principles of erasure coding are quite similar to
those used in advanced RAID setups like RAID N+M. For instance, RAID N+M
divides data blocks across N drives while also distributing M parity blocks.
Typically, M is 32 or fewer, but theoretically, this number can be unlimited.
This approach allows users to independently choose the number of drives
dedicated to storing checksums, which enables the system to recover data
even in the event of a significant failure involving up to M drives, depending
on the specific checksum distribution strategy.
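
To make the N+M idea concrete, here is a minimal Reed-Solomon-style sketch in Python. It assumes a simplified setting that is not taken from any RAID product: symbols are integers in the prime field modulo 257 (production codes typically use GF(2^8) so that every symbol fits in one byte), and the names `encode` and `recover` are hypothetical. The N data symbols define a polynomial, the M parity symbols are extra evaluations of it, and any N surviving symbols are enough to interpolate the data back.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, m):
    """Return the N data symbols followed by M parity symbols."""
    n = len(data)
    points = list(enumerate(data))
    return list(data) + [_interpolate(points, x) for x in range(n, n + m)]


def recover(shares, n):
    """Rebuild the N data symbols from any N surviving shares.

    `shares` maps a position (0 .. N+M-1) to the symbol stored there.
    """
    if len(shares) < n:
        raise ValueError("too many symbols lost to recover the data")
    points = list(shares.items())[:n]
    return [_interpolate(points, x) for x in range(n)]


# N = 4 data symbols, M = 2 parity symbols; lose any two positions.
data = [10, 20, 30, 40]
stored = encode(data, 2)
surviving = {i: s for i, s in enumerate(stored) if i not in (1, 4)}
assert recover(surviving, 4) == data
```

With N = 4 and M = 2, two thirds of the raw capacity holds data, and the array tolerates the loss of any two drives. Raising M trades capacity for fault tolerance.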

As a result, the distinction between erasure coding and RAID turns out to be
quite nuanced. Advanced RAID configurations rest on the same mathematical
principles as erasure coding, which shows how closely the two technologies
are connected. Fundamentally, both erasure coding and RAID are expressions of
the same approach to data protection and storage efficiency.

Erasure Coding or RAID: What to Choose?


Concluding our exploration of erasure coding versus RAID, we acknowledge
that RAID can be viewed as a form of erasure coding technology. This prompts
a vital question: how to choose the right RAID configuration? We've
dedicated a separate blog post to this topic, offering a comprehensive
breakdown of available RAID setups, their advantages, and practical
guidance to align them with your needs.

If you are using a RAID system, you can improve its performance by using
a software RAID engine, such as xiRAID. xiRAID is a universal tool
compatible with all RAID levels, and it can significantly improve the
performance of your RAID system.
