Erasure Coding (EC) is a data protection mechanism that guards against data loss by breaking data items, such as files, into fragments, calculating additional pieces of redundancy (parity information), and storing all of them across a set of independent locations or storage media.
Traditional methods like replication have been the go-to solution for protecting against data loss or corruption for decades. In recent years, however, a more efficient and resource-friendly technique has become increasingly prevalent: erasure coding. This innovative approach not only ensures data integrity but also optimizes storage capacity and reduces the chances of catastrophic data loss. Let us delve into the elements of erasure coding, exploring what it is and how it revolutionizes the way we protect our digital assets.
What is Erasure Coding?
Like many of the more commonly known technologies, such as RAID or replication / mirroring, erasure coding is a method of data protection. Erasure coding is a class of high-performance Forward Error Correction (FEC). A simplified explanation would say that it breaks down data into smaller pieces, does some mathematical magic, and writes the pieces to different disks. Doesn’t sound too complicated, does it?
What that really means is slightly more involved though. Erasure coding schemes break down pieces of information, such as files, into fragments (sometimes also called chunks), enrich them with redundancy information (additional fragments computed from the originals through mathematical equations), and eventually distribute everything across multiple storage nodes and disks.
Unlike traditional replication, which duplicates the entire data, erasure coding allows for more efficient storage utilization. This method employs advanced mathematical algorithms to create parity fragments, which can later be used to reconstruct the original data even if some fragments are lost or corrupted.
The Core Principles of Erasure Coding
While it may sound as if erasure coding is the new kid on the block, it was actually invented in 1960 by Irving Reed and Gustave Solomon. Together they created a new encoding mechanism, known as the Reed-Solomon code. Today, this algorithm is widely used in a large variety of systems, including distributed storage solutions, communication services, and aerospace systems.
These days, while there are many more erasure coding schemes, the three most common ones are:
The Reed-Solomon code, which is simple, efficient, and applicable to a wide range of use cases, and is very common in consumer storage media such as DVD and Blu-ray discs.
The low-density parity-check code (LDPC, also known as the Gallager code), which is more complex but shows better performance in certain use cases, such as 10GBASE-T (10 Gbit/s Ethernet).
The turbo codes, originally invented by Claude Berrou in 1991, which are more complex than LDPC but provide the best ratio of data protection to efficiency, and are widely used in mobile communication technologies such as UMTS and LTE.
Anyhow, all the different implementations share a common set of core principles. Storage solutions that utilize erasure coding for data protection most commonly use either a Reed-Solomon or an LDPC algorithm.
Data Fragmentation
Erasure coding begins by breaking down the original data into smaller fragments. These fragments are the building blocks that will be distributed across the storage nodes. The size and number of these fragments depend on the specific erasure coding scheme being used.
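To make this step concrete, here is a minimal Python sketch of fragmentation. The value of K, the zero-padding of the last fragment, and the example data are purely illustrative assumptions and not tied to any specific erasure coding scheme or product.

```python
K = 3  # number of data fragments (illustrative choice)

def fragment(data: bytes, k: int) -> list[bytes]:
    """Split data into k equally sized fragments, zero-padding the tail."""
    fragment_size = -(-len(data) // k)                     # ceiling division
    padded = data.ljust(fragment_size * k, b"\0")          # pad so it divides evenly
    return [padded[i * fragment_size:(i + 1) * fragment_size] for i in range(k)]

fragments = fragment(b"hello erasure coding", K)
print(fragments)  # three 7-byte fragments
```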
Parity Creation
Parity fragments (sometimes called coding chunks) are generated using mathematical functions that operate on the original data fragments. These parity fragments are calculated in such a way that a sufficiently large subset of the data and parity fragments, in most schemes any subset as large as the original set of data fragments, can be used to reconstruct the original data. This redundancy is the key to erasure coding's ability to tolerate the loss of individual fragments without actual data loss.
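Building on the fragmentation sketch above, the following is a deliberately simplified example of parity creation using plain XOR, which corresponds to a single parity fragment (M=1). Production schemes such as Reed-Solomon compute several independent parity fragments using Galois-field arithmetic, but the principle of deriving redundancy from the data fragments is the same.

```python
def xor_parity(fragments: list[bytes]) -> bytes:
    """XOR all fragments byte by byte into a single parity fragment."""
    parity = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return bytes(parity)

parity_fragment = xor_parity(fragments)   # 'fragments' from the sketch above
```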
Distribution Across Nodes
Once the data and parity fragments are created, they are distributed across different storage nodes and disks. This distribution ensures that a failure in one node does not result in the loss of the entire dataset. Each node stores a unique combination of data and parity fragments.
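Continuing the simplified sketch, placement can be as simple as assigning each fragment to a different node, so that no single failure removes more than one piece. The node names below are made up for illustration.

```python
nodes = ["node-1", "node-2", "node-3", "node-4"]          # hypothetical node names
all_fragments = fragments + [parity_fragment]             # K data + 1 parity fragment

# One fragment per node: a single node failure can never remove more than one piece.
placement = {node: frag for node, frag in zip(nodes, all_fragments)}
for node, frag in placement.items():
    print(f"{node} stores {len(frag)} bytes")
```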
Reconstruction Mechanism
In the event of a node failure or data loss, the erasure coding system can reconstruct the missing or corrupted fragments using the available fragments stored on other nodes. The mathematical relationships established during the parity creation phase facilitate this reconstruction process.
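In the simplified XOR sketch, reconstruction is nothing more than XOR-ing all surviving fragments, as shown below. Schemes like Reed-Solomon generalize this idea so that any K of the K+M fragments are enough to rebuild the original data.

```python
def recover_missing(surviving: list[bytes]) -> bytes:
    """In the XOR scheme, the lost fragment equals the XOR of all surviving ones."""
    return xor_parity(surviving)

lost_index = 1                                             # pretend node-2 failed
survivors = [f for i, f in enumerate(all_fragments) if i != lost_index]
rebuilt = recover_missing(survivors)
assert rebuilt == fragments[lost_index]                    # the lost data fragment is back
```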
Erasure Coding Profile
Common to all erasure coding algorithms are two specific numbers, called K and M. K defines the number of fragments the original piece of information is split into, meaning that K=3 says to split the original object, say a file, into three fragments. M, on the other hand, defines how many parity fragments are created and distributed. M=2 means that the parity information is stored on two different systems. In a configuration of K=3, M=2, a storage cluster would need five servers to store the data fragments and parity fragments.
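As a worked example, the numbers for a K=3, M=2 profile look like this (a quick sketch, not tied to any particular implementation):

```python
K, M = 3, 2
total_fragments = K + M              # 5 fragments, ideally spread over 5 servers
tolerated_failures = M               # up to 2 lost fragments are survivable
storage_overhead = (K + M) / K       # 1.67x raw capacity per byte of user data
print(total_fragments, tolerated_failures, round(storage_overhead, 2))
```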
Advantages of Erasure Coding
Erasure coding provides a set of advantages over the more traditional data protection mechanisms, such as RAID or replication.
Optimized Storage Utilization
Erasure coding significantly reduces the amount of storage space required compared to traditional replication methods. While replication duplicates data in its entirety, erasure coding introduces redundancy at the fragment level, allowing for more efficient use of storage resources.
Editor's Note: If you want to know the erasure coding overhead, you can use our Erasure Coding Calculator.
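As a rough, purely illustrative back-of-the-envelope comparison, assuming 100 TB of user data and a hypothetical 4+2 profile:

```python
user_data_tb = 100
replication_raw_tb = user_data_tb * 3              # 3-way replication: 300 TB raw
ec_raw_tb = user_data_tb * (4 + 2) / 4             # 4+2 erasure coding: 150 TB raw
print(replication_raw_tb, ec_raw_tb)               # both tolerate two failures
```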
Fault Tolerance
The distributed nature of erasure coding ensures that the failure of a single storage node does not result in data loss. As long as the required number of fragments is available across the surviving nodes, the original data can be reconstructed. This fault tolerance is crucial for systems requiring high availability and reliability.
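The rule of thumb can be expressed in one line: the data remains recoverable as long as no more than M of the K+M fragments are lost. A tiny, illustrative helper:

```python
def is_recoverable(k: int, m: int, failed_fragments: int) -> bool:
    """Data survives as long as at least k of the k + m fragments remain readable."""
    return failed_fragments <= m

print(is_recoverable(3, 2, 2))   # True:  two losses are still tolerable
print(is_recoverable(3, 2, 3))   # False: a third loss makes the data unrecoverable
```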
Cost-Effective Scalability
Traditional replication can become prohibitively expensive as data volumes grow. Erasure coding provides a cost-effective alternative, allowing organizations to scale their storage infrastructure without a linear increase in costs.
Reduced Bandwidth Requirements
Transmitting and storing parity fragments instead of full data copies reduces the bandwidth and storage requirements. This is particularly advantageous in scenarios where network bandwidth or storage capacity is a limiting factor.
Use Cases of Erasure Coding
Erasure coding has a lot of use cases, not only in the storage ecosystem but also in the world of communication. UMTS and LTE, as well as certain satellite communication systems, use erasure coding schemes to implement forward error correction.
Anyhow, in terms of storage solutions, besides consumer storage media such as DVDs, there are three main types of storage systems that heavily benefit from erasure coding, both in terms of reliability and durability, as well as storage efficiency.
Cloud Storage
Erasure coding is widely adopted in cloud storage environments where cost efficiency and fault tolerance are paramount. Cloud storage solutions leverage erasure coding to ensure data durability and availability across many storage nodes or data centers.
Distributed File Systems
Systems like Hadoop Distributed File System (HDFS) and Ceph rely on erasure coding to provide fault tolerance and efficient storage utilization. Erasure coding enables these systems to handle large-scale data processing and storage requirements.
Object Storage
Object storage platforms, commonly used for archival and backup purposes, benefit from erasure coding to optimize storage space without compromising data integrity. This makes erasure coding an ideal choice for long-term data retention.
Challenges and Considerations of Erasure Coding
While erasure coding offers numerous advantages, it is important to remember that every technology comes with trade-offs. That said, erasure coding has some characteristics that need to be understood.
Computational Overhead
The encoding and decoding processes involve more or less complex mathematical calculations, which can introduce computational overhead. However, advancements in hardware and algorithm optimization have mitigated this concern to a great extent.
Latency
The additional steps involved in the encoding and decoding processes can introduce latency. Organizations must carefully evaluate their performance requirements and select an erasure coding scheme that aligns with their needs. Anyhow, the typically distributed nature of storage systems using erasure coding commonly mitigates this issue by parallelizing storage requests.
Algorithm and Scheme Selection
Different erasure coding algorithms and schemes offer varying levels of efficiency, fault tolerance, and complexity. Choosing the right scheme requires a thorough understanding of the specific use case and performance considerations.
Erasure Coding and simplyblock
Erasure coding stands as a powerful and efficient approach to data protection, offering a great balance between storage efficiency, fault tolerance, and cost-effectiveness. As organizations grapple with ever-growing volumes of data, adopting erasure coding becomes not just a choice but a strategic imperative. Erasure coding reshapes the landscape of data storage, ensuring that our digital assets remain secure, resilient, and accessible in the face of evolving challenges.
That’s why simplyblock utilizes erasure coding for data protection and fault tolerance in our clustered storage solution, enabling “more bang for the buck” in terms of storage efficiency, combined with the industry-standard NVMe over TCP protocol. Simplyblock provides logical devices that are high-performance, predictably low-latency, and cost-effective, available as Kubernetes Persistent Volumes for the easiest access possible. Not to mention all the other cool features such as compression, deduplication, encryption, thin provisioning, and more; learn more now.