Why is RAID 2 not used? - Darwin's Data

RAID (Redundant Array of Independent Disks) is a technology that combines multiple disk drive components into a logical unit for the purposes of data redundancy and performance improvement. There are several standardized RAID levels, each with its own benefits and drawbacks. RAID 2 is one of the earlier and more complex RAID levels that is rarely implemented today. There are several key reasons why RAID 2 has fallen out of favor compared to other RAID levels:

High Overhead

RAID 2 stripes data across drives at the bit level and stores Hamming code check bits on dedicated drives. Each stripe therefore carries its own error-correcting code, which can detect and correct a single-bit error or, equivalently, rebuild the contents of one failed drive. This protection is expensive. A single-error-correcting Hamming code needs m check drives to protect up to 2^m - m - 1 data drives, so four data drives require three check drives: 75% overhead relative to the data, or three of seven drives (about 43% of raw capacity) spent on error correction. The share falls as the array grows (ten data drives need four check drives), but it stays well above the single parity drive of RAID 5, and small arrays are hit hardest. This makes RAID 2 inefficient in terms of usable capacity compared to other RAID levels.
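The overhead can be worked out directly from the Hamming bound. The following is a minimal sketch, assuming a plain single-error-correcting Hamming code with no extra overall parity drive; the function names `check_disks` and `ecc_fraction` are illustrative and do not come from any RAID product.

```python
def check_disks(data_disks: int) -> int:
    """Smallest m such that 2**m >= data_disks + m + 1.

    This is the number of check drives a single-error-correcting
    Hamming code needs to protect `data_disks` data drives.
    """
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    m = 1
    while 2 ** m < data_disks + m + 1:
        m += 1
    return m


def ecc_fraction(data_disks: int) -> float:
    """Fraction of all drives in the array that hold check bits."""
    m = check_disks(data_disks)
    return m / (data_disks + m)


if __name__ == "__main__":
    # Array sizes chosen only for illustration.
    for k in (1, 4, 10, 11, 26):
        m = check_disks(k)
        print(f"{k:>2} data + {m} check drives: "
              f"{ecc_fraction(k):.0%} of drives hold ECC")
```

Under these assumptions, four data drives need three check drives and ten data drives need four, so the relative cost is highest for small arrays.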

Not Widely Supported

Unlike RAID 0, 1, 5, 6 and 10, RAID 2 is not widely supported in RAID controller firmware or operating system software RAID implementations. Bit-level striping combined with generating and checking a Hamming code on every access is difficult to implement in a performant manner, especially for larger stripe sets. As a result, RAID 2 support never became common, and most RAID implementations top out at RAID 6.
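To make the Hamming code work concrete, here is a small sketch of the textbook Hamming(7,4) code, in which four data bits are protected by three check bits. In a RAID 2 layout each of the seven bits would sit on a different drive; that mapping, and the helper names `encode`, `correct` and `data_bits`, are assumptions made for illustration rather than a description of any real controller.

```python
def encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # check bit covering positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # check bit covering positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # check bit covering positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def correct(word: list[int]) -> tuple[list[int], int]:
    """Return (corrected codeword, 1-based position of the flipped bit).

    The syndrome is the XOR of the positions of all set bits. It is 0
    for a valid codeword and equals the error position after one flip.
    """
    syndrome = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= position
    fixed = list(word)
    if syndrome:
        fixed[syndrome - 1] ^= 1
    return fixed, syndrome


def data_bits(word: list[int]) -> list[int]:
    """Extract the 4 data bits from a 7-bit codeword."""
    return [word[2], word[4], word[5], word[6]]


if __name__ == "__main__":
    original = [1, 0, 1, 1]
    stored = encode(original)
    stored[4] ^= 1  # simulate a single-bit error on the fifth "drive"
    repaired, position = correct(stored)
    print("error at position", position)
    print("data recovered:", data_bits(repaired) == original)
```

The arithmetic itself is only a handful of XOR operations; the practical cost comes from applying it to every bit of every stripe across drives that must all be accessed together.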

No Write Penalty Advantage

RAID 2’s distinguishing benefit is the ability to correct single-bit errors in any stripe of data on the fly. However, modern drives apply their own error-correcting codes internally to every sector, so a bad read is normally either corrected by the drive itself or reported as an unrecoverable read error, rather than being returned as silently flipped bits. Once the failing drive is known, simple parity is enough to reconstruct its data. As a result, RAID 2’s advantage of single-bit error correction on reads is far less relevant than it once was.

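Additionally, RAID 5 and RAID 6 tolerate one and two drive failures respectively without RAID 2’s overhead. Both use block-level striping, so a small write touches only the affected data block and its parity rather than every drive in the array, and they generally handle small writes better than RAID 2. For these reasons, the read error correction advantage of RAID 2 is diminished when compared to other RAID levels.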
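For comparison, single-parity protection of the kind used by RAID 5 needs only one extra block per stripe, provided the failed drive is known. The sketch below shows the idea with byte-wise XOR; the function names `xor_blocks` and `rebuild` and the sample data are hypothetical, and real implementations also rotate parity across drives, which is omitted here.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Reconstruct the one missing data block from the survivors and parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]  # three data blocks, one per drive
    parity = xor_blocks(stripe)           # stored on a fourth drive
    # Drive 1 fails; its block is recovered from the other two plus parity.
    recovered = rebuild([stripe[0], stripe[2]], parity)
    print(recovered == stripe[1])
```

Parity alone cannot tell which drive is wrong, only rebuild one that is already known to have failed, which is why it relies on the drive reporting its own errors.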

Limited Drive Scalability

The Hamming code implemented in RAID 2 ties the number of check drives to the number of data drives in a single stripe set. Typical implementations support between 4 and 16 data drives. Growing the set means adding check drives in step with the code, and because striping is at the bit level, every drive must take part in every read and write, so larger sets become unwieldy. This makes RAID 2 less flexible in terms of scalability: raw storage capacity scales up with drive density, but RAID 2 is limited to a relatively small number of drives in fixed combinations.

Cost and Power Inefficiency

Since RAID 2 dedicates several drives to error correction, close to half of a small array, it is inefficient in terms of overall storage costs and power consumption. The high overhead drives up the cost per usable gigabyte significantly, and all those extra drives incur additional power and cooling overhead. Compared to RAID 5/6 or erasure coding schemes, which protect the same data with fewer extra drives, and to RAID 10, which pays a comparable capacity price but is far simpler to run, the resource inefficiency makes RAID 2 undesirable.
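A rough way to compare the cost impact is to count how many drives each level needs to deliver the same usable capacity. The sketch below assumes identical drives, a plain Hamming code for RAID 2, and made-up figures for drive size and price; `total_drives` and `cost_per_usable_tb` are illustrative names, not part of any tool.

```python
def hamming_check_disks(data_drives: int) -> int:
    """Check drives needed by a single-error-correcting Hamming code."""
    m = 1
    while 2 ** m < data_drives + m + 1:
        m += 1
    return m


def total_drives(level: str, data_drives: int) -> int:
    """Drives required to provide `data_drives` worth of usable capacity."""
    if data_drives < 1:
        raise ValueError("need at least one data drive")
    if level == "RAID 2":
        return data_drives + hamming_check_disks(data_drives)
    if level == "RAID 5":
        return data_drives + 1   # one parity drive's worth of capacity
    if level == "RAID 6":
        return data_drives + 2   # two parity drives' worth of capacity
    if level == "RAID 10":
        return data_drives * 2   # every drive is mirrored
    raise ValueError(f"unsupported level: {level}")


def cost_per_usable_tb(level: str, data_drives: int,
                       drive_tb: float, drive_price: float) -> float:
    """Purchase cost divided by usable capacity, ignoring power and cooling."""
    drives = total_drives(level, data_drives)
    return drives * drive_price / (data_drives * drive_tb)


if __name__ == "__main__":
    # Hypothetical inputs: 4 data drives of 10 TB at 200 currency units each.
    for level in ("RAID 2", "RAID 5", "RAID 6", "RAID 10"):
        cost = cost_per_usable_tb(level, 4, 10.0, 200.0)
        print(f"{level:<8} {total_drives(level, 4)} drives, "
              f"{cost:.2f} per usable TB")
```

The model ignores performance, rebuild time and power, so it only illustrates the capacity side of the trade-off.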

Better Options Available

When RAID 2 was introduced in the late 1980s, it provided sophisticated error detection beyond simple parity schemes. But storage technology has evolved greatly since then, providing options that make far more efficient use of capacity while still providing excellent fault tolerance and performance:

RAID 5 and 6 offer single and double parity respectively while keeping overhead to a minimum.
RAID 10 combines mirroring and striping, trading capacity for performance and simple rebuilds.
Advanced erasure coding schemes can provide even more capacity-efficient fault tolerance than traditional RAID.
End-to-end integrity checking, such as checksums, can be implemented above the RAID layer for added data validation.
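As an illustration of integrity checking above the RAID layer, the sketch below stores a SHA-256 digest alongside each block and verifies it on read. It is a minimal example using Python's standard `hashlib` module; the `store` and `verify` helpers are hypothetical, and real filesystems typically keep such checksums in their own metadata and may use different algorithms.

```python
import hashlib


def store(block: bytes) -> tuple[bytes, str]:
    """Return the block together with the digest to be saved alongside it."""
    return block, hashlib.sha256(block).hexdigest()


def verify(block: bytes, digest: str) -> bool:
    """Check that a block read back still matches its stored digest."""
    return hashlib.sha256(block).hexdigest() == digest


if __name__ == "__main__":
    block, digest = store(b"important payload")
    print(verify(block, digest))                  # unchanged block
    print(verify(b"important paylaod", digest))   # corrupted block
```

A checksum of this kind only detects corruption; repairing the block still depends on redundancy underneath, such as a mirror copy or parity.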

These modern options provide comparable or superior data protection to RAID 2 without the crippling capacity overhead. They can be implemented efficiently even for very large scale storage deployments. For these reasons, they have supplanted RAID 2 in most usage scenarios.

Limited Adoption in Niche Use Cases

RAID 2 may still see some limited use in niche applications where bit error rates are a critical concern. High-end HPC clusters dealing with massive datasets might opt for RAID 2 despite its inefficiency, valuing its robust error correction over capacity utilization. But outside of these rare scenarios, the downsides of RAID 2 have prevented mainstream adoption.

Conclusion

In summary, RAID 2 is not used today for several key reasons:

Very high overhead, with close to half of a small array spent on error correction
Lack of widespread software/firmware support
Minimal advantages compared to more advanced RAID levels
Poor drive scalability due to Hamming code limits
Inefficient use of resources driving up costs
Viable modern alternatives like RAID 6, RAID 10, erasure coding

Outside of some niche HPC applications, the downsides of RAID 2 far outweigh its benefits compared to other options. The storage world has moved on, and RAID 2 remains a relic of the past only seen in textbooks rather than modern data centers.