Why should RAID 5 no longer be used?

RAID 5 has been a popular option for many years due to its ability to provide redundancy without too much capacity overhead. However, changes in storage technologies and workloads have exposed weaknesses in RAID 5 that make it less ideal for modern deployments.

What is RAID 5?

RAID 5 is a RAID level that combines block-level striping with distributed parity. Data is striped across multiple drives, and the parity information for each stripe is spread across those same drives. This allows the array to withstand the failure of a single drive without data loss. If a drive fails, the parity information can be combined with the surviving data to reconstruct the contents of the failed drive.
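The reconstruction described above relies on XOR parity. The following is a minimal sketch of a single stripe on a hypothetical three-drive array; the helper name `xor_blocks` and the sample block contents are illustrative assumptions, and a real controller works on much larger blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical three-drive array: two data blocks plus parity.
data_a = b"RAID"
data_b = b"five"
parity = xor_blocks([data_a, data_b])

# Simulate losing the drive that held data_b and rebuild it from the survivors.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)
```

Because XOR is its own inverse, any one missing block in a stripe, data or parity, can be recomputed from all the others.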

A minimum of three drives is required for RAID 5. Each stripe holds data blocks plus one parity block, and the parity block rotates from drive to drive rather than living on a dedicated parity drive. Additional drives can be added to increase capacity and performance. The capacity overhead for parity is equivalent to one drive, so in a four-drive array, 25% of the total capacity is devoted to parity.
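The overhead arithmetic can be sketched in a few lines. The function name `raid5_usable_fraction` is a hypothetical helper introduced here, and the sketch assumes all drives are the same size.

```python
def raid5_usable_fraction(drive_count):
    """Fraction of raw capacity left for data when one drive's worth holds parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) / drive_count

for drives in (3, 4, 8):
    usable = raid5_usable_fraction(drives)
    print(f"{drives} drives: {usable:.0%} usable, {1 - usable:.0%} parity")
```

The overhead shrinks as drives are added, which is part of why wide RAID 5 arrays were attractive.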

The Rise and Fall of RAID 5

RAID 5 gained popularity in the 1990s and 2000s for several reasons:

  • It provided redundancy without sacrificing half of the raw capacity, as mirroring does.
  • Larger capacity hard disk drives made the overhead more manageable.
  • Most workloads at the time were not write-intensive so the parity write penalty was less noticeable.

However, in recent years, a few key factors have led to the decline of RAID 5:

  • Larger drive capacities increased rebuild times. At the same rebuild rate, rebuilding a failed 1TB drive takes about twice as long as rebuilding a failed 500GB drive, and the array is under stress for that entire time.
  • Virtualization and other applications resulted in more random write workloads, magnifying the RAID 5 write penalty.
  • General technology advancements like SSDs and non-volatile memory exposed the limitations of RAID 5 more starkly.

The RAID 5 Write Penalty

One of the main disadvantages of RAID 5 is the write penalty. Small writes are slower because parity must be recalculated and updated each time data is written to the array. For a write that changes a single block, the usual read-modify-write sequence is:

  1. The old data block is read from disk.
  2. The old parity block for that stripe is read.
  3. The new parity is calculated by XORing the old parity with the old data and the new data.
  4. The new data and the new parity are written to disk.

This process results in 4 I/O operations (two reads and two writes) for every small write, compared to 1 I/O for RAID 0 and 2 I/Os for a mirrored pair. With a write-heavy workload, this can significantly reduce performance.
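A small write on RAID 5 is commonly handled as a read-modify-write, and the sketch below counts the I/Os in that sequence. It is an in-memory illustration only: the function names and the sample stripe are assumptions, and each "I/O" is just a counter increment rather than a real disk access.

```python
def xor(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def read_modify_write(data_blocks, parity, index, new_data):
    """Replace one data block in a stripe and return (new_parity, io_count)."""
    io_count = 0
    old_data = data_blocks[index]        # I/O 1: read the old data block
    io_count += 1
    old_parity = parity                  # I/O 2: read the old parity block
    io_count += 1
    # Remove the old data's contribution from parity, then add the new data's.
    new_parity = xor(xor(old_parity, old_data), new_data)
    data_blocks[index] = new_data        # I/O 3: write the new data block
    io_count += 1
    io_count += 1                        # I/O 4: write the new parity block
    return new_parity, io_count

# Hypothetical stripe with three data blocks and one parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor(xor(stripe[0], stripe[1]), stripe[2])
new_parity, io_count = read_modify_write(stripe, parity, 1, b"\xff\x00")
print(io_count)
```

The updated parity matches what a full recomputation over the stripe would give, but the shortcut only touches two drives, which is why the cost stays at four I/Os regardless of array width.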

Long Rebuild Times

As drive capacities continue to increase, rebuild times for degraded RAID 5 arrays also increase. When a disk fails, every surviving drive must be read in full so that the missing data can be recomputed from the remaining data and parity and written to a replacement drive. This puts significant stress on the array.

With large capacity SATA drives, rebuilds can take many hours in some cases. During this time the array has no redundancy, so if another drive fails, data loss will occur. The longer the rebuild takes, the longer the array is exposed to a second drive failure.
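A rough lower bound on rebuild time is simply drive capacity divided by the sustained rebuild rate. The sketch below assumes decimal terabytes and a hypothetical rate of 100 MB/s; real rebuilds usually compete with production I/O and run slower than this.

```python
def rebuild_hours(capacity_tb, rebuild_mb_per_s):
    """Best-case hours to rewrite a whole drive at a sustained rebuild rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return capacity_mb / rebuild_mb_per_s / 3600

# Hypothetical drive sizes at an assumed 100 MB/s sustained rebuild rate.
for size_tb in (0.5, 1, 4, 16):
    print(f"{size_tb} TB: {rebuild_hours(size_tb, 100):.1f} hours")
```

Rebuild time grows linearly with capacity, so the window in which a second failure is fatal grows with it.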

Poor Random Write Performance

Workloads have shifted to be more random write intensive compared to the sequential workloads that were more common when RAID 5 was first developed. Virtualization, database applications, web servers, and boot/swap volumes often exhibit highly random access patterns.

This magnifies the write penalty in RAID 5, since each small random write triggers its own parity update and few writes cover a full stripe. In some cases, RAID 5 can have worse random write performance than a single disk.
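A common rule of thumb for sizing estimates divides an array's raw IOPS by a blend of the read share and the write share multiplied by a per-level write penalty. The sketch below applies that rule; the penalty table uses the conventional values, while the drive counts and per-drive IOPS figures are hypothetical. Caching and full-stripe writes are ignored.

```python
# Conventional back-end I/Os per small front-end write for each RAID level.
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}

def effective_iops(drive_count, iops_per_drive, write_fraction, level):
    """Estimate front-end IOPS for a given mix of reads and small writes."""
    raw_iops = drive_count * iops_per_drive
    penalty = WRITE_PENALTY[level]
    return raw_iops / ((1 - write_fraction) + write_fraction * penalty)

# Three hypothetical 100-IOPS drives under a pure random-write workload.
print(effective_iops(3, 100, 1.0, "raid5"))
```

Under this estimate, a three-drive RAID 5 array doing only small random writes delivers 75 IOPS, less than one of its 100-IOPS member drives would alone.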

Alternatives to RAID 5

Given the weaknesses around rebuild times, random write performance, and the general write penalty, what are the alternatives to RAID 5?

RAID 10

RAID 10 provides performance and redundancy by combining mirroring and striping. Data is mirrored while also being striped across drives.

The advantages of RAID 10 include:

  • Better performance since there is no parity calculation on writes.
  • Faster rebuilds, since a replacement drive is copied directly from its surviving mirror partner instead of being recomputed from every drive in the array.
  • Ability to withstand multiple drive failures if the failed drives are in different mirrors.

The disadvantage is lower usable capacity, because mirroring devotes half of the raw capacity to redundancy. RAID 10 also requires a minimum of 4 drives.
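How often RAID 10 survives a second failure can be worked out by enumerating every pair of failed drives. The sketch below assumes two-way mirrors laid out as consecutive pairs (drives 0 and 1, 2 and 3, and so on) and assumes each pair of failures is equally likely; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def raid10_two_failure_survival(drive_count):
    """Fraction of two-drive failure combinations that leave every mirror intact."""
    if drive_count < 4 or drive_count % 2:
        raise ValueError("expected an even drive count of at least four")
    outcomes = list(combinations(range(drive_count), 2))
    # Data is lost only when both failed drives belong to the same mirror pair.
    survived = sum(1 for a, b in outcomes if a // 2 != b // 2)
    return Fraction(survived, len(outcomes))

for drives in (4, 8, 16):
    print(drives, raid10_two_failure_survival(drives))
```

With four drives, two out of three double-failure combinations are survivable, and the odds improve as more mirror pairs are added. RAID 5, by contrast, survives none of them.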

RAID 6

RAID 6 adds a second, independent parity block to each stripe compared to RAID 5, at the cost of two drives' worth of capacity. This allows the array to withstand the loss of any two drives.

The advantages of RAID 6 include:

  • Improved resiliency – can survive two disk failures.
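  • Safer rebuilds than RAID 5, since the array can still tolerate one more drive failure while a rebuild is in progress.

However, write performance and random I/O performance still suffer due to parity calculations. Each small write must update two parity blocks instead of one, so the write penalty is higher than in RAID 5.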

Erasure Coding

Erasure coding schemes such as Reed-Solomon generalize the distributed parity concept of RAID 5/6: data is split into a configurable number of data blocks plus a configurable number of parity blocks, which can be spread across many more drives. This reduces the performance impact when a drive fails compared to RAID 5/6.

The advantages of erasure coding include:

  • Very large arrays with fast rebuilds.
  • Configurable for different resiliency levels.
  • High storage efficiency.

The disadvantages include high CPU overhead and complexity to implement versus standard RAID.
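The trade-off between resiliency and efficiency can be sketched with two numbers: k data blocks and m parity blocks per stripe. With a code such as Reed-Solomon, any m of the k + m blocks can be lost without losing data. The function name and the sample layouts below are illustrative assumptions, not taken from any particular product.

```python
from fractions import Fraction

def erasure_profile(data_blocks, parity_blocks):
    """Storage efficiency and failure tolerance for a k + m erasure-coded stripe."""
    total = data_blocks + parity_blocks
    return {
        "efficiency": Fraction(data_blocks, total),
        "tolerated_failures": parity_blocks,
    }

# 3+1 matches a four-drive RAID 5 and 4+2 a six-drive RAID 6; 10+4 is a wider example.
for k, m in ((3, 1), (4, 2), (10, 4)):
    profile = erasure_profile(k, m)
    print(f"{k}+{m}: {profile['efficiency']} usable, "
          f"survives {profile['tolerated_failures']} failures")
```

A 10+4 layout keeps 5/7 of raw capacity usable while tolerating four failures, which a four-drive RAID 5 at 3/4 usable cannot match on resiliency.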

When RAID 5 Still Makes Sense

While RAID 5 is no longer recommended for performance-oriented applications, it can still be suitable in certain scenarios:

  • Archival data with very low change rates.
  • Small arrays with low capacity drives where rebuild times remain short.
  • Secondary storage where performance is not critical.

The lower overhead of RAID 5 can make sense when absolute storage efficiency is more important than performance. But for most primary application workloads, the shortcomings often outweigh the benefits compared to RAID 10 or more advanced erasure coding options.

Conclusion

RAID 5 was once the go-to choice for redundancy on storage arrays. But because of changes in drive capacities, workloads, and storage media such as SSDs, the weaknesses of RAID 5 now overshadow its benefits in most use cases.

Very large drive capacities increase rebuild times and risk of failure during rebuilds. Random write workloads magnify the write penalty inherent in the RAID 5 parity model. And SSDs and non-volatile memory expose the latency issues associated with RAID 5 compared to simpler mirroring or more advanced erasure coding schemes.

For performance-oriented workloads, RAID 10 provides better overall performance and rebuild times. RAID 6 offers more resiliency but still suffers from the parity write penalty. And erasure coding schemes such as Reed-Solomon provide an alternative approach to distributed parity that scales to much larger arrays.

The distributed parity approach of RAID 5 made sense decades ago when drive capacities were much smaller, workloads were more sequential, and technologies like SSDs did not exist. But times have changed, and RAID 5’s weaknesses now overshadow its benefits for most use cases outside archival storage with low change rates.