Is RAID 5 still good? - Darwin's Data

RAID 5 has been a popular option for many years due to its ability to provide data redundancy while maximizing storage capacity. However, some experts argue that RAID 5 may no longer be the best choice given changes in storage technologies. This article examines the pros and cons of using RAID 5 in modern IT environments.

What is RAID 5?

RAID stands for “Redundant Array of Independent Disks.” It is a method of combining multiple disk drives to act as one large storage unit. The main goal of RAID is to provide greater data security and/or increased performance compared to a single disk.

RAID 5 uses block-level striping with distributed parity. Data is broken into blocks and striped across all the disks in the array. For each stripe, a parity block is computed as the XOR of that stripe’s data blocks, and the disk that holds the parity block rotates from stripe to stripe so that no single disk becomes a parity bottleneck. If one drive fails, each of its blocks can be reconstructed from the surviving blocks in the same stripe.

A key benefit of RAID 5 is that it provides redundancy with little capacity overhead: parity consumes the equivalent of one disk, no matter how many disks are in the array. Because every stripe can lose one block and still be rebuilt, the loss of any one disk does not result in data loss. RAID 5 requires a minimum of three disks.
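The parity mechanism can be illustrated in a few lines. This is a minimal sketch of XOR parity on a single stripe, assuming a hypothetical 4-disk array and tiny 4-byte blocks; the function name and sample data are made up for illustration, and a real controller works on much larger blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-disk array: three data blocks plus one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held the second data block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```

XOR-ing the surviving blocks with the parity block cancels out everything except the missing block, which is why exactly one lost block per stripe can be recovered.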

The pros of using RAID 5

Here are some of the main advantages of using RAID 5:

Good storage efficiency – With distributed parity, RAID 5 is much more storage efficient than RAID 1 mirroring. Usable capacity is (N - 1)/N of raw capacity for N equal-sized disks: about 67% with three disks, 75% with four, and higher as more disks are added.
Allows for drive failure – The distributed parity provides fault tolerance for a single disk failure. If a drive fails, the missing data can be recreated from the parity blocks.
Better read performance – RAID 5 performs well for read operations since the workload can be distributed across multiple disks.
Cost effective – RAID 5 provides redundancy without doubling your hardware costs like mirroring. It is a relatively affordable way to prevent data loss.
Easy to resize – Many RAID 5 implementations allow you to easily expand storage capacity by adding more disks.

For these reasons, RAID 5 became a popular choice for secondary storage needs that require redundancy on a budget. The combination of decent performance plus the ability to survive disk failures made RAID 5 suitable for use cases like storing documents, databases, and media files.
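The storage-efficiency advantage follows from simple arithmetic: one disk’s worth of capacity goes to parity, regardless of array size. A minimal sketch, assuming equal-sized disks (the function name is illustrative):

```python
def raid5_usable_fraction(num_disks):
    """Fraction of raw capacity left for data in a RAID 5 array of equal-sized disks."""
    if num_disks < 3:
        raise ValueError("RAID 5 requires at least three disks")
    # One disk's worth of space is consumed by parity.
    return (num_disks - 1) / num_disks

for n in (3, 4, 6, 8):
    print(f"{n} disks: {raid5_usable_fraction(n):.1%} usable")
```

By comparison, a two-way mirror always leaves 50% of raw capacity usable.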

The cons of using RAID 5

However, RAID 5 does come with some downsides, especially in modern IT environments:

Performance overhead – Parity adds write overhead. A write that covers only part of a stripe must read the old data and old parity, then write the new data and new parity, so one logical write becomes four disk operations.
Limited failure tolerance – RAID 5 can only tolerate a single disk failure. A second disk failure before the rebuild completes causes loss of the whole array.
Slow rebuilds – Rebuilding a failed drive requires reading all remaining disks in full, which can take a long time with large disks.
Vulnerable to latent sector errors – While the array is degraded there is no redundancy left, so an unreadable sector found on a surviving disk during a rebuild cannot be reconstructed. Double-parity levels such as RAID 6 can still recover in that situation.
Mediocre random write performance – Applications that perform many small random writes pay the parity update cost on every write and will run slower on RAID 5.

These cons have become more pronounced as hard drives have changed over the past decade. Larger drive capacities, and the longer rebuild times that come with them, have called RAID 5’s effectiveness into question for some workloads.
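The write overhead comes from the read-modify-write cycle needed to keep parity consistent when only one block in a stripe changes. The sketch below assumes XOR parity and uses hypothetical two-byte blocks; it shows that the new parity can be derived from the old data, old parity and new data alone, at a cost of two reads and two writes.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def parity_after_small_write(old_data, old_parity, new_data):
    """New parity for a single-block update.

    I/O cost on a real array: read old data, read old parity,
    write new data, write new parity (4 disk operations in total).
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)

# Hypothetical stripe with three data blocks.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0a\x0b"
parity = xor_bytes(xor_bytes(d0, d1), d2)

# Overwrite d1 without touching d0 or d2.
new_d1 = b"\xff\x00"
new_parity = parity_after_small_write(d1, parity, new_d1)

# Same result as recomputing parity from the whole stripe.
print(new_parity == xor_bytes(xor_bytes(d0, new_d1), d2))  # True
```

A write that covers an entire stripe avoids the two reads, because parity can be computed directly from the new data.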

RAID 5 rebuild times are getting longer

One of the major drawbacks of RAID 5 is that rebuilds now take much longer than they used to. When a disk fails, the RAID controller must read every block from the remaining disks in order to reconstruct the data that was on the failed drive.

With older, smaller drives, this rebuild process took hours. Drive capacities have grown far faster than drive throughput, so rebuilding a failed 6 TB drive can take a full day or longer. During this lengthy process, the RAID 5 array operates in degraded mode and is vulnerable to a second disk failure.

Long rebuilds also hurt performance. The heavy read activity puts additional load on the remaining disks, which delays other I/O operations.

Here is an example comparing rebuild times for different RAID 5 arrays:

RAID 5 Array	Usable Capacity	Estimated Rebuild Time
6 x 160 GB HDD	800 GB	5 hours
6 x 600 GB HDD	3 TB	16 hours
6 x 2 TB HDD	10 TB	2 days
6 x 4 TB HDD	20 TB	4 days

As this table illustrates, rebuild times grow with drive size. RAID 5 provided sufficient redundancy with smaller disks, but with large drives the array spends far longer in a degraded state after a failure, which leaves a much wider window for a second fault to cause data loss.
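Estimates like those in the table come down to drive capacity divided by the sustained rebuild rate. The sketch below is a rough model, not a benchmark: the 10 MB/s rate is an assumption chosen because it roughly reproduces the table’s figures, and it would correspond to an array that keeps serving production I/O while it rebuilds. An idle array can rebuild considerably faster.

```python
def rebuild_hours(drive_capacity_gb, rebuild_rate_mb_s):
    """Rough rebuild time: every byte of the replacement drive must be rewritten."""
    seconds = drive_capacity_gb * 1000 / rebuild_rate_mb_s  # decimal GB -> MB
    return seconds / 3600

ASSUMED_RATE_MB_S = 10  # assumption: effective rate on a busy array

for capacity_gb in (160, 600, 2000, 4000):
    hours = rebuild_hours(capacity_gb, ASSUMED_RATE_MB_S)
    print(f"{capacity_gb} GB drive: {hours:.1f} hours ({hours / 24:.1f} days)")
```

Because the rate is roughly fixed by the hardware and workload, rebuild time scales linearly with drive capacity.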

Risk of unrecoverable read errors

All hard disks have a small chance of encountering unrecoverable read errors (UREs) during normal operation. A sector that has gone bad without being noticed is called a latent sector error, because it stays hidden until something tries to read it. In a healthy RAID 5 array such an error is harmless: the controller rebuilds the unreadable block from parity and the drive remaps the sector.

The problem occurs when one disk fails while latent errors still exist on the surviving disks. The rebuild must read every sector of every surviving disk, and with one disk already gone there is no redundancy left to cover an unreadable sector. The affected stripe cannot be reconstructed, and depending on the controller the result ranges from a corrupted block to an aborted rebuild and the loss of the array.
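A back-of-the-envelope model shows why this risk grows with drive size. The sketch below assumes an unrecoverable read error rate of 1 per 10^14 bits read, a figure often quoted on consumer drive spec sheets as a worst case, and treats errors as independent. Both are simplifying assumptions, and real drives frequently do better, so the output is best read as a pessimistic bound.

```python
import math

def prob_ure_during_rebuild(surviving_disks, disk_tb, ure_per_bit=1e-14):
    """Chance of at least one unrecoverable read error while reading all survivors.

    ure_per_bit is an assumed spec-sheet style rate, not a measured value.
    """
    bits_read = surviving_disks * disk_tb * 1e12 * 8
    # 1 - (1 - p)^n, computed in a numerically stable way for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Six-disk arrays with one failed disk, so five survivors must be read in full.
for disk_tb in (0.16, 0.6, 2, 4):
    p = prob_ure_during_rebuild(5, disk_tb)
    print(f"5 x {disk_tb} TB survivors: {p:.0%} chance of hitting a URE")
```

Under these assumptions the chance rises from single digits for 160 GB drives to roughly 80% for 4 TB drives.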

Newer drive technologies like shingled magnetic recording (SMR) also introduce performance and error recovery challenges for RAID 5 arrays.

RAID 6 offers double parity

In response to the increasing size and vulnerability of RAID 5 arrays, RAID 6 was developed. It adds a second distributed parity block to allow the array to withstand the failure of two disks.

The dual parity provides several advantages:

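Tolerance of two simultaneous disk failures
Continued protection during rebuilds, since the array still has redundancy after one disk has failed
Recovery from a latent sector error encountered while rebuilding a failed disk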

However, RAID 6 has a higher write penalty than RAID 5. Every small write must update two parity blocks instead of one, which is commonly counted as six disk operations per logical write against four for RAID 5, so random write performance suffers noticeably.

Consider RAID 10 for better performance

Where higher performance is needed, RAID 10 is often a better solution. RAID 10 combines mirroring and striping for both redundancy and speed.

In a typical 4-disk RAID 10 configuration, two mirrored pairs of drives are created. Data is then striped across the pairs. This layout allows large sequential reads and writes to be split across all four disks.
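The layout can be sketched as a mapping from logical block number to the disks that store it. This is an illustrative model only: the disk numbering (pair p owns disks 2p and 2p+1) and the one-block stripe unit are assumptions, and real implementations differ in both.

```python
def raid10_placement(block_index, num_pairs=2):
    """Return the two disks that hold a logical block in a striped set of mirrors.

    Assumed numbering: mirrored pair p consists of disks 2p and 2p + 1.
    """
    pair = block_index % num_pairs      # striping: rotate across the pairs
    return (2 * pair, 2 * pair + 1)     # mirroring: both disks in the pair get a copy

for block in range(4):
    print(f"block {block} -> disks {raid10_placement(block)}")
```

Consecutive blocks alternate between the pairs, so a large sequential transfer keeps all four disks busy, while every block still exists on two disks.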

The write penalty is lower with RAID 10 since no parity calculations are required; each write simply goes to both disks of a mirrored pair. Rebuilds are also much faster since the surviving mirror holds a complete copy of the failed drive’s data and only that one disk has to be read. The downside is that RAID 10’s usable capacity is only half of the raw capacity. However, its performance and redundancy make RAID 10 a popular choice for applications like virtualization and databases.
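The write-penalty difference can be made concrete with a rule-of-thumb estimate. The sketch below assumes the commonly quoted back-end costs of 2 disk operations per small random write for RAID 10, 4 for RAID 5 and 6 for RAID 6. These penalties, the 8-disk array and the 150 IOPS per disk are illustrative assumptions, not measurements, and controller caching can change the picture considerably.

```python
# Assumed rule-of-thumb disk operations per small random write.
WRITE_PENALTY = {"RAID 5": 4, "RAID 6": 6, "RAID 10": 2}

def random_write_iops(level, num_disks, iops_per_disk):
    """Estimated small random-write IOPS an array can sustain."""
    return num_disks * iops_per_disk / WRITE_PENALTY[level]

# Hypothetical array: 8 disks, each capable of about 150 random IOPS.
for level in ("RAID 5", "RAID 6", "RAID 10"):
    print(f"{level}: {random_write_iops(level, 8, 150):.0f} write IOPS")
```

With these inputs the estimate works out to 300, 200 and 600 write IOPS respectively, which is why write-heavy workloads tend to favor RAID 10.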

Consider erasure coding for large arrays

For large arrays with more than 8 disks, erasure coding schemes like Reed-Solomon codes can provide an alternative to parity-based RAID. Compared to dual parity, erasure coding offers:

Higher fault tolerance – Can recover from 3 or more disk failures
Larger capacities for equivalent redundancy
Flexible reconstruction strategies

Facebook, for example, has described using Reed-Solomon coding with 10 data fragments and 4 coding fragments, with the 14 fragments placed on different machines. Any 4 of the 14 fragments can be lost without data loss.

The downside is that computing erasure codes has much higher CPU overhead than simple XOR parity. Reconstruction is also slower, since rebuilding a lost fragment requires reading from many more disks.
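The capacity and fault-tolerance trade-off of a scheme with k data fragments and m coding fragments is easy to compute. The sketch below is plain arithmetic, not an erasure-coding implementation, and the layouts compared are hypothetical examples chosen for illustration.

```python
def ec_profile(k, m):
    """Fault tolerance and storage overhead for k data + m coding fragments."""
    return {
        "fragments": k + m,
        "tolerated_losses": m,            # any m fragments may be lost
        "raw_per_usable": (k + m) / k,    # raw bytes stored per byte of data
    }

# Hypothetical layouts for comparison.
layouts = {
    "8-disk dual parity (6+2)": (6, 2),
    "erasure coded (12+4)": (12, 4),
    "3-way replication (1+2)": (1, 2),
}
for name, (k, m) in layouts.items():
    p = ec_profile(k, m)
    print(f"{name}: survives {p['tolerated_losses']} losses, "
          f"{p['raw_per_usable']:.2f}x raw storage")
```

In this comparison the 12+4 layout has the same storage overhead as 6+2 dual parity but survives twice as many failures, at the cost of reading 12 fragments to rebuild one.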

Auto-migration features help avoid failures

Some storage vendors have added auto-migration technologies that help mitigate the risks of RAID 5 arrays as disks age.

For example, storage pools can track disk errors and automatically move data off disks with high error rates. Critical data can selectively be mirrored to avoid using degraded disks.

As disks exceed certain error thresholds, they are automatically excluded from the RAID 5 array. Free space is rebalanced across the remaining disks.

Auto-migration requires no administrative intervention. However, these features are generally only found on more expensive enterprise-grade storage appliances.
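The threshold logic behind such a feature might look like the following sketch. Everything here is hypothetical: the disk names, the error counters and the threshold value are invented for illustration, and no particular vendor’s policy or API is implied.

```python
def disks_to_evacuate(error_counts, threshold):
    """Return the disks whose error count exceeds the threshold, sorted by name."""
    return sorted(disk for disk, errors in error_counts.items() if errors > threshold)

# Hypothetical per-disk error counters collected by a storage pool.
pool_errors = {"disk0": 0, "disk1": 3, "disk2": 57, "disk3": 1}

flagged = disks_to_evacuate(pool_errors, threshold=50)
print(flagged)  # ['disk2']
```

A real appliance would then copy data off the flagged disks before removing them from the array, so the array never has to enter degraded mode.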

RAID is not a backup

It’s important to note that RAID is designed to protect against hardware failures and improve uptime. It is not a backup solution and does not guard against accidental deletion, corruption, disasters, or ransomware.

Organizations should carefully test restores from backups to validate their effectiveness. Maintaining a good backup regime is necessary regardless of the storage redundancy level.

Should you still use RAID 5?

RAID 5 can still be appropriate for certain use cases such as:

Read-heavy workloads where performance is not critical
Archival data and backups
Smaller arrays with lower-capacity drives

However, for general primary storage, RAID 5 should be avoided in favor of RAID 6, RAID 10, or erasure coding schemes. The rebuild times and risk of data loss on disk failure outweigh the capacity benefits except for cold storage use cases.

To summarize, here are some best practices regarding RAID 5:

Only use RAID 5 for secondary, non-critical data where redundancy is still desired
Keep RAID 5 arrays small (under 6 disks) to limit rebuild times
Monitor disk health closely and replace disks proactively
Always maintain good backups that are tested periodically

Conclusion

RAID 5 was once a standard recommendation for combining performance and redundancy. But given how drive capacities and rebuild times have grown, administrators should rethink defaulting to RAID 5 and consider alternatives like RAID 6 or RAID 10 for most applications today.

While RAID 5 still has a place for certain workloads, it is no longer the one-size-fits-all solution it once was. Carefully evaluating your specific performance, capacity, and redundancy needs is necessary to choose the right RAID level for modern storage environments.