Should I use 3 or 4 disks for RAID 5?

When setting up a RAID 5 array, one of the first decisions is choosing between 3 and 4 disks. Both options provide the core benefit of RAID 5 – distributed parity for redundancy. However, there are some key differences to consider when deciding between the two.

What is RAID 5?

RAID 5 is a popular RAID (Redundant Array of Independent Disks) configuration that uses distributed parity and block-level striping. This means the data is split up into blocks and striped across all the disks in the array. Additionally, RAID 5 calculates one parity block for each stripe and rotates the location of that parity block across all the disks, so no single disk holds all the parity.

The main benefits of RAID 5 are:

  • Improved read performance – spreading data across multiple disks allows reads to occur in parallel.
  • Redundancy – the parity blocks provide fault tolerance. If one disk fails, the missing data can be recreated from the parity blocks on the remaining disks.
  • Efficient use of storage – RAID 5 uses the equivalent of one disk worth of space for parity, so an N disk array has N-1 disks worth of usable space.
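To make the redundancy point concrete, here is a minimal sketch of how a lost block can be recreated from parity. It assumes XOR parity, which is the usual RAID 5 scheme. The block contents and the helper name `xor_blocks` are made up for illustration and do not come from any particular RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4 disk array: three data blocks plus one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held the second block, then rebuild that block
# from the surviving data blocks and the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```

Because XOR is its own inverse, the same operation that produces the parity block also recovers any single missing block in the stripe. Two missing blocks cannot be recovered this way, which is why RAID 5 tolerates only one disk failure.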

3 Disk RAID 5

A 3 disk RAID 5 array stores 2 data blocks and 1 parity block in every stripe. The parity block rotates from disk to disk, so no single disk is a dedicated parity disk. Here is a visualization of one stripe:

Disk 1   Disk 2   Disk 3
Data     Data     Parity

With 2 data blocks in each stripe, the usable space in a 3 disk RAID 5 is the total capacity of 2 disks. However, there are some downsides to the 3 disk configuration:

  • Modest read performance gain – each stripe holds only 2 data blocks, so parallelization gains are limited.
  • No fault tolerance during rebuild – if a disk fails, the array is in a degraded state until the rebuild completes. Another disk failure during this time will result in data loss. This limitation applies to RAID 5 at any disk count, including 4 disks.
  • Higher ratio of parity overhead – with the equivalent of 1 disk used for parity, 33% of total capacity is lost instead of 25% for a 4 disk array.

Due to these downsides, 3 disk RAID 5 is generally not the first choice for critical storage. A 4 disk RAID 5 tolerates the same single disk failure, but it has lower parity overhead and more disks to spread reads across.
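The capacity and overhead figures for the two layouts can be checked with a few lines of arithmetic. This is a small sketch only: the function name `raid5_summary` and the 4TB drive size are illustrative assumptions, and real arrays lose a little more space to metadata.

```python
def raid5_summary(disk_count, disk_size_tb):
    """Return (usable capacity in TB, fraction of raw capacity used for parity)."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    usable_tb = (disk_count - 1) * disk_size_tb  # one disk's worth goes to parity
    parity_fraction = 1 / disk_count
    return usable_tb, parity_fraction

for n in (3, 4):
    usable, overhead = raid5_summary(n, 4)
    print(f"{n} x 4TB: {usable}TB usable, {overhead:.0%} parity overhead")
# 3 x 4TB: 8TB usable, 33% parity overhead
# 4 x 4TB: 12TB usable, 25% parity overhead
```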

4 Disk RAID 5

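A 4 disk RAID 5 array stores 3 data blocks and 1 parity block in every stripe. As with 3 disks, the parity block rotates from disk to disk rather than living on one dedicated disk. Here is a visualization of one stripe:

Disk 1   Disk 2   Disk 3   Disk 4
Data     Data     Data     Parity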
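The diagram shows a single stripe. In RAID 5 the parity position moves from one stripe to the next, which the following sketch prints for the first four stripes. The rotation order used here (starting on the last disk and moving left) is an assumption for illustration; actual implementations offer several layouts and may differ.

```python
def raid5_layout(disk_count, stripes):
    """Return one row per stripe, marking each disk as data ("D") or parity ("P")."""
    rows = []
    for stripe in range(stripes):
        # Assumed rotation: parity starts on the last disk and moves left each stripe.
        parity_disk = (disk_count - 1 - stripe) % disk_count
        rows.append(["P" if disk == parity_disk else "D" for disk in range(disk_count)])
    return rows

for row in raid5_layout(4, 4):
    print(" ".join(row))
# D D D P
# D D P D
# D P D D
# P D D D
```

After four stripes every disk has held parity exactly once, so parity writes are spread evenly instead of bottlenecking on one disk.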

With 3 disks' worth of data capacity, a 4 disk RAID 5 array provides:

  • Up to roughly triple the sequential read throughput of a single disk.
  • Ability to withstand a single disk failure without losing data; the array stays online in a degraded state while it rebuilds.
  • Only 25% parity overhead compared to 33% for 3 disk RAID 5.

These advantages make 4 disk RAID 5 a popular choice for many scenarios including:

  • File and application servers needing improved I/O performance.
  • Mission critical storage where fault tolerance is mandatory.
  • General purpose storage with read-heavy workloads.

The downside is cost – a 4 disk array requires buying 4 matched drives instead of 3. However, the performance, redundancy, and storage efficiency gains often make the extra upfront cost worthwhile.

When to choose 3 disks

There are some cases where a 3 disk RAID 5 array makes more sense:

  • Low budget scenarios where cost savings outweigh performance/redundancy.
  • Arrays intended for sequential write workloads like video editing. Large sequential writes fill whole stripes, so parity can be computed without first reading old data, and the parity overhead has less impact.
  • Situations where I/O performance demands are minimal.

Overall, the recommendation is to default to 4 disks for RAID 5 in most cases. But for less demanding workloads or tight budgets, 3 disks can be an acceptable compromise.

RAID 5 Rebuild Times

One concern with RAID 5 is rebuild times after a disk failure. The array must read all surviving disks to reconstruct the data from the failed drive. Rebuild times are proportional to the size of the disks in the array. Larger capacity drives take longer to rebuild.

Here are some example rebuild times for common drive sizes in 4 disk arrays:

Drive Size   Rebuild Time
4TB          4-6 hours
8TB          8-12 hours
12TB         12-18 hours
16TB         16-24 hours

As the table shows, rebuild times grow substantially as drive sizes increase. A longer rebuild leaves the array exposed for longer to a second disk failure, which would cause data loss.
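A rough lower bound on rebuild time is the drive capacity divided by the sustained rebuild rate, because the replacement drive has to be written from end to end. The sketch below applies that arithmetic. The rates of 185 and 280 MB/s are assumptions, chosen because they roughly reproduce the ranges in the table. Real rebuilds on a busy array are often slower.

```python
def rebuild_hours(drive_tb, rebuild_mb_per_s):
    """Estimate hours needed to rewrite a whole drive at a sustained rate."""
    drive_bytes = drive_tb * 10**12                      # decimal terabytes
    seconds = drive_bytes / (rebuild_mb_per_s * 10**6)   # decimal megabytes per second
    return seconds / 3600

for size_tb in (4, 8, 12, 16):
    fast = rebuild_hours(size_tb, 280)  # assumed best-case sustained rate
    slow = rebuild_hours(size_tb, 185)  # assumed slower sustained rate
    print(f"{size_tb}TB: about {fast:.0f} to {slow:.0f} hours")
```

The estimate scales linearly with drive size, which is why doubling capacity roughly doubles the time the array spends degraded.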

Mitigating long rebuild times

There are some ways to help mitigate the issue of long RAID 5 rebuild times on larger drives:

  • Use enterprise class drives designed for RAID environments. They typically have faster, sustained rebuild performance.
  • Keep spare drives ready for immediate swap in after a failure, so the rebuild starts sooner.
  • Schedule patrol reads to proactively detect bad sectors and remap them. This reduces the chance of hitting an unreadable sector during a rebuild.
  • Consider RAID 6 dual parity for additional protection during long rebuilds.

RAID 5 vs RAID 6

RAID 6 is an alternative RAID level that provides additional redundancy by using a second distributed parity block in each stripe. This protects against two simultaneous disk failures.

The advantages of RAID 6 over RAID 5 include:

  • Allows two disk failures without data loss – provides extra protection for large arrays.
  • Safer rebuilds – can withstand a second disk failure during parity rebuild after one drive fails.
  • Extra parity lets the array correct latent sector errors found on the surviving disks during a rebuild.

Downsides of RAID 6 include:

  • Higher write penalty – two parity blocks must be updated on every write, so a small write typically costs about six disk I/Os instead of four with RAID 5.
  • Higher capacity overhead – only N-2 disks are usable in an N disk array versus N-1 with RAID 5.
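The trade-off between the two levels can be put into numbers with the commonly used small-write penalty model: about 4 disk I/Os per random write for RAID 5 and about 6 for RAID 6. The sketch below uses that model. The per-disk figure of 150 IOPS and the function name `compare` are illustrative assumptions, and controller caching can change real results considerably.

```python
# Disk I/Os needed per small random write under the usual read-modify-write model.
WRITE_PENALTY = {"raid5": 4, "raid6": 6}
# Disks' worth of capacity consumed by parity.
PARITY_DISKS = {"raid5": 1, "raid6": 2}

def compare(level, disk_count, disk_size_tb, disk_iops):
    """Return (usable TB, approximate random write IOPS) for the array."""
    usable_tb = (disk_count - PARITY_DISKS[level]) * disk_size_tb
    write_iops = disk_count * disk_iops / WRITE_PENALTY[level]
    return usable_tb, write_iops

for level in ("raid5", "raid6"):
    usable, iops = compare(level, disk_count=4, disk_size_tb=8, disk_iops=150)
    print(f"{level}: {usable}TB usable, about {iops:.0f} random write IOPS")
# raid5: 24TB usable, about 150 random write IOPS
# raid6: 16TB usable, about 100 random write IOPS
```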

In general, RAID 6 becomes more advantageous as drive capacities exceed 4-6TB, because the extra fault tolerance starts to outweigh the downsides. But for lower capacity drive arrays, RAID 5 still provides a good balance of redundancy and efficiency.

When to choose RAID 6 over RAID 5

RAID 6 is recommended over RAID 5 in these scenarios:

  • Using high capacity drives of 8TB or larger.
  • The array must survive two simultaneous disk failures.
  • Read errors during rebuilds are common, indicating latent sector defects.
  • The cost savings from needing fewer spare drives with RAID 6 offset its capacity overhead.
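The drive size guideline is often justified with a simple model of unrecoverable read errors (UREs) during a RAID 5 rebuild, when every surviving disk must be read in full. The sketch below is that model only, not a prediction. It assumes errors are independent and uses a rate of 1 error per 10^14 bits read, a figure commonly quoted on consumer drive datasheets. Real drives often do better, so treat the output as pessimistic.

```python
import math

def ure_probability(disk_count, drive_tb, errors_per_bit=1e-14):
    """Chance of at least one unrecoverable read error while reading all surviving disks."""
    bits_read = (disk_count - 1) * drive_tb * 10**12 * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-errors_per_bit))

for drive_tb in (4, 8, 16):
    chance = ure_probability(disk_count=4, drive_tb=drive_tb)
    print(f"4 x {drive_tb}TB RAID 5 rebuild: {chance:.0%} chance of a read error (model)")
```

Under these assumptions the risk climbs quickly with drive size. With RAID 6, a single read error during a rebuild can still be corrected from the second parity block.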

Optimal number of disks

Based on the above comparisons of 3 and 4 disk RAID 5 and of RAID 5 with RAID 6, these guidelines summarize the choice:

  • 3 disks are only recommended for niche cases due to lower performance and storage efficiency.
  • 4 disks is the ideal number for most general purpose RAID 5 arrays using up to 8TB drives.
  • RAID 6, which needs at least 4 disks, is recommended when using drives larger than 8TB, or if dual drive fault tolerance is required.

Drive Selection Considerations

When creating a RAID 5 array, the selection of drives is also important. Here are some tips:

  • Match drives in terms of size, speed, and type across all array disks. The array treats every drive as if it were the smallest one, and the slowest drive limits performance and rebuild speed.
  • Use enterprise class drives designed for RAID environments if possible. They are engineered with RAID rebuilds and performance in mind.
  • Ensure drives support TLER (time limited error recovery, called ERC or CCTL by some vendors) to prevent drive dropouts during rebuilds.
  • Include spare drives in initial purchase for immediate swap when a failure occurs to accelerate rebuild.

Software vs Hardware RAID

RAID 5 can be implemented via dedicated hardware RAID controllers, or via software RAID built into operating systems and server software.

Hardware RAID advantages:

  • Offloads RAID calculations from main CPU.
  • Batteries or flash caches protect data in cache during power failures.
  • Some hardware RAID cards offer advanced caching optimizations.

Software RAID advantages:

  • No additional hardware purchase needed.
  • Drives are not locked to a specific hardware controller.
  • Centralized management when software RAID is network based.

For most small to medium implementations, software RAID provides a readily available and cost-effective solution. But large arrays with heavy workloads may benefit from a dedicated hardware accelerated RAID controller.

Expanding RAID 5 Arrays

Once initial RAID 5 arrays are deployed, future expansion should be considered. Arrays can be expanded by:

  • Replacing smaller drives with higher capacity drives – Easy to do but requires purchasing new drives.
  • Adding additional RAID 5 disk sets – Increase usable space while keeping redundancy of original set intact.
  • Migrating to larger RAID 5 array – More disruptive but allows changing RAID level if needed.

Careful capacity planning is needed to determine the most cost and performance efficient expansion strategy.

Conclusion

In summary, the choice between 3 vs 4 disks for RAID 5 comes down to balancing cost, redundancy, and performance:

  • 3 disk RAID 5 minimizes initial purchase cost but has drawbacks for performance and storage efficiency.
  • 4 disk RAID 5 provides the ideal blend of capacity, fault tolerance, and performance for most implementations.
  • RAID 6 dual parity helps mitigate rebuild risk for large capacity drives, at the cost of one more disk's worth of capacity.

When in doubt, 4 drive RAID 5 is typically the recommended starting point for general purpose storage using up to 8TB drives. Carefully match drives, consider enterprise models, and plan capacity expansion to get the most from a RAID 5 array deployment.