How Many Drives for RAID Level 5?

RAID level 5 is a commonly used RAID configuration that provides efficient storage capacity and a high level of data protection. Determining the right number of drives for a RAID 5 setup requires understanding the key factors that impact performance, capacity, and fault tolerance.

What is RAID Level 5?

RAID Level 5 is a redundant array of independent disks (RAID) configuration that stripes data and parity information across 3 or more drives. The data blocks and parity information are distributed across the drives so that if any single drive fails, the data can be reconstructed from the remaining data and parity blocks. This provides fault tolerance and allows the array to continue operating with one failed drive.
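The reconstruction described above can be illustrated with a small sketch. It assumes the parity block is the bytewise XOR of the data blocks in a stripe, which is how RAID 5 parity is typically computed. The block contents and the helper name xor_blocks are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 3-drive array: two data blocks plus one parity block.
d0 = b"RAID"
d1 = b"five"
parity = xor_blocks([d0, d1])

# Simulate losing the drive that held d1, then rebuild it from the survivors.
rebuilt_d1 = xor_blocks([d0, parity])
print(rebuilt_d1 == d1)
```

The same XOR call rebuilds whichever block is missing, data or parity, because XOR-ing all surviving blocks of a stripe always yields the lost one.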

In addition to fault tolerance, RAID 5 provides good performance and efficient storage capacity. By striping data across multiple disks, RAID 5 allows for parallel disk I/O which improves performance compared to a single disk. Because it only requires parity information equivalent to one drive’s worth of capacity, RAID 5 is an efficient use of storage capacity.

How Many Drives for RAID 5?

The minimum number of drives required for RAID 5 is 3. With 3 drives, the array devotes the equivalent of 1 drive’s capacity to parity, which is what provides fault tolerance. Beyond that minimum, several key factors determine the optimal number of drives for a RAID 5 configuration:

  • Storage capacity needs
  • Performance requirements
  • Number of drive failures that can be tolerated
  • Cost considerations

Storage Capacity

Adding more disks to a RAID 5 array increases the total storage capacity, while the amount of capacity dedicated to parity remains constant at 1 disk’s worth. For example, a 3 drive RAID 5 provides 2 drives’ worth of usable capacity, with the remaining drive’s worth holding parity. A 4 drive RAID 5 provides 3 drives’ worth of usable capacity and 1 drive’s worth of parity. Because parity is distributed across all drives, no single drive acts as a dedicated parity drive. The formula is:

Total Capacity = (Number of Drives – 1) * Individual Disk Size

More drives allow for greater overall storage capacity for the array. The tradeoff is increased cost for more drives.
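As a quick check of the formula, here is a minimal sketch that computes usable capacity and storage efficiency for a few drive counts. The 4 TB drive size and the function names are assumptions chosen for illustration.

```python
def raid5_usable_capacity(num_drives: int, drive_size_tb: float) -> float:
    """Usable capacity = (number of drives - 1) * individual drive size."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (num_drives - 1) * drive_size_tb

def raid5_efficiency(num_drives: int) -> float:
    """Fraction of the raw capacity that is available for data."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (num_drives - 1) / num_drives

# Hypothetical 4 TB drives.
for n in (3, 4, 6, 8, 12):
    usable = raid5_usable_capacity(n, 4.0)
    print(f"{n} drives: {usable:.0f} TB usable, {raid5_efficiency(n):.0%} efficient")
```

Efficiency climbs as drives are added because the parity overhead stays fixed at one drive’s worth.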

Performance

RAID 5 performance is impacted by the number of drives in several ways:

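  • More drives increase parallelism since data is striped across more disks. This can improve read performance and large sequential throughput.
  • More drives also widen each stripe, so a write must be larger to fill a whole stripe. Writes that cover only part of a stripe must read old data and parity before writing, which can hinder performance.
  • More drives do not shorten rebuilds. A rebuild has to read every surviving drive, so a wider array puts more drives under load and raises the chance of a second failure before the rebuild finishes.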

Performance testing can determine the RAID 5 optimization point for a particular workload. Typically 4-8 drives provide a good balance for many workloads.

Fault Tolerance

RAID 5 tolerates exactly one drive failure, regardless of how many drives are in the array. Adding drives does not add redundancy; it increases the chance that a second drive fails while the array is degraded. If continued operation with 2 failed drives is needed, a RAID 6 configuration with dual distributed parity is required.

Cost

More drives mean higher cost. The equipment cost for RAID enclosures, HBAs, and drives can add up quickly. The ideal number of drives provides the required capacity and performance at a reasonable hardware cost.

RAID 5 Drive Count Examples

Here are some examples of RAID 5 drive counts for different scenarios:

4 Drive RAID 5

  • Usable capacity is 3 x single drive size
  • Good performance from striping across 4 spindles
  • Low hardware cost
  • Tolerates 1 drive failure

4 drives is a good option for small arrays that need some redundancy on a budget.

6 Drive RAID 5

  • Usable capacity is 5 x single drive size
  • Great performance benefit from 6 spindles
  • Reasonable hardware cost
  • Tolerates 1 drive failure

6 drives provides fast performance while keeping cost manageable.

8 Drive RAID 5

  • Usable capacity is 7 x single drive size
  • Excellent performance from striping across 8 spindles
  • Higher hardware cost
  • Tolerates 1 drive failure

8 drives is a sweet spot for performance while still tolerating 1 disk failure.

12 Drive RAID 5

  • Usable capacity is 11 x single drive size
  • Highest read throughput of these examples, from 12 spindles
  • Much higher hardware cost
  • Tolerates 1 drive failure

At 12 drives the hardware cost starts getting steep, and a single drive’s worth of parity is protecting a wide array. This may still be justified for certain workloads requiring very high performance and capacity.

When to choose RAID 5?

RAID 5 is a good choice when these factors apply:

  • High storage capacity is needed
  • Good random read performance is required
  • The workload is read-heavy, such as transactional applications dominated by lookups
  • Budget allows for at least 3 drives
  • Fast rebuild times are not critical

Read-heavy databases and other business applications are a good fit for RAID 5’s strengths.

RAID 5 Performance Considerations

RAID 5 performance behavior depends on the typical size of I/O requests. Some guidelines:

  • Small block random reads achieve high IOPS
  • Large sequential reads perform very well
  • Large sequential writes that fill whole stripes perform well because parity is computed without reading old data
  • Small random writes and write-heavy mixed workloads perform poorly due to the write penalty

Use RAID 10 if the workload is dominated by small random writes and write performance is more important than overall capacity.
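The write penalty mentioned above can be turned into a rough estimate. The sketch below uses a common rule of thumb, assumed here rather than taken from any vendor specification: each small RAID 5 write costs 4 back-end I/Os (read data, read parity, write data, write parity), and each RAID 10 write costs 2. The per-drive IOPS figure and the function name are hypothetical, and real arrays with caching will behave differently.

```python
# Rule-of-thumb back-end I/Os per front-end write (an assumption, not a measurement).
WRITE_PENALTY = {"raid5": 4, "raid10": 2}

def estimated_iops(num_drives: int, iops_per_drive: float,
                   read_fraction: float, level: str = "raid5") -> float:
    """Estimate front-end IOPS for a given read/write mix."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw_iops = num_drives * iops_per_drive
    write_fraction = 1.0 - read_fraction
    return raw_iops / (read_fraction + write_fraction * WRITE_PENALTY[level])

# Hypothetical 8-drive array of 150-IOPS drives at several read/write mixes.
for read_fraction in (1.0, 0.7, 0.3):
    r5 = estimated_iops(8, 150, read_fraction, "raid5")
    r10 = estimated_iops(8, 150, read_fraction, "raid10")
    print(f"{read_fraction:.0%} reads: RAID 5 ~{r5:.0f} IOPS, RAID 10 ~{r10:.0f} IOPS")
```

The gap between the two levels widens as the write share grows, which is why write-heavy workloads tend to favor RAID 10.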

When Not to Use RAID 5

Avoid RAID 5 in these scenarios:

  • Rebuilding a failed drive takes too long and hurts availability
  • Need to tolerate more than 1 drive failure
  • Primarily small block random writes
  • Heavy write workloads

For faster rebuilds and better write performance, consider RAID 10, which mirrors data across drives. To tolerate more than 1 drive failure, consider RAID 6.

RAID 5 Rebuild Time Considerations

When a drive fails in a RAID 5, the entire array is in a degraded state until the failed drive is replaced and data rebuilt. During this time, the array is vulnerable to a second drive failure that would cause data loss. Therefore, it’s important to minimize rebuild times.

Factors that influence RAID 5 rebuild time include:

  • Drive capacity – higher capacity drives take longer to rebuild
  • Number of drives – every surviving drive must be read in full, so more drives means more total data to read
  • Rebuild prioritization – giving rebuild I/O priority over client I/O shortens the rebuild but slows applications
  • Drive speed – faster drives rebuild quicker

If frequent rebuilds are expected, use smaller drives and consider adjusting the rebuild I/O priority level.
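To get a feel for how drive capacity and speed affect rebuild time, the following sketch computes a best-case lower bound: the time needed to write one full replacement drive at a sustained rate. The rebuild rates are hypothetical, and real rebuilds usually take longer because client I/O competes with rebuild I/O.

```python
def min_rebuild_hours(drive_capacity_tb: float, rebuild_rate_mb_s: float) -> float:
    """Best-case hours to rewrite one whole drive at a sustained rate.

    Uses decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
    """
    if rebuild_rate_mb_s <= 0:
        raise ValueError("rebuild rate must be positive")
    seconds = (drive_capacity_tb * 1e12) / (rebuild_rate_mb_s * 1e6)
    return seconds / 3600

# Hypothetical drive sizes, each at an idle and a throttled rebuild rate.
for capacity_tb in (2, 8, 16):
    for rate in (150, 50):
        hours = min_rebuild_hours(capacity_tb, rate)
        print(f"{capacity_tb} TB at {rate} MB/s: at least {hours:.1f} hours")
```

Because the estimate scales linearly with capacity, doubling the drive size doubles the window in which the array is exposed to a second failure.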

Alternative RAID Options

Instead of RAID 5, other RAID levels can be used depending on requirements:

RAID 10

  • Block-level striping + mirroring between drives
  • High performance, especially for large I/O
  • Very fast rebuilds
  • 50% storage efficiency
  • Tolerates multiple drive failures as long as no mirrored pair loses both drives

RAID 6

  • Block-level striping with double distributed parity
  • Tolerates 2 drive failures
  • Slower writes than RAID 5 due to dual parity calculation
  • Less efficient than RAID 5 – Capacity = (Number of Drives – 2) * Individual Disk Size

RAID 50/60

  • Combination of RAID levels
  • RAID 50 = Striped RAID 5 arrays
  • RAID 60 = Striped RAID 6 arrays
  • Large capacity; can survive multiple drive failures when they fall in different sub-arrays
  • Rebuilds are faster than in a single large RAID group
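The capacity figures for RAID 5, RAID 6, and RAID 10 listed above can be compared side by side with a short sketch. It assumes identical drives and, for RAID 10, two-way mirrors. The function name and level labels are illustrative only, and RAID 50/60 are left out because their capacity depends on how the sub-arrays are laid out.

```python
def usable_drive_count(level: str, num_drives: int) -> int:
    """Return how many drives' worth of capacity is usable for a RAID level."""
    if level == "raid5":
        if num_drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return num_drives - 1
    if level == "raid6":
        if num_drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return num_drives - 2
    if level == "raid10":
        # Assumes two-way mirrors, so half of the drives hold copies.
        if num_drives < 4 or num_drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return num_drives // 2
    raise ValueError(f"unsupported level: {level}")

# Compare the three levels on a hypothetical 8-drive array.
for level in ("raid5", "raid6", "raid10"):
    usable = usable_drive_count(level, 8)
    print(f"{level}: {usable} of 8 drives usable ({usable / 8:.0%})")
```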

Software vs Hardware RAID

RAID 5 can be implemented in software or hardware. Main differences:

Software RAID

  • RAID logic is handled by OS or hypervisor
  • Simple management and flexible configuration
  • Higher CPU overhead
  • Avoid for highly intensive workloads

Hardware RAID

  • Dedicated RAID controller with onboard processing
  • Offloads RAID tasks from main CPU(s)
  • Battery-backed write cache
  • More expensive

Hardware RAID is often preferred for mission-critical systems, while software RAID provides a cost-effective option.

RAID Controller Considerations

For hardware RAID, key aspects of RAID controllers include:

  • RAID Level Support – Ensure controller supports needed RAID levels like RAID 5.
  • Cache Memory – Larger cache improves write performance. Battery backup protects cached data if power is lost.
  • Processing Power – A faster controller processor speeds up parity calculations and rebuilds.
  • I/O Modules – More ports allow connecting more drives.
  • Management Interface – GUI, CLI, or API for monitoring and configuration.

Matching controller capabilities to workload demands ensures optimal RAID 5 performance.

Drive Considerations for RAID 5

Some best practices for selecting drives for RAID 5 arrays:

  • Use enterprise class drives for better reliability and performance.
  • Choose drives with suitable capacity. Larger drives rebuild slower but provide more capacity.
  • Consider SSDs for better performance if the budget allows, or use SSD cache drive(s).
  • Use drives with similar RPM and capacity to balance performance.
  • Stagger drive firmware updates to avoid multiple failures during updates.

Monitoring RAID Health

Monitoring tools can provide insight into the health and performance of a RAID 5 array. Useful metrics include:

  • Current rebuild progress and estimated time to completion
  • Drive temperatures and SMART attributes
  • Spare drive capacity
  • I/O response times and throughput
  • CPU utilization
  • Recovery and resynchronization rates

Alert thresholds can warn when metrics go out of normal operational range and indicate problems.
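A threshold check of this kind might look like the sketch below. The metric names and limits are hypothetical placeholders, not values from any particular monitoring tool or controller. In practice the readings would come from whatever the controller or operating system exposes.

```python
# Hypothetical upper limits; tune these for the actual hardware and workload.
THRESHOLDS = {
    "drive_temp_c": 55,
    "io_latency_ms": 50,
    "cpu_percent": 90,
}

def check_metrics(metrics: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return one alert message for each metric above its threshold."""
    return [
        f"{name}={value} exceeds {thresholds[name]}"
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    ]

# Example reading with one metric out of range.
sample = {"drive_temp_c": 60, "io_latency_ms": 12, "cpu_percent": 35}
for alert in check_metrics(sample):
    print("ALERT:", alert)
```

Metrics without a configured threshold are ignored rather than treated as errors, so new readings can be collected before limits are decided.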

Conclusion

The ideal number of drives for RAID 5 depends on capacity, performance, fault tolerance, and budget requirements. A minimum of 3 drives is required, but 4-8 drives offer a good balance for many workloads. RAID 5 works best for read-heavy applications, since small random writes pay a write penalty. It’s important to consider rebuild times: use RAID 10 if fast rebuilds are critical, and RAID 6 if the array must survive a second drive failure. Proper RAID controller selection and enterprise class drives are recommended to optimize performance and reliability.