What are the optimal drives for RAID 5?

RAID 5 is a type of redundant array of independent disks (RAID) that combines striping (as in RAID 0) with distributed parity for data protection and faster reads. RAID 5 requires at least three disks and is one of the most commonly used RAID levels[1].

With RAID 5, data is striped across all the drives in the array, just like with RAID 0. Unlike RAID 0, however, RAID 5 also stores parity information that is distributed across the drives. The parity information allows for data recovery in the event of a single drive failure. If a drive in the array fails, the missing data can be recreated using the parity information[2].

Some key benefits of RAID 5 include:

  • Increased read performance compared to a single drive due to striping
  • Ability to withstand a single drive failure without data loss
  • Gives up only one drive’s worth of capacity to redundancy, whereas RAID 1 and RAID 10 give up half
  • Lower cost per usable terabyte than RAID 1 or RAID 10

Overall, RAID 5 provides a good balance of performance, redundancy, and efficiency that makes it a popular choice for many applications.

[1] https://www.pcmag.com/encyclopedia/term/raid-5
[2] https://networkencyclopedia.com/raid-5-volume/

How Data is Stored in RAID 5

RAID 5 uses distributed parity to protect against data loss in the event of a drive failure. Data is striped across all drives in the array in blocks. Along with the data blocks, parity information is also written across the drives[1]. Parity allows for the reconstruction of data in the event a drive fails.

Parity is calculated using XOR operations across the data blocks in a stripe. The position of the parity block rotates from stripe to stripe, so every drive holds parity for some stripes and no single drive acts as a dedicated parity disk. If a drive fails, each lost block can be reconstructed by XORing the surviving data and parity blocks of the same stripe[1].
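The XOR relationship can be illustrated with a short Python sketch. The block contents, the 4-drive stripe, and the helper name xor_blocks are assumptions made up for this example; a real controller works on much larger blocks and in hardware or kernel code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-drive array: three data blocks plus one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the drive that held the second data block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])
```

Because XOR is its own inverse, the same function both produces the parity block and recovers any single missing block from the rest of the stripe.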

When a failed drive is replaced, the RAID controller regenerates the lost blocks and writes them to the replacement drive. Until that rebuild completes, the array keeps operating in a degraded state without redundancy; full redundancy returns once the rebuild finishes.

[1] https://community.spiceworks.com/topic/1927399-dell-server-throwing-ntfs-error-id-55-corrupt-file-system-on-c

Optimal Drive Counts

RAID 5 requires a minimum of 3 drives in order to provide fault tolerance and data redundancy. With just 2 drives, the parity of a single data block is simply a copy of that block, so the layout would amount to a RAID 1 mirror. The usable capacity of a RAID 5 array is that of the smallest drive times (N-1), where N is the number of drives. So in a 3 drive array, you get the capacity of 2 drives.
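As a rough illustration of the (N-1) rule, the following Python sketch computes usable capacity and storage efficiency for a few drive counts. The function names and the 4 TB drive size are illustrative assumptions, not part of any RAID tool.

```python
def raid5_usable_tb(drive_count, drive_tb):
    """Usable capacity of a RAID 5 array built from equal-size drives."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drive_count - 1) * drive_tb


def raid5_efficiency(drive_count):
    """Fraction of raw capacity left for data after parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drive_count - 1) / drive_count


# Hypothetical arrays of 4 TB drives.
for n in (3, 4, 6, 8):
    print(f"{n} drives: {raid5_usable_tb(n, 4)} TB usable, "
          f"{raid5_efficiency(n):.0%} of raw capacity")
```

Efficiency rises quickly at first and then flattens, which is one reason adding drives beyond a handful yields smaller and smaller capacity gains per drive.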

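Increasing the number of drives does not increase redundancy: a RAID 5 array tolerates exactly one drive failure regardless of its size. If one drive fails, the array can still operate and be rebuilt after replacing the failed drive. However, more drives make it more likely that some drive fails, and a rebuild must read every surviving drive in full, so a larger array means more data to read and more exposure to a second failure before the rebuild completes.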

Performance and capacity efficiency both increase up to about 6-8 drives in a RAID 5 array. Beyond this, performance gains diminish significantly and rebuild times start becoming impractical. A common recommendation is to stay in the 3-8 drive range for optimal performance and manageable failure recovery.[1]

Going beyond 8 drives is not recommended for RAID 5 unless absolutely necessary. At that point, other RAID levels like RAID 6 become better options.

Optimal Drive Capacities for RAID 5

When setting up a RAID 5 array, using drives with matching capacities is ideal. This allows the full capacity of each drive to be utilized without any storage going to waste. With mismatched drive sizes, every drive is treated as if it were the size of the smallest one, so the total capacity available is limited to the size of the smallest drive multiplied by the number of drives minus one for parity.
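To make the cost of mismatched drives concrete, here is a small Python sketch that reports both usable and stranded capacity. The function name and the example drive sizes are assumptions chosen for illustration, and the model covers only standard RAID 5, not vendor schemes that pool leftover space.

```python
def raid5_mixed_capacity(drive_sizes_tb):
    """Return (usable, unused) capacity in TB for a standard RAID 5 array.

    Every member contributes only as much space as the smallest drive.
    """
    if len(drive_sizes_tb) < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    smallest = min(drive_sizes_tb)
    usable = smallest * (len(drive_sizes_tb) - 1)
    unused = sum(size - smallest for size in drive_sizes_tb)
    return usable, unused


# Hypothetical arrays: one mismatched, one matched.
print(raid5_mixed_capacity([2, 4, 4]))
print(raid5_mixed_capacity([4, 4, 4]))
```

In the mismatched case the space above the smallest drive's size on each larger drive is left out of the array entirely.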

For home and small office use, drives in the range of 2-6TB offer a good balance of capacity versus cost. For larger storage needs, 8-14TB enterprise-class drives are recommended to build higher capacity arrays. If drive capacities must be mixed, keep the sizes as close as possible: standard RAID 5 uses only the smallest drive’s capacity on every member, so pairing 2TB and 4TB drives leaves half of each 4TB drive unused.

Ultimately, the optimal drive capacity depends on the total storage needs and budget. Using matching higher capacity drives allows building RAID 5 arrays with abundant storage and room to expand in the future.

Optimal Drive Speeds

When selecting drives for a RAID 5 array, it’s important to match the rotational speeds (RPMs) of the drives as closely as possible. Mixing drives with drastically different RPMs can lead to performance bottlenecks.

For example, if you combine 7200 RPM and 5400 RPM drives in the same RAID 5 array, reads and writes will be limited by the slower 5400 RPM drives. The faster drives will frequently need to wait for the slower ones during operations. This mismatch limits the overall performance of the array.
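A deliberately simplified Python model shows why the slowest member sets the pace. It assumes large sequential transfers in which every drive must deliver its chunk before a stripe is complete, and it ignores caching, controller limits, and seek behavior. The throughput figures (180 MB/s and 120 MB/s) are illustrative assumptions, not measurements of any specific drive.

```python
def estimated_stripe_throughput(drive_mb_per_s):
    """Rough sequential throughput of a RAID 5 array in MB/s.

    Each stripe carries (N - 1) data chunks, and the stripe is only
    complete when the slowest drive has finished its chunk.
    """
    if len(drive_mb_per_s) < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    data_chunks_per_stripe = len(drive_mb_per_s) - 1
    return data_chunks_per_stripe * min(drive_mb_per_s)


# Hypothetical 4-drive arrays: all fast drives vs. one slower drive mixed in.
matched = estimated_stripe_throughput([180, 180, 180, 180])
mixed = estimated_stripe_throughput([180, 180, 180, 120])
print(matched, mixed)
```

Under these assumptions a single slower drive pulls the whole array down to the rate it would have if every member were that slow.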

In general, enterprise-class 7200 RPM HDDs tend to offer the best balance of performance and value for RAID 5 arrays. While 10,000-15,000 RPM drives are faster, they come at a significant cost premium. HDDs in the 7200 RPM range provide reasonably fast throughput while keeping expenses manageable.

For optimal performance, all the drives in a RAID 5 array should match in terms of RPM rating. Mixing drive speeds can work in less performance-sensitive environments, but matching speeds is strongly recommended for business-critical applications.

Sources:

[1] https://www.arcserve.com/blog/understanding-raid-performance-various-levels

[2] https://community.spiceworks.com/topic/2141833-raid5-speed-question

Optimal Interface

When selecting drives for RAID 5, one key consideration is the drive interface. The three main options are SATA, SAS, and NVMe.

SATA is the most common and affordable interface for traditional hard disk drives (HDDs). SATA 3 provides up to 6 Gb/s of bandwidth, roughly 600 MB/s of usable throughput, which is enough for most HDDs that top out at around 200 MB/s sequential speeds. However, SATA can become a bottleneck with higher-speed solid state drives (SSDs).

SAS is an enterprise-focused interface that offers higher bandwidth, up to 12 Gb/s with the SAS-3 standard. This leaves ample headroom for high-performance drives. However, SAS comes with increased costs and mainly benefits server/business use cases that demand maximum performance. Consumer NAS devices mostly utilize SATA.

NVMe is a very high performance interface designed for SSDs. NVMe drives can reach over 3,000 MB/s sequential read/write speeds. This requires PCIe 3.0 x4 or higher interfaces capable of 32 Gb/s. NVMe offers the highest possible performance but also carries a significant price premium. Support on NAS devices is still limited as of 2020 [1].

For most home and SOHO RAID 5 usage, SATA offers a good balance of affordability and performance. SAS could be considered for high-end business usage, while NVMe remains overkill for typical RAID needs. Carefully match your interface bandwidth to the drive speeds to avoid creating a bottleneck.
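One way to check for an interface bottleneck is to convert each link's line rate into approximate usable throughput and compare it with the drive's speed. The Python sketch below does this using the line rates quoted above and each link's encoding scheme (8b/10b for SATA 3 and SAS-3, 128b/130b for PCIe 3.0). The names INTERFACES, usable_mb_per_s, and is_bottleneck are hypothetical, and real-world throughput is somewhat lower once protocol overhead is counted.

```python
# name: (line rate in Gb/s, payload bits, bits on the wire per payload group)
INTERFACES = {
    "SATA 3": (6.0, 8, 10),
    "SAS-3": (12.0, 8, 10),
    "PCIe 3.0 x4": (32.0, 128, 130),
}


def usable_mb_per_s(name):
    """Approximate usable throughput after line encoding, in MB/s."""
    line_rate_gbps, payload_bits, wire_bits = INTERFACES[name]
    return line_rate_gbps * 1000 / 8 * payload_bits / wire_bits


def is_bottleneck(name, drive_mb_per_s):
    """True when the drive can move data faster than the interface."""
    return drive_mb_per_s > usable_mb_per_s(name)


for name in INTERFACES:
    print(f"{name}: about {usable_mb_per_s(name):.0f} MB/s usable")

print(is_bottleneck("SATA 3", 200))   # typical HDD from the text
print(is_bottleneck("SATA 3", 3000))  # NVMe-class SSD from the text
```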

Balancing Cost vs Performance

When selecting drives for a RAID 5 array, there is a tradeoff between cost, capacity, and performance that should be considered. Generally, higher capacity drives reduce the cost per terabyte of storage, but may have slower performance. Faster drives like enterprise SSDs can significantly improve performance, but at a higher price point.

For most use cases, a balanced approach is recommended. Mid-range SATA or SAS hard drives offer a good blend of affordability and performance for RAID 5. According to one analysis, 7,200 RPM SATA drives provide strong random read speeds for typical workloads at a reasonable cost (https://www.techtarget.com/searchstorage/definition/RAID-5-redundant-array-of-independent-disks).

When optimizing for performance, using 10K or 15K RPM SAS drives can help, albeit at higher cost. For more demanding workloads, SSDs may be justified, but capacity will be lower per dollar spent. Finding the right balance depends on workload requirements, capacity needs, and budget.

Managing Drive Failures

One of the benefits of RAID 5 is that it can remain operational during a single drive failure. When a drive fails, the array enters degraded mode and runs without any redundancy until the failed drive is replaced and the rebuild completes.

To minimize downtime, many RAID 5 setups utilize hot spare drives. These are extra drives that are automatically used by the RAID controller to rebuild the array if a failure occurs. When a drive fails, the hot spare is automatically swapped in to replace it. The RAID then rebuilds onto the hot spare, restoring full redundancy without any administrator intervention.

Well-designed RAID 5 implementations are able to maintain availability through a single drive failure. According to one source, “RAID 5 provides redundant data storage through parity checking, stripping data across drives, and the ability to operate with one failed drive” (https://www.stellarinfo.com/blog/raid-5-recovery-with-one-drive-failure/). With proper hot spare configuration and drive replacements, downtime can be minimized even during rebuilds.

The redundant nature of RAID 5 provides high availability for critical data and applications. By planning for drive failures and leveraging technologies like hot spares, organizations can keep their RAID 5 arrays online and accessible.

Expanding RAID 5 Arrays

As more storage capacity is needed, one option is to expand the existing RAID 5 array by adding additional drives. There are a few considerations when expanding a RAID 5 array:

Adding drives increases the overall storage capacity, but it does not shorten rebuild times, because a rebuild still has to read every surviving drive. Depending on the RAID controller, there may also be limitations on how many drives can be added. According to community forums, some controllers may only support adding 1-2 drives at a time to limit rebuild impact (1).

When adding new drives for capacity expansion, it’s ideal if they are the same size as the existing drives. Smaller drives reduce the capacity used on every member, and larger drives leave their extra space unused. The RAID controller typically also has to restripe the entire array when new drives are added, so performance may be temporarily impacted.

In general, expanding an existing RAID 5 incrementally by adding drives introduces more opportunities for failure compared to creating a new larger array. At some point it becomes more reliable to migrate data to a new RAID 5 set with more drives rather than continuing to expand the original array (2).

When to Consider Alternatives

While RAID 5 can provide a good balance of performance, capacity, and redundancy for many uses, there are some situations where you may want to consider alternative RAID configurations:

If uptime and redundancy are critical, RAID 6 stores a second parity block in every stripe, at the cost of one more drive’s worth of capacity. This allows the array to withstand up to two drive failures without data loss. The tradeoff is decreased usable capacity and write performance compared to RAID 5.

For workloads with heavy write operations, RAID 10 can provide better performance by striping and mirroring data across drives. However, it requires at least four drives and offers less usable capacity than RAID 5.
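The capacity and fault-tolerance tradeoffs between these levels can be compared side by side with a simplified Python model. The function name raid_profile and the 6-drive example are assumptions for illustration; the RAID 10 row assumes the classic layout of mirrored pairs, which guarantees survival of only one failure (more are survivable only if they land in different pairs).

```python
def raid_profile(level, drive_count):
    """Return (usable_drives, guaranteed_failures) for equal-size drives."""
    rules = {
        # level: (minimum drives, usable capacity in drives, failures always survived)
        "RAID 5": (3, lambda n: n - 1, 1),
        "RAID 6": (4, lambda n: n - 2, 2),
        "RAID 10": (4, lambda n: n // 2, 1),
    }
    minimum, usable, failures = rules[level]
    if drive_count < minimum:
        raise ValueError(f"{level} needs at least {minimum} drives")
    if level == "RAID 10" and drive_count % 2:
        raise ValueError("this model assumes mirrored pairs, so an even drive count")
    return usable(drive_count), failures


# Hypothetical 6-drive array under each level.
for level in ("RAID 5", "RAID 6", "RAID 10"):
    usable, failures = raid_profile(level, 6)
    print(f"{level}: {usable} of 6 drives usable, survives any {failures} failure(s)")
```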

If you want flexibility to grow or migrate the array in the future, software RAID solutions avoid vendor lock-in compared to hardware RAID cards. This allows using commodity drives in different configurations.

As drive sizes continue to increase, the rebuild times and risk of unrecoverable read errors also increase with large RAID 5 arrays. At a certain point, alternatives like RAID 6 and 10 become more advisable.
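The effect of drive size on rebuild risk can be estimated with a back-of-the-envelope calculation. The Python sketch below assumes an unrecoverable read error rate of 1 per 10^14 bits read, a figure often quoted on consumer drive datasheets, and treats errors as independent; both are assumptions, and real drives frequently do better, so the results are best read as a pessimistic upper bound. The function name is hypothetical.

```python
import math


def rebuild_ure_probability(drive_count, drive_tb, ure_per_bit=1e-14):
    """Estimate the chance of at least one read error during a RAID 5 rebuild.

    A rebuild reads all (N - 1) surviving drives in full.
    """
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    bits_read = (drive_count - 1) * drive_tb * 1e12 * 8
    # 1 - (1 - p) ** bits, computed with log1p/expm1 for numerical stability.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Hypothetical 4-drive arrays with growing drive sizes.
for size_tb in (2, 4, 12):
    print(f"4 x {size_tb} TB: {rebuild_ure_probability(4, size_tb):.0%}")
```

Under these assumptions the estimated risk climbs steeply with drive size, which is the reasoning behind preferring RAID 6 or RAID 10 for arrays built from very large drives.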