How many storage devices for RAID 5?

RAID 5 is a type of RAID (Redundant Array of Independent Disks) that provides data redundancy and fault tolerance by using distributed parity and striping (RAID.com, 2022). In RAID 5, data is striped across multiple drives, similar to RAID 0, but it also utilizes distributed parity information that allows for data recovery in case a drive fails (TechTarget, 2022).

Specifically, RAID 5 splits and stripes data across three or more drives. For each stripe, it also computes a parity block and stores it on one of the drives (PCMag, 2022). The drive that holds parity rotates from stripe to stripe, so the parity information is distributed evenly across all drives rather than kept on one dedicated disk. This allows the array to withstand a single drive failure without data loss, as the parity information can be used to reconstruct the data that was on the failed drive (NetworkEncyclopedia, 2022).
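The rotation described above can be sketched as a small layout map. This is only an illustration: the function name `raid5_layout` and the specific rotation order (parity starting on the last drive and moving left) are assumptions, and real controllers and software RAID implementations use several different layouts.

```python
def raid5_layout(num_drives: int, num_stripes: int) -> list[list[str]]:
    """Return a stripe-by-stripe map: 'P' marks parity, 'D<n>' marks data blocks."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    layout = []
    block = 0
    for stripe in range(num_stripes):
        # Assumed rotation: parity starts on the last drive and moves one
        # drive to the left on each following stripe.
        parity_drive = (num_drives - 1 - stripe) % num_drives
        row = []
        for drive in range(num_drives):
            if drive == parity_drive:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        layout.append(row)
    return layout


# Each printed row is one stripe; each column is one drive.
for row in raid5_layout(num_drives=3, num_stripes=3):
    print(" ".join(f"{cell:>3}" for cell in row))
```

With 3 drives and 3 stripes, every drive ends up holding exactly one parity block, which is the point of distributing parity instead of dedicating a disk to it.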

RAID 5 provides better read performance than a single drive, along with protection against drive failure. However, write performance suffers because parity must be recalculated and rewritten on every write. RAID 5 is commonly used for transaction databases, email servers, and other applications that require fault tolerance and moderate performance (PCMag, 2022).

Minimum Number of Drives

The minimum number of drives required for a RAID 5 configuration is 3. This is the smallest array in which each stripe can hold two data blocks plus a parity block. With only 2 drives, the parity of a single data block would simply be a copy of that block, which is mirroring (RAID 1) rather than RAID 5.

RAID 5 requires a minimum of 3 drives because data is striped across all drives, while one drive's worth of capacity is used for parity information. RAID 5 has no dedicated parity drive; each parity block holds error-correction data calculated from the data blocks in the same stripe. If any one drive fails, its contents can be recreated from the data and parity on the surviving drives. So with 3 total drives, one drive can fail while data integrity is maintained, and the usable capacity is that of 2 drives.
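To make the recovery step concrete, here is a minimal sketch of one stripe on a 3-drive array, assuming parity is a plain byte-wise XOR of the data blocks (the usual choice for RAID 5). The helper name `xor_blocks` and the sample data are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 3-drive array: two data blocks plus parity.
data_a = b"RAID"
data_b = b"five"
parity = xor_blocks([data_a, data_b])

# Simulate losing the drive that held data_b, then rebuild it from the survivors.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)
```

The same XOR works no matter which of the three blocks is lost, which is why a single failure is always recoverable. If two blocks in the stripe are missing, one remaining block is not enough to solve for both.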

Drive Failure Tolerance

A key characteristic of RAID 5 is its ability to withstand a single drive failure without losing data. This is made possible through the use of parity information that is distributed across all the drives in the array. Parity allows the system to reconstruct data in the case of a failed drive [1]. Specifically, RAID 5 requires a minimum of 3 drives in order to provide fault tolerance. With 3 drives, if one fails, the remaining two drives can be used to recalculate the data that was on the failed drive. This provides protection against a single drive failure.

However, RAID 5 provides no protection against multiple drive failures. If two drives fail at the same time, or a second drive fails before the first has been rebuilt, the parity information is not sufficient to rebuild the lost data [2]. Therefore, the maximum number of drive failures that can be tolerated in RAID 5 without data loss is one. To protect against two drive failures, an alternative RAID level like RAID 6 would be required.

Performance

RAID 5 provides decent read performance, but write performance is slower compared to RAID 0 or RAID 1 due to the parity calculations involved. During writes, the parity information needs to be updated, which adds overhead: a small write typically means reading the old data and old parity, then writing the new data and new parity.
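The parity update on a small write can be sketched as follows, assuming XOR parity and the common read-modify-write approach. The function names and sample bytes are hypothetical, and real implementations may instead recompute parity from the whole stripe when most of it is being rewritten.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Update parity for one changed block without reading the rest of the stripe."""
    # I/O cost of this path: read old data, read old parity,
    # write new data, write new parity (four operations for one logical write).
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Hypothetical 4-drive stripe: three data blocks plus one parity block.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xaa\xbb"
parity = xor_bytes(xor_bytes(d0, d1), d2)

new_d1 = b"\xff\x00"
new_parity = small_write_parity(d1, parity, new_d1)

# The shortcut must match a full recomputation over the updated stripe.
assert new_parity == xor_bytes(xor_bytes(d0, new_d1), d2)
```

One logical write turning into four disk operations is the main reason RAID 5 writes lag behind RAID 0, which performs no parity update at all.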

According to benchmarks on AnandTech (https://www.anandtech.com/show/3204), a RAID 5 array with 4 drives achieved:

  • Sequential read speeds up to 230 MB/s
  • Sequential write speeds around 180 MB/s

By comparison, RAID 0 had read/write speeds of over 400 MB/s. The parity calculations during writes in RAID 5 result in slower performance.

RAID 5 read speeds can approach those of RAID 0 in some configurations, but writes can be 50% slower or more. However, RAID 5 provides fault tolerance, unlike RAID 0. For applications requiring high write performance, RAID 10 may be a better option despite the higher cost.

Rebuild Times

One of the key considerations with RAID 5 is how long it takes to rebuild the array after a drive failure. When a drive in a RAID 5 array fails, the system needs to rebuild the data that was on the failed drive using parity information spread across the remaining drives.

According to TechTarget, as HDD sizes increase, RAID 5 rebuild times rise significantly, which raises the risk of another drive failing during the rebuild process. Rebuilding an array of large-capacity HDDs can take many hours or even days.

For example, according to a discussion on TrueNAS forums, rebuilding a 6TB HDD in a RAID 5 array could take around 5-6 hours. With larger 14TB drives, rebuild times could be 24 hours or more.

The longer rebuild times with modern high-capacity HDDs are a major downside of RAID 5. If another drive fails during this time, data loss could occur. For mission-critical data, alternatives like RAID 6 or RAID 10 may be preferable despite the increased cost.
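A rough lower bound on rebuild time follows from simple arithmetic: the replacement drive has to be written in full, so the time is at least the drive capacity divided by the sustained rebuild rate. The sketch below assumes decimal terabytes and a hypothetical rate of 150 MB/s; actual rates depend on the drives, the controller, and how busy the array is during the rebuild.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Estimate the hours needed to rewrite one whole drive at a sustained rate."""
    drive_bytes = drive_tb * 1e12  # decimal terabytes, as drives are marketed
    seconds = drive_bytes / (rebuild_mb_per_s * 1e6)
    return seconds / 3600


# 150 MB/s is an assumed rate chosen for illustration, not a measured figure.
for size_tb in (6, 14):
    print(f"{size_tb} TB at 150 MB/s: {rebuild_hours(size_tb, 150):.1f} h")
```

Because the estimate scales linearly with capacity while drive throughput has grown much more slowly, larger drives translate almost directly into longer exposure to a second failure.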

Cost Efficiency

RAID 5 offers good cost efficiency compared to other RAID levels, making it a popular choice for many situations. Because it stores parity instead of full copies, RAID 5 needs fewer disks than RAID 1 mirroring to reach the same usable capacity while still tolerating a single drive failure. According to one Reddit user, going with a 4-bay RAID 5 configuration yields better performance at an overall cost savings of about $50 compared to RAID 1.

With RAID 5, only one disk's worth of space is used for parity, whereas RAID 6 uses two. This allows RAID 5 to provide more usable capacity than RAID 6 for the same number of disks. One source notes that because no extra disks are occupied for mirroring as in RAID 1, the costs for RAID 5 are immediately lower. Overall, RAID 5 provides a good balance of redundancy and capacity while keeping costs down.
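The capacity trade-off can be compared with a short calculation. The sketch below assumes equal-size drives and treats RAID 1 as a single mirror set in which every drive holds a full copy; the function name `usable_drives` and the 4 x 4 TB example are hypothetical.

```python
def usable_drives(level: str, n: int) -> int:
    """Return how many drives' worth of capacity is usable with n equal drives."""
    # Each entry: (minimum number of drives, usable-capacity formula).
    rules = {
        "RAID 0": (2, lambda d: d),        # striping only, no redundancy
        "RAID 1": (2, lambda d: 1),        # assumed: all drives mirror one another
        "RAID 5": (3, lambda d: d - 1),    # one drive's worth of parity
        "RAID 6": (4, lambda d: d - 2),    # two drives' worth of parity
        "RAID 10": (4, lambda d: d // 2),  # striped mirrored pairs
    }
    minimum, formula = rules[level]
    if n < minimum:
        raise ValueError(f"{level} needs at least {minimum} drives")
    if level == "RAID 10" and n % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return formula(n)


drive_tb = 4
for level in ("RAID 0", "RAID 1", "RAID 5", "RAID 6", "RAID 10"):
    print(f"{level:>7}: {usable_drives(level, 4) * drive_tb} TB usable from 4 x {drive_tb} TB")
```

For a given number of drives, RAID 5 gives up the least capacity of the redundant levels listed here, which is the basis of its cost advantage.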

Drawbacks of RAID 5

While RAID 5 offers redundancy and can withstand a single drive failure, it has some notable downsides to consider:

Slower write speeds – Writing data in RAID 5 can be slower than writing to a single drive or a mirror (RAID 1) because parity must be updated on every write. A small write requires reading the old data and old parity, then writing the new data and new parity, which adds substantial overhead (TechTarget).

Long rebuild times – When a failed drive is replaced, rebuilding the RAID can take a very long time because every surviving drive must be read in full to reconstruct the missing data. This leaves the system vulnerable during the rebuild process. According to TechTarget, long rebuild times are a major drawback of RAID 5 and can result in data loss.

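Degraded performance during rebuilds – A RAID 5 array typically stays online while it rebuilds, but it runs in a degraded state. Reads that touch the missing drive must be reconstructed from parity, and the rebuild competes with normal I/O. For large drive sizes or arrays with many disks, this can mean a long period of reduced performance.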

Vulnerable to failure during rebuilds – If a second drive fails during the rebuild process, the entire RAID 5 array will be lost. The long rebuild times increase this risk of failure.

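Capacity lost to parity – One drive's worth of capacity is consumed by parity, so usable capacity is lower than that of RAID 0 with the same disks. In a 3-drive array this is a third of the raw space, although the share shrinks as drives are added. RAID 6, by comparison, gives up two drives' worth.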

Alternatives

RAID 6 and RAID 10 are often used as alternatives to RAID 5, depending on an organization’s priorities. RAID 6 offers better protection against multiple drive failures by using double distributed parity but comes at the cost of reduced write performance compared to RAID 5 (RAID 6 vs. RAID 10). RAID 10 provides faster read/write speeds by striping and mirroring data across drives but gives up half of the raw capacity to mirroring. Both RAID 6 and RAID 10 require a minimum of 4 drives (RAID 6 vs. RAID 10: Which is better to prevent data loss?).

RAID 4 is an older alternative that stripes data like RAID 5 but keeps all parity on a single dedicated drive. Because every write must update that one drive, it becomes a bottleneck and reduces write performance. RAID 4 is rarely used today in favor of RAID 5 or RAID 6 (RAID 5 and why it isn’t enough).

Use Cases

RAID 5 is commonly used in environments where a balance between data protection, performance, and storage efficiency is needed. Some ideal scenarios and applications for using RAID 5 include:

File Servers – RAID 5 provides excellent redundancy for file storage while maximizing disk space utilization. This makes it a popular choice for file servers that need to maximize storage capacity.

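Database Servers – Read-heavy databases benefit from RAID 5’s read performance. Striping spreads reads across all drives in the array; the parity itself does not speed up reads. Write-heavy databases are a poorer fit because of the parity overhead on writes.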

Virtualization – RAID 5 can help maximize storage for virtual machine data stores while providing redundancy. The striped read performance helps support numerous VMs, provided their workloads are not write-intensive.

Transaction Processing – The faster read performance of RAID 5 improves transactional database access and processing for applications like ERP, CRM, ecommerce, etc.

Media Production – Media production workflows require fast, redundant storage for video files, audio, images, etc. RAID 5 balances performance and protection.

Backup Storage – Archival and backup storage benefits from RAID 5’s storage efficiency while still providing redundancy against drive failures.

General Purpose – For general purpose file storage, RAID 5 provides a good combination of redundancy, storage utilization, and performance.

Source: https://mycloudwiki.com/san/raid-5-overview/

Conclusion

In summary, RAID 5 storage requires a minimum of 3 drives to implement. It can tolerate the failure of 1 drive without data loss by using parity data distributed across the drives. Performance is decent but not as fast as RAID 0 or 10 due to the parity calculations. Rebuild times can be lengthy with larger drive sizes. RAID 5 provides a good balance of redundancy, performance, and cost efficiency for many use cases but does come with some drawbacks, such as slower writes and exposure to a second drive failure during rebuilds. Careful consideration should be given to alternatives like RAID 6, RAID 10, or other erasure coding schemes for mission-critical data. Overall, RAID 5 remains a popular choice for redundant storage in servers and NAS devices when cost efficiency and redundancy are top priorities.