How many disks can RAID 6 lose without losing data?

RAID 6 is a type of redundant array of independent disks (RAID) that is designed to protect against the failure of up to two disks without losing data. This makes RAID 6 an attractive option for setups that require high reliability and uptime.

What is RAID 6?

RAID 6 is a level of RAID that uses block-level striping with double distributed parity. This means that data is distributed across multiple disks in the array, similar to RAID 0, but RAID 6 also generates and stores parity information that gets striped across the disks as well.

The parity information is used to reconstruct data in case of disk failures. Since RAID 6 uses double parity, it can handle the loss of up to two disks without losing access to data. The dual parity provides redundancy and fault tolerance.

How does RAID 6 provide fault tolerance?

RAID 6 provides fault tolerance through its use of double parity. Here’s how it works:

  • Data is striped across all the disks in the RAID 6 array, just like with RAID 0.
  • In addition to striping data, RAID 6 also calculates and writes two sets of parity information to the disks.
  • The first set of parity (P) information contains the XOR result of data blocks in the same stripe.
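  • The second set of parity (Q) information is calculated differently, typically as a Reed-Solomon code: each data block in the stripe is multiplied by a distinct coefficient in a Galois field before the results are XORed together. This makes Q independent of P, which is what allows two missing blocks to be solved for.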

So for every stripe of data written across the disks, two parity blocks are also calculated and written. This provides redundancy that can handle up to two disk failures without data loss.
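The P and Q calculation can be sketched in a few lines. This is a minimal illustration, not any particular controller's implementation: it assumes the field GF(2^8) with polynomial 0x11d and generator 2 (a common choice), uses a single byte per disk instead of a full block, and the function names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(base: int, exp: int) -> int:
    """Raise base to the power exp by repeated GF(2^8) multiplication."""
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero byte: a^254, because a^255 == 1 in GF(2^8)."""
    return gf_pow(a, 254)


def compute_pq(data: list[int]) -> tuple[int, int]:
    """Return (P, Q) for one stripe: P is plain XOR, Q weights block i by 2^i."""
    p, q = 0, 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two_data(data: list[int], p: int, q: int, x: int, y: int) -> tuple[int, int]:
    """Rebuild the data bytes at positions x and y; their current values are ignored."""
    p_rest, q_rest = 0, 0
    for i, d in enumerate(data):
        if i in (x, y):
            continue
        p_rest ^= d
        q_rest ^= gf_mul(gf_pow(2, i), d)
    a = p ^ p_rest  # equals Dx ^ Dy
    b = q ^ q_rest  # equals 2^x * Dx ^ 2^y * Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    # Substituting Dy = a ^ Dx gives (gx ^ gy) * Dx = b ^ gy * a
    dx = gf_mul(b ^ gf_mul(gy, a), gf_inv(gx ^ gy))
    dy = a ^ dx
    return dx, dy


stripe = [0x11, 0x22, 0x33, 0x44]
p, q = compute_pq(stripe)
damaged = [0x11, 0x00, 0x33, 0x00]  # data disks 1 and 3 lost
restored = recover_two_data(damaged, p, q, 1, 3)
```

Because P and Q give two independent equations per stripe, any two unknown blocks can be solved for; a third unknown cannot.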

Advantages of RAID 6

Some key advantages of using RAID 6 include:

  • High fault tolerance – Can withstand failure of up to 2 disks without data loss
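  • Good read performance – Data is striped, allowing parallel reads across multiple disks; writes are slower than RAID 0 because both parity blocks must be updated
  • Capacity efficiency – Only requires the equivalent of 2 disks worth of capacity for parity, regardless of array size
  • Expandable – Many controllers and software RAID implementations allow additional disks to be added to grow the overall capacity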
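The capacity point is simple arithmetic: usable space is the total minus two disks' worth. A minimal sketch follows, assuming equal-sized disks and the commonly cited minimum of 4 disks; the function names are illustrative.

```python
def raid6_usable_capacity(num_disks: int, disk_size_tb: float) -> float:
    """Usable capacity in TB: two disks' worth of space goes to parity."""
    if num_disks < 4:
        raise ValueError("this sketch assumes the usual RAID 6 minimum of 4 disks")
    return (num_disks - 2) * disk_size_tb


def raid6_efficiency(num_disks: int) -> float:
    """Fraction of raw capacity that is usable."""
    if num_disks < 4:
        raise ValueError("this sketch assumes the usual RAID 6 minimum of 4 disks")
    return (num_disks - 2) / num_disks
```

For example, `raid6_efficiency(4)` is 0.5 and `raid6_efficiency(8)` is 0.75, so the relative cost of parity shrinks as the array grows.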

How Many Disks Can Fail in RAID 6 Without Data Loss?

The key benefit of RAID 6 is its ability to withstand and recover from the failure of up to two disks without losing access to data. This is possible due to the double distributed parity.

Even if two disks fail completely, the parity information distributed across the remaining disks can be used to reconstruct the missing data. This provides excellent protection against hardware failures.

Scenario 1: One Disk Fails

If a single disk fails in a RAID 6 array, there is no data loss. The system can continue operating normally using the remaining disks.

All data blocks that were stored on the failed disk can be rebuilt from the remaining data blocks and parity on the other disks. Performance may be reduced until the failed disk is replaced and the rebuild completes, but the array remains fully functional and can still tolerate one more failure.

Scenario 2: Two Disks Fail

If two disks fail simultaneously, a RAID 6 array can still avoid data loss. The double parity provides enough redundant information to recreate all missing data blocks from the failed drives.

However, there is no remaining redundancy at this point. If any additional disk fails before the two failed disks are replaced and rebuilt, data loss will occur.

So in summary, up to two failed disks can be tolerated in RAID 6 without permanent data loss. But the failed disks should be replaced as soon as possible to restore fault tolerance.
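The scenarios above reduce to a small lookup on the number of failed disks. The state names in this sketch are illustrative and not taken from any particular controller or tool.

```python
def raid6_state(failed_disks: int) -> str:
    """Map the number of currently failed disks in one RAID 6 array to a state."""
    if failed_disks < 0:
        raise ValueError("failed_disks cannot be negative")
    if failed_disks == 0:
        return "optimal"
    if failed_disks == 1:
        return "degraded, one more failure tolerated"
    if failed_disks == 2:
        return "degraded, no redundancy left"
    return "failed, data loss"
```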

Factors That Determine Disk Failure Tolerance

A RAID 6 array always tolerates exactly two disk failures, whatever its configuration. What the configuration and parameters change is how likely it is that a third failure occurs before redundancy has been restored.

Total Number of Disks

The number of disks does not change the limit: a 4 disk array and an 8 disk array can each lose any 2 disks. However, the more disks there are, the more likely it is that several of them fail within the same period. A 4 disk array that has lost 2 disks is also running on only half of its drives, with no redundancy left.
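The tolerance limit itself stays at two disks regardless of array size; what grows with disk count is the chance of exceeding it. A rough sketch of that effect, under the strong assumption that disks fail independently with the same probability over some window (real failures are often correlated), with a hypothetical function name:

```python
from math import comb


def prob_more_than_two_failures(num_disks: int, p_fail: float) -> float:
    """P(3 or more of num_disks fail), assuming independent failures
    with probability p_fail per disk over the window of interest."""
    p_at_most_two = sum(
        comb(num_disks, k) * p_fail**k * (1 - p_fail) ** (num_disks - k)
        for k in range(3)
    )
    return 1 - p_at_most_two
```

With the same per-disk probability, a 24 disk array gives a larger result than an 8 disk array, which in turn gives a larger result than a 4 disk array.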

Disk Capacity

Larger capacity disks do not change the two-disk limit, but they take longer to rebuild, and a rebuild has to read more data from the surviving disks. Both effects raise the chance of another failure or an unrecoverable read error before redundancy is restored. With high capacity disks like 4TB or 8TB models this window is longer; smaller disks shorten it.

Hot Spares

Some RAID 6 setups include hot spare disks, onto which the array automatically starts rebuilding as soon as a disk fails. This shortens the window in which the array runs with reduced or no redundancy. Hot spares do not raise the two-disk limit, but they lower the risk of reaching it.

Rebuild Times

The time taken to rebuild failed disks also affects the risk. Quick rebuilds minimize the chance of additional disk failures while the array is degraded. Shortening rebuild times with faster disks or controllers reduces the time RAID 6 spends exposed.
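A back-of-the-envelope rebuild time is disk capacity divided by the sustained rebuild rate. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and a constant rate; real rebuild rates vary with array load and controller settings, and the 100 MB/s figure in the example is only an illustrative assumption.

```python
def estimate_rebuild_hours(disk_size_tb: float, rebuild_rate_mb_s: float) -> float:
    """Rough lower bound on rebuild time: the whole disk must be rewritten."""
    if rebuild_rate_mb_s <= 0:
        raise ValueError("rebuild rate must be positive")
    disk_size_mb = disk_size_tb * 1_000_000  # decimal TB to MB
    return disk_size_mb / rebuild_rate_mb_s / 3600


small = estimate_rebuild_hours(2, 100)  # about 5.6 hours
large = estimate_rebuild_hours(8, 100)  # about 22.2 hours
```

At the same rate, a disk four times larger leaves the array degraded four times as long.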

Recommended RAID 6 Configurations

Here are some recommended RAID 6 configuration guidelines for optimal disk failure tolerance:

Minimum 4 disks

RAID 6 arrays need a minimum of 4 disks: the equivalent of 2 disks for data and 2 for parity. At that size only 50% of the raw capacity is usable, the same as mirroring, so the capacity advantage of RAID 6 only appears with larger arrays.

8 disks for a balance of tolerance and capacity

An 8 disk RAID 6 array can tolerate up to 2 disk failures while keeping 75% of its raw capacity usable (6 of 8 disks' worth). This offers a good balance of failure tolerance and capacity.

Consider hot spare disks

Adding hot spare disks allows automatic rebuild of failed disks, minimizing vulnerability windows. Global hot spares can protect multiple RAID 6 arrays.

Smaller capacity drives preferred

Smaller capacity disks rebuild faster, so the array spends less time degraded after a failure and is less likely to see further disk losses in that window. Drive sizes up to 2TB offer a good compromise.

Faster disks and controllers

SSDs or high RPM hard drives rebuild faster than slow disks, and a faster RAID controller can also speed up parity calculation during rebuilds. Shorter rebuilds reduce the chance of further disk failures while the array is degraded.

Examples of RAID 6 Failure Tolerance

Here are some examples that illustrate RAID 6’s ability to withstand disk failures:

4 Disk Array – Withstands 2 Failures

Disk      Status
Disk 1    Online
Disk 2    Online
Disk 3    Online
Disk 4    Failed

Like any RAID 6 array, this 4 disk array can withstand up to 2 disk failures without data loss. With Disk 4 failed, the array keeps running, still has one disk of redundancy left, and can rebuild Disk 4 from the data and parity on the other disks. If a second disk fails, the data remains intact but no redundancy is left; a third failure causes data loss.

8 Disk Array – Withstands 2 Failures

Disk      Status
Disk 1    Online
Disk 2    Online
Disk 3    Online
Disk 4    Failed
Disk 5    Online
Disk 6    Failed
Disk 7    Online
Disk 8    Online

This 8 disk RAID 6 configuration can withstand up to 2 disk failures without data loss. If Disk 4 and Disk 6 fail, the array can rebuild them using the data and parity on the other 6 disks.

When to Use RAID 6

RAID 6 offers excellent protection against disk failures, at the cost of some additional overhead for the second parity computation. Here are some cases where RAID 6 is recommended:

  • Mission critical storage that cannot have downtime or data loss
  • Large arrays with many disks where failures are more likely
  • Situations where quick disk replacement/rebuild is not feasible
  • Archival data that rarely changes after initial write

The dual parity does impose a write penalty and one more disk of capacity overhead compared to RAID 5. So for less critical data RAID 5 may be sufficient, and for write-heavy workloads RAID 10 may be more suitable.

Best Practices for RAID 6 Reliability

Beyond selecting RAID 6 as the architecture, some additional best practices can improve the reliability and failure tolerance of the array:

  • Use enterprise class SAS or SATA drives designed for RAID
  • Implement hot spare drives to accelerate rebuilds
  • Use smaller capacity drives to shorten rebuilds
  • Monitor disk SMART attributes to identify failing disks early
  • Replace disks once they reach 40-50% of rated lifetime
  • Use a UPS to protect against power failures
  • Consider replacing entire arrays after 4-5 years of use

Alternatives to RAID 6

While RAID 6 is highly tolerant of disk failures, it is not the only RAID type that offers redundancy:

RAID 10 (1+0)

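RAID 10 stripes data across mirrored pairs of disks for performance. In the best case it can tolerate the failure of up to 50% of its disks, but only if no mirrored pair loses both of its disks; two failures in the same pair cause data loss. Rebuild times are faster because only a mirror copy is needed, but at the cost of 50% capacity overhead.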
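Which disks fail matters for RAID 10 in a way it does not for RAID 6. The sketch below counts, for an assumed 8 disk RAID 10 built from 4 two-disk mirrors, how many failure combinations are survivable; the layout and function names are illustrative.

```python
from itertools import combinations


def raid10_survives(failed: set[int], mirror_pairs: list[tuple[int, int]]) -> bool:
    """RAID 10 survives as long as no mirror pair has lost both of its disks."""
    return not any(a in failed and b in failed for a, b in mirror_pairs)


def count_survivable(num_failures: int, mirror_pairs: list[tuple[int, int]]) -> tuple[int, int]:
    """Return (survivable combinations, total combinations) for num_failures failed disks."""
    disks = sorted(d for pair in mirror_pairs for d in pair)
    combos = list(combinations(disks, num_failures))
    ok = sum(1 for c in combos if raid10_survives(set(c), mirror_pairs))
    return ok, len(combos)


pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]
two_failures = count_survivable(2, pairs)   # (24, 28)
four_failures = count_survivable(4, pairs)  # (16, 70)
```

In this layout 4 of the 28 possible two-disk failures are fatal for RAID 10, whereas an 8 disk RAID 6 array survives all 28 but no combination of three.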

RAID 60

RAID 60 stripes data (RAID 0) across two or more RAID 6 groups. It requires at least 8 disks (two RAID 6 groups of 4) and can tolerate up to 2 disk failures in each RAID 6 group; a third failure within any one group causes data loss for the whole array.
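For RAID 60 the two-disk limit applies to each RAID 6 group separately, which a short check makes concrete. The function name and the list-of-counts input are illustrative assumptions.

```python
def raid60_survives(failures_per_group: list[int]) -> bool:
    """RAID 60 keeps its data only if every RAID 6 group has at most 2 failed disks."""
    if len(failures_per_group) < 2:
        raise ValueError("RAID 60 stripes across at least two RAID 6 groups")
    return all(0 <= failed <= 2 for failed in failures_per_group)
```

So `raid60_survives([2, 2])` is true with four failed disks in total, while `raid60_survives([3, 0])` is false with only three.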

RAID-Z

ZFS RAID-Z offers single, double or triple parity (RAID-Z1, RAID-Z2, RAID-Z3), with RAID-Z2 being the counterpart of RAID 6. It adds benefits like checksums that detect and repair bit rot, and snapshots.

Erasure Coding

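RAID 6 parity is itself usually a Reed-Solomon code. More general erasure coding schemes, common in distributed storage, split data into k data fragments plus m parity fragments, so the number of tolerated failures and the capacity overhead can be chosen independently of the RAID 6 fixed value of two.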

Conclusion

RAID 6 can tolerate up to two concurrent disk failures without data loss due to its double distributed parity. This provides excellent protection for mission critical storage and large disk arrays where failures are more likely.

To maximize reliability, RAID 6 arrays should follow best practices like using enterprise class disks, hot spares, smaller capacity drives, and periodic replacements. RAID 6 guarantees survival of any two disk failures, which RAID 5 does not, and which RAID 10 only does when the failures land in different mirror pairs. That makes it a strong choice for large storage deployments that need high uptime and fault tolerance.