What is the best RAID for redundancy?

Redundancy is a key consideration when setting up a RAID (Redundant Array of Independent Disks) system. The goal of redundancy is to protect against data loss in the event of a drive failure. There are several different RAID levels that provide varying degrees of redundancy, with different performance trade-offs.

What is RAID?

RAID is a way of combining multiple drives together into a logical unit. Data is distributed across the drives according to the RAID level being used. This allows for increased capacity, performance, and redundancy compared to single drives.

Some key advantages of RAID include:

  • Increased storage capacity – RAID combines drives together into a larger logical volume.
  • Improved performance – Data can be distributed across drives for faster reads and writes.
  • Redundancy – Mirrored copies or parity data can be used to rebuild data in case of a drive failure.

There are several standard RAID levels, each with different characteristics:

RAID Level   Minimum Drives   Redundancy   Description
RAID 0       2                No           Block-level striping. Increased performance but no redundancy.
RAID 1       2                Yes          Disk mirroring. Complete redundancy but requires at least 2 drives.
RAID 5       3                Yes          Block-level striping with distributed parity. Good redundancy and performance.
RAID 6       4                Yes          Same as RAID 5 but with double distributed parity. High redundancy but write performance impact.
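
The trade-offs in the table can be made concrete with a little arithmetic. The following is a minimal sketch, assuming all drives are the same size and ignoring filesystem and metadata overhead; the function name `raid_summary` is hypothetical and chosen only for illustration.

```python
def raid_summary(level, drives, drive_tb):
    """Return (usable_tb, guaranteed_failures_tolerated) for equal-size drives.

    Illustrative only: real arrays lose some space to metadata.
    """
    minimum = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")
    if level == 0:
        return drives * drive_tb, 0          # all capacity, no redundancy
    if level == 1:
        return drive_tb, drives - 1          # every drive holds a full copy
    if level == 5:
        return (drives - 1) * drive_tb, 1    # one drive's worth of parity
    return (drives - 2) * drive_tb, 2        # RAID 6: two drives' worth of parity


for level, drives in [(0, 2), (1, 2), (5, 3), (6, 4)]:
    usable, tolerated = raid_summary(level, drives, drive_tb=4)
    print(f"RAID {level}: {drives} x 4 TB -> {usable} TB usable, "
          f"survives {tolerated} drive failure(s)")
```

The loop uses each level's minimum drive count from the table; passing a larger `drives` value shows how parity-based levels become more space-efficient as the array grows.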

Choosing the Right RAID Level

When choosing a RAID level, there are several factors to consider:

  • Redundancy requirements – How much redundancy and fault tolerance is needed? RAID 0 provides no redundancy, RAID 1 provides full redundancy, while RAID 5/6 provide single or double distributed parity.
  • Performance needs – Read and write performance is impacted by RAID level. RAID 0 provides best write performance but no redundancy. RAID 1 and 5 offer good performance with redundancy. RAID 6 has high redundancy but slower writes.
  • Number of drives – Each RAID level has a minimum number of drives required. At least 2 drives are needed for any redundant configuration.
  • Cost – Higher redundancy RAID levels require more drives, increasing overall storage costs.

Understanding the performance and redundancy trade-offs of each RAID level is key to selecting the right solution for a particular storage need.

RAID 5 – A Great Balance of Redundancy and Performance

For most applications requiring redundancy, RAID 5 provides an excellent combination of fault tolerance and performance:

  • RAID 5 requires a minimum of 3 drives
  • Data is block-level striped across drives, providing fast reads; writes carry some parity overhead
  • For each stripe, an additional parity block is calculated, and the parity blocks are distributed across the drives
  • The parity block can be used to reconstruct data in case of a single drive failure
  • RAID 5 provides single-drive fault tolerance – if one drive fails, data can still be accessed and rebuilt from the remaining data and parity blocks
  • Because parity is distributed across all drives, no single drive becomes a parity bottleneck, though small random writes are typically slower than on RAID 1 because each one must also read and update parity
  • Cost per usable terabyte is lower than RAID 1 because only one drive’s worth of capacity goes to parity instead of a full duplicate (mirror)

In summary, RAID 5 provides good read performance and efficient use of capacity while also withstanding a single drive failure. The array stays accessible while a replaced drive is rebuilt, although performance is reduced until the rebuild completes. For these reasons, RAID 5 is a popular choice for redundancy in applications such as database servers, enterprise storage, and media streaming.
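
The way a parity block reconstructs a missing data block can be sketched with XOR, which is the operation single-parity RAID is conventionally built on. This is a toy illustration on in-memory byte strings, not how any particular controller lays out data; the helper name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across a 4-drive RAID 5: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the drive that held data[1].
survivors = [data[0], data[2], parity]

# XOR of everything that survived yields the missing block.
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

Because XOR is its own inverse, the same routine computes parity and recovers a lost block. It also shows why a rebuild must read every surviving drive in full.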

Drawbacks of RAID 5

There are some potential drawbacks to consider with RAID 5:

  • The rebuild process after a drive failure can take a long time with large drive capacities. During this time, the array has no redundancy, and data is lost if another drive fails.
  • Larger capacity drives increase the likelihood of an unrecoverable read error (URE) during rebuild. A URE on a surviving drive during a rebuild can cause data loss.
  • Write performance suffers compared to RAID 0 striping due to the parity calculation overhead.
  • Minimum 3 drives increases cost compared to RAID 1 mirroring with 2 drives.
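
The URE risk can be estimated with a simple probability model. The sketch below assumes errors are independent and occur at a fixed rate of one per 10^14 bits read, a figure commonly quoted on consumer drive datasheets; both the rate and the independence assumption are simplifications, and real drives often perform better than the datasheet figure, so treat the result as a rough worst-case estimate. The function name is hypothetical.

```python
import math


def ure_probability(surviving_drives, drive_tb, bits_per_error=1e14):
    """Probability of at least one URE while reading every surviving drive.

    Assumes independent errors at a rate of one per `bits_per_error` bits.
    """
    bits_read = surviving_drives * drive_tb * 1e12 * 8
    # 1 - (1 - 1/rate) ** bits_read, computed stably for tiny probabilities.
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_error))


# Rebuilding a 4-drive RAID 5 of 4 TB drives reads the 3 surviving drives.
print(f"{ure_probability(3, 4):.0%}")
# Same array with drives specified at one error per 10^15 bits.
print(f"{ure_probability(3, 4, bits_per_error=1e15):.0%}")
```

Under these assumptions the estimate grows quickly with drive size, which is the reasoning behind preferring double parity for large drives.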

RAID 6 – Double Parity for Higher Redundancy

RAID 6 provides higher redundancy than RAID 5 by using double distributed parity:

  • Minimum 4 drives are required
  • Data is striped across drives like RAID 5
  • Two independent parity blocks are calculated and written across the drives
  • Protection against failure of up to 2 drives
  • Much lower risk of data loss during rebuilds compared to RAID 5
  • Very high redundancy for mission critical data

The tradeoff is that write performance suffers even more than RAID 5 due to the higher parity overhead. Costs are also higher due to requiring a minimum of 4 drives.

RAID 6 becomes advantageous compared to RAID 5 in situations where:

  • Very large capacity drives are used, increasing rebuild times and risk of UREs with RAID 5.
  • Data loss would be extremely costly, so protection against a second drive failure is worth the extra drive.
  • The array must remain protected while a failed drive is being rebuilt.

Because of the dual parity calculation, RAID 6 is a poor fit for some write-intensive workloads.
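
The parity overhead is often expressed as a "write penalty": the number of back-end disk operations generated by one small random write. The sketch below uses the commonly cited rule-of-thumb values (1 for RAID 0, 2 for mirroring, 4 for RAID 5, 6 for RAID 6) and ignores controller caching and full-stripe writes, so it is an approximation rather than a benchmark. The names are hypothetical.

```python
# Rule-of-thumb back-end I/Os per small random write.
WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 10": 2, "RAID 5": 4, "RAID 6": 6}


def effective_iops(level, drives, iops_per_drive, write_fraction):
    """Estimate host-visible IOPS for a given read/write mix."""
    raw = drives * iops_per_drive
    penalty = WRITE_PENALTY[level]
    return raw / ((1 - write_fraction) + write_fraction * penalty)


# Four drives at an assumed 100 IOPS each, with a 50% write workload.
for level in ("RAID 0", "RAID 10", "RAID 5", "RAID 6"):
    print(f"{level}: {effective_iops(level, 4, 100, 0.5):.0f} IOPS")
```

Lowering `write_fraction` toward zero narrows the gap between levels, which is why RAID 6 remains attractive for read-heavy workloads.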

Potential RAID 6 Use Cases

  • Media streaming servers – High read speed is maintained, with a very low risk of data loss during rebuilds.
  • Scientific data archives – Large amounts of irreplaceable data need strong protection.
  • Medical data repositories – Data loss is unacceptable, so tolerating two drive failures is valuable.
  • Financial trading transaction logs – Redundancy must be preserved even while a rebuild is in progress.

RAID 10 – Mirroring + Striping for Performance and Redundancy

RAID 10 combines both drive mirroring and striping for redundancy plus high performance:

  • Minimum 4 drives, paired as mirrored sets
  • Data is mirrored on each drive pair
  • Mirrored pairs are then striped together into a RAID 0 array
  • Fast reads and writes due to RAID 0 striping
  • Redundancy from RAID 1 mirroring – the array survives the loss of one drive in each mirrored pair

RAID 10 is advantageous when both high performance and redundancy are required. But it comes at a high cost, since mirroring leaves only half of the raw capacity usable, whereas parity-based RAID levels give up only one or two drives’ worth. At least 4 drives are needed.
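
Which drive failures a RAID 10 array survives depends on which mirrored pair each failed drive belongs to. The following sketch assumes two-way mirrors with drives numbered so that (0, 1), (2, 3), and so on form the pairs; that numbering is a hypothetical layout chosen for illustration.

```python
from itertools import combinations


def raid10_survives(failed, drives):
    """True unless both members of some mirrored pair have failed."""
    if drives < 4 or drives % 2:
        raise ValueError("this sketch expects an even drive count of at least 4")
    failed = set(failed)
    return all(not {first, first + 1} <= failed for first in range(0, drives, 2))


# Enumerate every possible two-drive failure in a 4-drive array.
two_drive_failures = list(combinations(range(4), 2))
survivable = [pair for pair in two_drive_failures if raid10_survives(pair, 4)]
print(f"{len(survivable)} of {len(two_drive_failures)} two-drive failures survivable")
```

A second failure is fatal only when it hits the partner of the drive that already failed. RAID 10 therefore guarantees tolerance of one failure and often, but not always, survives more.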

Potential RAID 10 Use Cases

  • Database servers needing maximum performance and redundancy.
  • High volume transaction systems that require increased fault tolerance.
  • High performance computing clusters where downtime is unacceptable.
  • Critical applications requiring high speed mirrored redundancy.

Choosing the Optimal RAID Level

There is no single “best” RAID type for all scenarios. The right RAID level depends on your specific requirements for redundancy, performance, cost and drive counts. Here are some key considerations when choosing RAID level:

  • Application performance needs – Read/write performance requirements of the workload. RAID 0 offers maximum speed. RAID 1/10 offer fast reads along with mirroring. RAID 5/6 offer good read performance, with a write penalty for parity.
  • Importance of redundancy – Level of redundancy and fault tolerance needed. RAID 0 has none. RAID 1/10 provide full mirroring. RAID 5 has single distributed parity. RAID 6 offers high redundancy with double parity.
  • Drive failure impacts – Size of drives and tolerance for drive rebuild times. Large drives increase RAID 5 rebuild times and risk of UREs during rebuilds. RAID 6 is better for large drives.
  • Number of drives available – Each RAID level has a minimum drive requirement. At least 2 drives needed for any redundancy.
  • Cost considerations – RAID types with more redundancy require more drives. RAID 10 has highest cost. RAID 5 provides good redundancy with lower cost than mirroring.

Below is a comparison summary of popular RAID levels to help guide the decision process:

                    RAID 0   RAID 1   RAID 5   RAID 6   RAID 10
Minimum Drives      2        2        3        4        4
Redundancy          None     High     Medium   High     High
Read Performance    High     Medium   Medium   Medium   High
Write Performance   High     Medium   Medium   Low      High
Cost                Low      Medium   Low      High     High

Software vs Hardware RAID

RAID can be implemented in software or hardware:

  • Software RAID – RAID is managed by the operating system. More flexible but consumes CPU resources.
  • Hardware RAID – Dedicated RAID controller handles RAID tasks. Less flexible, but it offloads work from the CPU and is often faster.

Software RAID can be configured on most systems but is tied to the operating system that manages it. Hardware RAID relies on proprietary RAID cards and firmware, so it is less flexible, but it typically offers better performance under heavy load. Hardware RAID costs more due to the controller card but reduces load on the CPU.

For performance sensitive applications like busy databases or real-time streaming, hardware RAID is often preferred. Software RAID provides a lower cost option for less intensive workloads.

Expanding RAID Arrays

Most RAID levels allow storage capacity to be added by expanding the array. There are a few methods for RAID expansion:

  • Adding drives – Additional drives can be added to the array to increase total capacity. This applies to RAID 5, 6 and 10. May require rebuilding parity or remirroring.
  • Replacing with larger drives – Drives can be swapped out one-by-one with larger capacity models, with a rebuild after each swap. The extra capacity typically becomes usable only once every drive has been replaced.
  • Online capacity expansion – Some RAID controllers support growing the array in place while it remains in service.

Expanding by adding drives has minimal downtime but can impact performance while data is redistributed. Replacing drives keeps the array online but requires a full rebuild for each swap, and the array runs with reduced redundancy during each one. Online capacity expansion is the most seamless option but is dependent on controller support.
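
The effect of swapping in larger drives one at a time can be illustrated with a small calculation. This sketch assumes a conventional RAID 5 in which every member contributes only as much space as the smallest drive; some vendor-specific hybrid schemes behave differently. The function name is hypothetical.

```python
def raid5_usable_tb(drive_sizes_tb):
    """Usable capacity when each member is limited to the smallest drive's size."""
    if len(drive_sizes_tb) < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (len(drive_sizes_tb) - 1) * min(drive_sizes_tb)


drives = [4, 4, 4]
print(drives, "->", raid5_usable_tb(drives), "TB usable")

# Replace the 4 TB drives with 8 TB drives, one swap (and one rebuild) at a time.
for index in range(len(drives)):
    drives[index] = 8
    print(drives, "->", raid5_usable_tb(drives), "TB usable")
```

Under this assumption the usable figure stays the same until the last small drive is gone, so the full cost of the upgrade is paid before any capacity is gained.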

Recovering from Drive Failures

A key benefit of redundant RAID levels is the ability to withstand and recover from drive failures:

  • For RAID 1, the mirrored drive can directly service requests until the failed drive is replaced and remirrored.
  • For RAID 5 and 6, the parity blocks are used to mathematically rebuild missing data blocks from the failed drive.
  • Most RAID controllers can automatically begin the rebuild when a replacement drive is inserted.
  • Drive hot swapping allows the failed drive to be replaced without turning off the system.

Once the failed drive has been swapped out, the RAID controller rebuilds affected data to the replacement drive. Access remains available during the rebuild process. Rebuild times depend on the RAID level, drive capacities, and controller speed. Large drives in RAID 5 arrays can result in very long rebuild windows.
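
A lower bound on the rebuild window follows from the drive capacity and the sustained rebuild rate. The sketch below assumes the whole replacement drive must be written at a constant rate; the 100 MB/s figure is an arbitrary illustrative value, and rebuilds on a busy array usually run slower.

```python
def rebuild_hours(drive_tb, rebuild_mb_per_s):
    """Best-case hours to write an entire replacement drive at a constant rate."""
    seconds = (drive_tb * 1e12) / (rebuild_mb_per_s * 1e6)
    return seconds / 3600


for size_tb in (4, 8, 16):
    print(f"{size_tb} TB drive: about {rebuild_hours(size_tb, 100):.1f} hours")
```

Because the time scales linearly with capacity, doubling the drive size doubles the period during which a RAID 5 array has no remaining redundancy.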

Monitoring RAID Status

Monitoring RAID status helps identify issues before they cause failures and outages. Important RAID health indicators to watch include:

  • Current status – Overall status of the RAID volume like Online, Critical, Offline, etc.
  • Remaining space – How much usable space is left in the volume. Expand if getting too full.
  • Disk utilization – Busy disks with high utilization and queue depths can indicate performance issues.
  • Disk temperatures – Overheating drives can lead to failures.
  • S.M.A.R.T. errors – Flags from drives’ built-in monitoring indicate potential failures ahead.
  • Rebuild progress – Stalled or very slow rebuilds need attention to avoid extended vulnerable periods.
  • Time since last consistency check – Long periods without a scrub or consistency check increase the risk of undetected bit rot and corruption issues.

Monitoring tools like snmpd and munin plugins can provide automated health status and alerting. Watching these indicators helps catch issues early, before they turn into failures.
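
As one example of an automated status check, Linux software RAID (md) reports array state in /proc/mdstat. The sketch below assumes a Linux md setup and parses an embedded sample string rather than the live file, so it runs anywhere; the sample text is illustrative and the function name is hypothetical. Hardware controllers expose status through their own vendor tools instead.

```python
import re

# Illustrative sample in the /proc/mdstat format; md1 is missing one member.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1[0] sdb1[1]
      976630464 blocks super 1.2 [2/2] [UU]

md1 : active raid5 sdc1[0] sdd1[1]
      1953260544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: member_status} for arrays with fewer active members than expected."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # "[3/2] [UU_]" means 3 members expected, 2 active; "_" marks the missing one.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = status.group(3)
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))
```

On a real system the same function could be fed the contents of /proc/mdstat, with any non-empty result sent to whatever alerting mechanism is in place.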

Backing Up RAID Arrays

Although RAID provides redundancy for drive failures, it is not a backup solution. Additional backups are required to protect against file corruption, accidental deletion, malware, and catastrophic system failure. Backup options for RAID arrays include:

  • External drive backups – Periodically back up important files and directories to external USB or networked drives.
  • Cloud backups – Replicate data to managed cloud storage services with versioning support.
  • Tape backups – Still one of the most reliable long-term archival options. Tape libraries automate offsite storage.
  • Snapshots – Take periodic read-only snapshots to provide restorable virtual copies as needed. Snapshots stored on the same array do not protect against loss of the array itself.

Ideally, data should be backed up to both a local copy and an offsite copy to protect against as many failure scenarios as possible. Test backups regularly by performing test restores.
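
A test restore is only useful if the restored data is compared against the original. One simple approach, sketched here, is to compare SHA-256 digests of the source file and the restored copy; the function names are hypothetical, and a real backup tool may offer its own verification feature.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original_path, restored_path):
    """True when the restored file is byte-for-byte identical to the original."""
    return sha256_of(original_path) == sha256_of(restored_path)
```

Calling `restore_matches` on a sample of files after each test restore gives a quick pass/fail signal. It only confirms that the copies are identical, not that the original was uncorrupted when it was backed up.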

Conclusion

Choosing the right RAID level is a balance of redundancy, performance, and cost considerations. RAID 5 hits a sweet spot for many use cases needing good redundancy without the high cost of mirroring. RAID 6 offers higher redundancy for mission critical data using double parity. RAID 10 combines mirroring and striping for maximum performance and redundancy but at a higher cost. Monitoring health metrics and pairing RAID with comprehensive backups protects against a far wider range of failures than RAID alone.