How much capacity do you lose in RAID 5?

RAID 5 is a popular RAID level that provides a good balance between performance, capacity, and redundancy for storage systems. It stripes data across multiple disks like RAID 0, but also dedicates one disk’s worth of capacity for parity information to provide fault tolerance.

This parity mechanism allows the array to recover and rebuild data in case of a single disk failure. However, it also means that not all disks are available for actual data storage. So how much total capacity do you lose in a RAID 5 array compared to using the disks individually?

Quick Answer

In a RAID 5 array with N equal-size disks, you lose 1/N of the total raw capacity to parity. For example, in a 5 disk RAID 5 array, you lose 1/5 or 20% of the total capacity. In a larger array with 12 disks, you lose only 1/12, or about 8.3%, of the capacity to parity.

RAID 5 Capacity Calculation

The basic formula for calculating the usable capacity in a RAID 5 array is:

(Number of Disks – 1) x (Disk Capacity)

So for example, take a RAID 5 array with 5 x 4TB SATA hard drives:

  • Number of disks (N) = 5
  • Disk capacity (C) = 4TB

Total raw capacity = 5 x 4TB = 20TB

RAID 5 usable capacity = (5 – 1) x 4TB = 16TB

Capacity lost to parity = 20TB – 16TB = 4TB

Capacity loss percentage = 4TB / 20TB = 20%

So in this example, the total raw capacity is 20TB but the usable RAID 5 capacity is only 16TB, meaning we lose 1 disk worth or 20% as parity.
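The arithmetic above can be sketched as a small Python helper. The function name `raid5_capacity` and the layout of its return value are illustrative choices, not part of any RAID tool, and the sketch assumes all disks are the same size.

```python
def raid5_capacity(num_disks: int, disk_tb: float) -> dict:
    """Return raw, usable, and parity capacity (in TB) for a RAID 5 array.

    Assumes every disk has the same size; RAID 5 needs at least 3 disks.
    """
    if num_disks < 3:
        raise ValueError("RAID 5 requires at least 3 disks")
    raw = num_disks * disk_tb
    usable = (num_disks - 1) * disk_tb  # one disk's worth goes to parity
    lost = raw - usable
    return {
        "raw_tb": raw,
        "usable_tb": usable,
        "lost_tb": lost,
        "loss_pct": 100 * lost / raw,
    }


if __name__ == "__main__":
    # The 5 x 4TB example from the text.
    print(raid5_capacity(5, 4))
```

For the 5 x 4TB example this reports 16TB usable and a 20% loss, matching the hand calculation.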

General RAID 5 Capacity Loss

In general for a RAID 5 array with N disks:

  • Total raw capacity = N x Disk capacity
  • RAID 5 usable capacity = (N – 1) x Disk capacity
  • Capacity lost = Total raw capacity – RAID 5 usable capacity
  • Capacity loss percentage = Capacity lost / Total raw capacity

Substituting the first three lines into the last one, the disk capacity cancels out:

Capacity loss percentage = 1/N x 100%

So in a 10 disk RAID 5 array, 1/10 of the capacity is lost, which is 10%. As the number of disks grows, the parity overhead shrinks as a percentage of the total.
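The step from the listed quantities to 1/N can be written out explicitly. Here C stands for the capacity of one disk, as in the worked example, and all disks are assumed to be the same size:

```latex
\[
\text{loss fraction}
  = \frac{N C - (N - 1)\,C}{N C}
  = \frac{C}{N C}
  = \frac{1}{N}
\]
```

Because C cancels, the percentage lost depends only on the number of disks, not on how large each disk is.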

Why Parity Reduces Capacity

RAID 5 sets aside one disk's worth of capacity for parity, the redundancy information that lets the array recover data if a disk fails. Unlike RAID 4, which keeps all parity on one dedicated disk, RAID 5 rotates the parity blocks across every disk in the array. The space cost is the same either way: not all of the raw capacity is available for data.

For example, in a 5 disk RAID 5 array, each stripe consists of 4 data blocks and 1 parity block calculated from those 4. If any one disk fails, its missing blocks can be reconstructed from the blocks on the 4 surviving disks.

Since one block in every stripe of 5 holds parity rather than user data, a 5 disk RAID 5 array has only 4 disks' worth of capacity available for data storage. The remaining disk's worth is spread across the array as parity.
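Parity in RAID 5 is typically a bytewise XOR across the data blocks in a stripe, which is why any single missing block can be recomputed from the others. The sketch below illustrates that idea on tiny in-memory blocks. The function `xor_blocks` and the sample data are made up for illustration; a real controller works on disk sectors, not Python byte strings.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe of a 5 disk array: 4 data blocks plus 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[2], then rebuild its block
# from the 3 surviving data blocks and the parity block.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[2])
```

The same XOR that produces the parity block also recovers the lost block, so the array needs only one extra block per stripe to survive a single failure.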

When to Use RAID 5?

RAID 5 provides a good balance of performance, capacity and redundancy for many applications including:

  • File and application servers
  • Database servers
  • Virtualization hosts
  • Medium sized NAS appliances

The capacity loss of a single disk may be acceptable to gain the redundancy. Larger arrays minimize this capacity loss since it is a smaller overall percentage.

Alternatives to Reclaim Capacity

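If the trade-off RAID 5 makes does not suit your needs, alternatives include the following. Of these, only JBOD actually gives capacity back; the others trade capacity for more redundancy or performance:

  • RAID 10 – Uses mirroring instead of parity for redundancy. Same 50% capacity loss as RAID 1, which is more than RAID 5 loses in any array of 3 or more disks.
  • RAID 6 – Double parity survives two disk failures, at a capacity loss of 2 disks.
  • RAID 50/60 – Nested RAID stripes across multiple RAID 5/6 groups. Each group gives up its own 1 or 2 disks' worth of capacity to parity.
  • JBOD – Just a Bunch of Disks provides full capacity without redundancy.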

Erasure coding generalizes the parity idea: it can be configured to tolerate more failures, and with wide stripes its proportional overhead can be lower than mirroring. But RAID 5 still provides a good starting point for redundant array storage in many cases.

RAID 5 Capacity Loss Examples

Here are some examples of RAID 5 capacity loss with different number of disks and individual disk size:

# Disks | Disk Size | Total Raw Capacity | RAID 5 Capacity | Capacity Loss | Percentage Loss
3 | 2 TB | 6 TB | 4 TB | 2 TB | 33.3%
4 | 4 TB | 16 TB | 12 TB | 4 TB | 25%
6 | 6 TB | 36 TB | 30 TB | 6 TB | 16.7%
8 | 8 TB | 64 TB | 56 TB | 8 TB | 12.5%
12 | 10 TB | 120 TB | 110 TB | 10 TB | 8.3%

This illustrates how the capacity loss decreases as the number of disks in the RAID 5 array increases. With a larger number of disks, the parity overhead becomes a smaller overall percentage.

Factors that Reduce Effective Capacity

In addition to the parity overhead, other factors can further reduce the effective usable capacity in a RAID 5 array:

  • Disk formatting – Filesystems and partitioning consume some space on disks.
  • Spare disks – Some capacity may be reserved as hot spares.
  • Snapshot reserves – Some capacity may be reserved for snapshot space.
  • Free space – Most filesystems need some space left unallocated to perform well.

These factors mean that the actual usable space for data storage will likely be lower than the nominal RAID 5 capacity. Factoring them in gives a more accurate estimate.

A disk failure, by contrast, does not shrink the array. A degraded RAID 5 array still presents its full usable capacity, reconstructing missing data from parity on the fly, and a rebuild writes to the replacement disk rather than consuming space inside the array.
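As a rough illustration of how such reservations stack up, the sketch below removes hot spares before applying the RAID 5 formula and then takes off percentage reserves. The function name, its parameters, and the 5% and 10% figures in the usage line are hypothetical placeholders; real filesystem and snapshot overheads vary by product and configuration.

```python
def effective_raid5_tb(
    total_disks: int,
    disk_tb: float,
    hot_spares: int = 0,
    fs_overhead: float = 0.0,
    snapshot_reserve: float = 0.0,
) -> float:
    """Estimate usable space (TB) after spares and percentage reserves.

    fs_overhead and snapshot_reserve are fractions between 0 and 1,
    applied one after the other to the RAID 5 usable capacity.
    """
    members = total_disks - hot_spares  # spares hold no data until a failure
    if members < 3:
        raise ValueError("RAID 5 needs at least 3 member disks")
    usable = (members - 1) * disk_tb
    usable *= 1 - fs_overhead
    usable *= 1 - snapshot_reserve
    return usable


# Hypothetical: 6 x 4TB disks, 1 hot spare, 5% filesystem, 10% snapshots.
estimate = effective_raid5_tb(6, 4, hot_spares=1,
                              fs_overhead=0.05, snapshot_reserve=0.10)
print(f"{estimate:.2f} TB")
```

Treating the reserves as multiplicative fractions is a simplification. Some systems reserve fixed amounts instead, in which case the subtraction would be done in TB rather than as a percentage.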

RAID 5 Rebuild Impact on Capacity

When a disk fails in a RAID 5 array, a rebuild operation is required to reconstruct the data onto a replacement disk. During this rebuild time, the array is in a degraded state and has reduced redundancy and performance.

The rebuild also requires dedicated bandwidth and disk operations, which can temporarily impact the performance of other activities on the storage system.

A degraded RAID 5 array does not lose usable capacity, however. The data from the missing disk is reconstructed on the fly from the surviving disks, so a 5 disk array that loses 1 disk still presents 4 disks' worth of capacity. What it loses is its redundancy: a second disk failure before the rebuild completes means the data on the array is lost.

Mitigating Rebuild Impact

Some ways to minimize the impact of rebuilds on a RAID 5 array include:

  • Use hot spare disks to start rebuilds immediately.
  • Select disks with faster rebuild times like SSDs.
  • Ensure sufficient I/O bandwidth for rebuild activity.
  • Consider smaller RAID groups, which rebuild faster and are less likely to suffer a second failure during a rebuild.
  • Monitor disk health to proactively replace deteriorating disks.
  • Ensure proper ventilation and temperatures to avoid environmental disk failures.

RAID 5 vs Other RAID Levels

Compared to other common RAID types, RAID 5 offers a middle ground between capacity overhead and redundancy:

RAID Type | Min Disks | Parity Disks | Fault Tolerance | Capacity Overhead
RAID 0 | 2 | 0 | None | 0%
RAID 1 | 2 | 0 | 1 disk | 50%
RAID 5 | 3 | 1 | 1 disk | 1 disk (1/N)
RAID 6 | 4 | 2 | 2 disks | 2 disks (2/N)

RAID 10 has the same 50% overhead as RAID 1, and RAID 50 gives up one disk's worth of capacity per RAID 5 group rather than one disk overall. Both add the performance benefits of striping across groups.
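To compare levels for a concrete disk count, the overhead rules can be turned into a small lookup. This is a sketch assuming equal-size disks. The function name `usable_disks` is illustrative, RAID 1 is modeled as a single mirror set in which every disk holds the same copy, and RAID 10 is modeled as a stripe across two-way mirrors.

```python
def usable_disks(level: str, n: int) -> int:
    """Return how many disks' worth of capacity is usable with n equal disks."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if n < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == "0":
        return n  # striping only, no redundancy
    if level == "1":
        return 1  # every disk holds the same copy
    if level == "5":
        return n - 1  # one disk's worth of parity
    if level == "6":
        return n - 2  # two disks' worth of parity
    # RAID 10: stripes across two-way mirrors, so half the disks are copies.
    if n % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return n // 2


for level in ("0", "5", "6", "10"):
    usable = usable_disks(level, 8)
    print(f"RAID {level}: {usable} of 8 disks usable, "
          f"{100 * (8 - usable) / 8:.1f}% overhead")
```

With 8 disks, RAID 5 keeps 7 disks' worth of capacity and RAID 6 keeps 6, while RAID 10 keeps only 4.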

Stripe Size Impact on RAID 5 Capacity

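The stripe size (the chunk written to each disk before moving to the next) has essentially no effect on the available capacity of a RAID 5 array. RAID works at the block level, so a write smaller than a stripe does not reserve the rest of the stripe, and the remaining blocks stay available for other data. What a small write costs is performance, not space.

For example, with a chunk size of 128KB in a 4 disk RAID 5 array:

  • A full stripe holds 3 x 128KB = 384KB of data plus 128KB of parity, 512KB on disk in total.
  • A 64KB write changes only part of one data chunk plus the matching parity, so the array must read the old data and parity before writing the new values (a read-modify-write). No space is wasted.

Large stripes generally suit sequential I/O, while workloads dominated by small random writes feel the read-modify-write penalty most.

Choose a RAID stripe size that matches your typical I/O patterns. Any space lost to small files comes from the filesystem's allocation unit rather than from the RAID stripe, so the impact of stripe size on RAID 5 capacity efficiency is negligible.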

Conclusion

RAID 5 provides a good balance of performance, capacity and redundancy for many storage systems. It does carry a capacity overhead cost of 1 disk compared to a non-redundant array. But the benefits of fault tolerance often outweigh this overhead.

When designing RAID 5 storage, be sure to factor in the capacity loss as well as impacts during rebuilds and other considerations that further reduce usable space. Carefully evaluating capacity requirements and matching this against your selected disk configuration will help right-size the array.

RAID 5 continues to be a popular choice for mid-range storage that needs redundancy without excessive capacity loss. When architected and managed properly, it can deliver optimal storage performance, capacity and resiliency.