Does RAID work with NVMe?

NVMe stands for Non-Volatile Memory Express, and is a protocol for accessing high-performance solid state drives (SSDs) that utilize the PCI Express (PCIe) interface. Compared to older SATA SSDs, NVMe SSDs offer significantly higher bandwidth and lower latency.

Some key benefits of NVMe SSDs over SATA SSDs include:

  • Higher throughput – NVMe SSDs can provide over 3,000 MB/s sequential read speeds, compared to around 550 MB/s for SATA SSDs. This is enabled by PCIe 3.0 x4 connectivity. (Source)
  • Lower latency – NVMe SSDs have much lower read/write latency, with 4K random read latencies around 10-20 microseconds versus 50-100 microseconds for SATA SSDs.
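  • More parallelism – The NVMe protocol allows far higher queue depths and more parallel commands: up to 64K I/O queues with up to 64K commands each, versus a single queue of 32 commands for AHCI-based SATA drives.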
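The throughput gap follows from the link arithmetic. The sketch below assumes the published signalling rates and line encodings for PCIe 3.0 (8 GT/s per lane, 128b/130b) and SATA III (6 Gb/s, 8b/10b). The function name is illustrative, and real drives land somewhat below these ceilings because of protocol overhead.

```python
def link_bandwidth_mb_s(gigatransfers_per_s, payload_bits, total_bits, lanes=1):
    """Usable bandwidth in MB/s (decimal) of a serial link after line encoding."""
    usable_bits_per_s = gigatransfers_per_s * 1e9 * payload_bits / total_bits
    return usable_bits_per_s / 8 / 1e6 * lanes

# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, four lanes
pcie3_x4 = link_bandwidth_mb_s(8.0, 128, 130, lanes=4)
# SATA III: 6 Gb/s, 8b/10b encoding, single link
sata3 = link_bandwidth_mb_s(6.0, 8, 10)

print(f"PCIe 3.0 x4 ceiling: {pcie3_x4:.0f} MB/s")
print(f"SATA III ceiling:    {sata3:.0f} MB/s")
```

These ceilings of roughly 3,940 MB/s and 600 MB/s are consistent with the drive speeds quoted in the list above.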

Overall, the combination of higher bandwidth and lower latency makes NVMe SSDs much faster for tasks like booting, loading applications, transferring large files, and running demanding workloads. The performance difference can be quite substantial compared to SATA SSDs.

What is RAID?

RAID stands for Redundant Array of Independent Disks. It is a data storage technology that combines multiple disk drive components into a logical unit. Data is distributed across the drives in one of several ways called “RAID levels”, depending on the required level of redundancy and performance.

The different RAID levels include:

  • RAID 0 – Also called disk striping, RAID 0 splits data evenly across two or more disks with no parity information for redundancy. RAID 0 offers fast performance but no fault tolerance: losing any one drive loses the array. [1]
  • RAID 1 – Also known as disk mirroring, RAID 1 duplicates data across disk drives to provide full redundancy. If one drive fails, the other contains an exact copy of the data. [2]
  • RAID 5 – RAID 5 stripes data and parity information across three or more disks. If one disk fails, the parity information can be used to reconstruct the data from the failed drive. [1]
  • RAID 10 – RAID 10 combines disk mirroring and disk striping for both performance and redundancy. Data is mirrored in pairs and striped across those pairs, so the array can survive multiple drive failures as long as no mirrored pair loses both of its drives. [2]
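The parity reconstruction described for RAID 5 rests on XOR. Here is a minimal sketch of a single stripe, assuming three data blocks and one parity block. The block contents and the helper name `xor_blocks` are illustrative, and a real RAID 5 array also rotates the parity block across drives from stripe to stripe.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks (one per drive) plus their parity block
data = [b"NVMe", b"RAID", b"demo"]
parity = xor_blocks(data)

# Simulate losing the second drive: XOR the survivors with the parity block
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # the lost block, recovered from the remaining drives
```

Because XOR is its own inverse, any single missing block in a stripe can be recovered this way. A second simultaneous failure cannot, which is the reason RAID 6 stores a second, independent parity.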

Challenges of Implementing RAID with NVMe

Implementing RAID with NVMe SSDs poses some unique challenges compared to traditional hard disk drives (HDDs). The key differences stem from the fact that NVMe SSDs are much faster than HDDs, with potential speeds up to 3.5x faster for sequential reads and 11x faster for random reads (source). This speed advantage brings new considerations for RAID implementation.

First, traditional RAID controllers and backplanes may struggle to keep up with the fast speeds of NVMe SSDs. Most legacy RAID cards and backplanes were designed with slower HDDs in mind and can become a bottleneck, limiting the overall performance. Upgrading to NVMe-compatible RAID controllers and backplanes that can match the NVMe SSD performance is often necessary to realize the full benefits.

Second, the parallelism of NVMe SSDs changes some RAID calculations. Because each drive can service many queued I/O operations at once, a small number of NVMe SSDs can saturate the available PCIe bandwidth far more easily than HDDs could. This also affects RAID write penalty calculations: the extra reads and writes generated by parity updates are issued in parallel across several drives and compete for the same PCIe lanes and controller resources, so their effect on overall performance compounds. Software and hardware RAID solutions need to account for this behavior to optimize performance.
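The bottleneck is easy to estimate. The sketch below assumes roughly 985 MB/s of usable bandwidth per PCIe 3.0 lane, and takes the controller's host-side slot width as the shared uplink for the whole array. The drive speed and slot widths are hypothetical inputs, not measurements.

```python
PCIE3_LANE_MB_S = 985  # approximate usable bandwidth per PCIe 3.0 lane

def array_read_ceiling(drive_count, drive_mb_s, uplink_lanes,
                       lane_mb_s=PCIE3_LANE_MB_S):
    """Return (ceiling in MB/s, True if the uplink is the limiting factor)."""
    drives_total = drive_count * drive_mb_s
    uplink_total = uplink_lanes * lane_mb_s
    return min(drives_total, uplink_total), drives_total > uplink_total

# Four hypothetical 3,000 MB/s drives behind an x8 card, then an x16 card
print(array_read_ceiling(4, 3000, uplink_lanes=8))
print(array_read_ceiling(4, 3000, uplink_lanes=16))
```

With these inputs the x8 card caps the array well below the combined drive speed, while the x16 card leaves the drives as the limit.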

RAID Solutions for NVMe SSDs

There are a few options for enabling RAID with NVMe drives:

HBA RAID Cards

Dedicated RAID-capable HBA (Host Bus Adapter) cards like the HighPoint 7120 are one solution for hardware RAID with NVMe drives. A hardware RAID card has its own onboard RAID processor to handle RAID calculations and management, which offloads the work from the CPU. The card connects directly to the PCIe bus and provides ports for attaching NVMe drives. A key advantage is broad RAID level support: many cards offer RAID 0, 1, 5, 6, and 10, although the supported levels vary by model and should be checked before buying. The downside is cost, as these cards can be quite expensive.

Motherboard/Chipset RAID

Many modern motherboards and chipsets have built-in RAID support, like Intel RST or AMD RAID. This allows creating a bootable NVMe RAID array without any additional hardware. However, motherboard RAID has limitations – it may not support RAID 5/6, and performance can suffer since the RAID calculations use CPU resources. The choice between an HBA RAID card versus motherboard RAID comes down to budget and performance requirements for the particular use case.

Performance Considerations

When implementing RAID with NVMe SSDs, it’s important to consider the potential performance impact of RAID overhead. Though NVMe SSDs offer incredibly fast sequential read/write speeds, combining multiple NVMe SSDs into a RAID array can reduce some of these gains.

One of the main factors is RAID write penalty. When writing data in a RAID 5 or RAID 6 array, parity information needs to be calculated and written across the drives. This requires additional computations and write operations compared to a single drive or RAID 0 array, reducing overall write performance. Estimates of this write penalty are around 25-35% for RAID 5 and 35-55% for RAID 6 configurations [1].
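A common way to reason about this is the textbook write-penalty model, in which each small random write costs a fixed number of back-end I/Os: 1 for RAID 0, 2 for RAID 1 and RAID 10, 4 for RAID 5 (read data, read parity, write data, write parity), and 6 for RAID 6. The sketch below uses those standard factors, which are a different metric from the percentage estimates quoted above. The per-drive IOPS figure is hypothetical, and full-stripe sequential writes avoid most of this cost.

```python
# Back-end I/Os per small random host write (textbook values)
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}

def effective_iops(level, drive_count, iops_per_drive, write_fraction):
    """Host-visible IOPS for a read/write mix under the write-penalty model."""
    raw_iops = drive_count * iops_per_drive
    penalty = WRITE_PENALTY[level]
    cost_per_host_io = (1 - write_fraction) + write_fraction * penalty
    return raw_iops / cost_per_host_io

# Four hypothetical drives at 100,000 IOPS each, 30% writes
for level in ("raid0", "raid10", "raid5", "raid6"):
    print(level, round(effective_iops(level, 4, 100_000, 0.3)))
```

The model ignores controller caching and CPU limits, so it is best used to compare RAID levels against each other, not to predict absolute numbers.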

There is also some processing overhead for the RAID controller to manage the array, calculate parity, and coordinate I/O across multiple drives. Top-performing RAID controllers designed for NVMe can minimize this overhead, but they rarely eliminate it entirely, so array performance usually falls slightly short of the raw drive speeds.

When benchmarking RAID performance with NVMe SSDs, results will almost always be lower than the sum of the maximum throughput of the raw drives. However, well-implemented NVMe RAID can still achieve exceptional speeds – often doubling or tripling the performance of a single SSD. RAID 0 can reach up to 28.5GB/s sequential reads in benchmarks [2]. For most workloads, NVMe RAID delivers substantial performance gains over a single SSD.

Use Cases and Recommendations

NVMe RAID can provide significant performance benefits for the right use cases, but may be unnecessary or even detrimental in some situations. Here are some recommendations on when NVMe RAID is most applicable:

Good applications for NVMe RAID include:

  • Video production/editing – The fast sequential speeds of RAID 0 can dramatically improve workflow.
  • Scientific computing/machine learning – RAID 0 improves parallel data access for high performance computing.
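  • Database servers – The low latency and high IOPS of striped NVMe arrays improve database performance. Because RAID 0 alone has no fault tolerance, RAID 10 is the safer choice for data that must survive a drive failure.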
  • Virtualization – RAID 10 provides redundancy along with better random read/writes for VMs.

Situations where RAID may not be beneficial:

  • Boot drive – The latency of RAID can slightly slow down booting compared to a single NVMe drive.
  • Read-heavy workloads – A single NVMe drive may provide enough performance for mostly reads.
  • Lower queue depth workloads – RAID advantages diminish at low queue depths.
  • Cost-sensitive applications – RAID adds expense compared to a single NVMe drive.
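The queue-depth point can be made concrete with Little's Law, which bounds throughput by the number of outstanding requests divided by per-request latency. This sketch uses hypothetical figures (100 microseconds of latency, 500,000 IOPS per drive) and an illustrative function name. It ignores any extra latency the RAID layer adds.

```python
def estimated_iops(queue_depth, latency_us, drive_count, max_iops_per_drive):
    """Upper bound on IOPS: limited by concurrency or by hardware, whichever is lower."""
    by_concurrency = queue_depth / (latency_us * 1e-6)   # Little's Law
    by_hardware = drive_count * max_iops_per_drive
    return min(by_concurrency, by_hardware)

for qd in (1, 256):
    single = estimated_iops(qd, 100, drive_count=1, max_iops_per_drive=500_000)
    array = estimated_iops(qd, 100, drive_count=4, max_iops_per_drive=500_000)
    print(f"QD{qd}: single drive {single:,.0f} IOPS, 4-drive array {array:,.0f} IOPS")
```

At a queue depth of 1 both configurations are limited by latency and perform the same. Only at high queue depths does the extra hardware in the array become usable.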

The decision depends on workload characteristics and performance requirements. For highly parallel workloads that can leverage the throughput, NVMe RAID can be compelling. But for lighter workloads, a single NVMe drive may suffice.

Software vs Hardware RAID for NVMe SSDs

When implementing RAID with NVMe SSDs, you have the choice between software RAID and hardware RAID solutions. There are pros and cons to each approach:

Software RAID relies on RAID functionality built into the operating system or third-party software. The benefits of software RAID include:

  • Lower cost since it doesn’t require a hardware RAID controller.
  • Flexibility in configuring and managing the RAID set.
  • Ability to use advanced RAID modes like RAID-5 and RAID-6.

However, software RAID has some downsides:

  • Higher CPU utilization since RAID calculations are handled by the CPU.
  • Potentially slower performance than hardware RAID, particularly for parity RAID levels under heavy write loads.
  • Lack of battery-backup for cache data in case of power failure.

Hardware RAID uses a dedicated RAID controller card to manage the RAID array. Benefits of hardware RAID include:

  • Faster performance by offloading RAID processing to the controller.
  • Lower CPU usage by reducing load on the CPU.
  • Vendor-supported drivers and firmware that work independently of the operating system.
  • Cache data protection using battery-backup in case of power loss.

The limitations of hardware RAID are:

  • Higher cost due to the RAID controller card.
  • Less flexibility in RAID configuration and disk management.
  • Some NVMe RAID cards are limited to basic RAID modes like RAID-0, RAID-1, and RAID-10, so RAID-5/6 support should be confirmed for the specific model.

Overall, hardware RAID tends to provide better performance while software RAID offers more flexibility. The choice depends on budget, performance needs, and RAID modes required.[1]

Managing NVMe RAID Sets

Once you have set up your NVMe RAID array, it’s important to proactively monitor the health and performance of the drives to identify any potential issues. The Dell PowerEdge RAID Controller software provides tools to check the status of NVMe SSDs in a RAID configuration (Dell). This allows you to see details like drive temperature, media and data errors, wear status, and more. Vendor command-line utilities such as StorCLI can also be used to monitor NVMe RAID health on the controllers they support.

If a drive in the RAID array fails, it will need to be replaced and the data rebuilt onto the new drive to restore redundancy. The RAID controller will typically begin rebuilding data automatically once a new drive is inserted, and the process can be monitored in the management software. Rebuilding onto NVMe drives tends to be faster than rebuilding onto SATA SSDs due to the increased bandwidth of PCIe. However, rebuilding large multi-terabyte NVMe RAID arrays can still take hours or days to complete, especially when the array is also serving production I/O.
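A rough rebuild estimate is simply the capacity of the replaced drive divided by the sustained rebuild rate. The sketch below uses hypothetical rates to show how a throttled rebuild on a busy array stretches from about an hour to most of a day. The function name and all figures are illustrative.

```python
def rebuild_hours(drive_capacity_tb, rebuild_rate_mb_s):
    """Estimated hours to rewrite one full drive at a sustained rebuild rate."""
    capacity_mb = drive_capacity_tb * 1_000_000  # decimal TB to MB
    return capacity_mb / rebuild_rate_mb_s / 3600

# A hypothetical 4 TB drive: idle array versus a rebuild throttled by production I/O
print(f"{rebuild_hours(4, 1000):.1f} h at 1000 MB/s")
print(f"{rebuild_hours(4, 50):.1f} h at 50 MB/s")
```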

To minimize downtime, hot spare drives can be configured so that the rebuilding process starts immediately if a drive fails. Monitoring tools like email alerts can also give prompt notification of drive failures or degradation. Overall, proactively monitoring NVMe RAID health and having a rebuilding plan helps maximize uptime and data protection.
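For software RAID the same kind of check can be scripted. The sketch below assumes Linux md software RAID, which the article does not specifically cover, and parses text in the format of `/proc/mdstat`. There, a status such as `[3/2] [UU_]` means three member devices are expected and two are active. The sample text and the function name are illustrative. A real script would read the file itself and send an alert.

```python
import re

SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 nvme1n1p1[1] nvme0n1p1[0]
      976630464 blocks super 1.2 [2/2] [UU]

md1 : active raid5 nvme3n1[1] nvme2n1[0]
      1953260544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>
"""

def degraded_arrays(mdstat_text):
    """Map each degraded md array name to (active_devices, expected_devices)."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = (active, expected)
            current = None
    return degraded

print(degraded_arrays(SAMPLE_MDSTAT))
```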

Alternatives to RAID

While RAID remains a popular choice for improving performance, redundancy, and reliability, there are some alternatives that may be worth considering for NVMe environments. Some popular alternatives include:

Drive Mirroring and Striping

Drive mirroring creates an identical copy of a drive to provide redundancy in case one fails. Although mirroring doubles the storage space required, it avoids the performance overhead of parity calculations used in RAID 5/6. Striping spreads data across multiple drives to improve performance but does not provide fault tolerance. Functionally these correspond to RAID 1 and RAID 0, but they can be configured through OS tools like Windows Storage Spaces without a dedicated RAID controller.
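The capacity trade-off between mirroring and parity can be compared directly. This sketch assumes equal-sized drives and the standard capacity rules for each level. The function name and the example drive sizes are illustrative.

```python
def usable_capacity_tb(level, drive_count, drive_tb):
    """Usable capacity in TB for an array of equal-sized drives."""
    if level == "raid0" and drive_count >= 2:
        return drive_count * drive_tb            # no redundancy
    if level == "raid1" and drive_count >= 2:
        return drive_tb                          # every drive holds the same data
    if level == "raid10" and drive_count >= 4 and drive_count % 2 == 0:
        return drive_count // 2 * drive_tb       # half the drives are mirrors
    if level == "raid5" and drive_count >= 3:
        return (drive_count - 1) * drive_tb      # one drive's worth of parity
    if level == "raid6" and drive_count >= 4:
        return (drive_count - 2) * drive_tb      # two drives' worth of parity
    raise ValueError(f"unsupported level or drive count: {level}, {drive_count}")

# Four hypothetical 2 TB drives
for level in ("raid0", "raid10", "raid5", "raid6"):
    print(level, usable_capacity_tb(level, 4, 2), "TB usable")
```

With four drives, mirroring (RAID 10) and RAID 6 both give up half the raw capacity, while RAID 5 gives up a quarter in exchange for the parity overhead discussed earlier.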

ZFS and Storage Spaces

ZFS (a combined file system and volume manager) and Windows Storage Spaces both offer pool-based storage with flexible configurations as a software alternative to classic RAID. ZFS in particular adds copy-on-write, snapshots, and end-to-end data integrity checks. Compared to classic hardware RAID, they provide more flexibility and more advanced management.

Conclusion

In summary, RAID can work with NVMe SSDs, but there are important factors to consider. While NVMe SSDs offer incredibly fast speeds, implementing RAID adds complexity. NVMe’s parallelism and performance can be hampered by RAID overhead. Careful matching of controller, drives, and software/hardware RAID implementation is needed. RAID 5/6 parity calculations can limit NVMe SSD throughput if not carefully optimized.

For most consumer use cases, RAID may provide minimal advantages over a single modern NVMe SSD, which is already fast and uses internal error correction, although a single drive offers no protection if the device itself fails. For enterprise and intensive workloads, the redundancy and performance gains of NVMe RAID can be worthwhile if configured properly. The decision depends on workload, required redundancy levels, and budget. With careful planning, NVMe RAID can successfully combine the speed of NVMe SSDs and the protection of RAID.