RAID (Redundant Array of Independent Disks) is a data storage technology that combines multiple physical drives into a single logical unit. Depending on the level chosen, RAID can provide increased performance, capacity, or reliability compared to a single drive. There are several RAID levels, each providing a different combination of performance, capacity, and fault tolerance.
RAID 0, also known as striping, combines two or more hard drives into one larger logical unit. Data is split up into blocks that get written across the drives in the array. This allows for faster read/write speeds since the workload is balanced across multiple disks. However, RAID 0 provides no redundancy – if one drive fails, all data will be lost. RAID 0 is best suited for non-critical data where performance is more important than data protection.
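As a rough illustration of the striping described above, the following sketch maps a logical block number to a drive and an offset on that drive. The function name `locate_block` and the one-block stripe size are assumptions made for this example; real controllers typically stripe in larger chunks.

```python
def locate_block(logical_block: int, num_drives: int) -> tuple[int, int]:
    """Map a logical block to (drive index, block offset on that drive)."""
    if num_drives < 2:
        raise ValueError("RAID 0 needs at least two drives")
    drive = logical_block % num_drives     # round-robin across the drives
    offset = logical_block // num_drives   # stripe row on the chosen drive
    return drive, offset


# Consecutive logical blocks land on different drives, so sequential
# reads and writes can be serviced by all drives in parallel.
for block in range(6):
    print(block, locate_block(block, num_drives=3))
```

Because every drive holds a slice of every large file, losing any one drive leaves gaps throughout the data, which is why RAID 0 has no fault tolerance.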
RAID 1, also known as mirroring, writes identical copies of data to two or more drives. This provides redundancy: if one drive fails, data can be read from the remaining mirror drive(s). Read performance can improve since reads can be distributed across drives. However, write performance does not improve since every write must be copied to all drives. The usable capacity of a RAID 1 array is equal to the capacity of a single drive. RAID 1 is the simplest redundant RAID level, rebuilds by straight copying rather than parity calculation, and is well suited for mission-critical data that requires maximum uptime.
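The mirroring behavior can be sketched with a small in-memory model. The `Mirror` class, its method names, and the round-robin read policy are all hypothetical and chosen only for illustration; a real implementation works on block devices and must also resynchronize a replaced drive.

```python
class Mirror:
    """Toy RAID 1 model: each drive is a dict of block number -> data."""

    def __init__(self, num_drives: int = 2):
        self.drives = [dict() for _ in range(num_drives)]
        self.failed = set()
        self._next_read = 0

    def fail(self, drive_index: int) -> None:
        self.failed.add(drive_index)

    def _healthy(self) -> list[int]:
        return [i for i in range(len(self.drives)) if i not in self.failed]

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to all healthy drives, so writes do not get faster.
        for i in self._healthy():
            self.drives[i][block] = data

    def read(self, block: int) -> bytes:
        healthy = self._healthy()
        if not healthy:
            raise IOError("all mirror drives have failed")
        # Reads rotate across healthy drives, which spreads the read load.
        i = healthy[self._next_read % len(healthy)]
        self._next_read += 1
        return self.drives[i][block]


mirror = Mirror(num_drives=2)
mirror.write(0, b"payroll")
mirror.fail(0)                 # simulate losing one drive
print(mirror.read(0))          # still readable from the surviving copy
```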
RAID 5 stripes data and parity information across a minimum of three drives. The parity information allows for data recovery after a single drive failure: the missing data is recreated from the surviving data and parity. RAID 5 provides a good balance between performance, capacity, and redundancy. Read performance is high since data is striped and the workload is balanced across drives. Write performance is lower than RAID 0 since parity information must be calculated and written with each write. Usable capacity is equal to the total capacity of all drives, minus one drive's worth used for parity. RAID 5 is a popular choice for mission-critical data that requires redundancy on a budget.
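Parity recovery is easiest to see with XOR, the operation conventionally used for RAID 5 parity. The sketch below covers a single stripe with three data blocks; the helper name `xor_blocks` is an assumption for this example, and the rotation of parity across drives that real RAID 5 performs is left out.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the drive that held data[1]. XOR-ing the surviving
# data blocks with the parity block yields the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

The same property explains the write penalty: changing any data block means the parity block for that stripe has to be recomputed and rewritten as well.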
RAID 6 is similar to RAID 5, but uses a second independent distributed parity scheme to protect against the failure of two drives. This makes RAID 6 more fault tolerant than RAID 5, at the cost of somewhat slower writes. Unlike RAID 5, RAID 6 requires a minimum of four drives. Usable capacity is equal to the total capacity of all drives, minus two drives' worth used for parity. RAID 6 is best suited for mission-critical data that requires high redundancy and must survive two simultaneous disk failures.
RAID 10 combines mirroring and striping to create a hybrid solution. It requires a minimum of four drives configured as two mirrored pairs (RAID 1). Data is then striped across the RAID 1 arrays, providing performance similar to RAID 0 but with redundancy. If a single drive fails, data can be rebuilt from the RAID 1 mirror. RAID 10 provides high performance and good redundancy, but at the cost of 50% usable capacity since half the total capacity is used for mirroring. RAID 10 is ideal for high performance applications that require fault tolerance and maximum uptime.
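Whether a RAID 10 array survives a set of failures depends on which drives fail, not only on how many. The sketch below assumes two-way mirrored pairs and uses hypothetical drive names; the array is treated as lost only when both drives of the same pair have failed.

```python
def raid10_survives(mirror_pairs, failed_drives) -> bool:
    """Return True if every mirrored pair still has at least one healthy drive."""
    failed = set(failed_drives)
    return all(not set(pair) <= failed for pair in mirror_pairs)


# Four drives arranged as two mirrored pairs (names are illustrative).
pairs = [("d0", "d1"), ("d2", "d3")]

print(raid10_survives(pairs, {"d0"}))          # one failure: survives
print(raid10_survives(pairs, {"d0", "d2"}))    # one failure per pair: survives
print(raid10_survives(pairs, {"d0", "d1"}))    # both halves of a pair: data lost
```

Only a single drive failure is guaranteed to be survivable; a second failure is survivable only if it lands in a different mirrored pair.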
Comparison of RAID Levels
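| RAID Level | Minimum Drives | Usable Capacity | Fault Tolerance |
|---|---|---|---|
| RAID 0 | 2 | N drives | None |
| RAID 1 | 2 | 1 drive | 1 drive failure |
| RAID 5 | 3 | N - 1 drives | 1 drive failure |
| RAID 6 | 4 | N - 2 drives | 2 drive failures |
| RAID 10 | 4 | N / 2 drives | 1 drive failure (per mirrored pair) |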
Where N is the total number of drives in the array.
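The capacity rules given for each level can be turned into a short calculator. This is a minimal sketch that assumes identical drives and, for RAID 10, two-way mirrored pairs; the function name and the string level labels are choices made for this example. The minimum drive counts are the commonly cited ones, including four for RAID 6.

```python
# Commonly cited minimum drive counts per RAID level.
MIN_DRIVES = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def usable_capacity_tb(level: str, num_drives: int, drive_tb: float) -> float:
    """Usable capacity in TB for an array of identical drives."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == "10" and num_drives % 2 != 0:
        # Assumption: classic RAID 10 built from two-way mirrored pairs.
        raise ValueError("RAID 10 with mirrored pairs needs an even drive count")
    data_drives = {
        "0": num_drives,        # all capacity usable, no redundancy
        "1": 1,                 # every drive holds the same copy
        "5": num_drives - 1,    # one drive's worth of parity
        "6": num_drives - 2,    # two drives' worth of parity
        "10": num_drives // 2,  # half the drives hold mirror copies
    }[level]
    return data_drives * drive_tb


for level in ("0", "1", "5", "6", "10"):
    print(f"RAID {level}: {usable_capacity_tb(level, 4, 4.0)} TB usable from 4 x 4 TB")
```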
Choosing the Right RAID Level
When selecting a RAID level, you need to balance performance, redundancy, and cost. Here are some guidelines for choosing the right RAID level:
- RAID 0 – When performance is critical but redundancy is not. Use for temporary data.
- RAID 1 – When redundancy is critical but you only need the capacity of one drive. Use for OS drives or other critical data.
- RAID 5 – When you need redundancy but also want increased capacity. Use for important data that can tolerate slower writes.
- RAID 6 – When redundancy is absolutely critical. Use for data that cannot have any downtime.
- RAID 10 – When you need high performance and redundancy. Use for mission critical applications and databases.
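In general, redundancy has a cost in either write performance or usable capacity. Parity-based levels (RAID 5 and 6) preserve more capacity but slow down writes, while mirrored levels (RAID 1 and 10) keep writes fast but give up half or more of the raw capacity. Also consider the budget – levels that need more drives for the same usable capacity increase overall storage costs.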
Setting up a RAID Array
Setting up a RAID array requires RAID controller hardware or software. Here is an overview of the process:
- Select RAID controller – Either a plug-in RAID controller card, motherboard integrated RAID, or software RAID through the OS.
- Choose drives – Select the number and capacity of drives needed for the desired RAID level.
- Configure RAID level – Use the management interface for the RAID controller to define the RAID level and add drives.
- Initialize and format – Initialize the array and format it with a filesystem (NTFS, ext4, etc.).
- Mount/use array – The OS will now show the RAID array as a single volume to store data.
Most RAID controllers provide management interfaces to monitor RAID health and status. If a drive fails, the controller alerts you so that the drive can be replaced. Once the new drive is added, the array re-syncs and fault tolerance is restored.
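Health monitoring can also be scripted. As one possible approach, assuming Linux MD software RAID, the kernel reports array status in `/proc/mdstat`, where a status such as `[3/2] [_UU]` means three devices are expected but only two are active. The sketch below parses text in that format; the sample text is illustrative rather than captured from a real system, and the function name is hypothetical.

```python
import re

# Illustrative sample in the style of /proc/mdstat (not real output).
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid5 sdc1[2] sdb2[1] sda2[0](F)
      1953260544 blocks level 5, 512k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Return names of arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+) : ", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\] \[([U_]+)\]", line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))
```

On a live Linux system the same function could be fed the contents of `/proc/mdstat`. Hardware controllers expose equivalent information through their own vendor tools.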
Software vs Hardware RAID
RAID can be implemented in software or hardware:
- Software RAID – RAID is implemented at the operating system level, using the system CPU and RAM. Software RAID can be configured on almost any system but consumes some system resources, which can reduce performance under heavy load.
- Hardware RAID – Uses a dedicated RAID controller with its own processor and RAM. Hardware RAID reduces the load on the main CPU but requires purchasing a RAID controller.
Hardware RAID generally places less load on the server CPU and can provide better performance, although the overhead of software RAID is often small on modern processors. Software RAID remains a good option for cost-sensitive environments. The choice depends on budget and performance requirements.
Expanding a RAID Array
One advantage of RAID arrays is that many allow on-the-fly expansion by adding additional drives. The process depends on the RAID level:
- RAID 0 – Typically requires creating a new array with all drives.
- RAID 1 – Add a drive and resync the mirror. This adds another copy rather than more capacity.
- RAID 5 – Add a drive, then expand capacity and redistribute data and parity.
- RAID 6 – Add a drive, then expand capacity and redistribute data and parity.
- RAID 10 – Add another RAID 1 pair.
The RAID controller manages the expansion process once new drives are added, which makes it easier to grow storage capacity as needs change over time.
Rebuilding Failed Drives
When a drive fails in a redundant RAID array, it can be replaced and rebuilt, usually without downtime if the system supports hot-swapping. The steps include:
- Physically replace the failed drive with a new drive.
- The RAID controller detects the new drive and starts the rebuild process.
- Data and/or parity information is rebuilt on the new drive.
- The array returns to a fully redundant state once the rebuild completes.
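The time to rebuild depends on the RAID level, controller, and drive sizes. Arrays built from large high-capacity drives can take many hours or even days to rebuild. During the rebuild window, the array runs with reduced redundancy (RAID 6 after one failure) or none at all (a two-drive RAID 1, RAID 5, or the affected pair in RAID 10) until the rebuild completes.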
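A back-of-the-envelope estimate shows why rebuilds take so long. The sketch below assumes the rebuild is limited by a sustained rate of writing to the replacement drive; the rates used are hypothetical, and real rebuilds are often slower because the array keeps serving normal I/O at the same time.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Estimate hours needed to rewrite one full drive at a sustained rate."""
    if rebuild_mb_per_s <= 0:
        raise ValueError("rebuild rate must be positive")
    total_bytes = drive_tb * 1e12                     # decimal terabytes
    seconds = total_bytes / (rebuild_mb_per_s * 1e6)  # decimal megabytes per second
    return seconds / 3600


# Example figures (assumed, not measured):
print(f"4 TB at 150 MB/s:  {rebuild_hours(4, 150):.1f} hours")
print(f"16 TB at 100 MB/s: {rebuild_hours(16, 100):.1f} hours")
```

With these assumed figures the larger drive takes nearly two days, which is the window during which a further failure is most dangerous.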
Migrating a RAID Array
Existing RAID arrays can be migrated to new systems while retaining data integrity. This is done by moving all member drives to the new system’s RAID controller. There are two approaches:
- Block-level migration – Drives are moved one by one with the array rebuilding between each drive. More drives means more rebuilds.
- Swap controllers – The entire set of drives is moved to the new controller at once. No rebuilds required.
Block-level migration minimizes downtime by keeping the array online during the transition, but it requires more rebuilds. Swapping controllers involves longer downtime but no rebuilds. The optimal approach depends on the RAID level, number of drives, and acceptable downtime.
Alternative RAID Implementations
Some alternative RAID implementations provide their own variations on RAID protection:
- ZFS – Software RAID built into the ZFS filesystem. Supports RAID equivalents and storage pooling.
- Windows Storage Spaces – Microsoft’s software RAID for Windows. Supports pooling and tiered storage.
- Linux MD-RAID – Linux kernel native software RAID.
- Btrfs – The Btrfs filesystem supports integrated RAID-like redundancy.
These implement RAID-like functionality without dedicated hardware. They provide an alternative for software-only and cost-optimized solutions. However, they may not match the performance of true hardware RAID controllers.
Virtualized and Software-Defined RAID
New options are emerging for implementing RAID in virtual and software-defined infrastructure:
- Virtual RAID – Virtual RAID controllers created in hypervisors like VMware ESXi.
- Software-defined storage – Abstraction and virtualization of storage hardware using software.
- RAID via SAN – Block-based SAN arrays that provide RAID management.
- Hyperconverged infrastructure – Software RAID across pooled compute and storage.
These virtualized and software-defined options remove the need for physical RAID cards while still providing pooled and redundant storage. They offer more flexibility, with a potential performance tradeoff compared to dedicated hardware RAID.
Cloud-Based and Network RAID
Large scale cloud and network storage can utilize RAID technologies across multiple servers and geographic locations:
- Erasure coding – Mathematical parity encodings similar to RAID 5/6 spread across drives in multiple servers.
- Wide striping – Data is striped across hundreds or thousands of servers for performance.
- Distributed RAID – Parity-based RAID across nodes in different physical locations.
These implementations provide RAID-like fault tolerance and performance at massive scale across networks and data centers. However, they require advanced orchestration and distributed filesystems to coordinate.
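The relationship between erasure coding and RAID 5/6 can be expressed with two numbers: k data fragments and m parity fragments. In that notation, RAID 5 on N drives behaves like k = N - 1, m = 1, and RAID 6 like k = N - 2, m = 2. The sketch below computes the resulting trade-off; the function name and the example profiles are assumptions for illustration.

```python
def erasure_profile(k: int, m: int) -> dict:
    """Summarize a k data + m parity erasure coding layout."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 data fragments and m >= 0 parity fragments")
    return {
        "tolerated_failures": m,            # any m fragments may be lost
        "storage_overhead": (k + m) / k,    # raw bytes stored per usable byte
        "usable_fraction": k / (k + m),     # share of raw capacity holding data
    }


# RAID 5 on 4 drives, RAID 6 on 6 drives, and a wider hypothetical layout.
for k, m in [(3, 1), (4, 2), (10, 4)]:
    print(f"{k}+{m}:", erasure_profile(k, m))
```

Wider layouts tolerate more failures for less overhead than mirroring, at the price of more fragments to read and more computation during recovery.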
RAID provides flexible options to balance performance, capacity, redundancy, and cost. RAID 0 maximizes capacity and speed, while the redundant levels give up some of one or both in exchange for fault tolerance. Typical use cases include:
- RAID 0 – Caching, video editing scratch disks
- RAID 1 – OS drives, small critical data
- RAID 5 – General file and application servers
- RAID 6 – Archival, backups, large storage pools
- RAID 10 – Transactional databases, email, high performance applications
When planning a RAID implementation, consider the data usage patterns and availability requirements to select the optimal RAID level. Hardware RAID can provide the best performance but adds cost. Software and virtual RAID trade some performance for flexibility and cost savings. Many controllers and software RAID implementations also allow an array to be expanded, or migrated to a different RAID level, later on as needs evolve.