What are the components of a disk array?

A disk array, also known as a storage array or disk subsystem, is a data storage system used in computing environments to increase storage performance, capacity, and reliability. Disk arrays are composed of multiple disk drives working together as a consolidated storage resource. The components that make up a disk array include:

Disk Drives

The disk drives are the core component of a disk array system. They provide the actual storage capacity and are where data is physically written. Disk arrays contain multiple disk drives, which can number from just a few drives to hundreds of drives depending on the storage capacity required.

The most common types of disk drives used in today’s disk array systems include:

  • SATA (Serial ATA) – Common in entry-level and mid-range disk array systems. Cost effective per GB but lower performance than other drive types.
  • SAS (Serial Attached SCSI) – Used in enterprise-class disk arrays. Provides better performance than SATA with more capabilities.
  • SSD (Solid State Drive) – Flash-based drives that offer much higher performance than HDDs, but at a higher cost per GB. Often used to store frequently accessed data.
  • NVMe (Non-Volatile Memory Express) – A high-speed protocol for solid state drives, designed to connect them directly over the PCIe bus. Provides very low latency and high IOPS.

The configuration of the disk drives within the array can vary as well. Some common arrangements include:

  • RAID – Redundant Array of Independent Disks. Provides data protection through techniques like mirroring or parity.
  • JBOD – Just a Bunch of Disks. The drives are not combined into a logical volume but act as independent drives.

Storage Processors

The storage processors (also known as controllers) are the brains of the disk array system. They manage all read and write requests to the disk drives, cache data to improve performance, implement the RAID configuration that protects the data, and provide the interface connectivity to hosts.

High-end enterprise disk arrays often have multiple storage processors configured for high availability. If one storage processor fails, the other will take over automatically. The storage processors also scale performance by sharing the workload, which supports more I/O operations and faster response times.

Cache Memory

Most disk array systems utilize some amount of cache memory, usually DRAM, to temporarily hold data that is frequently accessed. Reading data from cache is much faster than accessing the disk drives, which significantly improves overall system performance. The cache can be present on the storage processors themselves and sometimes on the disk drives as well. DRAM is volatile, so any cached writes that have not yet reached disk would be lost if power to the array were interrupted. To avoid data loss, arrays typically protect the write cache with batteries or similar backup power, mirror it between storage processors, and flush it to disk.
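The read-caching behavior described above can be sketched with a small least-recently-used (LRU) cache. This is only an illustration: the `ReadCache` class, its block-number interface, and the LRU eviction policy are assumptions made for the example, and real arrays use more elaborate policies.

```python
from collections import OrderedDict


class ReadCache:
    """Tiny LRU read cache in front of a slower backing store (hypothetical)."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # dict: block number -> bytes
        self.capacity = capacity            # maximum number of cached blocks
        self.blocks = OrderedDict()         # ordered oldest -> newest use
        self.hits = 0
        self.misses = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)  # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.backing_store[block_no]    # slow path: read from "disk"
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict least recently used
        return data


# Stand-in for the disk drives: eight small blocks.
disk = {n: f"block-{n}".encode() for n in range(8)}
cache = ReadCache(disk, capacity=2)
for n in (0, 1, 0, 2, 1):
    cache.read(n)
print("hits:", cache.hits, "misses:", cache.misses)
```

The hit and miss counters show why access patterns matter: repeated reads of a small working set are served from memory, while a working set larger than the cache keeps falling through to disk.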

Internal Interfaces

There are internal interconnects that link together the components inside a disk array such as the storage processors, disk drives, and cache memory. These interfaces need to support the bandwidth required for the anticipated workload. Some common internal interface technologies include:

  • SAS – Serial Attached SCSI
  • SATA – Serial ATA
  • FC – Fibre Channel
  • InfiniBand

Higher performance arrays will leverage faster interconnect technologies like InfiniBand to handle heavy transactional workloads. The internal architecture can also be designed for redundancy so that no single interconnect failure can take down the array.

Enclosure

The disk drives, storage processors, internal interfaces, and supporting components are all housed together inside an enclosure. This consolidated unit makes the disk array system modular, self-contained, and rack mountable. Most enclosures also include:

  • Power supplies – Redundant modules that protect against power source failures.
  • Cooling fans – Maintain proper air flow and temperature.
  • Management ports – Allow monitoring and administration.

Enterprise-grade disk arrays will offer high availability and redundancy of all these supporting resources inside the enclosure.

RAID Configurations

As mentioned previously, RAID (Redundant Array of Independent Disks) is a common method of arranging the disk drives to enhance performance, capacity, or reliability. Several RAID levels can be configured in a disk array system:

RAID 0

Also known as disk striping. Data is spread evenly across multiple drives in chunks to improve performance. However, it does not provide any fault tolerance. If one drive fails, all data will be lost.
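As a rough sketch of how striping spreads data, the function below maps a logical block number to a drive and an offset on that drive. The function name and the `blocks_per_stripe_unit` parameter are hypothetical, and the simple round-robin layout is an assumption; actual controllers may lay out stripes differently.

```python
def locate_block(logical_block, num_drives, blocks_per_stripe_unit=1):
    """Map a logical block to (drive index, block offset on that drive).

    Assumes a plain round-robin RAID 0 layout: consecutive stripe units
    go to consecutive drives, wrapping around after the last drive.
    """
    stripe_unit = logical_block // blocks_per_stripe_unit
    offset_in_unit = logical_block % blocks_per_stripe_unit
    drive = stripe_unit % num_drives
    stripe_row = stripe_unit // num_drives
    return drive, stripe_row * blocks_per_stripe_unit + offset_in_unit


# Show where the first eight logical blocks land on a four-drive stripe set.
for block in range(8):
    drive, offset = locate_block(block, num_drives=4)
    print(f"logical block {block} -> drive {drive}, offset {offset}")
```

Because neighboring blocks land on different drives, a large sequential transfer can keep all drives busy at once, which is where the performance gain comes from.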

RAID 1

Also known as disk mirroring. Identical copies of data are written to two or more drives, so the data remains available if one drive fails. The tradeoff is that usable capacity equals that of a single drive, which is half the raw capacity in a two-drive mirror.

RAID 5

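Data is striped across drives like RAID 0, but parity information is also calculated and distributed across all the drives in the set. This allows data to be recreated if a single disk fails. RAID 5 requires at least three drives, and the equivalent of one drive's capacity is used for parity.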
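The parity idea can be illustrated with XOR on a single stripe. This is a simplified sketch: the `xor_blocks` helper and the sample data are made up for the example, and it does not show how parity is rotated across drives from one stripe to the next.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus one parity block (four drives in total).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the drive that held data[1]. XOR-ing the surviving data
# blocks with the parity block recovers the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

The same property explains the limit of single parity: with two blocks missing from a stripe, one XOR equation is no longer enough to solve for both.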

RAID 6

Similar to RAID 5 but with double distributed parity, protecting against the loss of two disk drives. Because two parity blocks must be updated on each write, write performance may be slower than RAID 5.

RAID 10

Combines mirroring and striping for both performance and fault tolerance. Drives are grouped into mirrored sets, and data is then striped across those sets. The array can survive multiple drive failures as long as the failed drives are in different mirrored sets.
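To compare the capacity cost of the RAID levels above, the following sketch computes usable capacity for a set of equal-size drives. The function name is hypothetical, and the calculation assumes no hot spares, two-way mirrors for RAID 10, and no formatting overhead, so real arrays will report somewhat lower figures.

```python
def usable_capacity(level, num_drives, drive_size_tb):
    """Return usable capacity in TB for equal-size drives (simplified)."""
    if level == "0":
        data_drives = num_drives            # no redundancy at all
    elif level == "1":
        data_drives = 1                     # every drive holds the same copy
    elif level == "5":
        if num_drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        data_drives = num_drives - 1        # one drive's worth of parity
    elif level == "6":
        if num_drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        data_drives = num_drives - 2        # two drives' worth of parity
    elif level == "10":
        if num_drives < 4 or num_drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        data_drives = num_drives // 2       # assumes two-way mirrors
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return data_drives * drive_size_tb


# Eight 4 TB drives under each striped level.
for level in ("0", "5", "6", "10"):
    print(f"RAID {level}: {usable_capacity(level, 8, 4)} TB usable")
```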

Host Interfaces

The host interface allows external servers and computers to connect to the storage array system. There are many standard options available including:

  • Fibre Channel
  • SAS
  • iSCSI
  • FCoE (Fibre Channel over Ethernet)
  • InfiniBand

The interface selected will depend on the speed, cable length, and protocol support required by the systems accessing the storage array. Multiple ports are usually provided for redundancy and greater bandwidth.
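How multiple ports provide both redundancy and bandwidth can be sketched as round-robin path selection that skips failed paths. The `MultipathSelector` class and the path names are hypothetical; host multipathing software typically offers several policies, of which round-robin is only one.

```python
class MultipathSelector:
    """Round-robin over host-to-array paths, skipping failed ones (sketch)."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.healthy = set(self.paths)
        self._next = 0

    def mark_failed(self, path):
        self.healthy.discard(path)

    def mark_restored(self, path):
        if path in self.paths:
            self.healthy.add(path)

    def next_path(self):
        # Try each path at most once per call, starting after the last one used.
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path in self.healthy:
                return path
        raise RuntimeError("no healthy paths to the array")


selector = MultipathSelector(["port-a", "port-b"])  # hypothetical port names
before = [selector.next_path() for _ in range(3)]
selector.mark_failed("port-a")
after = [selector.next_path() for _ in range(2)]
print(before, after)
```

While both paths are healthy, I/O alternates between them; once one is marked failed, all traffic moves to the survivor without the application needing to change anything.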

Management Software

Disk array systems include management software for configuration, monitoring and maintenance tasks. This can include:

  • Graphical user interfaces
  • Support for scripting and automation
  • Tools for provisioning, snapshots, replication, etc.
  • Alerts and monitoring
  • Integration with virtualization and cloud systems

Easy-to-use management tools are key to efficient administration of the array.

Encryption

With the rise of regulatory compliance and data privacy regulations, encryption is becoming an increasingly important capability for storage systems and disk arrays. Full disk encryption protects data at rest on the drives. Encryption can be applied selectively to certain volumes or datasets as well based on policies. This prevents unauthorized access to sensitive data if a disk drive is removed or compromised.

Data Protection Features

In addition to RAID for protecting against disk failures, disk array systems may offer other data protection and recovery capabilities including:

  • Snapshots – Point in time copies of data volumes or LUNs.
  • Cloning – Duplicate LUNs or volumes instantly.
  • Replication – Sync or replicate data between arrays.
  • Backup support – Integration with backup software.
  • Disaster recovery – Features for maintaining business continuity.
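One common way to implement the snapshots listed above is copy-on-write: the snapshot starts empty and saves a block's old contents only when that block is first overwritten. The sketch below assumes this technique; the `Volume` class is hypothetical, and some arrays use other approaches such as redirect-on-write.

```python
class Volume:
    """In-memory volume with copy-on-write snapshots (illustrative only)."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.snapshots = []

    def snapshot(self):
        snap = {}  # block number -> contents preserved from snapshot time
        self.snapshots.append(snap)
        return snap

    def write(self, block_no, data):
        # Before overwriting, preserve the old contents in every snapshot
        # that has not already saved this block.
        for snap in self.snapshots:
            snap.setdefault(block_no, self.blocks[block_no])
        self.blocks[block_no] = data

    def read_snapshot(self, snap, block_no):
        # Unchanged blocks are read straight from the live volume.
        return snap.get(block_no, self.blocks[block_no])


vol = Volume(4)
vol.write(0, b"v1")
snap = vol.snapshot()
vol.write(0, b"v2")
print(vol.blocks[0], vol.read_snapshot(snap, 0))
```

This is why a snapshot can be created almost instantly and initially consumes almost no space: it grows only as the live volume diverges from it.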

Storage Tiering

To optimize performance and costs, storage tiering attempts to automatically place data on the most appropriate storage media. This could involve:

  • SSDs for the most performance sensitive data.
  • SAS or SATA HDDs for less frequently accessed data.
  • Tape backup for archival data.

By utilizing different media, storage tiering improves performance while also reducing costs compared to placing all data on the fastest storage.
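A tiering policy can be as simple as comparing access frequency against thresholds. The sketch below is a minimal illustration; the function name, the threshold values, and the dataset names are all invented for the example, and production tiering engines weigh many more signals.

```python
def assign_tier(accesses_per_day, hot_threshold=100, cold_threshold=1):
    """Pick a storage tier from access frequency (thresholds are hypothetical)."""
    if accesses_per_day >= hot_threshold:
        return "ssd"
    if accesses_per_day >= cold_threshold:
        return "hdd"
    return "tape"


# Hypothetical datasets with their average daily access counts.
datasets = {"orders-db": 5000, "monthly-reports": 12, "2015-archive": 0}
placement = {name: assign_tier(count) for name, count in datasets.items()}
print(placement)
```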

Caching Servers

High performance disk arrays may use caching servers, which are servers filled with RAM that sit between the storage array and application servers. They utilize large memory pools to cache active data and buffer writes. This lowers the data load on the primary storage array while providing faster data access.

Qualities of Enterprise Arrays

Disk arrays designed for enterprise and mission critical workloads emphasize these key characteristics:

  • High availability – Eliminate single points of failure and ensure 24/7 uptime for continuous operations.
  • Performance – Provide low latency, high IOPS, and bandwidth to support demanding workloads.
  • Scalability – Scale capacity and performance as needed through additional drives, processors, memory, etc.
  • Data protection – Use RAID, snapshots, replication and other features to prevent data loss.
  • Management – Include monitoring, automation and tools to simplify administration.

Entry-Level Arrays

Small office and entry-level disk arrays prioritize these attributes:

  • Low cost – Less expensive than enterprise arrays while providing basic shared storage.
  • Easy to use – Web-based management and wizards simplify configuration and administration.
  • All-in-one – Converged units combine storage, networking and servers.

They utilize commercial off-the-shelf components and technologies like SATA drives, RAID levels and Gigabit Ethernet connectivity. Performance, scalability and availability are limited compared to high-end arrays.

Hyperconverged Infrastructure (HCI)

Hyperconverged infrastructure combines storage, compute, networking and virtualization into an integrated software-defined system. The storage is provided by drives in each node, aggregated into a virtual SAN accessed over the network. This creates a highly scalable and resilient shared storage architecture using commodity hardware.

All-Flash Arrays

All-flash arrays contain only solid state flash drives for storage instead of mechanical hard disk drives. The advantages of all-flash storage include:

  • Much lower latency – often 10-100x lower than HDD arrays, depending on the workload.
  • Higher IOPS and bandwidth for demanding workloads.
  • Lower power and cooling requirements.
  • Smaller rack space footprint.

The tradeoff is a higher cost per gigabyte compared to HDD-based storage. Therefore all-flash arrays are targeted at performance-sensitive workloads.

Object Based Storage

Object storage manages data as objects instead of blocks or files. It utilizes commodity infrastructure with metadata to store petabyte-scale repositories. Object stores provide:

  • Massive scalability for huge amounts of unstructured data.
  • Geographic distribution capabilities.
  • Metadata tagging and retrieval.
  • Replication for data protection.

Applications include cloud storage, archives, big data analytics, medical imaging, and digital media repositories.
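The object model described above, where data is stored with an identifier and searchable metadata instead of a block address or file path, can be sketched in a few lines. The `ObjectStore` class is hypothetical, and deriving the identifier from a SHA-256 hash of the content is an assumption for the example; many object stores use caller-supplied keys instead.

```python
import hashlib


class ObjectStore:
    """Minimal in-memory object store with metadata lookup (illustrative)."""

    def __init__(self):
        self._objects = {}  # object id -> (data, metadata dict)

    def put(self, data, **metadata):
        # Assumption: the object id is the SHA-256 digest of the content.
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id):
        return self._objects[object_id][0]

    def find(self, **criteria):
        # Return ids of objects whose metadata matches every criterion.
        return [
            object_id
            for object_id, (_, meta) in self._objects.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


store = ObjectStore()
scan_id = store.put(b"...image bytes...", kind="mri", patient="p-001")
store.put(b"...video bytes...", kind="video")
print(store.find(kind="mri") == [scan_id])
```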

Direct-Attached Storage (DAS)

Direct-attached storage refers to storage devices that are connected directly to a server, usually internal hard drives or enclosures connected via cables to external ports. DAS provides isolated storage that can only be accessed by the attached server. While less complex than a storage area network, DAS also lacks shared access and centralized management capabilities.

Network-Attached Storage (NAS)

NAS systems are storage appliances that connect to the network, often via Ethernet. They contain on-board storage and operating systems specialized for file-based protocols like NFS or SMB/CIFS. NAS arrays allow shared storage access by multiple clients on the network through standard file sharing protocols.

Storage Area Networks (SANs)

A SAN is a dedicated, high-speed network that uses protocols like Fibre Channel, iSCSI or FCoE to provide block-level access to consolidated storage. Disk arrays are connected to SAN fabrics to enable shared block storage connectivity between multiple servers. A key advantage of SANs is that storage resources can be centrally managed while being shared among many servers.

Software-Defined Storage (SDS)

Software-defined storage abstracts storage hardware resources into software-based services. It utilizes commodity infrastructure while value-added services are provided via software. SDS enables on-demand provisioning, automated tiering, seamless scaling and other benefits. It also allows storage administrators to centrally manage diverse storage resources across data center and cloud environments.

Cold/Warm/Hot Storage Tiers

Storage systems may implement different tiers based on the frequency of data access:

  • Hot storage – Houses actively used data on fast media like flash SSDs.
  • Warm storage – Stores moderately accessed data on HDDs, SSDs or cloud storage.
  • Cold/archival storage – Infrequently accessed data moved to cheaper storage like tape or cloud archives.

This strategy optimizes costs while maintaining performance for different data usage patterns.

Conclusion

Disk arrays provide expandable storage capacity, resilience and performance through a combination of technologies like RAID, caching, tiering and redundancy features. The components are designed to meet requirements ranging from small offices to enterprise data centers. Careful consideration of workload patterns, scalability needs and availability requirements will guide selection of the optimal disk array components and architecture.