Are hard drives good for archiving?

Hard drives have long been used for archiving data, but are they really the best solution for long-term storage? There are several key factors to consider when evaluating hard drives for archiving purposes.

Reliability

Reliability is one of the most important factors for an archival medium. After all, if you can’t depend on being able to access the data in the future, the storage is useless. Hard drives are reasonably reliable, but they do fail eventually. One study found an annual failure rate of around 4% for consumer-grade hard drives, with enterprise-class drives slightly better at around 2-3%. A 4% annual rate does not mean a drive will fail every few years, but the risk compounds: over five years, nearly one drive in five can be expected to fail. This makes a single hard drive a poor choice for long-term archiving of important data. Redundancy and backups are required.
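To see how an annual failure rate compounds, the short Python sketch below computes the chance that a drive fails within a given number of years, and the chance that every copy is lost. It assumes a constant annual failure rate and independent failures, and the multi-copy figure assumes failed drives are never replaced, so treat the numbers as rough illustrations rather than predictions. The 4% rate comes from the study mentioned above; the function names are hypothetical.

```python
def prob_failure(afr: float, years: int) -> float:
    """Chance that one drive fails at least once within `years`,
    assuming a constant annual failure rate `afr` (0.04 means 4%)."""
    return 1 - (1 - afr) ** years


def prob_all_copies_lost(afr: float, years: int, copies: int) -> float:
    """Chance that all `copies` drives fail within `years`, assuming
    independent failures and no replacement of failed drives."""
    return prob_failure(afr, years) ** copies


if __name__ == "__main__":
    for years in (1, 3, 5):
        single = prob_failure(0.04, years)
        both = prob_all_copies_lost(0.04, years, 2)
        print(f"{years} yr: one drive {single:.1%}, both of two copies {both:.2%}")
```

Real failures are often correlated (same manufacturing batch, same power surge), so copies are less independent than this model assumes. On the other hand, replacing a failed drive promptly makes the real outcome far better than the no-replacement figure.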

Durability

Closely related to reliability is durability. How long will the stored data last before degrading to the point where it is unreadable? HDDs are susceptible to damage from shock, vibration, strong magnetic fields, temperature extremes, and more. Specialized data center drives in temperature-controlled environments may last 5-10 years, while consumer external drives used in everyday conditions degrade much more quickly. Under ideal conditions hard drives may retain data for 10-20 years, but real-world conditions are rarely ideal. For true long-term archiving measured in decades, hard drives are a poor medium.

Maintainability

Another key factor for archives is how easy the data is to maintain over long periods. Hard drives fail, interfaces change, and software moves on, so archived data needs regular attention. Data must be copied to new drives periodically as old ones fail, migrated to new interfaces or formats when old ones become obsolete, and kept readable by compatible software for old proprietary data and disk formats. All of this takes work, and realistically a long-term hard drive archive requires an ongoing commitment of resources.
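One common way to make this maintenance routine is fixity checking: record a checksum for every archived file, then re-verify the checksums after each copy or migration and on a regular schedule. The sketch below is a minimal Python version using SHA-256 from the standard library; the function names and the JSON manifest layout are hypothetical, not a standard format.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path relative to `root` to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose contents changed."""
    problems = []
    for rel_path, expected in manifest.items():
        p = root / rel_path
        if not p.is_file() or sha256_file(p) != expected:
            problems.append(rel_path)
    return problems


def save_manifest(manifest: dict[str, str], dest: Path) -> None:
    """Store the manifest as JSON; keep it outside `root` so it is not hashed."""
    dest.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def load_manifest(src: Path) -> dict[str, str]:
    """Read a manifest previously written by save_manifest."""
    return json.loads(src.read_text())
```

In practice you would build the manifest once when the data is archived, store it alongside (but not inside) the archived directory, and run `verify` against each new drive after a migration, so silent corruption is caught while a good copy still exists.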

Cost Effectiveness

Hard drive storage seems inexpensive at first glance, but the cost adds up for larger archives. Beyond the upfront cost of drives, there are ongoing costs for electricity, maintenance, storage infrastructure, and periodic data migration to new hardware. Once the total cost of ownership is taken into account, hard drives become less appealing for massive archiving needs, because operating a large data center full of spinning drives consumes substantial energy and resources.
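A rough total-cost estimate makes the point concrete. The Python sketch below adds drive purchases, including a replacement set at the end of each service life, to the electricity needed to keep the drives spinning. Every default value (drive capacity, price, power draw, electricity price, service life) is a hypothetical placeholder to replace with your own figures, and the model ignores enclosures, cooling, and labor, all of which push real costs higher.

```python
import math
from dataclasses import dataclass


@dataclass
class DriveArchivePlan:
    """Hypothetical cost model; every default is a placeholder, not a price quote."""
    data_tb: float                   # logical archive size in TB
    copies: int = 2                  # full copies kept
    drive_tb: float = 20.0           # capacity per drive in TB
    drive_price: float = 350.0       # purchase price per drive
    watts_per_drive: float = 8.0     # average power per spinning drive
    price_per_kwh: float = 0.15      # electricity price per kWh
    service_life_years: float = 5.0  # drives replaced after this many years

    def drives_needed(self) -> int:
        return math.ceil(self.data_tb * self.copies / self.drive_tb)

    def total_cost(self, years: float) -> float:
        """Drive purchases plus electricity over `years`.

        Buys a full set at year 0 and a replacement set at the end of each
        service life that falls inside the period. Enclosures, cooling, and
        labor are deliberately left out."""
        n = self.drives_needed()
        drive_sets = math.ceil(years / self.service_life_years)
        hardware = n * self.drive_price * drive_sets
        energy_kwh = n * self.watts_per_drive * 24 * 365 * years / 1000
        return hardware + energy_kwh * self.price_per_kwh


if __name__ == "__main__":
    plan = DriveArchivePlan(data_tb=1000)  # 1 PB kept in two copies
    print(plan.drives_needed())            # 100
    print(round(plan.total_cost(10)))      # 80512
```

Even with these modest placeholder figures, electricity adds roughly 15% on top of the drive purchases over ten years, before any infrastructure or staff costs.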

Capacity

Large archives require lots of storage space, which hard drives can provide. Individual drives now exceed 20TB, with higher capacities on the way. Data centers can house rack after rack of drives to build enormous petabyte-scale archives. The downside is that concentrating all that capacity in one place raises the stakes: if a multi-petabyte data center suffers a catastrophic failure, the amount of lost data could be massive. Again, redundancy and geographic distribution become critical.

Access Speed

Hard drives are slow at random access compared with memory and flash storage, though much faster than tape. Sequential transfer rates, however, can reach hundreds of megabytes per second. For many archive use cases where data is accessed infrequently or streamed sequentially, HDDs provide adequate performance.
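Sequential speed still matters at archive scale, because restoring or re-verifying a whole drive means reading all of it. The arithmetic is simple; the Python sketch below uses a sustained rate of 250 MB/s and a 20TB drive as hypothetical example values.

```python
def full_read_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to read an entire drive at a sustained sequential rate.

    Uses decimal units, as drive vendors do: 1 TB = 1,000,000 MB.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = capacity_tb * 1_000_000 / throughput_mb_s
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 20TB drive streaming at a sustained 250 MB/s.
    print(f"{full_read_hours(20, 250):.1f} hours")  # 22.2 hours
```

Transfer rates drop on the inner tracks of a platter, so reading a full drive usually takes somewhat longer than this estimate.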

Portability

Individual hard drives are highly portable and removable, but a large storage array is not very portable. This limits options for physical relocation or distribution of data archives. For the greatest portability and distribution, removable media like tapes work better. Cloud storage also offers more flexibility for accessing archives from different locations.

Security

Hard drives offer reasonable data security if protected by physical access controls and encryption. However, they are still vulnerable to damage, theft, and unauthorized access if not properly secured. Offline media such as tape and write-once optical discs can improve security. The security of cloud storage depends on the provider.

Environmental Impact

With rising energy costs and increasing focus on sustainability, the environmental impact of data archives is becoming more important. Spinning hard drives draw power continuously for their motors, and the systems around them need cooling. Multi-petabyte data centers can use staggering amounts of energy. Shifting cold data to tape, which draws no power while a cartridge sits on a shelf, can reduce this impact. Flash and cloud storage may also help in some cases, although cloud storage moves the energy use to the provider rather than eliminating it.

Longevity

How far into the future must your archive remain viable? For short- to medium-term archiving of a few years to a decade, hard drives work well. For longer-term archives, however, they require repeated migration to new systems, which may not be practical over very long time spans. Other media such as tape, film, optical discs, or engraved metal plates have much longer theoretical lifespans. The life expectancy of cloud storage depends on the stability of the service provider.
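The number of migrations grows with the ratio of the archive's lifetime to the service life of the media. The Python sketch below counts copy-forward migrations under the simplifying assumption that data is moved exactly when the current media reaches the end of its service life; the service-life values in the example are hypothetical placeholders, not measured lifespans.

```python
import math


def migrations_needed(archive_years: float, media_life_years: float) -> int:
    """Copy-forward migrations needed to keep data on in-service media.

    Assumes the first copy is written at year 0 and each migration happens
    exactly when the current media reaches the end of its service life.
    """
    if media_life_years <= 0:
        raise ValueError("media_life_years must be positive")
    return max(0, math.ceil(archive_years / media_life_years) - 1)


if __name__ == "__main__":
    # A 50-year archive under three hypothetical service lives.
    for life in (5, 10, 30):
        print(f"{life}-year media: {migrations_needed(50, life)} migrations")
```

Each migration is also a chance for errors, so fewer, longer-lived copies reduce both effort and risk.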

Drive Interface

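Hard drive interfaces change over time, and each change adds complexity: older drives need adapters or compatible controllers, or the archive must be migrated to drives with the new interface. Interfaces have included parallel ATA, SATA, SCSI, SAS, and Fibre Channel, and NVMe, long used for SSDs, is beginning to appear for hard drives as well. Choosing hard drives ties you to evolving drive standards, and external interfaces such as USB have the same problem. Cloud storage insulates you from interface changes entirely. Tape separates the media from the drive interface, although tape formats have their own generational compatibility limits.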

Magnetic Longevity

Hard disks store data as regions of magnetization, and that magnetization can weaken slowly over time, eventually causing data loss. How fast this happens depends on the drive technology and storage conditions, and published estimates vary widely, from a few years to a decade or two. So even under ideal environmental conditions, hard drives have a practical limit on magnetic data retention.

Physical Size

Hard drives have shrunk dramatically over the years, from cabinet-sized units attached to early mainframes to 2.5-inch drives about the size of a deck of cards. But storage arrays and data center infrastructure still require substantial physical space. Tape cartridges can be stored densely on shelves, and cloud storage moves most of the hardware away from your own premises, so both can reduce the local physical footprint.

Geographic Distribution

Keeping archive copies in multiple geographic locations is a challenge for large hard drive archives, because it requires synchronizing data across multiple data centers. Cloud storage makes geographic distribution much simpler, since the provider handles it. Tape archives can also be distributed easily by shipping cartridges offsite.

Media Degradation

Magnetic storage is vulnerable to degradation of the physical media. Bits stored magnetically can be altered by strong external magnetic fields, and the media itself can deteriorate over time. Manufacturers generally do not specify hard drives for unpowered data retention beyond a few years. This compares poorly with etched or engraved metal plates, which may last centuries.

Offline Access

Hard drives in a running array keep data online and immediately accessible. This is convenient for active archives, but offline media like tape has advantages for security and energy efficiency. Tape cartridges can be stored offline without power and, when needed, loaded into a drive and mounted like removable media.

Disaster Recovery

A key role of archives is preserving data through unexpected disasters like fires, floods, and earthquakes. Hard drive archives concentrated at one site are vulnerable: a disaster that destroys the data center could wipe out the entire archive. Tape and cloud storage offer more flexibility to recover from catastrophic data loss, since tapes can be stored offsite or distributed geographically.

Mixed Media

For the most robust archives, storing data on mixed media in multiple locations provides the best protection. This might mean a combination of hard drives for active access, tape for offline redundancy, and cloud storage for geographic distribution. Critical archives may also back up to formats like microfilm that have exceptionally long lifespans.
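A widely used rule of thumb for this kind of plan, not named in the text above, is the 3-2-1 rule: keep at least three copies, on at least two different types of media, with at least one copy offsite. The Python sketch below checks a plan against that rule; the `ArchiveCopy` type, the media labels, and the site names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveCopy:
    media: str     # e.g. "hdd", "tape", "cloud", "microfilm"
    location: str  # identifier of the site holding this copy


def meets_3_2_1(copies: list[ArchiveCopy], primary_site: str) -> bool:
    """True if the plan has >= 3 copies, >= 2 media types, and >= 1 copy
    held somewhere other than the primary site."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.location != primary_site for c in copies)
    )


if __name__ == "__main__":
    plan = [
        ArchiveCopy("hdd", "hq"),          # active access
        ArchiveCopy("tape", "vault"),      # offline redundancy
        ArchiveCopy("cloud", "provider"),  # geographic distribution
    ]
    print(meets_3_2_1(plan, primary_site="hq"))  # True
```

Passing the check is a minimum, not a guarantee: the code only compares site names, so two "different" sites in the same flood zone would still count as offsite.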

Drive Density

Storage density measures how compactly data can be stored. Hard drive areal density has steadily increased over time, packing more data onto each disk platter. Optical discs are far less dense: a single-layer Blu-ray Disc holds 25GB, while a single platter in a modern high-capacity HDD holds well over 1TB. Higher density means more storage in less space.

Hard Drive Areal Density Over Time (approximate)

Year   Areal density
2000   17 Gb/in²
2005   60 Gb/in²
2010   370 Gb/in²
2015   1 Tb/in²
2020   2.4 Tb/in²

Density keeps increasing, and hard drives already exceed tape and optical discs in areal density. Higher density means storing more data in less physical space.

Archive Media Density Comparison (approximate)

Media       Density
HDD         1-3 Tb/in²
Tape        10-100 Gb/in²
Blu-ray     25 GB per layer (capacity per disc layer, not areal density)
Microfilm   100 GB/in²

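By these figures, hard drives have the highest areal density of the media listed, well above tape and optical discs. Tape compensates by packing a very long length of media into each cartridge, while optical discs and microfilm offer longer expected lifespans, so density alone does not decide which medium is best for long-term storage.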

Random I/O

Compared with tape and optical discs, hard drives handle workloads with many random reads and writes far better, although they are still much faster at sequential I/O than at random I/O. This makes them usable for applications like databases and virtual machines. Tape and optical discs, by contrast, are fundamentally sequential and much slower for random I/O. The access pattern determines which media works best.
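The gap between random and sequential performance comes from mechanics: for each random request the head must seek to the right track and then wait for the sector to rotate underneath it. A common back-of-the-envelope estimate, sketched in Python below, divides one second by the average seek time plus half a revolution. The 8.5 ms seek time in the example is a hypothetical but plausible value for a 7,200 RPM drive, and transfer time is ignored.

```python
def hdd_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random I/O operations per second for one hard drive.

    Each request pays the average seek time plus, on average, half a
    revolution of rotational latency; data transfer time is ignored.
    """
    ms_per_revolution = 60_000 / rpm
    avg_rotational_latency_ms = ms_per_revolution / 2
    return 1000 / (avg_seek_ms + avg_rotational_latency_ms)


if __name__ == "__main__":
    print(f"{hdd_random_iops(7200, 8.5):.0f} IOPS")  # 79 IOPS
```

Under a hundred random operations per second is slow next to flash, but tape, which may have to wind through much of a cartridge to reach a new position, is far slower still.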

Erasing and Rewriting

Hard drives and tape are both rewritable storage media: data can be erased and rewritten repeatedly over their lifetime. This is useful for active archives where data changes frequently. Write-once (WORM) media such as write-once optical discs, WORM tape cartridges, or engraved metal plates cannot be changed once written. Rewritable media makes it possible to update archived data, while write-once media protects it from accidental or malicious changes.

Random Access

Hard drives allow random access to stored files, enabling quick retrieval of specific data on demand. Tape, by contrast, must wind sequentially to the position of the requested data, so random access is much slower. Quick random access is important for active archives.

Conclusion

Hard drives can serve a role in archival storage, but they have limitations for true long-term archiving measured in decades. Their strengths are low initial cost, high capacity, rewritability, and random access performance. Their weaknesses are limited reliability and durability, ongoing maintenance demands, and environmental impact.

For active archives that change regularly and require frequent random access, hard drives work well. But they should be combined with tape or cloud backups for redundancy.

For true long-term archival storage, tape, film, optical discs, and engraved metal plates last much longer. Cloud storage life expectancy depends on the stability of the provider. And distribution across mixed media in multiple locations provides the most secure long-term archive.