Are SMR drives reliable?

What are SMR drives?

Shingled magnetic recording (SMR) drives are a type of hard disk drive (HDD) that increases storage density by overlapping the data tracks on the platter, similar to how shingles on a roof overlap. This allows SMR drives to pack more data onto each platter compared to conventional magnetic recording (CMR) drives [1].

Unlike CMR drives, which keep each track separate, SMR drives write each new track so that it partially overlaps the previously written one. This works because the read head is narrower than the write head, so the exposed strip of each track can still be read. The overlapping design eliminates the need for gaps between tracks, enabling higher track density. The overlapping tracks resemble the layered pattern of shingles on a roof, which is how SMR got its name.

The key benefits of SMR drives are lower cost per gigabyte and higher storage density compared to traditional CMR drives. By removing the gaps between tracks, SMR fits more tracks onto each platter; manufacturers typically cite capacity gains of up to about 25% over an equivalent CMR design. This allows SMR drives to offer more capacity at a lower price point than CMR [2].

History and adoption of SMR

Shingled magnetic recording (SMR) was first developed by Seagate in 2009 as a way to increase storage density in hard disk drives (HDDs) [1]. The technology works by overlapping or “shingling” the data tracks on a drive platter to pack more data in the same physical space.

Major hard drive manufacturers like Seagate, Toshiba, and Western Digital have all released SMR-based drives for consumer and enterprise use. However, adoption was initially slow due to concerns around performance and reliability [2]. It wasn’t until around 2018-2020 that SMR drives started appearing more widely in desktop and NAS applications.

Today, SMR drives account for a significant portion of the overall HDD market. Seagate estimates that over 50% of HDD exabytes shipped are based on SMR technology as of 2020 [3]. However, adoption in enterprise and high-performance applications still lags behind consumer use.

Performance compared to CMR

Several performance benchmarks have shown that SMR drives write considerably more slowly than conventional magnetic recording (CMR) drives under sustained load, while read speeds are generally comparable. The write penalty comes from the overlapping tracks: changing data that is already on the platter forces the drive to rewrite the neighboring tracks as well [1]. In sustained sequential write tests, SMR drives can be more than 2x slower than CMR drives of the same capacity. Random write performance is even worse, because the drive constantly needs to rearrange shingled tracks [2].

The performance limitations of SMR can cause major issues in RAID arrays. A rebuild writes continuously to the replacement drive for hours, which is exactly the sustained load SMR handles worst. There are many reports of rebuilds on SMR drives taking days, or of drives being dropped from the array mid-rebuild after they stopped responding in time. For reliable RAID performance, most experts recommend using CMR drives instead.
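A back-of-the-envelope estimate shows why sustained write speed matters so much during a rebuild. The sketch below assumes the rebuild is limited only by the replacement drive's sustained write rate. The function name and the throughput figures are illustrative assumptions, not measurements of any particular drive.

```python
def rebuild_hours(capacity_tb: float, sustained_write_mb_s: float) -> float:
    """Estimate the hours needed to fill a replacement drive during a rebuild.

    Assumes the rebuild is limited only by the new drive's sustained
    write rate, and uses decimal units (1 TB = 1,000,000 MB).
    """
    if sustained_write_mb_s <= 0:
        raise ValueError("sustained write rate must be positive")
    total_mb = capacity_tb * 1_000_000
    return total_mb / sustained_write_mb_s / 3600


# Hypothetical figures for an 8 TB drive (assumptions, not benchmarks).
cmr_hours = rebuild_hours(8, 180)  # CMR holding a steady 180 MB/s
smr_hours = rebuild_hours(8, 40)   # SMR after its cache is exhausted

print(f"CMR: {cmr_hours:.1f} h, SMR: {smr_hours:.1f} h")
```

Under these assumed rates the SMR rebuild takes several times longer, which also lengthens the window in which a second drive failure could take down the array.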

SMR Caching and TRIM

SMR drives use a technology called caching to boost performance. With SMR, new writes are initially cached in a small conventional magnetic recording (CMR) area before being written sequentially to the shingled recording zones (Source 1). This caching helps absorb bursts of random writes and improve overall throughput.

However, the cache area has limited space. Once it fills up, write speeds drop drastically as data can no longer be quickly cached and must be directly written to the slower shingled zones. This is where TRIM comes in.

TRIM is a command supported by modern operating systems that tells the drive which logical blocks no longer hold valid data, for example after a file is deleted. On SMR drives that support it, the firmware can skip those “trimmed” blocks when it rewrites a shingled zone instead of reading and preserving stale data (Source 2). With less data to shuffle, the cache drains faster, which helps maintain consistent write performance.
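Not every SMR drive accepts TRIM, so it can be worth checking. On Linux, the kernel exposes a drive's discard (TRIM) limit under sysfs. The sketch below reads the `discard_max_bytes` attribute and treats a non-zero value as "discard supported". The helper name `supports_discard` and the `sysfs_root` parameter are assumptions added for illustration and testing.

```python
from pathlib import Path


def supports_discard(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the kernel reports a non-zero discard limit for `device`.

    `device` is a block device name such as "sda". A missing or
    unreadable attribute is treated as "no discard support".
    """
    attr = Path(sysfs_root) / device / "queue" / "discard_max_bytes"
    try:
        return int(attr.read_text().strip()) > 0
    except (OSError, ValueError):
        return False
```

Calling `supports_discard("sda")` on a Linux system reports whether the kernel will pass discards to that drive. It does not say how the firmware uses them.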

So in summary, SMR caching provides a speed boost for bursty workloads, while TRIM support reduces the amount of stale data the drive has to move, making it less likely that the cache fills up and write speeds crater. Both features are crucial for acceptable real-world performance with SMR drives.
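The cache behavior described above can be illustrated with a toy model. The simulation below is a deliberately simplified sketch: it assumes the drive flushes the cache to the shingled zones at a fixed rate while accepting new writes, and all sizes and speeds are made-up parameters rather than figures for a real drive.

```python
def simulate_sustained_write(duration_s, cache_mb, cache_speed_mb_s,
                             shingled_speed_mb_s):
    """Return the MB accepted from the host in each second of a long write.

    Toy model: writes land in a CMR cache at `cache_speed_mb_s` while the
    drive flushes the cache to shingled zones at `shingled_speed_mb_s`.
    """
    used = 0.0
    accepted_per_second = []
    for _ in range(duration_s):
        # Background flushing frees cache space at the shingled write rate.
        used = max(0.0, used - shingled_speed_mb_s)
        free = cache_mb - used
        # The host can write only as fast as the cache has room for.
        accepted = min(cache_speed_mb_s, free)
        used += accepted
        accepted_per_second.append(accepted)
    return accepted_per_second


# Hypothetical drive: 1000 MB cache, 200 MB/s into cache, 30 MB/s flush rate.
trace = simulate_sustained_write(10, 1000, 200, 30)
print(trace)
```

In this model the drive accepts data at full speed for the first few seconds, then settles at the flush rate once the cache is full, which is the "cliff" users see during large copies.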

Reliability concerns

Some view SMR drives as less reliable than conventional magnetic recording (CMR) drives, especially under heavy workloads. This is because SMR drives write new data in narrow, overlapping tracks. Appending to the end of a zone is cheap, but modifying data in place is slow, because the drive has to read and rewrite the tracks that overlap the one being changed (https://www.techpowerup.com/forums/threads/smr-hdds-worse-reliability.294320/).

In contrast, CMR drives write new data in wider, distinct tracks. This allows for faster, more straightforward writes. For this reason, SMR drives may struggle more than CMR drives when handling several write tasks at once (https://forums.tomshardware.com/threads/cmr-vs-smr-is-smr-really-that-bad.3748294/). The SMR drive needs to pause and reorganize data before it can accept more writes, which can significantly reduce performance.

However, for sequential write workloads like backups or archives, SMR drives can perform well. The reliability concerns mainly apply to random write workloads. As long as the workload matches the SMR drive’s intended use case, reliability is often not an issue.
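The difference between sequential and random writes can be made concrete with a simple model of one shingled zone. The sketch below assumes each track partially covers the previous one, so rewriting a track also forces a rewrite of every track written after it in that zone. The zone size and function names are hypothetical, and real firmware uses caching and remapping to soften this cost.

```python
def tracks_rewritten(zone_tracks: int, updated_track: int) -> int:
    """Tracks that must be written to update one track in a shingled zone.

    Tracks are numbered 0..zone_tracks-1 in write order. Rewriting track k
    damages track k+1, which must be rewritten in turn, and so on to the
    end of the zone.
    """
    if not 0 <= updated_track < zone_tracks:
        raise ValueError("updated_track is outside the zone")
    return zone_tracks - updated_track


def average_random_cost(zone_tracks: int) -> float:
    """Average tracks written per update when updates hit tracks uniformly."""
    total = sum(tracks_rewritten(zone_tracks, k) for k in range(zone_tracks))
    return total / zone_tracks


# Hypothetical zone of 100 tracks.
print(tracks_rewritten(100, 99))   # updating the last track in the zone
print(average_random_cost(100))    # uniformly random in-place updates
```

In this model, writing at the end of a zone costs a single track, while a random in-place update costs about half the zone on average. That asymmetry is why backups and archives suit SMR far better than random-write workloads.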

Using SMR Drives

SMR drives can be reliable and effective for certain use cases, such as:

  • Archival storage – SMR drives are well-suited for storing data that will primarily be read and not frequently overwritten, like media libraries, documents, backups, etc. The sequential write design is optimized for these large, cold datasets.
  • Secondary drives – SMR works well as a secondary drive for data that doesn’t need fast access speeds. This includes game libraries, downloaded videos, photos, and other personal media.
  • Backup drives – The large capacities make SMR drives well-suited as backup drives for periodic full-disk imaging. Writes are infrequent and backups emphasize capacity over performance.

However, CMR drives would be preferable in situations where fast random access is needed, such as:

  • OS or boot drives
  • Frequently accessed application data
  • Scratch disks for editing workflows
  • RAID arrays

For storing frequently changing or overwritten data, CMR offers better sustained performance. But SMR can work well for large, infrequently accessed datasets.

Drive failures and data recovery

Recovering data from failed SMR drives can be challenging compared to traditional CMR drives. Some of the key challenges include:

  • SMR drives use complex shingled writing techniques that overlap tracks, making it difficult to extract raw data. If part of a shingled group is damaged, the whole group may be unreadable.
  • SMR drives may have less spare area for sector reallocation, leaving fewer reserve sectors to remap damaged areas.
  • The SMR translation layer obscures the physical location of logically stored data. This mapping must be reverse-engineered before file systems can be rebuilt.
  • TRIM and UNMAP commands can make deleted files unrecoverable. Once blocks are marked unused, the drive may return zeros for them and reclaim the space on its own in the background.

Data recovery experts have developed specialized tools and techniques to handle SMR drive failures. These include:

  • Advanced firmware hacking and reverse engineering to bypass SMR algorithms and mappings.
  • Bit-level imaging and data extraction using forensic methods to maximize recoverable data.
  • Repairing the SMR translation layer metadata and mapping if corrupted.
  • Low-level tuning of the drive's electrical and head parameters to stabilize reads.
  • Searching the drive's cache regions for temporary copies of lost files that were never flushed to the shingled zones.

In summary, SMR presents unique challenges that require expertise to attempt data recovery from failed drives.

SMR in Enterprise and Data Centers

There have been concerns around using SMR drives in enterprise and critical systems like data centers and servers. This is because SMR drives were initially optimized for archival storage and cold data, where write performance is less critical. However, vendors have improved SMR drive firmware and algorithms to make them suitable for more demanding workloads.

According to Western Digital, SMR HDDs are transitioning from niche to mainstream technology in data centers. The vendor claims that host-managed SMR drives can deliver significant TCO advantages for data centers looking to optimize capacity per rack.
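Host-managed drives differ from the drive-managed models sold to consumers in that they expose their shingled zones to the operating system. On Linux, the kernel reports this through the `zoned` attribute in sysfs, which reads `none`, `host-aware`, or `host-managed`. The sketch below reads that attribute; the helper name `zoned_model` and the `sysfs_root` parameter are assumptions added for illustration. Note that drive-managed SMR drives hide their shingling and typically report `none`, so this check cannot identify them.

```python
from pathlib import Path


def zoned_model(device: str, sysfs_root: str = "/sys/block") -> str:
    """Return the kernel's zoned model for `device`, or "unknown".

    `device` is a block device name such as "sdb". Expected values are
    "none", "host-aware", and "host-managed".
    """
    attr = Path(sysfs_root) / device / "queue" / "zoned"
    try:
        return attr.read_text().strip()
    except OSError:
        return "unknown"
```

A result of `host-managed` means software must write each zone sequentially, which is why these drives need zone-aware file systems or applications.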

Still, some enterprise storage vendors like Dell recommend against using SMR drives in RAID arrays meant for databases or other transactional workloads that are sensitive to latency. Vendors suggest SMR drives are better suited for things like backup repositories, media storage, and archive systems.

Overall, while SMR drives may work for certain less demanding workloads, most enterprise vendors still recommend CMR drives for critical applications where performance consistency and low latency are paramount.

The Future of SMR

SMR technology continues to evolve and improve. According to Horizon Technology’s report (“How Shingled Magnetic Recording (SMR) Drives Up Data Center Capacity”), SMR is projected to enable hard drive capacities to increase at a rate of 30-40% annually over the next few years. This is a major jump compared to the historical annual growth rate of 15-20% for conventional magnetic recording drives.
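To get a feel for what those growth rates imply, the compound-growth arithmetic can be sketched as follows. The 20 TB starting capacity and the five-year horizon are assumptions chosen for illustration; only the growth rates come from the figures quoted above.

```python
def projected_capacity(start_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after `years` of compound growth at `annual_growth` (0.3 = 30%)."""
    return start_tb * (1 + annual_growth) ** years


# Assumed starting point: a 20 TB drive, projected five years out.
for rate in (0.15, 0.20, 0.30, 0.40):
    tb = projected_capacity(20, rate, 5)
    print(f"{rate:.0%} per year -> {tb:.1f} TB")
```

Even a modest difference in annual growth compounds quickly, so projections of this kind are very sensitive to the rate assumed.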

Manufacturers are focused on improving SMR performance through advanced caching algorithms and through zoned-storage command sets such as ZBC (for SCSI/SAS) and ZAC (for SATA), which let the host manage shingled zones directly. As the technology matures, SMR drives are expected to become commonplace across consumer and enterprise storage markets. Adoption will continue growing as SMR drives become more optimized for mixed workloads including hot data, archival data, and backups.

In the long run, industry experts predict SMR will be instrumental in pushing HDD capacities into the 50TB+ range in the next 5-10 years. While shortcomings exist today, SMR innovation will likely address concerns around latency, write speeds, and drive behavior over time. As SMR drives gain parity with CMR drives, high-capacity and low-cost SMR HDDs are poised to play an integral role in future mass-scale data storage.

The bottom line on SMR reliability

When reviewing the pros and cons of SMR drives, there are a few key takeaways to keep in mind:

  • SMR drives can deliver significant cost savings compared to CMR drives of the same capacity.

  • Performance, especially write speeds, can suffer with SMR when the drive’s cache is full.

  • SMR drives rely on idle time, and on TRIM where it is supported, to maintain performance, so they may not work well for workloads that write continuously.

  • There are some concerns around data recovery and drive failures, but the risks appear manageable.

  • Rebuilds and rewrites take longer on SMR, which is problematic for RAID.

Overall, SMR drives can be reliable options for things like archived data and backups. However, they aren’t recommended for more demanding applications with lots of writes, like OLTP databases or virtualization.

For home and SMB use, SMR drives are likely fine for bulk storage as long as you understand their limitations. But careful research and benchmarking are advised before deploying them in a business-critical environment.