What is the controversy with SMR drives?

Shingled Magnetic Recording (SMR) is a relatively new hard drive technology that allows drive manufacturers to increase storage density and capacity. SMR overlays (or “shingles”) tracks of data on each other like roof shingles, allowing more tracks to fit on each disk platter. This enables SMR drives to offer more storage space than conventional drives at a lower cost per gigabyte.

However, SMR drives have also been the subject of controversy, especially regarding their performance characteristics and use in certain applications like network-attached storage (NAS) devices and RAID arrays. Some key questions have been raised about SMR drives:

Are SMR drives significantly slower than conventional hard drives?

Yes, SMR drives can be much slower than conventional drives in certain scenarios. Because tracks overlap, overwriting existing data means rewriting the whole group of overlapping tracks (a “band”) rather than just the target track. This process, called “read-modify-write”, is far slower than overwriting a track directly, as a standard drive does.

Performance is especially poor in random write workloads, which require constant rewriting of shingles. Sequential writes are less impacted. So SMR drives are okay for sequential data like multimedia, but poor for transactional/database workloads requiring random access.

Do SMR drives require special firmware and software optimizations to work correctly?

Partly. To manage the complexities of the shingled write process, drive-managed SMR drives rely on specialized firmware and a “media cache”, a reserved area that absorbs incoming writes before the firmware commits them sequentially to the shingled bands. Host-managed SMR drives instead need SMR-aware software on the host to ensure writes are sent in sequential order.

Without such handling, a sustained stream of out-of-order writes forces the drive to pause repeatedly while it reorganizes cached data. This results in terrible performance. So SMR drives need careful handling to work as intended.

Are SMR drives inappropriate for RAID, NAS, and other critical applications?

Potentially yes. The performance challenges and special handling of SMR drives make them poorly suited for applications that expect responsive random access. A NAS appliance expecting snappy access times for many concurrent users could be crippled by SMR drives in a RAID configuration.

Some NAS and RAID vendors explicitly do not support or recommend SMR drives for these reasons. However, SMR may be acceptable in single-drive external storage for backup, media, etc., where access is mostly sequential. But SMR should be evaluated carefully for reliability and performance impact before being used in critical systems.

Have some SMR drives been deceptively marketed without fully disclosing their characteristics?

Unfortunately yes. Some hard drive vendors have released SMR drives without clearly indicating their technology or performance differences from standard drives. Some were even sold in NAS-oriented product lines, where SMR is unsuitable.

This deceptive marketing spurred backlash once the issues with SMR in those applications came to light. Vendors should be upfront about the intended use cases for SMR drives and warn against potential issues stemming from their design. Transparency is needed so consumers can make informed purchasing decisions.

SMR Technology In-Depth

To fully understand the controversy around SMR drives, let’s dive deeper into the technical details of how this technology works and why it can cause problems in certain situations.

How SMR Overlaps Tracks Like Shingles on a Roof

At its core, SMR increases drive density by overlapping the tracks on a platter like shingles on a roof, allowing more tracks per platter. This differs from a conventional drive where tracks are written fully separate from each other in parallel concentric circles across the platter.

With SMR, a “band” of shingled tracks is written sequentially in passes. Bands are separated by narrow guard regions that act as boundaries, so rewriting one band does not disturb the next. This allows bands to be managed independently.

Within each band, a new track overwrites part of the previously written track. So tracks are basically “laid on top” of each other rather than fully side by side. This overlapping shingling effect provides the density improvements.
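
To make the overlap concrete, here is a minimal Python sketch of a single band. It assumes a simplified model, not any vendor's actual layout, in which writing track i clobbers part of track i + 1, so changing one track forces every later track in the same band to be rewritten too. The function name `tracks_to_rewrite` and the band size are hypothetical.

```python
def tracks_to_rewrite(band_size: int, target: int) -> list[int]:
    """Return the track indices that must be rewritten to update `target`.

    Simplified model: writing track i damages part of track i + 1, so the
    rewrite cascades from the target to the last track of the band. The
    boundary after the last track stops the cascade at the band's edge.
    """
    if not 0 <= target < band_size:
        raise ValueError("target track is outside the band")
    return list(range(target, band_size))


# Hypothetical band of 8 tracks, numbered 0 to 7.
first = tracks_to_rewrite(8, 0)  # updating the first track touches all 8
last = tracks_to_rewrite(8, 7)   # updating the last track touches only itself
```

In this model the cost of an update depends on where in the band it lands. Firmware may simply rewrite the whole band regardless, which is the case the rest of this article assumes.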

Challenges of Random Writes on SMR Drives

The overlapping tracks cause issues when it comes to random writes. On a standard drive, a random write simply overwrites the target track and does not disturb adjacent tracks.

But on an SMR drive, a random write must first read an entire band into cache, update the targeted portion of that data, then write the full band back out in sequential order. This “read-modify-write” process results in far more work than a standard drive write.
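
The three steps can be sketched as a toy model in Python. This is an illustration under stated assumptions, not how any real firmware is written: the `Band` class, its byte-buffer representation, and the 1 MiB band size are all hypothetical, chosen only to show how much extra data moves for a small update.

```python
class Band:
    """A toy SMR band: a fixed-size byte buffer that can only be written whole."""

    def __init__(self, size: int) -> None:
        self.media = bytearray(size)
        self.bytes_written = 0  # physical bytes written to the platter so far

    def update(self, offset: int, data: bytes) -> None:
        """Apply a small in-place update using read-modify-write."""
        if offset < 0 or offset + len(data) > len(self.media):
            raise ValueError("update does not fit inside the band")
        cache = bytearray(self.media)            # 1. read the whole band
        cache[offset:offset + len(data)] = data  # 2. modify the target bytes
        self.media = cache                       # 3. write the whole band back
        self.bytes_written += len(cache)


def write_amplification(band_size: int, update_size: int) -> float:
    """Physical bytes written per logical byte for a single update."""
    band = Band(band_size)
    band.update(0, bytes(update_size))
    return band.bytes_written / update_size
```

With the hypothetical 1 MiB band, `write_amplification(1024 * 1024, 4096)` returns `256.0`: a 4 KiB update causes 256 times as much data to be written. Larger bands make the ratio proportionally worse.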

The constant need to rewrite entire bands is what causes SMR drives to perform so poorly for random write workloads. The sequential layout optimized for streaming data is disrupted.

Media Cache and Host Software Optimizations Are Required

To compensate for the random write challenges, SMR drives utilize a media cache to queue up incoming writes in sequential order before writing to disk. This helps absorb and optimize random writes into a sequential stream.

However, caching alone has limits. It works best when the host sends writes mostly in sequential order, which is why SMR-aware operating systems and applications help. Sustained out-of-order writes fill the cache faster than the drive can empty it, and the drive thrashes as it tries to turn them back into a sequential stream.

If the software is not SMR-optimized, the host may keep issuing new writes before the drive has finished committing earlier cached ones. So both intelligent disk caching algorithms and host software coordination are critical to making SMR viable.
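
A rough way to picture the cache filling up is the following Python sketch. It is a deliberately crude model with assumed numbers: the function `simulate_writes`, the slot count, and both latency values are hypothetical, and no idle time is modelled, so the cache never drains in the background as a real drive's would.

```python
def simulate_writes(num_writes: int, cache_slots: int,
                    fast_ms: float = 1.0, slow_ms: float = 100.0) -> list[float]:
    """Return the latency of each write in a burst of random writes.

    Assumed toy model: a write costs `fast_ms` while free cache slots
    remain. Once the cache is full, every further write first waits
    `slow_ms` for one band to be cleaned, which frees a single slot.
    """
    latencies = []
    free = cache_slots
    for _ in range(num_writes):
        if free > 0:
            free -= 1
            latencies.append(fast_ms)
        else:
            latencies.append(slow_ms + fast_ms)
    return latencies


burst = simulate_writes(10, cache_slots=4)
```

Here the first four writes complete at the fast latency and the remaining six at the slow one. That cliff, rather than a gradual slowdown, is the pattern users tend to notice: the drive feels normal until the cache is exhausted.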

Drive-Managed SMR (DM-SMR) vs. Host-Managed SMR (HM-SMR)

There are two main implementations of SMR technology:

  • Drive-Managed SMR (DM-SMR) – The disk’s internal firmware completely manages the media cache and write optimization. The host system is unaware and sees what looks like a standard drive.
  • Host-Managed SMR (HM-SMR) – The host operating system coordinates writes to keep them sequential. This may require redesigning filesystems and changing drive access patterns.

DM-SMR is simpler but can hurt performance if the host makes lots of random writes that thrash the cache. HM-SMR requires OS changes but avoids cache thrashing issues. Most consumer SMR drives today use DM-SMR.
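
On Linux, the kernel exposes what a drive reports about itself in the sysfs attribute `/sys/block/<device>/queue/zoned`, which reads `none`, `host-aware`, or `host-managed`. The short Python sketch below reads that attribute. The function name `zoned_model` and the `sysfs_root` parameter are hypothetical conveniences for illustration. An important caveat: because a DM-SMR drive presents itself as a standard drive, it reports `none`, so this check cannot tell DM-SMR apart from a conventional drive.

```python
from pathlib import Path


def zoned_model(device: str, sysfs_root: str = "/sys/block") -> str:
    """Return the zoned model Linux reports for a block device.

    Reads <sysfs_root>/<device>/queue/zoned, which holds "none",
    "host-aware" or "host-managed". Returns "unknown" if the attribute
    cannot be read (non-Linux system, old kernel, or wrong device name).
    """
    attr = Path(sysfs_root) / device / "queue" / "zoned"
    try:
        return attr.read_text().strip()
    except OSError:
        return "unknown"
```

For example, `zoned_model("sda")` on a machine with a host-managed drive at `sda` would return `"host-managed"`. A result of `"none"` only means the drive does not expose zones to the host.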

SMR Performance Characteristics

Now that we understand how SMR works internally, let’s look specifically at how its performance differs from standard hard drives:

Sequential Read Performance

Sequential read performance is similar to standard drives. Reading sequential data from the densely packed shingled tracks performs comparably to a conventional platter layout. Streaming large files like videos works well.

Sequential Write Performance

Sequential writes are also broadly comparable to conventional drives, provided the data arrives as a long contiguous stream. In that case the drive can fill bands from start to finish without having to rewrite anything.

Random Read Performance

Random reads are much slower than sequential, but similar to standard drives. The head still needs to move around randomly to target tracks. The overlapping tracks don’t help or hinder much here.

Random Write Performance

As highlighted earlier, random writes see a huge impact and can be 10-100x slower than on a standard drive once the drive’s cache is exhausted. Each small update can require rewriting an entire band. This makes SMR terrible for transactional workloads.

Mixed Sequential and Random Access

Performance will be bottlenecked by the random writes. The frequent need to read-modify-write bands will limit overall throughput. Applications must be optimized to batch random writes and turn them into large sequential writes wherever possible.
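
One simple form of batching is to buffer pending writes, sort them by position, and merge neighbors into larger extents before sending them to the drive. The Python sketch below shows the idea. It is an illustration only: the function `coalesce_writes` and the `(start_block, length)` representation are assumptions, and a real implementation would also have to carry the data and respect ordering guarantees.

```python
def coalesce_writes(writes: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge (start_block, length) writes into sorted, non-overlapping extents."""
    merged: list[tuple[int, int]] = []
    for start, length in sorted(writes):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # This write touches or overlaps the previous extent: extend it.
            prev_start, prev_len = merged[-1]
            end = max(prev_start + prev_len, start + length)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, length))
    return merged


pending = [(100, 8), (0, 8), (8, 8), (104, 8)]
extents = coalesce_writes(pending)
```

The four scattered writes in `pending` collapse into two extents, `[(0, 16), (100, 12)]`, which the drive can service as two sequential runs instead of four separate seeks and updates.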

Multi-User Access

SMR will struggle with concurrent access because interleaved requests from many users look like random I/O to the drive. Multiple users making random accesses on consumer NAS and RAID configurations with SMR drives often result in glacial speeds.

RAID Configurations

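RAID exacerbates the issues of SMR. Striping spreads each logical write across several drives, and parity RAID levels add a read-modify-write cycle of their own, so one small update can trigger band rewrites on multiple drives. Rebuilding an array is especially hard on SMR drives, because it writes to the replacement drive continuously for hours. Performance can be unusable in many RAID setups with SMR.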
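
One place this shows up is array rebuilds, which write a replacement drive from end to end. A back-of-the-envelope estimate of rebuild time is just capacity divided by sustained write rate, as in this Python sketch. The function `rebuild_hours` and the rates used below are hypothetical illustrations, not measurements of any particular drive.

```python
def rebuild_hours(capacity_tb: float, write_mb_per_s: float) -> float:
    """Hours needed to write `capacity_tb` decimal terabytes at a steady rate."""
    if write_mb_per_s <= 0:
        raise ValueError("write rate must be positive")
    total_mb = capacity_tb * 1_000_000  # 1 TB = 1,000,000 MB in decimal units
    return total_mb / write_mb_per_s / 3600


# Hypothetical 4 TB drive at two assumed sustained write rates.
healthy = rebuild_hours(4, 150)  # about 7.4 hours
stalled = rebuild_hours(4, 15)   # about 74 hours
```

The point of the arithmetic is the ratio: if sustained write speed drops by a factor of ten, the rebuild takes ten times as long, and the array spends that whole time without its full redundancy.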

Ideal and Poor Use Cases for SMR Hard Drives

Given the nuanced performance characteristics, SMR drives are well-suited for some uses but terrible for others:

Good Use Cases

  • Single-drive external storage for backups, movies, photos, music, etc.
  • Cold storage / archival storage where data is infrequently overwritten
  • Cheap dense space for storing downloaded media like games and video
  • Scratch disks for handling sequential operations like video processing

SMR is cost-effective for storing large amounts of streamed/archived data where high throughput and low latency are not critical.

Poor Use Cases

  • Database servers
  • Virtual machine hosting
  • RAID arrays
  • Network-attached storage (NAS)
  • Transactional workloads
  • Frequently overwritten data
  • OS/boot drives

Performance can suffer severely when using SMR in applications requiring efficient random access. Data safety can suffer too, since a drive that stalls for too long may be dropped from a RAID array.

The Controversy Around SMR Drives

Now that we fully grasp SMR technology and where it does (or doesn’t) make sense, let’s discuss the controversial issues that have arisen:

SMR Drives Often Not Clearly Labeled as Such

One of the biggest complaints is that consumers cannot easily identify if a drive uses SMR. Retail packaging and spec sheets may lack any mention of SMR. The technical details are opaque to average buyers.

This forces expert investigation and confirmation to determine if a drive is SMR-based. Lack of clear labeling prevents buyers from making informed decisions about suitability for different use cases.

SMR Drives Used in Inappropriate Scenarios

SMR drives have ended up in NAS and server deployments without buyers being told about their use and performance implications. In 2020, for example, Western Digital confirmed that some of its WD Red NAS drives used drive-managed SMR. These systems end up crippled by terrible speeds due to the mismatch between SMR limitations and workload demands.

For example, a RAID NAS expects efficient random writes. SMR drives in RAID provide the opposite. But vendors have not always been upfront about using SMR drives in devices and configurations where they are inappropriate.

Dropped Support Due to SMR Issues

Some vendors have revoked support for SMR drives even if they were previously validated, due to overwhelming user problems caused by applying SMR in unsuitable scenarios.

Some WD SMR drives had previously appeared on NAS compatibility lists, but support was pulled once major issues came to light in RAID/NAS deployments. Vendors have lost faith in SMR’s viability in heavy workloads.

Lack of Transparency Around SMR Pitfalls

Drive makers have not been fully honest and transparent about the severe performance and usage limitations inherent in SMR. Marketing does not always differentiate between SMR and conventional drives.

More transparency and more visible indicators of SMR usage are needed. Vendors should clarify that SMR may be unfit for loads like multi-user NAS/RAID and mixed random access – the types of use buyers expect from off-the-shelf drives.

No Clear Indicator of DM-SMR vs. HM-SMR

As mentioned earlier, DM-SMR is more problematic when the OS/applications are “SMR unaware”. HM-SMR requires host coordination but avoids cache thrashing.

But it’s unclear to buyers which implementation a drive may be using. This lack of clarity makes it hard to gauge real-world performance and compatibility – a drive that turns out to be DM-SMR could perform far worse than users anticipate.

Recommendations for SMR Viability

For SMR drives to gain acceptance and trust from users, a few recommendations could help improve their viability and transparency:

Require Clear Labelling of SMR Usage on Packaging/Marketing

Make it obvious upfront that a drive uses SMR rather than conventional recording. Don’t hide it in tiny print or technical jargon. Average buyers should know this detail before purchasing.

Note Compatibility Limitations on Spec Sheets

Spec sheets should note that SMR may not be appropriate for multi-user NAS/RAID configurations and random-write-heavy workloads. Warn buyers proactively of potential pitfalls.

Disclose DM-SMR vs. HM-SMR Implementations

Listing whether drive-managed or host-managed SMR is in use sets expectations correctly around real-world performance and compatibility nuances.

Accurately Represent Performance in Benchmarks

Many benchmarks only test sequential performance, not realistic random write loads. Publish benchmarks covering both sequential and random access to convey a full picture.
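
As a rough illustration of testing both patterns, the Python sketch below times the same set of block writes in sequential and in shuffled order. It is a minimal sketch, not a rigorous benchmark: the function `write_throughput` and its parameters are hypothetical, `os.pwrite` is only available on Unix-like systems, and the operating system's page cache will absorb small tests. A meaningful run needs far more data than the drive can cache, and a purpose-built tool such as fio is the better choice for real measurements.

```python
import os
import random
import time


def write_throughput(path: str, block_size: int, blocks: int, randomize: bool) -> float:
    """Write `blocks` blocks of `block_size` bytes to `path`; return MB/s."""
    offsets = [i * block_size for i in range(blocks)]
    if randomize:
        random.shuffle(offsets)  # same blocks, visited in random order
    payload = os.urandom(block_size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT)
    try:
        start = time.perf_counter()
        for offset in offsets:
            os.pwrite(fd, payload, offset)
        os.fsync(fd)  # count the time needed to flush data to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return block_size * blocks / max(elapsed, 1e-9) / 1e6
```

Calling it twice on a file stored on the drive under test, once with `randomize=False` and once with `randomize=True`, gives two numbers to compare. On an SMR drive the gap between them is expected to widen sharply as the test size grows past what the drive can cache.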

Encourage Development of Improved Caching and Host Software

Better drive-side caching algorithms and optimized host filesystems/software can help realize the intended benefits of SMR while minimizing drawbacks under real workloads.

Conclusion

SMR technology can offer higher capacities for cheaper bulk storage. But deceptive marketing and lack of transparency around significant performance limitations have damaged confidence in this new recording method.

More clarity around proper use cases from vendors along with optimized caching and software can help SMR mature into a viable option for large-scale sequential storage needs. But for now, caution is warranted when evaluating SMR, especially for performance-sensitive applications involving lots of random writes.