Why do SSDs wear out?

SSDs have become increasingly popular in computing devices due to their performance advantages over traditional hard disk drives (HDDs). SSDs offer faster boot and load times, improved reliability, lower power consumption, and lighter weight (Huang, 2008; Toshiba, 2017). However, SSDs have a finite lifespan and will wear out after repeated write/erase cycles. The goal of this article is to examine the reasons why SSDs eventually wear out.

How Do SSDs Work?

SSDs, or solid-state drives, use NAND flash memory to store data, whereas HDDs, or hard disk drives, use magnetic disks. The key difference is that SSDs have no moving mechanical components, while HDDs use a spinning platter and a read/write head to access data on the disk’s constantly moving surface.

The absence of moving parts gives SSDs significant advantages in speed and performance. With no seek time or rotational latency, SSDs can access data almost instantly, and even SATA models reach sequential read speeds above 500 MB/s, compared to 80-160 MB/s for consumer HDDs. This leads to much faster boot times for computers and load times for programs. SSDs are generally more reliable as well, since they are far less susceptible to damage or degraded performance from mechanical shocks. Overall, the solid-state design makes SSDs better suited for laptops and other mobile devices.

However, HDDs still have advantages in cost and storage capacity. HDDs remain significantly cheaper per gigabyte than SSDs. They are also available in much higher maximum capacities, with consumer models up to 18 TB compared to around 4 TB for consumer SSDs at the time of writing. For large backups or media storage, HDDs may still be preferable if capacity and cost are the primary concerns.

Why Does NAND Flash Memory Wear Out?

NAND flash memory wears out due to the physics of how it stores data. NAND flash stores data in an array of floating gate transistors. To write (program) a cell, electrons are forced through a thin oxide layer onto the floating gate, which shifts the transistor’s threshold voltage; the controller reads that voltage to tell a binary 1 from a 0 (https://www.dell.com/support/kbdoc/en-us/000137999/hard-drive-why-do-solid-state-devices-ssd-wear-out). To erase a cell, the stored electrons are pulled back off the floating gate through the same oxide. Each program and erase operation stresses this thin oxide layer, which isolates the floating gate, and the damage accumulates over time.

The problem is that this oxide layer can only endure a finite number of program/erase (P/E) cycles before it becomes so damaged that the cell can no longer hold charge reliably. Rated endurance depends heavily on the type of flash: on the order of 100,000 cycles for single-level cell (SLC) NAND, down to a few thousand or fewer for the denser multi-level types used in most consumer drives. In practice, wear across cells is also uneven. Some cells wear out much sooner because of write amplification, garbage collection, and other factors (https://www.quora.com/How-and-why-does-flash-memory-wear-out). SSDs therefore employ various techniques to spread wear evenly across all cells.

Write Amplification

Write amplification occurs when the amount of data physically written to the NAND flash is larger than the amount of data the host asked to write. It is usually expressed as a ratio, the write amplification factor: data written to the flash divided by data written by the host. The extra writes consume program/erase cycles, wearing the flash out faster [1].
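As a rough illustration of why this ratio matters, the following sketch computes a write amplification factor and uses it in a back-of-the-envelope lifetime estimate. The function names and every number in the example are illustrative assumptions, not figures from any vendor, and the estimate assumes wear is spread perfectly evenly across the drive.

```python
def write_amplification_factor(nand_gb_written, host_gb_written):
    """Ratio of data physically written to flash to data the host asked to write."""
    return nand_gb_written / host_gb_written


def estimated_lifetime_years(capacity_gb, pe_cycles, waf, host_gb_per_day):
    """Back-of-the-envelope lifetime, assuming wear is spread perfectly evenly."""
    total_nand_gb = capacity_gb * pe_cycles   # total data the flash can absorb
    host_gb_budget = total_nand_gb / waf      # share of that left for host data
    return host_gb_budget / host_gb_per_day / 365


# Hypothetical drive: 500 GB, cells rated for 1,000 cycles, 50 GB written per day.
waf = write_amplification_factor(nand_gb_written=200, host_gb_written=100)
years = estimated_lifetime_years(500, 1000, waf, 50)
print(f"WAF = {waf:.1f}, estimated lifetime = {years:.1f} years")
# prints: WAF = 2.0, estimated lifetime = 13.7 years
```

Doubling the write amplification factor halves the estimate, which is why controllers work to keep it low.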

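Write amplification occurs because of the way NAND flash memory works. NAND flash is written in pages but erased only in whole blocks, each containing many pages, and a page can only be written while it is in the erased state. Data therefore cannot be overwritten in place. When data is updated, the SSD controller writes the new version to an empty page elsewhere and marks the old page as invalid. To reuse the space held by invalid pages, the controller must copy any still-valid pages out of the block into a new block and then erase the entire old block. This cycle of read, copy, and erase means the same data is rewritten multiple times, amplifying the number of writes [2]. The more write amplification that occurs, the faster the NAND flash cells wear down through writes the host never requested.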
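To make the mechanism concrete, here is a toy simulation, assuming pages are programmed once and blocks are erased whole. The ToyFTL class, its four-page blocks, and its reclaim policy are all hypothetical simplifications for illustration; real controllers are far more elaborate.

```python
import random

PAGES_PER_BLOCK = 4            # assumed toy geometry; real blocks hold far more pages
ERASED, INVALID = None, -1     # page states other than "holds logical page n"


class ToyFTL:
    """Hypothetical flash translation layer with out-of-place updates."""

    def __init__(self, num_blocks, logical_pages):
        # Two spare blocks guarantee that reclaiming space always makes progress.
        assert logical_pages <= (num_blocks - 2) * PAGES_PER_BLOCK
        self.blocks = [[ERASED] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.free = list(range(1, num_blocks))
        self.active, self.write_ptr = 0, 0
        self.mapping = {}      # logical page -> (block, page)
        self.host_writes = self.nand_writes = self.erases = 0

    def write(self, lpn):
        """Host write: invalidate the old copy, then program a fresh page."""
        self.host_writes += 1
        if lpn in self.mapping:
            block, page = self.mapping[lpn]
            self.blocks[block][page] = INVALID
        self._program(lpn)

    def _program(self, lpn):
        if self.write_ptr == PAGES_PER_BLOCK:      # active block is full
            self.active, self.write_ptr = self.free.pop(), 0
            if not self.free:                      # last free block taken
                self._reclaim()
        self.blocks[self.active][self.write_ptr] = lpn
        self.mapping[lpn] = (self.active, self.write_ptr)
        self.write_ptr += 1
        self.nand_writes += 1

    def _reclaim(self):
        """Copy valid pages out of the emptiest block, then erase it."""
        def valid(b):
            return sum(e not in (ERASED, INVALID) for e in self.blocks[b])
        candidates = [b for b in range(len(self.blocks)) if b != self.active]
        victim = min(candidates, key=valid)
        for lpn in self.blocks[victim]:
            if lpn not in (ERASED, INVALID):
                self._program(lpn)                 # extra write the host never asked for
        self.blocks[victim] = [ERASED] * PAGES_PER_BLOCK
        self.free.append(victim)
        self.erases += 1


random.seed(0)
ftl = ToyFTL(num_blocks=16, logical_pages=48)
for lpn in range(48):                  # fill the drive once
    ftl.write(lpn)
for _ in range(5000):                  # then overwrite random pages
    ftl.write(random.randrange(48))
waf = ftl.nand_writes / ftl.host_writes
print(f"host writes: {ftl.host_writes}, flash writes: {ftl.nand_writes}, WAF: {waf:.2f}")
```

In this model every page copied during reclamation counts as a flash write with no matching host write, so the printed ratio can never fall below 1.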

Garbage Collection

Garbage collection is an important process that occurs in SSDs to reclaim blocks that contain invalid data so they can be reused for new writes. SSDs write data in pages to blocks, but when data is deleted or overwritten, those pages become invalid. Garbage collection consolidates the valid pages so full blocks can be erased and rewritten (TechTarget, 2022). This process contributes to write amplification because data has to be moved around, increasing the writes to the flash memory and decreasing endurance.

Garbage collection is necessary for the following reasons:

  • Helps reclaim unused blocks: By consolidating valid pages, it frees up full blocks to be erased and rewritten.
  • Maintains free space for new writes: Without garbage collection, the drive would eventually run out of free blocks.
  • Allows pages to be rewritten: A NAND flash page cannot be overwritten in place; it must be erased first, and erasure happens a whole block at a time. Garbage collection erases blocks so their pages can be reused.

Frequent garbage collection passes will contribute to write amplification and wear out the drive faster. Therefore, optimizing garbage collection is important to improve endurance (TechTarget, 2022).
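One common way to limit this cost is to reclaim the block with the fewest valid pages first. The sketch below shows that greedy choice and the resulting cost. The function names and block contents are hypothetical, and the steady-state formula is a simplification that assumes every reclaimed block holds the same fraction of valid data.

```python
def pick_victim(valid_counts):
    """Greedy policy: reclaim the block that holds the fewest valid pages."""
    return min(range(len(valid_counts)), key=lambda b: valid_counts[b])


def reclaim_cost(valid_pages, pages_per_block):
    """Pages that must be copied, and pages gained, when one block is reclaimed."""
    return valid_pages, pages_per_block - valid_pages


def steady_state_waf(valid_fraction):
    """Write amplification if every reclaimed block is `valid_fraction` valid.

    Filling one block costs a full block of flash writes, but only the
    (1 - valid_fraction) share of it is new host data.
    """
    return 1.0 / (1.0 - valid_fraction)


valid_counts = [60, 12, 33, 64]          # valid pages in four 64-page blocks
victim = pick_victim(valid_counts)
copied, freed = reclaim_cost(valid_counts[victim], pages_per_block=64)
print(victim, copied, freed)             # prints: 1 12 52
print(steady_state_waf(0.5))             # prints: 2.0
print(steady_state_waf(0.75))            # prints: 4.0
```

The fuller the reclaimed blocks are with valid data, the more copying each erase requires, which is why drives with little free space tend to see higher write amplification.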

Sources:

TechTarget. (2022). What is solid-state storage garbage collection? https://www.techtarget.com/searchstorage/definition/solid-state-storage-SSS-garbage-collection

TRIM

TRIM is a command that allows the operating system to notify the SSD which blocks of data are no longer in use and can be wiped internally. When a file is deleted, the file system normally just updates its own records; without TRIM, the SSD has no way to know that the underlying pages no longer hold useful data. It would keep treating them as valid and copy them to new blocks during garbage collection, causing additional writes. With TRIM, the SSD can mark those pages as invalid right away, which reduces write amplification.[1]

Because trimmed pages are already marked invalid, garbage collection can skip them and erase their blocks without first moving stale data around, reducing unnecessary writes. However, TRIM does not eliminate wear: every erase still consumes a program/erase cycle, and valid data sharing a block with trimmed pages must still be relocated. TRIM also may not take effect immediately. The controller can only process so many TRIM commands at once, so constant small deletes may not all get TRIMmed right away.[2]
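The effect on garbage collection can be sketched with a single block. This is a minimal model with hypothetical names and addresses: the SSD relocates every page it still believes is valid, and TRIM is the only way it learns that the file system has deleted something.

```python
def pages_to_relocate(block_lbas, trimmed_lbas):
    """Logical addresses the SSD still believes are valid in one block."""
    return [lba for lba in block_lbas if lba not in trimmed_lbas]


block_lbas = list(range(100, 108))        # one block holding 8 pages
deleted_by_os = set(range(100, 106))      # the file system deleted a 6-page file

# Without TRIM the SSD never hears about the deletion, so nothing is trimmed.
without_trim = pages_to_relocate(block_lbas, trimmed_lbas=set())
with_trim = pages_to_relocate(block_lbas, trimmed_lbas=deleted_by_os)
print(len(without_trim), len(with_trim))  # prints: 8 2
```

In this example TRIM cuts the copying needed to reclaim the block from eight pages to two.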

Wear Leveling

Wear leveling refers to processes and algorithms that SSD controllers use to spread writes across all the cells in the NAND flash memory. This helps extend the lifespan of the SSD by avoiding premature wear-out of frequently written cells.

As explained earlier, NAND flash memory can only withstand a finite number of program/erase cycles before wearing out. Without wear leveling, blocks holding frequently updated data would be erased over and over while blocks holding rarely changed data would see almost no wear. SSD controllers use wear leveling techniques to ensure that writes are distributed evenly across all memory blocks over time, so that no single block wears out prematurely.

There are different types of wear leveling algorithms. Dynamic wear leveling directs each new write to the least worn of the currently free blocks, but leaves data that never changes where it is. Static wear leveling goes further: it tracks erase counts for every block and actively moves rarely changed data off lightly worn blocks so those blocks can absorb their share of writes. More advanced algorithms also take into account differences in data write sizes and frequencies when leveling wear.
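The two basic mechanisms can be sketched roughly as follows: steering a new write to the least-erased free block, and deciding when a lightly worn block holding rarely changed data should be relocated so it can rejoin the pool. The function names, the erase-count table, and the threshold are hypothetical; real controllers use their own policies.

```python
def pick_block_for_write(free_blocks, erase_counts):
    """Send the next write to the free block with the lowest erase count."""
    return min(free_blocks, key=lambda b: erase_counts[b])


def block_to_relocate(data_blocks, erase_counts, threshold):
    """Return the least-worn data block if wear has become too uneven, else None."""
    if not data_blocks:
        return None
    coldest = min(data_blocks, key=lambda b: erase_counts[b])
    gap = max(erase_counts.values()) - erase_counts[coldest]
    return coldest if gap > threshold else None


erase_counts = {0: 120, 1: 15, 2: 118, 3: 121}   # block id -> erase count
free_blocks = [0, 3]
data_blocks = [1, 2]

print(pick_block_for_write(free_blocks, erase_counts))               # prints: 0
print(block_to_relocate(data_blocks, erase_counts, threshold=50))    # prints: 1
print(block_to_relocate(data_blocks, erase_counts, threshold=200))   # prints: None
```

Block 1 has been erased only 15 times because its data never changes. Moving that data elsewhere costs a few extra writes but lets the block start taking its share of wear.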

By spreading writes across more NAND flash blocks, wear leveling significantly increases the endurance and lifespan of SSDs. Improvements on the order of 10-20x are sometimes cited for enterprise SSDs, although the actual gain depends on how unevenly the workload would otherwise write. Consumer SSDs can benefit even more from wear leveling due to their lower write endurance ratings.

Controller Errors

One of the primary components of an SSD is the controller chip. The controller manages all of the storage and retrieval operations on the NAND flash memory. Over time, controllers can develop errors that lead to SSD failure and data loss.

Controllers have on-board RAM that stores metadata about the data location and file structure. If this RAM becomes corrupted, the SSD may become unstable or unresponsive. The controller firmware can also have bugs that appear over time and cause read/write issues.

Excessive heat is one of the main causes of controller errors. High temperatures can degrade controller components and generate more errors over time. Electrical problems like power surges can also damage controller hardware. The controller is complex silicon, so any manufacturing defects may not appear until the drive has been in use for a while.

If the controller develops uncorrectable errors during normal use, the SSD will become unusable in most cases. Data recovery services may be able to retrieve data from a drive with a failed controller, but this process is expensive.

Proper cooling, surge protection, and monitoring tools can help minimize controller errors. But SSD controllers will eventually fail after prolonged use, contributing to the limited lifespan of SSDs.

(Sources: https://datarecovery.com/rd/common-causes-ssd-failure/, https://www.techtarget.com/searchstorage/tip/4-causes-of-SSD-failure-and-how-to-deal-with-them)

Thermal Throttling

Heat is one of the main factors that can degrade NAND flash memory over time. As an SSD heats up through sustained use, the threshold voltages stored in the NAND cells can drift, leading to potential data errors or loss of data retention. According to AKCP, higher temperatures accelerate the breakdown of the insulating oxide layer in NAND flash memory, resulting in more leakage of electrons over time [1].

To protect the NAND flash from permanent damage due to overheating, SSD controllers implement a thermal throttling mechanism. As Transcend explains, when the SSD temperature reaches a certain threshold, the controller dynamically throttles the SSD’s performance to reduce power draw and allow the drive to cool back down [2]. This prevents the SSD from overheating to the point where data loss or failure could occur, which helps prolong the lifespan of the NAND flash.
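A throttling decision of this kind might look like the sketch below. The 70 °C and 60 °C thresholds and the use of a separate, lower resume temperature are assumptions made for illustration; actual thresholds and policies vary by drive and are set in firmware.

```python
def update_throttle(temp_c, throttled, throttle_at=70.0, resume_at=60.0):
    """Return the new throttle state for one temperature reading.

    The gap between the two thresholds (an assumption of this sketch) keeps
    the drive from flipping in and out of throttling around a single value.
    """
    if temp_c >= throttle_at:
        return True
    if temp_c <= resume_at:
        return False
    return throttled           # between thresholds: keep the current state


readings = [55, 68, 72, 66, 61, 59, 65]   # hypothetical temperatures in °C
throttled = False
history = []
for temp in readings:
    throttled = update_throttle(temp, throttled)
    history.append(throttled)
print(history)
# prints: [False, False, True, True, True, False, False]
```

Once throttling starts at 72 °C, it stays on through the 66 °C and 61 °C readings and releases only after the drive has cooled to 59 °C.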

According to TechTarget, thermal throttling has become a standard feature in modern SSDs to maintain safe operating temperatures. By keeping temperatures in check, SSDs can avoid potential issues like premature wear, lost data, and other temperature-related failures [3]. While throttling can temporarily impact performance, enabling the SSD to cool is crucial for long-term data retention and reliability.

Conclusion

In summary, SSDs wear out over time due to inherent limitations of NAND flash memory technology. Each program/erase cycle physically degrades the NAND flash cells, and write amplification and garbage collection add extra cycles beyond what the host requests; controller errors and heat can shorten a drive’s life further. SSD lifespan ultimately depends on usage patterns and on how well writes are managed through provisioning, TRIM, and wear leveling. However, even with careful use, SSDs will eventually wear out. The good news is that for average users SSDs should reliably last several years of typical usage before performance begins to suffer. Overall, the dramatically faster speeds of SSDs compared to HDDs make them well worth the tradeoff of a finite lifespan.