Why does Windows need to discover files before deleting?

When a user or application requests to delete a file in Windows, it initiates a multi-step process to safely remove the file from the file system. This process involves first discovering information about the target file before actually deleting it. The discovery phase is crucial because it allows Windows to check if the file is currently in use, ensure integrity of the file system, and prepare for the space occupied by the file to be reclaimed.

Concretely, discovery answers questions such as whether another application currently has the file open and whether any locks or leases are associated with it. The answers determine whether the delete can proceed without corrupting the file system or disrupting applications that still use the file. Only after the discovery phase is complete can Windows safely delete the file.

File System Basics

A file system is responsible for organizing and storing files on a drive. The two most common file systems used by Windows are NTFS (New Technology File System) and FAT32 (File Allocation Table 32):

NTFS is the default file system for modern versions of Windows. It supports volumes of up to 256 TB with 64 KB clusters (newer Windows releases raise this limit further) and files of 16 TB or more, depending on Windows version and cluster size. NTFS uses advanced data structures to improve performance, reliability, and disk space utilization. It provides security features like access control lists and encryption.

FAT32 is an older file system supported across many operating systems. It supports partition sizes up to 2 TB and file sizes up to 4 GB. FAT32 uses a simple table to locate files and folders on the drive. It lacks many advanced features of NTFS but provides compatibility with other devices.

At the core, a file system organizes a drive into clusters which are then chained together to store file contents. The file system maintains metadata to map file and folder names to their data clusters on disk. This allows the operating system to locate and retrieve file contents when needed.
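The cluster chaining described above can be pictured with a toy FAT-style allocation table. This is a minimal sketch, not an on-disk format: the dictionary layout, the `END_OF_CHAIN` marker, and the function name are assumptions made for illustration. NTFS records runs of clusters in the file record instead of chaining them one by one.

```python
# Toy FAT-style allocation table: key = cluster number,
# value = next cluster of the same file, or END_OF_CHAIN for the last one.
END_OF_CHAIN = -1  # hypothetical marker chosen for this sketch


def cluster_chain(fat, first_cluster):
    """Follow the chain from first_cluster and return every cluster visited."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if cluster in seen:
            raise ValueError("corrupt table: chain loops back on itself")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


# A file that starts at cluster 2 and occupies clusters 2 -> 5 -> 3.
fat = {2: 5, 5: 3, 3: END_OF_CHAIN}
print(cluster_chain(fat, 2))  # [2, 5, 3]
```

Freeing the file's space amounts to walking the same chain and marking each cluster as available again.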

File Table and Directories

The file table (on NTFS, the Master File Table, or MFT) is a central data structure in file systems that maps files to their associated metadata and location on disk. Each file entry in the file table contains information such as the file name, file type, physical location on disk, size, creation/access/modification times, permissions, and other attributes.

Directories are used to organize the hierarchical tree structure of the file system. Each directory contains a list of file names and sub-directories, mapping to entries in the file table. The first directory in the file system hierarchy is called the root directory. Sub-directories can contain more files and folders, allowing an arbitrary nesting of directories and files into a coherent tree structure.

The file table allows quick lookup of files by name while directories maintain the logical organization of the file system. The separation of file metadata from the directory hierarchy allows files to be easily moved or renamed without having to update every directory entry. Overall, the file table and directory structure work together to enable efficient file storage and retrieval.
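To make the split between file records and directory entries concrete, here is a small in-memory sketch. The names `FileRecord`, `Directory`, `lookup`, and `rename` are hypothetical and exist only for this example. Real file systems store these structures on disk in far more compact forms.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Metadata kept once per file, independent of the file's name."""
    size: int
    first_cluster: int


@dataclass
class Directory:
    """Maps names to record numbers (files) or to child directories."""
    files: dict = field(default_factory=dict)
    subdirs: dict = field(default_factory=dict)


# One record in the file table, referenced from the "docs" directory.
file_table = {0: FileRecord(size=1200, first_cluster=2)}
root = Directory()
root.subdirs["docs"] = Directory()
root.subdirs["docs"].files["report.txt"] = 0


def lookup(directory, path):
    """Resolve a path like 'docs/report.txt' to its FileRecord."""
    *folders, name = path.split("/")
    for folder in folders:
        directory = directory.subdirs[folder]
    return file_table[directory.files[name]]


def rename(directory, old, new):
    """Renaming rewrites only the directory entry; the record is untouched."""
    directory.files[new] = directory.files.pop(old)


record_before = lookup(root, "docs/report.txt")
rename(root.subdirs["docs"], "report.txt", "final.txt")
print(lookup(root, "docs/final.txt") is record_before)  # True
```

In this model, deleting a file means removing both the directory entry and the record, which is why a delete has to find both first.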

File Metadata

Every file stored on an NTFS file system carries metadata that describes the file. This includes the file name, size, creation/modification timestamps, access control lists, and file attributes [1]. Some key pieces of metadata that are relevant when deleting files include:

  • File size – The number of bytes the file occupies on disk.
  • Creation/modification timestamps – The date and time the file was created and last modified.
  • Access control lists – Specify which users and groups have permission to access the file.
  • File attributes – Flags such as read-only, hidden, and archive.

This metadata is stored in the file record and directories that reference the file. When deleting a file, the file system needs to update and remove this metadata during the delete operation.
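Much of this metadata can be inspected from user code. The sketch below uses Python's standard `os.stat` call. The `describe` helper and the temporary `sample.txt` file are made up for the example. `os.stat` does not return access control lists, and `st_file_attributes` is only present on Windows, so the code falls back to `None` elsewhere.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def describe(path):
    """Collect a few of the metadata fields discussed above for one file."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        # True when the owner write bit is cleared (read-only on Windows).
        "read_only": not (info.st_mode & stat.S_IWUSR),
        # Raw attribute flags; available on Windows only, None elsewhere.
        "windows_attributes": getattr(info, "st_file_attributes", None),
    }


with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "sample.txt")
    with open(sample, "wb") as handle:
        handle.write(b"hello")
    details = describe(sample)
    print(details["size_bytes"])  # 5
```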

File Open Counts

When a process opens a file, the operating system returns a file handle for it. The handle lets the process read and manipulate the file. The per-process handle table in Windows is capped at roughly 16 million entries (2^24), not the full 2^32 range that a 32-bit value could express. In practice the usable limit tends to be much lower, because each handle consumes kernel memory.

A single file can be opened by multiple processes simultaneously. Each process accessing the file will have its own file handle assigned by the operating system. To track how many processes have a file open, Windows maintains an open file count for each file. This count is incremented when a new handle is assigned and decremented when a process closes the file.

The open file count matters for delete operations. Windows does not remove a file until its open count reaches zero, indicating that no process is still accessing it. If an existing handle was opened without permitting shared delete access, the delete request typically fails outright with a sharing violation. Either way, no process finds a file it holds open removed out from under it.
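The bookkeeping described in this section can be sketched as a per-file counter. `OpenFileTracker` is a hypothetical class for illustration, not a Windows API. The real count lives inside the kernel and is not manipulated directly like this.

```python
from collections import Counter


class OpenFileTracker:
    """Toy model: counts open handles per path."""

    def __init__(self):
        self._open_counts = Counter()

    def open(self, path):
        # A new handle was issued for this path.
        self._open_counts[path] += 1

    def close(self, path):
        if self._open_counts[path] == 0:
            raise ValueError(f"{path} is not open")
        self._open_counts[path] -= 1

    def can_delete(self, path):
        # Deletion may complete only when no handle remains.
        return self._open_counts[path] == 0


tracker = OpenFileTracker()
tracker.open("a.txt")
tracker.open("a.txt")
tracker.close("a.txt")
print(tracker.can_delete("a.txt"))  # False: one handle is still open
tracker.close("a.txt")
print(tracker.can_delete("a.txt"))  # True
```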

Locks and Leases

To prevent files from being corrupted when multiple users or processes attempt to write to the same file concurrently, operating systems implement locks and leases as file locking mechanisms. When a file is opened, a lock can be placed on it. There are two main types of locks:

Shared locks allow multiple users to open the file for reading simultaneously, but not for writing. These are also called read locks. Exclusive locks prevent all other users from opening the file until the lock is released. These are also called write locks.

On Windows, local access is coordinated through sharing modes and byte-range locks, while file sharing over Server Message Block (SMB) adds the concept of “opportunistic locks” or “oplocks.” Oplocks function as short-term leases that grant a client temporary exclusive access to a file to improve performance. Leases allow clients to cache files locally. When another client attempts to open the file, the server recalls (breaks) the lease so that the caching client writes back its changes. This prevents multiple clients from caching different versions of the file and corrupting it.[1]

In summary, shared locks allow concurrent reads but prevent writes, while exclusive locks prevent any access until released. Leases temporarily mimic exclusive locks to improve performance but are transparently revoked by the server when needed to maintain file integrity.
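The shared/exclusive rules summarized above fit in a few lines of code. The `FileLock` class below is a hypothetical, non-blocking toy model. It does not call any Windows locking API and it leaves out lease recall entirely.

```python
class FileLock:
    """Toy shared/exclusive lock; try_* methods return False instead of blocking."""

    def __init__(self):
        self.readers = 0     # number of shared (read) locks held
        self.writer = False  # whether an exclusive (write) lock is held

    def try_shared(self):
        if self.writer:
            return False
        self.readers += 1
        return True

    def try_exclusive(self):
        if self.writer or self.readers:
            return False
        self.writer = True
        return True

    def release_shared(self):
        if self.readers == 0:
            raise ValueError("no shared lock is held")
        self.readers -= 1

    def release_exclusive(self):
        self.writer = False


lock = FileLock()
print(lock.try_shared())     # True: first reader
print(lock.try_shared())     # True: readers can share
print(lock.try_exclusive())  # False: readers are still present
lock.release_shared()
lock.release_shared()
print(lock.try_exclusive())  # True: no other holders remain
print(lock.try_shared())     # False: a writer holds the lock
```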

[1] “Overview of file sharing using the SMB 3 protocol in Windows Server.” *Microsoft Docs*. https://learn.microsoft.com/en-us/windows-server/storage/file-server/file-server-smb-overview

Delete Pending State

When a file is deleted in Windows, the file system does not immediately remove the file from disk. Instead, it marks the file as “pending delete” in the file system metadata [1]. The file still exists on disk in its original location and remains discoverable until all open handles to the file are closed [2].

This pending delete state allows any processes that still have the file open to continue accessing it normally. The file system will not actually delete the file contents and release the disk space until all outstanding open handles are closed. This prevents processes from unexpectedly losing access to a file that they still need.
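The pending delete state can be modeled as a flag that is acted on only when the last handle closes. `ToyVolume` below is a hypothetical in-memory model, not a file system interface. Refusing new opens once the flag is set is an assumption modeled on how Windows treats delete-pending files.

```python
class ToyVolume:
    """Toy model of delete-pending behavior."""

    def __init__(self):
        # name -> {"handles": open handle count, "delete_pending": flag}
        self._files = {}

    def create(self, name):
        self._files[name] = {"handles": 0, "delete_pending": False}

    def open(self, name):
        entry = self._files[name]
        if entry["delete_pending"]:
            # Assumption: new opens are refused once deletion is pending.
            raise PermissionError(f"{name} is pending delete")
        entry["handles"] += 1

    def delete(self, name):
        # Mark the file; it disappears now only if nothing has it open.
        self._files[name]["delete_pending"] = True
        self._remove_if_unused(name)

    def close(self, name):
        self._files[name]["handles"] -= 1
        self._remove_if_unused(name)

    def exists(self, name):
        return name in self._files

    def _remove_if_unused(self, name):
        entry = self._files[name]
        if entry["delete_pending"] and entry["handles"] == 0:
            del self._files[name]


volume = ToyVolume()
volume.create("log.txt")
volume.open("log.txt")
volume.delete("log.txt")
print(volume.exists("log.txt"))  # True: one handle is still open
volume.close("log.txt")
print(volume.exists("log.txt"))  # False: last handle closed
```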

Therefore, the pending delete status is a key reason that Windows continues to discover deleted files until garbage collection occurs. The discovery process must check pending delete files to see if handles remain open, and only remove them when they are fully unused.

Garbage Collection

Garbage collection, as used here, is the cleanup step that removes files marked as “pending delete.” When a file is deleted in Windows, it is not immediately removed from disk. Instead, it is marked as pending delete in the file system metadata, and its contents remain on disk until cleanup runs. On NTFS this cleanup normally happens when the last handle closes. A separate background garbage collector is characteristic of layers built on top of the file system, such as SQL Server FILESTREAM [1].

Garbage collection will only permanently delete a file marked as pending delete if there are no open handles to that file. As long as any process has the file open, garbage collection cannot delete it. This prevents files from disappearing unexpectedly while a program is still using them. It also ensures file content is not deleted before pending writes are completed. Once all handles are closed, the pending delete file can be permanently deleted.

Running garbage collection periodically ensures disk space is recovered in a timely manner after deletes. It also scrubs orphaned metadata left over from aborted creates and deletes. Garbage collection is a key process that keeps the file system tidy and enables safe file deletion.[1]

[1] “sp_filestream_force_garbage_collection (Transact-SQL).” *Microsoft Docs*. https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/filestream-and-filetable-sp-filestream-force-garbage-collection?view=sql-server-ver16

Why Discovery is Needed

The file discovery process that Windows performs before deletion is crucial to avoid potential problems like data loss or corruption. As explained in previous sections, the file system contains complex interdependencies between file tables, metadata, open handles, locks, and more. Deleting a file immediately can lead to orphaned structures or corrupted references if the file is still in use by a process [1]. The discovery phase allows the operating system to cleanly finalize all operations involving the file, decrement open handle counts, release any locks, and unlink any directory entries before deletion [2]. This prevents deletion from interfering with active processes or leaving remnants behind in the file system. Overall, the upfront cost of discovery is worthwhile to ensure integrity of file system data structures and avoid potential corruption issues down the line.
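From an application's point of view, the outcome of these checks shows up as the success or failure of the delete call. The sketch below wraps Python's standard `os.remove`. The `try_delete` helper and its return strings are invented for this example. On Windows, a `PermissionError` here commonly means another process holds the file open without sharing delete access, but it can also mean the file is read-only or access is denied.

```python
import os
import tempfile


def try_delete(path):
    """Attempt a delete and report the outcome instead of raising."""
    try:
        os.remove(path)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        # Typical on Windows when the file is still open elsewhere.
        return "in use or access denied"
    return "deleted"


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "old.log")
    with open(target, "w") as handle:
        handle.write("stale data")
    print(try_delete(target))  # deleted
    print(try_delete(target))  # missing
```

A caller that receives the "in use" result can retry later or tell the user which file is blocked. Forcing the deletion is not an option the delete call offers.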

Conclusion

In summary, there are several key reasons why the file discovery phase is essential before Windows deletes files:

The discovery process allows Windows to locate all traces of a file in the file system and confirm that open handles are closed before deletion. This prevents errors or crashes if a program still has a file open. Discovery also logs information about each file’s metadata and location before deletion. This provides stability and recoverability if something goes wrong.

By taking time to discover files slated for deletion, Windows can cleanly prepare the file system and free up space. Rushing this process risks data loss or corruption. The discovery phase gives Windows a comprehensive view of each file so deletions happen smoothly. While it may seem slow, the discovery process ultimately allows Windows to maintain the integrity and reliability users expect.