Large Cluster Sizes Waste Disk Space

Cluster, cluster size, and slack

The filesystem does not keep track of disk space allocation (which space is used by which file and which space is free) down to the byte level, because that would require too much bookkeeping. Instead, the disk space is divided into equal blocks called clusters, and the filesystem allocates disk space cluster by cluster.

This reduces the overhead of tracking what is where, but it also has a side effect. If a file is smaller than one cluster, the filesystem still allocates a full cluster to store it, and the unused space in that cluster is wasted. More generally, any file whose size is not an integral multiple of the cluster size leaves some unoccupied space in its last cluster.
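The rounding described above can be sketched in a few lines of Python. This is only an illustration: the function name `allocation` is made up for this example, and it assumes an empty file takes no clusters at all, which may not hold on every filesystem.

```python
def allocation(file_size: int, cluster_size: int = 4096) -> tuple[int, int, int]:
    """Return (clusters, size_on_disk, slack) for a file; all sizes in bytes."""
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file_size must be >= 0 and cluster_size must be > 0")
    # Ceiling division: a partially filled last cluster still counts as a whole one.
    # Assumption: a zero-byte file is given zero clusters.
    clusters = -(-file_size // cluster_size)
    size_on_disk = clusters * cluster_size
    return clusters, size_on_disk, size_on_disk - file_size


for size in (1, 4096, 4097, 10_000):
    clusters, on_disk, slack = allocation(size)
    print(f"{size:>6} bytes -> {clusters} cluster(s), {on_disk} on disk, {slack} slack")
```

A 4096-byte file fits its cluster exactly and has no slack, while a file just one byte larger needs a second cluster that is almost entirely empty.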

This unoccupied space is called “slack” and it cannot be used for any other file. On average, about half a cluster is wasted for every file stored on the volume.

A typical volume holding a Windows Vista installation and some installed applications contains approximately 150,000 files occupying about 150 GB. With a cluster size of 4096 bytes, each file wastes about 2048 bytes on average, so the total disk space lost is about 300 MB, or 0.2% overhead, which is negligible for all practical purposes.
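The arithmetic behind this estimate can be checked with a short Python sketch. It relies on the half-a-cluster-per-file rule of thumb and treats 150 GB as decimal gigabytes, which is an assumption made here; the helper name `expected_slack` is invented for the example.

```python
FILES = 150_000            # file count quoted in the text
DATA_BYTES = 150 * 10**9   # 150 GB, taken as decimal gigabytes (assumption)
CLUSTER_SIZE = 4096        # bytes


def expected_slack(file_count: int, cluster_size: int) -> float:
    """Estimated total slack in bytes, assuming half a cluster is wasted per file."""
    return file_count * cluster_size / 2


slack_bytes = expected_slack(FILES, CLUSTER_SIZE)
overhead_percent = 100 * slack_bytes / DATA_BYTES

print(f"estimated slack: {slack_bytes / 10**6:.0f} MB")   # 307 MB
print(f"overhead: {overhead_percent:.2f} %")              # 0.20 %
```

Because the estimate is proportional to the cluster size, doubling the cluster size doubles the expected waste for the same set of files.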

Why Do Large Cluster Sizes Waste Disk Space?

Larger hard drive clusters waste more disk space because the filesystem trades a small amount of storage capacity for a substantial boost in data-access performance. Clusters designate where file segments can start and end, and any unused space in between is wasted. Slack can be measured by comparing a file’s size with its size on disk: the size on disk includes the unused space imposed by cluster boundaries.
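As a rough sketch of that comparison, the Python code below walks a directory tree and sets the total file size against an estimated size on disk. The function name `measure_slack` is hypothetical, and the cluster size is passed in rather than read from the volume. The result is only an estimate: compressed files, sparse files, or very small files stored inline in filesystem metadata may occupy less space than this rounding suggests.

```python
import os
import tempfile


def measure_slack(root: str, cluster_size: int = 4096) -> tuple[int, int]:
    """Return (total_file_bytes, estimated_bytes_on_disk) for all files under root."""
    logical = 0
    on_disk = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # count regular files only
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
            logical += size
            # Round each file up to a whole number of clusters.
            on_disk += -(-size // cluster_size) * cluster_size
    return logical, on_disk


# Demonstration on a throwaway directory holding a 1-byte and a 5000-byte file.
with tempfile.TemporaryDirectory() as demo_dir:
    for name, size in (("tiny.bin", 1), ("larger.bin", 5000)):
        with open(os.path.join(demo_dir, name), "wb") as handle:
            handle.write(b"\0" * size)
    logical, on_disk = measure_slack(demo_dir)
    print(f"size: {logical} bytes, size on disk (estimated): {on_disk} bytes")
    print(f"slack: {on_disk - logical} bytes")
```

Pointing `measure_slack` at a real folder with different `cluster_size` values gives a feel for how much the waste grows with larger clusters.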

Clusters and Sectors
Clusters divide the hard drive into manageable storage segments. Each cluster consists of several sectors, which are the smallest addressable data segments on the storage device. A file can be spread over many clusters, and whatever space is not used in its final cluster is left unused. With clusters, a large file does not have to occupy a contiguous section of the hard drive and can be split up over separate sections of it.

Slack
Slack is the unused space in a file’s final cluster. Slack works a lot like a partially used page at the end of a book chapter. If chapter 4 ends halfway through a page, chapter 5 does not start until the next page, and cluster space works the same way. Each new file is the start of a new chapter and needs to start on a new page, so any unused space on the previous page can’t be used. Larger hard drive clusters increase the potential for unused space because a bigger cluster leaves more room empty when the file ends early in it. For example, a file that uses 1KB of a 4KB cluster creates 3KB of slack, whereas the same file creates 7KB of slack in an 8KB cluster.
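The 1KB example can be extended to other cluster sizes with a small Python sketch. The function name `slack` and the list of cluster sizes are chosen only for illustration.

```python
KB = 1024


def slack(file_size: int, cluster_size: int) -> int:
    """Unused bytes in the last cluster of a file."""
    remainder = file_size % cluster_size
    # A file that ends exactly on a cluster boundary leaves no slack.
    return 0 if remainder == 0 else cluster_size - remainder


for cluster_kb in (4, 8, 16, 32, 64):
    wasted = slack(1 * KB, cluster_kb * KB)
    print(f"{cluster_kb:>2}KB cluster: {wasted // KB}KB of slack for a 1KB file")
```

The file itself never changes, yet the wasted space grows in step with the cluster size.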

Like Books
Hard drive data storage works a lot like a book. Individual letters are the smallest unit of information in a book, while a computer uses binary digits. Letters usually don’t mean much on their own and are grouped into words. Hard drives group binary digits into sectors, which are the smallest data unit that has any meaning. Hard drive clusters are like pages in a book: just as a page holds a group of words, a cluster holds a group of sectors in a single, identifiable unit. Computer files are a lot like chapters in books: they contain a designated grouping of related content. Chapters can be of varying lengths and somehow need to be divided. Clusters on a hard drive work like the page numbers in a book’s table of contents: they identify where content starts and ends.

Workload
Hard drive clusters reduce the workload involved in locating saved data. Instead of storing all data in a continuous flow and checking every bit for the starting and ending points of files, clusters break up the storage device in a way that reduces the total number of possible starting and ending points. Instead of scanning the entire hard drive for a file, the filesystem uses cluster information stored in the file allocation table to go straight to the file. Using clusters is like using page numbers to find content in a book as opposed to finding it by word count.
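The lookup can be pictured with a toy model in Python. This is a deliberately simplified sketch, not the real on-disk layout of any filesystem: the table contents, the file names, and the `END_OF_CHAIN` marker are all invented for the illustration.

```python
END_OF_CHAIN = -1  # hypothetical marker; real FAT variants use reserved values

# Toy allocation table: index = cluster number, value = next cluster of the same file.
fat = [END_OF_CHAIN] * 12
fat[2], fat[3], fat[9] = 3, 9, END_OF_CHAIN   # first file occupies clusters 2 -> 3 -> 9
fat[5], fat[6] = 6, END_OF_CHAIN              # second file occupies clusters 5 -> 6

# Toy directory: file name -> first cluster of the file.
directory = {"a.txt": 2, "b.txt": 5}


def cluster_chain(name: str) -> list[int]:
    """Follow the table from the file's first cluster to the end of its chain."""
    chain = []
    cluster = directory[name]
    while cluster != END_OF_CHAIN:
        if len(chain) > len(fat):
            raise RuntimeError("allocation table contains a loop")
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


print(cluster_chain("a.txt"))  # [2, 3, 9]
print(cluster_chain("b.txt"))  # [5, 6]
```

Finding a file takes one directory lookup plus one table entry per cluster, and the clusters of `a.txt` do not need to sit next to each other on the drive.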

What is the best cluster size to use?

Use the default cluster size. It provides the best results in almost all cases. In the cases where the default value is not the optimum value, you would not notice the difference anyway. The ability to specify a cluster size is itself a legacy from the days of floppy drives.

On NTFS, selecting a cluster size larger than 4096 bytes (4KB) is not recommended for general use, because NTFS supports compression only with cluster sizes of up to 4096 bytes.