{{Short description|Condition where a segmented file system is used inefficiently}}
[[File:FragmentationDefragmentation.gif|thumb|Visualization of fragmentation and then of defragmentation]]In [[computing]], '''file system fragmentation''', sometimes called '''file system aging''', is the tendency of a [[file system]] to lay out the contents of [[Computer file|files]] non-continuously to allow in-place modification of their contents. It is a special case of [[fragmentation (computer)#Data fragmentation|data fragmentation]]. File system fragmentation negatively impacts seek time in spinning storage media, which is known to hinder throughput.
[[Solid-state drive]]s do not physically seek, so their non-sequential data access is hundreds of times faster than moving drives, making fragmentation less of an issue. It is recommended to not manually defragment solid-state storage, because this can prematurely wear drives via unnecessary write–erase operations.<ref>{{Cite news |last=Fisher |first=Ryan |date=2022-02-11 |title=Should I defrag my SSD? |language=en |work=PC Gamer |url=https://www.pcgamer.com/should-i-defrag-my-ssd/ |url-status=live |access-date=2022-04-26 |archive-url=https://web.archive.org/web/20220218151612/https://www.pcgamer.com/should-i-defrag-my-ssd/ |archive-date=2022-02-18}}</ref>
==Causes==
===Example===
[[File:File system fragmentation.svg|thumb|Simplified example of how file and free space fragmentation arise as files are written, deleted, and extended]]
If the file B is deleted, a second region of ten blocks of free space is created, and the disk becomes fragmented. The empty space is simply left there, marked as and available for later use, then used again as needed.{{efn|The practice of leaving the space occupied by deleted files largely undisturbed is why [[undeletion|undelete]] programs were able to work; they simply recovered the file whose name had been deleted from the directory, but whose contents were still on disk.}} The file system ''could'' defragment the disk immediately after a deletion, but doing so would incur a severe performance penalty at unpredictable times.
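The delete-then-reuse behaviour described above can be illustrated with a toy model (not any real file system's allocator): first-fit block allocation over a disk of equally sized blocks, where deleting a file merely marks its blocks free and a later, larger file gets split across the resulting hole.

```python
# Toy model: files A, B, C occupy ten blocks each, followed by ten free
# blocks ('.'). Deleting B leaves a ten-block hole; a 15-block file F
# must then be split across the hole and the end of the disk.

def allocate(disk, name, size):
    """First-fit allocation: fill free ('.') blocks in order, splitting
    the file across holes if no single hole is large enough."""
    placed = 0
    for i, blk in enumerate(disk):
        if blk == '.':
            disk[i] = name
            placed += 1
            if placed == size:
                return
    raise OSError("disk full")

def delete(disk, name):
    for i, blk in enumerate(disk):
        if blk == name:
            disk[i] = '.'   # space is only marked free, never compacted

def fragments(disk, name):
    """Count contiguous runs of blocks belonging to `name`."""
    runs, prev = 0, None
    for blk in disk:
        if blk == name and prev != name:
            runs += 1
        prev = blk
    return runs

disk = list("AAAAAAAAAABBBBBBBBBBCCCCCCCCCC") + ['.'] * 10
delete(disk, 'B')            # ten-block hole opens between A and C
allocate(disk, 'F', 15)      # 10 blocks fill the hole, 5 go to the end
print(fragments(disk, 'F'))  # F ends up in 2 fragments
```

Note that deferring compaction is exactly the trade-off the paragraph describes: the allocator stays cheap at write time, at the cost of letting fragments accumulate.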
==Necessity==
{{Essay-like|section|date=June 2019}}
Some early file systems were unable to fragment files. One such example was the [[Acorn Computers|Acorn]] [[Disc Filing System|DFS]] file system used on the [[BBC Micro]]. Due to its inability to fragment files, the error message ''can't extend'' would at times appear, and the user would often be unable to save a file even if the disk had adequate space for it.
DFS used a very simple disk structure and [[Computer file|files]] on [[Hard disk|disk]] were located only by their length and starting sector. This meant that all files had to exist as a continuous block of sectors and fragmentation was not possible. Using the example in the table above, the attempt to expand file F in step five would have failed on such a system with the ''can't extend'' error message. Regardless of how much free space might remain on the disk in total, it was not available to extend the data file.
Standards of [[error handling]] at the time were primitive and in any case programs squeezed into the limited memory of the BBC Micro could rarely afford to waste space attempting to handle errors gracefully. Instead, the user would find themselves dumped back at the command prompt with the ''Can't extend'' message and all the data which had yet to be appended to the file would be lost.
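The constraint can be sketched as a toy model (hypothetical names and sizes, not actual DFS code): each file is recorded only as a start sector plus a length, so it must occupy one contiguous run, and extending it fails whenever the sectors immediately after it are already taken, regardless of free space elsewhere.

```python
# Hypothetical sketch of DFS-style contiguous allocation. A file is just
# (start sector, length); it can only grow if the sectors directly after
# its current run are free.

class CantExtend(Exception):
    pass

class ContiguousDisk:
    def __init__(self, sectors):
        self.used = [False] * sectors
        self.files = {}                 # name -> (start, length)

    def create(self, name, length):
        run = 0
        for i, u in enumerate(self.used):
            run = 0 if u else run + 1
            if run == length:           # found a big-enough hole
                start = i - length + 1
                for j in range(start, i + 1):
                    self.used[j] = True
                self.files[name] = (start, length)
                return
        raise CantExtend("no contiguous run of %d sectors" % length)

    def extend(self, name, extra):
        start, length = self.files[name]
        tail = range(start + length, start + length + extra)
        if any(i >= len(self.used) or self.used[i] for i in tail):
            raise CantExtend("Can't extend")   # free space exists, just not here
        for i in tail:
            self.used[i] = True
        self.files[name] = (start, length + extra)

d = ContiguousDisk(20)
d.create("F", 5)      # sectors 0-4
d.create("G", 5)      # sectors 5-9, directly after F
try:
    d.extend("F", 1)  # fails although 10 sectors remain free
except CantExtend as e:
    print(e)
```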
==Types==
File system fragmentation may occur on several levels:
* Fragmentation within individual [[computer file|file]]s
* Free space fragmentation
* The decrease of [[locality of reference]] between separate, but related files
* Fragmentation within the data structures or special files reserved for the file system itself
===File fragmentation===
===File scattering===
File segmentation, also called related-file fragmentation, or application-level (file) fragmentation, refers to the lack of [[locality of reference]] (within the storing medium) between related files. Unlike the previous two types of fragmentation, file scattering is a much more vague concept, as it heavily depends on the access pattern of specific applications. This also makes objectively measuring or estimating it very difficult. However, arguably, it is the most critical type of fragmentation, as studies have found that the most frequently accessed files tend to be small compared to available disk throughput per second.<ref name="filesys-contents">{{cite journal | title=A Large-Scale Study of File-System Contents | publisher=[[Association for Computing Machinery]] | date=June 1999 | last=Douceur | first=John R. | last2=Bolosky | first2=William J. | journal=[[ACM SIGMETRICS]] Performance Evaluation Review | volume=27 | issue=1 | pages=59–70 | doi=10.1145/301464.301480}}</ref>
To avoid related file fragmentation and improve locality of reference (in this case called ''file contiguity''), assumptions or active observations about the operation of applications have to be made. A very frequent assumption made is that it is worthwhile to keep smaller files within a single [[file directory|directory]] together, and lay them out in the natural file system order. While it is often a reasonable assumption, it does not always hold. For example, an application might read several different files, perhaps in different directories, in exactly the same order they were written. Thus, a file system that simply orders all writes successively might work faster for the given application.
===Data structure fragmentation===
The catalogs or indices used by a file system itself can also become fragmented over time, as the entries they contain are created, changed, or deleted. This is more of a concern when the volume contains a multitude of very small files than when a volume is filled with fewer larger files. Depending on the particular file system design, the files or regions containing that data may also become fragmented (as described above for 'regular' files), regardless of any fragmentation of the actual data records maintained within those files or regions.<ref name="ntfs-reserves-space-for-mft">{{cite web |title=How NTFS reserves space for its Master File Table (MFT) |url=https://learn.microsoft.com/en-us/troubleshoot/windows-server/backup-and-storage/ntfs-reserves-space-for-mft |website=learn.microsoft.com |publisher=Microsoft |access-date=22 October 2022 |language=en-us}}</ref>
For some file systems (such as [[NTFS]]{{efn|NTFS reserves 12.5% of the volume for the 'MFT zone', but ''only'' until that space is needed by other files. ''(i.e., if the volume ~ever~ becomes more than 87.5% full, an un-fragmented MFT can no longer be guaranteed.)''<ref name="ntfs-reserves-space-for-mft" />}} and [[Hierarchical File System (Apple)|HFS]]/[[HFS Plus]]<ref name="diskwarrior-hfs-hfsplus">{{cite web |title=DiskWarrior in Depth |url=https://www.alsoft.com/in-depth |website=Alsoft |access-date=22 October 2022}}</ref>), the [[collation]]/[[sorting]]/[[Data compaction|compaction]] needed to optimize this data cannot easily occur while the file system is in use.<ref name="windows-2000-defrag-performance">{{cite web |title=Maintaining Windows 2000 Peak Performance Through Defragmentation |url=https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-2000-server/bb742585(v=technet.10) |website=learn.microsoft.com |publisher=Microsoft |access-date=22 October 2022 |language=en-us}}</ref>
==Negative consequences==
File system fragmentation is more problematic with consumer-grade [[hard disk drive]]s because of the increasing disparity between [[sequential access]] speed and [[rotational latency]] (and to a lesser extent [[seek time]]) on which file systems are usually placed.<ref name="seagate-future">{{cite conference |first=Mark H. |last=Kryder |publisher=[[Seagate Technology]] |date=2006-04-03 |title=Future Storage Technologies: A Look Beyond the Horizon |conference=Storage Networking World conference |url=http://www.snwusa.com/documents/presentations-s06/MarkKryder.pdf}}</ref>
In simple file system [[benchmark (computing)|benchmark]]s, the fragmentation factor is often omitted, as realistic aging and fragmentation is difficult to model. Rather, for simplicity of comparison, file system benchmarks are often run on empty file systems. Thus, the results may vary heavily from real-life access patterns.<ref name="workload-benchmarks">{{cite journal |first=Keith Arnold |last=Smith |date=January 2001 |title=Workload-Specific File System Benchmarks |publisher=[[Harvard University]] |___location=[[Cambridge, Massachusetts]] |url=http://www.eecs.harvard.edu/vino/fs-perf/papers/keith_a_smith_thesis.pdf}}</ref>
==Mitigation==
===Preventing fragmentation===
Preemptive techniques attempt to keep fragmentation to a minimum at the time data is being written on the disk. The simplest is, perhaps, appending data to an existing fragment in place where possible, instead of allocating new blocks to a new fragment.
Many of today's file systems attempt to preallocate longer chunks, or chunks from different free space fragments, called [[extent (file systems)|extents]], to files that are actively appended to. This largely avoids file fragmentation when several files are concurrently being appended, thus avoiding their becoming excessively intertwined.
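A minimal sketch of extent-style preallocation (a toy model with an assumed fixed extent size, not any real file system's policy) shows how repeated appends can stay within one reserved run instead of each grabbing a fresh fragment:

```python
# Toy model of extent preallocation: each append to a file that has
# outgrown its reservation grabs a whole extent (assumed 8 blocks here),
# so subsequent small appends land inside the already-reserved run.

PREALLOC = 8   # assumed extent size, in blocks

class ExtentFS:
    def __init__(self, blocks):
        self.free = list(range(blocks))    # simplistic free-block list
        self.extents = {}                  # name -> [reserved, used]

    def append(self, name, nblocks):
        reserved, used = self.extents.get(name, [0, 0])
        if used + nblocks > reserved:      # reservation exhausted
            grab = max(nblocks, PREALLOC)
            del self.free[:grab]           # pretend these are contiguous
            reserved += grab
        self.extents[name] = [reserved, used + nblocks]

fs = ExtentFS(64)
fs.append("log", 2)       # reserves an 8-block extent, uses 2
fs.append("log", 3)       # fits inside the existing extent: no new fragment
print(fs.extents["log"])  # [8, 5]
```

The design choice being modelled is the over-reservation itself: a concurrent second file would draw from a different extent, so interleaved appends do not intertwine block-by-block.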
If the final size of a file subject to modification is known, storage for the entire file may be preallocated. For example, the [[Microsoft Windows]] [[swap file]] (page file) can be resized dynamically under normal operation, and therefore can become highly fragmented. This can be prevented by specifying a page file with the same minimum and maximum sizes, effectively preallocating the entire file.
[[BitTorrent (protocol)|BitTorrent]] and other [[peer-to-peer]] [[filesharing]] applications limit fragmentation by preallocating the full space needed for a file when initiating [[download]]s.<ref>{{cite journal |date=29 March 2009 |first=Jeffrey |last=Layton |title=From ext3 to ext4: An Interview with Theodore Ts'o}}</ref>
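The technique can be illustrated in Python (sizes are illustrative): create the file at its known final size up front, then fill pieces in place with seek-and-write. Note that `truncate` alone may produce a sparse file on many systems, so real clients typically use `fallocate`-style calls to force actual block allocation.

```python
# Sketch of full-file preallocation as a downloader might do it:
# reserve the final size immediately, then write pieces in place,
# possibly out of order.

import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "download.bin")
final_size = 1 << 20            # 1 MiB, known before the download starts

with open(path, "wb") as f:
    f.truncate(final_size)      # file is its final size from the start

# Later, a piece arrives out of order and is written in place:
with open(path, "r+b") as f:
    f.seek(512 * 1024)
    f.write(b"piece")

print(os.path.getsize(path) == final_size)  # True: the file never grows
```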
A relatively recent technique is [[delayed allocation]] in [[XFS]], [[HFS+]]<ref>{{cite web |first=Amit |last=Singh |date=May 2004 |title=Fragmentation in HFS Plus Volumes |work=Mac OS X Internals |url=http://osxbook.com/software/hfsdebug/fragmentation.html |access-date=2009-10-27 |archive-date=2012-11-18 |archive-url=https://web.archive.org/web/20121118173110/http://osxbook.com/software/hfsdebug/fragmentation.html |url-status=dead }}</ref> and [[ZFS]]; the same technique is also called allocate-on-flush in [[reiser4]] and [[ext4]]. When the file system is being written to, file system blocks are reserved, but the locations of specific files are not laid down yet. Later, when the file system is forced to flush changes as a result of memory pressure or a transaction commit, the allocator will have much better knowledge of the files' characteristics. Most file systems with this approach try to flush files in a single directory contiguously. Assuming that multiple reads from a single directory are common, locality of reference is improved.<ref name=xfs-scalability>{{cite conference |first=Adam |last=Sweeney |first2=Doug |last2=Doucette |first3=Wei |last3=Hu |first4=Curtis |last4=Anderson |first5=Mike |last5=Nishimoto |first6=Geoff |last6=Peck |date=January 1996 |title=Scalability in the XFS File System |publisher=[[Silicon Graphics]] |
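A toy model of allocate-on-flush (a sketch, not any real implementation) captures the key idea: writes are buffered per file, and block locations are chosen only at flush time, when every file's total size is known, so each file can be laid out as one contiguous run.

```python
# Toy model of delayed allocation: block placement is deferred until
# flush, when the allocator knows each file's final buffered size.

class DelayedAllocFS:
    BLOCK = 4096

    def __init__(self):
        self.pending = {}        # name -> bytes buffered, not yet placed
        self.layout = {}         # name -> (start block, block count)
        self.next_block = 0

    def write(self, name, data):
        # No blocks are assigned here; the data just accumulates.
        self.pending[name] = self.pending.get(name, b"") + data

    def flush(self):
        # All sizes are known now: place each file contiguously.
        for name, data in sorted(self.pending.items()):
            nblocks = max(1, -(-len(data) // self.BLOCK))  # ceil division
            self.layout[name] = (self.next_block, nblocks)
            self.next_block += nblocks
        self.pending.clear()

fs = DelayedAllocFS()
for _ in range(3):               # three small appends to the same file
    fs.write("a.txt", b"x" * 3000)
fs.flush()
print(fs.layout["a.txt"])        # one contiguous run: (0, 3)
```

Had each 3000-byte write been placed immediately, the file could have ended up interleaved with other files' blocks; deferring placement lets the three appends collapse into a single three-block extent.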
<!-- TODO: Cylinder groups and locality of reference; XFS allocation groups (are they actually relevant?) -->
{{Main|Defragmentation}}
Retroactive techniques attempt to reduce fragmentation, or the negative effects of fragmentation, after it has occurred. Many file systems provide [[defragmentation]] tools, which attempt to reorder fragments of files, and sometimes also decrease their scattering (i.e. improve their contiguity, or [[locality of reference]]) by keeping either smaller files in [[file directory|directories]], or directory trees, or even file sequences close to each other on the disk.
The [[HFS Plus]] file system transparently defragments files that are less than 20 [[MiB]] in size and are broken into 8 or more fragments, when the file is being opened.<ref name=osx-intern>{{cite book |first=Amit |last=Singh |year=2007 |title=Mac OS X Internals: A Systems Approach |publisher=[[Addison Wesley]] |chapter=12 The HFS Plus File System |isbn=0321278542<!--Surprisingly, ISBN lookup on Google Books returns nothing. Hence, I supplied a URL.--> |chapter-url=https://books.google.com/books?id=UZ7AmAEACAAJ}}</ref>
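The on-open condition described above can be written as a simple predicate (a sketch of the size and fragment-count test only; the real implementation also checks further conditions before relocating a file):

```python
# Hypothetical predicate for the HFS+ on-open auto-defragmentation
# check: a file qualifies when it is under 20 MiB and broken into
# 8 or more fragments.

TWENTY_MIB = 20 * 1024 * 1024

def should_autodefrag(size_bytes, fragment_count):
    return size_bytes < TWENTY_MIB and fragment_count >= 8

print(should_autodefrag(5 * 1024 * 1024, 9))   # True
print(should_autodefrag(25 * 1024 * 1024, 9))  # False: file too large
print(should_autodefrag(5 * 1024 * 1024, 3))   # False: too few fragments
```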
The now obsolete Commodore Amiga [[Smart File System]] (SFS) defragmented itself while the filesystem was in use. The defragmentation process is almost completely stateless (apart from the ___location it is working on), so that it can be stopped and started instantly. During defragmentation data integrity is ensured for both metadata and normal data.