File system fragmentation: Difference between revisions

Content deleted Content added
Additional citation, add format= attributes
More on delayed allocation
Line 1:
{{wip}}
 
In computing, '''[[file system]] [[fragmentation (computer)|fragmentation]]''', sometimes called '''file system aging''', is the inability of a file system to lay out related data sequentially (contiguously), an inherent phenomena in [[computer storage|storage]]-backed file systems that allow in-place modification of their contents. It is a special case of [[fragmentation (computer)#Data fragmentation|data fragmentation]].
 
==Why fragmentation matters==
File system fragmentation is projected to become more problematic with newer hardware due to the increasing disparity between sequential access speed and [[rotational delay]] (and to a lesser extent [[seek time]]), of consumer-grade [[hard disk]]s,<ref name=seagate-future>{{cite conference |author=Dr. Mark H. Kryder |publisher=[[Seagate Technology]] |date=2006-04-03 |title=Future Storage Technologies: A Look Beyond the Horizon |booktitle=Storage Networking World conference |url=http://www.snwusa.com/documents/presentations-s06/MarkKryder.pdf |format=[[PDF]] |accessdate=2006-12-14 }}</ref> which file systems are usually placed on. Thus, fragmentation is a important problem in recent file system research and design. The containment of fragmentation not only depends on the on-disk format of the file system, but also heavily on its implementation.<ref name=mcvoy-extent>{{cite conference |author=L. W. McVoy, S. R. Kleiman |date=1991 winter |title=Extent-like Performance from a UNIX File System |booktitle=Proceedings of [[USENIX]] winter '91 |pages=pages&nbsp;33&ndash;43 |___location=Dallas, Texas |publisher=[[Sun Microsystems, Inc.]] |url=http://www.cis.upenn.edu/~bcpierce/courses/dd/papers/mcvoy-extent.ps |format=[[PostScript]] |accessdate=2006-12-14 }}</ref>
 
In simple file system [[benchmark (computing)|benchmark]]s, the fragmentation factor is often omitted, as realistic aging and fragmentation is difficult to model. Rather, for simplicity of comparison, file system benchmarks are often ran on empty file systems, and unsuprisingly, the results may vary heavily from real life access patterns.<ref name=workload-benchmarks>{{cite paper |author=Keith Arnold Smith |date=2001-01 |title=Workload-Specific File System Benchmarks |publisher=[[Harvard University]] |url=http://www.eecs.harvard.edu/vino/fs-perf/papers/keith_a_smith_thesis.pdf |format=[[PDF]] |accessdate=2006-12-14 }}</ref>
 
==Types of fragmentation==
File system fragmentation may occur on several levels:
* Fragmentation within individual [[computer file|file]]s and their [[metadata]].
Line 12 ⟶ 14:
* The decrease of [[locality of reference]] between separate, but related files.
 
==Types of fragmentation==
====File fragmentation====
Individual file fragmentation occurs when a single file has been broken into multiple pieces (called [[extent]]s on extent-based file systems). While disk file systems attempt to keep individual files contiguous, this is not often possible without significant performance penalties.
Line 20 ⟶ 21:
 
====Related file fragmentation====
Related file fragmentation refers to the lack of [[locality of reference]] ofbetween related files. Unlike the previous two types of fragmentation, related file fragmentation is a much more vague concept, as it heavily depends on the access pattern of specific applications. This also makes objectively measuring or estimating it very difficult. However, arguably, it is the most critical type of fragmentation, as studies have found that the most frequently accessed files tend to be small compared to available disk throughput per second.<ref name=filesys-contents>{{cite journal |author=John R. Douceur, William J. Bolosky |date=1999-06 |title=A Large-Scale Study of File-System Contents |publisher=[[Microsoft Research]] |pages=pages&nbsp;59&ndash;70 |journal=[[ACM SIGMETRICS]] Performance Evaluation Review |volume=volume&nbsp;27 |issue=issue&nbsp;1 |url=http://research.microsoft.com/~bolosky/papers/SIGMETRICS99/filesys.pdf |format=[[PDF]] |issn=0163-5999 |accessdate=2006-12-14 }}</ref>
 
==Techniques for mitigating fragmentation==
Line 28 ⟶ 29:
Proactive techniques attempt to keep fragmentation at a minimum at the time data is being written on the disk. The simplest of such is, perhaps, appending data to an existing fragment in place where possible, instead of allocating new blocks to a new fragment.
 
Most today's file systems attempt to preallocate longer chunks, foror chunks from different free space fragments, to files that are actively appended to. This mainly avoids file fragmentation when several files are appended to concurrently, avoiding them from becoming excessively intertwined.<ref name=mcvoy-extent/>
 
A relatively recent technique is [[delayed allocation]] in [[XFS]] and [[ZFS]], the same technique is also called allocate-on-flush in [[reiser4]] and [[ext4]]. This means that when the file system is being written to, file system blocks are reserved, but their locations are not laid down yet. Later, when the file system is forced to flush as a result of memory pressure or a transaction commit, the allocator will have much better knowledge of the files' characteristics. Most file systems with this approach try to flush files in a single directory contiguously. Assuming that multiple reads from a single directory are common, locality of reference is improved.<ref name=xfs-scalability>{{cite conference |author=Adam Sweeney, Doug Doucette, Wei Hu, Curtis Anderson, Mike Nishimoto, Geoff Peck |date=1996-01 |title=Scalability in the XFS File System |publisher=[[Silicon Graphics]] |booktitle=Proceedings of the USENIX 1996 Annual Technical Conference |___location=San Diego, California |url=http://www.soe.ucsc.edu/~sbrandt/290S/xfs.pdf |format=[[PDF]] |accessdate=2006-12-14 }}</ref> Reiser4 also orders the layout of files according to the directory [[hash table]], so that when files are being accessed in file system order (as dictated by [[readdir]]), they are always accessed sequentially.<ref name=reiser4-google>{{cite web |author=Hans Reiser |date=2006-02-06 |title=The Reiser4 Filesystem |work=A lecture given by the author or Reiser4, Hans Reiser |url=http://video.google.com/videoplay?docid=6866770590245111825&q=reiser4 |format=[[Google Video]] |accessdate=2006-12-14 }}</ref>
A recent technique is [[allocate-on-flush]] in [[XFS]] and [[ZFS]], also called delayed allocation in [[reiser4]] and [[ext4]].
 
====Retroactive techniques====
Retroactive techniques attempt to reduce fragmentation, or the effect of fragmentation, after it has occurred. Many file systems provide [[defragmentation]] tools, which attempt to reorder fragments of files, and often also increase [[locality of reference]] by keeping smaller files in [[directory (file systems)|directories]], or directory trees, close to each another on the disk. Some file systems, such as [[HFS Plus]], exploitutilize idle time to defragment data on the disk in the background.
 
==See also==
Line 41 ⟶ 42:
==References==
<references />
 
<!-- http://www.eecs.harvard.edu/~keith/papers/tr94.ps.gz — Keith Smith, Margo Seltzer, File Layout and File System Performance -->
 
{{compu-storage-stub}}