Log-structured file system: Difference between revisions

Content deleted Content added
Implementations: +{{debated}}
Minor grammar fixes
 
(136 intermediate revisions by 93 users not shown)
Line 1:
{{Short description|Structure of file system that writes all information to a circular buffer}}
:''This is {{about |the general concept of log-structured file systems. For |the NetBSD file system, see [[|Log-structured File System]].'' (BSD)|the Linux log-structured Flash file system|LogFS}}
 
A '''log-structured filesystem''' is a [[file system]] in which data and metadata are written sequentially to a [[circular buffer]], called a [[log file|log]]. The design was first proposed by [[John K. Ousterhout]] and Fred Douglis in 1988 and first implemented in 1992 by Ousterhout and [[Mendel Rosenblum]] for the Unix-like [[Sprite (operating system)|Sprite]] distributed operating system.<ref name="rblub">{{citation|title=The Design and Implementation of a Log-Structured File System|url=https://people.eecs.berkeley.edu/~brewer/cs262/LFS.pdf|publisher=University of California, Berkeley|year = 1991|first1 = Mendel Rosenblum.|last1 =John K. Ousterhout}}</ref>
{{expand}}
 
A '''log-structured filesystem''' is a [[file system]] design first proposed by [[John K. Ousterhout]] and [[Fred Douglis]]. It writes data to the file system in a sequential "log" format, as opposed to the normal scattered blocks format.
 
== Rationale ==
{{More citations needed|date=April 2025}}
Conventional file systems tend to lay out files on disk with great care for spatial locality and make in-place changes to their data structures on disk in order to perform well on optical and magnetic disks, which tend to seek relatively slowly.
 
The design of log-structured file systems is based on the hypothesis that this will no longer be effective because ever-increasing memory sizes on modern computers would lead to disk I/O becoming write-heavy because disksince reads would be almost always satisfied from memory cache. A log-structured file system thus treats its storage as a [[Circular buffer|circular log]] and writes sequentially to the head of the log.
Conventional file systems tend to lay out files on disk with great care for spatial locality and make in-place changes to data structures on disk in order to perform well on magnetic disks, which tend to seek relatively slowly.
 
This has several important side effects:
The design of log-structured file systems is based on the hypothesis that this will no longer be effective because ever-increasing memory sizes on modern computers would lead to disk I/O becoming write-heavy because disk reads would be almost always satisfied from memory cache.
* Write throughput on optical and magnetic disks is improved because they can be batched into large sequential runs and costly seeks are kept to a minimum.
** The structure is naturally suited to media with [[append-only]] zones or pages such as [[flash storage]]s and [[shingled magnetic recording]] HDDs<ref name=dbsmr>{{Cite web|url=https://dropbox.tech/infrastructure/extending-magic-pocket-innovation-with-the-first-petabyte-scale-smr-drive-deployment|title=Extending Magic Pocket Innovation with the first petabyte scale SMR drive deployment|author=Magic Pocket Hardware Engineering Teams|website=dropbox.tech}}</ref><ref name=flash>{{cite journal |last1=Reid |first1=Colin |last2=Bernstein |first2=Phil |title=Implementing an Append-Only Interface for Semiconductor Storage |journal=IEEE Data Eng. Bull. |date=1 January 2010 |volume=33 |page=14-20 |url=http://sites.computer.org/debull/A10dec/hyder.pdf}}</ref>
* Writes create multiple, chronologically-advancing versions of both file data and meta-data. Some implementations make these old file versions nameable and accessible, a feature sometimes called time-travel or [[snapshot (computer storage)|snapshotting]]. This is very similar to a [[versioning file system]].
* Recovery from crashes is simpler. Upon its next mount, the file system does not need to walk all its data structures to fix any inconsistencies, but can reconstruct its state from the last consistent point in the log.
 
Log-structured file systems, must reclaim free space from the tail of the log to prevent the file system from becoming full when the head of the log wraps around to meet it. The tail can release space and move forward by skipping over data where newer versions exist further ahead in the log. If there are no newer versions, then the data is moved and appended to the head.
To maximize write throughput, a log-structured file system treats the disk as a circular log and writes sequentially to the head of the log. This has the side effect of creating multiple, chronologically-advancing versions of both file data and meta-data. This log is undoable.
 
To reduce the overhead incurred by this [[garbage collection (computer science)|garbage collection]], most implementations avoid purely circular logs and divide up their storage into segments. The head of the log simply advances into non-adjacent segments which are already free. If space is needed, the least-full segments are reclaimed first. This decreases the I/O load (and decreases the [[write amplification]]) of the garbage collector, but becomes increasingly ineffective as the file system fills up and nears capacity.
Due to the seek and rotational latencies of disk access, write operations in traditional file systems are slow and do not fully utilize data bus capacities. With long sequential writes, log-structured file systems more fully utilize data bus capacities.
 
== Disadvantages ==
Such filesystems<ref name="rosenblum92">Rosenblum, Mendel and Ousterhout, John K. (February 1992) - "[http://www.hhhh.org/perseant/lfs/lfsSOSP91.ps.gz The Design and Implementation of a Log-Structured File System]". ''ACM Transactions on Computer Systems, Vol. 10 Issue 1''. pp26-52.</ref>:
The [[design rationale]] for log-structured file systems assumes that most reads will be optimized away by ever-enlarging memory caches. This assumption does not always hold:
* May allow access to old versions of files or the filesystem, a feature sometimes called time-travel or [[snapshot (computer storage)|snapshotting]].
* On magnetic media—where seeks are relatively expensive—the log structure may actually make reads much slower, since it [[fragmentation (computer)#External fragmentation|fragments]] files that conventional file systems normally keep contiguous with in-place writes.
* Recover quickly after crashes because consistency checks are needed only from the last consistent point in the log. (Called ''roll-forward'', this mechanism is explained on the [[Talk:Log-structured_file_system#Roll-backward_vs._Roll-forward|talk page]].)
* On flash memory—where seek times are usually negligible—the log structure may not confer a worthwhile performance gain because write fragmentation has much less of an impact on write throughput. Another issue is stacking one log on top of another log, which is not a very good idea as it forces multiple erases with unaligned access.<ref name="logonlog">{{citation|title=Don't stack your Log on my Log|url=https://www.usenix.org/system/files/conference/inflow14/inflow14-yang.pdf|publisher=SanDisk Corporation|year = 2014|first1 = Jingpei Yang|last1 =Swaminathan Sundararaman}}</ref> However, many flash based devices cannot rewrite part of a block, and they must first perform a (slow) erase cycle of each block before being able to re-write. By putting all the writes in one block, this can help performance as opposed to writes scattered into various blocks, and each one must be copied into a buffer, erased, and written back, which is a clear advantage for so-called "raw" flash memory where flash translation layer is bypassed.{{cn|date=June 2017}}
* Tend to have good write performance.
 
== ImplementationsSee also ==
* [[Comparison of file systems]]
* [[JFFS2]]List are bothof log-structured file systems.]]
 
== References ==
* [[John K. Ousterhout]] and [[Mendel Rosenblum]] implemented the first log-structured file system for the [[Sprite operating system]] in 1992.<ref name="rosenblum90">Rosenblum, Mendel and Ousterhout, John K. (June 1990) - "[http://www.hhhh.org/perseant/lfs/lfs-storage.ps.gz The LFS Storage Manager]". ''Proceedings of the 1990 Summer Usenix''. pp315-324.</ref>
{{reflist}}
* [[Log-structured File System|BSD-LFS]], an implementation by [[Margo Seltzer]] was added to 4.4BSD, and was later ported to [[386BSD]]. It lacks support for snapshots. It was removed from FreeBSD and OpenBSD, but still lives on in [[NetBSD]].
* [[Plan 9 from Bell Labs|Plan 9]]'s [[Fossil (file system)|Fossil]] file system is also log-structured and supports snapshotting.
* [[NILFS]] is a log-structured file system implementation for [[Linux]] by [[NTT/Verio]] which supports snapshots. It is still in [[development stage]] and not ready for production use.
* [[LogFS]] and [[LinLogFS]] are names used for various Linux log-structured file system implementations, the latest one written for [[Google Summer of Code 2005]], however all of these projects were cancelled.
* {{debated}}[[Reiser4]] is a log-structured file system, but it calls the concept "wandering logs". It has no support snapshotting, yet.
* [[ZFS]] from [[Sun Microsystems|Sun]] is a log-structured file system which supports snapshotting. However, it uses [[reference-counting]] instead of [[garbage collection (computer science)|garbage collection]] to manage free space.
* There is a log-structured file system in development at at Charles University, Prague. [http://aiya.ms.mff.cuni.cz/lfs/]It is available for recent versions of the Linux kernel. The major features of this project are a working garbage collector, snapshots and indexed directories.
 
== Further reading ==
Log-structured file systems have also been used on storage media like [[flash memory]] and [[CD-RW]] for entirely different reasons. These degrade slowly as they are written to and have a limited number of erase/write cycles:
* [http://pages.cs.wisc.edu/~remzi/OSTEP/file-lfs.pdf Log-structured File Systems (2014), Arpaci-Dusseau, Remzi H.; Arpaci-Dusseau, Andrea C.; Arpaci-Dusseau Books]
* [[Universal Disk Format|UDF]], and
* [[JFFS2]] are both log-structured file systems.
Compared to conventional file systems, these file systems use fewer in-place writes, improving [[wear levelling]] and prolonging the life of the device.
 
{{File systems}}
== Disadvantages of a log-structured file system ==
* It consists of a sequential log of different data files chained together. When you delete any file you will create holes in the log and to access this space later we need to have indirect addressing in place which adds overhead. This is solved in modern log-structured file systems by using a garbage collector that defragments the log.
* It is mainly aimed at improving the performance of write operation. On the other hand, read operations will get some added overhead. This was taken into account by [[John K. Ousterhout]] in the design, his reasoning being that modern computers have so much memory that most file system read operations can be served from cache. <!-- Can anyone make sense of this?: Since they [read operations] will require constructing the actual data chunk from the log even if you have access to high speed caches. -->
 
== References ==
<references />
 
[[Category:Computer file systems]]
[[Category:Computer storage]]
[[Category:Unix]]
[[Category:Linux]]
[[Category:Sun Microsystems]]
[[Category:Bell Labs]]
[[Category:Fault-tolerant computer systems]]