Log-structured file system: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 18:29, 15 June 2009 edit SmackBot (talk \| contribs) 3,734,324 edits m Date maintenance tags and general fixes ← Previous edit		Latest revision as of 21:50, 25 July 2025 edit undo Live4momentz (talk \| contribs) 132 edits Minor grammar fixes Tag: Visual edit
(75 intermediate revisions by 56 users not shown)
Line 1: {{~~Otheruses4~~Short description\|Structure of file system that writes all information to a circular buffer}} {{about\|the general concept of log-structured file systems\|the NetBSD file system\|Log-structured File System (BSD)\|the Linux log-structured Flash file system\|LogFS}} A '''log-structured filesystem''' is a [[file system]] in which data and metadata are written sequentially to a [[circular buffer]], called a [[log file\|log]]. The design was first proposed ~~in 1988~~ by [[John K. Ousterhout]] and [[Fred Douglis~~]].~~ ~~Designed~~in ~~for~~1988 ~~high~~and ~~write~~first ~~throughput,~~implemented ~~all~~in ~~updates~~1992 toby ~~data~~Ousterhout and ~~metadata~~[[Mendel ~~are~~Rosenblum]] ~~written~~for ~~sequentially~~the toUnix-like a[[Sprite ~~continuous~~(operating ~~stream,~~system)\|Sprite]] ~~called~~distributed aoperating ~~log~~system.<ref name="rblub">{{citation\|title=The ~~design~~Design ~~was~~and ~~first~~Implementation ~~implemented~~of bya ~~Ousterhout~~Log-Structured ~~and~~File System\|url=https://people.eecs.berkeley.edu/~brewer/cs262/LFS.pdf\|publisher=University of California, Berkeley\|year = 1991\|first1 = [[Mendel Rosenblum]].\|last1 =John K. Ousterhout}}</ref> == Rationale == {{More citations needed\|date=April 2025}} Conventional file systems ~~tend to~~ lay out files with great care for spatial locality and make in-place changes to their data structures in order to perform well on optical and magnetic disks, which tend to seek relatively slowly.▼ The design of log-structured file systems is based on the hypothesis that this will no longer be effective because ever-increasing memory sizes on modern computers would lead to I/O becoming write-heavy ~~because~~since reads would be almost always satisfied from memory cache. A log-structured file system thus treats its storage as a [[Circular buffer\|circular log]] and writes sequentially to the head of the log. ~~This maximizes write throughput on magnetic media by avoiding costly seeks.~~ ▼ ▲Conventional file systems tend to lay out files with great care for spatial locality and make in-place changes to their data structures in order to perform well on magnetic disks, which tend to seek relatively slowly. ~~Keeping a log~~This has several important side effects:▼ ▲The design of log-structured file systems is based on the hypothesis that this will no longer be effective because ever-increasing memory sizes on modern computers would lead to I/O becoming write-heavy because reads would be almost always satisfied from memory cache. A log-structured file system thus treats its storage as a [[circular log]] and writes sequentially to the head of the log. This maximizes write throughput on magnetic media by avoiding costly seeks. * Write throughput on optical and magnetic disks is improved because they can be batched into large sequential runs and costly seeks are kept to a minimum. ** The structure is naturally suited to media with [[append-only]] zones or pages such as [[flash storage]]s and [[shingled magnetic recording]] HDDs<ref name=dbsmr>{{Cite web\|url=https://dropbox.tech/infrastructure/extending-magic-pocket-innovation-with-the-first-petabyte-scale-smr-drive-deployment\|title=Extending Magic Pocket Innovation with the first petabyte scale SMR drive deployment\|author=Magic Pocket Hardware Engineering Teams\|website=dropbox.tech}}</ref><ref name=flash>{{cite journal \|last1=Reid \|first1=Colin \|last2=Bernstein \|first2=Phil \|title=Implementing an Append-Only Interface for Semiconductor Storage \|journal=IEEE Data Eng. Bull. \|date=1 January 2010 \|volume=33 \|page=14-20 \|url=http://sites.computer.org/debull/A10dec/hyder.pdf}}</ref> * Writes create multiple, chronologically-advancing versions of both file data and meta-data. Some implementations make these old file versions nameable and accessible, a feature sometimes called time-travel or [[snapshot (computer storage)\|snapshotting]]. This is very similar to a [[versioning file system]].▼ * Recovery from crashes is simpler. Upon its next mount, the file system does not need to walk all its data structures to fix any inconsistencies, but can reconstruct its state from the last consistent point in the log.▼ Log-structured file systems, must reclaim free space from the tail of the log to prevent the file system from becoming full when the head of the log wraps around to meet it. The tail can release space and move forward by skipping over data where newer versions exist further ahead in the log. If there are no newer versions, then the data is moved and appended to the head. ▲Keeping a log has several important side effects: ▲* Writes create multiple, chronologically-advancing versions of both file data and meta-data. Some implementations make these old file versions nameable and accessible, a feature sometimes called time-travel or [[snapshot (computer storage)\|snapshotting]]. This is very similar to a [[versioning file system]]. ▲* Recovery from crashes is simpler. Upon its next mount, the file system does not need to walk all its data structures to fix any inconsistencies, but can reconstruct its state from the last consistent point in the log. * Free space must be constantly reclaimed from the tail of the log to prevent the file system from becoming full when the head of the log wraps around to meet it. The tail itself can skip forward over data for which newer versions exist farther ahead in the log; the remainder is simply moved out of the way by appending it back to the head. To minimize the overhead incurred by this [[garbage collection (computer science)\|garbage collection]], most implementations avoid purely circular logs and divide up their storage into segments or extents. The head of the log can then advance onto non-adjacent segments which are already free or which are less full than at the tail, thus reducing the amount of garbage collection I/O needed to reclaim space. To reduce the overhead incurred by this [[garbage collection (computer science)\|garbage collection]], most implementations avoid purely circular logs and divide up their storage into segments. The head of the log simply advances into non-adjacent segments which are already free. If space is needed, the least-full segments are reclaimed first. This decreases the I/O load (and decreases the [[write amplification]]) of the garbage collector, but becomes increasingly ineffective as the file system fills up and nears capacity. ~~== Implementations ==~~ * [[John K. Ousterhout]] and [[Mendel Rosenblum]] implemented the first log-structured file system for the [[Sprite operating system]] in 1992.<ref name="rosenblum90">Rosenblum, Mendel and Ousterhout, John K. (June 1990) - "[http://citeseer.ist.psu.edu/rosenblum90lfs.html The LFS Storage Manager]". ''Proceedings of the 1990 Summer Usenix''. pp315-324.</ref><ref name="rosenblum92">Rosenblum, Mendel and Ousterhout, John K. (February 1992) - "[http://citeseer.ist.psu.edu/rosenblum91design.html The Design and Implementation of a Log-Structured File System]". ''ACM Transactions on Computer Systems, Vol. 10 Issue 1''. pp26-52.</ref> * [[Log-structured File System (BSD)\|BSD-LFS]], an implementation by [[Margo Seltzer]] was added to 4.4BSD, and was later ported to [[386BSD]]. It lacked support for snapshots. It was removed from FreeBSD and OpenBSD, but still lives on in [[NetBSD]]. * [[Plan 9 from Bell Labs\|Plan 9]]'s [[Fossil (file system)\|Fossil]] file system is also log-structured and supports snapshots. * [[NILFS]] is a log-structured file system implementation for [[Linux]] by [[NTT/Verio]] which supports snapshots. * [[LinLogFS]] (formerly dtfs) and LFS ([http://logfs.sourceforge.net/ http://logfs.sourceforge.net/]) are log-structured file system implementations for Linux. The latter was part of [[Google Summer of Code\|Google Summer of Code 2005]]. Both projects have been abandoned. * [http://aiya.ms.mff.cuni.cz/lfs LFS] is another log-structured file system for Linux developed by Charles University, Prague. It was to include support for snapshots and indexed directories, but development has since ceased. * [[LogFS]] is a scalable flash filesystem for [[Linux]], intended to replace [[JFFS2]] (in development). * [[UBIFS]] for [[Linux]], also intended to replace [[JFFS2]] (in development). * [[ULFS]] is a User-Level Log-structured File System (http://ulfs.sf.net) Using FUSE (http://fuse.sf.net). * [[ZFS]] is a log-structured file system available on Solaris, Mac OS X, and FreeBSD.{{Fact\|date=June 2009}}{{Dubious\|date=June 2009}} Some kinds of storage media, such as [[flash memory]] and [[CD-RW]], slowly degrade as they are written to and have a limited number of erase/write cycles at any one ___location. Log-structured file systems are sometimes used on these media because they make fewer in-place writes and thus prolong the life of the device by [[wear levelling]]. The more common such file systems include: * [[Universal Disk Format\|UDF]] is a file system commonly used on [[optical disc]]s. * [[JFFS]] and its successor [[JFFS2]] are simple [[Linux]] file systems intended for flash-based devices. * [[YAFFS]] is a NAND flash-specific file system for many operating systems (including Linux). == Disadvantages == * The [[design rationale]] for log-structured file systems assumes that most reads will be optimized away by ever-enlarging memory caches. This assumption does not always hold:▼ On magnetic media—where seeks are relatively expensive—the log structure may actually make reads much slower, since it [[fragmentation (computer)#~~External_fragmentation~~External fragmentation\|fragments]] files that conventional file systems normally keep contiguous with in-place writes.▼ On flash memory—where seek times are usually negligible—the log structure may not confer a worthwhile performance gain because write fragmentation has much less of an impact on write throughput. Another issue is stacking one log on top of another log, which is not a very good idea as it forces multiple erases with unaligned access.<ref name="logonlog">{{~~Fact~~citation\|~~date~~title=~~November~~Don't ~~2008~~stack your Log on my Log\|url=https://www.usenix.org/system/files/conference/inflow14/inflow14-yang.pdf\|publisher=SanDisk Corporation\|year = 2014\|first1 = Jingpei Yang\|last1 =Swaminathan Sundararaman}}.</ref> However, many flash based devices ~~can~~cannot ~~only~~rewrite ~~write~~part of a ~~complete~~ block, ~~at a time because~~and they must first perform a (slow) erase cycle of each block before being able to re-write,. ~~so by~~By putting all the writes in one block, this can help performance vsas opposed to writes scattered into various blocks, and each one ~~of which~~ must be copied into a buffer, erased, and written back, which is a clear advantage for so-called "raw" flash memory where flash translation layer is bypassed.{{cn\|date=June 2017}}▼ == See also ==▼ ▲* The [[design rationale]] for log-structured file systems assumes that most reads will be optimized away by ever-enlarging memory caches. This assumption does not always hold: * [[Comparison of file systems]]▼ ▲** On magnetic media—where seeks are relatively expensive—the log structure may actually make reads much slower, since it [[fragmentation (computer)#External_fragmentation\|fragments]] files that conventional file systems normally keep contiguous with in-place writes. * [[~~it:Log~~List of log-structured ~~File~~file ~~System~~systems]]▼ ▲** On flash memory—where seek times are usually negligible—the log structure may not confer a worthwhile performance gain because write fragmentation has much less of an impact on write throughput{{Fact\|date=November 2008}}. However many flash based devices can only write a complete block at a time because they must first perform a (slow) erase cycle before being able to write, so by putting all the writes in one block this can help performance vs writes scattered into various blocks, each one of which must be copied into a buffer, erased, and written back. == References == {{reflist}} == Further reading == ~~<references />~~ * [http://pages.cs.wisc.edu/~remzi/OSTEP/file-lfs.pdf Log-structured File Systems (2014), Arpaci-Dusseau, Remzi H.; Arpaci-Dusseau, Andrea C.; Arpaci-Dusseau Books] {{File systems}} ▲== See also == ▲* [[Comparison of file systems]] [[Category:Computer file systems]] ~~[[Category:Computer storage]]~~ ~~[[Category:Unix]]~~ [[Category:Bell Labs]] [[Category:Fault-tolerant computer systems]] ▲[[it:Log-structured File System]]