Copy-on-write: Difference between revisions

Content deleted Content added
Tags: Reverted references removed
Update storage description to make clear entire file is not copied when modified.
Tags: Visual edit Mobile edit Mobile web edit
 
(2 intermediate revisions by 2 users not shown)
Line 4:
'''Copy-on-write''' ('''COW'''), also called '''implicit sharing'''<ref>{{cite web |title=Implicit Sharing |url=https://doc.qt.io/qt-5/implicit-sharing.html |website=Qt Project |access-date=10 November 2023 |archive-date=8 February 2024 |archive-url=https://web.archive.org/web/20240208175543/https://doc.qt.io/qt-5/implicit-sharing.html |url-status=live }}</ref> or '''shadowing''',<ref>{{cite journal |last=Rodeh |first=Ohad |title=B-Trees, Shadowing, and Clones |journal=ACM Transactions on Storage |volume=3 |issue=4 |date=1 February 2008 |page=1 |citeseerx=10.1.1.161.6863 |s2cid=207166167 |doi=10.1145/1326542.1326544 |url=http://liw.fi/larch/ohad-btrees-shadowing-clones.pdf |archive-url=https://web.archive.org/web/20170102212904/http://liw.fi/larch/ohad-btrees-shadowing-clones.pdf |archive-date=2 January 2017 |access-date=10 November 2023 }}</ref> is a [[Resource management (computing)|resource-management]] technique<ref name="Linux">{{cite book |last1=Bovet |first1=Daniel Pierre |last2=Cesati |first2=Marco |date=1 January 2002 |title=Understanding the Linux Kernel |url=https://books.google.com/books?id=9yIEji1UheIC&q=%22copy%20on%20write%22&pg=PA295 |publisher=O'Reilly Media |isbn=9780596002138 |page=295 |access-date=10 November 2023 |archive-date=15 September 2024 |archive-url=https://web.archive.org/web/20240915132745/https://books.google.com/books?id=9yIEji1UheIC&q=%22copy%20on%20write%22&pg=PA295#v=snippet&q=%22copy%20on%20write%22&f=false |url-status=live }}</ref> used in [[computer programming|programming]] to manage shared data efficiently. Instead of copying data right away when multiple programs use it, the same data is shared between programs until one tries to modify it. If no changes are made, no private copy is created, saving [[System resource#General resources|resources]].<ref name="Linux" /> A copy is only made when needed, ensuring each program has its own version when modifications occur. This technique is commonly applied to memory, files, and data structures.
 
==In virtual memory management==
46JRTSVB Y
Copy-on-write finds its main use in [[operating system]]s, sharing the [[physical memory]] of computers running multiple [[Process (computing)|processes]], in the implementation of the [[Fork (system call)|fork() system call]]. Typically, the new process does not modify any memory and immediately executes a new process, replacing the address space entirely. It would waste processor time and memory to copy all of the old process's memory during the fork only to immediately discard the copy.<ref>{{Cite book |last=Silberschatz |first=Abraham |title=Operating System Concepts |last2=Galvin, Peter B. |last3=Gagne |first3=Greg |publisher=Wiley |year=2018 |isbn=978-1119456339 |edition=10th |pages=120–123}}</ref>
 
Copy-on-write can be implemented efficiently using the [[page table]] by marking certain pages of [[Computer storage|memory]] as read-only and keeping a count of the number of references to the page. When data is written to these pages, the operating-system [[Kernel (operating system)|kernel]] intercepts the write attempt and allocates a new physical page, initialized with the copy-on-write data, although the allocation can be skipped if there is only one reference. The kernel then updates the page table with the new (writable) page, decrements the number of references, and performs the write. The new allocation ensures that a change in the memory of one process is not visible in another's.{{Citation needed|date=November 2023}}
 
The copy-on-write technique can be extended to support efficient [[memory allocation]] by keeping one page of [[physical memory]] filled with zeros. When the memory is allocated, all the pages returned refer to the page of zeros and are all marked copy-on-write. This way, physical memory is not allocated for the process until data is written, allowing processes to reserve more virtual memory than physical memory and use memory sparsely, at the risk of running out of virtual address space. The combined algorithm is similar to [[demand paging]].<ref name="Linux" />
 
Copy-on-write pages are also used in the [[Linux kernel]]'s [[Kernel same-page merging|same-page merging]] feature.<ref>{{Cite web |last=Abbas |first=Ali |title=The Kernel Samepage Merging Process |url=http://alouche.net/blog/2011/07/18/the-kernel-samepage-merging-process/ |url-status=usurped |archive-url=https://web.archive.org/web/20160808174912/http://alouche.net/blog/2011/07/18/the-kernel-samepage-merging-process/ |archive-date=8 August 2016 |access-date=10 November 2023 |website=alouche.net}}</ref>
 
==In software==
Line 26 ⟶ 33:
 
==In computer storage==
COW is used as the underlying mechanism in file systems like [[ZFS]], [[Btrfs]], [[ReFS]], and [[Bcachefs]],<ref>{{cite web |last=Kasampalis |first=Sakis |date=2010 |title=Copy-on-Write Based File Systems Performance Analysis and Implementation |page=19 |url=https://sakisk.me/files/copy-on-write-based-file-systems.pdf |access-date=10 November 2023 |archive-date=5 May 2024 |archive-url=https://web.archive.org/web/20240505220510/https://sakisk.me/files/copy-on-write-based-file-systems.pdf |url-status=live }}</ref> [[ReFS]], and [[Bcachefs]], as well as in [[logical volume management]] and database servers such as [[Microsoft SQL Server#Replication Services|Microsoft SQL Server]].
 
In traditional file systems, filemodifying changesa file overwriteoverwrites the original data blocks in place. WithIn a copy-on-write (COW) file system, whenthe changesoriginal areblocks made,remain aunchanged. newWhen versionpart of thea file is createdmodified, whileonly keepingthe affected blocks are written to new locations, and metadata is updated to point to them, preserving the original intactversion until it’s no longer needed. This approach enables features like [[Snapshot (computer storage)|snapshots]], which capture the state of a file at a specific time without consuming much additional space. Snapshots typically store only the modified data and are kept close to the original. However, they are considered a weak form of [[incremental backup]] and cannot replace a full backup.<ref>{{cite web |last=Chien |first=Tim |title=Snapshots Are NOT Backups |url=https://www.oracle.com/database/technologies/rman-fra-snapshot.html |website=Oracle.com |publisher=Oracle |access-date=10 November 2023 |archive-date=10 November 2023 |archive-url=https://web.archive.org/web/20231110024434/https://www.oracle.com/database/technologies/rman-fra-snapshot.html |url-status=live }}</ref>
 
==See also==