Copy-on-write: Difference between revisions

Content deleted Content added
Rm old uncited.
Update storage description to make clear entire file is not copied when modified.
Tags: Visual edit Mobile edit Mobile web edit
 
(35 intermediate revisions by 18 users not shown)
Line 1:
{{Short description|Programming technique for efficiently duplicating data}}
{{MoreUse citationsdmy neededdates|date=AugustNovember 20202023}}
 
'''Copy-on-write''' ('''COW'''), sometimesalso referred to ascalled '''implicit sharing'''<ref>{{cite web |title= Implicit Sharing |url= httphttps://doc.qt.io/qt-5/implicit-sharing.html |website= Qt Project |access-date=10 4November August2023 |archive-date=8 February 2024 |archive-url=https://web.archive.org/web/20240208175543/https://doc.qt.io/qt-5/implicit-sharing.html |url-status=live 2016}}</ref> or '''shadowing''',<ref>{{cite journal |last= Rodeh |first=Ohad Ohad|title= B-Trees, Shadowing, and Clones |journal= ACM Transactions on Storage |datevolume=3 |issue=4 |date=1 February 2008 |volumepage=1 3|issueciteseerx=10.1.1.161.6863 4|pages2cid=207166167 1|doi= 10.1145/1326542.1326544 |url=http://liw.fi/larch/ohad-btrees-shadowing-clones.pdf |archive-url=https://web.archive.org/web/20170102212904/http://liw.fi/larch/ohad-btrees-shadowing-clones.pdf |accessarchive-date=2 4January August2017 2016|citeseerxaccess-date=10 10.1.1.161.6863|s2cid=November 2023 207166167}}</ref> is a [[Resource management (computing)|resource-management technique used in [[computer programming]] to efficiently implement a "duplicate" or "copy" operation on modifiable resources.technique<ref name="Linux">{{cite book |last1=Bovet |first1=Daniel Pierre |last2=Cesati |first2=Marco |date=1 January 2002 |title=Understanding the Linux Kernel |url=https://books.google.com/books?id=9yIEji1UheIC&q=%22copy%20on%20write%22&pg=PA295 |last1publisher=BovetO'Reilly |first1=Daniel PierreMedia |last2isbn=Cesati9780596002138 |first2page=Marco295 |access-date=2002-01-0110 November 2023 |publisherarchive-date=O'Reilly15 MediaSeptember 2024 |isbnarchive-url=9780596002138https://web.archive.org/web/20240915132745/https://books.google.com/books?id=9yIEji1UheIC&q=%22copy%20on%20write%22&pg=PA295#v=snippet&q=%22copy%20on%20write%22&f=false |pageurl-status=295live }}</ref> Ifused ain resource[[computer isprogramming|programming]] duplicatedto butmanage notshared modified,data itefficiently. isInstead notof necessarycopying todata createright aaway newwhen resource;multiple programs use it, the resourcesame candata beis shared between theprograms copyuntil andone thetries original.to Modificationsmodify mustit. stillIf createno achanges copyare made, henceno the technique: theprivate copy operation is deferredcreated, untilsaving the[[System firstresource#General writeresources|resources]].<ref Byname="Linux" sharing/> resourcesA incopy thisis way,only itmade iswhen possibleneeded, toensuring significantlyeach reduceprogram thehas resourceits consumptionown ofversion unmodifiedwhen copies,modifications whileoccur. addingThis atechnique smallis overheadcommonly applied to resource-modifyingmemory, files, and data operationsstructures.
 
==In virtual memory management==
Copy-on-write finds its main use in [[operating system]]s, sharing the [[virtualphysical memory]] of [[operatingcomputers system]]running multiple [[computerProcess process(computing)|processprocesses]]es, in the implementation of the [[Fork (system call)|fork() system call]]. Typically, the new process does not modify any memory and immediately executes a new process, replacing the address space entirely. Thus, itIt would bewaste wastefulprocessor time and memory to copy all of the old process's memory during athe fork, andonly insteadto immediately discard the copy-on-write.<ref>{{Cite techniquebook is|last=Silberschatz used|first=Abraham |title=Operating System Concepts |last2=Galvin, Peter B. |last3=Gagne |first3=Greg |publisher=Wiley |year=2018 |isbn=978-1119456339 |edition=10th |pages=120–123}}</ref>
 
Copy-on-write can be implemented efficiently using the [[page table]] by marking certain pages of [[computerComputer storage|memory]] as read-only and keeping a count of the number of references to the page. When data is written to these pages, the operating-system [[kernelKernel (operating system)|kernel]] intercepts the write attempt and allocates a new physical page, initialized with the copy-on-write data, although the allocation can be skipped if there is only one reference. The kernel then updates the page table with the new (writable) page, decrements the number of references, and performs the write. The new allocation ensures that a change in the memory of one process is not visible in another's.{{Citation needed|date=November 2023}}
 
The copy-on-write technique can be extended to support efficient [[memory allocation]] by havingkeeping aone page of [[physical memory]] filled with zeros. When the memory is allocated, all the pages returned refer to the page of zeros and are all marked copy-on-write. This way, physical memory is not allocated for the process until data is written, allowing processes to reserve more virtual memory than physical memory and use memory sparsely, at the risk of running out of virtual address space. The combined algorithm is similar to [[demand paging]].<ref name="Linux" />
 
Copy-on-write pages are also used in the [[Linux kernel]]'s [[kernelKernel same-page merging|same-page merging]] feature.<ref>{{citeCite web |last1last=Abbas |first1first=Ali |title=The Kernel Samepage Merging Process |url=http://alouche.net/blog/2011/07/18/the-kernel-samepage-merging-process/ |website=alouche.net|accessurl-datestatus=4usurped August 2016|archive-url=https://web.archive.org/web/20160808174912/http://alouche.net/blog/2011/07/18/the-kernel-samepage-merging-process/ |archive-date=8 August 2016 |urlaccess-statusdate=dead10 November 2023 |website=alouche.net}}</ref>
 
Loading the libraries for an application is also a use of copy-on-write technique. The dynamic linker maps libraries as private like follows. Any writing action on the libraries will trigger a COW in virtual memory management.
<syntaxhighlight lang="c">
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 3906144, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0)
mmap(0x7f8a3ced4000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b0000)
</syntaxhighlight>
 
==In software==
{{expandExpand section|date=October 2017}}
COW is also used in [[Library (computer science)|library]], [[Application software|application]] and [[System software|system]] code.
 
COW is also used in [[Library (computer science)|library]], [[Application software|application]], and [[System software|system]] code.
In [[Multithreading (computer architecture)|multithreaded]] systems, COW can be implemented without the use of traditional [[Lock (software engineering)|locking]] and instead use [[compare-and-swap]] to increment or decrement the internal reference counter. Since the original resource will never be altered, it can safely be copied by multiple threads (after the reference count was increased) without the need of performance-expensive locking such as [[Lock (computer science)|mutexes]]. If the reference counter turns 0, then by definition only 1 thread was holding a reference so the resource can safely be de-allocated from memory, again without the use of performance-expensive locking mechanisms. The benefit of not having to copy the resource (and the resulting performance gain over traditional deep-copying) will therefore be valid in both single- and multithreaded systems.
 
===Examples===
The [[String (C++)|string]] class provided by the [[C++ standard library]] was specifically designed to allow copy-on-write implementations in the initial C++98 standard,<ref name="meyers">{{citationcite book |first=Scott |last=Meyers |author-link=Scott Meyers |yeardate=2012 |title=Effective STL |publisher=Addison-Wesley |pages=64–65 |url=https://books.google.com/books?id=U7lTySXdFk0C&pg=PT734|isbn=9780132979184 }}</ref> but not in the newer C++11 standard:<ref>{{cite web |title=Concurrency Modifications to Basic String |url=httphttps://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2534.html |website=Open Standards |access-date=1310 FebruaryNovember 2023 |archive-date=10 November 2023 |archive-url=https://web.archive.org/web/20231110024434/https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2534.html |url-status=live 2015}}</ref>
<syntaxhighlight lang="cpp">
std::string x("Hello");
 
std::string y = x; // x and y use the same buffer.
 
y += ", World!"; // nowNow y uses a different buffer; x still uses the same old buffer.
// x still uses the same old buffer
</syntaxhighlight>
In the [[PHP]] programming language, all types except references are implemented as copy-on-write. For example, strings and arrays are passed by reference, but when modified, they are duplicated if they have non-zero reference counts. This allows them to act as value types without the performance problems of copying on assignment or making them immutable.<ref>{{cite web|last1=Pauli|first1=Julien|last2=Ferrara|first2=Anthony|last3=Popov|first3=Nikita|title=Memory management|url=http://www.phpinternalsbook.com/zvals/memory_management.html#reference-counting-and-copy-on-write|website=www.phpinternalsbook.com|publisher=PHP Internals Book|access-date=4 August 2016|date=2013}}</ref>
 
In the [[PHP]] programming language, all types except references are implemented as copy-on-write. For example, strings and arrays are passed by reference, but when modified, they are duplicated if they have non-zero reference counts. This allows them to act as value types without the performance problems of copying on assignment or making them immutable.<ref>{{cite web |last1=Pauli |first1=Julien |last2=Ferrara |first2=Anthony |last3=Popov |first3=Nikita |title=Memory management |date=2013 |url=httphttps://www.phpinternalsbook.com/php5/zvals/memory_management.html#reference-counting-and-copy-on-write |website=www.phpinternalsbookPhpInternalsBook.com |publisheraccess-date=PHP10 November Internals2023 Book|accessarchive-date=410 AugustNovember 2023 2016|datearchive-url=2013https://web.archive.org/web/20231110024435/https://www.phpinternalsbook.com/php5/zvals/memory_management.html#reference-counting-and-copy-on-write |url-status=live }}</ref>
In the [[Qt (software)|Qt]] framework, many types are copy-on-write ("implicitly shared" in Qt's terms). Qt uses atomic [[compare-and-swap]] operations to increment or decrement the internal reference counter. Since the copies are cheap, Qt types can often be safely used by multiple threads without the need of locking mechanisms such as [[Mutual exclusion|mutexes]]. The benefits of COW are thus valid in both single- and multithreaded systems.<ref>{{cite web|title=Threads and Implicitly Shared Classes|url=http://doc.qt.io/qt-5/threads-modules.html#threads-and-implicitly-shared-classes|website=Qt Project|access-date=4 August 2016}}</ref>
 
In the [[Qt (software)|Qt]] framework, many types are copy-on-write ("implicitly shared" in Qt's terms). Qt uses atomic [[compare-and-swap]] operations to increment or decrement the internal reference counter. Since the copies are cheap, Qt types can often be safely used by [[Multithreading (computer architecture)|multiple threads]] without the need of [[Lock (computer science)|locking mechanisms]] such as [[Mutual exclusion|mutexes]]. The benefits of COW are thus valid in both single- and multithreaded systems.<ref>{{cite web |title=Threads and Implicitly Shared Classes |website=Qt Project |url=httphttps://doc.qt.io/qt-5/threads-modules.html#threads-and-implicitly-shared-classes |websiteaccess-date=Qt10 November 2023 Project|accessarchive-date=43 AugustDecember 2023 |archive-url=https://web.archive.org/web/20231203002914/https://doc.qt.io/QT-5/threads-modules.html#threads-and-implicitly-shared-classes |url-status=live 2016}}</ref>
 
==In computer storage==
COW is used as the underlying mechanism in file systems like [[ZFS]], [[Btrfs]],<ref>{{cite web |last=Kasampalis |first=Sakis |date=2010 |title=Copy-on-Write Based File Systems Performance Analysis and Implementation |page=19 |url=https://sakisk.me/files/copy-on-write-based-file-systems.pdf |access-date=10 November 2023 |archive-date=5 May 2024 |archive-url=https://web.archive.org/web/20240505220510/https://sakisk.me/files/copy-on-write-based-file-systems.pdf |url-status=live }}</ref> [[ReFS]], and [[Bcachefs]], as well as in [[logical volume management]] and database servers such as [[Microsoft SQL Server#Replication Services|Microsoft SQL Server]].
COW may also be used as the underlying mechanism for [[Snapshot (computer storage)|snapshots]], such as those provided by [[logical volume management]], file systems such as [[Btrfs]] and [[ZFS]],<ref>{{cite web|url=http://sakisk.me/files/copy-on-write-based-file-systems.pdf|title=Copy On Write Based File Systems Performance Analysis And Implementation|last=Kasampalis|first=Sakis|year=2010|page=19|access-date=11 January 2013}}</ref> and database servers such as [[Microsoft SQL Server#Replication Services|Microsoft SQL Server]]. Typically, the snapshots store only the modified data, and are stored close to the main array, so they are only a weak form of [[incremental backup]] and cannot substitute for a [[full backup]].<ref>{{cite web |last=Chien |first=Tim |title=Snapshots Are NOT Backups |url=http://www.oracle.com/technetwork/documentation/rman-fra-snapshot-322251.html |website=Oracle.com |publisher=Oracle |access-date=4 August 2016 }}</ref> Some systems also use a COW technique to avoid the [[fuzzy backup]]s, otherwise incurred when any file in the set of files being backed up changes during that backup.
 
In traditional file systems, modifying a file overwrites the original data blocks in place. In a copy-on-write (COW) file system, the original blocks remain unchanged. When part of a file is modified, only the affected blocks are written to new locations, and metadata is updated to point to them, preserving the original version until it’s no longer needed. This approach enables features like [[Snapshot (computer storage)|snapshots]], which capture the state of a file at a specific time without consuming much additional space. Snapshots typically store only the modified data and are kept close to the original. However, they are considered a weak form of [[incremental backup]] and cannot replace a full backup.<ref>{{cite web |last=Chien |first=Tim |title=Snapshots Are NOT Backups |url=https://www.oracle.com/database/technologies/rman-fra-snapshot.html |website=Oracle.com |publisher=Oracle |access-date=10 November 2023 |archive-date=10 November 2023 |archive-url=https://web.archive.org/web/20231110024434/https://www.oracle.com/database/technologies/rman-fra-snapshot.html |url-status=live }}</ref>
When implementing snapshots, there are two techniques:
* Redirect-on-write or ROW: the original storage is never modified. When a write request is made, it is redirected away from the original data into a new storage area.
* Copy-on-write or COW: when a write request is made, the data are copied into a new storage area, and then the original data are modified.
 
Despite their names, copy-on-write usually refers to the first technique. COW does two data writes compared to ROW's one; it is difficult to implement efficiently and thus used infrequently.
 
==See also==
* [[Allocate-on-flush]]
* [[Dirty COW]] – Aa computer security vulnerability for the Linux operating system kernel
* [[Btrfs]]
* [[Demand paging]]
* [[Dirty COW]] – A computer security vulnerability for the Linux operating system kernel
* [[Flyweight pattern]]
* [[Memory management]]
* [[Persistent data structure]]
* [[ReFS]]
* [[Snapshot (computer storage)]]
* [[Virtual memory]]
* [[Wear leveling]]
 
==References==
{{Reflist}}
 
==External links==
* {{cite web |title=A history of copy-on-write memory management |url=https://obvious.services.net/2011/01/history-of-copy-on-write-memory.html |website=A keen grasp of the obvious |access-date=18 November 2024 |date=21 January 2011}}
 
{{File systems}}
Line 71 ⟶ 57:
[[Category:Software optimization]]
[[Category:Virtual memory]]
[[Category:Virtualization software]]