Distributed version control: Difference between revisions

Content deleted Content added
m Made first appearance of "Subversion" a link, changed second appearance to normal text.
m Reverted edits by 185.181.109.220 (talk) (AV)
 
(339 intermediate revisions by more than 100 users not shown)
Line 1:
{{Short description|Software engineering tool}}
{{nofootnotes|date=July 2008}}
[[File:Git session.svg|thumb|The process of initializing a Git repository. Git is one of the most widely used distributed version control systems.]]
{{Cleanup|date=November 2007}}
In [[software development]], '''distributed version control''' (also known as '''distributed revision control''') is a form of [[version control]] in which the complete [[codebase]], including its full history, is mirrored on every developer's computer.<ref name="git-scm">{{cite book | chapter = About version control | chapter-url = https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control | title = Pro Git | first1 = Scott | last1 = Chacon | first2 = Ben | last2 = Straub | edition = 2nd | date = 2014 | publisher = Apress | at = Chapter 1.1 | access-date = 4 June 2019}}</ref> Compared to '''centralized version control''', this enables automatic management of [[Branching (version control)|branching]] and [[Merge (version control)|merging]], speeds up most operations (except pushing and fetching), improves the ability to work offline, and does not rely on a single location for backups.<ref name="git-scm"/><ref name="Joel 2010" /><ref>{{cite web|title=Intro to Distributed Version Control (Illustrated)|url=https://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/|website=www.betterexplained.com|access-date=7 January 2018}}</ref> [[Git (software)|Git]], the world's most popular version control system,<ref name=":1" /> is a distributed version control system.
{{Cleanup-laundry|date=January 2008}}
{{merge|revision control#Distributed revision control|Talk:Distributed revision control#Merger proposal|date=January 2008}}
'''Distributed revision control''' (or '''Distributed Version Control (Systems) (DVCS)''', or '''Decentralized Version Control''') is a fairly recent innovation in [[Computer software|software]] [[revision control]]. It provides some significant advantages over the more traditional centralized approach to revision control, and it has some defining characteristics that separate it from centralized systems. However, the line between distributed and centralized systems is graying in some regards, especially since DVCSs can be used in a "centralized mode".
 
In 2010, software development author [[Joel Spolsky]] described distributed version control systems as "possibly the biggest advance in software development technology in the [past] ten years".<ref name="Joel 2010">{{cite web
| url=http://joelonsoftware.com/items/2010/03/17.html
| first=Joel
| last=Spolsky
| title=Distributed Version Control Is Here to Stay, Baby
| work=Joel on Software
| date=17 March 2010
| access-date=4 June 2019}}</ref>
==Vs Centralised==
{{Original research|date=January 2008}}
 
==Distributed vs. centralized==
Comparisons are often made between
Distributed version control systems (DVCS) use a [[peer-to-peer]] approach to [[version control]], as opposed to the [[client–server model|client–server]] approach of centralized systems. Distributed revision control synchronizes repositories by transferring [[Patch (Unix)|patches]] from peer to peer. There is no single central version of the codebase; instead, each user has a working copy and the full change history.
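The peer-to-peer model described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how any real DVCS is implemented: the <code>Repo</code> class and its methods are hypothetical names, commits carry only a message rather than file contents, and whole commit records are transferred instead of patches.

```python
import hashlib


class Repo:
    """Toy repository (hypothetical): a commit graph plus a single branch head."""

    def __init__(self):
        self.commits = {}   # commit id -> (tuple of parent ids, message)
        self.head = None    # id of the newest commit on the only branch

    def _store(self, parents, message):
        # Identify a commit by a hash of its parents and message.
        cid = hashlib.sha1(repr((parents, message)).encode()).hexdigest()
        self.commits[cid] = (parents, message)
        self.head = cid
        return cid

    def commit(self, message):
        """Record a change locally; no other peer is contacted."""
        parents = (self.head,) if self.head else ()
        return self._store(parents, message)

    def clone(self):
        """Copy the full history, not just the latest state."""
        copy = Repo()
        copy.commits = dict(self.commits)
        copy.head = self.head
        return copy

    def ancestors(self, cid):
        """Return the set containing cid and every commit reachable from it."""
        seen = set()
        stack = [cid] if cid else []
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.commits[current][0])
        return seen

    def pull(self, peer):
        """Copy commits we lack from a peer, then fast-forward or merge."""
        missing = {c: v for c, v in peer.commits.items() if c not in self.commits}
        self.commits.update(missing)
        if peer.head is None or peer.head in self.ancestors(self.head):
            pass                                   # peer has nothing new for us
        elif self.head is None or self.head in self.ancestors(peer.head):
            self.head = peer.head                  # fast-forward
        else:
            self._store((self.head, peer.head), "merge")  # histories diverged
        return len(missing)


alice = Repo()
alice.commit("initial import")
bob = alice.clone()               # bob now holds the complete history
bob.commit("fix typo")            # purely local operation
alice.commit("add feature")       # alice works independently
transferred = alice.pull(bob)     # only the commit alice lacks is copied
fetched = bob.pull(alice)         # bob catches up and fast-forwards
```

In this sketch neither repository is special: either peer can pull from the other, and after both pulls the two copies hold the same history.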
 
'''Advantages of DVCS (compared with centralized systems) include:'''
===Differences===
* Allows users to work productively when not connected to a network.
* Each developer works on their own local [[Software repository|repository]].
* Common operations (such as commits, viewing history, and reverting changes) are faster for DVCS, because there is no need to communicate with a central server.<ref name='OSullivan'>{{cite web
| last = O'Sullivan
| first = Bryan
| title = Distributed revision control with Mercurial
| url = http://hgbook.red-bean.com/hgbook.html
| access-date = July 13, 2007 }}</ref> With DVCS, communication is necessary only when sharing changes with other peers.
* The working model epitomizes [[The Cathedral and the Bazaar|bazaar]]-style development in that ''anyone'' can create their own [[Branching (software)|branch]].
* Repositories are [[Repository clone|cloned]] by anyone, and are often cloned many times.
* There may be many "central" repositories.
* [[Access control list]]s are not employed. Instead, code from disparate repositories is [[Merge (revision control)|merged]] based on a [[web of trust]], i.e., historical merit or quality of changes.
* [[Repository lieutenant|Lieutenants]] are project members who have the power to dynamically decide which branches to merge.
* Network is not involved in most operations.
* Allows private work, so users can use their changes even for early drafts they do not want to publish.{{cn|date=August 2019|reason=This isn't unique to dvcs; any source code control system allows 'private work', though on some it requires changing (private) file permissions}}
* A separate set of "sync" operations is available for committing changes to, or receiving changes from, remote repositories.
* Working copies effectively function as remote backups, which avoids relying on one physical machine as a single point of failure.<ref name='OSullivan'/>
* Allows various development models to be used, such as using [[Branching (version control)#Development branch|development branches]] or a Commander/Lieutenant model.<ref>{{cite book|first1=Scott|last1=Chacon|first2=Ben|last2=Straub|edition=2nd|date=2014|publisher=Apress|at=Chapter 5.1|chapter=Distributed workflows|chapter-url=https://git-scm.com/book/en/v2/Distributed-Git-Distributed-Workflows|title=Pro Git}}</ref>
* Permits centralized control of the "release version" of the project{{cn|date=August 2019|reason=Not specific to dvcs; centralized systems generally control release version}}
* In [[FOSS]] projects, it is much easier to create a [[Fork (software development)|project fork]] from a project that is stalled because of leadership conflicts or design disagreements.
 
'''Disadvantages of DVCS (compared with centralized systems) include:'''
===Advantages===
* Initial checkout of a repository is slower than checkout in a centralized version control system, because all branches and revision history are copied to the local machine by default.
* Allows users to work productively even when not connected to a network
* The lack of locking mechanisms, which are part of most centralized VCSs and still play an important role for non-mergeable binary files such as graphic assets or complex single-file binary or XML packages (e.g. office documents, PowerBI files, SQL Server Data Tools BI packages, etc.).{{citation needed|date=January 2018}}
* Makes most operations much faster since no network is involved
* Additional storage is required for every user to have a complete copy of the codebase history.<ref>{{cite web|title=What is version control: centralized vs. DVCS|url=https://www.atlassian.com/blog/software-teams/version-control-centralized-dvcs|website=www.atlassian.com|date=14 February 2012 |access-date=7 January 2018}}</ref>
* Allows participation in projects without requiring permissions from project authorities, and thus arguably better fosters culture of meritocracy instead of requiring "committer" status.
* Increased exposure of the code base since every participant has a locally vulnerable copy.{{cn|date=August 2019|reason=Also true of centralized codebases}}
* Allows private work, so you can use your revision control system even for early drafts you don't want to publish
* Avoids relying on a single physical machine. A server disk crash is a non-event with Distributed revision control
* Code migration into a "pristine environment" can take a different route from the developer to its final destination, e.g. through a review process or a continuous integration environment before it is merged.
 
Some originally centralized systems now offer some distributed features. [[Team Foundation Server]] and Visual Studio Team Services now host centralized and distributed version control repositories via hosting Git.
===Disadvantages===
* Often requires many more merge conflicts to be resolved by hand.
* Many teams have long used and grown accustomed to the centralized model, and are reluctant to change.
* [[Source code]] is considered the "crown jewels" of a software group. Centralized VCSs have been around much longer and are thus perceived to be more stable.
* Some projects want or need centralized control.
* Distributed systems can end up with a person as the central point of control, rather than a server.
* The concepts of DVCSs are slightly more difficult for developers to grasp, and developers are required to know more about infrastructure.
 
Similarly, some distributed systems now offer features that mitigate the issues of checkout times and storage costs, such as the [[Virtual File System for Git]] developed by Microsoft to work with very large codebases,<ref>{{cite web|author=Jonathan Allen|url=https://www.infoq.com/news/2017/02/GVFS/|title=How Microsoft Solved Git's Problem with Large Repositories|date=2017-02-08|access-date=2019-08-06}}</ref> which exposes a virtual file system that downloads files to local storage only as they are needed.
==Work Model==
The distributed model also impacts the traditional developer working model.
 
==Work model==
{{Expand-section|date=June 2008}}
{{Expand section|date=June 2008}}
 
A distributed model is generally better suited for large projects with partly independent developers, such as the [[Linux kernel|Linux Kernel]]. It allows developers to work in independent branches and apply changes that can later be committed, audited and merged (or rejected)<ref>{{Cite web |title=Submitting patches: the essential guide to getting your code into the kernel — The Linux Kernel documentation |url=https://www.kernel.org/doc/html/v5.1/process/submitting-patches.html |access-date=2024-11-22 |website=www.kernel.org}}</ref> by others. This model allows for better flexibility and permits the creation and adaptation of custom source code branches ([[Fork (software development)|forks]]) whose purpose might differ from the original project. In addition, it permits developers to locally clone an existing code repository and work on it in a local environment where changes are tracked and committed to the local repository,<ref>{{Cite web |title=Git - Revision Selection |url=https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection |access-date=2024-11-22 |website=git-scm.com}}</ref> allowing for better tracking of changes before they are committed to the master branch of the repository. Such an approach enables developers to work in local and disconnected branches, making it more convenient for larger distributed teams.
==History==
The first DVCS was Reliable Software's [[Code Co-op]] (1997). First-generation DVCSes include [[GNU arch|Arch]] and [[Monotone (software)|Monotone]]. The second generation was prompted by the arrival of [[Darcs]], followed by a host of others, including [[Mercurial (software)|Mercurial]], [[Bazaar (software)|Bazaar]], and [[Git (software)|Git]].
 
===Central and branch repositories===
See the [[List of revision control software]] for a more comprehensive list.
In a truly distributed project, such as [[Linux]], every contributor maintains their own version of the project, with different contributors hosting their own respective versions and pulling in changes from other users as needed, resulting in a general consensus emerging from multiple different nodes. This also makes the process of "forking" easy, as all that is required is for one contributor to stop accepting pull requests from other contributors and let the codebases gradually grow apart.
 
This arrangement, however, can be difficult to maintain, resulting in many projects choosing to shift to a paradigm in which one contributor is the universal "upstream", a repository from which changes are almost always pulled. Under this paradigm, development is somewhat recentralized, as every project now has a central repository that is informally considered the official repository, managed by the project maintainers collectively. While distributed version control systems make it easy for new developers to "clone" a copy of any other contributor's repository, in a central model, new developers always clone the central repository to create identical local copies of the code base. Under this system, code changes in the central repository are periodically synchronized with the local repository, and once development is done, the change should be integrated into the central repository as soon as possible.
==Future==
Some natively centralized systems are starting to grow distributed features. For example, [[Subversion (software)|Subversion]] is able to do many operations with no network, and it is planning to implement [[local commit]]s. It may become more difficult to separate natively distributed vs centralized systems.
 
Organizations using this centralized pattern often choose to host the central repository on a third-party service like [[GitHub]], which not only offers more reliable [[uptime]] than self-hosted repositories but can also add centralized features like [[issue tracking system|issue trackers]] and [[continuous integration]].
Due to the explosion of new DVCSs in the last couple of years, it is likely that some of them will slow down or die off.
 
===Pull requests===
There are many tools that rely on version control, such as wikis, [[file systems]], and [[text editor]]s. Some are starting to adopt DVCS features, and even integrate with them. E.g., the [http://ygingras.net/gazest Gazest wiki].
Contributions to a source code repository that uses a distributed version control system are commonly made by means of a '''pull request''', also known as a '''merge request'''.<ref name="gitlab-merge-req">{{cite web|last=Sijbrandij|first=Sytse|title=GitLab Flow|date=29 September 2014|access-date=4 August 2018|website=GitLab|url=https://about.gitlab.com/2014/09/29/gitlab-flow/}}</ref> The contributor requests that the project maintainer ''pull'' the source code change, hence the name "pull request". The maintainer has to ''merge'' the pull request if the contribution should become part of the source base.<ref name="ossw">{{cite web|last1=Johnson|first1=Mark|title=What is a pull request?|url=http://oss-watch.ac.uk/resources/pullrequest|website=OSS Watch|access-date=27 March 2016|date=8 November 2013}}</ref>
 
The developer creates a pull request to notify maintainers of a new change; a comment thread is associated with each pull request. This allows for [[Code review|focused discussion of code changes]]. Submitted pull requests are visible to anyone with repository access. A pull request can be accepted or rejected by maintainers.<ref>{{cite web|title=Using pull requests|url=https://help.github.com/articles/using-pull-requests/|publisher=GitHub|access-date=27 March 2016}}</ref>
 
Once the pull request is reviewed and approved, it is merged into the repository. Depending on the established workflow, the code may need to be tested before being included in an official release. Therefore, some projects contain a special branch for merging untested pull requests.<ref name="ossw" /><ref>{{cite web|title=Making a Pull Request|url=https://www.atlassian.com/git/tutorials/making-a-pull-request|publisher=Atlassian|access-date=27 March 2016}}</ref> Other projects run an automated test suite on every pull request, using a [[continuous integration]] tool, and the reviewer checks that any new code has appropriate test coverage.
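The pull request life cycle described in this section can be sketched as a small state model. This is a simplified illustration under stated assumptions, not the behaviour of any particular hosting service: the <code>PullRequest</code> class, its fields, and the rule that a merge needs both maintainer approval and a passing test run are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical pull request: a proposed change plus its discussion."""
    author: str
    branch: str
    comments: list = field(default_factory=list)
    status: str = "open"         # "open", "merged", or "rejected"
    tests_passed: bool = False   # set by a (hypothetical) automated test run

    def comment(self, who, text):
        """Add a message to the comment thread attached to this request."""
        self.comments.append((who, text))

    def run_tests(self, test_suite):
        """Run every check in the suite; all must succeed."""
        self.tests_passed = all(check() for check in test_suite)
        return self.tests_passed

    def merge(self, approved_by_maintainer):
        """Merge only an open, approved request whose tests have passed."""
        if self.status != "open":
            raise ValueError("pull request is already closed")
        if approved_by_maintainer and self.tests_passed:
            self.status = "merged"
        return self.status == "merged"

    def reject(self):
        """Close the request without merging it."""
        if self.status != "open":
            raise ValueError("pull request is already closed")
        self.status = "rejected"


pr = PullRequest(author="carol", branch="fix-typo")
pr.comment("maintainer", "Please add a test.")
first_attempt = pr.merge(approved_by_maintainer=True)   # refused: tests not run
pr.run_tests([lambda: True])                             # stand-in for a CI run
second_attempt = pr.merge(approved_by_maintainer=True)   # now accepted
```

Projects that merge untested pull requests into a special branch, as mentioned above, would drop the test condition from <code>merge</code> and test that branch afterwards.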
 
==History==
The first open-source DVCSs included [[GNU arch|Arch]], [[Monotone (software)|Monotone]], and [[Darcs]]. However, open-source DVCSs were never very popular until the release of [[Git (software)|Git]] and [[Mercurial (software)|Mercurial]].
 
[[BitKeeper]] was used in the development of the [[Linux kernel]] from 2002 to 2005.<ref name=":0">{{Cite news|url=http://www.infoworld.com/article/2670360/operating-systems/linus-torvalds--bitkeeper-blunder.html|title=Linus Torvalds' BitKeeper blunder|last=McAllister|first=Neil|work=InfoWorld|access-date=2017-03-19|language=en}}</ref> The development of [[Git (software)|Git]], now the world's most popular version control system,<ref name=":1">{{cite web|title=Version Control Systems Popularity in 2016|url=https://rhodecode.com/insights/version-control-systems-2016|website=www.rhodecode.com|access-date=7 January 2018}}</ref> was prompted by the decision of the company that made BitKeeper to rescind the free license that Linus Torvalds and some other Linux kernel developers had previously taken advantage of.<ref name=":0" />
 
==See also==
{{columns-list|colwidth=30em|
* [[List of revision control software]]
* [[Comparison of revision control software]]
* [[List of version-control software]]
* [[Comparison of version-control software]]
* [[:Category:Software using distributed version control]]
* [[Repository clone]]
* [[Git]], an [[Open-source software|open source]] DVCS developed for Linux Kernel development
* [[Mercurial]], a cross-platform system similar to Git
* [[Fossil (software)|Fossil]], a distributed version control system, bug tracking system and wiki software
* [[BitKeeper]]
* [[GNU Bazaar]]
* [[Darcs]]
* [[Concurrent Versions System]], a predecessor of distributed version control systems
* [[TortoiseHg]], a graphical interface for Mercurial
* [[Code Co-op]], a peer-to-peer version control system
}}
 
==References==
{{reflist}}
 
==External links==
* [http://www.dwheeler.com/essays/scm.html Essay on various revision control systems], especially the section "Centralized vs. Decentralized SCM"
* [https://web.archive.org/web/20090602084310/http://www.ibm.com/developerworks/aix/library/au-dist_ver_control/ Introduction to distributed version control systems] - IBM Developer Works article
* [http://lwn.net/Articles/246381/ Linus Torvalds email describing DVCS to KDE developers]
 
* [http://www.youtube.com/watch?v=4XpnKHJAok8 Video of a talk Linus Torvalds gave about Git]
{{Version control software}}
* [http://video.google.com/videoplay?docid=-7724296011317502612 Bryan O'Sullivan video on Mercurial]
* [http://blog.red-bean.com/sussman/?p=20 Ben Collins-Sussman] (one of Subversion's authors) article on "The Risks of Distributed Version Control"
* [http://people.ubuntu.com/~ianc/papers/dvcs-why-and-how.xhtml Distributed Version Control Systems - Why and How] by Ian Clatworthy, Bazaar/Canonical
 
{{DEFAULTSORT:Distributed Revision Control}}
[[Category:Version control]]
[[Category:Free software projects]]
[[Category:Free version control software]]
[[Category:Distributed version control systems]]
[[Category:Concurrent Versions System| ]]
 
[[de:Versionsverwaltung#Verteilte Versionsverwaltung]]
[[fr:Gestion de version décentralisée]]
[[ja:分散型バージョン管理システム]]