Distributed version control: Difference between revisions

I expanded the article by adding more detailed subheadings, elaborating on the advantages and contrasts of distributed version control systems (DVCS) compared to centralized systems, and providing more insights into managing distributed projects, integration processes, and collaboration methods.
m Reverted edits by 185.181.109.220 (talk) (AV)
 
(14 intermediate revisions by 13 users not shown)
Line 1:
{{Short description|Software engineering tool}}
[[File:Git session.svg|thumb|The process of initializing a Git repository. Git is one of the most widely used distributed version control systems.]]
In software development, distributed version control (also known as distributed revision control) represents a form of version control where the entire codebase, along with its complete history, is mirrored on each developer's computer. This stands in contrast to centralized version control, facilitating automatic management of branching and merging, speeding up most operations (excluding pushing and pulling), enabling improved offline work capabilities, and eliminating reliance on a single location for backups. Git, recognized as the world's most popular version control system, exemplifies a distributed version control system.
In [[software development]], '''distributed version control''' (also known as '''distributed revision control''') is a form of [[version control]] in which the complete [[codebase]], including its full history, is mirrored on every developer's computer.<ref name="git-scm">{{cite book | chapter = About version control | chapter-url = https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control | title = Pro Git | first1 = Scott | last1 = Chacon | first2 = Ben | last2 = Straub | edition = 2nd | date = 2014 | publisher = Apress | at = Chapter 1.1 | access-date = 4 June 2019}}</ref> Compared to '''centralized version control''', this enables automatic management of [[Branching (version control)|branching]] and [[Merge (version control)|merging]], speeds up most operations (except pushing and fetching), improves the ability to work offline, and does not rely on a single location for backups.<ref name="git-scm"/><ref name="Joel 2010" /><ref>{{cite web|title=Intro to Distributed Version Control (Illustrated)|url=https://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/|website=www.betterexplained.com|access-date=7 January 2018}}</ref> [[Git (software)|Git]], the world's most popular version control system,<ref name=":1" /> is a distributed version control system.
 
In 2010, software development author [[Joel Spolsky]] described distributed version control systems as "possibly the biggest advance in software development technology in the [past] ten years".<ref name="Joel 2010">{{cite web
| url=http://joelonsoftware.com/items/2010/03/17.html
| first=Joel
| last=Spolsky
| title=Distributed Version Control Is Here to Stay, Baby
| work=Joel on Software
| date=17 March 2010
| access-date=4 June 2019}}</ref>
 
==Distributed vs. centralized==
Distributed version control systems (DVCS) use a [[peer-to-peer]] approach to [[version control]], as opposed to the [[client–server model|client–server]] approach of centralized systems. Distributed revision control synchronizes repositories by transferring [[Patch (Unix)|patches]] from peer to peer. There is no single central version of the codebase; instead, each user has a working copy and the full change history.
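The peer-to-peer model described above can be illustrated with a toy sketch. The <code>Commit</code> and <code>Repository</code> classes, the <code>pull_from</code> method, and the hash-based identifiers are hypothetical names chosen for illustration; they are not the internals of Git, Mercurial, or any other real system, and real systems transfer patches or object packs rather than in-memory objects.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """One recorded change; `parent` is the id of the previous commit, if any."""
    parent: Optional[str]
    message: str

    @property
    def id(self) -> str:
        # Derive the id from the content so that two peers holding the same
        # commit agree on its id without consulting a central server.
        return hashlib.sha1(f"{self.parent}:{self.message}".encode()).hexdigest()


@dataclass
class Repository:
    """A toy repository: every peer stores the full history locally."""
    commits: Dict[str, Commit] = field(default_factory=dict)
    head: Optional[str] = None

    def commit(self, message: str) -> str:
        # Committing is purely local; no network access is involved.
        new = Commit(self.head, message)
        self.commits[new.id] = new
        self.head = new.id
        return new.id

    def _is_ancestor(self, ancestor: str, descendant: Optional[str]) -> bool:
        # Walk the parent links from `descendant` back towards the root.
        current = descendant
        while current is not None:
            if current == ancestor:
                return True
            current = self.commits[current].parent
        return False

    def pull_from(self, peer: "Repository") -> int:
        """Copy the commits missing locally; return how many were transferred."""
        missing = {cid: c for cid, c in peer.commits.items() if cid not in self.commits}
        self.commits.update(missing)
        # Fast-forward only: a diverged history would need a merge,
        # which this sketch deliberately leaves out.
        if self.head is None or self._is_ancestor(self.head, peer.head):
            self.head = peer.head
        return len(missing)
```

In this sketch two peers can exchange work in either direction, for example <code>bob.pull_from(alice)</code> followed later by <code>alice.pull_from(bob)</code>, and neither repository plays the role of a server.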
 
'''Advantages of DVCS (compared with centralized systems) include:'''
* Allows users to work productively when not connected to a network.
* Common operations (such as commits, viewing history, and reverting changes) are faster for DVCS, because there is no need to communicate with a central server.<ref name='OSullivan'>{{cite web
| last = O'Sullivan
| first = Bryan
| title = Distributed revision control with Mercurial
| url = http://hgbook.red-bean.com/hgbook.html
| access-date = July 13, 2007 }}</ref> With DVCS, communication is necessary only when sharing changes among other peers.
* Allows private work, so users can use their changes even for early drafts they do not want to publish.{{cn|date=August 2019|reason=This isn't unique to dvcs; any source code control system allows 'private work', though on some it requires changing (private) file permissions}}
* Working copies effectively function as remote backups, which avoids relying on one physical machine as a single point of failure.<ref name='OSullivan'/>
* Allows various development models to be used, such as using [[Branching (version control)#Development branch|development branches]] or a Commander/Lieutenant model.<ref>{{cite book|first1=Scott|last1=Chacon|first2=Ben|last2=Straub|edition=2nd|date=2014|publisher=Apress|at=Chapter 5.1|chapter=Distributed workflows|chapter-url=https://git-scm.com/book/en/v2/Distributed-Git-Distributed-Workflows|title=Pro Git}}</ref>
* Permits centralized control of the "release version" of the project{{cn|date=August 2019|reason=Not specific to dvcs; centralized systems generally control release version}}
* On [[FOSS]] software projects it is much easier to create a [[Fork (software development)|project fork]] from a project that is stalled because of leadership conflicts or design disagreements.
 
'''Disadvantages of DVCS (compared with centralized systems) include:'''
* Initial checkout of a repository is slower as compared to checkout in a centralized version control system, because all branches and revision history are copied to the local machine by default.
* Lack of the locking mechanisms that are part of most centralized VCS and that still play an important role for non-mergeable binary files such as graphic assets, or for single-file binary or XML packages that are too complex to merge (e.g. office documents, PowerBI files, SQL Server Data Tools BI packages).{{citation needed|date=January 2018}}
* Additional storage required for every user to have a complete copy of the complete codebase history.<ref>{{cite web|title=What is version control: centralized vs. DVCS|url=https://www.atlassian.com/blog/software-teams/version-control-centralized-dvcs|website=www.atlassian.com|date=14 February 2012 |access-date=7 January 2018}}</ref>
* Increased exposure of the code base, since every participant has a locally vulnerable copy.{{cn|date=August 2019|reason=Also true of centralized codebases}}
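
The locking mechanism mentioned in the list above can be sketched as a single shared lock table, which is why it fits a centralized server more naturally than a set of independent peers. The <code>LockRegistry</code> class and its method names are hypothetical and serve only as an illustration; they do not reproduce the locking interface of any particular version control system.

```python
from typing import Dict, Optional


class LockRegistry:
    """Toy central lock table: at most one user may edit a given file at a time."""

    def __init__(self) -> None:
        self._holders: Dict[str, str] = {}

    def acquire(self, path: str, user: str) -> bool:
        # Refuse the lock if another user already holds it.
        holder = self._holders.get(path)
        if holder is not None and holder != user:
            return False
        self._holders[path] = user
        return True

    def release(self, path: str, user: str) -> bool:
        # Only the current holder may release the lock.
        if self._holders.get(path) != user:
            return False
        del self._holders[path]
        return True

    def holder(self, path: str) -> Optional[str]:
        return self._holders.get(path)
```

Because every user must consult the same table before editing a non-mergeable file, the scheme presupposes one authoritative ___location, which a fully distributed setup does not provide by default.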
 
Some originally centralized systems now offer some distributed features. [[Team Foundation Server]] and Visual Studio Team Services now host both centralized and distributed version control repositories, the latter by hosting Git.
* Facilitation of productive work even when disconnected from a network.
* Enhanced speed for common operations such as commits, history viewing, and reverting changes, as these do not necessitate communication with a central server. With DVCS, communication is essential only when sharing changes among peers.
* Support for private work, allowing users to utilize changes even for early drafts they do not intend to publish.
* Working copies serve as effective remote backups, mitigating reliance on a single machine as a potential point of failure.
* Facilitation of various development models, such as employing development branches or a Commander/Lieutenant model.
* Permission for centralized control of the "release version" of a project.
 
Similarly, some distributed systems now offer features that mitigate the issues of checkout times and storage costs, such as the [[Virtual File System for Git]] developed by Microsoft to work with very large codebases,<ref>{{cite web|author=Jonathan Allen|url=https://www.infoq.com/news/2017/02/GVFS/|title=How Microsoft Solved Git's Problem with Large Repositories|date=2017-02-08|access-date=2019-08-06}}</ref> which exposes a virtual file system that downloads files to local storage only as they are needed.
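
The idea of downloading files only as they are needed can be sketched as follows. The <code>LazyWorkingTree</code> class is a hypothetical illustration of on-demand fetching in general; it is not how the Virtual File System for Git is implemented, and the in-memory dictionary merely stands in for a remote repository.

```python
from typing import Dict, List


class LazyWorkingTree:
    """Toy working tree that fetches file contents on first access only."""

    def __init__(self, remote_files: Dict[str, bytes]) -> None:
        self._remote = remote_files          # stand-in for the remote repository
        self._local: Dict[str, bytes] = {}   # files already downloaded
        self.downloads = 0                   # number of simulated downloads

    def list_files(self) -> List[str]:
        # The full file listing is visible without downloading any content.
        return sorted(self._remote)

    def read(self, path: str) -> bytes:
        # Download the content the first time a file is read, then reuse it.
        if path not in self._local:
            self._local[path] = self._remote[path]
            self.downloads += 1
        return self._local[path]
```

In this sketch, listing a very large tree costs no downloads, and reading the same file twice costs only one.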
On open-source software projects, DVCS significantly simplifies the process of forking a project that has stalled due to leadership conflicts or design disputes.
 
Disadvantages of DVCS (compared to centralized systems) include:
 
* Slower initial checkout of a repository due to the default copying of all branches and revision history to local machines.
* Lack of locking mechanisms crucial for handling non-mergeable binary files like graphics or complex single-file binary/XML packages (e.g., office documents, PowerBI files, SQL Server Data Tools BI packages, etc.).
* Increased storage requirements due to each user maintaining a complete copy of the codebase history locally.
* Higher exposure of the code base, as each participant possesses a locally vulnerable copy.
 
Some originally centralized systems now integrate distributed features. For instance, Team Foundation Server and Visual Studio Team Services host both centralized and distributed version control repositories via Git hosting.
 
Similarly, certain distributed systems now incorporate features addressing checkout times and storage costs, such as Microsoft's Virtual File System for Git. This system works with extensive codebases by exposing a virtual file system that downloads files to local storage only as required.
 
 
==Work model==
{{Expand section|date=June 2008}}
 
A distributed model is generally better suited for large projects with partly independent developers, such as the [[Linux kernel]]. It allows developers to work in independent branches and apply changes that can later be committed, audited and merged (or rejected)<ref>{{Cite web |title=Submitting patches: the essential guide to getting your code into the kernel — The Linux Kernel documentation |url=https://www.kernel.org/doc/html/v5.1/process/submitting-patches.html |access-date=2024-11-22 |website=www.kernel.org}}</ref> by others. This model allows greater flexibility and permits the creation and adaptation of custom source code branches ([[Fork (software development)|forks]]) whose purpose might differ from the original project. In addition, it permits developers to locally clone an existing code repository and work on it in a local environment where changes are tracked and committed to the local repository,<ref>{{Cite web |title=Git - Revision Selection |url=https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection |access-date=2024-11-22 |website=git-scm.com}}</ref> allowing for better tracking of changes before they are committed to the master branch of the repository. Such an approach enables developers to work in local and disconnected branches, making it more convenient for larger distributed teams.
'''Distributed Version Control Systems (DVCS) in Software Development:'''
 
Distributed Version Control Systems (DVCS), such as Git, represent a significant evolution in software development practices. Unlike centralized systems, DVCS decentralizes the entire codebase and its history, allowing each developer to maintain a full copy on their local machine. This decentralization fosters greater autonomy and flexibility in development workflows, making it particularly suitable for large projects with diverse contributors.
 
'''Advantages of the Distributed Model:'''
 
The distributed model offers several advantages over traditional centralized systems:
 
* '''Autonomy and Parallel Development:''' Developers can work independently on their local copies of the repository without immediate reliance on a central server. This autonomy allows for parallel development efforts, where multiple contributors can work on different features or fixes simultaneously.
* '''Offline Work Capability:''' Since each developer has a complete copy of the repository, they can continue working and making commits even when disconnected from the network. This feature is crucial for distributed teams or developers working in remote locations with intermittent connectivity.
* '''Efficient Branching and Merging:''' DVCS simplifies branching and merging operations, which are essential for managing concurrent workstreams and integrating changes seamlessly. Developers can create branches for new features or bug fixes, test them locally, and merge them back into the main branch when ready.
* '''Redundancy and Backup:''' Every developer's local repository serves as a backup of the entire code history. This redundancy reduces the risk of data loss due to hardware failures or other unforeseen circumstances.
* '''Flexibility in Workflow:''' DVCS accommodates various development workflows, such as the integrator workflow where a designated integrator merges changes into the main repository after review. This flexibility allows teams to adapt their workflow to fit project requirements and team dynamics.
 
'''Contrast with Centralized Systems:'''
 
In contrast to centralized version control systems, where developers must synchronize their changes through a central server:
 
* '''Dependency on Central Server:''' Centralized systems require developers to interact with a central server for most operations, including committing changes, retrieving history, and merging branches. This centralized dependency can lead to bottlenecks and delays when multiple developers are working simultaneously.
* '''Risk of Single Point of Failure:''' A centralized server represents a single point of failure. If the server goes down or experiences issues, developers may be unable to commit changes or access the latest codebase, disrupting workflow and productivity.
 
'''Managing Distributed Projects:'''
 
===Central and branch repositories===
In a fully distributed project environment, such as open-source initiatives like the Linux kernel:
In a truly distributed project, such as [[Linux]], every contributor maintains their own version of the project, with different contributors hosting their own respective versions and pulling in changes from other users as needed, resulting in a general consensus emerging from multiple different nodes. This also makes the process of "forking" easy, as all that is required is for one contributor to stop accepting pull requests from other contributors and let the codebases gradually grow apart.
 
This arrangement, however, can be difficult to maintain, resulting in many projects choosing to shift to a paradigm in which one contributor is the universal "upstream", a repository from which changes are almost always pulled. Under this paradigm, development is somewhat recentralized, as every project now has a central repository that is informally considered the official repository, managed by the project maintainers collectively. While distributed version control systems make it easy for new developers to "clone" a copy of any other contributor's repository, in a central model, new developers always clone the central repository to create identical local copies of the code base. Under this system, code changes in the central repository are periodically synchronized with the local repository, and once the development is done, the change should be integrated into the central repository as soon as possible.
* '''Independence of Contributors:''' Each contributor maintains their own repository with the complete project history. Contributors can work on their versions independently, making changes and experimenting without affecting the main codebase until ready.
* '''Forking and Divergence:''' DVCS facilitates forking, where contributors can create separate branches or forks of the project to explore new ideas or directions. This ability to diverge from the main project allows for innovation and experimentation within the community.
* '''Recentralization Trends:''' Despite the benefits of decentralization, some projects adopt a recentralized approach where one repository acts as the primary "upstream" source. This recentralization simplifies governance and ensures consistency across contributions, with maintainers managing the central repository.
 
Organizations using this centralized pattern often choose to host the central repository on a third-party service like [[GitHub]], which not only offers more reliable [[uptime]] than self-hosted repositories, but can also add centralized features like [[issue tracking system|issue trackers]] and [[continuous integration]].
'''Integration and Collaboration:'''
 
===Pull requests===
* '''Platform Utilization:''' Many projects leverage platforms like GitHub or GitLab for hosting their repositories. These platforms provide robust features such as issue tracking, code review tools, and continuous integration (CI) pipelines, enhancing collaboration and project management.
Contributions to a source code repository that uses a distributed version control system are commonly made by means of a '''pull request''', also known as a '''merge request'''.<ref name="gitlab-merge-req">{{cite web|last=Sijbrandij|first=Sytse|title=GitLab Flow|date=29 September 2014|access-date=4 August 2018|website=GitLab|url=https://about.gitlab.com/2014/09/29/gitlab-flow/}}</ref> The contributor requests that the project maintainer ''pull'' the source code change, hence the name "pull request". The maintainer has to ''merge'' the pull request if the contribution should become part of the source base.<ref name="ossw">{{cite web|last1=Johnson|first1=Mark|title=What is a pull request?|url=http://oss-watch.ac.uk/resources/pullrequest|website=OSS Watch|access-date=27 March 2016|date=8 November 2013}}</ref>
* '''Pull Requests and Code Review:''' Contributions to DVCS-hosted repositories typically occur through pull requests (or merge requests). Contributors initiate a pull request to propose changes, prompting discussion and review by maintainers and peers. Pull requests serve as a transparent mechanism for code review and integration into the main codebase.
* '''Testing and Quality Assurance:''' Before merging, pull requests undergo rigorous testing, often through automated CI pipelines. These pipelines validate code changes against predefined tests and quality standards, ensuring new features or fixes maintain the project's stability and functionality.
 
The developer creates a pull request to notify maintainers of a new change; a comment thread is associated with each pull request. This allows for [[Code review|focused discussion of code changes]]. Submitted pull requests are visible to anyone with repository access. A pull request can be accepted or rejected by maintainers.<ref>{{cite web|title=Using pull requests|url=https://help.github.com/articles/using-pull-requests/|publisher=GitHub|access-date=27 March 2016}}</ref>
'''Conclusion:'''
 
Once the pull request is reviewed and approved, it is merged into the repository. Depending on the established workflow, the code may need to be tested before being included in an official release. Therefore, some projects contain a special branch for merging untested pull requests.<ref name="ossw" /><ref>{{cite web|title=Making a Pull Request|url=https://www.atlassian.com/git/tutorials/making-a-pull-request|publisher=Atlassian|access-date=27 March 2016}}</ref> Other projects run an automated test suite on every pull request, using a [[continuous integration]] tool, and the reviewer checks that any new code has appropriate test coverage.
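
The lifecycle described above (open, discuss, review, test, merge) can be sketched as a small state machine. The <code>PullRequest</code> class, the <code>Status</code> values, and the <code>require_tests</code> parameter are hypothetical names for illustration and do not correspond to the API of GitHub, GitLab, or any other hosting service; whether tests must pass before merging is a per-project policy, modelled here as an option.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    OPEN = "open"
    APPROVED = "approved"
    REJECTED = "rejected"
    MERGED = "merged"


@dataclass
class PullRequest:
    """Toy pull request with a comment thread and a review outcome."""
    title: str
    status: Status = Status.OPEN
    comments: List[str] = field(default_factory=list)
    tests_passed: bool = False

    def comment(self, text: str) -> None:
        # Each pull request carries its own discussion thread.
        self.comments.append(text)

    def review(self, approve: bool) -> None:
        # Maintainers accept or reject an open pull request.
        if self.status is not Status.OPEN:
            raise ValueError("only open pull requests can be reviewed")
        self.status = Status.APPROVED if approve else Status.REJECTED

    def merge(self, require_tests: bool = True) -> None:
        # Merging requires approval and, depending on project policy,
        # a passing automated test run.
        if self.status is not Status.APPROVED:
            raise ValueError("only approved pull requests can be merged")
        if require_tests and not self.tests_passed:
            raise ValueError("tests must pass before merging")
        self.status = Status.MERGED
```

A project that merges untested pull requests into a special branch would correspond to calling <code>merge(require_tests=False)</code> in this sketch.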
Distributed version control systems have revolutionized software development by empowering developers with greater autonomy, flexibility, and collaboration capabilities. While they offer significant advantages over centralized systems, including offline work support and decentralized backups, they also introduce challenges such as managing divergent codebases and ensuring consistent integration practices. By leveraging modern DVCS platforms and adopting best practices for branching, merging, and code review, development teams can maximize the benefits of distributed workflows while maintaining project integrity and efficiency.
 
==History==
Line 91 ⟶ 75:
* [[BitKeeper]]
* [[GNU Bazaar]]
* [[Darcs]]
* [[Concurrent Versions System]], a predecessor of distributed version control systems
* [[TortoiseHg]], a graphical interface for Mercurial