{{short description|Sharing information to ensure consistency in computing}}
{{More footnotes needed|date=October 2012}}
{{redirect|Replag|information about the replication lag of Wikipedia databases|Wikipedia:REPLAG}}
'''Replication''' in [[computing]] refers to maintaining multiple copies of data, processes, or resources to ensure consistency across redundant components. This fundamental technique spans [[database management system|databases]], [[file system|file systems]], and [[distributed computing|distributed systems]], serving to improve [[high availability|availability]], [[fault-tolerance]], accessibility, and performance.<ref name="kleppmann"/> Through replication, systems can continue operating when components fail ([[failover]]), serve requests from geographically distributed locations, and balance load across multiple machines. The challenge lies in maintaining consistency between replicas while managing the fundamental tradeoffs between data consistency, system availability, and [[Network partition|network partition tolerance]] – constraints known as the [[CAP theorem]].<ref>{{cite book |last=Brewer |first=Eric A. |chapter=Towards robust distributed systems (Abstract) |page=7 |title=Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing |year=2000 |doi=10.1145/343477.343502|isbn=1-58113-183-6 }}</ref>
 
== {{Anchor|MASTER-ELECTION}}Terminology ==
Replication in space or in time is often linked to scheduling algorithms.<ref>Mansouri, Najme, Gholam Hosein Dastghaibyfard, and Ehsan Mansouri. "Combination of data replication and scheduling algorithm for improving data availability in Data Grids", ''Journal of Network and Computer Applications'' (2013)</ref>
 
Access to a replicated entity is typically uniform with access to a single non-replicated entity. The replication itself should be [[transparency (human-computer interaction)|transparent]] to an external user. In a failure scenario, a [[failover]] of replicas should be hidden as much as possible with respect to [[quality of service]].<ref>V. Andronikou, K. Mamouras, K. Tserpes, D. Kyriazis, T. Varvarigou, "Dynamic QoS-aware Data Replication in Grid Environments", ''Elsevier Future Generation Computer Systems - The International Journal of Grid Computing and eScience'', 2012</ref>
 
Computer scientists further describe replication as being either:
* '''Active replication''', which is performed by processing the same request at every replica
* '''Passive replication''', which involves processing every request on a single replica and transferring the result to the other replicas
 
When one leader replica is designated via [[leader election]] to process all the requests, the system is using a primary-backup or [[Master-slave (computers)|primary-replica]] scheme, which is predominant in [[high-availability cluster]]s. In comparison, if any replica can process a request and distribute a new state, the system is using a multi-primary or [[Multi-master replication|multi-master]] scheme. In the latter case, some form of [[distributed concurrency control]] must be used, such as a [[distributed lock manager]].
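
The primary-backup scheme can be sketched in Python as follows; the class names, the in-memory dictionary state, and the synchronous state transfer are simplifying assumptions for illustration, with leader election and failover omitted.

<syntaxhighlight lang="python">
# Minimal sketch of a primary-backup (passive replication) scheme.
# All names are illustrative, not drawn from any particular system.

class Replica:
    def __init__(self, name):
        self.name = name
        self.state = {}

    def apply(self, key, value):
        # Backups do not re-execute the request; they install the result.
        self.state[key] = value


class Primary(Replica):
    def __init__(self, name, backups):
        super().__init__(name)
        self.backups = backups

    def handle_request(self, key, compute):
        value = compute()               # only the primary executes the request
        self.apply(key, value)
        for backup in self.backups:     # transfer the result to every backup
            backup.apply(key, value)
        return value


backups = [Replica("backup-1"), Replica("backup-2")]
primary = Primary("primary", backups)
primary.handle_request("x", lambda: 2 + 2)
assert all(b.state["x"] == 4 for b in backups)
</syntaxhighlight>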
 
[[Load balancing (computing)|Load balancing]] differs from task replication, since it distributes a load of different computations across machines, and allows a single computation to be dropped in case of failure. Load balancing, however, sometimes uses data replication (especially [[multi-master replication]]) internally, to distribute its data among machines.
Three widely cited models exist for data replication, each having its own properties and performance:
* '''Transactional replication''': used for replicating [[transactional data]], such as a database. The [[one-copy serializability]] model is employed, which defines valid outcomes of a transaction on replicated data in accordance with the overall [[ACID]] (atomicity, consistency, isolation, durability) properties that transactional systems seek to guarantee.
* '''[[State machine replication]]''': assumes that the replicated process is a [[deterministic finite automaton]] and that [[atomic broadcast]] of every event is possible. It is based on [[Consensus (computer science)|distributed consensus]] and has a great deal in common with the transactional replication model. This is sometimes mistakenly used as a synonym of active replication. State machine replication is usually implemented by a replicated log consisting of multiple subsequent rounds of the [[Paxos algorithm]]. This was popularized by Google's Chubby system, and is the core behind the open-source [[Keyspace (data store)|Keyspace data store]].<ref name=keyspace>{{cite web | access-date=2010-04-18 | year = 2009 | url=http://scalien.com/whitepapers |title=Keyspace: A Consistently Replicated, Highly-Available Key-Value Store | author=Marton Trencseni, Attila Gazso}}</ref><ref name=chubby>{{cite web | access-date=2010-04-18 | year=2006 | url=http://labs.google.com/papers/chubby.html | title=The Chubby Lock Service for Loosely-Coupled Distributed Systems | author=Mike Burrows | url-status=dead | archive-url=https://web.archive.org/web/20100209225931/http://labs.google.com/papers/chubby.html | archive-date=2010-02-09 }}</ref> A minimal sketch of this model appears after this list.
* '''[[Virtual synchrony]]''': involves a group of processes which cooperate to replicate in-memory data or to coordinate actions. The model defines a distributed entity called a ''process group''. A process can join a group and is provided with a checkpoint containing the current state of the data replicated by group members. Processes can then send [[multicast]]s to the group and will see incoming multicasts in the identical order. Membership changes are handled as a special multicast that delivers a new "membership view" to the processes in the group.<ref>{{Cite book |last1=Birman |first1=K. |last2=Joseph |first2=T. |title=Proceedings of the eleventh ACM Symposium on Operating systems principles - SOSP '87 |chapter=Exploiting virtual synchrony in distributed systems |date=1987-11-01 |chapter-url=https://doi.org/10.1145/41457.37515 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=123–138 |doi=10.1145/41457.37515 |isbn=978-0-89791-242-6|s2cid=7739589 }}</ref>
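
As a minimal illustration of state machine replication, the following Python sketch applies the same pre-agreed command log to several deterministic replicas; the <code>Counter</code> machine is invented, and the log stands in for the output of a consensus protocol such as Paxos, which is out of scope here.

<syntaxhighlight lang="python">
# If every replica is a deterministic state machine and applies the same
# commands in the same (atomically broadcast) order, all replicas converge.

class Counter:
    """A deterministic state machine: state + command -> new state."""
    def __init__(self):
        self.value = 0

    def apply(self, command):
        op, amount = command
        if op == "add":
            self.value += amount
        elif op == "mul":
            self.value *= amount


# A replicated log, assumed to have been agreed upon by consensus.
log = [("add", 5), ("mul", 3), ("add", -1)]

replicas = [Counter() for _ in range(3)]
for replica in replicas:
    for command in log:          # identical order on every replica
        replica.apply(command)

assert len({r.value for r in replicas}) == 1   # all replicas agree on 14
</syntaxhighlight>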
 
== {{Anchor|DATABASE}}Database replication ==
[[Database]] replication involves maintaining copies of the same data on multiple machines, typically implemented through three main approaches: single-leader, multi-leader, and leaderless replication.<ref name="kleppmann">{{cite book |last=Kleppmann |first=Martin |title=Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems |year=2017 |publisher=O'Reilly Media |isbn=9781491903100 |pages=151–185}}</ref>
 
In [[Master–slave (technology)|single-leader]] (also called primary/replica) replication, one database instance is designated as the leader (primary), which handles all write operations. The leader logs these updates, which then propagate to replica nodes. Each replica acknowledges receipt of updates, enabling subsequent write operations. Replicas primarily serve read requests, though they may serve stale data due to replication lag – the delay in propagating changes from the leader.
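
A toy Python sketch of single-leader replication with asynchronous propagation follows; the <code>Leader</code> and <code>Replica</code> classes and the explicit <code>replicate</code> step are hypothetical simplifications that make replication lag observable.

<syntaxhighlight lang="python">
from collections import deque

class Leader:
    def __init__(self):
        self.data = {}
        self.log = deque()           # updates not yet shipped to replicas

    def write(self, key, value):     # all writes go through the leader
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):             # reads may return stale data
        return self.data.get(key)


def replicate(leader, replicas):
    """Ship pending updates to every replica (runs asynchronously)."""
    while leader.log:
        key, value = leader.log.popleft()
        for r in replicas:
            r.data[key] = value


leader, replica = Leader(), Replica()
leader.write("x", 1)
print(replica.read("x"))   # None: replication lag, update not applied yet
replicate(leader, [replica])
print(replica.read("x"))   # 1: the replica has caught up
</syntaxhighlight>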
 
In [[multi-master replication]] (also called multi-leader), updates can be submitted to any database node, which then propagates them to the other servers. This approach is particularly beneficial in multi-data center deployments, where it enables local write processing while masking inter-data center network latency.<ref name="kleppmann"/> However, it introduces substantially increased costs and complexity, which may make it impractical in some situations. The most common challenge in multi-master replication is preventing or [[conflict resolution|resolving]] transactional conflicts when concurrent modifications occur on different leader nodes.
Database replication becomes more complex when it scales up [[horizontal scalability|horizontally]] and vertically. Horizontal scale-up has more data replicas, while vertical scale-up has data replicas located at greater physical distances. Problems raised by horizontal scale-up can be alleviated by a multi-layer, multi-view access [[network protocol|protocol]]. The early problems of vertical scale-up have largely been addressed by improving Internet [[Reliability (computer networking)|reliability]] and performance.<ref>{{cite web
| url = http://facta.junis.ni.ac.rs/eae/fu2k71/4obradovic.pdf
| title = Measurement of the Achieved Performance Levels of the WEB Applications With Distributed Relational Database
| work = Electronics and Energetics | volume = 20 | number = 1 | page = 31{{ndash}}43
| date = April 2007 | access-date = 30 January 2014
| author1 = Dragan Simic | author2 = Srecko Ristic | author3 = Slobodan Obradovic
| publisher = Facta Universitatis
| format = PDF
}}</ref><ref>{{cite web
| url = http://oatao.univ-toulouse.fr/12933/1/Mokadem_12933.pdf
| title = Data Replication Strategies with Performance Objective in Data Grid Systems: A Survey
| work = International Journal of Grid and Utility Computing | volume = 6 | number = 1 | page = 30{{ndash}}46
| date = December 2014 | access-date = 18 December 2014
| author1 = Mokadem Riad | author2 = Hameurlain Abdelkader
| publisher = Inderscience Publishers
| format = PDF
}}</ref>
 
In multi-master systems, most synchronous (or eager) replication solutions perform conflict prevention, while asynchronous (or lazy) solutions have to perform conflict resolution. For instance, if the same record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A [[lazy replication]] system would allow both [[database transaction|transactions]] to commit and run a conflict resolution during re-synchronization.<ref>{{cite web|title=Conflict Resolution|url=http://www.ittia.com/html/ittia-db-docs/users-guide/replication.html#conflict-resolution|publisher=ITTIA|access-date=21 October 2016}}</ref> Conflict resolution methods can include techniques such as last-write-wins, application-specific logic, or merging concurrent updates.<ref name="kleppmann"/>
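
The last-write-wins technique mentioned above can be illustrated with a short Python sketch; the timestamps, record contents, and the <code>resolve_lww</code> helper are invented for illustration, and real systems must additionally contend with clock skew between nodes.

<syntaxhighlight lang="python">
def resolve_lww(version_a, version_b):
    """Each version is (timestamp, value); the newer timestamp wins."""
    return version_a if version_a[0] >= version_b[0] else version_b


# The same record was changed concurrently on two leader nodes:
on_node_1 = (1700000005, {"email": "alice@example.org"})
on_node_2 = (1700000009, {"email": "alice@example.com"})

winner = resolve_lww(on_node_1, on_node_2)
print(winner[1])   # node 2's later update survives; node 1's write is lost
</syntaxhighlight>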
When data is replicated between database servers, so that the information remains consistent throughout the database system and users cannot tell which server in the DBMS they are using, the system is said to exhibit replication transparency.
 
However, replication transparency cannot always be achieved. When data is replicated in a database, it will be constrained by the [[CAP theorem]] or the [[PACELC theorem]]. In the NoSQL movement, data consistency is usually sacrificed in exchange for other, more desired properties, such as availability (A) or partition tolerance (P). Various [[Consistency model|data consistency models]] have also been developed to serve as service-level agreements (SLAs) between service providers and users.
 
There are several techniques for replicating data changes between nodes; a minimal illustration of the row-based approach follows the list:<ref name="kleppmann"/>
* '''Statement-based replication''': Write requests (such as SQL statements) are logged and transmitted to replicas for execution. This can be problematic with non-deterministic functions or statements having side effects.
* '''Write-ahead log (WAL) shipping''': The storage engine's low-level write-ahead log is replicated, ensuring identical data structures across nodes.
* '''Logical (row-based) replication''': Changes are described at the row level using a dedicated log format, providing greater flexibility and independence from storage engine internals.
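
As an illustration of the logical (row-based) approach, the following Python sketch replays a stream of row-level change events on a replica; the event schema and field names are invented for this example rather than taken from any particular database's log format.

<syntaxhighlight lang="python">
# Changes are described as row-level events, independent of the storage
# engine; a replica rebuilds its tables by replaying the log in order.

change_log = [
    {"op": "insert", "table": "users", "row": {"id": 1, "name": "Ada"}},
    {"op": "update", "table": "users", "key": 1, "set": {"name": "Ada L."}},
    {"op": "delete", "table": "users", "key": 1},
]

def apply_change(tables, event):
    table = tables.setdefault(event["table"], {})
    if event["op"] == "insert":
        table[event["row"]["id"]] = dict(event["row"])
    elif event["op"] == "update":
        table[event["key"]].update(event["set"])
    elif event["op"] == "delete":
        del table[event["key"]]

replica_tables = {}
for event in change_log:      # replay in log order on the replica
    apply_change(replica_tables, event)
print(replica_tables)         # {'users': {}} after insert, update, delete
</syntaxhighlight>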
 
== Disk storage replication ==
[[File:Storage replication-en.svg|thumb|Storage replication]]
Active (real-time) storage replication is usually implemented by distributing updates of a [[block device]] to several physical [[hard disk]]s. This way, any [[file system]] supported by the [[operating system]] can be replicated without modification, as the file system code works on a level above the block device driver layer. It is implemented either in hardware (in a [[disk array controller]]) or in software (in a [[device driver]]).
 
The most basic method is [[disk mirroring]], which is typical for locally connected disks. The storage industry narrows the definitions, so ''mirroring'' is a local (short-distance) operation. Replication is extendable across a [[computer network]], so that the disks can be located in physically distant locations, and the primary/replica database replication model is usually applied. The purpose of replication is to prevent damage from failures or [[Disaster Recovery|disaster]]s that may occur in one ___location – or, in case such events do occur, to improve the ability to recover data. For replication, latency is the key factor because it determines either how far apart the sites can be or the type of replication that can be employed.
 
The main characteristic of such cross-site replication is how write operations are handled, through either asynchronous or synchronous replication; synchronous replication needs to wait for the destination server's response in any write operation whereas asynchronous replication does not.
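
The contrast can be sketched in Python as follows; the in-memory dictionaries standing in for local and remote block devices, and the queue used to defer asynchronous writes, are simulation conveniences, not an actual storage API.

<syntaxhighlight lang="python">
import queue

remote_disk = {}                   # stands in for storage at the distant site
pending = queue.Queue()            # asynchronous writes awaiting shipment

def remote_write(block, data):
    remote_disk[block] = data

def synchronous_write(local_disk, block, data):
    local_disk[block] = data
    remote_write(block, data)      # wait for the destination before returning
    return "acknowledged by both sites"

def asynchronous_write(local_disk, block, data):
    local_disk[block] = data
    pending.put((block, data))     # ship later; do not wait for the remote site
    return "acknowledged locally only"

def drain_pending():               # background replication of queued writes
    while not pending.empty():
        remote_write(*pending.get())

local = {}
print(synchronous_write(local, 0, b"alpha"))
print(asynchronous_write(local, 1, b"beta"))  # remote copy lags until drained
drain_pending()
</syntaxhighlight>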
 
One of the notable implementations is [[rsync]].
 
== Replication within file ==
 
In a [[paging]] operating system, pages in a paging file are sometimes replicated within a track to reduce rotational latency.
 
In [[IBM]]'s [[VSAM]], index data are sometimes replicated within a track to reduce rotational latency.
 
== Distributed shared memory replication ==
Another example of using replication appears in [[distributed shared memory]] systems, where many nodes of the system share the same page of memory. This usually means that each node has a separate copy (replica) of this page.

== Primary-backup and multi-primary replication ==
Many classical approaches to replication are based on a primary-backup model where one device or process has unilateral control over one or more other processes or devices. For example, the primary might perform some computation, streaming a log of updates to a backup (standby) process, which can then take over if the primary fails. This approach is common for replicating databases, despite the risk that if a portion of the log is lost during a failure, the backup might not be in a state identical to the primary, and transactions could then be lost.
A weakness of primary-backup schemes is that only one of them is actually performing operations. Fault-tolerance is gained, but the identical backup system doubles the costs. For this reason, starting {{circa|1985}}, the distributed systems research community began to explore alternative methods of replicating data. An outgrowth of this work was the emergence of schemes in which a group of replicas could cooperate, with each process acting as a backup while also handling a share of the workload.
 
Computer scientist [[Jim Gray (computer scientist)|Jim Gray]] analyzed multi-primary replication schemes under the transactional model and published a widely cited paper skeptical of the approach, "The Dangers of Replication and a Solution".<ref>[http://research.microsoft.com/~gray/replicas.ps "The Dangers of Replication and a Solution"]</ref><ref>''Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data: SIGMOD '99'', Philadelphia, PA, US; June 1–3, 1999, Volume 28; p. 3.</ref> He argued that unless the data splits in some natural way so that the database can be treated as {{math|''n''}} disjoint sub-databases, concurrency control conflicts will result in seriously degraded performance, and the group of replicas will probably slow down as a function of {{math|''n''}}. Gray suggested that the most common approaches are likely to result in degradation that scales as ''O(n³)''. His solution, which is to partition the data, is only viable in situations where the data actually has a natural partitioning key.
 
Between 1985 and 1987, the [[virtual synchrony]] model was proposed and emerged as a widely adopted standard (it was used in the Isis Toolkit, Horus, Transis, Ensemble, Totem, [[Spread Toolkit|Spread]], C-Ensemble, Phoenix and Quicksilver systems, and is the basis for the [[Common Object Request Broker Architecture|CORBA]] fault-tolerant computing standard). Virtual synchrony permits a multi-primary approach in which a group of processes cooperates to parallelize some aspects of request processing. The scheme can only be used for some forms of in-memory data, but can provide linear speedups in the size of the group.
 
A number of modern products support similar schemes. For example, the Spread Toolkit supports this same virtual synchrony model and can be used to implement a multi-primary replication scheme; it would also be possible to use C-Ensemble or Quicksilver in this manner. [[WANdisco]] permits active replication where every node on a network is an exact copy or replica and hence every node on the network is active at one time; this scheme is optimized for use in a [[wide area network]] (WAN).
 
Modern multi-primary replication protocols optimize for the common case of failure-free operation. Chain replication<ref>{{Cite journal |last1=van Renesse |first1=Robbert |last2=Schneider |first2=Fred B. |date=2004-12-06 |title=Chain replication for supporting high throughput and availability |url=https://dl.acm.org/doi/abs/10.5555/1251254.1251261 |journal=Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6 |series=OSDI'04 |___location=USA |publisher=USENIX Association |pages=7 |doi=}}</ref> is a popular family of such protocols. State-of-the-art protocol variants<ref>{{Cite journal |last1=Terrace |first1=Jeff |last2=Freedman |first2=Michael J. |date=2009-06-14 |title=Object storage on CRAQ: high-throughput chain replication for read-mostly workloads |url=https://dl.acm.org/doi/abs/10.5555/1855807.1855818 |journal=USENIX Annual Technical Conference |series=USENIX'09 |___location=USA |pages=11 |doi=}}</ref> of chain replication offer high throughput and strong consistency by arranging replicas in a chain for writes. This approach enables local reads on all replica nodes but has high latency for writes that must traverse multiple nodes sequentially.
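
A minimal Python sketch of the write path in chain replication follows; the <code>ChainNode</code> class and the three-node chain are illustrative, and real protocols additionally handle node failures and chain reconfiguration.

<syntaxhighlight lang="python">
# Writes enter at the head and are forwarded down the chain; a write is
# considered committed once it reaches the tail.

class ChainNode:
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.next = None            # successor in the chain

    def write(self, key, value):
        self.store[key] = value
        if self.next is not None:   # writes traverse the chain sequentially
            return self.next.write(key, value)
        return "committed at tail"  # the tail acknowledges the write

    def read(self, key):
        return self.store.get(key)


# Build a chain: head -> middle -> tail
head, middle, tail = ChainNode("head"), ChainNode("middle"), ChainNode("tail")
head.next, middle.next = middle, tail

print(head.write("x", 42))   # traverses all three nodes before committing
print(tail.read("x"))        # reads can be served locally by any replica
</syntaxhighlight>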
 
A more recent multi-primary protocol, [https://hermes-protocol.com/ Hermes],<ref>{{Cite book |last1=Katsarakis |first1=Antonios |last2=Gavrielatos |first2=Vasilis |last3=Katebzadeh |first3=M.R. Siavash |last4=Joshi |first4=Arpit |last5=Dragojevic |first5=Aleksandar |last6=Grot |first6=Boris |last7=Nagarajan |first7=Vijay |title=Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems |chapter=Hermes: A Fast, Fault-Tolerant and Linearizable Replication Protocol |date=2020-03-13 |chapter-url=https://doi.org/10.1145/3373376.3378496 |series=ASPLOS '20 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=201–217 |doi=10.1145/3373376.3378496 |hdl=20.500.11820/c8bd74e1-5612-4b81-87fe-175c1823d693 |isbn=978-1-4503-7102-5|s2cid=210921224 |url=https://www.pure.ed.ac.uk/ws/files/130434070/Hermes_a_Fast_KATASARAKIS_DOA02122019_AFV.pdf }}</ref> combines cache-coherent-inspired invalidations and logical timestamps to achieve strong consistency with local reads and high-performance writes from all replicas. During fault-free operation, its broadcast-based writes are non-conflicting and commit after just one multicast round-trip to replica nodes. This design results in high throughput and low latency for both reads and writes.
 
==See also==
* [[Multi-master replication]]
* [[Optimistic replication]]
* [[Shard (data)]]
* [[State machine replication]]
* [[Virtual synchrony]]
==References==
{{Reflist|30em}}
 
{{-}}
 
{{Authority control}}
[[Category:Data synchronization]]
[[Category:Fault-tolerant computer systems]]
[[Category:Database management systems]]