Replication (computing): Difference between revisions

References: add cat from redirect
rework para
Line 34:
 
== {{Anchor|DATABASE}}Database replication ==
[[Database]] replication involves maintaining copies of the same data on multiple machines, typically implemented through three main approaches: single-leader, multi-leader, and leaderless replication.<ref name="kleppmann">{{cite book |last=Kleppmann |first=Martin |title=Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems |year=2017 |publisher=O'Reilly Media |isbn=9781491903100 |pages=151-185}}</ref>
[[Database]] replication can be used on many [[database management system]]s (DBMS), usually with a [[master/slave (technology)|primary/replica]] relationship between the original and the copies. The primary logs the updates, which then ripple through to the replicas. Each replica outputs a message stating that it has received the update successfully, thus allowing the sending of subsequent updates.
 
In [[Master–slave (technology)|single-leader]] (also called primary/replica) replication, one database instance is designated as the leader (primary), which handles all write operations. The leader logs these updates, which then propagate to replica nodes. Each replica acknowledges receipt of updates, enabling subsequent write operations. Replicas primarily serve read requests, though they may serve stale data due to replication lag – the delay in propagating changes from the leader.
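
The single-leader flow described above can be illustrated with a minimal in-memory sketch. The <code>Leader</code> and <code>Replica</code> classes and their method names are hypothetical and chosen only for illustration; a real DBMS ships log entries over the network and persists them to disk.

```python
class Replica:
    """Read-only copy that applies log entries shipped by the leader."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, index, key, value):
        self.data[key] = value
        self.applied = index
        return index  # acknowledgement sent back to the leader


class Leader:
    """Accepts all writes and records them in an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def replicate(self, replica):
        # Ship every entry the replica has not yet acknowledged, in log order.
        for position in range(replica.applied, len(self.log)):
            key, value = self.log[position]
            ack = replica.apply(position + 1, key, value)
            if ack != position + 1:
                raise RuntimeError("replica did not acknowledge the update")


leader = Leader()
replica = Replica()

leader.write("x", 1)
leader.replicate(replica)

leader.write("x", 2)        # not yet propagated
stale = replica.data["x"]   # replica still returns the old value (replication lag)

leader.replicate(replica)
fresh = replica.data["x"]   # replica has caught up
```

Between the second write and the second call to <code>replicate</code>, a read from the replica returns the older value, which is the stale-read effect of replication lag.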
In [[multi-master replication]], updates can be submitted to any database node, and then ripple through to other servers. This is often desired but introduces substantially increased costs and complexity which may make it impractical in some situations. The most common challenge that exists in multi-master replication is transactional conflict prevention or [[conflict resolution|resolution]]. Most synchronous (or eager) replication solutions perform conflict prevention, while asynchronous (or lazy) solutions have to perform conflict resolution. For instance, if the same record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A [[lazy replication]] system would allow both [[database transaction|transactions]] to commit and run a conflict resolution during re-synchronization.<ref>{{cite book
|title=ITTIA DB SQL™ User's Guide
|chapter=Replication -- Conflict Resolution
|chapter-url=http://www.ittia.com/html/ittia-db-docs/users-guide/replication.html#conflict-resolution
|publisher=ITTIA L.L.C.
|access-date=21 October 2016
|archive-date=24 November 2018
|archive-url=https://web.archive.org/web/20181124055015/http://www.ittia.com/html/ittia-db-docs/users-guide/replication.html}}</ref>
The resolution of such a conflict may be based on a [[timestamp]] of the transaction, on the hierarchy of the origin nodes or on much more complex logic, which decides consistently across all nodes.
 
In [[multi-master replication]] (also called multi-leader), updates can be submitted to any database node, which then propagate to other servers. This approach is particularly beneficial in multi-data center deployments, where it enables local write processing while masking inter-data center network latency.<ref name="kleppmann"/> However, it introduces substantially increased costs and complexity which may make it impractical in some situations. The most common challenge that exists in multi-master replication is transactional conflict prevention or [[conflict resolution|resolution]] when concurrent modifications occur on different leader nodes.
Database replication becomes more complex when it scales up [[horizontal scalability|horizontally]] and vertically. Horizontal scale-up has more data replicas, while vertical scale-up has data replicas located at greater physical distances. Problems raised by horizontal scale-up can be alleviated by a multi-layer, multi-view access [[network protocol|protocol]]. The early problems of vertical scale-up have largely been addressed by improving Internet [[Reliability (computer networking)|reliability]] and performance.<ref>{{cite web
| url = http://facta.junis.ni.ac.rs/eae/fu2k71/4obradovic.pdf
| title = Measurement of the Achieved Performance Levels of the WEB Applications With Distributed Relational Database
| work = Electronics and Energetics | volume = 20 | number = 1 | page = 31{{ndash}}43
| date = April 2007 | access-date = 30 January 2014
| author1 = Dragan Simic | author2 = Srecko Ristic | author3 = Slobodan Obradovic
| publisher = Facta Universitatis
}}</ref><ref>{{cite web
| url = http://oatao.univ-toulouse.fr/12933/1/Mokadem_12933.pdf
| title = Data Replication Strategies with Performance Objective in Data Grid Systems: A Survey
| work = International Journal of Grid and Utility Computing | volume = 6 | number = 1 | page = 30{{ndash}}46
| date = December 2014 | access-date = 18 December 2014
| author1 = Mokadem Riad | author2 = Hameurlain Abdelkader
| publisher = Inderscience Publishers
}}</ref>
 
In [[multi-master replication]], updates can be submitted to any database node, and then ripple through to other servers. This is often desired but introduces substantially increased costs and complexity which may make it impractical in some situations. The most common challenge that exists in multi-master replication is transactional conflict prevention or [[conflict resolution|resolution]]. Most synchronous (or eager) replication solutions perform conflict prevention, while asynchronous (or lazy) solutions have to perform conflict resolution. For instance, if the same record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A [[lazy replication]] system would allow both [[database transaction|transactions]] to commit and run a conflict resolution during re-synchronization. Conflict resolution methods can include techniques like last-write-wins, application-specific logic, or merging concurrent updates.<ref name="kleppmann"/>
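
As an illustration of last-write-wins resolution, the following sketch picks a winner between two concurrent versions of the same record. The <code>Version</code> type, its fields, and the tie-break on node identifier are assumptions made for this example, not the behaviour of any particular DBMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # commit time reported by the originating node
    node_id: str      # used only to break timestamp ties deterministically


def last_write_wins(a: Version, b: Version) -> Version:
    # Compare by (timestamp, node_id) so that every node, whatever order
    # it sees the two versions in, selects the same winner.
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


v1 = Version("alice@example.com", 100.0, "node-a")
v2 = Version("alice@example.org", 100.5, "node-b")

winner_on_a = last_write_wins(v1, v2)
winner_on_b = last_write_wins(v2, v1)
```

Both nodes converge on the same version regardless of argument order. The losing write is silently discarded, which is why application-specific logic or merging may be preferred when no update may be lost.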
When data is replicated between database servers so that the information remains consistent throughout the database system and users cannot tell which server in the DBMS they are using, the system is said to exhibit replication transparency.
 
However, replication transparency cannot always be achieved. A replicated database is constrained by the [[CAP theorem]] or the [[PACELC theorem]]. In the NoSQL movement, data consistency is usually sacrificed in exchange for other more desired properties, such as availability (A) and partition tolerance (P). Various [[Consistency model|data consistency models]] have also been developed to serve as a service-level agreement (SLA) between service providers and users.
 
There are several techniques for replicating data changes between nodes:<ref name="kleppmann"/>
* '''Statement-based replication''': Write requests (such as SQL statements) are logged and transmitted to replicas for execution. This can be problematic with non-deterministic functions or statements having side effects.
* '''Write-ahead log (WAL) shipping''': The storage engine's low-level write-ahead log is replicated, ensuring identical data structures across nodes.
* '''Logical (row-based) replication''': Changes are described at the row level using a dedicated log format, providing greater flexibility and independence from storage engine internals.
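
The difference between statement-based and row-based replication in the presence of a non-deterministic function can be sketched as follows. Plain dictionaries stand in for database tables, and <code>run_statement</code> is a hypothetical stand-in for a statement that assigns a random value to a column; the seeds only model the fact that each node has its own random state.

```python
import random


def run_statement(db, rng):
    # Hypothetical statement with a non-deterministic result,
    # comparable to assigning a random value to a column.
    db["token"] = rng.random()


leader, statement_replica, row_replica = {}, {}, {}

# The leader executes the statement with its own random state.
run_statement(leader, random.Random(1))

# Statement-based: the replica re-executes the same statement, but its
# random state differs, so it computes a different value.
run_statement(statement_replica, random.Random(2))

# Row-based: the leader ships the resulting row value, not the statement.
row_change = ("token", leader["token"])
row_replica[row_change[0]] = row_change[1]

statement_diverged = statement_replica != leader
row_matches = row_replica == leader
```

Re-executing the statement leaves the replica with a different value from the leader, whereas shipping the row-level change reproduces the leader's state exactly.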
 
== Disk storage replication ==