Talk:Three-phase commit protocol: Difference between revisions

{{WikiProject banner shell|class=Start|
{{WikiProject Computing|importance=Low}}
{{WikiProject Databases|importance=Mid}}
{{WikiProject Software|importance=Low}}
}}
I have removed large parts of the article and corrected the citation of the seminal report by Skeen, which was incorrectly a reference to a follow-up theoretical analysis by Skeen and Stonebraker.
The claims made in the previous version of the Wikipedia article were theoretically unsound, and I wasted quite some time trying to convince myself otherwise, until I finally gave up and read the original papers. Contrary to what the article previously claimed, it is not possible to place an upper bound on the time it takes to resolve a distributed transaction without violating the basic soundness criterion. Such a bound would imply that one could solve the two generals problem in finite time. Indeed, it wasn't hard to find an example of a network partition in which the timeout-based protocol causes one cohort to commit and another to abort the same transaction. This defeats the whole point of the protocol, as it is then no better than just sending the transaction to all cohorts and hoping for the best.
 
I haven't replaced the description with a better one. The seminal technical report by Skeen is publicly available and very readable, and I don't think I can describe it any better than he does. Note in particular that his description does not involve the use of timeouts at all: it is a [[Quorum_(distributed_computing)|quorum]]-based algorithm, and timeouts would be an implementation detail used to detect failures.
[[User:UlrikRasmussen|Ulrik Rasmussen]] ([[User talk:UlrikRasmussen|talk]]) 08:52, 10 October 2019 (UTC)
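
The partition scenario mentioned in the comment above can be sketched roughly as follows. This is only an illustration, not Skeen's protocol: the timeout rules encoded here (abort from the waiting state, commit from the pre-committed state) are an assumption about the timeout-based variant and are not spelled out in the comment, and the cohort names, state names and function names are hypothetical.

```python
from enum import Enum


class State(Enum):
    WAITING = "waiting"            # voted yes, no pre-commit seen yet
    PRECOMMITTED = "precommitted"  # pre-commit received and acknowledged
    COMMITTED = "committed"
    ABORTED = "aborted"


def on_timeout(state):
    """Assumed timeout rule of the timeout-based variant (not from Skeen's report)."""
    if state is State.WAITING:
        return State.ABORTED
    if state is State.PRECOMMITTED:
        return State.COMMITTED
    return state  # already decided, nothing changes


def run_partition_scenario():
    # Both hypothetical cohorts have voted yes and wait for the coordinator.
    cohorts = {"A": State.WAITING, "B": State.WAITING}

    # The partition happens while pre-commit is being sent: only A receives it.
    reachable = {"A"}
    for name in cohorts:
        if name in reachable:
            cohorts[name] = State.PRECOMMITTED

    # Nobody hears from the coordinator again, so every cohort times out.
    return {name: on_timeout(state) for name, state in cohorts.items()}


if __name__ == "__main__":
    for name, state in run_partition_scenario().items():
        print(name, state.value)
```

Under these assumed rules cohort A ends up committed and cohort B ends up aborted, which is the kind of disagreement the comment describes.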
 
----
The protocol currently presented on the page conforms to the Skeen article, which differs slightly from the description given at
[http://courses.cs.vt.edu/~cs5204/fall00/distributedDBMS/sreenu/3pc.html]. Specifically, in the original article the cohort moves from prepared to committed only on receiving a commit message from the coordinator. Was the protocol changed in the meantime?
----
I reformatted the protocol description at the bottom of the page to look similar to [[two-phase_commit]]. Hope nobody minds. [[User:Gba|gba]] 18:56, 4 March 2006 (UTC)
 
----
Ah, I only understood 3PC after reading the description in Tanenbaum's book. Both the picture and the description contain fundamental mistakes.

1. Picture: on the coordinator's side it says "Finalizing commit. Timeout causes abort". The coordinator MUST commit, because the cohorts are committed.
2. Coordinator's actions, item #3: "However if the coordinator times out while waiting for an acknowledgement from a cohort, it will abort the transaction." This is also invalid, because after the prepared state the whole system has no way back. The transaction can only be committed, sooner or later.

The basic idea of the algorithm:

1. The coordinator and the cohorts change phases together, and only after all participants have entered the previous phase.

2. There is a "point of no return", after which the transaction can no longer be rolled back, only committed. If a participant fails after that point, it commits the transaction later, when it recovers.
 
[[User:Shmuma|Shmuma]] ([[User talk:Shmuma|talk]]) 10:48, 23 April 2010 (UTC)
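
The "point of no return" reading in the comment above could be written down as a small transition table. This is a minimal sketch of that reading only, not of the article's protocol or of Skeen's report: the state names, the table and both function names are hypothetical, and what happens on restart before the point of no return is deliberately left undecided because the comment does not say.

```python
# Hypothetical participant states and the transitions allowed between them,
# following the reading that nothing can abort once "prepared" is reached.
ALLOWED = {
    "initial": {"waiting", "aborted"},
    "waiting": {"prepared", "aborted"},
    "prepared": {"committed"},  # point of no return: abort is no longer allowed
    "committed": set(),
    "aborted": set(),
}

# States at or past the point of no return.
NO_RETURN_STATES = {"prepared", "committed"}


def can_transition(src, dst):
    """Return True if moving from src to dst is allowed by the table above."""
    return dst in ALLOWED[src]


def outcome_after_restart(logged_state):
    """Outcome for a participant that failed and restarts from logged_state.

    Past the point of no return the only possible outcome is a commit.
    Before that point the comment gives no rule, so None is returned.
    """
    if logged_state in NO_RETURN_STATES:
        return "committed"
    return None


if __name__ == "__main__":
    print(can_transition("waiting", "aborted"))
    print(can_transition("prepared", "aborted"))
    print(outcome_after_restart("prepared"))
```

In this table a timeout in the prepared state cannot lead to an abort, which is the objection raised against the figure and against item #3.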
 
== Atomicity reliability ==
- It appears that the figure is still inconsistent with the text. In particular, the figure seems to indicate that, for a cohort that has ACK'd a pre-commit but not received a do-commit, a timeout will cause a commit to take place. However the text says "In the prepared state, if the cohort receives an abort message from the coordinator, fails, '''or times out waiting for a commit, it aborts.'''"
[[Special:Contributions/98.212.216.20|98.212.216.20]] ([[User talk:98.212.216.20|talk]]) 18:38, 22 April 2008 (UTC)
 
== modes of failure ==
 
I wanted to use this article as a brief introduction to the kinds of problems that must
be considered in distributed consensus, but was disappointed by the brevity of the
explanation of how this is an improvement over the two-phase commit. I think the
discussion is fine as a definition for those already familiar with the domain, but
needs a little more justification for pedagogical use. I will take a shot at this,
and would welcome improvement from anyone.
 
[[User:MarkKampe|MarkKampe]] ([[User talk:MarkKampe|talk]]) 18:55, 13 March 2010 (UTC)
 
== Unreferenced quote ==
 
This passage:
 
"Three-phase commit assumes a network with bounded delay and nodes with bounded response times; In most practical systems with unbounded network delay and process pauses, it cannot guarantee atomicity."
 
is a direct quote from Martin Kleppmann's ''Designing Data-Intensive Applications'', p. 359.
 
I am not a regular contributor, so I am not sure what the right approach is to fix it.