Talk:Three-phase commit protocol

{{WikiProject banner shell|class=Start|
{{WikiProject Computing|importance=Low}}
{{WikiProject Databases|importance=Mid}}
{{WikiProject Software|importance=Low}}
}}
I have removed large parts of the article and corrected the citation of the seminal report by Skeen, which was incorrectly a reference to a follow-up theoretical analysis by Skeen and Stonebraker.
The claims made in the previous version of the Wikipedia article were theoretically unsound. I wasted quite some time trying to convince myself otherwise, until I finally gave up and read the original papers. The article previously claimed that there is an upper bound on the time it takes to resolve a distributed transaction, but no such bound can be placed without violating the basic soundness criterion: it would imply that one could solve the two generals problem in finite time. Indeed, it wasn't hard to find an example of a network partition in which the timeout-based protocol causes one cohort to commit and another to abort the same transaction. This defeats the whole point of the protocol, since it is then no better than just sending the transaction to all cohorts and hoping for the best.
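
To make such a counterexample concrete, here is a minimal Python sketch. It assumes the timeout rules attributed to the previous version of the article elsewhere on this page: a cohort that times out before receiving pre-commit aborts, a cohort that times out after acknowledging pre-commit commits, and a coordinator that times out waiting for an acknowledgement aborts. The class and function names are hypothetical, and this is not Skeen's protocol. The scenario is a partition {A} | {coordinator, B} that occurs after both cohorts have received pre-commit but before A's acknowledgement reaches the coordinator.

```python
from enum import Enum


class State(Enum):
    WAITING = "waiting"      # voted yes, pre-commit not yet received
    PREPARED = "prepared"    # pre-commit received and acknowledged
    COMMITTED = "committed"
    ABORTED = "aborted"


class Cohort:
    """A cohort following the assumed timeout rules (not Skeen's protocol)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.WAITING  # assume every cohort has already voted yes

    def on_precommit(self) -> None:
        if self.state is State.WAITING:
            self.state = State.PREPARED

    def on_abort(self) -> None:
        if self.state in (State.WAITING, State.PREPARED):
            self.state = State.ABORTED

    def on_timeout(self) -> None:
        # Assumed rules: a timeout before pre-commit aborts,
        # a timeout after pre-commit commits.
        if self.state is State.WAITING:
            self.state = State.ABORTED
        elif self.state is State.PREPARED:
            self.state = State.COMMITTED


def partition_scenario() -> dict[str, State]:
    """Partition {A} | {coordinator, B} after pre-commit reached both cohorts."""
    a, b = Cohort("A"), Cohort("B")
    a.on_precommit()
    b.on_precommit()
    # B's ACK arrives, A's ACK is lost in the partition.
    # Assumed coordinator rule: timing out on an ACK aborts the transaction.
    coordinator = State.ABORTED
    b.on_abort()    # B is on the coordinator's side and receives the abort
    a.on_timeout()  # A hears nothing and times out in PREPARED
    return {"coordinator": coordinator, "A": a.state, "B": b.state}


if __name__ == "__main__":
    outcome = partition_scenario()
    print({name: state.value for name, state in outcome.items()})
    # prints {'coordinator': 'aborted', 'A': 'committed', 'B': 'aborted'}
```

Each rule looks locally reasonable, yet A commits while B aborts, because a timeout cannot distinguish a crashed peer from an unreachable one.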
 
I haven't replaced the description with a better one. The seminal technical report by Skeen is publicly available and very readable, and I don't think I can describe it any better than he does. Note in particular that his description does not involve the use of timeouts at all: it is a [[Quorum_(distributed_computing)|quorum]]-based algorithm, and timeouts would be an implementation detail used to detect failures.
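
Below is a minimal Python sketch of why a quorum rule, rather than a timeout, is what prevents a split decision. It assumes a weighted-vote scheme: each site holds some votes, committing requires a commit quorum of at least V_C votes, and aborting requires an abort quorum of at least V_A votes, out of a total of V. These symbols and the function name are illustrative assumptions, not taken from the report. The function checks by brute force that no group of sites able to commit is disjoint from a group able to abort, which is guaranteed whenever V_C + V_A > V.

```python
from itertools import combinations


def quorums_always_intersect(votes: dict[str, int], v_commit: int, v_abort: int) -> bool:
    """Return True if no commit quorum can be disjoint from an abort quorum.

    `votes` maps a (hypothetical) site name to its number of votes.
    """
    sites = list(votes)
    total = sum(votes.values())
    for size in range(len(sites) + 1):
        for group in combinations(sites, size):
            inside = sum(votes[s] for s in group)
            outside = total - inside
            # Split the sites into `group` and the rest: if one side alone can
            # reach the commit quorum and the other side alone the abort
            # quorum, a partition could produce both decisions.
            if inside >= v_commit and outside >= v_abort:
                return False
    return True


if __name__ == "__main__":
    sites = {"s1": 1, "s2": 1, "s3": 1}
    print(quorums_always_intersect(sites, v_commit=2, v_abort=2))  # True: 2 + 2 > 3
    print(quorums_always_intersect(sites, v_commit=2, v_abort=1))  # False: {s1, s2} vs {s3}
```

In such a scheme, a group of sites that can reach neither quorum cannot decide and must wait; timeouts would then only serve to suspect failures, never to decide the outcome.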
[[User:UlrikRasmussen|Ulrik Rasmussen]] ([[User talk:UlrikRasmussen|talk]]) 08:52, 10 October 2019 (UTC)
 
----
The protocol presented on the page at present conforms to the Skeen article, which differs slightly from the description given at
[http://courses.cs.vt.edu/~cs5204/fall00/distributedDBMS/sreenu/3pc.html]. Specifically, in the original article the cohort's transition from prepared to committed happens only when it receives a commit message from the coordinator. Was there a change to the protocol in the meantime?
 
Agreed, and it's definitely not a slight discrepancy; the description of the cohort states matches neither the diagram shown nor the state diagram in the source material. I'd rather someone more knowledgeable about the subject matter commits a change, though. --[[Special:Contributions/130.15.80.105|130.15.80.105]] ([[User talk:130.15.80.105|talk]]) 16:07, 31 March 2009 (UTC)
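
For comparison, here is a minimal Python sketch of a cohort state machine in which, as described above for the Skeen article, the cohort moves from prepared to committed only on an explicit commit message from the coordinator. The state letters and event names are assumptions made for illustration, and the table is not copied from the report; what happens after a failure is left to a separate recovery procedure that this sketch does not model.

```python
from enum import Enum


class CohortState(Enum):
    INITIAL = "q"
    WAIT = "w"
    PREPARED = "p"
    COMMITTED = "c"
    ABORTED = "a"


# (current state, event) -> next state. Events are the cohort's own vote or
# messages from the coordinator. Assumed table: there is deliberately no
# "timeout" entry, and PREPARED is left only via an explicit "commit".
TRANSITIONS = {
    (CohortState.INITIAL, "vote_yes"): CohortState.WAIT,
    (CohortState.INITIAL, "vote_no"): CohortState.ABORTED,
    (CohortState.WAIT, "precommit"): CohortState.PREPARED,
    (CohortState.WAIT, "abort"): CohortState.ABORTED,
    (CohortState.PREPARED, "commit"): CohortState.COMMITTED,
}


def step(state: CohortState, event: str) -> CohortState:
    """Apply one event; events not in the table leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = CohortState.INITIAL
    for event in ["vote_yes", "precommit", "timeout", "commit"]:
        state = step(state, event)
        print(event, "->", state.name)
    # vote_yes -> WAIT, precommit -> PREPARED,
    # timeout -> PREPARED, commit -> COMMITTED
```

The "timeout" event leaves a prepared cohort where it is, which is the behaviour the diagram and the text on the article page disagree about.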
 
----
I reformatted the protocol description at the bottom of the page to look similar to [[two-phase_commit]]. Hope nobody minds. [[User:Gba|gba]] 18:56, 4 March 2006 (UTC)
 
----
I only understood 3PC after reading the description in Tanenbaum's book. Both the picture and the description contain fundamental mistakes.

1. Picture: on the coordinator's side, "Finalizing commit. Timeout causes abort". The coordinator MUST commit, because the cohorts have committed.
2. Coordinator's action, item #3: "However if the coordinator times out while waiting for an acknowledgement from a cohort, it will abort the transaction." This is also invalid: once the prepared state is reached, the whole system has no way back. The transaction can only be committed, sooner or later.
 
The basic idea of the algorithm:

1. The coordinator and the cohorts change phase together, and only after all participants have entered the previous phase.

2. There is a "point of no return" after which the transaction cannot be rolled back, only committed. If a participant fails after that point, it commits the transaction later, when it recovers.
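
A minimal Python sketch of the coordinator side of these two points follows. It is an illustration of the idea, not code taken from Tanenbaum's book, and the phase names and reply strings are hypothetical. It assumes crash failures only, where a timeout reliably means the cohort has crashed (no network partitions), and that a crashed cohort commits when it recovers; that assumption is exactly what the first comment on this page and the Kleppmann quote call into question.

```python
from enum import Enum


class Phase(Enum):
    VOTING = 1
    PRECOMMIT = 2
    COMMITTED = 3
    ABORTED = 4


def advance(phase: Phase, replies: dict[str, str], cohorts: set[str]) -> Phase:
    """One coordinator step.

    `replies` maps a cohort name to "yes", "no", "ack" or "timeout";
    a cohort missing from `replies` has simply not answered yet.
    """
    if phase is Phase.VOTING:
        if all(replies.get(c) == "yes" for c in cohorts):
            return Phase.PRECOMMIT  # every cohort has entered the previous phase
        if any(replies.get(c) in ("no", "timeout") for c in cohorts):
            return Phase.ABORTED    # still before the point of no return
        return phase                # keep waiting
    if phase is Phase.PRECOMMIT:
        # Past the point of no return: every cohort voted yes, so a timeout
        # (assumed to mean a crashed cohort) leads to commit, never to abort.
        if all(replies.get(c) in ("ack", "timeout") for c in cohorts):
            return Phase.COMMITTED
        return phase
    return phase                    # COMMITTED and ABORTED are final
```

For example, `advance(Phase.PRECOMMIT, {"A": "ack", "B": "timeout"}, {"A", "B"})` yields `Phase.COMMITTED`, whereas the same timeout during voting yields `Phase.ABORTED`.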
 
[[User:Shmuma|Shmuma]] ([[User talk:Shmuma|talk]]) 10:48, 23 April 2010 (UTC)
 
== Atomicity reliability ==
Thanks,
[[User:Nbeckman|Nels Beckman]] 14:28, 7 September 2006 (UTC)
 
Hello again after a long time!
First of all, I was speaking about the 3PC, as described by the state automata on the
[http://ei.cs.vt.edu/~cs5204/fall99/distributedDBMS/sreenu/3pc.html]. Notice the P1 (coordinator) and Pi (cohort) states. For one, a timeout leads to abort; for the other, it leads to commit. Isn't that wrong?
I didn't study your link, but this confuses me: "By the time any cohort can send an ACK, it has already been decided by the coordinator whether the transaction is committing or aborting". If the coordinator has already decided, what is the need for further ACKs or NACKs?
I believe the timeouts necessarily introduce a degree of uncertainty about whether both will actually abort or commit.
 
Regards,
[[User:Igorecan|Igorecan]] ([[User talk:Igorecan|talk]]) 00:20, 15 November 2008 (UTC)
 
== Figure ==
:Ah, good point. I'll update it sometime within the next few days. If I forget, feel free to send me an email to remind me. --[[User:Tjohns|Tjohns]] [[User_talk:Tjohns | ✎]] 18:46, 11 March 2008 (UTC)
::{{confirmed|Fixed.}} Feel free to let me know if any other changes need to be made. [[User:Tjohns|Tjohns]] [[User_talk:Tjohns | ✎]] 06:49, 21 March 2008 (UTC)
 
- It appears that the figure is still inconsistent with the text. In particular, the figure seems to indicate that, for a cohort that has ACK'd a pre-commit but not received a do-commit, a timeout will cause a commit to take place. However the text says "In the prepared state, if the cohort receives an abort message from the coordinator, fails, '''or times out waiting for a commit, it aborts.'''"
[[Special:Contributions/98.212.216.20|98.212.216.20]] ([[User talk:98.212.216.20|talk]]) 18:38, 22 April 2008 (UTC)
 
== modes of failure ==
 
I wanted to use this article as a brief introduction to the kinds of problems that must
be considered in distributed consensus, but was disappointed by the brevity of the
explanation of how this is an improvement over the two-phase commit. I think the
discussion is fine as a definition for those already familiar with the ___domain, but
needs a little more justification for pedagogical use. I will take a shot at this,
and would welcome improvement from anyone.
 
[[User:MarkKampe|MarkKampe]] ([[User talk:MarkKampe|talk]]) 18:55, 13 March 2010 (UTC)
 
== Unreferenced quote ==
 
This passage:
 
"Three-phase commit assumes a network with bounded delay and nodes with bounded response times; In most practical systems with unbounded network delay and process pauses, it cannot guarantee atomicity."
 
is a direct quote from Martin Kleppmann's ''Designing Data-Intensive Applications'', p. 359.
 
I am not a regular contributor and am not sure what the right way to fix this is.