Hopcroft–Karp algorithm: Difference between revisions

Content deleted Content added
AnomieBOT (talk | contribs)
m Dating maintenance tags: {{Reference needed}}
Ajmullen (talk | contribs)
m Explanation: Formatting.
 
(44 intermediate revisions by 24 users not shown)
Line 1:
{{Short description|Algorithm for maximum cardinality matching}}
{{Infobox algorithm|image = |data = [[Graph (data structure)|Graph]]|time = <math>O(E \sqrt V)</math>|class = Graph algorithm|space = <math>O(V)</math>}}In [[computer science]], the '''Hopcroft–Karp algorithm''' (sometimes more accurately called the '''Hopcroft–Karp–Karzanov algorithm''')<ref>{{harvtxt|Gabow|2017}}; {{harvtxt|Annamalai|2018}}</ref> is an [[algorithm]] that takes as input a [[bipartite graph]] and produces as output a [[Maximum cardinality matching|maximum cardinality matching]] – a set of as many edges as possible with the property that no two edges share an endpoint. It runs in <math>O(|E|\sqrt{|V|})</math> time in the [[worst case analysis|worst case]], where <math>E</math> is set of edges in the graph, <math>V</math> is set of vertices of the graph, and it is assumed that <math>|E|=\Omega(|V|)</math>. In the case of [[dense graph]]s the time bound becomes <math>O(|V|^{2.5})</math>, and for sparse [[random graph]]s it runs in near-linear (in |E|) time{{reference needed|date=April 2020}}.
{{Infobox algorithm|image = |data = [[Graph (data structure)|Graph]]|time = <math>O(E \sqrt V)</math>|class = Graph algorithm|space = <math>O(V)</math>}}
{{Infobox algorithm|image = |data = [[Graph (data structure)|Graph]]|time = <math>O(E \sqrt V)</math>|class = Graph algorithm|space = <math>O(V)</math>}}In [[computer science]], the '''Hopcroft–Karp algorithm''' (sometimes more accurately called the '''Hopcroft–Karp–Karzanov algorithm''')<ref>{{harvtxt|Gabow|2017}}; {{harvtxt|Annamalai|2018}}</ref> is an [[algorithm]] that takes as input a [[bipartite graph]] as input and produces as output a [[Maximum cardinality matching|maximum -cardinality matching]] as output — a set of as many edges as possible with the property that no two edges share an endpoint. It runs in <math>O(|E|\sqrt{|V|})</math> time in the [[worst case analysis|worst case]], where <math>E</math> is set of edges in the graph, <math>V</math> is set of vertices of the graph, and it is assumed that <math>|E|=\Omega(|V|)</math>. In the case of [[dense graph]]s the time bound becomes <math>O(|V|^{2.5})</math>, and for sparse [[random graph]]s it runs in near-lineartime <math>O(in |E|\log |V|)</math> timewith high probability.{{reference neededsfnp|date=April 2020Bast|Mehlhorn|Schäfer|Tamaki|2006}}.
 
The algorithm was founddiscovered by {{harvs|first1=John|last1=Hopcroft|author1-link=John Hopcroft|first2=Richard|last2=Karp|author2-link=Richard Karp|txt|year=1973}} and independently by {{harvs|first=Alexander|last=Karzanov|authorlink=Alexander V. Karzanov|year=1973|txt}}.{{sfnp|Dinitz|2006}} As in previous methods for matching such as the [[Hungarian algorithm]] and the work of {{harvtxt|Edmonds|1965}}, the Hopcroft–Karp algorithm repeatedly increases the size of a partial matching by finding ''augmenting paths''. These paths are sequences of edges of the graph, which alternate between edges in the matching and edges out of the partial matching, and where the initial and final edge are not in the partial matching. Finding an augmenting path allows us to increment the size of the partial matching, by simply toggling the edges of the augmenting path (putting in the partial matching those that were not, and vice versa). Simpler algorithms for bipartite matching, such as the [[Ford–Fulkerson algorithm]]‚ find one augmenting path per iteration: the HopkroftHopcroft-Karp algorithm instead finds a maximal set of shortest augmenting paths, so as to ensure that only <math>O(\sqrt{|V|})</math> iterations are needed instead of <math>O(|V|)</math> iterations. The same performance of <math>O(|E|\sqrt{|V|})</math> can be achieved to find maximum -cardinality matchings in arbitrary graphs, with the more complicated algorithm of Micali and Vazirani<ref>.{{Cite journalsfnp|last=Peterson|first=Paul A.|last2=Loui|first2=Michael C.|date=1988-11-01|title=The general maximum matching algorithm of micali and vazirani|url=https://doi.org/10.1007/BF01762129|journal=Algorithmica|language=en|volume=3|issue=1|pages=511–533|doi=10.1007/BF01762129|issn=1432-0541}}</ref>.
 
The Hopcroft–Karp algorithm can be seen as a special case of [[Dinic's algorithm]] for the [[maximum-flow problem]].{{sfnp|Tarjan|1983|p=102}}
 
==Augmenting paths==
Line 10 ⟶ 14:
Conversely, suppose that a matching <math>M</math> is not optimal, and let <math>P</math> be the symmetric difference <math>M \oplus M^*</math> where <math>M^*</math> is an optimal matching. Because <math>M</math> and <math>M^*</math> are both matchings, every vertex has degree at most 2 in <math>P</math>. So <math>P</math> must form a collection of disjoint cycles, of paths with an equal number of matched and unmatched edges in <math>M</math>, of augmenting paths for <math>M</math>, and of augmenting paths for <math>M^*</math>; but the latter is impossible because <math>M^*</math> is optimal. Now, the cycles and the paths with equal numbers of matched and unmatched vertices do not contribute to the difference in size between <math>M</math> and <math>M^*</math>, so this difference is equal to the number of augmenting paths for <math>M</math> in <math>P</math>. Thus, whenever there exists a matching <math>M^*</math> larger than the current matching <math>M</math>, there must also exist an augmenting path. If no augmenting path can be found, an algorithm may safely terminate, since in this case <math>M</math> must be optimal.
 
An augmenting path in a matching problem is closely related to the [[augmenting path]]s arising in [[maximum flow problem]]s, paths along which one may increase the amount of flow between the terminals of the flow. It is possible to transform the bipartite matching problem into a maximum flow instance, such that the alternating paths of the matching problem become augmenting paths of the flow problem. It suffices to insert two vertices, source and sink, and insert edges of unit capacity from the source to each vertex in <math>U</math>, and from each vertex in <math>V</math> to the sink; and let edges from <math>U</math> to <math>V</math> have unit capacity.<ref>{{harvtxt|Ahuja|Magnanti|Orlin|1993}}, section 12.3, bipartite cardinality matching problem, pp. 469–470.</ref> In fact, aA generalization of the technique used in Hopcroft–Karp algorithm to arbitraryfind maximum flow networksin an arbitrary network is known as [[Dinic's algorithm]].
 
==Algorithm==
Line 27 ⟶ 31:
* A [[breadth-first search]] partitions the vertices of the graph into layers. The free vertices in <math>U</math> are used as the starting vertices of this search and form the first layer of the partitioning. At the first level of the search, there are only unmatched edges, since the free vertices in <math>U</math> are by definition not adjacent to any matched edges. At subsequent levels of the search, the traversed edges are required to alternate between matched and unmatched. That is, when searching for successors from a vertex in <math>U</math>, only unmatched edges may be traversed, while from a vertex in <math>V</math> only matched edges may be traversed. The search terminates at the first layer <math>k</math> where one or more free vertices in <math>V</math> are reached.
* All free vertices in <math>V</math> at layer <math>k</math> are collected into a set <math>F</math>. That is, a vertex <math>v</math> is put into <math>F</math> if and only if it ends a shortest augmenting path.
* The algorithm finds a maximal set of ''vertex disjoint'' augmenting paths of length <math>k</math>. (''Maximal'' means that no more such paths can be added. This is different from finding the ''maximum'' number of such paths, which would be harder to do. Fortunately, it is sufficient here to find a maximal set of paths.) This set may be computed by [[depth -first search]] (DFS) from <math>F</math> to the free vertices in <math>U</math>, using the breadth first layering to guide the search: the DFS is only allowed to follow edges that lead to an unused vertex in the previous layer, and paths in the DFS tree must alternate between matched and unmatched edges. Once an augmenting path is found that involves one of the vertices in <math>F</math>, the DFS is continued from the next starting vertex. Any vertex encountered during the DFS can immediately be marked as used, since if there is no path from it to <math>U</math> at the current point in the DFS, then that vertex can't be used to reach <math>U</math> at any other point in the DFS. This ensures <math>O(|E|)</math> running time for the DFS. It is also possible to work in the other direction, from free vertices in <math>U</math> to those in <math>V</math>, which is the variant used in the pseudocode.
* Every one of the paths found in this way is used to enlarge <math>M</math>.
 
Line 33 ⟶ 37:
 
==Analysis==
Each phase consists of a single breadth first search and a single depth -first search. Thus, a single phase may be implemented in <math>O(|E|)</math> time.
Therefore, the first <math>\sqrt{|V|}</math> phases, in a graph with <math>|V|</math> vertices and <math>|E|</math> edges, take time <math>O(|E|\sqrt{|V|})</math>.
 
Line 40 ⟶ 44:
Since the algorithm performs a total of at most <math>2\sqrt{|V|}</math> phases, it takes a total time of <math>O(|E|\sqrt{|V|})</math> in the worst case.
 
In many instances, however, the time taken by the algorithm may be even faster than this worst case analysis indicates. For instance, in the [[average case analysis|average case]] for [[sparse graph|sparse]] bipartite [[random graph]]s, {{harvtxt|Bast|Mehlhorn|SchaferSchäfer|Tamaki|2006}} (improving a previous result of {{harvnb|Motwani|1994}}) showed that with high probability all non-optimal matchings have augmenting paths of [[logarithm]]ic length. As a consequence, for these graphs, the Hopcroft–Karp algorithm takes <math>O(\log |V|)</math> phases and <math>O(|E| \log |V|)</math> total time.
 
==Comparison with other bipartite matching algorithms==
Line 54 ⟶ 58:
/*
G = U ∪ V ∪ {NIL}
where U and V are partitionthe left and right sides of the bipartite graph and NIL is a special null vertex
*/
Line 102 ⟶ 106:
 
=== Explanation ===
Let the vertices of our graph havebe twopartitioned partitionsin <code>U</code> and <code>V</code>, and consider a partial matching, as indicated by the <code>Pair_U</code> and <code>Pair_V</code> tables that contain the one vertex to which each vertex of <code>U</code> and of <code>V</code> is matched, or <code>NIL</code> for unmatched vertices. The key idea is to add two dummy vertices on each side of the graph: uDummy connecting itconnected to all unmatched vertices in <code>U</code> and vDummy connectingconnected to all unmatched vertices in <code>V</code>. Now, if we run a [[breadth-first search]] (BFS) from uDummy to vDummy then we can get shortestthe pathpaths betweenof anminimal length that connect currently unmatched vertexvertices in <code>U</code> to currently unmatched vertexvertices in <code>V</code>. DueNote tothat, bipartite nature ofas the graph is bipartite, thisthese pathpaths wouldalways zigalternate zagbetween fromvertices in <code>U</code> toand vertices in <code>V.</code>, Howeverand we needrequire toin makeour sureBFS that when going from <code>V</code> to <code>U</code>, we always select a matched edge. If therewe isreach noan matchedunmatched edgevertex of <code>V</code>, then we end at vDummy. Ifand wethe makesearch surefor ofpaths thisin criteriathe duringBFS terminate. To summarize, the BFS starts at unmatched vertices in <code>U</code>, goes to all their neighbors in <code>V</code>, if all are matched then it goes back to the generatedvertices pathin would<code>U</code> meetto which all these vertices are matched (and which were not visited before), then it goes to all the criterianeighbors forof beingthese anvertices, augmentedetc., shortestuntil pathone of the vertices reached in <code>V</code> is unmatched.
 
Observe in particular that BFS marks the unmatched nodes of <code>U</code> with distance 0, then increments the distance every time it comes back to <code>U</code>. This guarantees that the paths considered in the BFS are of minimal length to connect unmatched vertices of <code>U</code> to unmatched vertices of <code>V</code> while always going back from <code>V</code> to <code>U</code> on edges that are currently part of the matching. In particular, the special <code>NIL</code> vertex, which corresponds to vDummy, then gets assigned a finite distance, so the BFS function returns true iff some path has been found. If no path has been found, then there are no augmenting paths left and the matching is maximal.
 
OnceIf weBFS havereturns foundtrue, thethen augmentedwe shortestcan path,go weahead wantand toupdate makethe surepairing wefor ignorevertices anyon otherthe minimal-length paths thatfound arefrom longer<code>U</code> thanto this<code>V</code>: shortestwe paths.do BFSso algorithmusing marksa nodes[[depth-first insearch]] path(DFS). withNote distancethat witheach sourcevertex beingin 0.<code>V</code> Thus,on aftersuch doinga BFSpath, weexcept canfor startthe atlast eachone, unmatchedis vertexcurrently inmatched. U,So followwe thecan pathexplore bywith followingthe nodesDFS, withmaking distancesure that incrementsthe bypaths 1. Whenthat we finallyfollow arrivecorrespond atto the destinationdistances vDummy,computed ifin itsthe distanceBFS. isWe 1update morealong thanevery lastsuch nodepath inby Vremoving then we know thatfrom the pathmatching weall followed is (oneedges of the possibly many) shortest path. In that caseare wecurrently canin gothe aheadmatching, and updateadding to the pairingmatching forall verticesedges onof the path inthat Uare andcurrently V.not Notein thatthe eachmatching: vertexas inthis Vis onan augmenting path, except(the forfirst theand last one,edges isof non-freethe vertex.path Sowere updatingnot pairingpart forof thesethe verticesmatching, inand Vthe topath differentalternated verticesbetween inmatched Uand isunmatched equivalentedges), tothen removingthis previouslyincreases matchedthe edgenumber andof addingedges newin unmatched edge inthe matching. This is same as doingreplacing the symmetriccurrent differencematching (i.e.by removethe edgessymmetric commondifference tobetween the previouscurrent matching and addthe non-common edges in augmentedentire path in new matching)..
 
HowNote dothat wethe makecode sureensures augmentedthat all augmenting paths that we consider are vertex disjoint?. It isIndeed, already guaranteed: Afterafter doing the symmetric difference for a path, none of its vertices could be considered again in the DFS, just because the <code>Dist[ Pair_V[v] ]</code> will not be equal to <code>Dist[u] + 1</code> (Itit would be exactly <code>Dist[u]</code>).
 
Also observe that the DFS does not visit the same vertex multiple times. This is thanks to the following lines:
So what is the mission of these two lines in pseudocode?:
Dist[u] = ∞
return false
When we were not able to find any shortest augmentedaugmenting path from a vertex <code>u</code>, then the DFS returnsmarks false.vertex In<code>u</code> thisby casesetting it<code>Dist[u]</code> wouldto beinfinity, goodso to markthat these vertices toare not to visit themvisited again. This marking is simply done by setting Dist[u] to infinity.
 
Finally,One last observation is that we actually don't need uDummy: becauseits it'srole thereis justsimply to put all unmatched vertices of <code>U</code> in the queue when BFS starts. That we canstart dothe as just as initializationBFS. The vDummy can be appended in UAs for convenience in many implementations and initialize default pairing for all V to point to vDummy. That way, ifit finalis vertexdenoted inas V doesn't have any matching vertex<code>NIL</code> in U then we finally end at vDummy which is the end of our augmented path. Inpseudocode above pseudocode vDummy is denoted as Nil.
 
== See also ==
* [[Maximum cardinality matching]], the problem solved by the algorithm, and its generalization to non-bipartite graphs
* [[Bipartite matching]]
* [[Assignment problem]], a generalization of this problem on [[weighted graphs]], solved e.g. by the [[Hungarian algorithm]]
* [[Hungarian algorithm]]
* [[Edmonds–Karp algorithm]] for finding maximum flow, a generalization of the Hopcroft–Karp algorithm
* [[Assignment problem]]
* [[Edmonds–Karp algorithm]] for finding maximum flow
 
==Notes==
Line 125 ⟶ 130:
 
==References==
{{refbegin}}
*{{citation|first1=Ravindra K.|last1=Ahuja|author1-link=Ravindra K. Ahuja|first2=Thomas L.|last2=Magnanti|author2-link=Thomas L. Magnanti|first3=James B.|last3=Orlin|author3-link=James B. Orlin|title=Network Flows: Theory, Algorithms and Applications|publisher=Prentice-Hall|year=1993}}.
*{{citation|first1=H.|last1=Alt|first2=N.|last2=Blum|first3=K.|last3=Mehlhorn|author3-link=Kurt Mehlhorn|first4=M.|last4=Paul|title=Computing a maximum cardinality matching in a bipartite graph in time <math>\scriptstyle O\left(n^{1.5}\sqrt{\frac{m}{\log n}}\right)</math>|journal=Information Processing Letters|volume=37|issue=4|pages=237–240|year=1991|doi=10.1016/0020-0190(91)90195-N}}.
*{{citation|last=Annamalai|first=Chidambaram|doi=10.1007/s00493-017-3567-2|issue=6|journal=Combinatorica|mr=3910876|pages=1285–1307|title=Finding perfect matchings in bipartite hypergraphs|volume=38|year=2018|arxiv=1509.07007|s2cid=1997334}}
*{{citation
*{{citation|first1=Holger|last1=Bast|author1-link=Hannah Bast|first2=Kurt|last2=Mehlhorn|author2-link=Kurt Mehlhorn|first3=Guido|last3=Schafer|first4=Hisao|last4=Tamaki|year=2006|title= Matching algorithms are fast in sparse random graphs|journal=Theory of Computing Systems|pages=3–14|volume=39|issue=1|doi=10.1007/s00224-005-1254-y}}.
| last1 = Bast | first1 = Holger
| last2 = Mehlhorn | first2 = Kurt
| last3 = Schäfer | first3 = Guido
| last4 = Tamaki | first4 = Hisao
| doi = 10.1007/s00224-005-1254-y
| issue = 1
| journal = Theory of Computing Systems
| mr = 2189556
| pages = 3–14
| title = Matching algorithms are fast in sparse random graphs
| volume = 39
| year = 2006| s2cid = 9321036
| citeseerx = 10.1.1.395.6643
}}
*{{citation|first1=S. Frank|last1=Chang|first2=S. Thomas|last2=McCormick|title=A faster implementation of a bipartite cardinality matching algorithm|publisher=Tech. Rep. 90-MSC-005, Faculty of Commerce and Business Administration, Univ. of British Columbia|year=1990}}. As cited by {{harvtxt|Setubal|1996}}.
*{{citation|first=Kenneth|last=Darby-Dowman|title=The exploitation of sparsity in large scale linear programming problems – Data structures and restructuring algorithms|publisher=Ph.D. thesis, Brunel University |year=1980}}. As cited by {{harvtxt|Setubal|1996}}.
*{{citation|last=Dinitz|first=Yefim|editor1-last=Goldreich|editor1-first=Oded|editor1-link=Oded Goldreich |editor2-last=Rosenberg|editor2-first=Arnold L.|editor2-link=Arnold L. Rosenberg|editor3-last=Selman |editor3-first=Alan L. |editor3-link=Alan Selman|contribution=Dinitz' Algorithm: The Original Version and Even's Version|url=https://www.cs.bgu.ac.il/~dinitz/Papers/Dinitz_alg.pdf|doi=10.1007/11685654_10|___location=Berlin and Heidelberg|pages=218–240|publisher=Springer |series=Lecture Notes in Computer Science |title=Theoretical Computer Science: Essays in Memory of Shimon Even|volume=3895|year=2006|isbn=978-3-540-32880-3 }}.
*{{citation | doi = 10.4153/CJM-1965-045-4 | last = Edmonds | first = Jack | author-link = Jack Edmonds | journal = Canadian Journal of Mathematics | pages = 449–467 | title = Paths, Trees and Flowers | volume = 17 | year = 1965 | mr=0177907|s2cid=18909734 |doi-access=free 0177907}}.
*{{citation|last=Gabow|first=Harold N.|author-link=Harold N. Gabow|doi=10.3233/FI-2017-1555|issue=1–4|journal=Fundamenta Informaticae|mr=3690573|pages=109–130|title=The weighted matching approach to maximum cardinality matching|volume=154|year=2017|arxiv=1703.03998|s2cid=386509}}
*{{citation|first1=Harold N.|last1=Gabow|author1-link=Harold N. Gabow|first2=Robert E.|last2=Tarjan|author2-link=Robert Tarjan|title=Faster scaling algorithms for general graph matching problems|journal=Journal of the ACM|volume=38|issue=4|year=1991|pages=815–853|doi=10.1145/115234.115366|s2cid=18350108|doi-access=free}}.
*{{citation|first1=John E.|last1=Hopcroft|author1-link=John Hopcroft|first2=Richard M.|last2=Karp|author2-link=Richard Karp|title=An ''n''<sup>5/2</sup> algorithm for maximum matchings in bipartite graphs|journal=SIAM Journal on Computing|volume=2|issue=4|pages=225–231|year=1973|doi=10.1137/0202019}}. Previously announced at the 12th Annual Symposium on Switching and Automata Theory, 1971.
*{{citation|first=A. V.|last=Karzanov|authorlink=Alexander V. Karzanov|title=An exact estimate of an algorithm for fndingfinding a maximum flow, applied to the problem on representatives|journal=Problems in Cybernetics |volume=5 |pages=66–70|year=1973}}. Previously announced at the Seminar on Combinatorial Mathematics (Moscow, 1971).
*{{citation | last1 = Micali | first1 = S. | author1-link = Silvio Micali | last2 = Vazirani | first2 = V. V. | author2-link = Vijay Vazirani | contribution = An <math>\scriptstyle O(\sqrt{|V|}\cdot|E|)</math> algorithm for finding maximum matching in general graphs | doi = 10.1109/SFCS.1980.12 | pages = 17–27 | title = Proc. 21st IEEE Symp. Foundations of Computer Science | year = 1980| title-link = Symposium on Foundations of Computer Science |s2cid=27467816}}.
*{{citation | last1 = Peterson | first1 = Paul A. | last2 = Loui | first2 = Michael C. | title= The general maximum matching algorithm of Micali and Vazirani | doi = 10.1007/BF01762129 | pages = 511–533 | journal= [[Algorithmica]] | year date=November 1988 | volume=3|issue=1–4| citeseerx = 10.1.1.228.9625 |s2cid=16820 |issn=1432-0541}}.
*{{citation|first=Rajeev|last=Motwani|authorlink=Rajeev Motwani|title=Average-case analysis of algorithms for matchings and related problems|year=1994|journal=Journal of the ACM|volume=41|issue=6|pages=1329–1356|doi=10.1145/195613.195663|s2cid=2968208|doi-access=free}}.
*{{citation|last=Setubal|first=João C.|contribution=New experimental results for bipartite matching |title=Proc. Netflow93|publisher=Dept. of Informatics, Univ. of Pisa|pages=211–216|year=1993}}. As cited by {{harvtxt|Setubal|1996}}.
*{{citation|last=Setubal|first=João C.|title=Sequential and parallel experimental results with bipartite matching algorithms|publisher=Tech. Rep. IC-96-09, Inst. of Computing, Univ. of Campinas|year=1996 |citeseerx=10.1.1.48.3539}}.
*{{Cite book|last=Tarjan|first=Robert Endre|title=Data Structures and Network Algorithms|year=1983 |publisher=Society for Industrial and Applied Mathematics|isbn=978-0-89871-187-5|series=CBMS-NSF Regional Conference Series in Applied Mathematics|doi=10.1137/1.9781611970265}}
*{{citation|last=Vazirani|first=Vijay|title=An Improved Definition of Blossoms and a Simpler Proof of the MV Matching Algorithm|publisher= CoRR abs/1210.4594|year=2012|arxiv=1210.4594|bibcode=2012arXiv1210.4594V}}.
{{refend}}
 
{{DEFAULTSORT:Hopcroft-Karp algorithm}}
[[Category:Graph algorithms]]
[[Category:Matching (graph theory)]]