In computer science, the Hopcroft–Karp algorithm is an algorithm that finds maximum cardinality matchings in bipartite graphs. It runs in O(m √n) time in the worst case, where O invokes big O notation, m is the number of edges in the graph, and n is the number of vertices of the graph. In the case of dense graphs time bound becomes O(n5/2).
The algorithm was found by John Hopcroft and Richard Karp (1973). It increases the size of the matching by a sequence of augmenting paths, a standard technique in matching algorithms, but instead of finding just a single augmenting path, the algorithm finds a maximal set of shortest augmenting paths in each iteration. As a result only O(√n) iterations are needed.
Augmenting paths
A vertex that is not the endpoint of an edge in some partial matching M is called a free vertex. The basic concept that the algorithm relies on is that of an augmenting path, a path that starts at a free vertex, ends at a free vertex, and alternates between unmatched and matched edges within the path. If M is a matching of size n, and P is an augmenting path relative to M, then the symmetric difference of the two sets of edges, M ⊕ P, would form a matching with size n + 1. Thus, by finding augmenting paths, an algorithm may increase the size of the matching.
Conversely, suppose that a matching M is not optimal, and let P be the symmetric difference M ⊕ M* where M* is an optimal matching. Then P must form a collection of disjoint cycles and augmenting paths; the difference in size between M and M* is the number of paths in P. Thus, if no augmenting path can be found, an algorithm may safely terminate, since in this case M must be optimal.
An augmenting path in a matching problem is closely related to the augmenting paths arising in maximum flow problems, paths along which one may increase the amount of flow between the terminals of the flow. It is possible to transform the bipartite matching problem into a maximum flow instance, such that the alternating paths of the matching problem become augmenting paths of the flow problem.[1] The Hopcroft–Karp algorithm may therefore be seen as an adaptation of the Edmonds-Karp algorithm for maximum flow. The same technical can be generalized to problems of maximum flow on graphs with unit edge capacities, achieving a running time of O(min(m3/2, n2/3m) for these more general problems.[2]
Algorithm
Let U and V be the two sets in the bipartition of G, and let the matching from U to V at any time be represented as the set M.
The algorithm is run in phases. Each phase consists of the following steps.
- A breadth first search partitions the vertices of the graph into layers. The free vertices in U are used as the starting vertices of this search, and form the first layer of the partition. At the first level of the search, only unmatched edges may be traversed (since U is not adjacent to any matched edges); at subsequent levels of the search, restrict the traversed edges so that they alternate between unmatched and matched. Terminate the search at the first layer k where one or more free vertices in V are reached.
- All free vertices in V at layer k are collected into a set F. That is, a vertex v is put into F if and only if it ends a shortest augmenting path.
- The algorithm finds a maximal set of vertex disjoint augmenting paths of length k. This set may be computed by depth first search from F to the free vertices in U, using the breadth first layering to guide the search: the depth first search is only allowed to follow edges that lead to an unused vertex in the previous layer, and paths in the depth first search tree must alternate between unmatched and matched edges. Once an augmenting path is found that involves one of the vertices in F, the depth first search is continued from the next starting vertex.
- Every one of the paths found in this way is used to enlarge M.
The algorithm terminates when no more augmenting paths are found in the breadth first search part of one of the phases.
Analysis
Each phase consists of a single breadth first search and a single depth first search. Thus, a single phase may be implemented in linear time. Therefore, the first √n phases, in a graph with n vertices and m edges, take time O(m √n).
It can be shown that each phase increases the length of the shortest augmenting path by at least one: the phase finds a maximal set of augmenting paths of the given length, so any remaining augmenting path must be longer. Therefore, once the initial √n phases of the algorithm are complete, the shortest remaining augmenting path has at least √n edges in it. However, the symmetric difference of the eventual optimal matching and of the partial matching M found by the initial phases forms a collection of vertex-disjoint augmenting paths and alternating cycles. If each of the paths in this collection has length at least √n, there can be at most √n paths in the collection, and the size of the optimal matching can differ from the size of M by at most √n edges. Since each phase of the algorithm increases the size of the matching by at least one, there can be at most √n additional phases before the algorithm terminates.
Since the algorithm performs a total of at most 2√n phases, it takes a total time of O(m √n).
Notes
- ^ Ahuja, Magnanti & Orlin (1993), section 12.3, bipartite cardinality matching problem, pp. 469–470.
- ^ Ahuja, Magnanti & Orlin (1993), section 8.2, flows in unit capacity networks, pp. 252–255.
References
- Ahuja, Ravindra K.; Magnanti, Thomas L.; Orlin, James B. (1993), Network Flows: Theory, ALgorithms and Applications, Prentice-Hall.
- Hopcroft, John E.; Karp, Richard M. (1973), "An n5/2 algorithm for maximum matchings in bipartite graphs", SIAM Journal on Computing, 2 (4): 225–231, doi:10.1137/0202019.</ref>