Kuhn's algorithm for finding the greatest matching in a bipartite graph
Given bipartite graph containing nodes and edges. Required to find the maximum matching, ie, choose as many edges to the selected edge, none had a common vertex with any other selected edge.
Description of the algorithm
Necessary definitions
A matching is a set of pairwise non-adjacent edges of the graph (in other words, every vertex of the graph must be incident to no more than one of a plurality of ribs ). Power matching will call the number of edges in it. The greatest (or maximum) will be called a matching matching, whose power is maximum among all possible matchings in the
graph. All those vertices that have adjacent edges of the matching (ie, that have exactly one degree in the subgraph formed ), we call this a matching saturated.
Chain length will call some simple way (ie not containing repeating vertices or edges) containing the edges.
Alternating chain (in the bipartite graph, with respect to a matching) will be called a chain in which the edges are alternately belong / do not belong to matching.
Increasing the chain (in the bipartite graph, with respect to a matching) will be called an alternating chain, whose initial and final vertices do not belong to matching.
Theorem Berge
Wording . Matching the maximum if and only if there is no enhancing circuits relative thereto.
The proof of necessity . We show that if matching is maximal, then there is increased relative to a chain. Proof of this is constructive: we show you how to increase with increasing the chain using this power matching unit.
To do this, perform the so-called matching alternation along the chain . We recall that, by definition, the first edge chain does not belong to a matching, the second - belongs to the third - again does not belong to the fourth - owned, etc. Let's change the status of all the edges along the chain : those edges that are not included in the matching (the first, third and so on until the last) will include a matching and edges that were previously included in the matching (the second, fourth, and so on up to penultimate) - Removed from him.
It is understood that the power matching with increased by one (because it was added at one edge longer than the deleted). It remains to verify that we have built a correct matching, ie, that no vertex has no right of two adjacent edges of this matching. For all vertices
alternating chain , except the first and last, it follows from the very interleaving algorithm:
first, we have removed the top of each of these adjacent edges, then added. For the first and last vertex chain and nothing could be broken as to interleave they were to be
unsaturated. Finally, for all the other vertices, - not in the chain - obviously nothing has changed. Thus, we really built a matching, and per unit of greater power than the old one, which completes the proof of the necessity.
The proof of sufficiency . We prove that if a relatively matchings not increase the ways it - as much as possible.
The proof is by contradiction. Suppose there is a matching having more power than . Consider the symmetric difference of these two matchings, ie leave all edges belonging to , or in , but not both simultaneously.
Clearly, the set of edges - is certainly not matching. Consider what kind of a set of edges is; For convenience, we consider it as a graph. In this graph, each vertex obviously has degree 2 (because each node can have a maximum of two adjacent edges - one matching and from another). It is easy to understand that while this graph consists only of cycles or paths, with neither one nor the other does not intersect with each other.
Now, note that the path in this graph may not be any, but only of even length. In fact, in any path in the graph edges alternate: after the ribs of a rib comes from , and vice versa. Now, if we look at some way of odd length in the graph , it turns out that in the original graph it will increase the chain or for matching or for . But this could not be, because in the case of matching it contradicts with the condition, and in the case - with its maximum (as we have already proved the necessity of the theorem, which implies that the existence of increasing the matching circuit can not be maximized).
We now prove a similar assertion for cycles all cycles in the graph may have only chёtnuyu length. It is very easy to prove: it is clear that in the cycle edges should also alternate (owned by turns , then ), but this condition can not be executed in a cycle of odd length - in it there are certainly two adjacent edges of the matching one that contradicts the definition of matching.
Thus, all the way and the cycles of the graph are chёtnuyu
length. Therefore, the graph contains an equal number of edges of and from . But considering that contains all the edges and , except for their common edges, it follows that the power and the same. We have a contradiction: on the assumption matching was not maximal, then the theorem is proved.
Algorithm Kuhn
Kuhn's algorithm - a direct application of Theorem Berge. It can be briefly described as follows: first, take an empty matching, and then - until the graph fails to find a magnifying chain - will perform striping matching along the chain, and repeat the process of searching for increasing the chain. Once such a chain could not be found - the process stops - the current matching is maximum.
It remains to detail the method of finding increasing chains. The algorithm Kuhn - just looking for any of these circuits by means of bypass in depth or width . Kuhn's algorithm looks at all the vertices one by one, starting from each round, trying to find a magnifying circuit, starting at this vertex.
More convenient to describe this algorithm, assuming that the graph is already divided into two parts (in fact the algorithm can be implemented and so that he was not given to the input graph is clearly divided into two parts).
The algorithm looks at all the vertices of the first part of the graph: . If the current node is already saturated with a matching current (ie already selected some edge adjacent to it), then skip this vertex. Otherwise - the algorithm is trying to saturate this summit, which starts search for increasing the chain, starting from this node.
Search magnifying circuit by means of a special bypass in depth or width (usually for ease of implementation, use is bypassing depth). Originally bypass in depth is in the unsaturated
current top of the first part. View all the edges of this vertex, let the current edge - this edge . If the top is not saturated with a matching, it means that we could find magnifying circuit: it consists of a single edge ; in that case, just include it in the edge matching and stop increasing the search from the top of the chain . Otherwise - if already filled with some edge , then try to pass along this edge: thus we will try to find a magnifying circuit passing through the ribs , . To do this, simply move on to our crawl to the top - now we are trying to find a magnifying chain of this vertex.
One can understand that as a result of this tour, started from the top , or find magnifying circuit, and thus saturate the top , or as a magnifying circuit will not find (and, therefore, the vertex will not be able to become rich).
After all vertices will be reviewed, current matching is maximal.
Operation time
Thus, the algorithm can be represented as Kuhn series of launches bypass in depth / width over the entire graph. Consequently, all this algorithm is executed during that in the worst case there .
However, this estimate may be a bit better . It turns out that the algorithm Kuhn importantly, what proportion is selected for the first, and which - for the second. In fact, in the above implementation of starting a crawl depth / width of the peaks occur only the first part, so the whole algorithm is executed in a time where - the number of vertices of the first part. In the worst case it is (where - the number of vertices of the second part). This shows that it is cheaper, when the first share contains fewer vertices than the second. Very unbalanced graphs (when and very different), this translates into a significant difference since the work.
Implementation
We give here the implementation of the above algorithm, based on the bypass in depth, and the host in the form of a bipartite graph clearly broken into two parts of the graph. This implementation is very concise, and perhaps it is worth remembering in this form.
Here - the number of vertices in the first part, - in the second part, - a list of edges from the top of the first part (ie, a list of numbers of vertices, in which lead from these edges ). Tops in both lobes are numbered independently, ie, first share - with the numbers , the second - with the numbers .
Then there are two auxiliary array: and . First - - contains information about the current matching. For programming convenience, this information is contained only for the vertices of the second part: - is the number of vertices of the first part, connected by an edge with the top of the second part (or , if no matching edges of the leaves are not). The second array - - the usual array of "visited" vertices crawled deep (you need it, just to bypass the depth did not come at the same vertex twice).
Function - is bypassing the deep. It returns if she managed to find a magnifying chain from the top , it shall be deemed that this function has already made alternation matchings found along the chain.
Inside the function to view all edges emanating from the vertex of the first part, and then checked if this edge leads to the top of unsaturated , or if this node is full, but manages to find a magnifying circuit recursive run out , then we say that we have found a magnifying circuit, and before returning from the function with the result of producing alternating current in the edge: redirect edge adjacent to , at the top .
In the main program first indicates that the current matching - empty (the list is filled with numbers ). Then moved the top of the first part, and it starts from the bypass in
depth , pre-zeroing array .
It is worth noting that the size of the matching is easy to get the number of calls
in the main program, who returned result . Needless desired maximal matching in the array .
int n, k;
vector < vector<int> > g;
vector<int> mt;
vector<char> used;
bool try_kuhn (int v) {
if (used[v]) return false;
used[v] = true;
for (size_t i=0; i<g[v].size(); ++i) { int to = g[v][i];
if (mt[to] == -1 || try_kuhn (mt[to])) { mt[to] = v;
return true;
} }return false;
}
int main() {
... чтение графа ...
mt.assign (k, -1);
for (int v=0; v<n; ++v) { used.assign (n, false);
try_kuhn (v);
}
for (int i=0; i<k; ++i) if (mt[i] != -1)
printf ("%d %d\n", mt[i]+1, i+1);
}
Once again, that Kuhn's algorithm is easy to implement and so he worked on graphs, for which we know that they are bipartite, but clear their division into two parts found. In this
case, will have to abandon the easy division into two parts, and all the information stored for all vertices. For this array lists now specifies not only for the vertices of the first part, and for all vertices (of course, now the top of both lobes are numbered in total numbering - from up ). Arrays and now also determined for the vertices of both lobes and accordingly, they must be maintained in this state.
Improved implementation
We modify the algorithm is as follows. Before the main loop of the algorithm will find some simple algorithm arbitrary matching (a simple heuristic algorithm ), and only then will perform a series of function calls kuhn (), which will improve this matching. As a result, the algorithm will run much faster on random graphs - because in most graphs can easily dial matching sufficiently large weight by heuristics, and then improve the matching found to have a maximum conventional algorithm Kuhn.Thus, we save on starting a crawl depth of the vertices that we have already included using heuristics in the current matching.
For example , you can simply iterate over all the vertices of the first part, and for each of them to find any edge that can be added to matching, and add it. Even such a simple heuristic algorithm can accelerate the Kuna several times.
It should be noted that the main loop will have to be modified slightly. Since the function is called in the main loop is assumed that the current node is not included in the matching, then you need to add the appropriate test.
In implementing change only the code in the function main ():
int main() {
... чтение графа ...
mt.assign (k, -1);
vector<char> used1 (n);
for (int i=0; i<n; ++i)
for (size_t j=0; j<g[i].size(); ++j) if (mt[g[i][j]] == -1) {
mt[g[i][j]] = i;
used1[i] = true;
break;
}
for (int i=0; i<n; ++i) {
if (used1[i]) continue;
used.assign (n, false);
try_kuhn (i);
}
for (int i=0; i<k; ++i) if (mt[i] != -1)
printf ("%d %d\n", mt[i]+1, i+1);
}
Another good heuristic is as follows. At each step will be to look for the top of the least likely (but not isolated), select it from any edge, and add it to the matching, then remove both these vertices with all edges incident to them from the graph. Such greed works very well on
random graphs, even in most cases builds maximal matching (albeit against her have a test in which it finds a matching significantly lower value than the maximum).
Checking on the graph bipartition and splitting into two parts
Suppose we are given an undirected graph. Need to check whether it is bipartite, ie whether it is possible to divide the vertices into two parts so that there is no edges connecting two vertices of one share. If the graph is bipartite, then bring themselves share.
We solve this problem by using breadth-first search for the O (M).
Sign bipartite
Theorem. Graph is bipartite if and only if all its simple cycles have length chёtnuyu.
However, from a practical point of view to look for all the simple cycles uncomfortable. Much easier to check the graph on the bipartition following algorithm:
Algorithm
Proizvedёm series of breadth-first search. Ie will be launching a breadth-first search from each unvisited vertex. The vertex from which we start to go, we put in the first part. In the search for wide, if we go into some new vertex, then we put it in a fraction different from the share of the current top. If we try to pass on to the top edge, which already had, then we check that this vertex and the current vertex were in different proportions. Otherwise, the graph is not bipartite.
At the end of the algorithm, we either find that the graph is not bipartite, or find a partition of vertices of a graph into two parts.
Implementation
int n;
vector <vector <int>> g;
... Read Count ...
vector <char> part (n, -1);
bool ok = true;
vector <int> q (n);
for (int st = 0; st <n; ++ st) if (part [st] == -1) {
int h = 0, t = 0;
q [t ++] = st;
part [st] = 0;
while (h <t) {
int v = q [h ++];
for (size_t i = 0; i <g [v] .size (); ++ i) { int to = g [v] [i];
if (part [to] == -1)
part [to] =! part [v], q [t ++] =
to; else
ok & = part [to]! = part [v];
} } }
puts (ok? "YES": "NO");
Finding the maximum weight vertex-weighted matchings
Given bipartite graph G. For each vertex of the first part of its specified weight. Required to find the maximum weight matching, ie, with the largest sum of the weights of saturated peaks.
Below we describe and prove the algorithm based on the algorithm of Kuhn , who will find the optimal solution.
Algorithm
The algorithm itself is extremely easy. Let's sort the top of the first part, in descending order (more precisely, non-increasing) scales, and apply the resulting graphalgorithm Kuhn . It is alleged that obtained with the maximum (in terms of number of edges) and a matching is optimal in terms of the amount of saturated vertex weights (despite the fact that after sorting, we do not actually use these weight).
Thus, the implementation will be something like this:
int n;
vector <vector <int>> g (n);
vector used (n);
vector <int> order (n); // Vertex list, sorted by weight ... Reading ...
for (int i = 0; i <n; ++ i) { int v = order [i];
used.assign (n, false);
try_kuhn (v);
}
Function try_kuhn () is taken without any changes from the algorithm Kuhn.
Proof
Recall the basic provisions of the theory of matroids .
Matroid M - is an ordered pair (S, I), where S - the a lot of, I - a non-empty family of subsets of S, which satisfy the following conditions:
1. The set S is finite.
2. I family is hereditary, ie if one set belongs to I, then all of its subsets also belong to I.
3. Structure M has the property replacement, ie if A∈I, and B∈I, and | A | <| B |, then there is an element x∈AB, that A∪x∈I.
Elements of the family I called independent subsets.
Called the weighted matroid, if for each element defined x∈S some weight. Weight subset called the sum of the weights of its elements.
Finally, the most important theorem in the theory of weighted matroid: to get the best response, ie, independent subset with the largest weight, you need to act greedily, starting with an empty subsets will be adding (unless, of course, the current item can be added without disturbing independence) all the elements one by one in order of decreasing (or rather, non-increasing) their weights:
sort the set S in non-increasing weight;
ans = [];
foreach (x in S)
if (ans ∪ x ∈ I) ans = ans ∪ x;
It is alleged that at the end of this process we obtain a subset with the largest weight.
Now we prove that our task - not that other, as a weighted matroid .
Let S - set of all vertices of the first part. To reduce the problem in a bipartite graph matroid respect to the vertices of the first part, we assign each a matching is a subset S, which is the set of vertices of the first part saturated. You can also define the opposite line (from the set of saturated peaks - in matching), which, although not unequivocal, but we will be quite happy.
Then I define a family as a family of subsets of S, for which there is at least one corresponding matching.
Next, each element of S, that for each vertex of the first part, in terms of certain of some weight. And the weight of the subset, as we required in the framework of the theory of matroids, defined as the sum of the weights of elements in it.
Then the problem of finding the maximum weight matchings now reformulated as the problem of finding the maximum weight independent subset.