The '''Girvan–Newman algorithm''' (named after Michelle Girvan and [[Mark Newman]]) is one of the methods used to detect [[Community_structureCommunity structure|communities]] in [[complex system]]s.<ref name=newman>Girvan M. and Newman M. E. J., [http://dx.doi.org/10.1073/pnas.122653799 Community structure in social and biological networks], Proc. Natl. Acad. Sci. USA '''99''', 7821–7826 (2002)</ref> The notion of a "community structure" is related to that of clustering, though it isn't quite the same. A community consists of a subset of nodes within which the node-node connections are dense, and the edges to nodes in other communities are less dense.<ref name=newman/> There are numerous alternative methods for detecting communities in networks. These include [[Hierarchical Clustering|hierarchical clustering]], partitioning graphs to maximize [[quality function]]s such as [[Modularity (networks)|network modularity]], [[surprise (networks)|surprise]] maximization, [[clique percolation method|''k''-clique percolation]], etc.
== Edge betweenness and community structure ==
Line 7:
The Girvan–Newman algorithm works the opposite way. Instead of trying to construct a measure that tells us which edges are the most central to communities, it focuses on these edges that are least central, the edges that are most "between" communities. The communities are detected by progressively removing edges from the original graph, rather than by adding the strongest edges to an initially empty network.
[[Betweenness_centralityBetweenness centrality|Vertex betweenness]] has been studied in the past as a measure of the [[centrality]] and influence of nodes in networks. For any node <math>i</math>, vertex betweenness is defined as the number of shortest paths between pairs of nodes that run through it. It is a measure of the influence of a node over the flow of information between other nodes, especially in cases where information flow over a network primarily follows the shortest available path. The Girvan–Newman algorithm extends this definition to the case of edges, defining the "edge betweenness" of an edge as the number of shortest paths between pairs of nodes that run along it. If there is more than one shortest path between a pair of nodes, each path is assigned equal weight such that the total weight of all of the paths is equal to unity. If a network contains communities or groups that are only loosely connected by a few intergroup edges, then all shortest paths between different communities must go along one of these few edges. Thus, the edges connecting communities will have high edge betweenness (at least '''one''' of them). By removing these edges, the groups are separated from one another and so the underlying community structure of the network is revealed.
The algorithm's steps for community detection are summarized below