Betweenness-based methods

One alternative way of finding communities of vertices in a network is to look for the edges that lie between communities. If we can find and remove these edges, we will be left with just the isolated communities. To identify edges between communities one common approach is to use edge betweenness centrality, which counts the number of geodesic paths that run along edges. This method is implemented in igraph with the function edge.betweenness.community.

Our algorithm for detecting communities is then as follows. We calculate the betweenness scores of all edges in our network and then search through them for the edge with the highest score and remove it. In removing the edge we will change the betweenness scores of some edges, because any shortest paths that previously traversed the removed edge will now have to be rerouted another way. So we must recalculate the betweenness scores following the removal. Then we search again for the edge with the highest score and remove it, and so forth. As we remove one edge after another an initially connected network will eventually split into two pieces, and then into three, and so on, until every node belongs to a singleton. The progress of the algorithm can be represented using a hierarchical structure called dendrogram.

This algorithm is somewhat different from previous ones, therefore, in that it doesn't give a single decomposition of a network into communities, but a selection of different possibilities, ranging from coarse divisions into just a few large communities (at the top of the dendrogram) to fine divisions into many small communities (at the bottom). It is up to the user to decide which of the many divisions represented is most useful for their purposes. One could in principle use a measure such as modularity to quantify the quality of the different divisions and select the one with the highest quality in this sense.

However, it is more appropriate simply to think of this betweenness-based algorithm as producing a different kind of output, one that has its own advantages and disadvantages but that can undoubtedly tell us interesting things about network structure. In particular, notice that the divisions represented in the dendrogram form a hierarchical decomposition in which the communities at one level are completely contained within the larger communities at all higher levels.

The betweenness-based algorithm is, unfortunately, quite slow. The calculation of betweenness for all edges takes time of order $O(n(m + n))$ and we have to perform this calculation before the removal of each of the $m$ edges, so the entire algorithm takes time $O(mn(m + n))$, or $O(n^3)$ on a sparse graph.

An interesting variation on the betweenness algorithm has been proposed by Radicchi et al. Their idea revolves around the same basic principle of identifying the edges between communities and removing them, but the measure used to perform the identification is different. Radicchi et al. observe that the edges that fall between otherwise poorly connected communities are unlikely to belong to short loops of edges, because doing so would require there to be a return path along another between-group edge, of which there are, by definition, few. Thus one way to identify the edges between communities would be to look for edges that belong to an unusually small number of short loops. Radicchi et al. found that loops of length three and four gave the best results.

An attractive feature of this method is its speed. The calculation of the number of short loops to which an edge belongs is a local calculation and can be performed for all edges in time that goes like the total size of the network. Thus, in the worst case, the running time of the algorithm will only go as $O(n^2)$ on a sparse graph, which is one order of system size faster than the betweenness-based algorithm and as fast as the earlier methods based on modularity maximization.

On the other hand, the algorithm of Radicchi et al. has the disadvantage that it only works on networks that have a significant number of short loops in the first place. This restricts the method primarily to social networks, which indeed have large numbers of short loops. Other types of network, such as technological and biological networks, tend to have smaller numbers of short loops, and hence there is little to distinguish between-group edges from within-group ones.