Easy, flexible and fast framework for community detection.

\(\)
Good news! In an earlier post, I talked about my flexible implementation of the Louvain algorithm in Python. Now, it is also available in a C++ implementation, making it run much (really much) faster. Whereas the previous version already began to stagger for a thousand nodes, this implementation can easily be applied to graphs of millions of nodes (although you still might need to be patient when your graph is really that big). Even better, you can now simply install it using pip: sudo pip install louvain. For Windows users, you can refer to the binary installer at the PyPi repository. The source code is also available from GitHub.

Using it is really straightforward. To start, make sure to import the packages:

import louvain
import igraph as ig

We’ll create a random graph for testing purposes:

G = ig.Graph.Erdos_Renyi(100, 0.1);

For simply finding a partition use:

part = louvain.find_partition(G, method='Modularity');

Notice that part now contains an additional variable, part.quality which stores the quality of the partition as calculated by the used method. You can always get the quality of the partition using another method by calling

part.significance = louvain.quality(G, partition, method='Significance');

You can also find partitions for multiplex graphs. For each layer you then specify the objective function, and the overall objective function is simply the sum over all layers, weighted by some weight. If we denote by \(q_k\) the quality of layer \(k\) and the weight by \(w_k\), the overall quality is then \(q = \sum_k w_k q_k\). This can also be useful in case you have negative links . In principle, this could also be used to detect temporal communities in a dynamic setting .

For example, assuming you have a graph with positive weights G_positive and a graph with negative weights G_negative, and you want to use Modularity for finding a partition, you can use

membership, quality = louvain.find_partition_multiplex([
louvain.Layer(graph=G_positive, method='Modularity', layer_weight=1.0),
louvain.Layer(graph=G_negative, method='Modularity', layer_weight=-1.0)])

Notice the negative layer weight is -1.0 for the negative graph, since we want those edges to fall between communities rather than within. One particular problem when using negative links, is that the optimal community is no longer guaranteed to be connected (it may be a multipartite partition). You may therefore need the option consider_comms=ALL_COMMS to improve the quality of the partition. Notice that this runs much slower than only considering neighbouring communities (which is the default).

Various methods, such as the Potts model from , or CPM support a (linear) resolution parameter, which can be effectively bisected . You can do this by calling:

res_parts = louvain.bisect(G, method='CPM', resolution_range=[0,1]);

Notice this may take some time to run, as it effectively calls louvain.find_partition for various resolution parameters (depending on the
settings possibly hundreds of times).

Then res_parts is a dictionary containing as keys the resolution, and as values a NamedTuple with variables partition and bisect_value, which contains the partition and the value at which the resolution was bisected (the value of the bisect_func of the bisect function). You could for example plot the bisection value of all the found partitions by using:

import pandas as pd
import matplotlib.pyplot as plt
res_df = pd.DataFrame({
         'resolution': res_parts.keys(),
         'bisect_value': [bisect.bisect_value for bisect in res_parts.values()]});
plt.step(res_df['resolution'], res_df['bisect_value']);
plt.xscale('log');

The result should look something like this for a random graph.
bisect
















Mucha, Peter J, Thomas Richardson, Kevin Macon, Mason A Porter, and Jukka-Pekka Onnela. 2010. “Community Structure in Time-Dependent, Multiscale, and Multiplex Networks.” Science (New York, N.Y.) 328 (5980): 876–78. https://doi.org/10.1126/science.1184819. Cite
Reichardt, Jörg, and Stefan Bornholdt. 2006. “Statistical Mechanics of Community Detection.” Physical Review E 74 (1): 016110+. https://doi.org/10.1103/PhysRevE.74.016110. Cite
Traag, V. A., and Jeroen Bruggeman. 2009. “Community Detection in Networks with Positive and Negative Links.” Physical Review E 80 (3): 036115. https://doi.org/10.1103/PhysRevE.80.036115. Cite
Traag, V. A., Gautier Krings, and Paul Van Dooren. 2013. “Significant Scales in Community Structure.” Scientific Reports 3: 2930. https://doi.org/10.1038/srep02930. Cite
Traag, V. A., P. Van Dooren, and Y. Nesterov. 2011. “Narrow Scope for Resolution-Limit-Free Community Detection.” Physical Review E 84 (1): 016114. https://doi.org/10.1103/PhysRevE.84.016114. Cite

Share

5 thoughts on “Easy, flexible and fast framework for community detection.

  1. Thanks! I have two questions: 1) can you give a hint about you build/extend the set of methods? 2) for the Newman modularity metric, it appears to be a scaled version that is returned?

    1. You’re welcome!
      1) it should be easy to implement your own extensions. What method would you like to see implemented?
      2) yes, indeed, the non-normalised version is returned, perhaps that should be made clearer in the documentation. That is, the value is not divided by the number of edges \(m\) (or \(2m\)).

      1. Super. My intention was to experiment with some non-standard methods (for software modularization), so any quick pointers to how to build locally to experiment with such methods would be great.

  2. Easiest would be to fork the project on GitHub, and use that Git repository. That way, we could also easily reintegrate any contribution you make. You need to derive from the base class MutableVertexPartition and implement the methods quality and diff_move. They provide respectively the quality of the partition \(\mathcal{H}\) and the difference in the quality when moving a node (which is a community in the aggregated network) \(\Delta \mathcal{H}\). You can take a look at the implementation of ModularityVertexPartition for an example.

    The calculation of \(\Delta \mathcal{H}\) is the only real challenge: make sure you calculate this correctly. When putting #define DEBUG in GraphHelper.h (which is included everywhere) both the difference using diff_move and the difference using quality before and after actually moving a node is calculated (in Optimiser.cpp:400-414). You can use that to verify whether at least both expressions are consistent. Whether they are consistently wrong or consistently correct is another matter of course, but they are at least consistent. 😉

    Finally, compiling and installing is simple. I recommend to use python setup.py install --record files.txt so that you can use cat files.txt | xargs rm to remove the installed files easily again (possibly both need to be run as root). This both compiles and installs the python package in the appropriate location. You probably want to uninstall the PyPi package first in order to prevent problems. This is assuming you run a Unix variant. If you are using Windows, I suggest you switch to a decent Unix variant 😉

    Good luck! Let me know how it works out! And if you have any other questions, don’t hesitate to reply here, or just mail me (or open an issue on GitHub).

Leave a Reply

Your email address will not be published. Required fields are marked *

Captcha *
Time limit is exhausted. Please reload CAPTCHA.

This site uses Akismet to reduce spam. Learn how your comment data is processed.