Let me start off by saying that working with igraph through its python bindings has been a great relieve! Although its R bindings seem to be much popular, I much prefer python over R. It’s not that R is bad—on the contrary, it’s very good—but it’s simply not very much a programmer’s tool, it’s more a statistician’s type of tool. I guess I’m not the latter.
Although igraph already implements a wide range of different methods for community detection, I thought it lacked a more general interface. That’s why I decided to create a new framework for community detection, which should provide quite some flexibility. The general algorithm behind it, is the well-known Louvain algorithm. This algorithm is known to perform quite well, both in terms of computational time and in terms of quality of the found partition.
The general idea of the Louvain algorithm is very simple. We start out with a partition in which each node is in its own community. We then move nodes to other communities as long as this increases our quality function. If we can no longer improve the partition by moving around nodes, we aggregate the graph, and then repeat the same procedure on this larger graph.
So, given this most general setup of the framework, there is only one function that needs to be changed if we want to optimise some other quality function: how much do we improve the quality if we move a node.
This is exactly the setup I chose here. I implemented a base class
VertexPartition that takes care of most of the basic administration of a partition (e.g. how many links there are inside a community, how big the community is, etc…). If we want to move a node, we call the function
move_node(v, c) to move node
v to another community
c and all the administration is updated automatically. Internally, in the optimisation procedure, the Louvain algorithm calls
diff_move(v, c) to know what is the improvement of the quality function if were to actually call
move_node(v,c). The Louvain algorithm simply choses the community which maximises this improvement. In summary, the following example illustrates exactly how these different functions should work:
diff = partition.diff_move(v,c); q1 = partition.quality(); partition.move_node(v,c); q2 = partition.quality();
diff == q2 - q1. Always. Or at least that should be the case.
Simply taking this base class, deriving from it, and implementing a different
diff_move function and a different
quality function, allows to quickly implement your favorite method, and optimise it using the Louvain algorithm.
I’ve now implemented four different classes that derived from
ModularityVertexPartition, which implements modularity.
CPMVertexPartition, which implements the CPM method.
SignificanceVertexPartition, which optimises significance.
RBConfigurationVertexPartition, which is the Reichardt and Bornholdt model with a configuration model, which basically comes down to modularity but with a multiplicative resolution parameter.
So how would you use this in python? Well first, we import the package of course:
import igraph as ig import louvain as louvain
What then remains is very simple:
G = ig.Graph.Tree(n=100, children=3); opt = louvain.Optimiser(); partition = opt.find_partition(graph=G, partition_class=louvain.SignificanceVertexPartition);
partition contains the partition which optimises significance. Simply providing a different type of class to the optimiser will use that class to construct the partitions, and hence will use that class’
diff_move function to move around nodes.
But wait, there’s more. Some methods, such as CPM, implement resolution parameters. Given some conditions, these can be effectively scanned using bisectioning. This is also implemented in the optimiser. Again, usage is very simple, and should be nothing more than this:
res_results = opt.bisect(graph=G, partition_class=louvain.CPMVertexPartition, resolution_range=(0,1));
res_results contains all the relevant resolution values (i.e. where the optimal partition changes) including all the partitions.
So, here you have it, everything I mentioned in my most recent paper implemented in python. In fact, this idea started simply to have a nice framework which, once done, I could easily translate to C++. Of course, implementing it in C++ should make it run much faster. On the other hand, the current implementation allows for very fast, flexible and easy extension of the framework with new methods. Also, this python version should work quite well up to 1000 nodes or so. For everything bigger, wait a bit until the C++ version is out.
Almost forgot, you can download it from launchpad: https://launchpad.net/pylouvain-igraph. Oh, and it also works on directed graphs and weighted graphs (if the method supports it).