Let me start off by saying that working with igraph through its python bindings has been a great relieve! Although its R bindings seem to be much popular, I much prefer python over R. It’s not that R is bad—on the contrary, it’s very good—but it’s simply not very much a programmer’s tool, it’s more a statistician’s type of tool. I guess I’m not the latter.
Although igraph already implements a wide range of different methods for community detection, I thought it lacked a more general interface. That’s why I decided to create a new framework for community detection, which should provide quite some flexibility. The general algorithm behind it, is the well-known Louvain algorithm. This algorithm is known to perform quite well, both in terms of computational time and in terms of quality of the found partition.
The general idea of the Louvain algorithm is very simple. We start out with a partition in which each node is in its own community. We then move nodes to other communities as long as this increases our quality function. If we can no longer improve the partition by moving around nodes, we aggregate the graph, and then repeat the same procedure on this larger graph.
So, given this most general setup of the framework, there is only one function that needs to be changed if we want to optimise some other quality function: how much do we improve the quality if we move a node.
This is exactly the setup I chose here. I implemented a base class
VertexPartition that takes care of most of the basic administration of a partition (e.g. how many links there are inside a community, how big the community is, etc…). If we want to move a node, we call the function
move_node(v, c) to move node
v to another community
c and all the administration is updated automatically. Internally, in the optimisation procedure, the Louvain algorithm calls
diff_move(v, c) to know what is the improvement of the quality function if were to actually call
move_node(v,c). The Louvain algorithm simply choses the community which maximises this improvement. In summary, the following example illustrates exactly how these different functions should work:
diff = partition.diff_move(v,c); q1 = partition.quality(); partition.move_node(v,c); q2 = partition.quality();
diff == q2 - q1. Always. Or at least that should be the case.
Simply taking this base class, deriving from it, and implementing a different
diff_move function and a different
quality function, allows to quickly implement your favorite method, and optimise it using the Louvain algorithm.
I’ve now implemented four different classes that derived from
ModularityVertexPartition, which implements modularity.
CPMVertexPartition, which implements the CPM method.
SignificanceVertexPartition, which optimises significance.
RBConfigurationVertexPartition, which is the Reichardt and Bornholdt model with a configuration model, which basically comes down to modularity but with a multiplicative resolution parameter.
So how would you use this in python? Well first, we import the package of course:
import igraph as ig import louvain as louvain
What then remains is very simple:
G = ig.Graph.Tree(n=100, children=3); opt = louvain.Optimiser(); partition = opt.find_partition(graph=G, partition_class=louvain.SignificanceVertexPartition);
partition contains the partition which optimises significance. Simply providing a different type of class to the optimiser will use that class to construct the partitions, and hence will use that class’
diff_move function to move around nodes.
But wait, there’s more. Some methods, such as CPM, implement resolution parameters. Given some conditions, these can be effectively scanned using bisectioning. This is also implemented in the optimiser. Again, usage is very simple, and should be nothing more than this:
res_results = opt.bisect(graph=G, partition_class=louvain.CPMVertexPartition, resolution_range=(0,1));
res_results contains all the relevant resolution values (i.e. where the optimal partition changes) including all the partitions.
So, here you have it, everything I mentioned in my most recent paper implemented in python. In fact, this idea started simply to have a nice framework which, once done, I could easily translate to C++. Of course, implementing it in C++ should make it run much faster. On the other hand, the current implementation allows for very fast, flexible and easy extension of the framework with new methods. Also, this python version should work quite well up to 1000 nodes or so. For everything bigger, wait a bit until the C++ version is out.
Almost forgot, you can download it from launchpad: https://launchpad.net/pylouvain-igraph. Oh, and it also works on directed graphs and weighted graphs (if the method supports it).
18 thoughts on “Easy, flexible framework for community detection”
Great job! Yes, it would have been great if igraph allowed arbitrary quality functions in its implementation of the Louvain method (and also the fast greedy community detection algorithm), but unfortunately it’s more complicated to implement that in the C layer.
I have looked into the source code on Launchpad and I was wondering whether you considered using the
contract_vertices()method of igraph’s Graph objects instead of implementing your own in the
Also, you could probably also make use of igraph’s
compare_communities()function in your implementation in place of the community comparison methods of your class.
Yes, I already feared it would be slightly more difficult to implement in the C layer. That’s why I hoped that starting out with a python prototype would allow me do to it a bit faster… 🙂
I did consider using the
collapse_graphmethod, but for some graphs it crashed. Furthermore, the collapse_graph method does it in-place, so that if you wouldn’t like to change the original graph you’d have to copy the original graph. This consumes more memory than needed (which, admittedly, is only relevant for large graphs, which are impossible to analyse currently anyway with this code, but I’m just thinking ahead…) But indeed, that should be one possible improvement, especially if you could get a new graph returned by
collapse_graph! I really like the flexible way it is implemented! By the way, if you want details for the bug in the
collapse_graphmethod, I would have to check out what makes it crash. I know it has something to do with the vertex attributes that are present in the graph, because if I remove those attributes, the
In fact, ideally I would simply derive from the
VertexClusteringclass, upon which I could then implement the additional framework. Unfortunately, the membership vector in that class seems to be unmutable, which makes moving around nodes in communities impossible. I agree that it shouldn’t be possible to arbitrarily assign a new community to a node, but rather that it should happen through a method that also updates the rest of the administration (as I do now in the
What do you think? Would it be possible to udpate the
collapse_graphfunction so that it returns a new graph? And would it be possible to update the membership vector (even if only through some method)?
collapse_verticesshould definitely not crash so please send me a small example on which I could reproduce the problem. Once I managed to fix this, we can think about making
collapse_verticesreturn a new graph instead of copying, although I would probably avoid premature optimization and try simply copying the graph first using the
Re the mutability of
VertexClustering: yes, it is true that the class was not meant to be mutable (especially because some properties of that class, such as the modularity score, are cached internally, and mutating the object should therefore invalidate all the cached values). However, Python does not stop you from messing around with the internals of the
VertexClusteringclass if you know what you are doing, so you can simply modify the
_membershipproperty (which is just an ordinary list).
A nicer solution would probably be to derive your own
VertexClustering, add your own mutator methods such as
move_node(), and make the method invalidate the cached
Yeah, I’ll have a closer look at what made it crash when I have some time. True that for now it’s not necesarry to re-implement the
collapse_verticesmethod for optimisation purposes. If I re-use the
VertexClusteringby the way, I could of course just use the
cluster_graphof that class.
Good idea to derive a
MutableVertexClusteringclass first! I didn’t look further into the code to see if I could mutate the membership vector some other way, good to know I can use
_membership. I’ll update the code once I get a chance!
I just pushed the new code to launchpad. The
MutableVertexPartitionnow derives form the
VertexClustering, so all previous code should be compatible with it. I haven’t looked yet at the error for cluster_graph.
I am studying your Python code. Thank you for sharing it!
I was looking for the Reichardt-Bornholdt implementation and I have noticed two things:
1) In the RBConfigurationVertexPartition class, in the diff_move function, you write:
if (not self._graph.is_directed()):
diff /= 2.0;
In the modularity, you commented these lines. Is it right, or should it be commented, too?
2) In the page http://www.traag.net/code/ you write that the Reichardt-Bornholdt measure is implemented with an Erdos-Renyi null-model.
In the RBConfigurationVertexPartition class, it seems that in the diff_move function you use a configuration null-model, while in the RBConfiguration function, instead, it seems that you use a different null-model. Could you explain me if I am wrong, please?
Have a nice evening!
Hopefully you can use the code!
To answer your questions:
C++code available from https://launchpad.net/louvain. I should make that more clear. I am not sure what you mean exactly with the
RBConfigurationfunction, as compared to the
diff_movefunction. Could you perhaps elaborate?
Sorry for my late answer, but I have been busy 🙂
1. Ok, thank you. Maybe I am wrong, but I think that function should be very similar to modulary, except for the resolution parameter.
2. Yes, maybe I could have explained better. In your RBConfiguration function, you do:
mod = mod + w - \
w_out*w_in/(float(4 - 3*self._graph.is_directed())*self._total_weight);
If self._graph.is_directed() is 0, then it sums:
mod = mod + w – self.resolution * w_out * w_in / 4 * self._total_weight
Why do you divide by 4?
I tried to write a sort of the derivation of the Reichardt-Bornholdt measure by communities and it should be something like the last equation of this PDF:
In the equation that I have derived, I don’t divide by 4. Am I wrong?
Thank you, goodbye!
Sorry, I was wrong with my derivation. I think the RB Measure is right 🙂
I believe you’re right, I made some changes in the Modularity class at some point, which I didn’t do in the RBConfiguration class. I believe you are in fact right about the formula in the RBConfiguration class, it should now be consistent with the original modularity formula. The thing is that some things depend on whether the graph is directed or not, and for some properties you do need to take this into account, whereas for others you do not. Anyway, thanks for spotting those errors!
I’ve pushed the latest code to launchpad, you should be able to find it there.
Thank you for your quick help!
Your project has been very helpful.
Have a nice weekend!
hi.i was intrested in detecting communities in twitter…please any one help me how to start work
You’re probably better off checking out a website such as https://dev.twitter.com/docs/twitter-libraries for getting some twitter data. Once you have that, resort to libraries such as
igraphfor help with creating networks. You can then either use the method from
igraphitself or use the framework described here.
Should you be interested, I just created a
C++version of this implementation, which is available from https://github.com/vtraag/louvain-igraph. The python package (including Windows binaries) can be downloaded from https://pypi.python.org/pypi/louvain/.
There is also a quite famous Louvain implementation for Networkx.
However, that implementation does not accept directed graphs. I was wondering if you had to redefine any metrics for directed graphs for your algorithm to work.
Yes, I know that Thomas Aynaud also has an implementation, but I never used it. Although it is interesting to have a pure python package, it also makes
networkxin general not very suitable for larger graphs. Similarly with the package referred to in this post, it is nice as idea, but the actual implementation is generally too slow to work with larger graphs.
Therefore, please notice that it is superseded by a
C++version, which is much, much faster. It can deal with directed, undirected, weighted and unweighted graphs. Additionally, it can deal with multiple layers (some of which may contain negative links for example), and implements currently four different methods (Modularity variants, Significance and Surprise). So you should be able to use this package for your purpose.
Thank you for sharing your code. I was wondering what are the methods that allow community detection in weighted directed graphs. Thank you.
All of them support directed graphs, only Significance doesn’t support weighted graphs. Please note that you should use the latest python package built in C++ not the pure Python package (unless for some simple conceptual testing).