GEMSEC: Graph Embedding with Self Clustering

Modern graph embedding procedures can efficiently extract features of nodesfrom graphs with millions of nodes. The features are later used as inputs fordownstream predictive tasks. In this paper we propose GEMSEC a graph embeddingalgorithm which learns a clustering of the nodes simultaneously with computingtheir features. The procedure places nodes in an abstract feature space wherethe vertex features minimize the negative log likelihood of preserving sampledvertex neighborhoods, while the nodes are clustered into a fixed number ofgroups in this space. GEMSEC is a general extension of earlier work in thedomain as it is an augmentation of the core optimization problem of sequencebased graph embedding procedures and is agnostic of the neighborhood samplingstrategy. We show that GEMSEC extracts high quality clusters on real worldsocial networks and is competitive with other community detection algorithms.We demonstrate that the clustering constraint has a positive effect onrepresentation quality and also that our procedure learns to embed and clustergraphs jointly in a robust and scalable manner.