Shape Preserving Facial Landmarks with Graph Attention Networks

Top-performing landmark estimation algorithms are based on exploiting the excellent ability of large convolutional neural networks (CNNs) to represent local appearance. However, it is well known that they can only learn weak spatial relationships. To address this problem, we propose a model based on the combination of a CNN with a cascade of Graph Attention Network regressors. To this end, we introduce an encoding that jointly represents the appearance and location of facial landmarks and an attention mechanism to weigh the information according to its reliability. This is combined with a multi-task approach to initialize the location of graph nodes and a coarse-to-fine landmark description scheme. Our experiments confirm that the proposed model learns a global representation of the structure of the face, achieving top performance in popular benchmarks on head pose and landmark estimation. The improvement provided by our model is most significant in situations involving large changes in the local appearance of landmarks.
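
To make the high-level description above concrete, the following is a minimal, illustrative sketch (not the authors' code) of one graph-attention refinement stage over facial-landmark nodes, assuming each node encodes CNN appearance features concatenated with its current (x, y) location, and that a small cascade reuses the refined coordinates. All names, dimensions, and the offset-regression head are assumptions made for illustration.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_refine(node_feats, coords, W, a, W_out):
    """One cascade stage: attention-weighted aggregation over landmark pairs,
    followed by a linear head predicting per-landmark coordinate offsets.

    node_feats : (N, F)   appearance descriptors from a CNN backbone (assumed)
    coords     : (N, 2)   current landmark locations
    W          : (F+2, H) shared projection of the joint appearance+location encoding
    a          : (2*H,)   attention vector scoring node pairs
    W_out      : (H, 2)   offset regression head (illustrative, not the paper's head)
    """
    x = np.concatenate([node_feats, coords], axis=1) @ W        # (N, H) joint encoding
    pair = np.concatenate(                                       # (N, N, 2H) all node pairs
        [np.repeat(x[:, None, :], x.shape[0], 1),
         np.repeat(x[None, :, :], x.shape[0], 0)], axis=-1)
    scores = np.maximum(pair @ a, 0.2 * (pair @ a))              # LeakyReLU pair scores
    alpha = softmax(scores, axis=1)                              # reliability weights per node
    h = alpha @ x                                                # attention-weighted aggregation
    return coords + h @ W_out                                    # refined landmark positions

# Toy cascade with random weights: each stage feeds its refined coordinates
# to the next (in the actual coarse-to-fine scheme, appearance descriptors
# would also be re-extracted around the refined locations).
rng = np.random.default_rng(0)
N, F, H = 68, 64, 32
feats, coords = rng.normal(size=(N, F)), rng.normal(size=(N, 2))
for _ in range(3):
    coords = gat_refine(feats, coords,
                        rng.normal(size=(F + 2, H)) * 0.01,
                        rng.normal(size=2 * H) * 0.01,
                        rng.normal(size=(H, 2)) * 0.01)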