Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning

To address the problem of data inconsistencies among different facial expression recognition (FER) datasets, many cross-domain FER methods (CD-FERs) have been devised in recent years. Although each claims to achieve superior performance, fair comparisons are lacking due to inconsistent choices of source/target datasets and feature extractors. In this work, we first analyze the performance effect of these inconsistent choices, and then re-implement some well-performing CD-FER and recently published domain adaptation algorithms. We ensure that all these algorithms adopt the same source datasets and feature extractors for fair CD-FER evaluations. We find that most of the current leading algorithms use adversarial learning to learn holistic domain-invariant features to mitigate domain shifts. However, these algorithms ignore local features, which are more transferable across different datasets and carry more detailed content for fine-grained adaptation. To address this issue, we integrate graph representation propagation with adversarial learning for cross-domain holistic-local feature co-adaptation by developing a novel adversarial graph representation adaptation (AGRA) framework. Specifically, it first builds two graphs to correlate holistic and local regions within each domain and across different domains, respectively. Then, it extracts holistic-local features from the input image and uses learnable per-class statistical distributions to initialize the corresponding graph nodes. Finally, two stacked graph convolution networks (GCNs) are adopted to propagate holistic-local features within each domain to explore their interaction and across different domains for holistic-local feature co-adaptation. We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
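
As a rough illustration of the two-graph propagation described above, the following NumPy sketch builds an intra-domain and an inter-domain adjacency matrix over holistic and local nodes and applies two stacked graph-convolution steps. It is only a minimal sketch under assumptions that the abstract does not state: the fully connected adjacency pattern, the row normalization with self-loops, the ReLU activation, the number of regions, and all function and variable names (`build_graphs`, `gcn_layer`, `normalize_adjacency`) are hypothetical. Random vectors stand in for the extracted features and the learnable per-class statistics, and the adversarial learning component is omitted entirely.

```python
import numpy as np


def normalize_adjacency(adj):
    """Add self-loops and row-normalize, so each node averages over itself and its neighbours."""
    adj = adj + np.eye(adj.shape[0])
    return adj / adj.sum(axis=1, keepdims=True)


def gcn_layer(adj_norm, feats, weight):
    """One graph-convolution step: aggregate neighbours, project linearly, apply ReLU."""
    return np.maximum(adj_norm @ feats @ weight, 0.0)


def build_graphs(nodes_per_domain):
    """Return (intra, inter) adjacency matrices for a source and a target domain.

    Node order: source nodes [0, n), then target nodes [n, 2n), where each
    domain has n nodes (one holistic node plus n - 1 local-region nodes).
    Assumption: nodes are fully connected within a domain (intra) and fully
    connected across the two domains (inter).
    """
    n = nodes_per_domain
    within = np.ones((n, n)) - np.eye(n)   # same-domain edges, no self-loops
    across = np.ones((n, n))               # cross-domain edges
    zeros = np.zeros((n, n))
    intra = np.block([[within, zeros], [zeros, within]])
    inter = np.block([[zeros, across], [across, zeros]])
    return intra, inter


rng = np.random.default_rng(0)
n, dim = 6, 8  # hypothetical: 1 holistic + 5 local nodes per domain, 8-d features

intra, inter = build_graphs(n)

# Stand-in node features; in the described method, these would come from the
# input image and from learnable per-class statistical distributions.
feats = rng.normal(size=(2 * n, dim))
w_intra = rng.normal(size=(dim, dim)) * 0.1
w_inter = rng.normal(size=(dim, dim)) * 0.1

# Stacked propagation: first within each domain, then across domains.
hidden = gcn_layer(normalize_adjacency(intra), feats, w_intra)
out = gcn_layer(normalize_adjacency(inter), hidden, w_inter)
```

In this sketch the first step mixes holistic and local nodes of the same domain only, while the second step lets every source node aggregate from target nodes and vice versa, which is one simple way to read "within each domain" and "across different domains".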