Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition

Human facial action units (AUs) are mutually related in a hierarchical manner: not only are they associated with each other in both the spatial and temporal domains, but AUs located in the same or nearby facial regions also show stronger relationships than those in different facial regions. While none of the existing approaches thoroughly models such hierarchical inter-dependencies among AUs, this paper proposes to comprehensively model multi-scale AU-related dynamics and hierarchical spatio-temporal relationships among AUs for AU occurrence recognition. Specifically, we first propose a novel multi-scale temporal differencing network with an adaptive weighting block that explicitly captures facial dynamics across frames at different spatial scales, accounting for the heterogeneity in the range and magnitude of different AUs' activations. Then, a two-stage strategy is introduced to hierarchically model the relationships among AUs based on their spatial distribution (i.e., local and cross-region AU relationship modelling). Experimental results on BP4D and DISFA show that our approach sets the new state of the art in AU occurrence recognition. Our code is publicly available at https://github.com/CVI-SZU/MDHR.
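
To make the first component concrete, the following is a minimal, purely illustrative PyTorch sketch of multi-scale temporal differencing with an adaptive per-scale weighting, written under my own assumptions about the layer layout; it is not the authors' implementation, and all names (MultiScaleTemporalDiff, n_scales, weight_fc) are hypothetical. See https://github.com/CVI-SZU/MDHR for the official code.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleTemporalDiff(nn.Module):
    """Illustrative sketch: frame-to-frame feature differences at several
    spatial scales, fused by an input-dependent (adaptive) weighting."""

    def __init__(self, channels: int, n_scales: int = 3):
        super().__init__()
        self.n_scales = n_scales
        # One lightweight projection per spatial scale of the difference maps.
        self.proj = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(n_scales)
        )
        # Adaptive weighting: predict one softmax weight per scale from the
        # globally pooled difference statistics.
        self.weight_fc = nn.Linear(channels * n_scales, n_scales)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, C, H, W) per-frame feature maps from a backbone.
        B, T, C, H, W = feats.shape
        diff = feats[:, 1:] - feats[:, :-1]        # (B, T-1, C, H, W) temporal differences
        diff = diff.reshape(B * (T - 1), C, H, W)

        scale_maps, scale_stats = [], []
        for s in range(self.n_scales):
            # Downsample by 2**s, project, then upsample back to (H, W).
            d = F.avg_pool2d(diff, kernel_size=2 ** s) if s > 0 else diff
            d = self.proj[s](d)
            scale_maps.append(
                F.interpolate(d, size=(H, W), mode="bilinear", align_corners=False)
            )
            scale_stats.append(d.mean(dim=(2, 3)))  # (B*(T-1), C) global statistics

        # Adaptive per-scale weights (softmax over scales for each clip position).
        w = F.softmax(self.weight_fc(torch.cat(scale_stats, dim=1)), dim=1)
        fused = sum(
            w[:, s, None, None, None] * scale_maps[s] for s in range(self.n_scales)
        )
        return fused.reshape(B, T - 1, C, H, W)


if __name__ == "__main__":
    x = torch.randn(2, 8, 64, 28, 28)              # toy clip of 8 frames
    print(MultiScaleTemporalDiff(64)(x).shape)     # torch.Size([2, 7, 64, 28, 28])

The sketch only conveys the general idea that different AUs may need differently weighted multi-scale dynamics; the paper's actual network and the hierarchical (local and cross-region) relationship modelling are described in the full text and repository.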