Fine-Grained Age Estimation in the wild with Attention LSTM Networks

Age estimation from a single face image has been an essential task in the fields of human-computer interaction and computer vision, with a wide range of practical applications. For face images in the wild, existing methods achieve relatively low accuracy because they consider only global features and neglect the fine-grained features of age-sensitive areas. Inspired by fine-grained categorization and the visual attention mechanism, we propose a novel method for fine-grained age estimation in the wild based on our attention long short-term memory (AL) network. This method combines residual network (ResNets) or residual networks of residual networks (RoR) models with LSTM units to construct AL-ResNets or AL-RoR networks, which extract local features of age-sensitive regions and thereby improve age estimation accuracy. First, a ResNets or RoR model pretrained on the ImageNet dataset is selected as the basic model and then fine-tuned on the IMDB-WIKI-101 dataset for age estimation. Then, we fine-tune the ResNets or RoR on the target age datasets to extract the global features of face images. To extract the local features of age-sensitive regions, an LSTM unit is then introduced to obtain the coordinates of the age-sensitive region automatically. Finally, age group classification is conducted directly on the Adience dataset, and age regression experiments are performed with the Deep EXpectation (DEX) algorithm on the MORPH Album 2, FG-NET and 15/16LAP datasets. By combining the global and local features, we obtain our final prediction results. Experimental results demonstrate the effectiveness and robustness of the proposed AL-ResNets and AL-RoR for age estimation in the wild, where they achieve state-of-the-art performance, outperforming other convolutional neural network methods.
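The DEX regression step can be illustrated with a short sketch. DEX treats age as a classification over discrete age classes and then reports the expected value, E[age] = Σ_i p_i · y_i, where p_i is the softmax probability of class i and y_i is the age that class represents. The sketch assumes 101 classes standing for ages 0 to 100, consistent with the IMDB-WIKI-101 name. The abstract does not say how the global and local predictions are combined, so `fuse_global_local` uses a weighted average of the two probability distributions purely as a hypothetical stand-in. The function names `softmax`, `fuse_global_local` and `dex_expected_age` are illustrative and do not come from the paper.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns raw class scores into probabilities."""
    m = max(logits)                      # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_global_local(global_probs: Sequence[float],
                      local_probs: Sequence[float],
                      weight: float = 0.5) -> List[float]:
    """HYPOTHETICAL fusion rule: weighted average of two age distributions.

    ASSUMPTION: the abstract only states that global and local features are
    combined; averaging the two softmax outputs is an illustrative stand-in.
    """
    if len(global_probs) != len(local_probs):
        raise ValueError("distributions must have the same number of classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * g + (1.0 - weight) * l
            for g, l in zip(global_probs, local_probs)]


def dex_expected_age(probs: Sequence[float],
                     ages: Optional[Sequence[float]] = None) -> float:
    """DEX-style age regression: expected age E[age] = sum_i p_i * y_i.

    By default class i is taken to mean age i (0..100 for 101 classes).
    """
    if ages is None:
        ages = range(len(probs))
    if len(ages) != len(probs):
        raise ValueError("need one age value per class")
    return sum(p * y for p, y in zip(probs, ages))


if __name__ == "__main__":
    # Two confident but disagreeing branches (one-hot over 101 age classes).
    global_probs = [0.0] * 101
    global_probs[20] = 1.0
    local_probs = [0.0] * 101
    local_probs[40] = 1.0
    fused = fuse_global_local(global_probs, local_probs, weight=0.5)
    print(dex_expected_age(fused))  # 30.0
```

Taking the expectation instead of the argmax lets the regression output fall between class centers. In the example, two branches that disagree (age 20 versus age 40) yield a prediction of 30 rather than an arbitrary choice of one class.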