A Hybrid Transformer-Sequencer approach for Age and Gender classification from in-wild facial images

Advances in computer vision and image processing techniques have led to the emergence of new applications in domains such as visual surveillance, targeted advertisement, content-based searching, and human-computer interaction. Among the various techniques in computer vision, face analysis in particular has gained much attention. Several previous studies have explored applications of facial feature processing for a variety of tasks, including age and gender classification. Despite this prior work, age and gender classification of in-wild human faces is still far from achieving the levels of accuracy required for real-world applications. This paper, therefore, attempts to bridge this gap by proposing a hybrid model that combines self-attention and BiLSTM approaches for age and gender classification. The proposed model's performance is compared with several state-of-the-art models proposed so far. Improvements of approximately 10% and 6% over the state-of-the-art implementations are noted for age and gender classification, respectively. The proposed model is thus found to achieve superior performance and to provide more generalized learning. The model can, therefore, be applied as a core classification component in various image processing and computer vision problems.