EmoNeXt: an Adapted ConvNeXt for Facial Emotion Recognition

Facial expressions play a crucial role in human communication, serving as a powerful means of expressing a wide range of emotions. With advancements in artificial intelligence and computer vision, deep neural networks have emerged as effective tools for facial emotion recognition. In this paper, we propose EmoNeXt, a novel deep learning framework for facial expression recognition based on an adapted ConvNeXt architecture. We integrate a Spatial Transformer Network (STN) to focus on feature-rich regions of the face and Squeeze-and-Excitation blocks to capture channel-wise dependencies. Moreover, we introduce a self-attention regularization term that encourages the model to generate compact feature vectors. We demonstrate the superiority of our model over existing state-of-the-art deep learning models on the FER2013 dataset in terms of emotion classification accuracy.
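
To make the channel-wise recalibration performed by a Squeeze-and-Excitation block concrete, the following is a minimal, dependency-free sketch of the generic technique. It is not the EmoNeXt implementation: the function name `se_block`, the weight matrices `w1` and `w2`, and the omission of bias terms are assumptions made purely for illustration.

```python
import math


def se_block(x, w1, w2):
    """Generic Squeeze-and-Excitation sketch (illustrative, biases omitted).

    x  : feature map as nested lists with shape [C][H][W]
    w1 : reduction weights with shape [C_reduced][C]   (hypothetical values)
    w2 : expansion weights with shape [C][C_reduced]   (hypothetical values)
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in x
    ]

    # Excitation, step 1: fully connected reduction followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w1
    ]

    # Excitation, step 2: fully connected expansion followed by a sigmoid,
    # giving one gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w2
    ]

    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(x, gates)
    ]
    return scaled, gates
```

In this sketch a channel with a strong average response can receive a gate close to 1 while a weak channel is suppressed, which is the sense in which such a block models dependencies between channels. A practical model would use learned weights and a tensor library rather than nested lists.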