LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging

Transformers have set new benchmarks in audio processing tasks, leveraging self-attention mechanisms to capture complex patterns and dependencies within audio data. However, their focus on pairwise interactions limits their ability to process the higher-order relations essential for identifying distinct audio objects. To address this limitation, this work introduces the Local-Higher Order Graph Neural Network (LHGNN), a graph-based model that enhances feature understanding by integrating local neighbourhood information with higher-order data from Fuzzy C-Means clusters, thereby capturing a broader spectrum of audio relationships. Evaluation of the model on three publicly available audio datasets shows that it outperforms Transformer-based models across all benchmarks while operating with substantially fewer parameters. Moreover, LHGNN demonstrates a distinct advantage in scenarios lacking ImageNet pretraining, establishing its effectiveness and efficiency in environments where extensive pretraining data is unavailable.
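
To make the core idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a layer that combines a local branch, which aggregates k-nearest-neighbour features, with a higher-order branch built from Fuzzy C-Means soft cluster memberships. All names and hyperparameters here (LocalHigherOrderLayer, k, n_clusters, the membership exponent m) are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


def fuzzy_cmeans(x, n_clusters=8, m=2.0, n_iter=10):
    """Run a few Fuzzy C-Means iterations; return soft memberships and centroids."""
    n = x.shape[0]
    # Random initial memberships, normalised over clusters.
    u = torch.rand(n, n_clusters, device=x.device)
    u = u / u.sum(dim=1, keepdim=True)
    for _ in range(n_iter):
        um = u ** m                                             # (n, c)
        centroids = (um.t() @ x) / um.sum(dim=0).unsqueeze(1)   # (c, d)
        dist = torch.cdist(x, centroids).clamp_min(1e-8)        # (n, c)
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(dim=1, keepdim=True)                  # updated memberships
    return u, centroids


class LocalHigherOrderLayer(nn.Module):
    """Combines local k-NN aggregation with cluster-level (higher-order) context."""

    def __init__(self, dim, k=8, n_clusters=8):
        super().__init__()
        self.k = k
        self.n_clusters = n_clusters
        self.local_proj = nn.Linear(2 * dim, dim)    # node feature + local mean
        self.cluster_proj = nn.Linear(dim, dim)      # cluster-context projection
        self.act = nn.GELU()

    def forward(self, x):                            # x: (n_nodes, dim)
        # Local branch: mean over the k nearest neighbours in feature space.
        dist = torch.cdist(x, x)
        idx = dist.topk(self.k + 1, largest=False).indices[:, 1:]  # drop self
        local = x[idx].mean(dim=1)                   # (n, dim)
        # Higher-order branch: membership-weighted mixture of cluster centroids.
        u, centroids = fuzzy_cmeans(x.detach(), self.n_clusters)
        higher = u @ centroids                       # (n, dim)
        out = self.local_proj(torch.cat([x, local], dim=-1)) + self.cluster_proj(higher)
        return self.act(out)


if __name__ == "__main__":
    feats = torch.randn(64, 128)                     # e.g. 64 patch embeddings
    layer = LocalHigherOrderLayer(dim=128)
    print(layer(feats).shape)                        # torch.Size([64, 128])
```

The design choice reflected here is that the local branch captures fine-grained neighbourhood structure while the fuzzy-cluster branch injects context shared across many nodes; how the two branches are actually fused in LHGNN should be taken from the paper itself.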