Exploring Green AI for Audio Deepfake Detection

State-of-the-art audio deepfake detectors leveraging deep neural networks exhibit impressive recognition performance. Nonetheless, this advantage comes with a significant carbon footprint, mainly due to the use of high-performance computing with accelerators and long training times. Studies show that training an average deep NLP model produces around 626k lbs of CO\textsubscript{2}, equivalent to five times the lifetime emissions of an average US car. This is a serious threat to the environment. To tackle this challenge, this study presents a novel framework for audio deepfake detection that can be seamlessly trained using standard CPU resources. Our proposed framework utilizes off-the-shelf self-supervised learning (SSL) based models that are pre-trained and available in public repositories. In contrast to existing methods that fine-tune SSL models and employ additional deep neural networks for downstream tasks, we apply classical machine learning algorithms, such as logistic regression and shallow neural networks, to the SSL embeddings extracted with the pre-trained model. Our approach achieves results competitive with the commonly used high-carbon-footprint approaches. In experiments with the ASVspoof 2019 LA dataset, we achieve a 0.90\% equal error rate (EER) with fewer than 1k trainable model parameters. To encourage further research in this direction and support reproducible results, the Python code will be made publicly accessible following acceptance. Github: https://github.com/sahasubhajit/Speech-Spoofing-
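The downstream setup described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random 768-dimensional vectors stand in for embeddings that would, in the real pipeline, come from a frozen pre-trained SSL encoder, and the class separation is synthetic. It shows the core idea of fitting a plain logistic regression (CPU-only, under 1k trainable parameters for a 768-dim embedding) on fixed embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for SSL embeddings: in the paper's setting these would be produced
# by a frozen pre-trained SSL model; here we draw random 768-dim vectors with
# a class-dependent mean shift so the toy task is learnable.
DIM = 768
N_PER_CLASS = 200
bona_fide = rng.normal(0.0, 1.0, (N_PER_CLASS, DIM))   # label 0
spoofed = rng.normal(0.5, 1.0, (N_PER_CLASS, DIM))      # label 1
X = np.vstack([bona_fide, spoofed])
y = np.concatenate([np.zeros(N_PER_CLASS), np.ones(N_PER_CLASS)])

# Logistic regression trained by full-batch gradient descent:
# DIM weights + 1 bias = 769 trainable parameters, i.e. fewer than 1k.
w = np.zeros(DIM)
b = 0.0
lr = 0.1
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
    grad_w = X.T @ (p - y) / len(y)         # gradient of mean cross-entropy
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean(((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y)
n_params = w.size + 1
print(f"training accuracy: {acc:.2f}, trainable parameters: {n_params}")
```

In the full framework the classifier would of course be evaluated with EER on held-out ASVspoof data rather than training accuracy; the point here is only that the trainable part is a tiny, CPU-friendly model on top of frozen embeddings.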