Environmental Sound Classification on the Edge: A Pipeline for Deep Acoustic Networks on Extremely Resource-Constrained Devices

Significant efforts are being invested to bring state-of-the-art classification and recognition to edge devices with extreme resource constraints (memory, speed, and lack of GPU support). Here, we demonstrate the first deep network for acoustic recognition that is small, flexible and compression-friendly yet achieves state-of-the-art performance for raw audio classification. Rather than handcrafting a one-off solution, we present a generic pipeline that automatically converts a large deep convolutional network, via compression and quantization, into a network for resource-impoverished edge devices. After introducing ACDNet, which exceeds state-of-the-art accuracy on ESC-10 (96.65%), ESC-50 (87.10%), UrbanSound8K (84.45%) and AudioEvent (92.57%), we describe the compression pipeline and show that it achieves a 97.22% size reduction and a 97.28% FLOP reduction while maintaining close to state-of-the-art accuracy of 96.25%, 83.65%, 78.27% and 89.69%, respectively, on these datasets. We describe a successful implementation on a standard off-the-shelf microcontroller and, beyond laboratory benchmarks, report successful tests on real-world datasets.
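To make the reported trade-off concrete, the percentages above can be restated as reduction factors and per-dataset accuracy drops. The following is a small arithmetic sketch only: the numbers are copied from the abstract, while the names `shrink_factor` and `accuracy_drops` are hypothetical helpers introduced for illustration and are not part of the described pipeline.

```python
# Figures copied from the abstract; helper names are illustrative assumptions.
BASELINE = {"ESC-10": 96.65, "ESC-50": 87.10, "UrbanSound8K": 84.45, "AudioEvent": 92.57}
COMPRESSED = {"ESC-10": 96.25, "ESC-50": 83.65, "UrbanSound8K": 78.27, "AudioEvent": 89.69}


def shrink_factor(reduction_percent):
    """Convert a percentage reduction into an 'N times smaller' factor."""
    return 100.0 / (100.0 - reduction_percent)


def accuracy_drops(baseline, compressed):
    """Accuracy lost per dataset, in percentage points (rounded to 2 places)."""
    return {name: round(baseline[name] - compressed[name], 2) for name in baseline}


if __name__ == "__main__":
    print(f"size:  about {shrink_factor(97.22):.1f}x smaller")
    print(f"FLOPs: about {shrink_factor(97.28):.1f}x fewer")
    for name, drop in accuracy_drops(BASELINE, COMPRESSED).items():
        print(f"{name}: -{drop:.2f} points")
```

Read this way, a 97.22% size reduction corresponds to a model roughly 36 times smaller, and the accuracy cost ranges from 0.40 points on ESC-10 to 6.18 points on UrbanSound8K.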