MetaAudio: A Few-Shot Audio Classification Benchmark

Currently available benchmarks for few-shot learning (machine learning with few training examples) are limited in the domains they cover, primarily focusing on image classification. This work aims to alleviate this reliance on image-based benchmarks by offering the first comprehensive, public and fully reproducible audio-based alternative, covering a variety of sound domains and experimental settings. We compare the few-shot classification performance of a variety of techniques on seven audio datasets (spanning environmental sounds to human speech). Extending this, we carry out in-depth analyses of joint training (where all datasets are used during training) and cross-dataset adaptation protocols, establishing the possibility of a generalised audio few-shot classification algorithm. Our experiments show that gradient-based meta-learning methods such as MAML and Meta-Curvature consistently outperform both metric and baseline methods. We also demonstrate that the joint training routine helps overall generalisation for the environmental sound databases included, as well as being a somewhat effective method of tackling the cross-dataset/domain setting.