
Abstract

Sound event detection (SED) is essential for recognizing specific sounds and their temporal locations within acoustic signals. The task is particularly challenging for on-device applications, where computational resources are limited. To address this issue, we introduce a novel framework, referred to as dual knowledge distillation, for developing efficient SED systems. Our proposed dual knowledge distillation begins with temporal-averaging knowledge distillation (TAKD), which uses a mean student model obtained by temporally averaging the student model's parameters. This allows the student model to learn indirectly from a pre-trained teacher model and keeps the knowledge distillation stable. We then introduce embedding-enhanced feature distillation (EEFD), which adds an embedding distillation layer within the student model to strengthen contextual learning. On the DCASE 2023 Task 4A public evaluation dataset, our proposed SED system with dual knowledge distillation has only one-third of the baseline model's parameters, yet it outperforms the baseline in terms of PSDS1 and PSDS2. This highlights the value of the proposed dual knowledge distillation for compact SED systems, which are well suited to edge devices.
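
The two ingredients described above can be illustrated with a minimal, framework-free sketch. It assumes that the temporal averaging in TAKD is an exponential moving average (EMA) of the student's parameters, as in mean-teacher training. It also assumes that EEFD's embedding distillation layer is a bias-free linear projection trained with a mean-squared-error loss against teacher embeddings. The abstract does not specify the averaging scheme, the decay value, the layer type, or the loss, so the names `ema_update`, `embedding_distillation_layer`, and `feature_distillation_loss` and the default `decay=0.999` are hypothetical.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def ema_update(mean_params: Params, student_params: Params, decay: float = 0.999) -> Params:
    """Assumed TAKD averaging step: mean <- decay * mean + (1 - decay) * student.

    Applied after every student optimizer step, this yields a "mean student"
    whose weights are a temporal (exponential moving) average of the student's.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    updated: Params = {}
    for name, mean_values in mean_params.items():
        student_values = student_params[name]
        if len(mean_values) != len(student_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        updated[name] = [
            decay * m + (1.0 - decay) * s for m, s in zip(mean_values, student_values)
        ]
    return updated


def embedding_distillation_layer(weights: List[List[float]], x: List[float]) -> List[float]:
    """Hypothetical EEFD layer: bias-free linear projection of a student embedding."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def feature_distillation_loss(
    student_emb: List[float], teacher_emb: List[float], proj: List[List[float]]
) -> float:
    """Assumed EEFD loss: MSE between the projected student and the teacher embedding."""
    projected = embedding_distillation_layer(proj, student_emb)
    if len(projected) != len(teacher_emb):
        raise ValueError("projected student and teacher embeddings differ in size")
    return sum((p - t) ** 2 for p, t in zip(projected, teacher_emb)) / len(teacher_emb)
```

For example, `ema_update({"w": [0.0, 0.0]}, {"w": [1.0, 2.0]}, decay=0.5)` moves the mean student halfway toward the student, giving `{"w": [0.5, 1.0]}`. With an identity projection, `feature_distillation_loss([1.0, 2.0], [1.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])` returns `2.0`. Because the mean student changes slowly under a decay close to 1, it gives a smoother learning signal than the raw student, which matches the stability motivation stated for TAKD.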
