
GPU-accelerated Guided Source Separation for Meeting Transcription

Raj, Desh; Povey, Daniel; Khudanpur, Sanjeev
Abstract

Guided source separation (GSS) is a target-speaker extraction method that relies on pre-computed speaker activities and blind source separation to perform front-end enhancement of overlapped speech signals. It was first proposed during the CHiME-5 challenge and provided significant improvements over the delay-and-sum beamforming baseline. Despite its strengths, the method has seen limited adoption in meeting transcription benchmarks, primarily due to its high computation time. In this paper, we describe our improved implementation of GSS, which uses modern GPU-based pipelines, including batched processing of frequencies and segments, to provide a 300x speed-up over CPU-based inference. The faster inference allows us to perform detailed ablation studies over several parameters of the GSS algorithm, such as context duration, number of channels, and noise class. We provide end-to-end reproducible pipelines for speaker-attributed transcription of popular meeting benchmarks: LibriCSS, AMI, and AliMeeting. Our code and recipes are publicly available: https://github.com/desh2608/gss.
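The abstract does not spell out the separation model, so the following is only a minimal NumPy sketch under stated assumptions. It uses a complex angular central Gaussian mixture model (cACGMM), the blind source separation model GSS is commonly built on, and "guides" it by zeroing the prior of any class (speaker or noise) that the pre-computed activity marks as inactive in a frame. Every frequency bin is processed in the same `einsum`/`linalg` call instead of a Python loop, which is the kind of batching over frequencies the abstract describes. The function names, the uniform prior over active classes, the identity initialization, and the diagonal loading are our assumptions. This is not the code in the gss repository.

```python
import numpy as np


def _quadratic_form(Z, B):
    """q[f, k, t] = z_{ft}^H B_{fk}^{-1} z_{ft}, computed for all f, k, t at once."""
    B_inv = np.linalg.inv(B)  # (F, K, D, D): one batched inversion for every frequency
    q = np.einsum("ftd,fkde,fte->fkt", Z.conj(), B_inv, Z).real
    return np.maximum(q, 1e-10)


def guided_e_step(Z, B, activity):
    """Posteriors gamma[f, k, t] of a cACGMM whose prior is masked by activity.

    Z:        (F, T, D) unit-norm multichannel STFT frames.
    B:        (F, K, D, D) Hermitian positive-definite class matrices.
    activity: (K, T) boolean guide; every frame needs at least one active
              class (assumed here: an always-active noise class).
    """
    D = Z.shape[-1]
    assert activity.any(axis=0).all(), "each frame needs an active class"
    _, logdet = np.linalg.slogdet(B)  # (F, K); det is real and positive for Hermitian PD B
    # cACG log-likelihood, dropping terms that are identical for all classes.
    loglik = -logdet[:, :, None] - D * np.log(_quadratic_form(Z, B))
    # Inactive classes get zero prior; active ones share it uniformly (assumption),
    # and a uniform prior cancels in the normalization below.
    logits = np.where(activity[None], loglik, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax over classes
    gamma = np.exp(logits)
    return gamma / gamma.sum(axis=1, keepdims=True)


def m_step(Z, B, gamma, eps=1e-6):
    """B[f, k] = D * sum_t gamma * z z^H / q  /  sum_t gamma, batched over f and k."""
    D = Z.shape[-1]
    w = gamma / _quadratic_form(Z, B)  # (F, K, T) real weights
    num = D * np.einsum("fkt,ftd,fte->fkde", w, Z, Z.conj())
    den = np.maximum(gamma.sum(axis=2), eps)[..., None, None]
    # Small diagonal loading (our addition) keeps B invertible for rarely active classes.
    return num / den + eps * np.eye(D)


# Toy example: random data stands in for a real multichannel recording.
rng = np.random.default_rng(0)
F, T, D, K = 129, 200, 4, 3  # frequencies, frames, channels, classes (2 speakers + noise)
Y = rng.standard_normal((F, T, D)) + 1j * rng.standard_normal((F, T, D))
Z = Y / np.linalg.norm(Y, axis=-1, keepdims=True)  # the cACGMM models unit-norm vectors

activity = np.ones((K, T), dtype=bool)  # class 2 (noise) is always active
activity[0, 100:] = False  # hypothetical: speaker 0 silent in the second half
activity[1, :50] = False   # hypothetical: speaker 1 silent in the first 50 frames

B = np.tile(np.eye(D, dtype=complex), (F, K, 1, 1))  # identity initialization (assumption)
for _ in range(5):
    gamma = guided_e_step(Z, B, activity)
    B = m_step(Z, B, gamma)
```

Because all matrices start as the identity, the first E-step returns exactly the activity mask normalized over active classes, so the guide alone initializes the posteriors. An array library with a NumPy-compatible API, such as CuPy, could in principle run the same batched calls on a GPU.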