Tandem spoofing-robust automatic speaker verification based on time-domain embeddings

Spoofing-robust automatic speaker verification (SASV) systems are a crucial technology for the protection against spoofed speech. In this study, we focus on logical access attacks and introduce a novel approach to SASV tasks. A novel representation of genuine and spoofed speech is employed, based on the probability mass function (PMF) of waveform amplitudes in the time domain. This methodology generates novel time embeddings derived from the PMF of selected groups within the training set. This paper highlights the role of gender segregation and its positive impact on performance. We propose a countermeasure (CM) system that employs time-domain embeddings derived from the PMF of spoofed and genuine speech, as well as gender recognition based on male and female time-based embeddings. The method exhibits notable gender recognition capabilities, with mismatch rates of 0.94% and 1.79% for males and females, respectively. The male and female CM systems achieve an equal error rate (EER) of 8.67% and 10.12%, respectively. By integrating this approach with traditional speaker verification systems, we demonstrate improved generalization ability and tandem detection cost function evaluation using the ASVspoof2019 challenge database. Furthermore, we investigate the impact of fusing the time embedding approach with traditional CM and illustrate how this fusion enhances generalization in SASV architectures.
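
To make the PMF-based representation concrete, the following is a minimal sketch of how an amplitude PMF could be estimated from a waveform and averaged over a group of training utterances (for example genuine, spoofed, male, or female). The abstract does not specify the binning, the normalization, or how embeddings are formed from the group PMFs, so the function names `amplitude_pmf` and `group_pmf`, the peak normalization, and the default of 64 bins are all illustrative assumptions rather than the configuration used in this work.

```python
from typing import List, Sequence


def amplitude_pmf(waveform: Sequence[float], n_bins: int = 64) -> List[float]:
    """Estimate the PMF of sample amplitudes with a fixed-width histogram.

    Assumption: samples are peak-normalised to [-1, 1] before binning.
    """
    if len(waveform) == 0:
        raise ValueError("waveform must contain at least one sample")
    peak = max(abs(s) for s in waveform)
    scale = peak if peak > 0 else 1.0  # avoid dividing by zero on silence
    counts = [0] * n_bins
    for s in waveform:
        x = s / scale  # now in [-1, 1]
        # Map [-1, 1] onto bin indices 0..n_bins-1; x == 1 goes to the last bin.
        idx = min(int((x + 1.0) / 2.0 * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(waveform)
    return [c / total for c in counts]


def group_pmf(waveforms: Sequence[Sequence[float]], n_bins: int = 64) -> List[float]:
    """Average the per-utterance PMFs of one group (hypothetical group template)."""
    if len(waveforms) == 0:
        raise ValueError("group must contain at least one waveform")
    pmfs = [amplitude_pmf(w, n_bins) for w in waveforms]
    return [sum(column) / len(pmfs) for column in zip(*pmfs)]
```

A test utterance could then be compared against each group PMF with a distance or divergence of choice; that comparison step is left out here because the abstract does not describe it.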