
Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection

Truong, Duc-Tuan; Tao, Ruijie; Nguyen, Tuan; Luong, Hieu-Thi; Lee, Kong Aik; Chng, Eng Siong
Abstract

Recent synthetic speech detectors leveraging the Transformer model have superior performance compared to their convolutional neural network counterparts. This improvement could be due to the powerful modeling ability of multi-head self-attention (MHSA) in the Transformer model, which learns the temporal relationship of each input token. However, artifacts of synthetic speech can be located in specific regions of both frequency channels and temporal segments, while MHSA neglects this temporal-channel dependency of the input sequence. In this work, we propose a Temporal-Channel Modeling (TCM) module to enhance MHSA's capability for capturing temporal-channel dependencies. Experimental results on ASVspoof 2021 show that with only 0.03M additional parameters, the TCM module can outperform the state-of-the-art system by 9.25% in EER. A further ablation study reveals that utilizing both temporal and channel information yields the most improvement in detecting synthetic speech.
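
The abstract does not describe the internal design of the TCM module. The sketch below is only a hypothetical PyTorch illustration of the general idea it points to, pairing a standard temporal MHSA branch with a lightweight channel-gating branch; the class name, layer sizes, and gating scheme are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class TemporalChannelAttention(nn.Module):
    """Hypothetical sketch: self-attention over time plus a channel gate.

    This is not the paper's TCM module; it only illustrates combining
    temporal dependencies (via MHSA) with channel dependencies
    (via a squeeze-and-excitation-style gate).
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Standard multi-head self-attention over the temporal (token) axis.
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Lightweight channel-wise gate standing in for channel modeling.
        self.channel_gate = nn.Sequential(
            nn.Linear(dim, dim // 4),
            nn.ReLU(),
            nn.Linear(dim // 4, dim),
            nn.Sigmoid(),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        t_out, _ = self.temporal_attn(x, x, x)                   # temporal branch
        gate = self.channel_gate(x.mean(dim=1, keepdim=True))    # channel branch
        return self.norm(x + t_out * gate)                       # residual fusion


if __name__ == "__main__":
    x = torch.randn(2, 100, 64)            # (batch, frames, feature dim)
    module = TemporalChannelAttention(64)
    print(module(x).shape)                 # torch.Size([2, 100, 64])
```

The fusion choice here (multiplicative gating on the attention output) is arbitrary; the key point is that both the temporal and channel views of the input contribute, which is the dependency the abstract argues plain MHSA neglects.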