Abstract

We present the first neural network model to achieve real-time and streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the encoder, and a transformer decoder layer as the decoder. This hybrid architecture uses dilated causal convolutions for processing large receptive fields in a computationally efficient manner while also leveraging the generalization performance of transformer-based architectures. Our evaluations show as much as 2.2-3.3 dB improvement in SI-SNRi compared to the prior models for this task while having a 1.2-4x smaller model size and a 1.5-2x lower runtime. We provide code, dataset, and audio samples: https://waveformer.cs.washington.edu/.
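To illustrate why a stack of dilated causal convolutions can cover a large receptive field cheaply, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names (`causal_dilated_conv1d`, `receptive_field`, `run_stack`), the kernel size of 3, and the dilation schedule 1, 2, 4, 8 are all assumptions chosen for illustration, and the abstract does not state Waveformer's actual hyperparameters.

```python
from typing import List, Sequence


def causal_dilated_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int
) -> List[float]:
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation].

    Only current and past samples are read, so the layer is causal.
    Samples before t = 0 are treated as zero (implicit left padding).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def run_stack(
    x: Sequence[float], kernel: Sequence[float], dilations: Sequence[int]
) -> List[float]:
    """Apply one causal dilated convolution per dilation factor, in order."""
    y = list(x)
    for d in dilations:
        y = causal_dilated_conv1d(y, kernel, d)
    return y


# Hypothetical configuration, for illustration only.
KERNEL = [1.0, 1.0, 1.0]
DILATIONS = [1, 2, 4, 8]

# An impulse placed at t = 10 shows both causality and receptive-field size.
signal = [0.0] * 50
signal[10] = 1.0
response = run_stack(signal, KERNEL, DILATIONS)
rf = receptive_field(len(KERNEL), DILATIONS)
```

With exponentially growing dilations the receptive field grows exponentially with depth while the number of weights grows only linearly, which is the efficiency argument the abstract refers to. Because each output depends only on current and past samples, such a stack can run on streaming audio chunk by chunk.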

Bandhav Veluri Justin Chan Malek Itani Tuochao Chen Takuya Yoshioka Shyamnath Gollakota

Real-Time Target Sound Extraction
