
Samba-ASR: State-of-the-Art Speech Recognition Leveraging Structured State-Space Models

Syed Abdul Gaffar Shakhadri, Kruthika KR, Kartik Basavaraj Angadi
Abstract

We propose Samba ASR, the first state-of-the-art Automatic Speech Recognition (ASR) model leveraging the novel Mamba architecture as both encoder and decoder, built on the foundation of state-space models (SSMs). Unlike transformer-based ASR models, which rely on self-attention mechanisms to capture dependencies, Samba ASR effectively models both local and global temporal dependencies using efficient state-space dynamics, achieving remarkable performance gains. By addressing the limitations of transformers, such as quadratic scaling with input length and difficulty in handling long-range dependencies, Samba ASR achieves superior accuracy and efficiency.

Experimental results demonstrate that Samba ASR surpasses existing open-source transformer-based ASR models across various standard benchmarks, establishing it as the new state of the art in ASR. Extensive evaluations on benchmark datasets show significant improvements in Word Error Rate (WER), with competitive performance even in low-resource scenarios. Furthermore, the computational efficiency and parameter optimization of the Mamba architecture make Samba ASR a scalable and robust solution for diverse ASR tasks.

Our contributions include:

- A new Samba ASR architecture demonstrating the superiority of SSMs over transformer-based models for speech sequence processing.
- A comprehensive evaluation on public benchmarks showcasing state-of-the-art performance.
- An analysis of computational efficiency, robustness to noise, and sequence generalization.

This work highlights the viability of Mamba SSMs as a transformer-free alternative for efficient and accurate ASR. By leveraging state-space modeling advancements, Samba ASR sets a new benchmark for ASR performance and future research.
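The linear-scaling claim in the abstract follows from the basic state-space recurrence that Mamba-style models build on: a hidden state is updated once per input frame, so the cost grows linearly with sequence length rather than quadratically as in self-attention. The sketch below is a minimal, illustrative NumPy implementation of a discretized SSM scan, not the authors' code; the function name ssm_scan and all dimensions and matrices are assumptions chosen for the toy example.

import numpy as np

def ssm_scan(x, A, B, C):
    """Run a discretized linear state-space model over a sequence.

    x : (T, d_in)          input features (e.g. log-mel frames)
    A : (d_state, d_state) state-transition matrix (discretized)
    B : (d_state, d_in)    input projection
    C : (d_out, d_state)   output projection

    Cost is O(T) in sequence length, unlike the O(T^2) pairwise
    comparisons performed by self-attention.
    """
    T = x.shape[0]
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(T):
        h = A @ h + B @ x[t]   # state update carries long-range context forward
        ys.append(C @ h)       # per-frame output read out from the state
    return np.stack(ys)

# Toy usage: 200 frames of 80-dim features, 16-dim state, 32-dim output.
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 80))
A = 0.9 * np.eye(16)                      # stable transition keeps the state bounded
B = 0.1 * rng.standard_normal((16, 80))
C = rng.standard_normal((32, 16))
y = ssm_scan(x, A, B, C)
print(y.shape)                            # (200, 32)

Mamba extends this recurrence with input-dependent (selective) parameters and a hardware-aware parallel scan; the loop above only illustrates the linear dependence on sequence length that the paper contrasts with attention.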