Speech Recognition on LibriSpeech test-clean
Metrics
Word Error Rate (WER)
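For reference, WER is the word-level edit distance between the hypothesis and the reference transcript, normalized by the number of reference words: WER = (substitutions + deletions + insertions) / reference words. The values in the table below follow the usual convention of quoting WER as a percentage. Below is a minimal sketch of the computation, assuming simple whitespace tokenization; the function name is illustrative and is not the official scoring script of this benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,              # deletion
                           dp[i][j - 1] + 1,              # insertion
                           dp[i - 1][j - 1] + cost)       # match / substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substitution over four reference words -> 0.25, i.e. 25% WER
print(word_error_rate("the cat sat down", "the cat sits down"))
```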
Results
Performance results of various models on this benchmark.
Comparison table
Model name | Word Error Rate (WER) |
---|---|
specaugment-a-simple-data-augmentation-method | 2.7 |
fast-simpler-and-more-accurate-hybrid-asr | 2.10 |
multi-head-state-space-model-for-speech | 1.76 |
conformer-convolution-augmented-transformer | 1.9 |
rwth-asr-systems-for-librispeech-hybrid-vs | 2.3 |
iterative-pseudo-labeling-for-speech | 2.10 |
cr-ctc-consistency-regularization-on-ctc-for | 2.02 |
improved-training-of-end-to-end-attention | 3.82 |
neural-network-language-modeling-with-letter | 3.06 |
transformer-based-asr-incorporating-time | 1.9 |
the-pytorch-kaldi-speech-recognition-toolkit | 6.2 |
a-comparative-study-on-transformer-vs-rnn-in | 2.6 |
snips-voice-platform-an-embedded-spoken | 6.4 |
w2v-bert-combining-contrastive-learning-and | 1.4 |
cr-ctc-consistency-regularization-on-ctc-for | 1.88 |
end-to-end-asr-from-supervised-to-semi | 2.31 |
state-of-the-art-speech-recognition-using | 2.20 |
zipformer-a-faster-and-better-encoder-for | 2.00 |
high-precision-medical-speech-recognition | 0.985 |
contextnet-improving-convolutional-neural | 2 |
crf-based-single-stage-acoustic-modeling-with | 4.09 |
librispeech-transducer-model-with-internal | 2.23 |
hubert-self-supervised-speech-representation | 1.8 |
fast-conformer-with-linearly-scalable | 1.46 |
graph-convolutions-enrich-the-self-attention | 2.11 |
qwen-audio-advancing-universal-audio | 2.0 |
wavlm-large-scale-self-supervised-pre | 1.8 |
jasper-an-end-to-end-convolutional-neural | 2.95 |
Model 29 | 8.0 |
contextnet-improving-convolutional-neural | 1.9 |
squeezeformer-an-efficient-transformer-for | 2.47 |
fadam-adam-is-a-natural-gradient-optimizer | 1.34 |
self-training-and-pre-training-are | 2.7 |
speechstew-simply-mix-all-available-speech | 1.7 |
letter-based-speech-recognition-with-gated | 4.8 |
self-training-and-pre-training-are | 1.5 |
Model 37 | 4.3 |
pushing-the-limits-of-semi-supervised | 1.4 |
semi-supervised-speech-recognition-via-local | 7.19 |
conformer-convolution-augmented-transformer | 2 |
end-to-end-asr-from-supervised-to-semi | 2.03 |
conformer-convolution-augmented-transformer | 2.1 |
let-ssms-be-convnets-state-space-modeling | 4.4 |
improving-rnn-transducer-based-asr-with | 2.0 |
espresso-a-fast-end-to-end-neural-speech | 2.8 |
model-unit-exploration-for-sequence-to | 3.60 |
fully-convolutional-speech-recognition | 3.26 |
wav2vec-2-0-a-framework-for-self-supervised | 1.8 |
Model 49 | 5.5 |
e-branchformer-branchformer-with-enhanced | 1.81 |
transformer-based-acoustic-modeling-for | 2.26 |
quartznet-deep-automatic-speech-recognition | 2.69 |
improving-end-to-end-speech-recognition-with-1 | 5.42 |
contextnet-improving-convolutional-neural | 2.3 |
specaugment-a-simple-data-augmentation-method | 2.5 |
Model 56 | 4.8 |
deep-speech-2-end-to-end-speech-recognition | 5.33 |
mt4ssl-boosting-self-supervised-speech | 3.4 |
asapp-asr-multistream-cnn-and-self-attentive | 1.75 |
jasper-an-end-to-end-convolutional-neural | 2.84 |
samba-asr-state-of-the-art-speech-recognition | 1.17 |
improved-noisy-student-training-for-automatic | 1.7 |
amortized-neural-networks-for-low-latency | 8.6 |
speechstew-simply-mix-all-available-speech | 2.0 |