
Speech Recognition on LibriSpeech test-other

Metrics

Word Error Rate (WER)
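
WER is the word-level edit distance (substitutions + deletions + insertions) between a system's hypothesis and the reference transcript, divided by the number of reference words; lower is better. Below is a minimal illustrative sketch of the computation in Python (not the scoring script used for any entry on this leaderboard):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance between the two word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


# Example: one substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # ~0.333, i.e. 33.3% WER
```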

Results

Performance results of various models on this benchmark

| Model name | Word Error Rate (WER) | Paper Title |
|---|---|---|
| Espresso | 8.7 | Espresso: A Fast End-to-end Neural Speech Recognition Toolkit |
| Jasper DR 10x5 (+ Time/Freq Masks) | 7.84 | Jasper: An End-to-End Convolutional Neural Acoustic Model |
| Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only) | 5.18 | End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures |
| ContextNet(L) | 4.1 | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context |
| SAMBA ASR | 2.48 | Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models |
| LAS + SpecAugment | 5.8 | SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition |
| Jasper DR 10x5 | 8.79 | Jasper: An End-to-End Convolutional Neural Acoustic Model |
| Convolutional Speech Recognition | 10.47 | Fully Convolutional Speech Recognition |
| FAdam | 2.49 | FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information |
| Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring) | 3.83 | Iterative Pseudo-Labeling for Speech Recognition |
| Conformer(M) | 4.3 | Conformer: Convolution-augmented Transformer for Speech Recognition |
| Multi-Stream Self-Attention With Dilated 1D Convolutions | 5.80 | State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions |
| ContextNet(M) | 4.5 | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context |
| HuBERT with Libri-Light | 2.9 | HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units |
| Zipformer + pruned transducer w/ CR-CTC (no external language model) | 3.95 | CR-CTC: Consistency regularization on CTC for improved speech recognition |
| Local Prior Matching (Large Model, ConvLM LM) | 15.28 | Semi-Supervised Speech Recognition via Local Prior Matching |
| MT4SSL | 9.6 | MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets |
| Conformer(S) | 5.0 | Conformer: Convolution-augmented Transformer for Speech Recognition |
| Branchformer + GFSA | 4.94 | Graph Convolutions Enrich the Self-Attention in Transformers! |
| Zipformer + CR-CTC (no external language model) | 4.35 | CR-CTC: Consistency regularization on CTC for improved speech recognition |