HyperAI

Speech Recognition on LibriSpeech test-other

Metrics

Word Error Rate (WER)
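
WER is the minimum number of word-level substitutions, deletions, and insertions needed to turn a hypothesis transcript into the reference, divided by the number of reference words; the leaderboard reports it as a percentage, so lower is better. Below is a minimal illustrative sketch of this computation using a word-level Levenshtein alignment; the function name `word_error_rate` and the toy sentences are assumptions for the example, not part of any benchmark tooling.

```python
# Minimal sketch of word error rate (WER):
# WER = (substitutions + deletions + insertions) / number of reference words,
# computed with a standard word-level Levenshtein (edit-distance) alignment.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match, no cost
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j - 1],  # substitution
                    dp[i - 1][j],      # deletion
                    dp[i][j - 1],      # insertion
                )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    # Hypothetical example: one deletion out of six reference words -> WER ~0.167 (16.7%)
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```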

Results

Performance results of various models on this benchmark (WER in %, lower is better)

| Model Name | Word Error Rate (WER) | Paper Title |
|---|---|---|
| Espresso | 8.7 | Espresso: A Fast End-to-end Neural Speech Recognition Toolkit |
| Jasper DR 10x5 (+ Time/Freq Masks) | 7.84 | Jasper: An End-to-End Convolutional Neural Acoustic Model |
| Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only) | 5.18 | End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures |
| ContextNet(L) | 4.1 | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context |
| SAMBA ASR | 2.48 | Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models |
| LAS + SpecAugment | 5.8 | SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition |
| Jasper DR 10x5 | 8.79 | Jasper: An End-to-End Convolutional Neural Acoustic Model |
| Convolutional Speech Recognition | 10.47 | Fully Convolutional Speech Recognition |
| FAdam | 2.49 | FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information |
| Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring) | 3.83 | Iterative Pseudo-Labeling for Speech Recognition |
| Conformer(M) | 4.3 | Conformer: Convolution-augmented Transformer for Speech Recognition |
| Multi-Stream Self-Attention With Dilated 1D Convolutions | 5.80 | State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions |
| ContextNet(M) | 4.5 | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context |
| HuBERT with Libri-Light | 2.9 | HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units |
| Zipformer+pruned transducer w/ CR-CTC (no external language model) | 3.95 | CR-CTC: Consistency regularization on CTC for improved speech recognition |
| Local Prior Matching (Large Model, ConvLM LM) | 15.28 | Semi-Supervised Speech Recognition via Local Prior Matching |
| MT4SSL | 9.6 | MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets |
| Conformer(S) | 5.0 | Conformer: Convolution-augmented Transformer for Speech Recognition |
| Branchformer + GFSA | 4.94 | Graph Convolutions Enrich the Self-Attention in Transformers! |
| Zipformer+CR-CTC (no external language model) | 4.35 | CR-CTC: Consistency regularization on CTC for improved speech recognition |