Jasper DR 10x5 (+ Time/Freq Masks) | 7.84 | Jasper: An End-to-End Convolutional Neural Acoustic Model | |
Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only) | 5.18 | End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures | |
Convolutional Speech Recognition | 10.47 | Fully Convolutional Speech Recognition | - |
Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring) | 3.83 | Iterative Pseudo-Labeling for Speech Recognition | - |
Multi-Stream Self-Attention With Dilated 1D Convolutions | 5.80 | State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions | |
Zipformer+pruned transducer w/ CR-CTC (no external language model) | 3.95 | CR-CTC: Consistency regularization on CTC for improved speech recognition | |
Local Prior Matching (Large Model, ConvLM LM) | 15.28 | Semi-Supervised Speech Recognition via Local Prior Matching | |
Zipformer+CR-CTC (no external language model) | 4.35 | CR-CTC: Consistency regularization on CTC for improved speech recognition | |