HyperAI

Speech Recognition On Librispeech Test Clean

Metrics

Word Error Rate (WER)
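Word Error Rate is the word-level Levenshtein (edit) distance between the reference transcript and the hypothesis, divided by the number of reference words, and is usually reported as a percentage. As a minimal sketch (not the benchmark's official scoring code, which typically also applies text normalization), WER can be computed like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming:
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words -> WER = 1/6 (about 16.7%).
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Multiplying the result by 100 gives the percentage figures shown in the table below.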

Results

Performance results of various models on this benchmark

| Model Name | Word Error Rate (WER, %) | Paper Title | Repository |
|---|---|---|---|
| LAS (no LM) | 2.7 | SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | |
| CTC + Transformer LM rescoring | 2.10 | Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces | - |
| Stateformer | 1.76 | Multi-Head State Space Model for Speech Recognition | - |
| Conformer(L) | 1.9 | Conformer: Convolution-augmented Transformer for Speech Recognition | |
| Hybrid model with Transformer rescoring | 2.3 | RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation | |
| Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring) | 2.10 | Iterative Pseudo-Labeling for Speech Recognition | - |
| Zipformer+CR-CTC (no external language model) | 2.02 | CR-CTC: Consistency regularization on CTC for improved speech recognition | |
| Seq-to-seq attention | 3.82 | Improved training of end-to-end attention models for speech recognition | |
| tdnn + chain + rnnlm rescoring | 3.06 | Neural Network Language Modeling with Letter-based Features and Importance Sampling | - |
| Transformer+Time reduction+Self Knowledge distillation | 1.9 | Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation | - |
| Li-GRU | 6.2 | The PyTorch-Kaldi Speech Recognition Toolkit | |
| Transformer | 2.6 | A Comparative Study on Transformer vs RNN in Speech Applications | |
| Snips | 6.4 | Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces | |
| w2v-BERT XXL | 1.4 | W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training | |
| Zipformer+pruned transducer w/ CR-CTC (no external language model) | 1.88 | CR-CTC: Consistency regularization on CTC for improved speech recognition | |
| Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only) | 2.31 | End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures | |
| Multi-Stream Self-Attention With Dilated 1D Convolutions | 2.20 | State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions | |
| Zipformer+pruned transducer (no external language model) | 2.00 | Zipformer: A faster and better encoder for automatic speech recognition | |
| United-MedASR (764M) | 0.985 | High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR | - |
| ContextNet(M) | 2.0 | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | |