
Speech Recognition on LibriSpeech test-clean

Evaluation Metric

Word Error Rate (WER)
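WER measures the fraction of word-level errors a system makes against a reference transcript: the minimum number of substitutions (S), deletions (D), and insertions (I) needed to turn the hypothesis into the reference, divided by the number of reference words (N), i.e. WER = (S + D + I) / N. Below is a minimal sketch of this computation via word-level edit distance; the function name and example sentences are illustrative, not taken from any system in the table.

```python
# Minimal WER sketch: word-level Levenshtein distance divided by the
# reference length. Names and example strings are illustrative only.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution against a six-word reference: WER = 1/6 ≈ 0.167 (16.7%)
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

A WER of 2.7 in the table below therefore means roughly 2.7 errors per 100 reference words.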

Evaluation Results

Performance of each model on this benchmark:

| Model | WER (%) | Paper |
| --- | --- | --- |
| LAS (no LM) | 2.7 | SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition |
| CTC + Transformer LM rescoring | 2.10 | Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces |
| Stateformer | 1.76 | Multi-Head State Space Model for Speech Recognition |
| Conformer(L) | 1.9 | Conformer: Convolution-augmented Transformer for Speech Recognition |
| Hybrid model with Transformer rescoring | 2.3 | RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation |
| Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring) | 2.10 | Iterative Pseudo-Labeling for Speech Recognition |
| Zipformer + CR-CTC (no external language model) | 2.02 | CR-CTC: Consistency regularization on CTC for improved speech recognition |
| Seq-to-seq attention | 3.82 | Improved training of end-to-end attention models for speech recognition |
| TDNN + chain + RNNLM rescoring | 3.06 | Neural Network Language Modeling with Letter-based Features and Importance Sampling |
| Transformer + Time reduction + Self Knowledge distillation | 1.9 | Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation |
| Li-GRU | 6.2 | The PyTorch-Kaldi Speech Recognition Toolkit |
| Transformer | 2.6 | A Comparative Study on Transformer vs RNN in Speech Applications |
| Snips | 6.4 | Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces |
| w2v-BERT XXL | 1.4 | w2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training |
| Zipformer + pruned transducer w/ CR-CTC (no external language model) | 1.88 | CR-CTC: Consistency regularization on CTC for improved speech recognition |
| Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only) | 2.31 | End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures |
| Multi-Stream Self-Attention with Dilated 1D Convolutions | 2.20 | State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions |
| Zipformer + pruned transducer (no external language model) | 2.00 | Zipformer: A faster and better encoder for automatic speech recognition |
| United-MedASR (764M) | 0.985 | High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR |
| ContextNet(M) | 2.0 | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context |
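Note that figures like those above are corpus-level WERs: edit counts are summed over every utterance in test-clean before dividing by the total number of reference words, rather than averaging per-utterance WERs (which would overweight short utterances). A minimal sketch of the difference, assuming the third-party jiwer package; the transcripts are invented for illustration.

```python
# Corpus-level vs. naively averaged WER, sketched with the third-party
# jiwer package (pip install jiwer). Transcripts are invented examples.
import jiwer

references = ["yes", "the cat sat on the mat"]
hypotheses = ["yet", "the cat sat on the mat"]

# Corpus WER: total edits / total reference words = 1 / 7 ≈ 0.143
print("corpus WER:", jiwer.wer(references, hypotheses))

# Per-utterance average, for contrast: (1/1 + 0/6) / 2 = 0.5
per_utt = [jiwer.wer(r, h) for r, h in zip(references, hypotheses)]
print("averaged WER:", sum(per_utt) / len(per_utt))
```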