Speech Recognition on Switchboard Hub5'00 (WER)
Metrics
Word error rate (WER), reported as a percentage: the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript.
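For reference, here is a minimal sketch of how WER is typically computed: the word-level Levenshtein (edit) distance between hypothesis and reference, normalized by the reference length. The `wer` helper below is illustrative only and is not taken from any evaluation toolkit used by the papers in the table.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        return float(len(hyp))  # degenerate case: every hypothesis word is an insertion

    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

# 1 substitution ("sat" -> "sit") + 1 deletion ("the") over 6 reference words ≈ 0.333
print(wer("the cat sat on the mat", "the cat sit on mat"))
```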
Results
Performance of various models on this benchmark.
Model | WER (%) | Paper | Repository
---|---|---|---
DNN + Dropout | 19.1 | Building DNN Acoustic Models for Large Vocabulary Speech Recognition | -
CNN + Bi-RNN + CTC (speech to letters); 25.9% WER if trained only on SWB | 16.0 | Deep Speech: Scaling up end-to-end speech recognition | -
HMM-TDNN + iVectors | 17.1 | - | -
HMM-DNN + sMBR | 18.4 | - | -
IBM (LSTM + Conformer encoder-decoder) | 6.8 | On the limit of English conversational speech recognition | -
RNN + VGG + LSTM acoustic model trained on SWB + Fisher + CH; n-gram + "model M" + NNLM language model | 12.2 | The IBM 2016 English Conversational Telephone Speech Recognition System | -
ResNet + BiLSTM acoustic model | 10.3 | English Conversational Telephone Speech Recognition by Humans and Machines | -
VGG/ResNet/LACE/BiLSTM acoustic model trained on SWB + Fisher + CH; n-gram + RNNLM language model trained on Switchboard + Fisher + Gigaword + Broadcast | 11.9 | The Microsoft 2016 Conversational Speech Recognition System | -
HMM-BLSTM trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher | 13.0 | - | -
HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher (10.0% / 15.1% respectively when trained on SWBD only) | 13.3 | - | -
HMM-TDNN + pNorm + speed up/down speech | 19.3 | - | -
IBM (LSTM encoder-decoder) | 7.8 | Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard | -