HyperAI超神经

Speech Recognition on swb_hub_500 (WER)

Evaluation Metric

Percentage error
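The percentage error reported on this benchmark is the word error rate (WER): the word-level edit distance between the reference transcript and the hypothesis, divided by the number of reference words. A minimal sketch of the standard computation (this is illustrative, not HyperAI's scoring code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a percentage, via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return 100.0 * dp[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words -> WER of about 16.7%.
print(round(wer("the cat sat on the mat", "the cat sat on mat"), 1))
```

Scores below 6% on this test set are generally considered close to human transcription performance on conversational telephone speech.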

Evaluation Results

Performance of each model on this benchmark

| Model Name | Percentage error | Paper Title | Repository |
| --- | --- | --- | --- |
| DNN + Dropout | 19.1 | Building DNN Acoustic Models for Large Vocabulary Speech Recognition | - |
| CNN + Bi-RNN + CTC (speech to letters); 25.9% WER if trained only on SWB | 16 | Deep Speech: Scaling up end-to-end speech recognition | - |
| HMM-TDNN + iVectors | 17.1 | - | - |
| HMM-DNN + sMBR | 18.4 | - | - |
| IBM (LSTM + Conformer encoder-decoder) | 6.8 | On the limit of English conversational speech recognition | - |
| RNN + VGG + LSTM acoustic model trained on SWB+Fisher+CH, N-gram + "model M" + NNLM language model | 12.2 | The IBM 2016 English Conversational Telephone Speech Recognition System | - |
| ResNet + BiLSTMs acoustic model | 10.3 | English Conversational Telephone Speech Recognition by Humans and Machines | - |
| VGG/ResNet/LACE/BiLSTM acoustic model trained on SWB+Fisher+CH, N-gram + RNNLM language model trained on Switchboard+Fisher+Gigaword+Broadcast | 11.9 | The Microsoft 2016 Conversational Speech Recognition System | - |
| HMM-BLSTM trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher | 13 | - | - |
| HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher (10% / 15.1% respectively trained on SWBD only) | 13.3 | - | - |
| HMM-TDNN + pNorm + speed up/down speech | 19.3 | - | - |
| IBM (LSTM encoder-decoder) | 7.8 | Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard | - |