HMM-TDNN + pNorm + speed up/down speech | 12.9 | - | - |
RNN + VGG + LSTM acoustic model trained on SWB+Fisher+CH, N-gram + "model M" + NNLM language model | 6.6 | The IBM 2016 English Conversational Telephone Speech Recognition System | - |
CNN on MFSC/fbanks + 1 non-conv layer for fMLLR/i-vectors concatenated in a DNN | 10.4 | - | - |
VGG/ResNet/LACE/BiLSTM acoustic model trained on SWB+Fisher+CH, N-gram + RNNLM language model trained on Switchboard+Fisher+Gigaword+Broadcast | 6.3 | The Microsoft 2016 Conversational Speech Recognition System | - |
IBM (LSTM+Conformer encoder-decoder) | 4.3 | On the limit of English conversational speech recognition | - |