End-to-end speech recognition using lattice-free MMI

We present our work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models. By end-to-end training, we mean flat-start training of a single DNN in one stage, without using any previously trained models, forced alignments, or state-tying decision trees. We use full biphones to enable context-dependent modeling without trees, and show that our end-to-end LF-MMI approach can achieve results comparable to regular LF-MMI on well-known large-vocabulary tasks. We also compare with other end-to-end methods such as CTC in character-based and lexicon-free settings, and show 5 to 25 percent relative reduction in word error rates on different large-vocabulary tasks while using significantly smaller models.