Acoustic Model
The acoustic model is used to calculate the probability that a model generates a given speech waveform. It is one of the most important components of a speech recognition system: it accounts for most of the computational overhead and largely determines the system's performance.
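As background, the standard statistical formulation of speech recognition makes this role explicit. The symbols below are not from the original entry and are introduced only for illustration: $X$ is the sequence of acoustic feature vectors extracted from the waveform and $W$ is a candidate word sequence. The acoustic model supplies the likelihood term $P(X \mid W)$ in the decoding rule:

$$\hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} P(X \mid W)\, P(W)$$

Here $P(W)$ is the language model prior. The normalizing term $P(X)$ is omitted because it does not depend on $W$.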
Development History
- Traditional methods: Based on hidden Markov models (HMM), such as the GMM-HMM approach, in which a Gaussian mixture model (GMM) models the distribution of acoustic features and the HMM models the temporal structure of the speech signal.
- Deep neural networks: In 2009, Hinton and his students applied feedforward fully connected deep neural networks to acoustic modeling for speech recognition. The resulting DNN-HMM acoustic model outperformed the GMM-HMM acoustic model on the TIMIT dataset.
- Variable-length context information: In 2015, acoustic models that exploit variable-length speech context came into use. The optimal context length depends on the phoneme and the speaking rate, so the fixed-length context windows used in DNN-HMM hybrid systems are not the best choice. Recent models are mainly based on recurrent neural networks (RNN) and convolutional neural networks (CNN).
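The division of labor in the GMM-HMM approach from the first bullet can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussians and uses the forward algorithm in the log domain. All function names, the data layout, and the toy parameters are hypothetical choices made for this example, not taken from any particular toolkit.

```python
import math


def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-domain values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """GMM part: log p(x | state) = log sum_k w_k * N(x; mean_k, var_k)."""
    return logsumexp([
        safe_log(w) + log_gaussian_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ])


def forward_log_likelihood(frames, log_init, log_trans, gmms):
    """HMM part: log p(frames | model), summed over all state sequences."""
    n_states = len(gmms)
    # Initialization: start probability times emission of the first frame.
    alpha = [
        log_init[s] + gmm_log_likelihood(frames[0], *gmms[s])
        for s in range(n_states)
    ]
    # Recursion: sum over predecessor states, then emit the current frame.
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[p] + log_trans[p][s] for p in range(n_states)])
            + gmm_log_likelihood(x, *gmms[s])
            for s in range(n_states)
        ]
    return logsumexp(alpha)


# Toy left-to-right HMM with two states and 1-D features (made-up values).
# Each state is (weights, means, variances) of its GMM.
gmms = [
    ([1.0], [[0.0]], [[1.0]]),                      # state 0: single Gaussian
    ([0.5, 0.5], [[4.0], [6.0]], [[1.0], [1.0]]),   # state 1: two components
]
log_init = [safe_log(1.0), safe_log(0.0)]
log_trans = [
    [safe_log(0.5), safe_log(0.5)],
    [safe_log(0.0), safe_log(1.0)],
]

matched = forward_log_likelihood(
    [[0.1], [0.2], [4.9], [5.1]], log_init, log_trans, gmms)
reversed_ = forward_log_likelihood(
    [[5.1], [4.9], [0.2], [0.1]], log_init, log_trans, gmms)
print(matched, reversed_)
```

In this toy setup the frame sequence that moves from values near 0 to values near 5 follows the left-to-right state order and receives a higher log-likelihood than the same frames in reverse. The GMMs score individual frames, while the HMM transition structure accounts for their order in time.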