Mel-Frequency Cepstral Coefficients (MFCCs)
Mel-frequency cepstral coefficients (MFCCs) are a widely used feature representation in audio processing, especially in speech recognition and speaker recognition. They were proposed by Davis and Mermelstein in 1980. The underlying Mel-frequency cepstrum is based on a linear cosine transform of the log power spectrum, computed on the nonlinear Mel scale of frequency.
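As a hedged illustration of the two ingredients named above, the block below writes out one commonly used Hz-to-Mel mapping and the cosine transform of the log filter-bank energies. The constants 2595 and 700 belong to one popular variant of the Mel scale and are an assumption here, since the text does not say which variant is meant. The symbols f, m, E_k, K and c_n are introduced only for this sketch.

```latex
% Assumed Mel-scale variant: f is frequency in Hz, m(f) is the value in mels
m(f) = 2595 \, \log_{10}\!\left(1 + \frac{f}{700}\right)

% Cosine transform of the log filter-bank energies:
% E_k is the energy in the k-th of K Mel-spaced bands, c_n is the n-th coefficient
c_n = \sum_{k=1}^{K} \log(E_k)\,
      \cos\!\left[\, n \left(k - \tfrac{1}{2}\right) \frac{\pi}{K} \right],
      \qquad n = 1, 2, \dots, 12
```

With this mapping, 1000 Hz corresponds to roughly 1000 mels, and equal steps in m cover progressively wider bands in Hz as frequency increases.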
MFCCs are the coefficients that together make up the Mel-frequency cepstrum. They are derived from a cepstral representation of the audio segment. The difference from the ordinary cepstrum is that the frequency bands are equally spaced on the Mel scale, which approximates the response of the human auditory system more closely than the linearly spaced bands of the ordinary cepstrum. This nonlinear frequency warping can give a better representation of sound in many applications, for example audio compression. Computing MFCCs can be roughly divided into the following steps: reading the audio file, pre-emphasis, framing, windowing, Fourier transform, passing the spectrum through a Mel filter bank to obtain the Mel spectrum, and cepstral analysis of the Mel spectrum (taking the logarithm and applying a discrete cosine transform). Typically 12 coefficients are kept and combined with the frame energy, giving a 13-dimensional feature vector that describes each frame of speech.
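The steps listed above can be sketched in plain Python using only the standard library. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the 0.97 pre-emphasis factor, the Hamming window, the 26 triangular filters, the 2595/700 Mel formula and the use of log frame energy as the 13th value are all choices made for this sketch, and file reading is replaced by a synthetic tone. The naive DFT is used for clarity; practical code would use an FFT from a numerical library.

```python
import cmath
import math


def hz_to_mel(f):
    # Assumed Mel-scale variant (2595 / 700 constants).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, equally spaced in mels, mapped to DFT bins.
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges]
    bank = [[0.0] * n_bins for _ in range(n_filters)]
    for j in range(n_filters):
        lo, mid, hi = bins[j], bins[j + 1], bins[j + 2]
        for k in range(lo, mid):   # rising slope
            bank[j][k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling slope
            bank[j][k] = (hi - k) / (hi - mid)
    return bank


def power_spectrum(frame):
    """Naive DFT power spectrum |X[k]|^2 / N for k = 0 .. N/2."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spec.append(abs(acc) ** 2 / n)
    return spec


def dct2(values, n_out):
    """First n_out coefficients of an unnormalised DCT-II."""
    size = len(values)
    return [sum(v * math.cos(math.pi * n * (k + 0.5) / size)
                for k, v in enumerate(values))
            for n in range(n_out)]


def mfcc(signal, sample_rate, frame_len=256, hop=128,
         n_filters=26, n_ceps=12, preemph=0.97):
    # Pre-emphasis: boost high frequencies with a first-order difference.
    emphasized = [signal[0]] + [signal[i] - preemph * signal[i - 1]
                                for i in range(1, len(signal))]
    # Hamming window applied to every frame.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    # Framing: overlapping frames of frame_len samples, hop samples apart.
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = [emphasized[start + i] * window[i] for i in range(frame_len)]
        power = power_spectrum(frame)
        # Log frame energy, floored to avoid log(0).
        energy = math.log(max(sum(power), 1e-10))
        # Mel spectrum, then its logarithm.
        log_mel = [math.log(max(sum(w * p for w, p in zip(filt, power)), 1e-10))
                   for filt in bank]
        # Cepstral analysis: DCT of the log Mel spectrum, dropping c0.
        ceps = dct2(log_mel, n_ceps + 1)[1:]
        features.append([energy] + ceps)
    return features


# Usage: a 0.1 s, 440 Hz tone sampled at 8 kHz stands in for a real recording.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(800)]
feats = mfcc(tone, sr)
print(len(feats), len(feats[0]))  # 5 frames, 13 values per frame
```

Each row of `feats` is one frame's 13-dimensional vector: the log frame energy followed by coefficients 1 to 12. Dropping the zeroth cepstral coefficient and substituting the frame energy is one convention among several.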