Word Error Rate
Word Error Rate (WER) is a standard metric for evaluating the performance of automatic speech recognition (ASR) systems. It is the ratio of word-level recognition errors to the total number of words in the reference transcript. The lower the WER, the better the performance of the speech recognition system.
WER is the minimum number of word-level edits (substitutions, deletions, insertions) required to turn the reference text into the recognized/generated text, divided by the number of words in the reference: WER = (S + D + I) / N, where S, D, and I are the counts of substitutions, deletions, and insertions, and N is the number of reference words. The value usually falls between 0 (perfect match) and 1, and is often expressed as a percentage (such as a 5% error rate). Because insertions count as errors, WER can exceed 1 when the recognized text contains many more words than the reference.
WER can also serve as a feedback indicator during model training. By monitoring how the word error rate changes on held-out data, researchers can adjust the model's parameters and training strategy to improve performance. For example, if the word error rate of a speech recognition model remains too high, it may be necessary to add training data, improve the model architecture, or adjust the training algorithm.
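The edit-count definition of WER can be illustrated with a minimal Python sketch. It assumes words are separated by whitespace and compared exactly (no lowercasing or punctuation stripping, which real evaluations often apply first), and the function name `word_error_rate` is a hypothetical choice for illustration, not part of any particular toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization and exact string matching.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Since the result is divided by the length of the reference only, an output with many extra words can score above 1; for instance, a one-word reference recognized as four words with three of them inserted gives a value of 3.0.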