Lipreading on LRS3-TED
Evaluation Metric
Word Error Rate (WER)
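WER counts the word-level edits needed to turn a model's transcript into the reference transcript. It is computed as (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words, and N is the number of words in the reference. The sketch below computes it with a word-level Levenshtein distance and reports it as a percentage.

It rests on a few assumptions. The function name `word_error_rate` is hypothetical. Tokenization is plain whitespace splitting with no case or punctuation normalization, whereas individual papers on this benchmark may normalize text differently before scoring. The percentage scale is also an assumption, made to match the table below.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER in percent, via word-level Levenshtein distance.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions to match an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr

    # (S + D + I) / N, scaled to a percentage
    return 100.0 * prev[len(hyp)] / len(ref)


# One substitution (sat -> sit) plus one deletion ("the") over 6 reference words
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))  # 33.33
```

Insertions are counted against the reference length, so WER can exceed 100% when a hypothesis contains many extra words. Lower values are better.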
Evaluation Results
Performance of each model on this benchmark (lower WER is better)
Comparison Table
Model | Word Error Rate (WER, %) |
---|---|
syncvsr-data-efficient-visual-speech | 31.2 |
syncvsr-data-efficient-visual-speech | 21.5 |
sub-word-level-lip-reading-with-visual | 30.7 |
auto-avsr-audio-visual-speech-recognition | 19.1 |
where-visual-speech-meets-language-vsp-llm | 25.4 |
visual-speech-recognition-for-multiple | 31.5 |
learning-audio-visual-speech-representation-1 | 26.9 |
end-to-end-audio-visual-speech-recognition | 43.3 |
audio-visual-representation-learning-via | 26.2 |
large-scale-visual-speech-recognition | 55.1 |
discriminative-multi-modality-speech | 57.8 |
conformers-are-all-you-need-for-visual-speech | 12.8 |
spatio-temporal-fusion-based-convolutional | 60.1 |
asr-is-all-you-need-cross-modal-distillation | 59.8 |
es3-evolving-self-supervised-learning-of | 37.1 |
jointly-learning-visual-and-auditory-speech | 23.4 |
relaxed-attention-for-transformer-models | 25.51 |
unified-speech-recognition-a-single-model-for | 21.5 |
recurrent-neural-network-transducer-for-audio | 33.6 |
es3-evolving-self-supervised-learning-of | 40.3 |
deep-audio-visual-speech-recognition | 58.9 |
unified-speech-recognition-a-single-model-for | 22.3 |
sub-word-level-lip-reading-with-visual | 40.6 |