Speech Recognition on LRS3-TED
Metrics
Word Error Rate (WER)
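WER is conventionally defined as (S + D + I) / N, where S, D, and I are the word-level substitutions, deletions, and insertions needed to turn the reference transcript into the hypothesis, and N is the number of reference words. The following is a minimal sketch of that standard definition, not the scoring script used for this benchmark: the function name `word_error_rate` is hypothetical, and whitespace tokenization with no text normalization is an assumption (the page does not specify how transcripts are normalized).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (row-by-row dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER = {score:.4f} ({100 * score:.2f}%)")
```

Multiplying the returned fraction by 100 gives the score as a percentage. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.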
Results
Performance of each model on this benchmark
Model | Word Error Rate (WER, %) | Paper Title | Repository |
---|---|---|---|
Whisper | 0.68 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | - |
RAVEn Large | 1.4 | Jointly Learning Visual and Auditory Speech Representations from Raw Data | - |
Llama-AVSR | 0.81 | Large Language Models are Strong Audio-Visual Speech Recognition Learners | - |
AV-HuBERT Large | 1.3 | Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | - |