Speech Recognition on LRS3-TED
Evaluation Metric
Word Error Rate (WER)
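As a rough illustration of the metric, WER is the word-level edit distance between a reference transcript and a recognized hypothesis, divided by the number of reference words. The sketch below assumes simple whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical, and this is not necessarily the scoring script used by any of the listed papers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

The function returns a fraction; WER is commonly reported as a percentage, which is this value multiplied by 100. Lower is better.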
Results
Performance of each model on this benchmark
Model | Word Error Rate (WER, %) | Paper Title | Repository |
---|---|---|---|
Whisper | 0.68 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | - |
RAVEn Large | 1.4 | Jointly Learning Visual and Auditory Speech Representations from Raw Data | - |
Llama-AVSR | 0.81 | Large Language Models are Strong Audio-Visual Speech Recognition Learners | - |
AV-HuBERT Large | 1.3 | Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | - |