Audio-Visual Speech Recognition on LRS2
Evaluation Metric
Test WER (word error rate on the test set; lower is better)
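As a rough illustration of the metric, the sketch below computes the WER for a single reference/hypothesis pair as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, expressed as a percentage. The function name `word_error_rate` is hypothetical, and the plain whitespace tokenization is an assumption; the scoring used by the papers listed on this page may normalize text differently and typically aggregates errors over the whole test set rather than per utterance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate in percent for one utterance.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the WER can exceed 100 when the hypothesis is much longer than the reference.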
Evaluation Results
Performance of each model on this benchmark
Model Name | Test WER (%) | Paper Title | Repository |
---|---|---|---|
End2end Conformer | 3.7 | End-to-end Audio-visual Speech Recognition with Conformers | |
TM-CTC | 8.2 | Deep Audio-Visual Speech Recognition | |
CTC/Attention | 1.5 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | |
TM-Seq2seq | 8.5 | Deep Audio-Visual Speech Recognition | |
LF-MMI TDNN | 5.9 | Audio-visual Recognition of Overlapped speech for the LRS2 dataset | - |
Whisper-Flamingo | 1.4 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | |
MoCo + wav2vec (w/o extLM) | 2.6 | Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | |
CTC/Attention | 7.0 | Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture | - |