Automatic Speech Recognition on LRS2
Metrics
Test WER (%): word error rate on the LRS2 test set; lower is better.
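WER is the minimum number of word-level substitutions, deletions, and insertions needed to turn a hypothesis transcript into the reference, divided by the number of reference words. A minimal self-contained sketch of the computation (the function name and example strings are illustrative, not from any of the papers below):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitute = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            delete = dp[i - 1][j] + 1
            insert = dp[i][j - 1] + 1
            dp[i][j] = min(substitute, delete, insert)
    return dp[len(ref)][len(hyp)] / len(ref)

# One deletion against six reference words -> 16.7% WER.
print(f"{100 * wer('the cat sat on the mat', 'the cat sat on mat'):.1f}")
```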
Results
Performance of various models on this benchmark, sorted by Test WER (best first).
Model Name | Test WER (%) | Paper Title | Repository |
---|---|---|---|
Whisper | 1.3 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | - |
CTC/Attention | 1.5 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | - |
MoCo + wav2vec (w/o extLM) | 2.7 | Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | - |
End2end Conformer | 3.9 | End-to-end Audio-visual Speech Recognition with Conformers | - |
Whisper-LLaMA | 6.6 | Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition | - |
LF-MMI TDNN | 6.7 | Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset | - |
CTC/attention | 8.2 | Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture | - |
TM-seq2seq | 9.7 | Deep Audio-Visual Speech Recognition | - |
TM-CTC | 10.1 | Deep Audio-Visual Speech Recognition | - |
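For context, a Test WER like those above is typically measured by transcribing every test-set clip and scoring the corpus-level error. A hedged sketch using the openai-whisper and jiwer packages; the manifest file name and format below are assumptions for illustration, not part of the LRS2 release:

```python
# Sketch: score a pretrained Whisper model on LRS2-style test data.
# Assumes a TSV manifest "test.tsv" with rows <audio_path>\t<reference_text>;
# the manifest name and layout are illustrative, not part of LRS2 itself.
import csv

import jiwer     # pip install jiwer
import whisper   # pip install openai-whisper

model = whisper.load_model("small.en")

references, hypotheses = [], []
with open("test.tsv", newline="") as f:
    for audio_path, reference in csv.reader(f, delimiter="\t"):
        result = model.transcribe(audio_path, language="en")
        # Published numbers depend on text normalization (casing,
        # punctuation); lowercasing here is a simplification.
        references.append(reference.lower())
        hypotheses.append(result["text"].lower())

# jiwer.wer aggregates edit distance over the whole corpus.
print(f"Test WER: {100 * jiwer.wer(references, hypotheses):.1f}%")
```

Note that the leaderboard entries differ in modality (audio-only vs. audio-visual input) and in whether an external language model is used, so the numbers are not all produced under identical conditions.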