Speech Recognition on LRS3-TED

Word Error Rate (WER)
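WER is the word-level edit distance (substitutions S, deletions D, insertions I) between a hypothesis and its reference transcript, divided by the number of reference words N: WER = (S + D + I) / N. The following is a minimal sketch of that definition. It assumes plain whitespace tokenization and no text normalization. The papers listed below may apply their own normalization and aggregate over a whole test set. The function name `word_error_rate` is illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    Assumes whitespace tokenization and no normalization (illustrative only).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum number of edits turning ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[len(hyp)] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))  # 33.33
```

Because the denominator is the reference length, insertions can push WER above 100%. Lower values are better.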

Evaluation Results

Performance of each model on this benchmark (lower WER is better).

Model	WER (%)	Paper Title
RAVEn Large	1.4	Jointly Learning Visual and Auditory Speech Representations from Raw Data
AV-HuBERT Large	1.3	Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Llama-AVSR	0.81	Large Language Models are Strong Audio-Visual Speech Recognition Learners
Whisper	0.68	Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
