Lipreading on LRS2

Evaluation Metric

Word Error Rate (WER)
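
For readers unfamiliar with the metric, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words and reported as a percentage. Lower is better. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the papers listed on this page may apply their own text normalization (casing, punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: word-level edit distance / reference length.

    Illustrative sketch only: splits on whitespace and applies no text
    normalization, which real evaluation pipelines may do differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One reference word is missing from the hypothesis: 1 error over 6 words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 1))
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.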

Evaluation Results

Performance of each model on this benchmark

Model Name	Word Error Rate (WER)	Paper Title	Repository
ES³ Large	26.7	ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations	-
ES³ Base*	31.4	ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations	-
CTC + KD ASR	53.2	ASR is all you need: cross-modal distillation for lip reading	-
USR	15.4	Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
VTP (more data)	22.6	Sub-word Level Lip Reading With Visual Attention	-
LF-MMI TDNN	48.86	Audio-visual Recognition of Overlapped speech for the LRS2 dataset	-
CTC/Attention (LRW+LRS2/3+AVSpeech)	25.5	Visual Speech Recognition for Multiple Languages in the Wild
Multi-head Visual-Audio Memory	44.5	Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
ES³ Large + extLM	24.6	ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations	-
SyncVSR	28.9	SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
MoCo + wav2vec (w/o extLM)	43.2	Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition
ES³ Base + extLM	28.7	ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations	-
TM-seq2seq + extLM	48.3	Deep Audio-Visual Speech Recognition
Hybrid CTC / Attention	39.1	End-to-end Audio-visual Speech Recognition with Conformers
Conv-seq2seq	51.7	Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading	-
Auto-AVSR	14.6	Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
TM-CTC + extLM	54.7	Deep Audio-Visual Speech Recognition
SyncVSR	16.5	SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
RAVEn Large	18.6	Jointly Learning Visual and Auditory Speech Representations from Raw Data
ES³ Base* + extLM	29.3	ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations	-
