HyperAI
Lipreading on LRS2
Metrics
Word Error Rate (WER)
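For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words and expressed as a percentage, so lower is better. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the papers listed on this page may apply their own text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (S + D + I) / N * 100 over whitespace-split words.

    Illustrative sketch only: tokenization and normalization are assumptions.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))  # 25.0
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.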
Results
Performance results of various models on this benchmark
| Model name | Word Error Rate (WER) | Paper Title | Repository |
|---|---|---|---|
| ES³ Large | 26.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| ES³ Base* | 31.4 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| CTC + KD ASR | 53.2 | ASR is all you need: cross-modal distillation for lip reading | - |
| USR | 15.4 | Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs | |
| VTP (more data) | 22.6 | Sub-word Level Lip Reading With Visual Attention | - |
| LF-MMI TDNN | 48.86 | Audio-visual Recognition of Overlapped speech for the LRS2 dataset | - |
| CTC/Attention (LRW+LRS2/3+AVSpeech) | 25.5 | Visual Speech Recognition for Multiple Languages in the Wild | |
| Multi-head Visual-Audio Memory | 44.5 | Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading | |
| ES³ Large + extLM | 24.6 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| SyncVSR | 28.9 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | |
| MoCo + wav2vec (w/o extLM) | 43.2 | Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | |
| ES³ Base + extLM | 28.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| TM-seq2seq + extLM | 48.3 | Deep Audio-Visual Speech Recognition | |
| Hybrid CTC / Attention | 39.1 | End-to-end Audio-visual Speech Recognition with Conformers | |
| Conv-seq2seq | 51.7 | Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading | - |
| Auto-AVSR | 14.6 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | |
| TM-CTC + extLM | 54.7 | Deep Audio-Visual Speech Recognition | |
| SyncVSR | 16.5 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | |
| RAVEn Large | 18.6 | Jointly Learning Visual and Auditory Speech Representations from Raw Data | |
| ES³ Base* + extLM | 29.3 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |