Lipreading On Lrs2
Metrics
Word Error Rate (WER)
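WER is conventionally defined as the number of word-level substitutions, deletions, and insertions needed to turn the predicted transcript into the reference, divided by the number of words in the reference. The following is a minimal sketch of that calculation, assuming whitespace tokenization and no text normalization. The function name `word_error_rate` is hypothetical, and the papers listed on this page may apply their own normalization and scoring tools. The result is scaled to a percentage on the assumption that the scores on this page are reported that way.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER (in percent) of `hypothesis` against `reference`.

    Assumes plain whitespace tokenization; casing and punctuation are
    compared as-is (no normalization is applied).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Lower is better. Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.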
Results
Performance results of various models on this benchmark
| Model Name | Word Error Rate (WER) | Paper Title | Repository |
| --- | --- | --- | --- |
| ES³ Large | 26.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| ES³ Base* | 31.4 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| CTC + KD ASR | 53.2 | ASR is all you need: cross-modal distillation for lip reading | - |
| USR | 15.4 | Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs | - |
| VTP (more data) | 22.6 | Sub-word Level Lip Reading With Visual Attention | - |
| LF-MMI TDNN | 48.86 | Audio-visual Recognition of Overlapped speech for the LRS2 dataset | - |
| CTC/Attention (LRW+LRS2/3+AVSpeech) | 25.5 | Visual Speech Recognition for Multiple Languages in the Wild | - |
| Multi-head Visual-Audio Memory | 44.5 | Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading | - |
| ES³ Large + extLM | 24.6 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| SyncVSR | 28.9 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | - |
| MoCo + wav2vec (w/o extLM) | 43.2 | Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | - |
| ES³ Base + extLM | 28.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| TM-seq2seq + extLM | 48.3 | Deep Audio-Visual Speech Recognition | - |
| Hybrid CTC / Attention | 39.1 | End-to-end Audio-visual Speech Recognition with Conformers | - |
| Conv-seq2seq | 51.7 | Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading | - |
| Auto-AVSR | 14.6 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | - |
| TM-CTC + extLM | 54.7 | Deep Audio-Visual Speech Recognition | - |
| SyncVSR | 16.5 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | - |
| RAVEn Large | 18.6 | Jointly Learning Visual and Auditory Speech Representations from Raw Data | - |
| ES³ Base* + extLM | 29.3 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |