HyperAI
Audio-Visual Speech Recognition on LRS3-TED
Metrics
Word Error Rate (WER)
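WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn a system's transcript into the reference transcript, divided by the number of reference words (N): WER = (S + D + I) / N. The sketch below computes it with a word-level Levenshtein distance. It is only illustrative. It assumes plain whitespace tokenization with no text normalization (no lowercasing or punctuation stripping), so it is not the exact scoring script used by any paper in the table. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: tokens are produced by whitespace splitting only; no text
    normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"  # one substitution, one deletion
    print(f"WER = {100 * word_error_rate(ref, hyp):.2f}%")  # WER = 33.33%
```

Because insertions are counted, WER is not capped at 100%. A hypothesis much longer than the reference can score above 1.0.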
Results
Performance of different models on this benchmark (lower WER is better).
| Model | WER (%) | Paper Title | Repository |
|---|---|---|---|
| EG-seq2seq | 6.8 | Discriminative Multi-modality Speech Recognition | |
| DistillAV | 1.3 | Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | |
| TM-seq2seq | 7.2 | Deep Audio-Visual Speech Recognition | |
| RNN-T | 4.5 | Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | |
| Hyb-Conformer | 2.3 | End-to-end Audio-visual Speech Recognition with Conformers | |
| AV-HuBERT Large | 1.4 | Robust Self-Supervised Audio-Visual Speech Recognition | |
| Llama-AVSR | 0.77 | Large Language Models are Strong Audio-Visual Speech Recognition Learners | - |
| CTC/Attention | 0.9 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | |
| Whisper-Flamingo | 0.76 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | |
| RAVEn Large | 1.4 | Jointly Learning Visual and Auditory Speech Representations from Raw Data | |
| Zero-AVSR | 1.5 | Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | |
| MMS-LLaMA | 0.74 | - | - |