Audio-Visual Speech Recognition on LRS3-TED
Metrics
Word Error Rate (WER)
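As a reminder of what this metric measures, WER is the word-level edit distance (substitutions + deletions + insertions) between a hypothesis and its reference transcript, divided by the number of reference words. The following is a minimal sketch, not the scoring script used by any paper listed here: the function name `word_error_rate` is hypothetical, whitespace tokenization is an assumption, and the result is scaled to a percentage on the assumption that the scores on this page are percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using word-level Levenshtein distance.

    Assumption: words are separated by whitespace and compared exactly;
    real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference. Differences in text normalization between papers can also shift scores slightly, so small gaps between entries should be read with that in mind.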
Results
Performance results of various models on this benchmark
| Model name | Word Error Rate (WER) | Paper title | Repository |
|---|---|---|---|
| EG-seq2seq | 6.8 | Discriminative Multi-modality Speech Recognition | - |
| DistillAV | 1.3 | Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | - |
| TM-seq2seq | 7.2 | Deep Audio-Visual Speech Recognition | - |
| RNN-T | 4.5 | Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | - |
| Hyb-Conformer | 2.3 | End-to-end Audio-visual Speech Recognition with Conformers | - |
| AV-HuBERT Large | 1.4 | Robust Self-Supervised Audio-Visual Speech Recognition | - |
| Llama-AVSR | 0.77 | Large Language Models are Strong Audio-Visual Speech Recognition Learners | - |
| CTC/Attention | 0.9 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | - |
| Whisper-Flamingo | 0.76 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | - |
| RAVEn Large | 1.4 | Jointly Learning Visual and Auditory Speech Representations from Raw Data | - |
| Zero-AVSR | 1.5 | Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | - |
| MMS-LLaMA | 0.74 | MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | - |