HyperAI
HyperAI
Startseite
Neuigkeiten
Neueste Forschungsarbeiten
Tutorials
Datensätze
Wiki
SOTA
LLM-Modelle
GPU-Rangliste
Veranstaltungen
Suche
Über
Deutsch
HyperAI
HyperAI
Toggle sidebar
Seite durchsuchen…
⌘
K
Startseite
SOTA
Lippenlesung
Lipreading On Lrs2
Lipreading On Lrs2
Metriken
Word Error Rate (WER)
Ergebnisse
Leistungsergebnisse verschiedener Modelle zu diesem Benchmark
Columns
Modellname
Word Error Rate (WER)
Paper Title
Repository
ES³ Large
26.7
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
-
ES³ Base*
31.4
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
-
CTC + KD ASR
53.2
ASR is all you need: cross-modal distillation for lip reading
-
USR
15.4
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
-
VTP (more data)
22.6
Sub-word Level Lip Reading With Visual Attention
-
LF-MMI TDNN
48.86
Audio-visual Recognition of Overlapped speech for the LRS2 dataset
-
CTC/Attention (LRW+LRS2/3+AVSpeech)
25.5
Visual Speech Recognition for Multiple Languages in the Wild
-
Multi-head Visual-Audio Memory
44.5
Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
-
ES³ Large + extLM
24.6
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
-
SyncVSR
28.9
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
-
MoCo + wav2vec (w/o extLM)
43.2
Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition
-
ES³ Base + extLM
28.7
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
-
TM-seq2seq + extLM
48.3
Deep Audio-Visual Speech Recognition
-
Hybrid CTC / Attention
39.1
End-to-end Audio-visual Speech Recognition with Conformers
-
Conv-seq2seq
51.7
Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading
-
Auto-AVSR
14.6
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
-
TM-CTC + extLM
54.7
Deep Audio-Visual Speech Recognition
-
SyncVSR
16.5
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
-
RAVEn Large
18.6
Jointly Learning Visual and Auditory Speech Representations from Raw Data
-
ES³ Base* + extLM
29.3
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
-
0 of 25 row(s) selected.
Previous
Next