HyperAI超神经
Lipreading on LRS2
Evaluation metric
Word Error Rate (WER, %; lower is better)
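The page does not say how WER is computed. Below is a minimal Python sketch of the standard definition, WER = (S + D + I) / N. Here S, D and I are the substitutions, deletions and insertions in a word-level Levenshtein alignment, and N is the number of reference words. The sketch assumes plain whitespace tokenization with no text normalization. The papers listed may normalize transcripts differently, and the function names are illustrative, not taken from any of them.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] holds the distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N for one utterance, as a fraction (multiply by 100 for %)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += edit_distance(ref, hypothesis.split())
        words += len(ref)
    return errors / words


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER = {100 * wer:.1f}%")  # one substitution + one deletion over 6 words -> 33.3%
```

Insertions count as errors, so WER can exceed 100%. For example, `word_error_rate("a", "a b c")` is 2.0. Benchmark figures are usually reported at corpus level, which sums errors over all test utterances before dividing (as `corpus_wer` does). This differs from averaging per-utterance WERs.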
Evaluation results
Performance of each model on this benchmark
| Model | WER (%) | Paper title | Repository |
|---|---|---|---|
| ES³ Large | 26.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| ES³ Base* | 31.4 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| CTC + KD ASR | 53.2 | ASR is all you need: cross-modal distillation for lip reading | - |
| USR | 15.4 | Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs | |
| VTP (more data) | 22.6 | Sub-word Level Lip Reading With Visual Attention | - |
| LF-MMI TDNN | 48.86 | Audio-visual Recognition of Overlapped speech for the LRS2 dataset | - |
| CTC/Attention (LRW+LRS2/3+AVSpeech) | 25.5 | Visual Speech Recognition for Multiple Languages in the Wild | |
| Multi-head Visual-Audio Memory | 44.5 | Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading | |
| ES³ Large + extLM | 24.6 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| SyncVSR | 28.9 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | |
| MoCo + wav2vec (w/o extLM) | 43.2 | Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | |
| ES³ Base + extLM | 28.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| TM-seq2seq + extLM | 48.3 | Deep Audio-Visual Speech Recognition | |
| Hybrid CTC / Attention | 39.1 | End-to-end Audio-visual Speech Recognition with Conformers | |
| Conv-seq2seq | 51.7 | Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading | - |
| Auto-AVSR | 14.6 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | |
| TM-CTC + extLM | 54.7 | Deep Audio-Visual Speech Recognition | |
| SyncVSR | 16.5 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | |
| RAVEn Large | 18.6 | Jointly Learning Visual and Auditory Speech Representations from Raw Data | |
| ES³ Base* + extLM | 29.3 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
20 of 25 entries shown.
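The rows appear in no particular order. The small Python sketch below sorts the entries on this page from best (lowest WER) to worst. `LEADERBOARD` and `rank_by_wer` are illustrative names, and the values are copied from the table. Entries differ in training data and in whether an external language model (extLM) is used, so the ranking is not a like-for-like comparison.

```python
from typing import List, Tuple

# WER (%) for the entries listed on this page, transcribed in page order.
LEADERBOARD: List[Tuple[str, float]] = [
    ("ES³ Large", 26.7),
    ("ES³ Base*", 31.4),
    ("CTC + KD ASR", 53.2),
    ("USR", 15.4),
    ("VTP (more data)", 22.6),
    ("LF-MMI TDNN", 48.86),
    ("CTC/Attention (LRW+LRS2/3+AVSpeech)", 25.5),
    ("Multi-head Visual-Audio Memory", 44.5),
    ("ES³ Large + extLM", 24.6),
    ("SyncVSR", 28.9),
    ("MoCo + wav2vec (w/o extLM)", 43.2),
    ("ES³ Base + extLM", 28.7),
    ("TM-seq2seq + extLM", 48.3),
    ("Hybrid CTC / Attention", 39.1),
    ("Conv-seq2seq", 51.7),
    ("Auto-AVSR", 14.6),
    ("TM-CTC + extLM", 54.7),
    ("SyncVSR", 16.5),
    ("RAVEn Large", 18.6),
    ("ES³ Base* + extLM", 29.3),
]


def rank_by_wer(entries: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return entries sorted best-first (lowest WER first); sorted() is stable, so ties keep page order."""
    return sorted(entries, key=lambda entry: entry[1])


if __name__ == "__main__":
    for position, (model, wer) in enumerate(rank_by_wer(LEADERBOARD)[:5], start=1):
        print(f"{position}. {model}: {wer}")
```

Run as a script, this prints the top five: Auto-AVSR (14.6), USR (15.4), SyncVSR (16.5), RAVEn Large (18.6) and VTP (more data) (22.6).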