HyperAI
HyperAI超神经
Audio-Visual Speech Recognition on LRS3-TED
Evaluation metric
Word Error Rate (WER)
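As a rough illustration of the metric, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of reference words, computed from a word-level edit distance. The sketch below assumes whitespace tokenization and reports a percentage, which is how the values on this leaderboard appear to be expressed. The function name `word_error_rate` is hypothetical, and the page does not state the text normalization (casing, punctuation) each paper applies before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent between a reference and a hypothesis transcript.

    Assumption: words are separated by whitespace and compared exactly;
    any normalization (lowercasing, punctuation removal) is left to the caller.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference. Lower values are better.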
Results
Performance of each model on this benchmark
| Model | Word Error Rate (WER) | Paper Title | Repository |
| --- | --- | --- | --- |
| EG-seq2seq | 6.8 | Discriminative Multi-modality Speech Recognition | - |
| DistillAV | 1.3 | Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | - |
| TM-seq2seq | 7.2 | Deep Audio-Visual Speech Recognition | - |
| RNN-T | 4.5 | Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | - |
| Hyb-Conformer | 2.3 | End-to-end Audio-visual Speech Recognition with Conformers | - |
| AV-HuBERT Large | 1.4 | Robust Self-Supervised Audio-Visual Speech Recognition | - |
| Llama-AVSR | 0.77 | Large Language Models are Strong Audio-Visual Speech Recognition Learners | - |
| CTC/Attention | 0.9 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | - |
| Whisper-Flamingo | 0.76 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | - |
| RAVEn Large | 1.4 | Jointly Learning Visual and Auditory Speech Representations from Raw Data | - |
| Zero-AVSR | 1.5 | Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | - |
| MMS-LLaMA | 0.74 | MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | - |