Audio-Visual Speech Recognition on LRS3-TED

Evaluation Metric

Word Error Rate (WER)
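
For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance between a reference transcript and a system hypothesis, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. The sketch below is a minimal illustration of that standard definition, not the scoring script used by any listed paper. The function name `word_error_rate` and the whitespace tokenization are assumptions; published results usually apply text normalization first and report the value as a percentage.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration: tokenizes on whitespace only and
    performs no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    score = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {score:.4f} ({score * 100:.2f}%)")
```

Because insertions count as errors but do not enlarge the denominator, WER can exceed 1.0 (100%) when the hypothesis is much longer than the reference.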

Evaluation Results

Performance of each model on this benchmark

| Model | Word Error Rate (WER) | Paper Title |
| --- | --- | --- |
| EG-seq2seq | 6.8 | Discriminative Multi-modality Speech Recognition |
| DistillAV | 1.3 | Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models |
| TM-seq2seq | 7.2 | Deep Audio-Visual Speech Recognition |
| RNN-T | 4.5 | Recurrent Neural Network Transducer for Audio-Visual Speech Recognition |
| Hyb-Conformer | 2.3 | End-to-end Audio-visual Speech Recognition with Conformers |
| AV-HuBERT Large | 1.4 | Robust Self-Supervised Audio-Visual Speech Recognition |
| Llama-AVSR | 0.77 | Large Language Models are Strong Audio-Visual Speech Recognition Learners |
| CTC/Attention | 0.9 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels |
| Whisper-Flamingo | 0.76 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation |
| RAVEn Large | 1.4 | Jointly Learning Visual and Auditory Speech Representations from Raw Data |
| Zero-AVSR | 1.5 | Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations |
| MMS-LLaMA | 0.74 | MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens |