HyperAI

Audio-Visual Speech Recognition on LRS3-TED

Evaluation Metric

Word Error Rate (WER)
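WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn a model's transcript into the reference transcript, divided by the number of words N in the reference: WER = (S + D + I) / N. The Python sketch below computes this with a word-level Levenshtein alignment. It is an illustration only, not the scoring script used by any of the listed papers: the function names are hypothetical, it assumes plain whitespace tokenization with no text normalization (casing, punctuation), which real evaluation pipelines usually apply, and it aggregates at the corpus level (total errors over total reference words), a common convention, reporting the result in percent.

```python
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] = edits needed to turn ref[:i-1] into hyp[:j]; row 0 is j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # turning ref[:i] into an empty hyp costs i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    return prev[len(hyp)]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER in percent over (reference, hypothesis) string pairs.

    Assumption: whitespace tokenization, no case or punctuation normalization.
    """
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += word_edit_distance(ref, hypothesis.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references must contain at least one word")
    return 100.0 * errors / ref_words
```

For example, `corpus_wer([("the cat sat on the mat", "the cat sit on mat")])` finds one substitution ("sat" to "sit") and one deletion ("the") against six reference words, giving about 33.3%. Because insertions are counted, WER is not bounded by 100%.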

Evaluation Results

Performance of each model on this benchmark. WER is reported in percent; lower is better.

| Model | WER (%) | Paper Title |
|---|---|---|
| EG-seq2seq | 6.8 | Discriminative Multi-modality Speech Recognition |
| DistillAV | 1.3 | Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models |
| TM-seq2seq | 7.2 | Deep Audio-Visual Speech Recognition |
| RNN-T | 4.5 | Recurrent Neural Network Transducer for Audio-Visual Speech Recognition |
| Hyb-Conformer | 2.3 | End-to-end Audio-visual Speech Recognition with Conformers |
| AV-HuBERT Large | 1.4 | Robust Self-Supervised Audio-Visual Speech Recognition |
| Llama-AVSR | 0.77 | Large Language Models are Strong Audio-Visual Speech Recognition Learners |
| CTC/Attention | 0.9 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels |
| Whisper-Flamingo | 0.76 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation |
| RAVEn Large | 1.4 | Jointly Learning Visual and Auditory Speech Representations from Raw Data |
| Zero-AVSR | 1.5 | Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations |
| MMS-LLaMA | 0.74 | MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens |
© HyperAI