Audio-Visual Speech Recognition on LRS3-TED
Evaluation Metric
Word Error Rate (WER)
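For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions + deletions + insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name `word_error_rate` is hypothetical, and the whitespace tokenization and absence of text normalization are assumptions that individual papers on this benchmark may not share. Leaderboards usually report the value as a percentage, so the fraction returned here would be multiplied by 100.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length.

    Assumption: words are split on whitespace, with no case folding or
    punctuation stripping. Real evaluations typically normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {100 * wer:.2f}%")
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.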
Evaluation Results
Performance results of each model on this benchmark
| Model Name | Word Error Rate (WER) | Paper Title |
| --- | --- | --- |
| EG-seq2seq | 6.8 | Discriminative Multi-modality Speech Recognition |
| DistillAV | 1.3 | Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models |
| TM-seq2seq | 7.2 | Deep Audio-Visual Speech Recognition |
| RNN-T | 4.5 | Recurrent Neural Network Transducer for Audio-Visual Speech Recognition |
| Hyb-Conformer | 2.3 | End-to-end Audio-visual Speech Recognition with Conformers |
| AV-HuBERT Large | 1.4 | Robust Self-Supervised Audio-Visual Speech Recognition |
| Llama-AVSR | 0.77 | Large Language Models are Strong Audio-Visual Speech Recognition Learners |
| CTC/Attention | 0.9 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels |
| Whisper-Flamingo | 0.76 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation |
| RAVEn Large | 1.4 | Jointly Learning Visual and Auditory Speech Representations from Raw Data |
| Zero-AVSR | 1.5 | Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations |
| MMS-LLaMA | 0.74 | MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens |