HyperAI超神经

Audio-Visual Speech Recognition on LRS3-TED

Evaluation Metrics

Word Error Rate (WER)
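
For readers unfamiliar with the metric, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical. Splitting on whitespace with no text normalization is an assumption; the leaderboard does not state how each paper normalized its transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N, where N is the number of reference words.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f} ({100 * wer:.1f}%)")
```

The function returns a fraction; multiply by 100 to express it as a percentage. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.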

Evaluation Results

Performance of each model on this benchmark

Comparison Table
Model name                                       Word Error Rate (WER, %)
discriminative-multi-modality-speech             6.8
audio-visual-representation-learning-via         1.3
deep-audio-visual-speech-recognition             7.2
recurrent-neural-network-transducer-for-audio    4.5
end-to-end-audio-visual-speech-recognition       2.3
robust-self-supervised-audio-visual-speech       1.4
large-language-models-are-strong-audio-visual    0.77
auto-avsr-audio-visual-speech-recognition        0.9
whisper-flamingo-integrating-visual-features     0.76
jointly-learning-visual-and-auditory-speech      1.4
zero-avsr-zero-shot-audio-visual-speech          1.5
mms-llama-efficient-llm-based-audio-visual-1     0.74