HyperAI超神经
Audio Captioning On Clotho
Evaluation metrics
BLEU-4
CIDEr
METEOR
ROUGE-L
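Two of the metrics above, BLEU-4 (clipped n-gram precision) and ROUGE-L (longest-common-subsequence F-measure), can be illustrated with a small Python sketch. It scores one candidate caption against several references, since Clotho provides five reference captions per clip. It assumes lowercase whitespace tokenization. The ROUGE-L variant (beta = 1.2, best precision and best recall over references) follows our understanding of the COCO caption evaluation code. BLEU-4 is computed per sentence without smoothing. The helper names (tokenize, lcs_length, rouge_l, bleu4) are hypothetical, and the scores will not exactly match those of official toolkits.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization; official toolkits use their own tokenizers.
    return text.lower().split()


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming longest common subsequence, O(len(a) * len(b)).
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l(candidate: str, references: list[str], beta: float = 1.2) -> float:
    # LCS-based precision and recall against each reference; keep the best of each,
    # then combine them into an F-measure weighted by beta (assumed 1.2).
    cand = tokenize(candidate)
    precisions, recalls = [], []
    for ref in references:
        r = tokenize(ref)
        lcs = lcs_length(cand, r)
        precisions.append(lcs / len(cand) if cand else 0.0)
        recalls.append(lcs / len(r) if r else 0.0)
    p, r = max(precisions), max(recalls)
    if p == 0 or r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (r + beta**2 * p)


def ngrams(tokens: list[str], n: int) -> Counter:
    # Counts of all contiguous n-grams; empty if the sequence is shorter than n.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: str, references: list[str]) -> float:
    # Unsmoothed sentence-level BLEU-4: geometric mean of clipped 1- to 4-gram
    # precisions, multiplied by a brevity penalty based on the closest reference length.
    cand = tokenize(candidate)
    refs = [tokenize(r) for r in references]
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts: Counter = Counter()
        for r in refs:
            for gram, count in ngrams(r, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total)
    # Closest reference length, with ties broken toward the shorter reference.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return bp * math.exp(log_precision_sum / 4)


if __name__ == "__main__":
    refs = ["a dog barks while cars pass by", "a dog is barking as traffic passes"]
    print(round(rouge_l("a dog barks as cars pass by", refs), 3))  # 0.857
    print(round(bleu4("a dog barks while cars pass by", refs), 3))  # 1.0
```

Both functions return values between 0 and 1, whereas most BLEU-4, METEOR and ROUGE-L entries in the leaderboard are multiplied by 100. Official evaluations typically aggregate BLEU over the whole test set rather than averaging per-sentence scores. CIDEr and METEOR also need corpus-level statistics or synonym resources, which this sketch leaves out.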
Results
Performance of each model on this benchmark
| Model | BLEU-4 | CIDEr | METEOR | ROUGE-L | Paper title |
|---|---|---|---|---|---|
| VALOR | 16.2 | 0.423 | 17.4 | 38.2 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset |
| RNN-GRU-EncDec + VGGish + Word2Vec | - | 0.18 | - | - | Audio Captioning using Gated Recurrent Units |
| VAST | 19 | 0.519 | 19.3 | 40.8 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset |
| Ensemble-RL | - | 0.468 | - | - | THE SJTU SYSTEM FOR DCASE2021 CHALLENGE TASK 6: AUDIO CAPTIONING BASED ON ENCODER PRE-TRAINING AND REINFORCEMENT LEARNING |
| Ensemble | - | 0.400 | - | - | THE DCASE 2021 CHALLENGE TASK 6 SYSTEM: AUTOMATED AUDIO CAPTIONING WITH WEAKLY SUPERVISED PRE-TRAINING AND WORD SELECTION METHODS |
| Ensemble | - | 0.319 | - | - | The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation |
| Qwen-Audio | - | 0.441 | - | - | Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models |
| SLAM-AAC | - | 0.515 | 0.197 | - | SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs |
| LOAE | - | 0.513 | 0.197 | - | Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding |
| Audio Flamingo (Pengi trainset) | 17.4 | 0.489 | 18.7 | 39.4 | Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities |

"-" means the score was not reported. BLEU-4, METEOR and ROUGE-L are listed multiplied by 100 and CIDEr on its raw scale. The exception is the METEOR value of 0.197 for SLAM-AAC and LOAE, which appears to be on a 0 to 1 scale (about 19.7 when multiplied by 100).