HyperAI超神経
Audio Captioning On Clotho
Evaluation Metrics
BLEU-4
CIDEr
METEOR
ROUGE-L
Evaluation Results
Performance results of each model on this benchmark.
| Model | BLEU-4 | CIDEr | METEOR | ROUGE-L | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- | --- |
| VALOR | 16.2 | 0.423 | 17.4 | 38.2 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | |
| RNN-GRU-EncDec + VGGish + Word2Vec | - | 0.18 | - | - | Audio Captioning using Gated Recurrent Units | - |
| VAST | 19 | 0.519 | 19.3 | 40.8 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | |
| Ensemble-RL | - | 0.468 | - | - | THE SJTU SYSTEM FOR DCASE2021 CHALLENGE TASK 6: AUDIO CAPTIONING BASED ON ENCODER PRE-TRAINING AND REINFORCEMENT LEARNING | |
| Ensemble | - | 0.400 | - | - | THE DCASE 2021 CHALLENGE TASK 6 SYSTEM: AUTOMATED AUDIO CAPTIONING WITH WEAKLY SUPERVISED PRE-TRAING AND WORD SELECTION METHODS | - |
| Ensemble | - | 0.319 | - | - | The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation | - |
| Qwen-Audio | - | 0.441 | - | - | Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | |
| SLAM-AAC | - | 0.515 | 0.197 | - | SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | |
| LOAE | - | 0.513 | 0.197 | - | Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding | |
| Audio Flamingo (Pengi trainset) | 17.4 | 0.489 | 18.7 | 39.4 | Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities | |
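
CIDEr is the only metric reported for all ten entries, so it is the one column that supports a full ranking. METEOR also appears to be listed on mixed scales (for example 0.197 next to 17.4), so those values are left out here. The following is a minimal sketch of such a ranking, not part of the benchmark itself. The scores are copied from the listing above; the parenthetical labels on the two "Ensemble" rows and the helper name `rank_by_cider` are assumptions added for illustration.

```python
# CIDEr scores copied from the leaderboard listing above.
# The parenthetical labels on the two "Ensemble" rows are added here
# only to tell them apart; they are not part of the original model names.
cider_scores = [
    ("VALOR", 0.423),
    ("RNN-GRU-EncDec + VGGish + Word2Vec", 0.18),
    ("VAST", 0.519),
    ("Ensemble-RL", 0.468),
    ("Ensemble (DCASE 2021 Task 6 system)", 0.400),
    ("Ensemble (NTT DCASE2020 system)", 0.319),
    ("Qwen-Audio", 0.441),
    ("SLAM-AAC", 0.515),
    ("LOAE", 0.513),
    ("Audio Flamingo (Pengi trainset)", 0.489),
]


def rank_by_cider(scores):
    """Return (model, score) pairs sorted from highest to lowest CIDEr."""
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranked = rank_by_cider(cider_scores)
for position, (model, score) in enumerate(ranked, start=1):
    print(f"{position:2d}. {model:<38} {score:.3f}")
```

Ranking on a single shared metric avoids comparing entries on columns that most of them do not report.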