HyperAI초신경
Audio Captioning On Clotho
Evaluation metrics
BLEU-4
CIDEr
METEOR
ROUGE-L
Evaluation results
Performance of each model on this benchmark
| Model | BLEU-4 | CIDEr | METEOR | ROUGE-L | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- | --- |
| VALOR | 16.2 | 0.423 | 17.4 | 38.2 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | |
| RNN-GRU-EncDec + VGGish + Word2Vec | - | 0.18 | - | - | Audio Captioning using Gated Recurrent Units | - |
| VAST | 19 | 0.519 | 19.3 | 40.8 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | |
| Ensemble-RL | - | 0.468 | - | - | THE SJTU SYSTEM FOR DCASE2021 CHALLENGE TASK 6: AUDIO CAPTIONING BASED ON ENCODER PRE-TRAINING AND REINFORCEMENT LEARNING | |
| Ensemble | - | 0.400 | - | - | THE DCASE 2021 CHALLENGE TASK 6 SYSTEM: AUTOMATED AUDIO CAPTIONING WITH WEAKLY SUPERVISED PRE-TRAING AND WORD SELECTION METHODS | - |
| Ensemble | - | 0.319 | - | - | The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation | - |
| Qwen-Audio | - | 0.441 | - | - | Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | |
| SLAM-AAC | - | 0.515 | 0.197 | - | SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | |
| LOAE | - | 0.513 | 0.197 | - | Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding | |
| Audio Flamingo (Pengi trainset) | 17.4 | 0.489 | 18.7 | 39.4 | Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities | |
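CIDEr is the only metric reported for all ten entries above, so it is the natural column for comparing them. The following is a minimal sketch of that ranking in Python. The scores are copied from the results above; the shortened labels used to tell the two "Ensemble" entries apart and the helper name `rank_by_cider` are assumptions made for this illustration, not part of the benchmark page.

```python
# CIDEr scores copied from the results listed above.
# Labels in parentheses are hand-written hints (an assumption of this sketch)
# that distinguish entries sharing the model name "Ensemble".
cider_scores = [
    ("VALOR", 0.423),
    ("RNN-GRU-EncDec + VGGish + Word2Vec", 0.18),
    ("VAST", 0.519),
    ("Ensemble-RL (SJTU, DCASE2021)", 0.468),
    ("Ensemble (DCASE 2021 Task 6 system)", 0.400),
    ("Ensemble (NTT, DCASE2020)", 0.319),
    ("Qwen-Audio", 0.441),
    ("SLAM-AAC", 0.515),
    ("LOAE", 0.513),
    ("Audio Flamingo (Pengi trainset)", 0.489),
]


def rank_by_cider(scores):
    """Return (model, cider) pairs sorted from highest to lowest CIDEr."""
    return sorted(scores, key=lambda row: row[1], reverse=True)


ranked = rank_by_cider(cider_scores)
for position, (model, cider) in enumerate(ranked, start=1):
    print(f"{position:2d}. {model}: {cider:.3f}")
```

The other three metrics are left out of the sketch because most entries do not report them, so a ranking on those columns would cover only a subset of the models.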