Audio Captioning on AudioCaps
Metrics
CIDEr
METEOR
SPICE
SPIDEr
Results
Performance of different models on this benchmark
Comparison table
Model name | CIDEr | METEOR | SPICE | SPIDEr |
---|---|---|---|---|
enclap-combining-neural-audio-codec-and-audio | 0.8029 | 0.2554 | 0.1879 | 0.4954 |
enclap-analyzing-the-enclap-framework-for | 0.823 | 0.269 | 0.197 | 0.510 |
vast-a-vision-audio-subtitle-text-omni-1 | 0.781 | 0.247 | - | - |
improving-audio-language-learning-with-mixgen | 0.755 | - | 0.177 | 0.466 |
slam-aac-enhancing-audio-captioning-with | 0.841 | 0.268 | 0.194 | 0.518 |
enhancing-automated-audio-captioning-via | 0.816 | 0.267 | 0.193 | 0.505 |
audiocaps-generating-captions-for-audios-in | 0.593 | - | 0.144 | 0.369 |
Model 8 | 0.769 | - | 0.181 | 0.475 |
enclap-analyzing-the-enclap-framework-for | 0.815 | 0.257 | 0.188 | 0.501 |
Model 10 | 0.8061 | 0.2527 | 0.1841 | 0.4951 |
automated-audio-captioning-by-fine-tuning | 0.753 | - | 0.176 | 0.465 |
enclap-combining-neural-audio-codec-and-audio | 0.7795 | 0.2473 | 0.1863 | 0.4829 |
audio-captioning-transformer | 0.693 | - | 0.159 | 0.426 |
valor-vision-audio-language-omni-perception | 0.741 | 0.231 | - | - |
taming-data-and-transformers-for-audio-1 | 0.832 | 0.253 | 0.182 | 0.507 |
rethinking-transfer-and-auxiliary-learning | 0.764 | 0.242 | 0.180 | 0.472 |
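
SPIDEr is conventionally defined as the arithmetic mean of CIDEr and SPICE, so the last column can be cross-checked against the other two. The sketch below is only an illustration: the helper name `spider` and the choice of rows are assumptions made here, and the numbers are copied from the table above. A tolerance of 0.001 is used because the reported scores are rounded.

```python
def spider(cider: float, spice: float) -> float:
    """Return SPIDEr as the arithmetic mean of CIDEr and SPICE."""
    return (cider + spice) / 2


# (CIDEr, SPICE, reported SPIDEr) for a few rows copied from the table.
rows = {
    "slam-aac-enhancing-audio-captioning-with": (0.841, 0.194, 0.518),
    "taming-data-and-transformers-for-audio-1": (0.832, 0.182, 0.507),
    "audio-captioning-transformer": (0.693, 0.159, 0.426),
    "audiocaps-generating-captions-for-audios-in": (0.593, 0.144, 0.369),
}

TOLERANCE = 1e-3  # reported values are rounded to three or four decimals

for name, (cider, spice, reported) in rows.items():
    computed = spider(cider, spice)
    consistent = abs(computed - reported) < TOLERANCE
    print(f"{name}: computed={computed:.4f} reported={reported} consistent={consistent}")
```

Rows that report only CIDEr and METEOR (marked `-` in the SPICE and SPIDEr columns) cannot be checked this way.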