Clotho Audio Captioning Dataset

Clotho is an audio captioning dataset that emphasizes audio content and caption diversity. It consists of 4,981 audio samples, each with 5 captions (24,905 captions in total). Each audio sample is 15 to 30 seconds long, and each caption is 8 to 20 words long.
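The figures above can be turned into a simple sanity check. The following is a minimal sketch, not part of any official Clotho tooling: the function names and the record layout (a duration in seconds plus a list of caption strings) are assumptions made for illustration, and word count is approximated by splitting on whitespace.

```python
from typing import List

# Constraints taken from the dataset description.
NUM_CLIPS = 4981
CAPTIONS_PER_CLIP = 5
MIN_WORDS, MAX_WORDS = 8, 20
MIN_SECONDS, MAX_SECONDS = 15.0, 30.0


def total_captions(num_clips: int = NUM_CLIPS,
                   per_clip: int = CAPTIONS_PER_CLIP) -> int:
    """Total caption count implied by clips x captions per clip."""
    return num_clips * per_clip


def is_valid_sample(duration_s: float, captions: List[str]) -> bool:
    """Check one hypothetical sample record against the stated ranges."""
    if not (MIN_SECONDS <= duration_s <= MAX_SECONDS):
        return False
    if len(captions) != CAPTIONS_PER_CLIP:
        return False
    # Word count is approximated by whitespace splitting (an assumption).
    return all(MIN_WORDS <= len(c.split()) <= MAX_WORDS for c in captions)
```

For example, `total_captions()` reproduces the 24,905 total, and `is_valid_sample` rejects a record whose clip is shorter than 15 seconds or whose captions fall outside the 8 to 20 word range.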