AudioSetCaps Audio Captioning Dataset
License: CC BY 4.0
The dataset was released in 2024 by researchers from Northwestern Polytechnical University, Xi'an Lianfeng Acoustic Technology Co., Ltd., Nanyang Technological University, the University of Surrey, and the Institute of Acoustics, Chinese Academy of Sciences. The accompanying paper, "AudioSetCaps: Enriched Audio Captioning Dataset Generation Using Large Audio Language Models", was accepted at NeurIPS 2024.
AudioSetCaps is an audio-caption dataset containing 6,117,099 audio clips, each 10 seconds long. Each clip comes with a descriptive caption, plus three Q&A pairs that serve as metadata for generating the final caption (18,414,789 Q&A pairs in total).
It was created with an automated generation pipeline built on large audio and language models, using audio from three datasets: AudioSet, YouTube-8M, and VGGSound.
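
The per-clip structure described above (one descriptive text plus three Q&A pairs, drawn from one of three source datasets) can be pictured with a small Python sketch. This is only an illustration: the class names, field names, and the `validate` helper are hypothetical and do not reflect the actual file layout or schema of the released dataset.

```python
from dataclasses import dataclass, field
from typing import List

# The three source datasets named in the description.
SOURCES = {"AudioSet", "YouTube-8M", "VGGSound"}


@dataclass
class QAPair:
    # Hypothetical container for one question/answer pair.
    question: str
    answer: str


@dataclass
class AudioSetCapsRecord:
    # All field names are assumptions made for illustration only.
    audio_id: str   # identifier of the 10-second clip
    source: str     # which source dataset the clip comes from
    caption: str    # descriptive text for the clip
    qa_pairs: List[QAPair] = field(default_factory=list)


def validate(record: AudioSetCapsRecord) -> List[str]:
    """Return a list of problems; an empty list means the record
    matches the structure described in the text."""
    problems = []
    if record.source not in SOURCES:
        problems.append("unknown source dataset: " + record.source)
    if not record.caption.strip():
        problems.append("empty caption")
    if len(record.qa_pairs) != 3:
        problems.append("expected 3 Q&A pairs, got %d" % len(record.qa_pairs))
    return problems
```

A record built from made-up placeholder values, such as `AudioSetCapsRecord("clip-0001", "AudioSet", "A dog barks.", [QAPair("q", "a")] * 3)`, passes `validate` with an empty list of problems. A check like this can be handy when loading the metadata, since it flags entries that deviate from the expected shape.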