HyperAI

AudioSetCaps Audio Captioning Dataset

Date

9 months ago

Size

120.7 MB

Organization

Nanyang Technological University
University of Surrey

Publish URL

github.com

License

CC BY 4.0

The dataset was released in 2024 by researchers from Northwestern Polytechnical University, Xi'an Lianfeng Acoustic Technology Co., Ltd., Nanyang Technological University, University of Surrey, and the Institute of Acoustics, Chinese Academy of Sciences. The accompanying paper, "AudioSetCaps: Enriched Audio Captioning Dataset Generation Using Large Audio Language Models", has been accepted at NeurIPS 2024.

AudioSetCaps is an audio captioning dataset containing 6,117,099 10-second audio clips. Each clip is accompanied by a descriptive caption and 3 Q&A pairs, which serve as metadata for generating the final caption (18,414,789 Q&A pairs in total).
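As a quick consistency check on the figures above, the following minimal sketch divides the stated Q&A total by the stated clip count. It uses only the two numbers quoted in this description; the variable names are illustrative.

```python
# Figures quoted in the dataset description above.
NUM_CLIPS = 6_117_099
NUM_QA_PAIRS = 18_414_789

# Average number of Q&A pairs per clip.
pairs_per_clip = NUM_QA_PAIRS / NUM_CLIPS

# Difference between the stated total and exactly three pairs per clip.
extra_pairs = NUM_QA_PAIRS - 3 * NUM_CLIPS

print(f"{pairs_per_clip:.4f} Q&A pairs per clip on average")
print(f"{extra_pairs:,} pairs beyond exactly three per clip")
```

The ratio comes out slightly above three, so "3 Q&A pairs" is probably best read as the typical figure rather than an exact count for every clip.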

It was created with an automated generation pipeline built on large audio and language models, using data from three source datasets: AudioSet, YouTube-8M, and VGGSound.

AudioSetCaps.torrent
  • AudioSetCaps/
    • README.md (1.63 KB)
    • README.txt (3.27 KB)
    • data/
      • AudioSetCaps.zip (120.7 MB)
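Once the torrent has finished downloading, the archive can be inspected before extracting it. The following is a minimal sketch using Python's standard `zipfile` module. The local path is an assumption based on the file tree above, `list_archive` is a hypothetical helper name, and nothing is assumed about the files inside the archive; the code only lists whatever is there.

```python
import zipfile
from pathlib import Path


def list_archive(zip_path):
    """Return (member name, uncompressed size in bytes) for each file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]


# Assumed location after the torrent download completes; adjust as needed.
ARCHIVE = Path("AudioSetCaps/data/AudioSetCaps.zip")

if ARCHIVE.exists():
    for name, size in list_archive(ARCHIVE):
        print(f"{size / 1e6:8.2f} MB  {name}")
```

Listing the members first shows which files the archive holds and how large they are when uncompressed, without writing anything to disk.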