HyperAI

MusicPile Large Music Dataset

Date

a year ago

Size

6.33 GB

Organization

Publish URL

huggingface.co

MusicPile is a large-scale music-language pre-training dataset jointly released by the Multimodal Art Projection Research Community, Skywork AI, and the Hong Kong University of Science and Technology. It contains 5.17 million samples and approximately 4.16 billion tokens, drawn from online corpora, encyclopedias, music books, YouTube music subtitles, works in ABC notation, mathematical content, and code. Each sample has three fields (id, text, and src), and each text is no longer than 2,048 tokens. MusicPile covers general music knowledge, question-and-answer material, and standard music theory, and is intended to improve the music understanding and composition capabilities of large language models.
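The record layout described above can be checked with a few lines of Python. This is a minimal sketch rather than the dataset's own tooling: the sample records and their src values are hypothetical, the field types are assumed to be strings, and whitespace splitting stands in for whatever tokenizer was actually used to enforce the 2,048-token limit.

```python
from collections import Counter

REQUIRED_FIELDS = ("id", "text", "src")  # field names given in the description
MAX_TOKENS = 2048                        # per-text limit given in the description


def count_tokens(text):
    # Assumption: whitespace splitting is a stand-in for the real tokenizer.
    return len(text.split())


def validate_record(record, max_tokens=MAX_TOKENS):
    """Return a list of problems found in one record (empty if none)."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in record]
    text = record.get("text")
    if isinstance(text, str):
        if count_tokens(text) > max_tokens:
            problems.append("text exceeds token limit")
    elif "text" in record:
        problems.append("text is not a string")
    return problems


def summarize_sources(records):
    """Count how many records come from each src value."""
    return Counter(r["src"] for r in records if "src" in r)


# Hypothetical records; the src labels are illustrative only.
samples = [
    {"id": "0", "text": "X:1\nT:Example tune\nK:C\nCDEF GABc|", "src": "abc"},
    {"id": "1", "text": "A major scale has seven distinct notes.", "src": "book"},
    {"id": "2", "text": "word " * 3000, "src": "web"},  # deliberately too long
    {"id": "3", "text": "no source field here"},        # deliberately incomplete
]

for record in samples:
    print(record["id"], validate_record(record))

print(summarize_sources(samples))

# Average length implied by the published totals (4.16B tokens, 5.17M samples).
mean_tokens = 4.16e9 / 5.17e6
print(f"about {mean_tokens:.0f} tokens per sample on average")
```

The last line uses only the figures quoted in the description: roughly 4.16 billion tokens over 5.17 million samples works out to about 805 tokens per sample, well under the 2,048-token cap.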

MusicPile.torrent
Seeding: 2, Downloading: 1, Completed: 136, Total Downloads: 326
  • MusicPile/
    • README.md (1.3 KB)
    • README.txt (2.61 KB)
    • data/
      • MusicPile.zip (6.33 GB)