GTZAN Music Genre Classification Dataset
The GTZAN dataset is the most widely used public dataset in machine listening research on music genre recognition (MGR). Its files were collected in 2000-2001 from a variety of sources, including personal CDs, radio, and microphone recordings.
GTZAN is a standard benchmark for music information retrieval (MIR) tasks, genre classification in particular. It contains 1,000 audio clips covering 10 genres, with 100 clips per genre. Each clip is 30 seconds long, sampled at 22,050 Hz, and stored as a 16-bit mono .wav file. The dataset was assembled by George Tzanetakis (hence the name GTZAN) and originally distributed with the Marsyas music information retrieval framework, and it remains a common baseline for evaluating music classification algorithms.
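The format figures above imply that a complete clip holds 30 × 22,050 = 661,500 samples, or 1,323,000 bytes of 16-bit PCM data. The minimal sketch below uses only Python's standard `wave` module to read a file's header and check it against that format. The helper names (`describe_wav`, `matches_gtzan_format`, `make_test_wav`) are hypothetical and exist only for illustration; this is a spot check, not a validator for the whole dataset.

```python
import io
import wave

# Format described for GTZAN clips
EXPECTED_RATE = 22050      # samples per second
EXPECTED_CHANNELS = 1      # mono
EXPECTED_SAMPWIDTH = 2     # bytes per sample (16-bit)
CLIP_SECONDS = 30

# A full clip: 30 s * 22,050 Hz = 661,500 samples
SAMPLES_PER_CLIP = CLIP_SECONDS * EXPECTED_RATE


def describe_wav(source):
    """Return basic header information for a WAV path or file-like object."""
    with wave.open(source, "rb") as w:
        rate = w.getframerate()
        frames = w.getnframes()
        return {
            "rate": rate,
            "channels": w.getnchannels(),
            "bits": 8 * w.getsampwidth(),
            "seconds": frames / rate,  # actual duration from the frame count
        }


def matches_gtzan_format(info):
    """True if the header matches 22,050 Hz, mono, 16-bit."""
    return (info["rate"] == EXPECTED_RATE
            and info["channels"] == EXPECTED_CHANNELS
            and info["bits"] == 8 * EXPECTED_SAMPWIDTH)


def make_test_wav(seconds=CLIP_SECONDS):
    """Build an in-memory silent WAV in the GTZAN format (hypothetical test helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(EXPECTED_CHANNELS)
        w.setsampwidth(EXPECTED_SAMPWIDTH)
        w.setframerate(EXPECTED_RATE)
        w.writeframes(b"\x00\x00" * (seconds * EXPECTED_RATE))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf
```

Reading the duration from the frame count, rather than assuming exactly 661,500 samples, keeps a loader robust if any clip deviates slightly from 30 seconds.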
The dataset contains:
- Original genres: a collection of 10 genres (blues, classical, country, disco, hip-hop, jazz, metal, pop, reggae, rock), each with 100 audio files, all 30 seconds long. This is the GTZAN dataset itself, sometimes called the MNIST of sound.
- Original images: a visual representation of each audio file. One way to classify the data is with a neural network such as a convolutional neural network (CNN). Because CNNs typically take image-like input, each audio file was converted to a mel spectrogram image.
- 2 CSV files: precomputed features of the audio files. One file contains, for each 30-second song, the mean and variance of multiple features extracted from the audio. The other file has the same structure, but the features were computed after splitting each song into 3-second segments.
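A minimal sketch of indexing the original audio by genre, assuming a hypothetical layout with one sub-folder per genre (`<root>/<genre>/*.wav`) whose folder names match the strings in `GENRES`; adjust both to match your copy. Each clip is labeled with the integer position of its genre in `GENRES` (0 to 9), so a complete copy should yield 100 files per label.

```python
from collections import Counter
from pathlib import Path

# The ten GTZAN genres; these strings are assumed to match the folder names
GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]


def index_clips(root):
    """Return (path, label) pairs for every .wav file under root/<genre>/.

    Labels are integer indices into GENRES. Missing genre folders
    simply contribute no files.
    """
    items = []
    for label, genre in enumerate(GENRES):
        genre_dir = Path(root) / genre
        for wav in sorted(genre_dir.glob("*.wav")):
            items.append((wav, label))
    return items


def count_per_genre(items):
    """Count clips per genre name, e.g. to confirm the 100-per-genre balance."""
    return Counter(GENRES[label] for _, label in items)
```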
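A mel spectrogram like those in the image folder can be approximated from the audio with a short NumPy sketch: frame the signal, apply a Hann window, take the power spectrum, and pool it through triangular mel-spaced filters. The parameters (`n_fft=2048`, `hop=512`, `n_mels=128`) mirror librosa's defaults, but this sketch uses the HTK mel formula and unnormalized filters, so its values will not match librosa or the dataset's images exactly. Treat it as an illustration of the technique, not the recipe used to produce the images.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale; librosa defaults to the Slaney variant instead
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins onto n_mels bands."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge points, evenly spaced on the mel scale up to Nyquist
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mel_spectrogram_db(y, sr=22050, n_fft=2048, hop=512, n_mels=128):
    """Log-power mel spectrogram with shape (n_mels, n_frames)."""
    y = np.asarray(y, dtype=float)
    # Reflect-pad so frame t is centered on sample t * hop
    y = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(y) - n_fft) // hop
    # np.hanning is symmetric; librosa uses a periodic Hann window
    window = np.hanning(n_fft)
    frames = np.stack([y[t * hop:t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (n_frames, n_fft//2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T    # (n_mels, n_frames)
    return 10.0 * np.log10(np.maximum(mel, 1e-10))       # floor avoids log(0)
```

With these settings, a full 661,500-sample clip yields 1 + 661,500 // 512 = 1,292 frames, i.e. a 128 × 1,292 array that can be treated as a single-channel image.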
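The relationship between the two CSV files can be sketched as follows: cut each 30-second clip into non-overlapping 3-second segments (66,150 samples at 22,050 Hz, so ten per full clip), compute a frame-level feature, and collapse it into a mean and a variance. RMS energy is used here only as a stand-in feature, and the `rms_mean`/`rms_var` keys are hypothetical column names; the real CSV files contain many more features.

```python
import numpy as np

SR = 22050
SEGMENT_SECONDS = 3


def split_into_segments(y, sr=SR, seconds=SEGMENT_SECONDS):
    """Split a clip into non-overlapping fixed-length segments, dropping any short tail."""
    seg_len = sr * seconds
    n = len(y) // seg_len
    return [y[i * seg_len:(i + 1) * seg_len] for i in range(n)]


def frame_rms(y, frame_length=2048, hop=512):
    """Root-mean-square energy of each frame (no padding; needs len(y) >= frame_length)."""
    y = np.asarray(y, dtype=float)
    n_frames = 1 + (len(y) - frame_length) // hop
    return np.array([np.sqrt(np.mean(y[t * hop:t * hop + frame_length] ** 2))
                     for t in range(n_frames)])


def summarize(values, name):
    """Collapse a per-frame feature into CSV-style mean and variance columns."""
    v = np.asarray(values, dtype=float)
    return {f"{name}_mean": float(v.mean()), f"{name}_var": float(v.var())}


def segment_features(y, sr=SR):
    """One feature row per 3-second segment, mimicking the 3-second CSV."""
    return [summarize(frame_rms(seg), "rms") for seg in split_into_segments(y, sr)]
```

Applying `summarize(frame_rms(y), "rms")` to the whole clip instead gives the single row per song that corresponds to the 30-second file. Because the tail is dropped, a clip even slightly shorter than 30 seconds produces only nine segments.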
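Because the 3-second rows are cut from the same 1,000 songs, a random row-level train/test split can place segments of one song on both sides and overstate accuracy. A hedged sketch of splitting by source track instead, assuming a hypothetical file-name scheme `<genre>.<track>.<segment>.wav` in which the first two dot-separated fields identify the song:

```python
import random
from collections import defaultdict


def track_of(name):
    """Source-track key, assuming hypothetical names like 'blues.00000.3.wav'."""
    return ".".join(name.split(".")[:2])


def grouped_split(names, test_fraction=0.2, seed=0):
    """Split segment names so that all segments of a track land on the same side."""
    by_track = defaultdict(list)
    for name in names:
        by_track[track_of(name)].append(name)
    tracks = sorted(by_track)          # deterministic order before shuffling
    random.Random(seed).shuffle(tracks)
    n_test = int(round(len(tracks) * test_fraction))
    test = [n for t in tracks[:n_test] for n in by_track[t]]
    train = [n for t in tracks[n_test:] for n in by_track[t]]
    return train, test
```

This split is not stratified by genre; to keep the test set balanced, one option is to apply it to each genre's names separately and concatenate the results.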