Free Spoken Digit Dataset (FSDD): Spoken Digit Recognition Audio Dataset
License: CC BY-SA 4.0
The Free Spoken Digit Dataset (FSDD) is a simple audio/speech dataset consisting of recordings of spoken digits, stored as wav files with a sampling rate of 8 kHz. The recordings have been trimmed to minimize silence at the beginning and end. The dataset is open, so it is expected to grow over time as new recordings are contributed.
The FSDD dataset currently includes (as of July 2024):
- 6 different speakers
- 3,000 recordings (50 of each digit per speaker)
- English pronunciations
The files in the dataset are named using the following format: {digitLabel}_{speakerName}_{index}.wav
For example, the file name 7_jackson_32.wav identifies the recording with index 32 of the digit 7 by the speaker jackson.
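The naming format above can be split back into its three fields when building labels for training. The following is a minimal sketch, not part of the dataset's own tooling: the names `RecordingInfo` and `parse_fsdd_filename` are hypothetical, and the pattern assumes a single-digit label and a speaker name that contains no underscore.

```python
import re
from typing import NamedTuple


class RecordingInfo(NamedTuple):
    """Fields encoded in an FSDD file name (hypothetical helper type)."""
    digit: int
    speaker: str
    index: int


# Assumed pattern for {digitLabel}_{speakerName}_{index}.wav:
# one digit, a speaker name without underscores, and a numeric index.
_FSDD_NAME = re.compile(r"^(\d)_([^_]+)_(\d+)\.wav$")


def parse_fsdd_filename(name: str) -> RecordingInfo:
    """Split a file name such as '7_jackson_32.wav' into its fields."""
    match = _FSDD_NAME.match(name)
    if match is None:
        raise ValueError(f"not an FSDD-style file name: {name!r}")
    digit, speaker, index = match.groups()
    return RecordingInfo(int(digit), speaker, int(index))


print(parse_fsdd_filename("7_jackson_32.wav"))
# RecordingInfo(digit=7, speaker='jackson', index=32)
```

The `digit` field can serve directly as the classification label, and `speaker` is useful for building speaker-independent train/test splits.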
FSDD is available for academic research, and the community is encouraged to contribute its own recordings. All contributed recordings should be mono 8 kHz wav files, trimmed to minimize silence at the beginning and end.
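Before contributing a recording, the mono and 8 kHz requirements can be checked with Python's standard `wave` module. This is only an illustrative sketch: the function name `check_fsdd_wav` is hypothetical, and it does not verify that silence has been trimmed.

```python
import wave


def check_fsdd_wav(path: str, expected_rate: int = 8000) -> list[str]:
    """Return a list of format problems; an empty list means the file passes."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems
```

Calling `check_fsdd_wav("7_jackson_32.wav")` on a conforming file should return an empty list; any returned strings describe what needs to be fixed before submission.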