HyperAI

Free Spoken Digit Dataset (FSDD): Spoken Digit Recognition Audio Dataset

Date

a year ago

Size

15.67 MB

Publish URL

github.com

License

CC BY-SA 4.0

The Free Spoken Digit Dataset (FSDD) is a simple audio/speech dataset consisting of recordings of spoken digits, stored as WAV files sampled at 8 kHz. The recordings have been trimmed to minimize silence at the beginning and end. The dataset is open, so it is expected to grow over time as contributors add data.

The FSDD dataset currently includes (as of July 2024):

  • 6 different speakers
  • 3,000 recordings (50 of each digit per speaker)
  • English pronunciations

The files in the dataset are named in the format {digitLabel}_{speakerName}_{index}.wav. For example, the file name 7_jackson_32.wav identifies the recording with index 32 of the digit 7 by the speaker jackson.
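The naming convention can be split back into its three fields with a short parser. This is a minimal sketch, not part of the dataset's own tooling: the name `parse_fsdd_name`, the `FsddName` tuple, and the assumptions that the digit label is a single digit (0 to 9) and that speaker names contain no underscores are all illustrative.

```python
import re
from typing import NamedTuple


class FsddName(NamedTuple):
    """Fields encoded in an FSDD file name (hypothetical helper type)."""
    digit: int
    speaker: str
    index: int


# Assumed pattern: one digit, a speaker name without underscores, a numeric index.
_NAME_PATTERN = re.compile(r"^(\d)_([^_]+)_(\d+)\.wav$")


def parse_fsdd_name(filename: str) -> FsddName:
    """Split '{digitLabel}_{speakerName}_{index}.wav' into its parts."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not an FSDD-style file name: {filename!r}")
    digit, speaker, index = match.groups()
    return FsddName(int(digit), speaker, int(index))


parsed = parse_fsdd_name("7_jackson_32.wav")
print(parsed.digit, parsed.speaker, parsed.index)
```

The digit field can then serve directly as the classification label when building a training set from a directory of recordings.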

The FSDD dataset is available for academic research, and the community is encouraged to contribute recordings of their own. All contributed recordings should be mono 8 kHz WAV files, trimmed to minimize silence.
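Before contributing a recording, the channel count and sampling rate can be checked with Python's standard `wave` module. The sketch below is illustrative only: the function name `is_fsdd_compatible` is hypothetical, and it checks just the mono and 8 kHz requirements, not whether silence has been trimmed.

```python
import wave


def is_fsdd_compatible(path: str, expected_rate: int = 8000) -> bool:
    """Return True if the WAV file at `path` is mono and sampled at `expected_rate` Hz.

    Silence trimming is NOT checked here; that would require inspecting
    the audio samples themselves.
    """
    with wave.open(str(path), "rb") as wav_file:
        is_mono = wav_file.getnchannels() == 1
        has_expected_rate = wav_file.getframerate() == expected_rate
    return is_mono and has_expected_rate
```

A call such as `is_fsdd_compatible("7_jackson_32.wav")` would be expected to return `True` for files that follow the requirements above.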

FSDD.torrent
  • FSDD/
    • README.md
      1.6 KB
    • README.txt
      3.2 KB
    • data/
      • free-spoken-digit-dataset-master.zip
        15.67 MB