HiFiTTS-2 Large-Scale High-Bandwidth Speech Dataset
License: CC BY 4.0
HiFiTTS-2 is a large-scale, high-bandwidth speech dataset released by NVIDIA in 2025 and described in the paper "HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset". It is designed to support the training and evaluation of high-quality zero-shot text-to-speech (TTS) models.
The dataset contains metadata for audio from about 5,000 speakers: approximately 36,700 hours of English speech suitable for training at 22.05 kHz and 31,700 hours suitable for training at 44.1 kHz, organized into strata by bandwidth quality and sampling rate. The data is sourced from LibriVox audiobooks, and the audio itself is available for download from LibriVox. The audio is sampled at 48 kHz, making it suitable for training high-resolution vocoders and non-autoregressive speech synthesis models.
The data includes:
- Speech audio (22.05 kHz / 44.1 kHz, mono)
- Transcript and chapter/episode metadata
- Speaker labels, bandwidth and quality estimates, and segmentation timestamps
- Training/validation manifests and example configurations
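
As a rough illustration of how the per-utterance bandwidth estimates could be used to pick a training subset for a given sampling rate, here is a minimal Python sketch. It assumes a JSON-lines manifest with hypothetical field names (`audio_filepath`, `duration` in seconds, `bandwidth` in Hz, `text`). The actual HiFiTTS-2 manifest schema and the thresholds used by the dataset authors may differ. The filter simply keeps utterances whose estimated bandwidth reaches the Nyquist frequency (half the target sampling rate), which is an assumption made for this example and not the documented selection rule.

```python
import json


def select_for_sample_rate(manifest_lines, target_sample_rate_hz):
    """Keep entries whose estimated bandwidth covers the Nyquist frequency.

    `manifest_lines` is an iterable of JSON strings, one utterance per line.
    The "bandwidth" field name is hypothetical (an assumption for this sketch).
    """
    nyquist_hz = target_sample_rate_hz / 2
    kept = []
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if entry["bandwidth"] >= nyquist_hz:
            kept.append(entry)
    return kept


def total_hours(entries):
    """Sum the 'duration' field (seconds) and convert to hours."""
    return sum(entry["duration"] for entry in entries) / 3600.0


# Tiny made-up manifest, for illustration only (not real dataset entries).
example_manifest = [
    '{"audio_filepath": "spk1/a.flac", "duration": 1800.0, "bandwidth": 22500, "text": "first"}',
    '{"audio_filepath": "spk2/b.flac", "duration": 3600.0, "bandwidth": 12000, "text": "second"}',
    '{"audio_filepath": "spk3/c.flac", "duration": 1800.0, "bandwidth": 9000, "text": "third"}',
]

subset_22k = select_for_sample_rate(example_manifest, 22050)
subset_44k = select_for_sample_rate(example_manifest, 44100)
print(len(subset_22k), total_hours(subset_22k))
print(len(subset_44k), total_hours(subset_44k))
```

Under this rule the 44.1 kHz subset is always contained in the 22.05 kHz subset, which is consistent with the smaller hour count reported for 44.1 kHz training.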