NonverbalTTS Nonverbal Audio Generation Dataset
License: Apache 2.0
NonverbalTTS is a nonverbal audio generation dataset released by VK Lab and Yandex in 2025. The accompanying paper is "NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech". The dataset aims to advance expressive text-to-speech (TTS) research and to support models that generate natural speech containing emotions and nonverbal sounds.
The dataset contains 17 hours of high-quality speech data from 2,296 speakers (60% male, 40% female). It covers 10 types of nonverbal vocalization (breathing, laughing, sighing, sneezing, coughing, throat clearing, groaning, grunting, snoring, and inhaling) and 8 emotion categories (anger, disgust, fear, happiness, neutral, sadness, surprise, and other).
Dataset features:
- Multi-source data: derived from the VoxCeleb and Expresso corpora
- Rich metadata: emotion tags, nonverbal vocalization annotations, speaker IDs, audio quality metrics
- Sampling rate: 16 kHz for audio from VoxCeleb, 48 kHz for audio from Expresso
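Because the two source corpora use different sampling rates, clip durations cannot be compared by sample count alone. The following is a minimal Python sketch of how the label sets and sampling rates described above might be checked in code. The `Clip` record layout, its field names, and the helper functions are hypothetical, and the label strings follow the wording of this description, which may differ from the strings used in the released files.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Label sets as worded in the description above (assumed, not the official label strings).
NONVERBAL_TYPES = frozenset({
    "breathing", "laughing", "sighing", "sneezing", "coughing",
    "throat clearing", "groaning", "grunting", "snoring", "inhaling",
})
EMOTIONS = frozenset({
    "anger", "disgust", "fear", "happiness",
    "neutral", "sadness", "surprise", "other",
})
# Native sampling rate in Hz for each source corpus.
SOURCE_SAMPLE_RATE = {"VoxCeleb": 16_000, "Expresso": 48_000}


@dataclass(frozen=True)
class Clip:
    # Hypothetical record layout; the released metadata may use other field names.
    source: str
    emotion: str
    nonverbal: Tuple[str, ...]
    num_samples: int


def validate(clip: Clip) -> List[str]:
    """Return a list of problems found; an empty list means the clip looks consistent."""
    problems = []
    if clip.source not in SOURCE_SAMPLE_RATE:
        problems.append(f"unknown source: {clip.source}")
    if clip.emotion not in EMOTIONS:
        problems.append(f"unknown emotion: {clip.emotion}")
    for tag in clip.nonverbal:
        if tag not in NONVERBAL_TYPES:
            problems.append(f"unknown nonverbal type: {tag}")
    return problems


def duration_seconds(clip: Clip) -> float:
    """Duration of the clip, computed with the native rate of its source corpus."""
    return clip.num_samples / SOURCE_SAMPLE_RATE[clip.source]
```

For example, under these assumptions `duration_seconds(Clip("VoxCeleb", "happiness", ("laughing",), 32_000))` gives 2.0 seconds, while the same 32,000 samples from Expresso would last well under one second. A training pipeline that mixes both sources would typically resample to a common rate first.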