ChildMandarin Children's Chinese Conversation Speech Dataset
The ChildMandarin dataset is a comprehensive Mandarin speech dataset for children aged 3 to 5. It was released in 2025 by the Zhiyuan Research Institute and the Human Language Technology Laboratory (HLT Lab) of the School of Computer Science at Nankai University, and was created to address the scarcity of Mandarin speech data for this age group. The accompanying paper, "ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5", presents it as a resource for research fields such as children's speech recognition and speaker verification.
Dataset features:
- Large scale: 41.25 hours of conversational speech from 397 children aged 3 to 5, which is large compared with similar datasets.
- Wide geographical coverage: data were collected in 22 provinces and municipalities, ensuring regional diversity and covering a range of accents and speech habits.
- Natural, realistic interaction: recordings were made through parent-guided dialogue, which simulates everyday communication and makes the speech more natural.
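As a rough sense of scale, the two headline figures above can be combined into an average amount of speech per child. The sketch below assumes the 41.25 hours are spread across all 397 children; the actual per-child distribution is not given here and will vary, so this is only a mean, not a per-speaker guarantee.

```python
# Headline figures from the dataset description.
TOTAL_HOURS = 41.25
NUM_CHILDREN = 397


def average_minutes_per_child(total_hours: float, num_children: int) -> float:
    """Mean recorded speech per child, in minutes.

    Assumption: the total duration is divided across all children; real
    per-child durations in the dataset will differ from this mean.
    """
    return total_hours * 60 / num_children


avg = average_minutes_per_child(TOTAL_HOURS, NUM_CHILDREN)
print(f"{avg:.2f} minutes per child on average")  # prints "6.23 minutes per child on average"
```

Roughly six minutes per child is useful context when planning speaker-level splits, for example for speaker verification experiments.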