Nemotron-Personas Dataset
Nemotron-Personas is a persona dataset released by NVIDIA in 2025. It contains synthetically generated personas grounded in real-world demographic, geographic, and personality-trait distributions, with the aim of capturing the diversity and richness of the population. It is the first dataset of its kind to be aligned with real-world statistics for attributes such as name, gender, age, background, marital status, education, occupation, and place of residence.
The dataset includes:
- 100,000 records, containing 22 fields: 6 persona fields and 16 context fields
- About 54 million tokens, of which about 23.6 million are persona tokens
- Covering multiple dimensions including demographics, geographic distribution and personality traits
- Over 560 distinct occupations, based on real-world occupational distribution data
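
To put the figures above in proportion, the following minimal Python sketch derives a few per-record averages from them. It uses only the approximate numbers quoted in the list; the constant names and the `summarize` helper are illustrative and not part of the dataset or any NVIDIA tooling, and the results are rough averages rather than measured values.

```python
# Approximate figures quoted in the list above.
NUM_RECORDS = 100_000
TOTAL_TOKENS = 54_000_000      # "about 54 million"
PERSONA_TOKENS = 23_600_000    # "about 23.6 million"
PERSONA_FIELDS = 6
CONTEXT_FIELDS = 16


def summarize() -> dict:
    """Derive simple averages from the quoted dataset-level figures."""
    return {
        # 6 persona fields + 16 context fields should give the stated 22.
        "total_fields": PERSONA_FIELDS + CONTEXT_FIELDS,
        # Fraction of all tokens that fall in the persona portion.
        "persona_token_share": PERSONA_TOKENS / TOTAL_TOKENS,
        # Average number of tokens per record, overall and persona-only.
        "avg_tokens_per_record": TOTAL_TOKENS / NUM_RECORDS,
        "avg_persona_tokens_per_record": PERSONA_TOKENS / NUM_RECORDS,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Under these figures, persona text makes up a little under half of all tokens, and each record averages a few hundred tokens in total.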