CoSER Role-playing Dataset
Date
Size
Publish URL
License
MIT
Categories
The CoSER (Coordinating LLM-Based Persona Simulation of Established Roles) dataset is a large-scale role-playing dataset built from real literary works, jointly constructed by Fudan University and Jieyuexingchen in 2025. The accompanying paper is "CoSER: Coordinating LLM-Based Persona Simulation of Established Roles". The data is extracted from 771 of the world's most famous books and covers 17,966 characters and 29,798 authentic dialogues. Unlike previous datasets, CoSER contains not only character overviews and dialogues but also plot summaries, character experiences, and dialogue backgrounds. In addition, the dialogue content covers three dimensions: speech, actions, and thoughts, which gives each character a fuller portrayal. What distinguishes CoSER is its authenticity and comprehensiveness: it extracts real character dialogues from classic literary works and preserves their complexity, yielding naturally multi-turn, multi-character, high-quality dialogue data.
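To make the three dialogue dimensions concrete, here is a minimal Python sketch of how one utterance could be split into speech, actions, and thoughts. It assumes a markup convention in which thoughts are wrapped in square brackets and actions in parentheses; that convention, the `Utterance` class, the `parse_utterance` helper, and the sample line are all illustrative assumptions and are not taken from the dataset's actual schema.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed (hypothetical) markup: [inner thought], (physical action);
# whatever text remains is treated as spoken dialogue.
THOUGHT = re.compile(r"\[([^\]]*)\]")
ACTION = re.compile(r"\(([^)]*)\)")


@dataclass
class Utterance:
    """One turn of a dialogue, split into the three dimensions."""
    character: str
    speech: str
    actions: List[str] = field(default_factory=list)
    thoughts: List[str] = field(default_factory=list)


def parse_utterance(character: str, message: str) -> Utterance:
    # Collect thoughts first, then remove them so their contents
    # cannot be mistaken for actions.
    thoughts = [t.strip() for t in THOUGHT.findall(message)]
    without_thoughts = THOUGHT.sub(" ", message)

    # Collect actions from the remaining text, then remove them too.
    actions = [a.strip() for a in ACTION.findall(without_thoughts)]
    speech = ACTION.sub(" ", without_thoughts)

    # Collapse the whitespace left behind by the removed spans.
    speech = " ".join(speech.split())
    return Utterance(character, speech, actions, thoughts)


# Made-up sample line, not drawn from the dataset.
sample = parse_utterance(
    "Elizabeth",
    "[He is insufferable.] (raises an eyebrow) I am perfectly well, thank you.",
)
print(sample)
```

A multi-character dialogue would then be an ordered list of such utterances, stored alongside the plot summary and dialogue background that the dataset provides for context.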
