HyperAI

CoSER Role-playing Dataset

Date: 2 months ago
Size: 1.53 GB
Organization: Fudan University
Publish URL: github.com
License: MIT

The CoSER (Coordinating LLM-Based Persona Simulation of Established Roles) dataset is a large-scale role-playing dataset built from authentic literary sources, jointly constructed by Fudan University and Jieyuexingchen in 2025. It accompanies the paper "CoSER: Coordinating LLM-Based Persona Simulation of Established Roles". The data is extracted from 771 well-known books and covers 17,966 characters and 29,798 authentic dialogues. Unlike previous datasets, CoSER contains not only character profiles and dialogues but also plot summaries, character experiences, and dialogue settings. Each dialogue captures three dimensions (speech, actions, and thoughts), which gives a fuller picture of how a character behaves. CoSER's distinguishing features are its authenticity and comprehensiveness: the dialogues are taken directly from classic literary works, so they retain their original complexity and form naturally multi-turn, multi-character, high-quality dialogue data.

An example from the CoSER dataset. It provides comprehensive data types such as dialogues and their settings, plot summaries, and characters' inner thoughts, all sourced from well-known books.
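
To make the description above more concrete, the following is a minimal Python sketch of how one dialogue record with speech, actions, and thoughts might be modeled. The class names, field names, and the bracket/parenthesis rendering are assumptions made for illustration only; they are not the actual schema of the files in CoSER.zip, and the sample text is a made-up placeholder rather than data from the dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Utterance:
    # Hypothetical fields mirroring the three dimensions named in the
    # description: speech, actions, and thoughts.
    character: str
    speech: str = ""
    actions: List[str] = field(default_factory=list)
    thoughts: List[str] = field(default_factory=list)


@dataclass
class Conversation:
    # Hypothetical container for one multi-turn, multi-character dialogue
    # together with its context (plot summary and dialogue setting).
    book: str
    plot_summary: str
    setting: str
    utterances: List[Utterance] = field(default_factory=list)

    def speakers(self) -> List[str]:
        """Return the distinct characters in order of first appearance."""
        seen: List[str] = []
        for utterance in self.utterances:
            if utterance.character not in seen:
                seen.append(utterance.character)
        return seen


def render(utterance: Utterance) -> str:
    """Flatten one utterance to text: [thoughts] (actions) speech.

    The bracket and parenthesis markers are an illustrative convention.
    """
    parts = [f"[{t}]" for t in utterance.thoughts]
    parts += [f"({a})" for a in utterance.actions]
    if utterance.speech:
        parts.append(utterance.speech)
    return f"{utterance.character}: " + " ".join(parts)


# Placeholder content, invented for this sketch (not taken from the dataset).
conversation = Conversation(
    book="Placeholder Book",
    plot_summary="Two travellers argue about whether to leave an inn.",
    setting="A dim common room, late at night.",
    utterances=[
        Utterance(
            character="Character A",
            speech="We should leave.",
            actions=["glances at the door"],
            thoughts=["Something is wrong"],
        ),
        Utterance(character="Character B", speech="Not yet."),
        Utterance(character="Character A", speech="Then I go alone."),
    ],
)

for utterance in conversation.utterances:
    print(render(utterance))
```

Keeping thoughts and actions as separate fields, instead of embedding them in the speech string, makes it straightforward to hide inner thoughts from other characters when a dialogue is replayed turn by turn.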
CoSER.torrent
  • CoSER/
    • README.md (1.76 KB)
    • README.txt (3.51 KB)
    • data/
      • CoSER.zip (1.53 GB)