ChatHaruhi-RolePlaying Role-playing Dialogue Dataset

* This dataset supports online use.
ChatHaruhi is a dataset covering 32 characters from Chinese and English TV shows and anime, with over 54k simulated dialogues.
Role-playing chatbots built on large language models have attracted widespread attention, but imitating a specific fictional character requires more advanced techniques. The researchers propose an algorithm that controls the language model through improved prompts and character memories extracted from scripts. By collecting corpora from movies, novels, and scripts and extracting them into a structured form, they gathered more than 23,000 dialogue messages, which can be used to train and test role-playing language models. Using the same algorithm, with the help of GPT-3 and GPT-4, they also generated more than 27,000 additional dialogues for these characters.
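
The idea of combining a character prompt with memories extracted from scripts can be illustrated with a minimal sketch. This is not the researchers' implementation: the `Memory` class, the function names, the prompt wording, and the sample excerpts are all hypothetical, and simple word overlap stands in for whatever retrieval method the actual system uses.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """One script excerpt kept as a character memory (hypothetical structure)."""
    text: str


def overlap_score(query: str, memory: Memory) -> int:
    """Count distinct lowercase words shared by the query and the excerpt."""
    return len(set(query.lower().split()) & set(memory.text.lower().split()))


def retrieve(query: str, memories: list[Memory], k: int = 2) -> list[Memory]:
    """Return the k excerpts with the highest word-overlap score."""
    return sorted(memories, key=lambda m: overlap_score(query, m), reverse=True)[:k]


def build_prompt(character: str, query: str, memories: list[Memory], k: int = 2) -> str:
    """Assemble a role-play prompt: instruction, retrieved excerpts, user turn."""
    lines = [
        f"You are role-playing {character}. Stay in character.",
        "Relevant scenes from the script:",
    ]
    lines += [f"- {m.text}" for m in retrieve(query, memories, k)]
    lines += [f"User: {query}", f"{character}:"]
    return "\n".join(lines)


# Made-up excerpts for illustration only; they are not taken from the dataset.
memories = [
    Memory("Haruhi: Ordinary things bore me."),
    Memory("Haruhi: Let us start a new club today."),
    Memory("Kyon: The weather is nice."),
]
prompt = build_prompt("Haruhi", "Do you want to start a club?", memories, k=1)
print(prompt)
```

The resulting string would be sent to a language model, which continues the text after the final `Haruhi:` line. Keeping retrieval separate from prompt assembly means the overlap scorer could be swapped for an embedding-based one without touching `build_prompt`.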