PsyDTCorpus Psychological Counselor Digital Twin Dataset

Date

2 years ago

Size

9.73 MB

Organization

South China University of Technology (华南理工大学)

Publish URL

github.com

PsyDTCorpus is a digital twin dataset for psychological counselors, released in 2024 by the School of Future Technology of South China University of Technology and the Guangdong Provincial Key Laboratory of Digital Twins. Its core goal is to simulate the language style and counseling techniques of a specific psychological counselor, in support of developing and training the psychological counselor digital twin model SoulChat2.0, the successor to the model described in "SoulChat: Improving LLMs' Empathy, Listening, and Comfort Abilities through Fine-tuning with Multi-turn Empathy Conversations".

The dataset is modeled on real multi-round consultation cases of a specific psychological counselor. Starting from 5k single-round consultation samples, digital twin data synthesis produces 5k high-quality multi-round mental health dialogues that carry the counselor's language style and use of therapeutic techniques. Of these, 4,760 samples form the training set, and the remaining 240 are split into multiple test samples. The dataset contains 90,365 rounds in total, 4,311 of them in the test set, for an average of about 18 rounds per conversation sample.

The data generation framework combines the language style and counseling techniques of real counselors with the Big Five personality traits of clients, and uses single-round conversations as the basis for synthesis. With this framework, the research team generated multi-round conversation data that effectively characterizes the language style and counseling techniques of a specific counselor.

PsyDTCorpus was manually evaluated against other datasets on four professional dimensions: conversation techniques, state and attitude, relationship building, and therapy techniques. The results showed significant improvements on these dimensions, demonstrating the feasibility of using a small number of consultation cases from real psychological counselors to construct high-quality multi-round mental health conversation data.
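The split figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the published numbers, not part of the dataset tooling. The function name `split_summary` is hypothetical, and the training-set round count is derived here on the assumption that the 4,311 test rounds are included in the 90,365 total.

```python
# Figures quoted in the dataset description.
TOTAL_SAMPLES = 5_000
TRAIN_SAMPLES = 4_760
TEST_SAMPLES = 240
TOTAL_ROUNDS = 90_365
TEST_ROUNDS = 4_311


def split_summary():
    """Derive per-split round counts and averages from the quoted figures."""
    # Assumption: the test rounds are a subset of the total, so the
    # training rounds are the difference. This value is not stated in the source.
    train_rounds = TOTAL_ROUNDS - TEST_ROUNDS
    return {
        "samples_add_up": TRAIN_SAMPLES + TEST_SAMPLES == TOTAL_SAMPLES,
        "train_rounds": train_rounds,
        "avg_rounds_all": TOTAL_ROUNDS / TOTAL_SAMPLES,
        "avg_rounds_train": train_rounds / TRAIN_SAMPLES,
        "avg_rounds_test": TEST_ROUNDS / TEST_SAMPLES,
    }


if __name__ == "__main__":
    for key, value in split_summary().items():
        print(f"{key}: {value}")
```

Under that assumption, all three averages come out close to the 18 rounds per sample quoted in the description.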

Data topic distribution

Citation

@inproceedings{xie-etal-2025-psydt,
    title = "{P}sy{DT}: Using {LLM}s to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling",
    author = "Xie, Haojie and Chen, Yirong and Xing, Xiaofen and Lin, Jingkai and Xu, Xiangmin",
    editor = "Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.55/",
    pages = "1081--1115",
    ISBN = "979-8-89176-251-0",
    abstract = "Currently, large language models (LLMs) have made significant progress in the field of counseling psychology. However, existing mental health LLMs overlook a critical issue where they do not consider the fact that different psychological counselors exhibit different personal styles, including linguistic style and therapy techniques, etc. As a result, these LLMs fail to satisfy the individual needs of clients who seek different counseling styles. To help bridge this gap, we propose PsyDT, a novel framework using LLMs to construct the Digital Twin of Psychological counselors with personalized counseling style. Compared to the time-consuming and costly approach of collecting a large number of real-world counseling cases to create a specific counselor{'}s digital twin, our framework offers a faster and more cost-effective solution. To construct PsyDT, we utilize dynamic one-shot learning by using GPT-4 to capture counselor{'}s unique counseling style, mainly focusing on linguistic style and therapy techniques. Subsequently, using existing single-turn long-text dialogues with client{'}s questions, GPT-4 is guided to synthesize multi-turn dialogues of specific counselor. Finally, we fine-tune the LLMs on the synthetic dataset, PsyDTCorpus, to achieve the digital twin of psychological counselor with personalized counseling style. Experimental results indicate that our proposed PsyDT framework can synthesize multi-turn dialogues that closely resemble real-world counseling cases and demonstrate better performance compared to other baselines, thereby showing that our framework can effectively construct the digital twin of psychological counselor with a specific counseling style."
}

PsyDTCorpus.torrent
  • PsyDTCorpus/
    • README.md
      2.47 KB
    • README.txt
      4.95 KB
      • data/
        • PsyDTCorpus.zip
          9.73 MB
