MedChatZH Chinese Medical Conversation Instruction Dataset
License: Apache 2.0
MedChatZH is a Chinese medical conversation dataset released by East China University of Science and Technology in 2023. The accompanying paper, "MedChatZH: A tuning LLM for traditional Chinese medicine consultations", aims to improve a model's ability to understand and generate Chinese medical consultation dialogue, especially in traditional Chinese medicine (TCM) scenarios, through continued pre-training on TCM classics followed by fine-tuning on medical instruction data.
The source data consists of more than 1,000 TCM classics and medical notes, plus more than 7 million Chinese medical instructions collected from the internet and from multiple Chinese hospitals. After screening and cleaning, 763,629 medical instructions were combined with 1,305,194 general instructions from BELLE-3.5M to form the med-mix-2M dataset used for dialogue fine-tuning. The TCM classics corpus supports the continued pre-training stage, and med-mix-2M supports the instruction fine-tuning stage.
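The composition of med-mix-2M follows directly from the two counts given above. The short Python sketch below only does that arithmetic: the two figures are taken from the description, while the constant names and the `mixture_summary` helper are illustrative and not part of any released tooling. It shows that the mix totals 2,068,823 examples, with roughly 37% medical and 63% general instructions.

```python
# Counts stated in the dataset description above.
MEDICAL_INSTRUCTIONS = 763_629
GENERAL_INSTRUCTIONS = 1_305_194


def mixture_summary(medical: int, general: int) -> dict:
    """Return the total size and the share of each instruction type.

    Hypothetical helper for illustration only.
    """
    total = medical + general
    return {
        "total": total,
        "medical_share": medical / total,
        "general_share": general / total,
    }


summary = mixture_summary(MEDICAL_INSTRUCTIONS, GENERAL_INSTRUCTIONS)
print(f"total examples: {summary['total']:,}")
print(f"medical share:  {summary['medical_share']:.1%}")
print(f"general share:  {summary['general_share']:.1%}")
```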
