HyperAI

ShareGPT 90k Chinese and English Bilingual Human-machine Question Answering Dataset

ShareGPT-Chinese-English-90k is a high-quality, parallel Chinese-English human-machine question-answering dataset that covers user questions from real, complex scenarios. It can be used to train high-quality dialogue models. Models trained on it are more robust to the instruction distribution than models trained on data generated by repeatedly calling an API to simulate machine questions and answers.

The characteristics of this dataset are:

  • It provides parallel Chinese and English corpora with exactly the same meaning, which can be used to train bilingual dialogue models.
  • None of the questions are artificially invented or synthesized by polling an API (as in Moss), so they better match the instruction distribution and question phrasing of real user scenarios.
  • The ShareGPT data was collected from conversations that users shared voluntarily. This acts as a natural human filter that screens out most conversations with a poor experience.
ShareGPT-Chinese-English-90k.torrent
  • ShareGPT-Chinese-English-90k/
    • README.md (1.5 KB)
    • README.txt (2.99 KB)
    • data/
      • sharegpt-ec.zip (730.58 MB)
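
Before training on the data, it can help to check what the archive actually contains. The following is a minimal sketch, not an official loader: the local path mirrors the file listing above and is an assumption about where the torrent was saved, and `list_archive` is a hypothetical helper name. It makes no assumption about the file format inside the zip and only lists member names and sizes using Python's standard `zipfile` module.

```python
import zipfile
from pathlib import Path


def list_archive(zip_path):
    """Return (member name, uncompressed size in bytes) for each file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the torrent was saved.
    path = Path("ShareGPT-Chinese-English-90k/data/sharegpt-ec.zip")
    if path.exists():
        for name, size in list_archive(path):
            print(f"{size:>12}  {name}")
    else:
        print(f"{path} not found; download the torrent first")
```

Listing the members first shows which files hold the Chinese and English conversations, so that a parser can be chosen once the actual format is known.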