HyperAI

ODSQA Open Domain Spoken Question Answering Dataset

ODSQA (Open-Domain Spoken Question Answering Dataset) was introduced in "ODSQA: Open-domain Spoken Question Answering Dataset". It is a Chinese dataset. An English dataset, Spoken-SQuAD, is also available.

Spoken-SQuAD is a spoken question-answering (SQA) corpus generated from the SQuAD dataset using Google's text-to-speech (TTS) system. Although Spoken-SQuAD is large enough to train state-of-the-art question-answering models, it is synthetically generated, so a gap remains between it and real spoken question answering. To narrow that gap, the researchers released ODSQA, an SQA dataset containing more than three thousand questions. It is currently the largest real SQA dataset for extraction-based question-answering tasks.

ODSQA.torrent
Seeding: 2 | Downloading: 0 | Completed: 213 | Total downloads: 414
  • ODSQA/
    • DRCD-TTS.json
      10.15 MB
    • DRCD-backtrans.json
      23.2 MB
    • ODSQA_spokenq_test-v1.1.json
      25.05 MB
    • ODSQA_textq_test-v1.1.json
      26.11 MB
    • README.md
      26.12 MB
    • README.txt
      26.12 MB
      • data/
        • DRCD-TTS.json
          36.27 MB
        • DRCD-backtrans.json
          49.32 MB
        • ODSQA_spokenq_test-v1.1.json
          51.17 MB
        • ODSQA_textq_test-v1.1.json
          52.23 MB
        • README.md
          52.24 MB
        • download.sh
          52.24 MB
    • download.sh
      52.24 MB
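
The test files in the listing above end in `-v1.1`, which suggests, but does not confirm, that they follow the SQuAD v1.1 JSON layout (`data` → `paragraphs` → `qas` → `answers`). Under that assumption, a minimal sketch for reading question/context/answer triples might look like the following. The field names, the `QAExample` type, the helper names, and the sample record are all illustrative assumptions, not taken from the dataset itself; check the README in the archive for the actual schema.

```python
import json
from typing import Iterator, List, NamedTuple


class QAExample(NamedTuple):
    """One question with its context passage and reference answers (hypothetical type)."""
    question_id: str
    question: str
    context: str
    answers: List[str]


def iter_squad_examples(dataset: dict) -> Iterator[QAExample]:
    """Yield one QAExample per question from a SQuAD v1.1-style dict.

    ASSUMPTION: the layout is data -> paragraphs -> qas -> answers,
    as in SQuAD v1.1. Missing keys are treated as empty.
    """
    for article in dataset.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph.get("context", "")
            for qa in paragraph.get("qas", []):
                answers = [a.get("text", "") for a in qa.get("answers", [])]
                yield QAExample(
                    question_id=str(qa.get("id", "")),
                    question=qa.get("question", ""),
                    context=context,
                    answers=answers,
                )


def load_examples(path: str) -> List[QAExample]:
    """Read a JSON file from disk and return all examples in it."""
    with open(path, encoding="utf-8") as f:
        return list(iter_squad_examples(json.load(f)))


# Hypothetical in-memory record, used only to illustrate the assumed layout.
SAMPLE = {
    "data": [
        {
            "title": "example",
            "paragraphs": [
                {
                    "context": "Taipei is the capital of Taiwan.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is the capital of Taiwan?",
                            "answers": [{"text": "Taipei", "answer_start": 0}],
                        }
                    ],
                }
            ],
        }
    ]
}

if __name__ == "__main__":
    for example in iter_squad_examples(SAMPLE):
        print(example.question_id, example.question, example.answers)
```

If the assumption holds, `load_examples("ODSQA_textq_test-v1.1.json")` would return the text-question test set as a list. Reading with `encoding="utf-8"` matters here because the passages are Chinese.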