HyperAI

ODSQA Open Domain Spoken Question Answering Dataset

Date: 2 years ago
Size: 52.24 MB
Organization: Cornell University
Publish URL: github.com
Paper URL: arxiv.org

ODSQA (Open-Domain Spoken Question Answering Dataset) comes from the paper "ODSQA: Open-domain Spoken Question Answering Dataset". It is a Chinese dataset. An English dataset, Spoken-SQuAD, is also provided.

Spoken-SQuAD is a spoken question answering corpus generated from the SQuAD dataset with Google's text-to-speech (TTS) system. Although Spoken-SQuAD is large enough to train state-of-the-art question answering models, it is artificially generated, so a gap remains between it and real spoken question answering. The researchers therefore released ODSQA, a spoken question answering (SQA) dataset containing more than three thousand questions. According to the researchers, it was the largest real SQA dataset for extraction-based question answering at the time of release.

ODSQA.torrent
Seeding: 1 | Downloading: 0 | Completed: 252 | Total Downloads: 491
  • ODSQA/
    • DRCD-TTS.json
      10.15 MB
    • DRCD-backtrans.json
      23.2 MB
    • ODSQA_spokenq_test-v1.1.json
      25.05 MB
    • ODSQA_textq_test-v1.1.json
      26.11 MB
    • README.md
      26.12 MB
    • README.txt
      26.12 MB
    • data/
      • DRCD-TTS.json
        36.27 MB
      • DRCD-backtrans.json
        49.32 MB
      • ODSQA_spokenq_test-v1.1.json
        51.17 MB
      • ODSQA_textq_test-v1.1.json
        52.23 MB
      • README.md
        52.24 MB
      • download.sh
        52.24 MB
    • download.sh
      52.24 MB
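
The JSON files listed above carry a `v1.1` suffix, which suggests (but this page does not confirm) a SQuAD v1.1-style layout. As a minimal sketch under that assumption, the following Python counts the question entries in such a file. The `data` / `paragraphs` / `qas` keys and the helper names `count_questions` and `load_and_count` are assumptions for illustration, and the sample dictionary is hand-made, not real ODSQA data.

```python
import json
from pathlib import Path


def count_questions(squad_like: dict) -> int:
    """Count question entries in a SQuAD v1.1-style dictionary.

    Assumed layout: {"data": [{"paragraphs": [{"qas": [...]}]}]}.
    Missing keys are treated as empty lists.
    """
    total = 0
    for article in squad_like.get("data", []):
        for paragraph in article.get("paragraphs", []):
            total += len(paragraph.get("qas", []))
    return total


def load_and_count(path: str) -> int:
    """Read a JSON file as UTF-8 and count its question entries."""
    with Path(path).open(encoding="utf-8") as f:
        return count_questions(json.load(f))


# Hand-made sample in the assumed layout (not real ODSQA content).
example = {
    "data": [
        {
            "paragraphs": [
                {"context": "c1", "qas": [{"id": "q1", "question": "?"}]},
                {"context": "c2", "qas": [{"id": "q2", "question": "?"}]},
            ]
        }
    ]
}

print(count_questions(example))  # prints 2
```

If the real files follow this layout, `load_and_count("ODSQA/ODSQA_spokenq_test-v1.1.json")` would give a quick check against the "more than three thousand questions" figure quoted above; if they do not, the function simply returns 0 rather than raising.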