HyperAI

CCMT 2019-BSTC Speech Translation Corpus

Date: 3 years ago

Size: 4.51 GB

Organization: Baidu

Publish URL: ai.baidu.com

BSTC (Baidu Speech Translation Corpus) is a large-scale dataset for building automatic simultaneous interpretation systems.

The corpus is divided into three subsets: a training set, a development set, and a test set. Each subset includes:

- Audio files, named baidu_XX.wav

- A description file with information about each audio file, in which each sentence is encoded in JSON format

- Supplementary documentation, including detailed descriptions of the speeches and reports
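As a rough illustration of how a subset laid out this way might be inspected, the sketch below lists the audio files and reads a description file using only the Python standard library. Several details are assumptions, not facts from the corpus documentation: that XX in baidu_XX.wav is a run of digits, that the audio is PCM WAV readable by the `wave` module, and that the description file holds one JSON object per line. The function names are hypothetical, and the README files in the archive remain the authoritative description of the format.

```python
import json
import re
import wave
from pathlib import Path

# Assumption: audio files follow the documented pattern baidu_XX.wav,
# with XX taken here to be a run of digits.
WAV_NAME = re.compile(r"^baidu_(\d+)\.wav$")


def list_audio(subset_dir):
    """Return {id: path} for every baidu_XX.wav directly inside subset_dir."""
    found = {}
    for path in sorted(Path(subset_dir).iterdir()):
        match = WAV_NAME.match(path.name)
        if match:
            found[match.group(1)] = path
    return found


def wav_duration_seconds(path):
    """Compute the duration of a PCM WAV file from its header fields."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def read_sentences(description_path):
    """Parse a description file, assuming one JSON object per non-empty line."""
    sentences = []
    with open(description_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                sentences.append(json.loads(line))
    return sentences
```

If the description files turn out to hold a single JSON document instead of one object per line, `read_sentences` would need to call `json.load` on the whole file.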

CCMT_2019_BSTC.torrent
Seeding: 2, Downloading: 0, Completed: 387, Total Downloads: 591
  • CCMT_2019_BSTC/
    • README.md
      1.14 KB
    • README.txt
      2.29 KB
    • data/
      • Train_sample.zip
        111.89 MB
      • development_data.zip
        248.54 MB
      • training_data.zip
        4.51 GB