CCMT 2019-BSTC Speech Translation Corpus

BSTC (Baidu Speech Translation Corpus) is a large-scale dataset for building automatic simultaneous interpretation systems.
The corpus is divided into three subsets: a training set, a development set, and a test set. Each subset includes:
- Audio files, named baidu_XX.wav
- Description files, which describe each audio file; each sentence is encoded in JSON format
- Supplementary documentation, including detailed descriptions of the speeches and reports
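As a rough illustration of how a subset might be indexed, the sketch below pairs each baidu_XX.wav file with its sentence-level JSON descriptions. The on-disk layout is an assumption, not taken from the corpus documentation: it supposes that the description for baidu_XX.wav is a hypothetical file baidu_XX.json in the same directory, containing a JSON array with one object per sentence. The function name `index_subset` is also hypothetical. Check the supplementary documentation for the real file names and JSON fields before relying on this.

```python
import json
import re
from pathlib import Path

# Matches audio file names of the form baidu_XX.wav; XX is captured as the talk ID.
WAV_NAME = re.compile(r"^baidu_(\w+)\.wav$")


def index_subset(root):
    """Map each talk ID to its audio path and its list of sentence records.

    ASSUMPTION (hypothetical layout): the description for baidu_XX.wav is
    stored next to it as baidu_XX.json and holds a JSON array, one object
    per sentence. Talks without a description file get an empty list.
    """
    root = Path(root)
    index = {}
    for wav in sorted(root.glob("baidu_*.wav")):
        match = WAV_NAME.match(wav.name)
        if match is None:
            continue  # skip names that only loosely match the glob
        desc = wav.with_suffix(".json")  # hypothetical description file name
        sentences = []
        if desc.exists():
            with desc.open(encoding="utf-8") as f:
                sentences = json.load(f)
        index[match.group(1)] = {"wav": wav, "sentences": sentences}
    return index
```

For example, `index_subset("train")` would return a dictionary keyed by talk ID, where each entry holds the path to the audio file and the list of sentence records parsed from its description file. The sketch only reads file names and JSON, so it does not decode the audio itself.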
CCMT_2019_BSTC.torrent