CCMT 2019-BSTC Speech Translation Corpus
Date: 3 years ago
Size: 4.51 GB

BSTC (Baidu Speech Translation Corpus) is a large-scale dataset for building automatic simultaneous interpretation systems.
The corpus is divided into three subsets: a training set, a development set, and a test set. Each subset includes:
- Sound signal files, named baidu_XX.wav
- A description file with information about each sound signal; each sentence is encoded in JSON format
- Supplementary documentation, including detailed descriptions of the speeches and reports
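
As a rough illustration of working with the files listed above, the following Python sketch parses sentence records and measures the length of a sound signal file. It assumes the description file holds one JSON object per line, which the listing does not confirm; the field names in the sample (`wav`, `text`) and the helper names `parse_sentence_records` and `wav_duration_seconds` are hypothetical and may not match the real corpus.

```python
import json
import wave


def parse_sentence_records(text):
    """Parse one JSON object per non-empty line (an assumed layout)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def wav_duration_seconds(source):
    """Return the duration of a WAV file, given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Hypothetical sample data; the real field names may differ.
sample = (
    '{"wav": "baidu_01.wav", "text": "hello"}\n'
    "\n"
    '{"wav": "baidu_01.wav", "text": "world"}\n'
)
records = parse_sentence_records(sample)
```

If the description file turns out to be a single JSON array rather than line-delimited objects, `json.load` on the open file would be the simpler choice.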
CCMT_2019_BSTC.torrent