Date: 4 years ago

Size: 4.51 GB

Organization:

Publish URL: ai.baidu.com

Tags: Audio and Speech Processing, Natural Language Processing, Audio Recognition, Translation, Text-to-Speech

BSTC (Baidu Speech Translation Corpus) is a large-scale dataset for building automatic simultaneous interpretation systems. The corpus is divided into three subsets: a training set, a development set, and a test set. Each subset includes:

- Sound signal files, named baidu_XX.wav
- Description files, containing the description information for each sound signal, with each sentence encoded in JSON format
- Supplementary documentation, including detailed descriptions of the speeches and reports
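The layout described above can be explored with a short script. The following is only a sketch under stated assumptions: the listing does not say how description files are named or how the JSON is laid out, so the `.json` suffix, the same-stem pairing with `baidu_XX.wav`, and the one-JSON-object-per-line format are all hypothetical. The README files in the torrent should be checked for the actual conventions.

```python
import json
from pathlib import Path


def load_sentences(description_path):
    """Parse a description file.

    Assumption: one JSON object per non-empty line (not confirmed by the listing).
    """
    sentences = []
    with open(description_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                sentences.append(json.loads(line))
    return sentences


def pair_audio_with_descriptions(subset_dir, description_suffix=".json"):
    """Map each baidu_XX.wav file name to a same-stem description file.

    Assumption: the description file shares the audio file's stem and uses
    `description_suffix`; the value is None when no such file exists.
    """
    pairs = {}
    for wav_path in sorted(Path(subset_dir).glob("baidu_*.wav")):
        candidate = wav_path.with_suffix(description_suffix)
        pairs[wav_path.name] = candidate if candidate.exists() else None
    return pairs
```

Calling `pair_audio_with_descriptions` on a subset directory gives a quick check of which recordings have a matching description file before any sentence-level parsing is attempted.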

CCMT_2019_BSTC.torrent

Seeding: 2, Downloading: 0, Completed: 550, Total Downloads: 831

CCMT_2019_BSTC/
- README.md
  1.14 KB
- README.txt
  2.29 KB
