DocBank Text Dataset

DocBank is a text dataset of 500,000 document pages with fine-grained, token-level annotations for document layout analysis. It is constructed in a simple yet effective way, using weak supervision from LaTeX documents available on arXiv.com.
DocBank.torrent