OpenWebMath: Open Web Mathematics Training Dataset
Date
a year ago
Size
44.21 GB
OpenWebMath is a dataset containing most of the high-quality mathematical text on the Internet. It was filtered and extracted from more than 200B HTML files on Common Crawl, yielding 6.3 million documents with a total of 14.7B tokens. OpenWebMath is intended for pre-training and fine-tuning large language models.
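To put the quoted figures in proportion, the short sketch below works out the average document length and the share of crawled files that survived filtering. It uses only the numbers stated in the description. The variable names are illustrative, and because the description says "more than 200B" files, the computed share is an upper bound rather than an exact value.

```python
# Figures quoted in the dataset description.
html_files_scanned = 200_000_000_000  # "more than 200B HTML files" (a lower bound)
documents_kept = 6_300_000            # 6.3 million documents
total_tokens = 14_700_000_000         # 14.7B tokens

# Average length of a kept document, in tokens.
avg_tokens_per_doc = total_tokens / documents_kept

# Share of scanned files that ended up in the dataset.
# Since the scanned count is a lower bound, this is an upper bound.
max_keep_rate = documents_kept / html_files_scanned

print(f"average tokens per document: about {avg_tokens_per_doc:,.0f}")
print(f"share of HTML files kept: at most {max_keep_rate:.6%}")
```

On these figures, a typical document runs to a couple of thousand tokens, and only a few files in every hundred thousand scanned were kept.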
OpenWebMath.torrent
Seeding 1, Downloading 1, Completed 157, Total Downloads 212