HyperAI

OpenWebMath Open Web Mathematics Training Dataset

Date

a year ago

Size

44.21 GB

Organization

University of Cambridge
University of Toronto

Publish URL

huggingface.co

OpenWebMath is a dataset containing most of the high-quality mathematical text on the Internet. It is filtered and extracted from more than 200B HTML files on Common Crawl, yielding a set of 6.3 million documents with a total of 14.7B tokens. OpenWebMath is intended for pre-training and fine-tuning large language models.

OpenWebMath.torrent
Seeding: 1, Downloading: 1, Completed: 157, Total Downloads: 212
  • OpenWebMath/
    • README.md
      1.13 KB
    • README.txt
      2.26 KB
    • data/
      • open-web-math.zip
        44.21 GB
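
Because the archive in the listing above is large, it can be useful to check its contents before extracting anything. The following is a minimal sketch using only Python's standard `zipfile` module. The local path `OpenWebMath/data/open-web-math.zip` is an assumption that mirrors the torrent layout, `summarize_archive` is a hypothetical helper name, and the internal file layout of the archive is not stated on this page, so the sketch only reports whatever it finds.

```python
import zipfile
from pathlib import Path


def summarize_archive(zip_path):
    """Return (file_count, total_uncompressed_bytes, first_names) for a zip file.

    Hypothetical helper for illustration; it is not part of OpenWebMath.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # infolist() reads only the zip's central directory; nothing is extracted.
        members = [info for info in archive.infolist() if not info.is_dir()]
        total_bytes = sum(info.file_size for info in members)
        first_names = [info.filename for info in members[:10]]
    return len(members), total_bytes, first_names


if __name__ == "__main__":
    # Assumed download location, mirroring the torrent file listing.
    path = Path("OpenWebMath/data/open-web-math.zip")
    if path.exists():
        count, total, names = summarize_archive(path)
        print(f"{count} files, {total / 1e9:.2f} GB uncompressed")
        for name in names:
            print(" ", name)
    else:
        print(f"Archive not found at {path}")
```

Reading only the central directory keeps the check fast even for an archive of this size, and the uncompressed total gives an estimate of the disk space needed before extraction.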