Isabelle Parallel Corpus
Date
Publish URL
Categories
A parallel corpus is a collection of texts in one language paired with corresponding texts in another language.
The Isabelle Parallel Corpus (IPC) is a community-driven initiative to create a parallel corpus of Isabelle documents. IPC pairs formal documents in Isabelle (such as theorems, lemmas, definitions, etc.) with their natural language counterparts.
Entries in the IPC have multiple levels of data annotation. At the first level, the natural language statements of the document (e.g., theorem statements) and the natural language source (e.g., textbook) are recorded. The second level of annotation pairs sentences in the natural language proof with statements in the corresponding Isabelle proof script. The third level of annotation allows the alignment of words in the natural language with tokens in the Isabelle proof script.