LongAlign-10k: Long-Context Alignment Dataset for Large Models
Date
a year ago
Size
392.42 MB
LongAlign-10k is a dataset proposed by Tsinghua University to address the challenges that large models face in long-context alignment. It contains 10,000 long instruction-following samples, each between 8k and 64k in length.
The dataset is built in two steps. Long source texts are first collected from nine different sources, including books, encyclopedias, academic papers, and code. Claude 2.1 is then used to generate diverse tasks and answers over each long context. The dataset is designed to evaluate how large models perform on long contexts and how well they follow task instructions of 10k to 100k in length.
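As a rough way to check the stated length range after downloading, the sketch below counts records per length bucket. It assumes the data is stored as JSON Lines and that each record has a "messages" list of role/content entries; both the file layout and the field names are assumptions for illustration, not confirmed by this listing. Whitespace-separated words are used as a crude stand-in for tokens, so the numbers will differ from a real tokenizer count.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines text, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def record_length(record: dict) -> int:
    """Rough length of one record: words summed over all messages.

    Assumption: records look like {"messages": [{"role": ..., "content": ...}]}.
    Word counts only approximate token counts.
    """
    return sum(len(m["content"].split()) for m in record.get("messages", []))


def length_histogram(records: Iterable[dict], edges: list[int]) -> dict[str, int]:
    """Count records per bucket; `edges` are ascending exclusive upper bounds."""
    overflow = f">={edges[-1]}"
    counts = {f"<{e}": 0 for e in edges}
    counts[overflow] = 0
    for rec in records:
        n = record_length(rec)
        for e in edges:
            if n < e:
                counts[f"<{e}"] += 1
                break
        else:
            # Longer than every edge: goes into the overflow bucket.
            counts[overflow] += 1
    return counts
```

To run it over a downloaded file, pass an open file handle to iter_records and choose edges such as [8000, 16000, 32000, 64000]; swapping record_length for a tokenizer-based count would give figures closer to the lengths quoted above.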
LongAlign.torrent