
LongAlign-10k: Long-Context Alignment Dataset for Large Models

Date: a year ago

Size: 392.42 MB

Organization: Tsinghua University

Publish URL: huggingface.co

LongAlign-10k is a dataset proposed by Tsinghua University to address the challenges that large models face in long-context alignment. It contains 10,000 long instruction samples ranging from 8k to 64k in length.

To build the dataset, the authors first collected source material from 9 different sources, including books, encyclopedias, academic papers, and code, and then used Claude 2.1 to generate diverse tasks and answers over those long contexts. The dataset is designed to evaluate the performance of large models in long contexts and their ability to follow task instructions of 10k-100k in length.

LongAlign.torrent
  • LongAlign/
    • README.md (1.28 KB)
    • README.txt (2.57 KB)
    • data/
      • LongAlign-10k.zip (392.42 MB)
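
Once the archive is downloaded, a quick sanity check is to count the records and see how long they are. The sketch below is only an illustration under stated assumptions: this page does not describe the internal layout of LongAlign-10k.zip, so the code assumes it holds one or more JSON Lines (.jsonl) files and makes no assumption about field names. The helper names (iter_jsonl_records, record_length, summarize) are hypothetical, and lengths are measured in characters, not in the units behind the 8k to 64k figure quoted above.

```python
import io
import json
import zipfile


def iter_jsonl_records(zip_source):
    """Yield one parsed JSON object per non-empty line of every .jsonl member.

    Assumption: the archive stores its samples as JSON Lines files.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.endswith(".jsonl"):
                continue  # skip READMEs and any other non-data members
            with archive.open(name) as handle:
                for raw_line in io.TextIOWrapper(handle, encoding="utf-8"):
                    line = raw_line.strip()
                    if line:
                        yield json.loads(line)


def record_length(value):
    """Rough size in characters: sum of all string values, found recursively."""
    if isinstance(value, str):
        return len(value)
    if isinstance(value, dict):
        return sum(record_length(v) for v in value.values())
    if isinstance(value, list):
        return sum(record_length(v) for v in value)
    return 0  # numbers, booleans and nulls do not count toward length


def summarize(zip_source):
    """Return the record count plus the shortest and longest record length."""
    lengths = [record_length(r) for r in iter_jsonl_records(zip_source)]
    if not lengths:
        return {"count": 0, "min": None, "max": None}
    return {"count": len(lengths), "min": min(lengths), "max": max(lengths)}


# Example call, using the path from the file listing above:
# summarize("LongAlign/data/LongAlign-10k.zip")
```

Counting characters avoids depending on any particular tokenizer. To compare against the stated length range, record_length would need to be replaced with a token count from the tokenizer of interest.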