HyperAI

Alpaca-Cleaned Instruction Fine-tuning Dataset

The Alpaca-Cleaned dataset is a cleaned version of the original Alpaca dataset, which Stanford University released in 2023. The original Alpaca dataset contains 52,000 instructions and demonstrations generated with OpenAI's text-davinci-003 engine. Instruction data of this kind is used to instruction-tune language models so that they follow instructions more reliably.
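As a rough sketch of how such records feed into instruction tuning, the function below turns one example into a (prompt, target) pair for supervised fine-tuning. It assumes each record has the `instruction`, `input` and `output` fields used by the original Alpaca release. The prompt wording is modeled on the Stanford Alpaca templates, and `to_training_pair`, `PROMPT_WITH_INPUT` and `PROMPT_NO_INPUT` are illustrative names, not part of any library.

```python
from typing import TypedDict


class AlpacaRecord(TypedDict):
    """One example; field names assumed from the original Alpaca JSON format."""
    instruction: str
    input: str
    output: str


# Templates modeled on the Stanford Alpaca prompts (exact wording is an assumption).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def to_training_pair(record: AlpacaRecord) -> tuple[str, str]:
    """Return (prompt, target); the model learns to produce target after prompt."""
    if record["input"].strip():
        prompt = PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    else:
        # Records without extra context use the shorter template.
        prompt = PROMPT_NO_INPUT.format(instruction=record["instruction"])
    return prompt, record["output"]


if __name__ == "__main__":
    example: AlpacaRecord = {
        "instruction": "Give a synonym for the word below.",
        "input": "happy",
        "output": "joyful",
    }
    prompt, target = to_training_pair(example)
    print(prompt + target)
```

Keeping prompt and target separate makes it easy to mask the prompt tokens out of the loss, a common choice in instruction tuning.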

This dataset fixes several problems in the original Alpaca data, including hallucinated answers, multiple instructions merged into a single example, empty outputs, and inconsistently formatted input fields, which improves the quality and consistency of the data. Alpaca-Cleaned can be applied to text generation, question-answering systems, natural language understanding, and code understanding and generation. Its advertised strengths are higher data quality, improved model performance, a rich set of related model resources, and open-source community support. The project invites community contributions and continues to be updated and improved, with the aim of advancing NLP research.
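The mechanical part of that cleanup can be illustrated with a small filter. This is a minimal sketch, not the procedure the Alpaca-Cleaned maintainers actually used: the field names `instruction`, `input` and `output` follow the original Alpaca JSON format, and the placeholder strings in `NO_INPUT_MARKERS` are hypothetical. It only handles empty outputs and inconsistent input fields; hallucinated answers and merged instructions need manual or model-assisted review.

```python
from collections.abc import Iterable

# Hypothetical placeholder values treated as "no input"; the variants that
# actually occur in the original Alpaca data may differ.
NO_INPUT_MARKERS = {"", "<noinput>", "noinput", "n/a", "none"}


def normalize_input(text: str) -> str:
    """Map placeholder inputs to "" so a missing input is always represented the same way."""
    stripped = text.strip()
    return "" if stripped.lower() in NO_INPUT_MARKERS else stripped


def clean_records(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (kept, dropped).

    Drops records whose instruction or output is empty and normalizes the
    input field of the rest. Hallucinated answers and merged instructions
    are not detected here.
    """
    kept: list[dict] = []
    dropped: list[dict] = []
    for rec in records:
        instruction = (rec.get("instruction") or "").strip()
        output = (rec.get("output") or "").strip()
        if not instruction or not output:
            dropped.append(rec)
            continue
        kept.append(
            {
                "instruction": instruction,
                "input": normalize_input(rec.get("input") or ""),
                "output": output,
            }
        )
    return kept, dropped


if __name__ == "__main__":
    sample = [
        {"instruction": "Add 2 and 3.", "input": "<noinput>", "output": "5"},
        {"instruction": "Summarize the text.", "input": "Some text.", "output": ""},
    ]
    kept, dropped = clean_records(sample)
    print(f"kept={len(kept)} dropped={len(dropped)}")
```

Returning the dropped records instead of discarding them silently keeps the filter auditable, which matters when deciding whether a rule is too aggressive.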

Alpaca-Cleaned.torrent
  • Alpaca-Cleaned/
    • README.md (1.57 KB)
    • README.txt (3.15 KB)
    • data/
      • Alpaca-Cleaned.zip (13.98 MB)
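The listing does not show what `Alpaca-Cleaned.zip` contains. Assuming it holds one or more JSON files, each a list of records with `instruction`, `input` and `output` keys, a loader could look like the sketch below; `load_records` is a hypothetical helper built only on the standard library's `zipfile` and `json` modules.

```python
from __future__ import annotations

import json
import zipfile
from pathlib import Path


def load_records(zip_path: str | Path) -> list[dict]:
    """Concatenate the records from every .json member of the archive.

    Assumes (not confirmed by the listing) that each JSON member is a list of
    objects with "instruction", "input" and "output" keys.
    """
    records: list[dict] = []
    with zipfile.ZipFile(zip_path) as zf:
        # Sorted member names give a deterministic record order.
        for name in sorted(zf.namelist()):
            if not name.lower().endswith(".json"):
                continue  # skip READMEs, directory entries, etc.
            with zf.open(name) as fh:
                data = json.load(fh)  # json accepts UTF-8 bytes directly
            if not isinstance(data, list):
                raise ValueError(f"{name}: expected a JSON list of records")
            records.extend(data)
    return records
```

Usage, with the path adjusted to wherever the torrent was saved: `records = load_records("Alpaca-Cleaned/data/Alpaca-Cleaned.zip")`. Reading straight from the archive avoids extracting it to disk first.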