M2Lingual: Multilingual, Multi-turn Instruction Fine-tuning Dataset

Date: a year ago
Size: 649.13 MB
Organization: ServiceNow Research, University of Illinois at Chicago
Paper URL: arxiv.org

M2Lingual is a multilingual, multi-turn instruction fine-tuning (IFT) dataset designed to improve how well large language models (LLMs) follow instructions across diverse languages and tasks. It was proposed in 2024 by a research team from ServiceNow and the University of Illinois at Chicago.

The main features of the M2Lingual dataset include:

  1. Multilingual coverage: M2Lingual covers 70 languages and adds training data for low-resource languages.
  2. Multi-turn dialogue: The dataset contains multiple rounds of instructions and responses, which strengthens a model's ability to handle complex dialogue scenarios.
  3. Task-oriented: M2Lingual spans 17 natural language processing (NLP) tasks, such as summarization, question answering, and general instruction-response pairs.
  4. Large scale: The dataset contains a total of 182,000 instruction fine-tuning pairs.
  5. Synthetic dataset: M2Lingual is a fully synthetic dataset generated using a specific evolutionary taxonomy, which is intended to ensure the diversity and complexity of the data.
  6. Performance improvements: LLMs fine-tuned on M2Lingual outperform those fine-tuned on existing multilingual IFT datasets on multiple evaluation benchmarks.

M2Lingual offers a new approach to multilingual, multi-turn instruction alignment, which helps improve the practicality and accuracy of large language models in multilingual settings.

M2Lingual.torrent
Seeding: 1 | Downloading: 0 | Completed: 198 | Total Downloads: 254
  • M2Lingual/
    • README.md (2.11 KB)
    • README.txt (4.22 KB)
    • data/
      • M2Lingual.zip (649.13 MB)
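
Once the torrent has finished downloading, the archive can be inspected before it is unpacked. The following is a minimal sketch that uses only the Python standard library. The helper names `summarize_archive` and `extract_archive` are hypothetical, and the internal layout of `M2Lingual.zip` is not documented on this page, so the code makes no assumptions about the files it contains.

```python
import zipfile
from pathlib import Path


def summarize_archive(zip_path):
    """Return (member_name, uncompressed_size_in_bytes) for each file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]


def extract_archive(zip_path, target_dir):
    """Extract the archive into target_dir and return the extracted file paths, sorted."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(p for p in target.rglob("*") if p.is_file())
```

Assuming the download keeps the directory structure listed above, `summarize_archive("M2Lingual/data/M2Lingual.zip")` would list the archive members with their sizes, and `extract_archive("M2Lingual/data/M2Lingual.zip", "M2Lingual/data/extracted")` would unpack them into a directory of your choice (the `extracted` folder name is only an example).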