Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

Guanting Dong Keming Lu Chengpeng Li Tingyu Xia Bowen Yu Chang Zhou Jingren Zhou

Abstract

One core capability of large language models (LLMs) is to follow natural language instructions. However, the issue of automatically constructing high-quality training data to enhance the complex instruction-following abilities of LLMs without manual annotation remains unresolved. In this paper, we introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data. AutoIF transforms the validation of instruction-following data quality into code verification, requiring LLMs to generate instructions, the corresponding code to check the correctness of the instruction responses, and unit test samples to verify the code's correctness. Then, execution feedback-based rejection sampling can generate data for Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) training. AutoIF achieves significant improvements across three training algorithms, SFT, Offline DPO, and Online DPO, when applied to the top open-source LLMs, Qwen2 and LLaMA3, in self-alignment and strong-to-weak distillation settings. Our code is publicly available at https://github.com/QwenLM/AutoIF.
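
The pipeline the abstract describes can be pictured with a minimal Python sketch. Everything below — the instruction, the verify function, the unit test cases, and the candidate responses — is a hypothetical hand-written stand-in for what AutoIF generates with an LLM; it only illustrates how code verification and execution-feedback-based rejection sampling fit together, not the repository's actual implementation (see https://github.com/QwenLM/AutoIF for that).

```python
# Hypothetical sketch of AutoIF-style code verification and rejection sampling.
# In the real pipeline an LLM writes the instruction, the verifier, and the
# unit tests; here they are hard-coded purely for illustration.

from typing import Callable, List, Tuple

# 1) An instruction and a verification function written for it.
instruction = "Answer the question using only lowercase letters."

def verify(response: str) -> bool:
    """Return True iff the response satisfies the instruction."""
    return response == response.lower()

# 2) Unit test cases (response, expected_verdict) used to check that the
#    verification function itself is correct before trusting it.
test_cases: List[Tuple[str, bool]] = [
    ("the capital of france is paris.", True),
    ("The capital of France is Paris.", False),
]

def verifier_is_sound(fn: Callable[[str], bool],
                      cases: List[Tuple[str, bool]]) -> bool:
    """Execute the verifier on its test cases and require all to pass."""
    return all(fn(resp) == expected for resp, expected in cases)

# 3) Execution-feedback-based rejection sampling: keep only candidate
#    responses that pass the validated verifier; rejected responses can
#    serve as negatives for preference-style (DPO/RLHF) training.
candidate_responses = [
    "the capital of france is paris.",
    "The capital of France is Paris.",
]

if verifier_is_sound(verify, test_cases):
    accepted = [r for r in candidate_responses if verify(r)]
    rejected = [r for r in candidate_responses if not verify(r)]
    print("accepted for SFT:", accepted)
    print("rejected (usable as preference negatives):", rejected)
else:
    print("verifier failed its unit tests; discard this instruction")
```

The point of the two-stage check is that the LLM-written verifier is itself untrusted: only instructions whose verifiers pass their own unit tests are kept, and only responses that pass those verified checks enter the SFT or preference data.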

