
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Li, Pengyi; Skripkin, Matvey; Zubrey, Alexander; Kuznetsov, Andrey; Oseledets, Ivan
Publication date: 6/12/2025
Abstract

Large language models (LLMs) excel at reasoning, yet post-training remains critical for aligning their behavior with task goals. Existing reinforcement learning (RL) methods often depend on costly human annotations or external reward models. We propose Reinforcement Learning via Self-Confidence (RLSC), which uses the model's own confidence as the reward signal, eliminating the need for labels, preference models, or reward engineering. Applied to Qwen2.5-Math-7B with only 16 samples per question and 10 or 20 training steps, RLSC improves accuracy by +13.4% on AIME2024, +21.2% on MATH500, +21.7% on Minerva Math, +20.8% on OlympiadBench, and +9.7% on AMC23. RLSC provides a simple, scalable post-training method for inference models, requiring only a small number of samples and unlabelled supervision.
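
The abstract names only the ingredients of RLSC (self-confidence as the reward, 16 samples per question, a handful of training steps). The sketch below illustrates one plausible way such a training step could look, assuming a majority-agreement notion of confidence and a plain policy-gradient update; the helper extract_answer, the hyperparameters, and the exact reward definition are illustrative assumptions, not the paper's published algorithm.

# Hedged sketch of a self-confidence-rewarded RL step, assuming agreement
# frequency among the model's own samples is used as the reward.
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-Math-7B"  # model named in the abstract
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-6)  # assumed learning rate

def extract_answer(text: str) -> str:
    # Hypothetical helper: take the last \boxed{...} expression as the answer.
    hits = re.findall(r"\\boxed\{([^}]*)\}", text)
    return hits[-1].strip() if hits else text.strip()[-32:]

def rlsc_step(question: str, num_samples: int = 16) -> float:
    prompt = tok(question, return_tensors="pt").to(model.device)
    prompt_len = prompt["input_ids"].shape[1]

    # Sample 16 completions per question, as in the abstract.
    with torch.no_grad():
        out = model.generate(
            **prompt, do_sample=True, temperature=1.0, max_new_tokens=512,
            num_return_sequences=num_samples, pad_token_id=tok.eos_token_id,
        )
    texts = tok.batch_decode(out[:, prompt_len:], skip_special_tokens=True)
    answers = [extract_answer(t) for t in texts]

    # Self-confidence reward: how often the model agrees with itself on the answer.
    rewards = torch.tensor(
        [answers.count(a) / num_samples for a in answers], device=model.device
    )

    # Policy-gradient-style update: raise the log-likelihood of high-confidence
    # samples (padding handling omitted for brevity).
    logits = model(out).logits[:, :-1]
    targets = out[:, 1:]
    logp = torch.log_softmax(logits.float(), dim=-1)
    logp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    mask = torch.zeros_like(logp)
    mask[:, prompt_len - 1:] = 1.0  # score only the generated tokens
    seq_logp = (logp * mask).sum(-1)
    loss = -(rewards * seq_logp).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
    return rewards.mean().item()

In this reading, no labels or external reward model appear anywhere in the loop: the only supervisory signal is the model's agreement with its own samples, which is what makes the method label-free and cheap enough to run for only 10-20 steps.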