Tina: A Low-Cost, High-Efficiency Boost to Language Model Reasoning Performance
How can we achieve strong reasoning capabilities in language models at the lowest possible cost? To address this question, we introduce Tina, a family of compact reasoning models designed to deliver high reasoning performance with minimal resources. Tina applies Low-Rank Adaptation (LoRA) during reinforcement learning (RL) to a small base model of only 1.5 billion parameters. This minimalist, parameter-efficient approach yields models that match or even outperform state-of-the-art (SOTA) RL reasoning models built on the same base, while drastically reducing computational cost. For instance, the best-performing Tina model achieves a more than 20% improvement in reasoning performance over its base model and a Pass@1 accuracy of 43.33% on AIME24. The total cost of fine-tuning and evaluating this model was just $9, a cost reduction of approximately 260x compared to existing SOTA models.

Our results highlight the surprising effectiveness of efficient RL reasoning via LoRA. We validate this across multiple open-source reasoning datasets and a series of ablation experiments, all starting from a single fixed set of hyperparameters. These experiments further suggest that LoRA's efficiency and efficacy stem from its ability to rapidly adapt the model to the structural format of reasoning rewarded by RL, while largely preserving the base model's underlying knowledge.

In the spirit of open research and accessibility, we fully open-source all of our code, training logs, and model weights and checkpoints. This transparency allows other researchers to reproduce and build upon our findings, advancing the development of cost-effective reasoning models.
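To make the parameter-efficiency argument concrete, the sketch below shows the core LoRA update in plain NumPy: a frozen weight matrix W is adapted as W + (alpha/r) * B @ A, and only the small factors A and B are trained. The dimensions, rank, and scaling here are illustrative assumptions, not Tina's actual configuration.

```python
import numpy as np

# Hypothetical dimensions for one projection layer in a ~1.5B-parameter
# model; r and alpha are illustrative LoRA hyperparameters, not Tina's.
d, k = 1536, 1536   # layer weight shape (d x k)
r, alpha = 16, 32   # LoRA rank and scaling factor

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k)) * 0.02   # frozen base weight (not trained)
A = rng.standard_normal((r, k)) * 0.02   # trainable down-projection (r x k)
B = np.zeros((d, r))                     # trainable up-projection, zero-init

# Effective weight used in the forward pass. Because B starts at zero,
# the adapted model initially behaves exactly like the base model.
W_eff = W + (alpha / r) * B @ A
assert np.allclose(W_eff, W)

# Trainable parameters for this layer: full fine-tuning vs. LoRA.
full_params = d * k          # update every entry of W
lora_params = r * (d + k)    # update only A and B
print(f"full: {full_params:,}  lora: {lora_params:,}  "
      f"reduction: {full_params / lora_params:.0f}x")
# With these example numbers, LoRA trains roughly 48x fewer parameters
# per layer, which is what keeps the RL fine-tuning bill so small.
```

At rank 16 on a 1536x1536 layer, LoRA trains about 2% of the layer's parameters; the same low-rank structure also suggests why the adapter can reshape the reasoning format without overwriting the base model's knowledge.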
