Guess – Think – Answer
Guess–Think–Answer (GTA) was proposed by the vivo AI Lab algorithm team in September 2025 and is described in the paper "GTA: Supervised-Guided Reinforcement Learning for Text Classification with Large Language Models".
The GTA framework first has the model generate an initial guess, which is optimized with a cross-entropy loss. The model then reflects on this guess to produce the final answer, while reinforcement learning (RL) rewards shape both the final output and the format of the overall GTA structure. This lets the model learn effective reasoning patterns on its own through RL, without manually annotated reasoning chains, and combines the efficiency of supervised fine-tuning (SFT) with the capability gains of RL in a single, unified training paradigm.
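A minimal sketch of how such a combined objective could be wired together is shown below. It assumes a tag-based output template (`<guess>`, `<think>`, `<answer>`), uses a simple REINFORCE-style surrogate in place of whatever policy-gradient algorithm the paper actually employs, and picks illustrative reward weights; the names `gta_reward`, `gta_loss`, the tag format, and the weighting factor `alpha` are all assumptions for illustration, not the authors' implementation.

```python
import re
import torch
import torch.nn.functional as F

# Hypothetical GTA-style output template (assumed, not taken from the paper):
#   <guess> ... </guess> <think> ... </think> <answer> ... </answer>
GTA_PATTERN = re.compile(
    r"<guess>.*?</guess>\s*<think>.*?</think>\s*<answer>(.*?)</answer>",
    re.DOTALL,
)

def gta_reward(generated_text: str, gold_label: str) -> float:
    """Reward the final answer and the overall GTA format (illustrative weights)."""
    match = GTA_PATTERN.search(generated_text)
    if match is None:
        return 0.0                      # format missing: no reward at all
    format_reward = 0.5                 # structure is present
    answer = match.group(1).strip()
    accuracy_reward = 1.0 if answer == gold_label else 0.0
    return format_reward + accuracy_reward

def gta_loss(guess_logits, guess_labels, seq_log_prob, reward,
             baseline=0.0, alpha=1.0):
    """Combine the SFT and RL signals in one objective.

    guess_logits : (num_guess_tokens, vocab) logits over the guess span
    guess_labels : (num_guess_tokens,) gold token ids for the guess
    seq_log_prob : scalar log-probability of the sampled completion
    reward       : scalar from gta_reward()
    """
    # Supervised term: cross-entropy on the initial guess, as in SFT.
    ce_loss = F.cross_entropy(guess_logits, guess_labels)
    # RL term: REINFORCE-style surrogate standing in for the paper's
    # actual policy-gradient algorithm.
    pg_loss = -(reward - baseline) * seq_log_prob
    return ce_loss + alpha * pg_loss

# Toy usage with random tensors in place of real model outputs.
vocab_size = 8
guess_logits = torch.randn(3, vocab_size, requires_grad=True)
guess_labels = torch.tensor([1, 4, 2])
seq_log_prob = torch.tensor(-5.3, requires_grad=True)
text = "<guess>positive</guess> <think>mentions 'great'</think> <answer>positive</answer>"
loss = gta_loss(guess_logits, guess_labels, seq_log_prob, gta_reward(text, "positive"))
loss.backward()
```

The key design point the sketch tries to capture is that the cross-entropy term supervises only the guess span, while the reward term sees the whole sampled completion, so the reflection and final answer are shaped by RL rather than by token-level labels.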