Guess – Think – Answer
Guess–Think–Answer (GTA) was proposed by the vivo AI Lab algorithm team in September 2025 and is described in the paper "GTA: Supervised-Guided Reinforcement Learning for Text Classification with Large Language Models".
The GTA framework first has the model generate an initial guess, which is optimized with a cross-entropy loss. The model then reflects on this guess to produce the final answer, while reinforcement learning (RL) rewards shape both the final output and the format of the overall GTA structure. This lets the model learn effective reasoning patterns on its own through RL, without manually annotated reasoning chains, and combines the efficiency of supervised fine-tuning (SFT) with the capability gains of RL in a single, unified training paradigm.
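A minimal sketch of how such a combined objective could be wired together is shown below. It assumes a tag-based output template (`<guess>`, `<think>`, `<answer>`), uses a simple REINFORCE-style surrogate in place of whatever policy-gradient algorithm the paper actually employs, and picks illustrative reward weights; the names `gta_reward`, `gta_loss`, the tag format, and the weighting factor `alpha` are all assumptions for illustration, not the authors' implementation.

```python
import re
import torch
import torch.nn.functional as F

# Hypothetical GTA-style output template (assumed, not taken from the paper):
#   <guess> ... </guess> <think> ... </think> <answer> ... </answer>
GTA_PATTERN = re.compile(
    r"<guess>.*?</guess>\s*<think>.*?</think>\s*<answer>(.*?)</answer>",
    re.DOTALL,
)

def gta_reward(generated_text: str, gold_label: str) -> float:
    """Reward the final answer and the overall GTA format (illustrative weights)."""
    match = GTA_PATTERN.search(generated_text)
    if match is None:
        return 0.0                      # format missing: no reward at all
    format_reward = 0.5                 # structure is present
    answer = match.group(1).strip()
    accuracy_reward = 1.0 if answer == gold_label else 0.0
    return format_reward + accuracy_reward

def gta_loss(guess_logits, guess_labels, seq_log_prob, reward,
             baseline=0.0, alpha=1.0):
    """Combine the SFT and RL signals in one objective.

    guess_logits : (num_guess_tokens, vocab) logits over the guess span
    guess_labels : (num_guess_tokens,) gold token ids for the guess
    seq_log_prob : scalar log-probability of the sampled completion
    reward       : scalar from gta_reward()
    """
    # Supervised term: cross-entropy on the initial guess, as in SFT.
    ce_loss = F.cross_entropy(guess_logits, guess_labels)
    # RL term: REINFORCE-style surrogate standing in for the paper's
    # actual policy-gradient algorithm.
    pg_loss = -(reward - baseline) * seq_log_prob
    return ce_loss + alpha * pg_loss

# Toy usage with random tensors in place of real model outputs.
vocab_size = 8
guess_logits = torch.randn(3, vocab_size, requires_grad=True)
guess_labels = torch.tensor([1, 4, 2])
seq_log_prob = torch.tensor(-5.3, requires_grad=True)
text = "<guess>positive</guess> <think>mentions 'great'</think> <answer>positive</answer>"
loss = gta_loss(guess_logits, guess_labels, seq_log_prob, gta_reward(text, "positive"))
loss.backward()
```

The key design point the sketch tries to capture is that the cross-entropy term supervises only the guess span, while the reward term sees the whole sampled completion, so the reflection and final answer are shaped by RL rather than by token-level labels.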