Search for a command to run...
On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training