Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models