Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement