Search for a command to run...
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Delibration