A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks