When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents