EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning