GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning