Search for a command to run...
PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning