Search for a command to run...
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion