Search for a command to run...
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models