Grounding Everything: Emerging Localization Properties in Vision-Language Transformers