From Pixels to Words -- Towards Native Vision-Language Primitives at Scale