Search for a command to run...
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos