Joint Visual Grounding and Tracking with Natural Language Specification