MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions