Search for a command to run...
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input