A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer