MMTF: Multi-Modal Temporal Fusion for Commonsense Video Question Answering