COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning