TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding