UniVTG (w/ PT) | 65.43 | 50.06 | 43.63 | 64.06 | 45.02 | UniVTG: Towards Unified Video-Language Temporal Grounding | - |
UVCOM (w/ PT ASR Captions) | 64.53 | 48.31 | 43.8 | 64.78 | 43.65 | Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | - |
QD-DETR (only Video) | 62.40 | 44.98 | 39.86 | 62.52 | 39.88 | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | - |
LA-DETR | 63.94 | 51.10 | 47.93 | 65.65 | 49.44 | Length-Aware DETR for Robust Moment Retrieval | - |