HyperAI

Moment Retrieval on QVHighlights

Metrics

R@1 IoU=0.5
R@1 IoU=0.7
mAP
mAP@0.5
mAP@0.75
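
As a rough illustration of the R@1 metrics listed above, the sketch below computes temporal IoU between a predicted and a ground-truth time window, then the share of queries whose top-1 prediction reaches a given IoU threshold. It assumes exactly one ground-truth window per query and uses hypothetical function names (`temporal_iou`, `recall_at_1`); the benchmark's official evaluation script may handle multiple ground-truth windows and other details differently.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) windows, e.g. in seconds."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, iou_threshold):
    """Percentage of queries whose top-1 window has IoU >= iou_threshold.

    Assumption: one ground-truth window per query (a simplification).
    """
    hits = sum(
        temporal_iou(pred, gt) >= iou_threshold
        for pred, gt in zip(top1_preds, gts)
    )
    return 100.0 * hits / len(gts)


# Toy data, not taken from the benchmark.
top1_preds = [(10, 20), (0, 5), (30, 40)]
gts = [(12, 22), (0, 8), (30, 41)]

r1_at_05 = recall_at_1(top1_preds, gts, 0.5)
r1_at_07 = recall_at_1(top1_preds, gts, 0.7)
```

With this toy data the three IoU values are roughly 0.67, 0.63 and 0.91, so all three queries count as hits at IoU=0.5 while only one does at IoU=0.7. That is why R@1 IoU=0.7 is never higher than R@1 IoU=0.5 for the same model.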

Results

Performance results of various models on this benchmark

| Model name | R@1 IoU=0.5 | R@1 IoU=0.7 | mAP | mAP@0.5 | mAP@0.75 | Paper title |
|---|---|---|---|---|---|---|
| SG-DETR | 72.20 | 56.60 | 54.10 | 73.20 | 55.80 | Saliency-Guided DETR for Moment Retrieval and Highlight Detection |
| LLMEPET | 66.73 | 49.94 | 44.05 | 65.76 | 43.91 | Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval |
| DenoiseLoc | 59.27 | 45.07 | - | - | - | Boundary-Denoising for Video Activity Localization |
| BAM-DETR | 62.71 | 48.64 | 45.36 | 64.57 | 46.33 | BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos |
| UMT | - | - | 36.12 | - | - | UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection |
| VideoLights-B-pt | 70.36 | 55.25 | 47.94 | 69.53 | 49.17 | VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval |
| UniVTG (w/ PT) | 65.43 | 50.06 | 43.63 | 64.06 | 45.02 | UniVTG: Towards Unified Video-Language Temporal Grounding |
| UVCOM (w/ PT ASR Captions) | 64.53 | 48.31 | 43.8 | 64.78 | 43.65 | Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection |
| QD-DETR (only Video) | 62.40 | 44.98 | 39.86 | 62.52 | 39.88 | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection |
| R^2-Tuning | 68.03 | 49.35 | 46.17 | 69.04 | 47.56 | $R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding |
| FlashVTG | 70.69 | 53.96 | 52.00 | 72.33 | 53.85 | FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding |
| SeViLA-Localizer | 54.5 | 36.5 | 32.3 | - | - | - |
| QD-DETR (w/ audio) | 63.06 | 45.10 | 40.19 | 63.04 | 40.10 | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection |
| BAM-DETR (w/ audio) | 64.07 | 48.12 | 46.91 | 65.61 | 47.51 | BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos |
| CG-DETR | 65.43 | 48.38 | 42.86 | 64.51 | 42.77 | Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding |
| UnLoc-L | 66.1 | 46.7 | - | - | - | UnLoc: A Unified Framework for Video Localization Tasks |
| LD-DETR | 66.80 | 51.04 | 46.41 | 67.61 | 46.99 | LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection |
| LA-DETR | 63.94 | 51.10 | 47.93 | 65.65 | 49.44 | Length-Aware DETR for Robust Moment Retrieval |
| LLaVA-MR | 76.59 | 61.48 | 52.73 | 69.41 | 54.40 | LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval |
| BM-DETR | 60.12 | 43.05 | 40.08 | 63.08 | 40.18 | Background-aware Moment Detection for Video Moment Retrieval |