HyperAI

Moment Retrieval on Charades-STA

Metrics

R@1 IoU=0.5
R@1 IoU=0.7
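
The page lists these metrics without defining them. As a minimal sketch, assuming the usual convention for moment retrieval (a query counts as a hit when the top-ranked predicted segment overlaps the ground-truth segment with temporal IoU at or above the threshold, and the score is the percentage of hits), the computation might look like the following. The function names `temporal_iou` and `recall_at_1`, the `(start, end)` tuple layout, and the `>=` comparison are illustrative assumptions, not taken from any listed paper or repository.

```python
from typing import Sequence, Tuple

# A temporal segment as (start_seconds, end_seconds); this layout is an assumption.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two 1-D time segments."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top1_preds: Sequence[Span], gts: Sequence[Span],
                iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    if len(top1_preds) != len(gts):
        raise ValueError("need exactly one top-1 prediction per ground-truth span")
    if not gts:
        return 0.0
    # Assumed convention: a hit requires IoU >= threshold.
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


# Toy data for illustration only (not from any benchmark).
preds = [(2.0, 8.0), (0.0, 5.0), (10.0, 20.0)]
gts = [(2.0, 10.0), (4.0, 9.0), (10.0, 16.0)]
r1_at_05 = recall_at_1(preds, gts, 0.5)
r1_at_07 = recall_at_1(preds, gts, 0.7)
```

On the toy data the three IoU values are 0.75, about 0.11, and 0.6, so two of three queries pass at IoU=0.5 and one of three at IoU=0.7. Individual papers may differ in details such as strict versus non-strict comparison, so reported numbers should be reproduced with each paper's own evaluation script.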

Results

Performance results of various models on this benchmark

| Model Name | R@1 IoU=0.5 | R@1 IoU=0.7 | Paper Title | Repository |
|---|---|---|---|---|
| video-mamba-suite | 57.18 | 36.05 | Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | |
| SG-DETR (w/ PT) | 71.10 | 52.80 | Saliency-Guided DETR for Moment Retrieval and Highlight Detection | |
| VideoChat-T (ZS) | 48.7 | 24.0 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | - |
| SimVTP | 44.7 | 26.3 | SimVTP: Simple Video Text Pre-training with Masked Autoencoders | - |
| CG-DETR | 58.44 | 36.34 | Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | |
| UnLoc-L | 60.8 | 38.4 | UnLoc: A Unified Framework for Video Localization Tasks | |
| UMT (VO) | 49.35 | 26.16 | UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | |
| LLaVA-MR | 70.65 | 49.58 | LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval | - |
| SG-DETR | 70.20 | 49.50 | Saliency-Guided DETR for Moment Retrieval and Highlight Detection | |
| LD-DETR | 62.58 | 41.56 | LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection | |
| UVCOM | 59.25 | 36.64 | Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | |
| Moment-DETR | 53.63 | 31.37 | QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries | |
| UMT (VA) | 48.31 | 29.25 | UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | |
| InternVideo2-6B | 70.03 | 48.95 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| VideoLights-B-pt | 61.96 | 41.05 | VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval | |
| InternVideo2-1B | 68.36 | 45.03 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| BM-DETR | 59.48 | 38.33 | Background-aware Moment Detection for Video Moment Retrieval | |
| UniMD+Sync. | 63.98 | 44.46 | UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection | |
| Moment-DETR w/ PT (on 10K HowTo100M videos) | 55.65 | 34.17 | QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries | |
| UnLoc-B | 58.1 | 35.4 | UnLoc: A Unified Framework for Video Localization Tasks | |