Moment Retrieval on Charades-STA

Evaluation Metrics

R@1 IoU=0.5

R@1 IoU=0.7
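
The page does not define these metrics. Under the usual convention, R@1 at IoU=m is the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal IoU of at least m. A minimal sketch of that computation follows; the function names, the `(start, end)` span format in seconds, and the sample values are illustrative assumptions, and individual papers may differ in details such as using `>` instead of `>=`.

```python
from typing import List, Tuple

# Assumed representation: a moment is a (start_sec, end_sec) pair.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span], iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    if len(top1_preds) != len(gts):
        raise ValueError("need exactly one top-1 prediction per ground-truth moment")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


# Made-up sample data for illustration only (not from the benchmark).
preds = [(2.0, 10.0), (0.0, 5.0), (4.0, 8.0)]
gts = [(3.0, 10.0), (4.0, 12.0), (4.0, 10.0)]

print(f"R@1 IoU=0.5: {recall_at_1(preds, gts, 0.5):.2f}")
print(f"R@1 IoU=0.7: {recall_at_1(preds, gts, 0.7):.2f}")
```

IoU=0.7 is the stricter threshold, so a model's R@1 at IoU=0.7 can never exceed its R@1 at IoU=0.5, which matches every row in the table.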

Evaluation Results

Performance results of each model on this benchmark

Model Name	R@1 IoU=0.5	R@1 IoU=0.7	Paper Title	Repository
video-mamba-suite	57.18	36.05	Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
SG-DETR (w/ PT)	71.10	52.80	Saliency-Guided DETR for Moment Retrieval and Highlight Detection
VideoChat-T (ZS)	48.7	24.0	TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
SimVTP	44.7	26.3	SimVTP: Simple Video Text Pre-training with Masked Autoencoders	-
CG-DETR	58.44	36.34	Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding
UnLoc-L	60.8	38.4	UnLoc: A Unified Framework for Video Localization Tasks
UMT (VO)	49.35	26.16	UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
LLaVA-MR	70.65	49.58	LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval
SG-DETR	70.20	49.50	Saliency-Guided DETR for Moment Retrieval and Highlight Detection
LD-DETR	62.58	41.56	LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection
UVCOM	59.25	36.64	Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
Moment-DETR	53.63	31.37	QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
UMT (VA)	48.31	29.25	UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
InternVideo2-6B	70.03	48.95	InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
VideoLights-B-pt	61.96	41.05	VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
InternVideo2-1B	68.36	45.03	InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
BM-DETR	59.48	38.33	Background-aware Moment Detection for Video Moment Retrieval
UniMD+Sync.	63.98	44.46	UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Moment-DETR w/ PT (on 10K HowTo100M videos)	55.65	34.17	QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
UnLoc-B	58.1	35.4	UnLoc: A Unified Framework for Video Localization Tasks
