Moment Retrieval on QVHighlights

Evaluation Metrics

R@1 IoU=0.5

R@1 IoU=0.7

mAP

mAP@0.5

mAP@0.75
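The recall metrics listed above are built on temporal intersection-over-union (IoU) between a predicted moment and a ground-truth moment. The following is a minimal sketch of that idea, not the benchmark's official evaluation script. It assumes each moment is a `(start, end)` pair in seconds and, as a simplification, a single ground-truth span per query. The function names `temporal_iou` and `recall_at_1` are hypothetical and chosen for illustration only.

```python
def temporal_iou(pred, gt):
    """IoU between two (start, end) spans, both given in seconds."""
    # Overlap is zero when the spans do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, iou_threshold):
    """Percentage of queries whose top-1 span reaches the IoU threshold.

    Assumption: exactly one ground-truth span per query.
    """
    hits = sum(
        temporal_iou(p, g) >= iou_threshold for p, g in zip(top1_preds, gts)
    )
    return 100.0 * hits / len(gts)
```

Under these assumptions, a top-1 prediction of `(10, 20)` against a ground truth of `(12, 22)` overlaps for 8 seconds over a 12-second union, so its IoU is about 0.67. It would count as a hit for R@1 IoU=0.5 but not for R@1 IoU=0.7.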

Evaluation Results

Performance results of each model on this benchmark

Model Name	R@1 IoU=0.5	R@1 IoU=0.7	mAP	mAP@0.5	mAP@0.75	Paper Title	Repository
SG-DETR	72.20	56.60	54.10	73.20	55.80	Saliency-Guided DETR for Moment Retrieval and Highlight Detection
LLMEPET	66.73	49.94	44.05	65.76	43.91	Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval
DenoiseLoc	59.27	45.07	-	-	-	Boundary-Denoising for Video Activity Localization
BAM-DETR	62.71	48.64	45.36	64.57	46.33	BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
UMT	-	-	36.12	-	-	UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
VideoLights-B-pt	70.36	55.25	47.94	69.53	49.17	VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
UniVTG (w/ PT)	65.43	50.06	43.63	64.06	45.02	UniVTG: Towards Unified Video-Language Temporal Grounding
UVCOM (w/ PT ASR Captions)	64.53	48.31	43.8	64.78	43.65	Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
QD-DETR (only Video)	62.40	44.98	39.86	62.52	39.88	Query-Dependent Video Representation for Moment Retrieval and Highlight Detection
R^2-Tuning	68.03	49.35	46.17	69.04	47.56	$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
FlashVTG	70.69	53.96	52.00	72.33	53.85	FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding
SeViLA-Localizer	54.5	36.5	32.3	-	-	-	-
QD-DETR (w/ audio)	63.06	45.10	40.19	63.04	40.10	Query-Dependent Video Representation for Moment Retrieval and Highlight Detection
BAM-DETR (w/ audio)	64.07	48.12	46.91	65.61	47.51	BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
CG-DETR	65.43	48.38	42.86	64.51	42.77	Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding
UnLoc-L	66.1	46.7	-	-	-	UnLoc: A Unified Framework for Video Localization Tasks
LD-DETR	66.80	51.04	46.41	67.61	46.99	LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection
LA-DETR	63.94	51.10	47.93	65.65	49.44	Length-Aware DETR for Robust Moment Retrieval
LLaVA-MR	76.59	61.48	52.73	69.41	54.40	LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval
BM-DETR	60.12	43.05	40.08	63.08	40.18	Background-aware Moment Detection for Video Moment Retrieval
