HyperAI

Natural Language Moment Retrieval on TACoS

Evaluation Metrics

R@1,IoU=0.3

R@1,IoU=0.5

R@1,IoU=0.7

mIoU
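
As an illustration of how the metrics listed above are commonly computed, here is a minimal sketch. It assumes the usual conventions in the moment retrieval literature: R@1,IoU=m is the percentage of queries whose top-ranked predicted moment has a temporal IoU of at least m with the ground-truth moment, and mIoU is the mean temporal IoU of those top-ranked predictions. The function names (`temporal_iou`, `evaluate_top1`) and the `(start, end)` span format in seconds are hypothetical and chosen for this example; individual papers on the leaderboard may differ in details such as strict versus non-strict thresholds.

```python
from typing import Dict, Sequence, Tuple

# A moment is a (start, end) pair in seconds; this format is an assumption.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def evaluate_top1(
    preds: Sequence[Span],
    gts: Sequence[Span],
    thresholds: Sequence[float] = (0.3, 0.5, 0.7),
) -> Dict[str, float]:
    """Compute R@1 at each IoU threshold and mIoU, all as percentages.

    preds[i] is the top-ranked predicted moment for query i,
    gts[i] is the ground-truth moment for the same query.
    """
    if len(preds) != len(gts) or len(preds) == 0:
        raise ValueError("preds and gts must be non-empty and the same length")

    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)

    # Assumed convention: a prediction counts as a hit when IoU >= threshold.
    results = {
        f"R@1,IoU={t}": 100.0 * sum(iou >= t for iou in ious) / n
        for t in thresholds
    }
    results["mIoU"] = 100.0 * sum(ious) / n
    return results


if __name__ == "__main__":
    # Toy data made up for this example; not taken from TACoS.
    toy_preds = [(0.0, 10.0), (5.0, 15.0), (0.0, 4.0)]
    toy_gts = [(0.0, 10.0), (10.0, 20.0), (2.0, 10.0)]
    for name, value in evaluate_top1(toy_preds, toy_gts).items():
        print(f"{name}: {value:.2f}")
```

Reporting every metric as a percentage matches the scale of the values in the results table; a "-" entry there indicates that no value is listed for that metric.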

Evaluation Results

Performance results of each model on this benchmark

Model	R@1,IoU=0.3	R@1,IoU=0.5	R@1,IoU=0.7	mIoU	Paper Title
SG-DETR (w/ PT)	58.10	46.40	33.90	42.40	Saliency-Guided DETR for Moment Retrieval and Highlight Detection
SG-DETR	56.71	44.70	29.90	40.90	Saliency-Guided DETR for Moment Retrieval and Highlight Detection
LD-DETR	57.61	44.31	26.24	40.30	LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection
FlashVTG	53.71	41.76	24.74	37.61	FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding
BAM-DETR	56.69	41.54	26.77	39.31	BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
LLMEPET	52.73	40.12	22.78	36.55	Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval
CG-DETR	52.23	39.61	22.23	36.48	Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding
UVCOM	-	36.39	23.32	-	Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
GVL (paragraph-level)	48.29	36.07	-	-	Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
UniVTG	51.44	34.97	21.07	35.76	UniVTG: Towards Unified Video-Language Temporal Grounding
GVL	45.92	34.57	-	-	Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
VLG-Net	45.46	34.19	-	-	VLG-Net: Video-Language Graph Matching Network for Video Grounding
