Natural Language Moment Retrieval On

Metrics

R@1,IoU=0.5

R@1,IoU=0.7

R@5,IoU=0.5

R@5,IoU=0.7
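The page lists these metrics without defining them. Under the convention commonly used in moment retrieval, "R@n,IoU=m" is the percentage of queries for which at least one of the top-n predicted moments has a temporal IoU of at least m with the ground-truth moment. The sketch below illustrates that convention; it is not the benchmark's official evaluation script. The names `temporal_iou` and `recall_at_k` are hypothetical, as is the assumption of one ground-truth segment per query.

```python
from typing import Sequence, Tuple

# A moment is a (start, end) pair in seconds. Assumed representation.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    predictions: Sequence[Sequence[Segment]],
    ground_truths: Sequence[Segment],
    k: int,
    iou_threshold: float,
) -> float:
    """Percentage of queries with a hit among the top-k ranked predictions.

    predictions[i] holds the ranked candidate moments for query i (best first);
    ground_truths[i] is the single annotated moment for that query (assumption).
    """
    if not ground_truths:
        return 0.0
    hits = 0
    for ranked, gt in zip(predictions, ground_truths):
        if any(temporal_iou(p, gt) >= iou_threshold for p in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(ground_truths)


# Toy example with two queries (made-up values, not benchmark data).
toy_predictions = [[(0.0, 10.0), (4.0, 14.0)], [(20.0, 30.0)]]
toy_ground_truths = [(5.0, 15.0), (0.0, 5.0)]
r1_iou05 = recall_at_k(toy_predictions, toy_ground_truths, k=1, iou_threshold=0.5)
r5_iou07 = recall_at_k(toy_predictions, toy_ground_truths, k=5, iou_threshold=0.7)
```

In the toy example, the top-ranked prediction for the first query overlaps the ground truth with an IoU of only 1/3, so it misses at k=1, while the second-ranked prediction counts as a hit at k=5. The second query has no overlapping prediction at all.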

Results

Performance results of various models on this benchmark

Model	R@1,IoU=0.5	R@1,IoU=0.7	R@5,IoU=0.5	R@5,IoU=0.7	Paper Title
GVL (paragraph-level)	60.67	38.55	-	-	Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
LLaVA-MR	55.16	35.68	-	-	LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval
GVL	49.18	29.69	-	-	Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
UnLoc-L	48.3	30.2	79.2	61.3	UnLoc: A Unified Framework for Video Localization Tasks
UnLoc-B	48.0	29.7	81.5	61.4	UnLoc: A Unified Framework for Video Localization Tasks
VLG-Net	46.32	29.82	77.15	63.33	VLG-Net: Video-Language Graph Matching Network for Video Grounding
DRN	45.45	24.36	77.97	50.30	Dense Regression Network for Video Grounding
UniMD+Sync.	-	-	80.54	57.04	UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection

