Moment Retrieval on QVHighlights
Evaluation Metrics
R@1 IoU=0.5
R@1 IoU=0.7
mAP
mAP@0.5
mAP@0.75
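
The R@1 metrics listed above count a query as a hit when the top-ranked predicted moment overlaps the ground-truth moment with a temporal IoU at or above the threshold (0.5 or 0.7). The following is a minimal sketch of that computation, not the benchmark's official evaluation code. It assumes moments are (start, end) pairs in seconds and that each query has exactly one ground-truth moment; the names `temporal_iou` and `recall_at_1` are hypothetical, and mAP is not covered here.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds; an assumed representation


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1-D time spans."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span], iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumes one ground-truth span per query (a simplification).
    """
    if not gts or len(top1_preds) != len(gts):
        raise ValueError("need one top-1 prediction per non-empty ground-truth list")
    hits = sum(temporal_iou(p, g) >= iou_threshold for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    # Two toy queries: the first prediction overlaps well, the second barely.
    preds = [(10.0, 20.0), (0.0, 5.0)]
    gts = [(12.0, 20.0), (3.0, 9.0)]
    print(recall_at_1(preds, gts, iou_threshold=0.5))
    print(recall_at_1(preds, gts, iou_threshold=0.7))
```

In the toy data the first pair has an IoU of 0.8 and the second about 0.22, so only one of the two queries counts as a hit at either threshold.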
Evaluation Results
Performance of each model on this benchmark
| Model | R@1 IoU=0.5 | R@1 IoU=0.7 | mAP | mAP@0.5 | mAP@0.75 | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SG-DETR | 72.20 | 56.60 | 54.10 | 73.20 | 55.80 | Saliency-Guided DETR for Moment Retrieval and Highlight Detection | |
| LLMEPET | 66.73 | 49.94 | 44.05 | 65.76 | 43.91 | Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | |
| DenoiseLoc | 59.27 | 45.07 | - | - | - | Boundary-Denoising for Video Activity Localization | |
| BAM-DETR | 62.71 | 48.64 | 45.36 | 64.57 | 46.33 | BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | |
| UMT | - | - | 36.12 | - | - | UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | |
| VideoLights-B-pt | 70.36 | 55.25 | 47.94 | 69.53 | 49.17 | VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval | |
| UniVTG (w/ PT) | 65.43 | 50.06 | 43.63 | 64.06 | 45.02 | UniVTG: Towards Unified Video-Language Temporal Grounding | |
| UVCOM (w/ PT ASR Captions) | 64.53 | 48.31 | 43.8 | 64.78 | 43.65 | Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | |
| QD-DETR (only Video) | 62.40 | 44.98 | 39.86 | 62.52 | 39.88 | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | |
| R^2-Tuning | 68.03 | 49.35 | 46.17 | 69.04 | 47.56 | $R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding | |
| FlashVTG | 70.69 | 53.96 | 52.00 | 72.33 | 53.85 | FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding | |
| SeViLA-Localizer | 54.5 | 36.5 | 32.3 | - | - | - | - |
| QD-DETR (w/ audio) | 63.06 | 45.10 | 40.19 | 63.04 | 40.10 | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | |
| BAM-DETR (w/ audio) | 64.07 | 48.12 | 46.91 | 65.61 | 47.51 | BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | |
| CG-DETR | 65.43 | 48.38 | 42.86 | 64.51 | 42.77 | Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | |
| UnLoc-L | 66.1 | 46.7 | - | - | - | UnLoc: A Unified Framework for Video Localization Tasks | |
| LD-DETR | 66.80 | 51.04 | 46.41 | 67.61 | 46.99 | LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection | |
| LA-DETR | 63.94 | 51.10 | 47.93 | 65.65 | 49.44 | Length-Aware DETR for Robust Moment Retrieval | |
| LLaVA-MR | 76.59 | 61.48 | 52.73 | 69.41 | 54.40 | LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval | - |
| BM-DETR | 60.12 | 43.05 | 40.08 | 63.08 | 40.18 | Background-aware Moment Detection for Video Moment Retrieval | |