Highlight Detection on QVHighlights

Metrics

Hit@1
mAP

Results

Performance of various models on this benchmark. "PT" in a model name indicates pretraining; a dash (-) marks a value that is not reported.

| Model Name | Hit@1 | mAP | Paper Title | Repository |
|---|---|---|---|---|
| SG-DETR | 69.13 | 43.76 | Saliency-Guided DETR for Moment Retrieval and Highlight Detection | |
| SG-DETR (w/ PT) | 71.00 | 44.70 | Saliency-Guided DETR for Moment Retrieval and Highlight Detection | |
| VideoLights-B-pt | 70.56 | 42.84 | VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval | |
| HL-CLIP | 70.60 | 41.94 | Unleash the Potential of CLIP for Video Highlight Detection | - |
| UniVTG (w/ PT) | 66.28 | 40.54 | UniVTG: Towards Unified Video-Language Temporal Grounding | |
| LLMEPET | 65.69 | 40.33 | Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | |
| Moment-DETR w/ PT | 60.17 | 37.43 | QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries | |
| QD-DETR (only Video w/ PT) | 61.91 | - | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | |
| CG-DETR (w/ PT) | 66.60 | 40.71 | Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | |
| QD-DETR | 62.87 | 39.04 | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | |
| QD-DETR (w/ PT) | 62.27 | 38.52 | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | |
| NumPro | 70.71 | 40.54 | Number it: Temporal Grounding Videos like Flipping Manga | |
| QD-DETR (only Video) | 62.40 | 38.94 | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | |
| UMT (w. PT) | - | 39.12 | UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | |
| UMT | - | 38.18 | UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | |
| R^2-Tuning | 64.20 | 40.75 | $R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding | |
| CG-DETR | 66.21 | 40.33 | Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | |
| FlashVTG | 71.01 | 44.09 | FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding | |
| UniVTG | 60.96 | 38.20 | UniVTG: Towards Unified Video-Language Temporal Grounding | |