Video Object Detection on ImageNet VID

Evaluation Metrics

mAP

Evaluation Results

Performance of each model on this benchmark

Model	mAP	Paper Title
YOLOV++	93.2	Practical Video Object Detection via Feature Selection and Aggregation
DiffusionVID (Swin-B)	92.5	DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection
Ours (Def. DETR + SwinB)	91.3	Objects do not disappear: Video object detection by single-frame object location anticipation
VSTAM	91.1	Video Sparse Transformer With Attention-Guided Memory for Video Object Detection
TGBFormer (Swin B)	90.3	TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection
TransVOD (Swin Base)	90.1	TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers
PTSEFormer (ResNet-101)	88.1	PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection
Ours (Def. DETR + R101)	87.9	Objects do not disappear: Video object detection by single-frame object location anticipation
YOLOV	87.5	YOLOV: Making Still Image Object Detectors Great at Video Object Detection
Ours (Faster RCNN + R101)	87.2	Objects do not disappear: Video object detection by single-frame object location anticipation
DiffusionVID (ResNet-101)	87.1	DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection
DAFA-F (ResNeXt-101)	85.9	DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection
ClipVID	85.8	Identity-Consistent Aggregation for Video Object Detection
HVRNet (ResNeXt101-32x4d)	85.5	Mining Inter-Video Proposal Relations for Video Object Detection
MEGA (ResNeXt101)	85.4	Memory Enhanced Global-Local Aggregation for Video Object Detection
BoxMask (ResNeXt101)	84.8	BoxMask: Revisiting Bounding Box Supervision for Video Object Detection
DAFA-F (ResNet-101)	84.5	DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection
SELSA (ResNeXt-101)	84.3	Sequence Level Semantics Aggregation for Video Object Detection
Temporal ROI Align (ResNeXt101)	84.3	Temporal RoI Align for Video Object Recognition
REPP + SELSA (ResNet-101)	84.2	Robust and Efficient Post-Processing for Video Object Detection (REPP)
