Object Detection on COCO 2017

Metrics

mAP

Results

Performance results of various models on this benchmark

Model name	mAP	Paper Title	Repository
UniRepLKNet-S++	54.3	UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
BiFormer-B (IN1k pretrain, MaskRCNN 12ep)	48.6	BiFormer: Vision Transformer with Bi-Level Routing Attention
DyHead (SAP)	-	Stochastic Subsampling With Average Pooling	-
Lpixel	-	Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
MaxViT-T	-	MaxViT: Multi-Axis Vision Transformer
YOLO-Drone	35.45	YOLO-Drone: Airborne real-time detection of dense small objects from high-altitude perspective	-
UniRepLKNet-T	51.7	UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
UniRepLKNet-B++	54.8	UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
DAT-T++	-	DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
DeBiFormer-B (IN1k pretrain, MaskRCNN 12ep)	48.5	DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention	-
MaxViT-S	-	MaxViT: Multi-Axis Vision Transformer
MixMIM-B	52.2	MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
Faster R-CNN (ideal number of groups)	-	On the Ideal Number of Groups for Isometric Gradient Propagation	-
BiFormer-S (IN1k pretrain, MaskRCNN 12ep)	47.8	BiFormer: Vision Transformer with Bi-Level Routing Attention
UniRepLKNet-S	53	UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
DeBiFormer-S (IN1k pretrain, MaskRCNN 12ep)	47.5	DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention	-
DAT-S++	-	DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
retinanet	-	Benchmark for Generic Product Detection: A Low Data Baseline for Dense Object Detection
UniRepLKNet-XL++	56.4	UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
DeBiFormer-B (IN1k pretrain, Retina)	47.1	DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention	-
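
The rows in this table are not sorted, and several report no mAP (shown as "-"). A minimal sketch of ranking them is given below, assuming the rows are available as tab-separated text in the column order model name, mAP, paper title. The names `Entry`, `parse_row`, `rank_by_map`, and `SAMPLE_ROWS` are illustrative and not part of the benchmark page; the sample holds only five rows copied from the table.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    model: str
    map_score: Optional[float]  # None when the table shows "-"
    paper: str


def parse_row(line: str) -> Entry:
    """Parse one tab-separated row: model name, mAP, paper title.

    Any further field (e.g. the repository column) is ignored.
    """
    fields = line.split("\t")
    model, raw_map, paper = fields[0], fields[1], fields[2]
    score = None if raw_map.strip() == "-" else float(raw_map)
    return Entry(model.strip(), score, paper.strip())


def rank_by_map(lines: List[str]) -> List[Entry]:
    """Return entries that report an mAP, highest score first."""
    entries = [parse_row(line) for line in lines if line.strip()]
    scored = [e for e in entries if e.map_score is not None]
    return sorted(scored, key=lambda e: e.map_score, reverse=True)


# Five rows copied from the table above (hypothetical variable name).
SAMPLE_ROWS = [
    "BiFormer-B (IN1k pretrain, MaskRCNN 12ep)\t48.6\t"
    "BiFormer: Vision Transformer with Bi-Level Routing Attention",
    "MaxViT-T\t-\tMaxViT: Multi-Axis Vision Transformer",
    "BiFormer-S (IN1k pretrain, MaskRCNN 12ep)\t47.8\t"
    "BiFormer: Vision Transformer with Bi-Level Routing Attention",
    "UniRepLKNet-XL++\t56.4\t"
    "UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, "
    "Video, Point Cloud, Time-Series and Image Recognition",
    "DeBiFormer-B (IN1k pretrain, Retina)\t47.1\t"
    "DeBiFormer: Vision Transformer with Deformable Agent Bi-level "
    "Routing Attention\t-",
]

if __name__ == "__main__":
    for entry in rank_by_map(SAMPLE_ROWS):
        print(f"{entry.map_score:5.1f}  {entry.model}")
```

Rows without a score are dropped rather than treated as zero, so they do not appear at the bottom of the ranking. The model names indicate different detector setups (for example MaskRCNN 12ep versus Retina), so a ranking by mAP alone is not necessarily a like-for-like comparison.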
