Instance Segmentation on COCO

Metrics

AP50

AP75

APL

APM

APS

mask AP
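
The abbreviations above follow the standard COCO evaluation conventions. Mask AP averages precision over mask IoU thresholds from 0.50 to 0.95 in steps of 0.05. AP50 and AP75 fix the threshold at 0.50 and 0.75. APS, APM, and APL restrict evaluation to small, medium, and large objects, split at areas of 32² and 96² pixels. The sketch below only illustrates these definitions. The helper names are hypothetical, masks are assumed to be equal-sized nested lists of 0/1, and the handling of areas exactly on a boundary is a simplification.

```python
# Illustrative helpers only; the names are hypothetical and not part of any COCO toolkit.

# IoU thresholds averaged by the headline "mask AP" metric: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks (equal-sized nested lists of 0/1)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                inter += 1
            if a or b:
                union += 1
    # Two empty masks have no union; report 0.0 rather than dividing by zero.
    return inter / union if union else 0.0


def size_bucket(area):
    """Map a pixel area to the size range used by APS, APM, and APL.

    Boundary handling (strict vs. inclusive) is simplified here.
    """
    if area < 32 ** 2:
        return "small"
    if area <= 96 ** 2:
        return "medium"
    return "large"
```

Under these definitions, a predicted mask counts as a match for AP50 when its IoU with the ground-truth mask is at least 0.50, and for AP75 when it is at least 0.75.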

Results

Performance results of various models on this benchmark. All AP values are percentages; a dash marks a value that is not reported.

Model Name	AP50	AP75	APL	APM	APS	mask AP	Paper Title	Repository
CenterMask + VoVNetV2-99 (single-scale)	62.3	44.1	57.0	42.8	20.1	40.6	CenterMask : Real-Time Anchor-Free Instance Segmentation
Mask DINO (SwinL, multi-scale)	-	-	-	-	-	54.7	Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
ISDA (ours)	62	41.1	-	41.2	17	38.7	ISDA: Position-Aware Instance Segmentation with Deformable Attention
EmbedMask (R-101-FPN)	59.1	40.3	-	40.4	17.9	37.7	EmbedMask: Embedding Coupling for One-stage Instance Segmentation
PANet	-	-	-	-	-	42.0	Path Aggregation Network for Instance Segmentation
GLEE-Lite	-	-	-	-	-	48.3	General Object Foundation Model for Images and Videos at Scale
VirTex Mask R-CNN (ResNet-50-FPN)	58.4	39.7	-	-	-	36.9	VirTex: Learning Visual Representations from Textual Annotations
iBOT (ViT-B/16)	-	-	-	-	-	44.2	iBOT: Image BERT Pre-Training with Online Tokenizer
DiffusionInst-ResNet101	-	-	-	-	-	41.5	DiffusionInst: Diffusion Model for Instance Segmentation
Cascade Mask R-CNN (ResNeXt152, CBNet)	-	-	-	-	-	43.3	CBNet: A Novel Composite Backbone Network Architecture for Object Detection
ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale)	-	-	-	-	-	53.0	Vision Transformer Adapter for Dense Predictions
PolarMask (ResNet-101-FPN)	51.9	31	42.8	32.4	13.4	30.4	PolarMask: Single Shot Instance Segmentation with Polar Representation
Co-DETR	80.2	63.4	72.0	60.1	41.6	57.1	DETRs with Collaborative Hybrid Assignments Training
gSwin-S	-	-	-	-	-	45.03	gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
MogaNet-B (Cascade Mask R-CNN)	-	-	-	-	-	46	MogaNet: Multi-order Gated Aggregation Network
DetectoRS (ResNeXt-101-32x4d, multi-scale)	71.1	51.6	59.6	49.5	30.3	47.1	DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution
VoVNetV1-57	-	-	-	-	-	40.8	An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection
GCNet (ResNeXt-101 + DCN + cascade + GC r16)	-	-	-	-	-	41.5	GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
SOLQ (ResNet50, single scale)	-	-	-	-	-	39.7	SOLQ: Segmenting Objects by Learning Queries
dBOT ViT-B (CLIP)	-	-	-	-	-	46.2	Exploring Target Representations for Masked Autoencoders
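
Exports of a leaderboard like this one can mix plain numbers, percent-suffixed values, and dashes for unreported scores. The following is a minimal sketch of how such tab-separated rows might be normalized and ranked by mask AP. The function names are hypothetical, and the column order is assumed to match the header row above.

```python
# Minimal sketch: parse tab-separated leaderboard rows and rank them by a metric.
# Function names are illustrative assumptions, not part of any published tool.

METRICS = ["AP50", "AP75", "APL", "APM", "APS", "mask AP"]


def parse_value(cell):
    """Convert a score cell to float; '-' or an empty cell means not reported."""
    cell = cell.strip().rstrip("%")
    if cell in ("", "-"):
        return None
    return float(cell)


def parse_row(line):
    """Split one tab-separated row into model name, metric values, and paper title."""
    cells = line.rstrip("\n").split("\t")
    row = {"model": cells[0].strip()}
    for name, cell in zip(METRICS, cells[1:7]):
        row[name] = parse_value(cell)
    row["paper"] = cells[7].strip() if len(cells) > 7 else ""
    return row


def rank_by(rows, metric="mask AP"):
    """Sort rows by a metric, highest first, skipping rows that do not report it."""
    reported = [r for r in rows if r.get(metric) is not None]
    return sorted(reported, key=lambda r: r[metric], reverse=True)


rows = [
    parse_row("Co-DETR\t80.2\t63.4\t72.0\t60.1\t41.6\t57.1\tDETRs with Collaborative Hybrid Assignments Training"),
    parse_row("PolarMask (ResNet-101-FPN)\t51.9%\t31%\t42.8%\t32.4%\t13.4%\t30.4%\tPolarMask: Single Shot Instance Segmentation with Polar Representation"),
    parse_row("PANet\t-\t-\t-\t-\t-\t42.0\tPath Aggregation Network for Instance Segmentation"),
]

for r in rank_by(rows):
    print(r["model"], r["mask AP"])
```

Rows that do not report the chosen metric are dropped from the ranking instead of being treated as zero, so a model with only mask AP reported does not appear at the bottom of an AP50 ranking.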
