Video Instance Segmentation on YouTube-VIS 2

Metrics

AP50

AP75

AR1

AR10

mask AP

Results

Performance results of various models on this benchmark

Model name	AP50	AP75	AR1	AR10	mask AP	Paper Title	Repository
DVIS-DAQ(VIT-L, Offline)	86.1	72.2	49.6	70.7	64.5	DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries
CAVIS(VIT-L, Offline)	87.3	73.2	49.7	70.3	65.3	Context-Aware Video Instance Segmentation
TarViS (Swin-L)	81.4	67.6	47.6	64.8	60.2	TarViS: A Unified Approach for Target-based Video Segmentation
NOVIS (Swin-L)	82.0	66.5	47.9	64.4	59.8	NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation	-
STMask(R101-DCN-FPN)	54.0	38.0	29.4	39.1	34.6	Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation
DeVIS (Swin-L)	77.7	59.8	43.8	57.8	54.4	DeVIS: Making Deformable Transformers Work for Video Instance Segmentation
InstanceFormer (Swin-L)	73.7	56.9	42.8	56.0	51.0	InstanceFormer: An Online Video Instance Segmentation Framework
UniVS(Swin-L)	79.4	63.3	46.2	63.1	57.9	UniVS: Unified and Universal Video Segmentation with Prompts as Queries
VITA (Swin-L)	80.6	61.0	47.7	62.6	57.5	VITA: Video Instance Segmentation via Object Token Association
GRAtt-VIS (Swin-L)	81.3	67.1	48.8	64.5	60.3	GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation
GRAtt-VIS (ResNet-50)	69.2	53.1	41.8	56.0	48.9	GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation
DVIS++(VIT-L, Online)	82.7	70.2	49.5	68.0	62.3	DVIS++: Improved Decoupled Framework for Universal Video Segmentation
RefineVIS (Swin-L, online)	84.1	68.5	48.3	65.2	61.4	RefineVIS: Video Instance Segmentation with Temporal Attention Refinement	-
GenVIS (Swin-L)	80.9	66.5	49.1	64.7	60.1	A Generalized Framework for Video Instance Segmentation
Tube-Link(Swin-L)	79.4	64.3	47.5	63.6	58.4	Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation
DVIS(Swin-L)	83.0	68.4	47.7	65.7	60.1	DVIS: Decoupled Video Instance Segmentation Framework
MinVIS (Swin-L)	76.6	62.0	45.9	60.8	55.3	MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training
TarViS (Swin-T)	71.6	56.6	42.2	57.2	50.9	TarViS: A Unified Approach for Target-based Video Segmentation
DVIS++(VIT-L, Offline)	86.7	71.5	48.8	69.5	63.9	DVIS++: Improved Decoupled Framework for Universal Video Segmentation
NOVIS (ResNet-50)	69.4	50.0	41.3	54.4	47.2	NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation	-
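
The rows above are not listed in score order, so comparing models requires sorting on one of the metric columns. The following is a minimal sketch of one way to do that in Python, assuming the table is available as tab-separated text. The helper names `load_rows` and `rank_by` and the name-column label in the embedded sample are hypothetical, and only a handful of rows are copied in for illustration.

```python
import csv
import io

# Metric columns as they appear in the table header.
METRICS = ["AP50", "AP75", "AR1", "AR10", "mask AP"]

# A few rows copied from the table (tab-separated). The name-column label
# used here is an assumption for this sketch, not part of the benchmark.
TSV = (
    "Model name\tAP50\tAP75\tAR1\tAR10\tmask AP\n"
    "DVIS-DAQ(VIT-L, Offline)\t86.1\t72.2\t49.6\t70.7\t64.5\n"
    "CAVIS(VIT-L, Offline)\t87.3\t73.2\t49.7\t70.3\t65.3\n"
    "STMask(R101-DCN-FPN)\t54.0\t38.0\t29.4\t39.1\t34.6\n"
    "MinVIS (Swin-L)\t76.6\t62\t45.9\t60.8\t55.3\n"
    "DVIS++(VIT-L, Offline)\t86.7\t71.5\t48.8\t69.5\t63.9\n"
)


def load_rows(text):
    """Parse tab-separated leaderboard text into dicts with float metrics."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        row = {"Model name": record["Model name"]}
        for metric in METRICS:
            row[metric] = float(record[metric])
        rows.append(row)
    return rows


def rank_by(rows, metric="mask AP"):
    """Return a new list of rows sorted by the given metric, best first."""
    return sorted(rows, key=lambda row: row[metric], reverse=True)


if __name__ == "__main__":
    for row in rank_by(load_rows(TSV)):
        print(f'{row["mask AP"]:5.1f}  {row["Model name"]}')
```

Passing a different column name, for example `rank_by(rows, "AR10")`, ranks the same rows by that metric instead; the two orderings need not agree.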
