Video Instance Segmentation on YouTube-VIS 2

Metrics

AP50

AP75

AR1

AR10

mask AP

Results

Performance results of various models on this benchmark

Model	AP50	AP75	AR1	AR10	mask AP	Paper Title
CAVIS(VIT-L, Offline)	87.3	73.2	49.7	70.3	65.3	Context-Aware Video Instance Segmentation
DVIS++(VIT-L, Offline)	86.7	71.5	48.8	69.5	63.9	DVIS++: Improved Decoupled Framework for Universal Video Segmentation
DVIS-DAQ(VIT-L, Offline)	86.1	72.2	49.6	70.7	64.5	DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries
RefineVIS (Swin-L, online)	84.1	68.5	48.3	65.2	61.4	RefineVIS: Video Instance Segmentation with Temporal Attention Refinement
DVIS(Swin-L)	83.0	68.4	47.7	65.7	60.1	DVIS: Decoupled Video Instance Segmentation Framework
DVIS++(VIT-L, Online)	82.7	70.2	49.5	68.0	62.3	DVIS++: Improved Decoupled Framework for Universal Video Segmentation
NOVIS (Swin-L)	82.0	66.5	47.9	64.4	59.8	NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation
TarViS (Swin-L)	81.4	67.6	47.6	64.8	60.2	TarViS: A Unified Approach for Target-based Video Segmentation
GRAtt-VIS (Swin-L)	81.3	67.1	48.8	64.5	60.3	GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation
GenVIS (Swin-L)	80.9	66.5	49.1	64.7	60.1	A Generalized Framework for Video Instance Segmentation
IDOL (Swin-L)	80.8	63.5	45.0	60.1	56.1	In Defense of Online Models for Video Instance Segmentation
MDQE(Swin-L)	80.7	61.7	45.4	60.6	55.5	MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos
VITA (Swin-L)	80.6	61.0	47.7	62.6	57.5	VITA: Video Instance Segmentation via Object Token Association
UniVS(Swin-L)	79.4	63.3	46.2	63.1	57.9	UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Tube-Link(Swin-L)	79.4	64.3	47.5	63.6	58.4	Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation
DeVIS (Swin-L)	77.7	59.8	43.8	57.8	54.4	DeVIS: Making Deformable Transformers Work for Video Instance Segmentation
MinVIS (Swin-L)	76.6	62.0	45.9	60.8	55.3	MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training
BoxVIS(Swin-L & Box-sup)	76.4	59.6	44.8	61.0	53.9	BoxVIS: Video Instance Segmentation with Box Annotations
InstanceFormer (Swin-L)	73.7	56.9	42.8	56.0	51.0	InstanceFormer: An Online Video Instance Segmentation Framework
TarViS (Swin-T)	71.6	56.6	42.2	57.2	50.9	TarViS: A Unified Approach for Target-based Video Segmentation
