DVIS-DAQ (ViT-L, Offline) | 83.8 | 62.9 | - | - | 57.1 | DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries | |
DVIS++ (R50, Offline) | 68.9 | 40.9 | 16.8 | 47.3 | 41.2 | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | |
InstanceFormer (Swin-L) | 42.5 | 21.61 | 12.9 | 29.3 | 22.8 | InstanceFormer: An Online Video Instance Segmentation Framework | |
UNINEXT (ViT-H, Online) | 72.5 | 52.2 | - | - | 49.0 | Universal Instance Perception as Object Discovery and Retrieval | |
InstanceFormer (ResNet-50) | 40.7 | 18.1 | 12 | 27.1 | 20.0 | InstanceFormer: An Online Video Instance Segmentation Framework | |
BoxVIS (Swin-L & Box-sup) | 68.4 | 39.9 | - | - | 40.6 | BoxVIS: Video Instance Segmentation with Box Annotations | |
TarViS (ResNet-50) | 52.5 | 30.4 | 15.9 | 39.9 | 31.1 | TarViS: A Unified Approach for Target-based Video Segmentation | |
DVIS (Swin-L, Offline) | 75.9 | 53.0 | 19.4 | 55.3 | 49.9 | DVIS: Decoupled Video Instance Segmentation Framework | |
Mask2Former-VIS | 36.9 | 14.1 | 9.9 | 24.7 | 16.6 | Mask2Former for Video Instance Segmentation | |
DVIS++ (ViT-L, Online) | 72.5 | 55.0 | 20.8 | 54.6 | 49.6 | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | |