DVIS-DAQ (ViT-L, Offline) | 86.1 | 72.2 | 49.6 | 70.7 | 64.5 | DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries | |
CAVIS (ViT-L, Offline) | 87.3 | 73.2 | 49.7 | 70.3 | 65.3 | Context-Aware Video Instance Segmentation | |
InstanceFormer (Swin-L) | 73.7 | 56.9 | 42.8 | 56.0 | 51.0 | InstanceFormer: An Online Video Instance Segmentation Framework | |
DVIS++ (ViT-L, Online) | 82.7 | 70.2 | 49.5 | 68.0 | 62.3 | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | |
RefineVIS (Swin-L, Online) | 84.1 | 68.5 | 48.3 | 65.2 | 61.4 | RefineVIS: Video Instance Segmentation with Temporal Attention Refinement | |
GenVIS (Swin-L) | 80.9 | 66.5 | 49.1 | 64.7 | 60.1 | A Generalized Framework for Video Instance Segmentation | |
DVIS (Swin-L) | 83.0 | 68.4 | 47.7 | 65.7 | 60.1 | DVIS: Decoupled Video Instance Segmentation Framework | |
DVIS++ (ViT-L, Offline) | 86.7 | 71.5 | 48.8 | 69.5 | 63.9 | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | |