SwinB-AOTv2-L (all frames, MS) | 90.3 | 89.1 | - | 85.5 | 81.0 | 86.5 | Scalable Video Object Segmentation with Identification Mechanism | |
Lightweight TrickVOS (PT) | 83.3 | 84.0 | - | 79.5 | - | - | TrickVOS: A Bag of Tricks for Video Object Segmentation | - |
DEVA | 89.9 | 89.1 | 25.3 | 85.4 | 89.9 | 86.2 | Tracking Anything in High Quality | |
R50-AOST (L'=1) | 85.6 | 83.8 | - | 81.0 | 754.8 | 81.5 | Scalable Video Object Segmentation with Identification Mechanism | |
SwinB-AOTv2-L (all frames) | 88.9 | 88.0 | - | 84.2 | 79.8 | 85.2 | Scalable Video Object Segmentation with Identification Mechanism | |
STCN + TrickVOS (PT) | 86.4 | 85.5 | - | 82.1 | 77.2 | - | TrickVOS: A Bag of Tricks for Video Object Segmentation | - |