Semi-Supervised Video Object Segmentation on 20
Metrics
D16 val (F)
D16 val (G)
D16 val (J)
D17 test (F)
D17 test (G)
D17 test (J)
D17 val (F)
D17 val (G)
D17 val (J)
FPS
Results
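Reported results for each model on this benchmark. J is region similarity (mask intersection over union), F is contour accuracy, and G is the mean of J and F. D16 and D17 refer to DAVIS 2016 and DAVIS 2017, and FPS is inference speed in frames per second. A dash marks a value that was not reported.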
Comparison Table
Model Name | D16 val (F) | D16 val (G) | D16 val (J) | D17 test (F) | D17 test (G) | D17 test (J) | D17 val (F) | D17 val (G) | D17 val (J) | FPS |
---|---|---|---|---|---|---|---|---|---|---|
ranet-ranking-attention-network-for-fast | 85.4 | 85.5 | 85.5 | 57.2 | 55.3 | 53.4 | 68.2 | 65.7 | 63.2 | 30.3 |
fast-video-object-segmentation-using-the | 85.7 | 86.6 | 87.6 | - | - | - | 73.5 | 71.4 | 69.3 | 25.0 |
agss-vos-attention-guided-single-shot-video | - | - | - | 59.7 | 57.2 | 54.8 | 69.9 | 67.4 | 64.9 | 10.0 |
video-object-segmentation-using-space-time | 88.1 | 86.5 | 84.8 | - | - | - | 74.0 | 71.6 | 69.2 | 6.25 |
learning-position-and-target-consistency-for | - | - | - | - | - | - | 77.2 | 75.2 | 73.1 | 8.47 |
Model 6 | 86.4 | 86.1 | 85.8 | - | 55.2 | - | 71.6 | 68.5 | 65.3 | 0.92 |
efficient-regional-memory-network-for-video | 82.3 | 81.5 | 80.6 | - | - | - | 77.2 | 75.0 | 72.8 | 11.9 |
tackling-background-distraction-in-video | 86.2 | 86.8 | 87.5 | 72.2 | 69.4 | 66.6 | 82.3 | 80.0 | 77.6 | 50.1 |
learning-what-to-learn-for-video-object | - | - | - | - | - | - | 76.3 | 74.3 | 72.2 | 14.0 |
sstvos-sparse-spatiotemporal-transformers-for | - | - | - | - | - | - | 81.4 | 78.4 | 75.4 | - |
kernelized-memory-network-for-video-object | 88.1 | 87.6 | 87.1 | - | - | - | 77.8 | 76.0 | 74.2 | 8.33 |
spatiotemporal-cnn-for-video-object | 83.8 | 83.8 | 83.8 | - | - | - | 64.6 | 61.7 | 58.7 | 0.26 |
a-transductive-approach-for-video-object | - | - | - | 67.4 | 63.1 | 58.8 | 74.7 | 72.3 | 69.9 | 37.0 |
joint-inductive-and-transductive-learning-for | - | - | - | - | - | - | 81.2 | 78.6 | 76.0 | 4.00 |
spatiotemporal-graph-neural-network-based | 86.0 | 85.7 | 85.4 | 66.5 | 63.1 | 59.7 | 77.9 | 74.7 | 71.5 | - |
video-object-segmentation-with-adaptive | - | - | - | - | - | - | 76.1 | 74.6 | 73.0 | 4.00 |
hierarchical-memory-matching-network-for | 90.6 | 89.4 | 88.2 | - | - | - | 83.1 | 80.4 | 77.7 | 10.0 |
xmem-long-term-video-object-segmentation-with | - | - | - | - | - | - | - | - | - | 29.6 |
associating-objects-with-transformers-for | - | - | - | - | - | - | 82.0 | 79.2 | 76.4 | 40.0 |
collaborative-video-object-segmentation-by | 86.9 | 86.1 | 85.3 | - | - | - | 77.7 | 74.9 | 72.1 | 5.56 |
learning-fast-and-robust-target-models-for | - | 81.7 | - | - | - | - | 71.2 | 68.8 | 66.4 | 21.9 |
feelvos-fast-end-to-end-embedding-learning | 83.1 | 81.7 | 80.3 | 57.5 | 54.4 | 51.2 | 72.3 | 69.1 | 65.9 | 2.22 |
swem-towards-real-time-video-object-1 | 89.0 | 88.1 | 87.3 | - | - | - | 79.8 | 77.2 | 74.5 | 36.0 |
dmm-net-differentiable-mask-matching-network | - | - | - | - | - | - | 73.3 | 70.7 | 68.1 | - |
pixel-level-bijective-matching-for-video | 81.4 | 82.2 | 82.9 | 64.7 | 62.7 | 60.7 | 74.7 | 72.7 | 70.7 | 45.9 |
fast-video-object-segmentation-via-dynamic | 83.5 | 83.6 | 83.7 | - | - | - | 70.6 | 67.4 | 64.2 | 14.3 |
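As a quick sanity check on the table, the reported G values can be compared against the average of the J and F columns. The following is a minimal sketch, not part of the benchmark tooling: the helper names `mean_jf` and `max_gap` are hypothetical, the three rows are copied from the D17 val columns above, and the 0.1 tolerance is an assumption that allows for every figure being rounded to one decimal place.

```python
# Rows copied from the comparison table: (model, F, G, J) on D17 val.
ROWS = [
    ("ranet-ranking-attention-network-for-fast", 68.2, 65.7, 63.2),
    ("tackling-background-distraction-in-video", 82.3, 80.0, 77.6),
    ("hierarchical-memory-matching-network-for", 83.1, 80.4, 77.7),
]


def mean_jf(j, f):
    """Return the average of region similarity J and contour accuracy F."""
    return (j + f) / 2.0


def max_gap(rows):
    """Return the largest absolute difference between reported G and (J + F) / 2."""
    return max(abs(g - mean_jf(j, f)) for _, f, g, j in rows)


if __name__ == "__main__":
    # Assumed tolerance: all table values are rounded to one decimal place.
    tolerance = 0.1
    gap = max_gap(ROWS)
    print("largest gap:", round(gap, 3), "within tolerance:", gap <= tolerance)
```

The same check can be extended to the D16 val and D17 test columns by adding rows in the same (model, F, G, J) order and skipping entries marked with a dash.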