Semantic Segmentation on NYU Depth V2
Evaluation Metric
Mean IoU
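Mean IoU averages the intersection-over-union score across all semantic classes. As a point of reference, here is a minimal sketch of the metric (the function name and the convention of skipping classes absent from both prediction and ground truth are illustrative assumptions, not tied to any particular benchmark implementation):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU: per-class intersection-over-union, averaged over classes.

    Classes absent from both prediction and ground truth are skipped
    (an assumption; some implementations handle this differently).
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy example: 2 classes on a 2x2 "image"
pred   = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
# class 0: inter=1, union=2 -> 0.5; class 1: inter=2, union=3 -> 2/3
print(mean_iou(pred, target, 2))  # ~0.5833
```

Reported leaderboard numbers are this quantity expressed as a percentage.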
Evaluation Results
Performance results for each model on this benchmark.
Comparison Table
Model | Mean IoU |
---|---|
dynamic-multimodal-fusion | 51.0% |
geminifusion-efficient-pixel-wise-multimodal | 60.9% |
cmx-cross-modal-fusion-for-rgb-x-semantic | 54.4% |
locality-sensitive-deconvolution-networks | 45.9% |
efficient-rgb-d-semantic-segmentation-for | 48.17% |
3d-graph-neural-networks-for-rgbd-semantic | 43.1% |
asymformer-asymmetrical-cross-modal | 55.3% |
learning-deep-multimodal-feature | 51.2% |
swinmtl-a-shared-architecture-for | 58.14% |
omnivec2-a-novel-transformer-based-network | 63.6% |
multi-modal-attention-based-fusion-model-for | 44.8% |
geminifusion-efficient-pixel-wise-multimodal | 56.8% |
cross-stitch-networks-for-multi-task-learning | 19.3% |
rgb-based-semantic-segmentation-using-self | 33.49% |
delivering-arbitrary-modal-semantic | 56.9% |
cascaded-feature-network-for-semantic | 47.7% |
nddr-cnn-layer-wise-feature-fusing-in-multi | 43.3% |
acnet-attention-based-network-to-exploit | 48.3% |
malleable-2-5d-convolution-learning-receptive | 50.9% |
spatial-information-guided-adaptive-context | 49.4% |
adashare-learning-what-to-share-for-efficient | 29.6% |
pixel-difference-convolutional-network-for | 53.5% |
optimizing-rgb-d-semantic-segmentation | 51.9% |
masked-supervised-learning-for-semantic | 39.31% |
multimodal-token-fusion-for-vision | 53.3% |
haarnet-large-scale-linear-morphological | 50.7% |
improving-multi-modal-learning-with-uni-modal | 49.14% |
exploring-relational-context-for-multi-task | 46.33% |
understanding-dark-scenes-by-contrasting | 52.5% |
composite-learning-for-robust-and-effective | 33.48% |
refinenet-multi-path-refinement-networks-for | 46.5% |
inverseform-a-loss-function-for-structured | 53.1% |
ci-net-contextual-information-for-joint | 42.6% |
prompt-guided-transformer-for-multi-task | 41.61% |
comptr-towards-diverse-bi-source-dense | 55.5% |
efficient-yet-deep-convolutional-neural | 32.3% |
multi-task-meta-learning-learn-how-to-adapt | 41.51% |
variational-context-deformable-convnets-for | 50.7% |
omnivore-a-single-model-for-many-visual | 55.1% |
joint-task-recursive-learning-for-semantic | 46.8% |
dcanet-differential-convolution-attention | 53.3% |
inverted-pyramid-multi-task-transformer-for | 53.56% |
toward-edge-efficient-dense-predictions-with | 22.1% |
multimodal-knowledge-expansion | 48.88% |
efficient-multi-task-scene-analysis-with-rgb | 51.26% |
malleable-2-5d-convolution-learning-receptive | 49.7% |
mmanet-margin-aware-distillation-and-modality | 49.62% |
learning-fully-dense-neural-networks-for | 47.4% |
multimodal-token-fusion-for-vision | 54.2% |
mti-net-multi-scale-task-interaction-networks | 49.0% |
pattern-structure-diffusion-for-multi-task | 51.0% |
multi-layer-feature-aggregation-for-deep | 50.7% |
light-weight-refinenet-for-real-time-semantic | 44.4% |
temporally-distributed-networks-for-fast | 43.5% |
understanding-dark-scenes-by-contrasting | 55.8% |
multimae-multi-modal-multi-task-masked | 56.0% |
hs3-learning-with-proper-task-complexity-in | 53.5% |
bi-directional-cross-modality-feature | 52.4% |
scene-parsing-via-integrated-classification | 50.70% |
polymax-general-dense-prediction-with-mask | 58.08% |
mmformer-multimodal-medical-transformer-for | 48.45% |
rednet-residual-encoder-decoder-network-for | 47.2% |
variational-context-deformable-convnets-for | 51.9% |
dense-decoder-shortcut-connections-for-single | 48.1% |
efficient-multimodal-semantic-segmentation | 59.3% |
what-uncertainties-do-we-need-in-bayesian | 37.3% |
std2p-rgbd-semantic-segmentation-using-spatio | 40.1% |
warp-refine-propagation-semi-supervised-auto | 52.2% |
rfnet-region-aware-fusion-network-for | 48.13% |
pattern-affinitive-propagation-across-depth-1 | 50.4% |
spatial-information-guided-adaptive-context | 48.2% |
depth-aware-cnn-for-rgb-d-segmentation | 43.9% |
channel-exchanging-networks-for-multimodal | 52.5% |
dformer-rethinking-rgbd-representation | 51.8% |
hspformer-hierarchical-spatial-perception | 57.8% |
hemis-hetero-modal-image-segmentation | 37.77% |
depth-adapted-cnns-for-rgb-d-semantic | 51.24% |
real-time-joint-semantic-segmentation-and | 42.0% |
context-aware-interaction-network-for-rgb-t | 52.6% |
geminifusion-efficient-pixel-wise-multimodal | 57.7% |
dformer-rethinking-rgbd-representation | 57.2% |
temporally-distributed-networks-for-fast | 37.4% |
contrastive-multimodal-fusion-with | 48.1% |
cmx-cross-modal-fusion-for-rgb-x-semantic | 56.9% |
spatial-information-guided-convolution-for | 51.0% |
cmx-cross-modal-fusion-for-rgb-x-semantic | 56.3% |
variational-context-deformable-convnets-for | 45.3% |
fully-convolutional-networks-for-semantic | - |
understanding-dark-scenes-by-contrasting | 53.7% |
sosd-net-joint-semantic-object-segmentation | 45.0% |
efficient-multi-task-rgb-d-scene-analysis-for | 53.34% |
semantic-segmentation-with-reverse-attention | 41.2% |
omnivore-a-single-model-for-many-visual | 56.8% |
attention-based-dual-supervised-decoder-for | 52.5% |
light-weight-refinenet-for-real-time-semantic | 43.6% |
shapeconv-shape-aware-convolutional-layer-for | 49.0% |
cerberus-transformer-joint-semantic | 50.4% |
comptr-towards-diverse-bi-source-dense | 49.2% |
hapnet-toward-superior-rgb-thermal-scene | 55.0% |
shapeconv-shape-aware-convolutional-layer-for | 51.3% |
depth-adapted-cnns-for-rgb-d-semantic | 49.15% |
dformer-rethinking-rgbd-representation | 55.6% |
efficient-rgb-d-semantic-segmentation-for | 50.30% |
diffusion-based-rgb-d-semantic-segmentation | 61.5% |
recurrent-scene-parsing-with-perspective | 44.5% |
cross-task-attention-mechanism-for-dense | 40.84% |
panopticndt-efficient-and-robust-panoptic | 59.02% |
depth-adapted-cnns-for-rgb-d-semantic | 47.02% |
shapeconv-shape-aware-convolutional-layer-for | 48.8% |
omnivec-learning-robust-representations-with | 60.8% |
deep-feature-selection-and-fusion-for-rgb-d | 52.0% |
light-weight-refinenet-for-real-time-semantic | 41.7% |
geminifusion-efficient-pixel-wise-multimodal | 60.2% |
prompt-guided-transformer-for-multi-task | 46.43% |
depth-adapted-cnns-for-rgb-d-semantic | 50.05% |
dformer-rethinking-rgbd-representation | 53.6% |