Semantic Segmentation on NYU Depth V2
Metric: Mean IoU (mIoU), reported in percent.
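Mean IoU averages, over all classes, the intersection-over-union between the predicted and ground-truth label maps; NYU Depth V2 is commonly evaluated with the 40-class label set. Below is a minimal NumPy sketch of the metric (the function name, the `ignore_index` default, and the percent scaling are illustrative assumptions, not the evaluation code behind this leaderboard):

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Compute Mean IoU (in percent) from two integer label maps.

    pred, target: integer arrays of the same shape (e.g. H x W) with class IDs.
    Pixels where target == ignore_index are excluded from the computation.
    """
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps: skip rather than count as 0
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return 100.0 * float(np.mean(ious))

# Toy usage with random 40-class label maps at NYU Depth V2 resolution:
pred = np.random.randint(0, 40, size=(480, 640))
gt = np.random.randint(0, 40, size=(480, 640))
print(mean_iou(pred, gt, num_classes=40))
```

Skipping classes absent from both maps (rather than scoring them as zero) follows a common convention, but implementations differ on this point.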
Results
Performance of models on this benchmark, ranked by Mean IoU (top 20 of 116 entries shown).

| Model Name | Mean IoU (%) | Paper Title |
| --- | --- | --- |
| OmniVec2 | 63.6 | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning |
| DiffusionMMS (DAT++-S) | 61.5 | Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer |
| GeminiFusion (Swin-Large) | 60.9 | GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer |
| OmniVec | 60.8 | OmniVec: Learning robust representations with cross modal sharing |
| GeminiFusion (Swin-Large) | 60.2 | GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer |
| DPLNet | 59.3 | Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning |
| EMSANet (2x ResNet-34 NBt1D, PanopticNDT version, finetuned) | 59.02 | PanopticNDT: Efficient and Robust Panoptic Mapping |
| SwinMTL | 58.14 | SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images |
| PolyMaX (ConvNeXt-L) | 58.08 | PolyMaX: General Dense Prediction with Mask Transformer |
| HSPFormer (PVT v2-B4) | 57.8 | HSPFormer: Hierarchical Spatial Perception Transformer for Semantic Segmentation |
| GeminiFusion (MiT-B5) | 57.7 | GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer |
| DFormer-L | 57.2 | DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation |
| CMNeXt (B4) | 56.9 | Delivering Arbitrary-Modal Semantic Segmentation |
| CMX (B5) | 56.9 | CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers |
| GeminiFusion (MiT-B3) | 56.8 | GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer |
| OMNIVORE (Swin-L, finetuned) | 56.8 | Omnivore: A Single Model for Many Visual Modalities |
| CMX (B4) | 56.3 | CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers |
| MultiMAE (ViT-B) | 56.0 | MultiMAE: Multi-modal Multi-task Masked Autoencoders |
| SMMCL (SegNeXt-B) | 55.8 | Understanding Dark Scenes by Contrasting Multi-Modal Observations |
| DFormer-B | 55.6 | DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation |