HyperAI
Open Vocabulary Semantic Segmentation On 1
Metrics
mIoU
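For readers unfamiliar with the metric, the following is a minimal sketch of how mean Intersection over Union (mIoU) is commonly computed. It is not taken from this benchmark's evaluation code: the function name `mean_iou`, the `ignore_label` value of 255, and the choice to average over every class that appears in either the prediction or the ground truth are assumptions made for illustration. Real evaluation scripts usually accumulate intersections and unions over the whole dataset and a fixed class list before averaging.

```python
from collections import Counter


def mean_iou(pred, target, ignore_label=255):
    """Return mIoU in [0, 1] for two flat sequences of per-pixel class labels.

    Hypothetical helper for illustration only; `ignore_label` is an assumed
    convention for pixels that should not be scored.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = Counter()  # pixels where prediction and ground truth agree, per class
    pred_area = Counter()     # pixels predicted as each class
    target_area = Counter()   # pixels labelled as each class in the ground truth

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # skip unscored pixels
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersection[t] += 1

    # Assumption: score every class seen in either the prediction or the ground truth.
    classes = set(pred_area) | set(target_area)
    if not classes:
        raise ValueError("no scorable pixels")

    ious = []
    for c in classes:
        union = pred_area[c] + target_area[c] - intersection[c]
        ious.append(intersection[c] / union)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    target = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    # Per-class IoU: class 0 = 1/3, class 1 = 2/3, class 2 = 1/2
    print(round(100 * mean_iou(pred, target), 1))  # prints 50.0
```

Leaderboards such as this one report the value as a percentage, so a result of 0.5 from the sketch corresponds to an mIoU of 50.0.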
Results
Performance results of various models on this benchmark
| Model Name | mIoU | Paper Title |
|---|---|---|
| HyperSeg | 64.6 | HyperSeg: Towards Universal Visual Segmentation with Large Language Model |
| SILC | 63.5 | SILC: Improving Vision Language Pretraining with Self-Distillation |
| CAT-Seg | 63.3 | CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation |
| MaskCLIP++ | 62.5 | High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation |
| CLIPSelf | 62.3 | CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction |
| SED | 60.6 | SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation |
| Mask-Adapter | 60.4 | Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation |
| EBSeg-L | 60.2 | Open-Vocabulary Semantic Segmentation with Image Embedding Balancing |
| MAFT+ | 59.4 | Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation |
| SCAN | 59.3 | Open-Vocabulary Segmentation with Semantic-Assisted Calibration |
| MAFT-ViTL | 58.5 | Learning Mask-aware CLIP Representations for Zero-Shot Segmentation |
| FC-CLIP | 58.4 | Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP |
| ODISE | 57.3 | Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models |
| OVSeg Swin-B | 55.7 | Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP |
| PACL | 50.1 | Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning |
| SimSeg | 47.7 | A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model |
| MaskCLIP | 45.9 | Open-Vocabulary Universal Image Segmentation with MaskCLIP |
| TagAlign (trained with image-text pairs) | 37.6 | TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification |
| TTD (TCL) | 37.4 | TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias |
| LaVG | 34.7 | In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation |