Sound Prompted Semantic Segmentation On

Metrics

mAP (mean average precision)

mIoU (mean intersection over union)
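The two metrics can be illustrated with a minimal sketch. It assumes that each sound prompt yields a per-pixel relevance score map and a binary ground-truth mask (flattened to lists here), that AP is the non-interpolated mean of precision at the rank of each positive pixel, and that IoU binarizes the scores at a fixed threshold of 0.5. The function names, the threshold, and the edge-case conventions are hypothetical choices for illustration. This is not the benchmark's official evaluation protocol.

```python
"""Minimal sketch of per-prompt AP and IoU, averaged into mAP and mIoU.

Assumptions (hypothetical, not the benchmark's official protocol):
each sound prompt yields a flattened per-pixel score map and a binary
ground-truth mask; IoU binarizes scores at a fixed threshold of 0.5.
"""
from typing import List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean precision at the rank of each positive pixel."""
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0  # assumption: a prompt with no positive pixels scores 0
    # Rank pixels by descending score; sorted() is stable, so ties keep index order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_pos


def iou(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Intersection over union between the thresholded scores and the mask."""
    pred = [s >= threshold for s in scores]
    inter = sum(1 for p, g in zip(pred, labels) if p and g)
    union = sum(1 for p, g in zip(pred, labels) if p or g)
    return inter / union if union else 1.0  # assumption: both empty counts as a match


def evaluate(samples: List[Tuple[List[float], List[int]]],
             threshold: float = 0.5) -> Tuple[float, float]:
    """Return (mAP, mIoU): per-prompt AP and IoU averaged over all prompts."""
    aps = [average_precision(s, g) for s, g in samples]
    ious = [iou(s, g, threshold) for s, g in samples]
    return sum(aps) / len(aps), sum(ious) / len(ious)


if __name__ == "__main__":
    # Two toy prompts: (per-pixel scores, ground-truth mask)
    samples = [
        ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),
        ([0.2, 0.7, 0.6], [0, 1, 1]),
    ]
    m_ap, m_iou = evaluate(samples)
    print(f"mAP={m_ap:.4f} mIoU={m_iou:.4f}")  # mAP=0.9167 mIoU=0.6667
```

The toy scores lie in [0, 1], while the leaderboard figures appear to be on a 0 to 100 scale. AP is threshold-free and rewards good ranking of pixels, whereas IoU depends on the chosen threshold, which is one reason a model can rank differently under the two metrics. Library implementations such as scikit-learn's `average_precision_score` group tied scores into a single threshold, so their results can differ slightly from this rank-by-rank version when ties occur.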

Results

Results of four models on this benchmark, reported as mAP and mIoU.

Model Name	mAP	mIoU	Paper Title
DenseAV	32.7	24.7	Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
DAVENet	16.8	18.1	Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input
ImageBind	19.7	20.5	ImageBind: One Embedding Space To Bind Them All
CAV-MAE	26.0	17.0	Contrastive Audio-Visual Masked Autoencoder
