Speech Prompted Semantic Segmentation On

Metrics

mAP (mean average precision)

mIoU (mean intersection over union)
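
As a rough illustration of the second metric, the sketch below computes intersection over union for binary masks and averages it over a few examples. It is a minimal sketch under stated assumptions, not this benchmark's evaluation code: the function names `iou` and `mean_iou`, the toy masks, the handling of empty masks, and the 0 to 100 reporting scale are all illustrative choices, and the actual protocol (thresholding, class averaging, resolution) is not described on this page.

```python
def iou(pred, target):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over (predicted mask, ground-truth mask) pairs."""
    scores = [iou(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)


# Toy 2x3 masks, purely illustrative (not benchmark data).
pred_a = [[1, 1, 0], [0, 0, 0]]
true_a = [[1, 0, 0], [1, 0, 0]]  # intersection 1 pixel, union 3 pixels
pred_b = [[0, 1, 1], [0, 1, 1]]
true_b = [[0, 1, 1], [0, 1, 1]]  # identical masks

miou = mean_iou([(pred_a, true_a), (pred_b, true_b)])
print(f"mIoU = {100 * miou:.1f}")  # scaled to 0-100 as an assumed reporting convention
```

Averaging per example, as done here, is only one convention; averaging per class over a whole dataset can give different numbers for the same predictions.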

Results

Performance results of various models on this benchmark

Model Name	mAP	mIoU	Paper Title	Repository
CAVMAE	27.2	19.9	Contrastive Audio-Visual Masked Autoencoder
DAVENet	32.2	26.3	Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input	-
ImageBIND	20.2	19.7	ImageBind: One Embedding Space To Bind Them All
DenseAV	48.7	36.8	Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
