Open-vocabulary Attribute Detection

Vision-language modeling has enabled open-vocabulary tasks where predictions can be queried using any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited due to the lack of a reliable attribute-focused evaluation benchmark. This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corresponding OVAD benchmark. The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models. To this end, we created a clean and densely annotated test set covering 117 attribute classes on the 80 object classes of MS COCO. It includes positive and negative annotations, which enables open-vocabulary evaluation. Overall, the benchmark consists of 1.4 million annotations. For reference, we provide a first baseline method for open-vocabulary attribute detection. Moreover, we demonstrate the benchmark's value by studying the attribute detection performance of several foundation models. Project page: https://ovad-benchmark.github.io