Described Object Detection
Described Object Detection (DOD) is a computer vision task that detects all instances in an image matching a free-form language description. DOD generalizes Open Vocabulary Object Detection (OVD) by replacing category names with flexible language expressions, and it removes a limitation of Referring Expression Comprehension (REC), which can only ground pre-existing objects, that is, objects assumed to be present in the image. A DOD model must therefore return every matching instance, and return nothing when no object in the image fits the description. This allows more precise and more general object recognition and localization than either OVD or REC alone, which makes the task relevant to image understanding and scene parsing.
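The difference between the REC and DOD output contracts can be illustrated with a minimal Python sketch. It is not taken from any DOD implementation: the Candidate type, the rec_select and dod_select functions, the 0.5 threshold, and all scores are hypothetical, and the sketch assumes some upstream model has already scored each candidate region against the description.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Candidate:
    box: Box
    score: float  # hypothetical region-to-description match score in [0, 1]


def rec_select(candidates: List[Candidate]) -> Box:
    """REC-style contract: always exactly one box, the best-scoring one,
    even when no region matches the description well."""
    return max(candidates, key=lambda c: c.score).box


def dod_select(candidates: List[Candidate], threshold: float = 0.5) -> List[Box]:
    """DOD-style contract: every box whose score reaches the threshold.
    The result may hold several boxes or be empty."""
    return [c.box for c in candidates if c.score >= threshold]


# Made-up scores for the description "a dog that is not on a leash".
# Image A contains two matching dogs; image B contains none.
image_a = [
    Candidate((10.0, 20.0, 110.0, 220.0), 0.91),
    Candidate((300.0, 40.0, 420.0, 260.0), 0.78),
    Candidate((500.0, 60.0, 560.0, 200.0), 0.12),
]
image_b = [
    Candidate((15.0, 30.0, 90.0, 180.0), 0.08),
    Candidate((200.0, 50.0, 330.0, 240.0), 0.21),
]

print(dod_select(image_a))  # both matching dogs
print(dod_select(image_b))  # empty list: nothing fits the description
print(rec_select(image_b))  # still one box, although nothing fits
```

On image B the REC-style selector is forced to return a box for a description that matches nothing, while the DOD-style selector returns an empty list. A fixed score threshold is only one simple way to allow empty results and is used here purely for illustration.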