Open Vocabulary Image Classification | SOTA | HyperAI

Open Vocabulary Image Classification (OVIC) is a computer vision subtask in which a model generates accurate, fine-grained classification labels drawn from the entire English noun vocabulary, without being given any prompts or candidate labels. The goal is to identify and describe the specific objects or scenes in an image, improving the model's ability to generalize to unknown categories. Its practical value lies in handling large-scale, diverse image data, which supports use cases such as intelligent image annotation, content retrieval, and automatic report generation.

OVIC Datasets (Wiki-H)

DFN-5B H/14-378 + PrefixedIter Decoder (FT2)

OVIC Datasets (World-H)

OVIC Datasets (Val3K)

OVIC Datasets (Wiki-L)