Open Vocabulary Action Recognition
Open Vocabulary Action Recognition (OVAR) is a computer vision task whose goal is to recognize actions beyond the fixed set seen during training, so that a system can generalize to actions it has never encountered. The actions to recognize, expressed as verbs or verb-object pairs, are supplied as textual queries at inference time, and the model requires no prior exposure to them during training. The practical value of OVAR lies in handling the more diverse and complex scenarios found in the real world, which makes visual systems more adaptable and robust.
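One common way to realize inference from textual queries is to embed the video clip and every query in a shared space, as CLIP-style vision-language models do, and then rank the queries by similarity. Note that this is an assumed technique, not one stated above. The sketch below uses hypothetical, hand-written 3-dimensional embeddings in place of real video and text encoders. The names `cosine`, `softmax` and `recognize`, and the temperature of 0.01, are all illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical stand-ins for a real text encoder's output: one embedding
# per action query (verbs or verb-object pairs). Values are made up.
query_embeddings: Dict[str, List[float]] = {
    "open door": [1.0, 0.0, 0.0],
    "pour water": [0.0, 1.0, 0.0],
    "cut onion": [0.0, 0.0, 1.0],
}

# Hypothetical stand-in for a real video encoder's output for one clip.
video_embedding: List[float] = [0.1, 0.2, 0.9]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores: List[float], temperature: float = 0.01) -> List[float]:
    """Turn similarities into probabilities; the temperature is an assumed value."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(video: List[float], queries: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank every textual query by its probability of matching the clip."""
    labels = list(queries)
    sims = [cosine(video, queries[label]) for label in labels]
    probs = softmax(sims)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


ranked = recognize(video_embedding, query_embeddings)
print(ranked[0][0])  # prints "cut onion"

# A new action can be queried at inference time without retraining:
# only its (hypothetical) text embedding is added to the vocabulary.
query_embeddings["wave hand"] = [1.0, 1.0, 0.0]
ranked_extended = recognize(video_embedding, query_embeddings)
print(len(ranked_extended))  # prints 4
```

In this sketch the vocabulary is plain data (a dictionary of query embeddings) rather than a fixed classifier head. Supporting an unseen action therefore only means encoding one more text query, which mirrors the open-vocabulary setting described above.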