Discovering Human-Object Interaction Concepts via Self-Compositional Learning

A comprehensive understanding of human-object interaction (HOI) requires detecting not only a small portion of predefined HOI concepts (or categories) but also other reasonable HOI concepts, while current approaches usually fail to explore a huge portion of unknown HOI concepts (i.e., unknown but reasonable combinations of verbs and objects). In this paper, 1) we introduce a novel and challenging task for comprehensive HOI understanding, termed HOI Concept Discovery; and 2) we devise a self-compositional learning framework (SCL) for HOI concept discovery. Specifically, we maintain an online-updated concept confidence matrix during training: 1) we assign pseudo-labels to all composite HOI instances according to the concept confidence matrix for self-training; and 2) we update the concept confidence matrix using the predictions on all composite HOI instances. Therefore, the proposed method enables learning on both known and unknown HOI concepts. We perform extensive experiments on several popular HOI datasets to demonstrate the effectiveness of the proposed method for HOI concept discovery, object affordance recognition, and HOI detection. For example, the proposed self-compositional learning framework significantly improves the performance of 1) HOI concept discovery by over 10% on HICO-DET and over 3% on V-COCO, respectively; 2) object affordance recognition by over 9% mAP on MS-COCO and HICO-DET; and 3) rare-first and non-rare-first unknown HOI detection by relative margins of over 30% and 20%, respectively. Code is publicly available at https://github.com/zhihou7/HOI-CL.
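
The two-step loop described above (pseudo-labels read from the concept confidence matrix, then the matrix updated from predictions on composite instances) can be illustrated with a minimal sketch. This is not the released implementation: the class name `ConceptConfidence`, the running-mean update rule, and the use of the raw confidence values as soft verb labels are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List


class ConceptConfidence:
    """Hypothetical verb-by-object confidence matrix kept as a running mean.

    Assumption: each entry is the average predicted score seen so far for
    that (verb, object) pair; the abstract does not state the update rule.
    """

    def __init__(self, num_verbs: int, num_objects: int) -> None:
        self.sums = [[0.0] * num_objects for _ in range(num_verbs)]
        self.counts = [[0] * num_objects for _ in range(num_verbs)]

    def confidence(self, verb: int, obj: int) -> float:
        # Pairs that were never observed default to zero confidence.
        n = self.counts[verb][obj]
        return self.sums[verb][obj] / n if n else 0.0

    def pseudo_labels(self, obj: int) -> List[float]:
        # Step 1: soft verb labels for a composite instance with object `obj`.
        return [self.confidence(v, obj) for v in range(len(self.sums))]

    def update(self, obj: int, verb_scores: List[float]) -> None:
        # Step 2: fold the model's verb predictions for a composite instance
        # with object `obj` back into the matrix.
        for v, score in enumerate(verb_scores):
            self.sums[v][obj] += score
            self.counts[v][obj] += 1


# Toy usage: 2 verbs, 2 objects, two composite instances sharing object 1.
store = ConceptConfidence(num_verbs=2, num_objects=2)
store.update(obj=1, verb_scores=[0.75, 0.25])
store.update(obj=1, verb_scores=[0.25, 0.25])
labels = store.pseudo_labels(obj=1)
```

In this toy run, `labels` holds the averaged scores for object 1, while object 0, which was never observed, keeps zero confidence for every verb. A real training loop would interleave these two calls with gradient steps on the composite instances.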