Label2Label: A Language Modeling Framework for Multi-Attribute Learning

Li, Wanhua ; Cao, Zhexuan ; Feng, Jianjiang ; Zhou, Jie ; Lu, Jiwen

Abstract

Objects are usually associated with multiple attributes, and these attributes often exhibit high correlations. Modeling complex relationships between attributes poses a great challenge for multi-attribute learning. This paper proposes a simple yet generic framework named Label2Label to exploit the complex attribute correlations. Label2Label is the first attempt for multi-attribute prediction from the perspective of language modeling. Specifically, it treats each attribute label as a "word" describing the sample. As each sample is annotated with multiple attribute labels, these "words" will naturally form an unordered but meaningful "sentence", which depicts the semantic information of the corresponding sample. Inspired by the remarkable success of pre-training language models in NLP, Label2Label introduces an image-conditioned masked language model, which randomly masks some of the "word" tokens from the label "sentence" and aims to recover them based on the masked "sentence" and the context conveyed by image features. Our intuition is that the instance-wise attribute relations are well grasped if the neural net can infer the missing attributes based on the context and the remaining attribute hints. Label2Label is conceptually simple and empirically powerful. Without incorporating task-specific prior knowledge and highly specialized network designs, our approach achieves state-of-the-art results on three different multi-attribute learning tasks, compared to highly customized domain-specific methods. Code is available at https://github.com/Li-Wanhua/Label2Label.
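
The label-masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `MASK` sentinel, the `mask_labels` helper, the binary label encoding, and the masking probability are all assumptions made for illustration, and the image-conditioned model that would recover the masked labels is omitted.

```python
import random

MASK = -1  # hypothetical sentinel for a masked "word"; not taken from the paper


def mask_labels(labels, mask_prob=0.15, rng=None):
    """Randomly hide attribute labels in one sample's label "sentence".

    labels: list of 0/1 attribute labels for a single sample (assumed encoding).
    Returns (masked, targets), where masked is the sentence with some entries
    replaced by MASK, and targets maps each masked position to its true label.
    """
    rng = rng or random.Random()
    masked, targets = [], {}
    for i, y in enumerate(labels):
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide this "word"
            targets[i] = y        # remember what the model should recover
        else:
            masked.append(y)      # keep as a hint for the other attributes
    return masked, targets


labels = [1, 0, 0, 1, 1, 0]
masked, targets = mask_labels(labels, mask_prob=0.5, rng=random.Random(0))
```

In a full pipeline, a model would receive `masked` together with image features and be trained to predict the entries in `targets`; how the paper handles labels at test time, when no ground-truth labels are available, is not covered by this sketch.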