Generative Multi-Label Zero-Shot Learning

Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training. In the generalized variant, the test samples can additionally contain seen categories. Existing approaches rely on learning either shared or label-specific attention from the seen classes. Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting remains a challenge. In contrast, state-of-the-art single-label generative adversarial network (GAN) based approaches learn to directly synthesize class-specific visual features from the corresponding class attribute embeddings. However, synthesizing multi-label features with GANs is still unexplored in the context of the zero-shot setting. In this work, we introduce different fusion approaches at the attribute level, feature level, and cross-level (across attribute and feature levels) for synthesizing multi-label features from their corresponding multi-label class embeddings. To the best of our knowledge, our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting. Comprehensive experiments are performed on three zero-shot image classification benchmarks: NUS-WIDE, Open Images, and MS COCO. Our cross-level fusion-based generative approach outperforms the state of the art on all three datasets. Furthermore, we show the generalization capabilities of our fusion approach on the zero-shot detection task on MS COCO, achieving favorable performance against existing methods. The source code is available at https://github.com/akshitac8/Generative_MLZSL.
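The abstract names attribute-level, feature-level, and cross-level fusion without specifying them; the sketch below illustrates two plausible readings in PyTorch: attribute-level fusion (pool the per-label embeddings before generation) and feature-level fusion (generate one feature per label, then pool). All names (ConditionalGenerator, attribute_level_fusion, feature_level_fusion), dimensions, and the mean-pooling choice are hypothetical assumptions for illustration, not the authors' implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Hypothetical conditional generator: maps (noise, class embedding) to a
    synthetic visual feature, as in single-label GAN-based zero-shot learning."""
    def __init__(self, noise_dim, embed_dim, feat_dim, hidden_dim=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + embed_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, feat_dim),
            nn.ReLU(),  # CNN-style visual features are non-negative
        )

    def forward(self, noise, class_embed):
        # Works for both (batch, dim) and (batch, num_classes, dim) inputs.
        return self.net(torch.cat([noise, class_embed], dim=-1))

def attribute_level_fusion(label_embeds, label_mask):
    """Fuse the embeddings of an image's positive labels into one multi-label
    class embedding by mean pooling (fusion *before* generation).

    label_embeds: (num_classes, embed_dim) per-class attribute embeddings
    label_mask:   (batch, num_classes) binary multi-hot label vectors
    """
    summed = label_mask.float() @ label_embeds                  # (batch, embed_dim)
    counts = label_mask.float().sum(dim=1, keepdim=True).clamp(min=1.0)
    return summed / counts

def feature_level_fusion(G, noise, label_embeds, label_mask):
    """Generate one feature per positive label, then mean-pool the generated
    features (fusion *after* generation)."""
    batch, num_classes = label_mask.shape
    per_class = G(
        noise.unsqueeze(1).expand(-1, num_classes, -1),         # (batch, C, noise_dim)
        label_embeds.unsqueeze(0).expand(batch, -1, -1),        # (batch, C, embed_dim)
    )                                                           # (batch, C, feat_dim)
    mask = label_mask.float().unsqueeze(-1)
    counts = mask.sum(dim=1).clamp(min=1.0)
    return (per_class * mask).sum(dim=1) / counts

# Usage: synthesize features for images carrying several labels at once.
num_classes, embed_dim, noise_dim, feat_dim = 81, 300, 64, 2048
G = ConditionalGenerator(noise_dim, embed_dim, feat_dim)
label_embeds = torch.randn(num_classes, embed_dim)              # e.g., word embeddings
labels = torch.zeros(4, num_classes)
labels[:, :3] = 1.0                                             # 3 positive labels per sample
noise = torch.randn(4, noise_dim)
feats_attr = G(noise, attribute_level_fusion(label_embeds, labels))   # (4, feat_dim)
feats_feat = feature_level_fusion(G, noise, label_embeds, labels)     # (4, feat_dim)
```

A cross-level variant, per the abstract, would combine information across both levels (e.g., mixing the pooled embedding with the pooled per-label features); the exact mechanism is defined in the paper and repository rather than here.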