Zero-Shot Learning by Convex Combination of Semantic Embeddings

Several recent publications have proposed methods for mapping images intocontinuous semantic embedding spaces. In some cases the embedding space istrained jointly with the image transformation. In other cases the semanticembedding space is established by an independent natural language processingtask, and then the image transformation into that space is learned in a secondstage. Proponents of these image embedding systems have stressed theiradvantages over the traditional \nway{} classification framing of imageunderstanding, particularly in terms of the promise for zero-shot learning --the ability to correctly annotate images of previously unseen objectcategories. In this paper, we propose a simple method for constructing an imageembedding system from any existing \nway{} image classifier and a semantic wordembedding model, which contains the $\n$ class labels in its vocabulary. Ourmethod maps images into the semantic embedding space via convex combination ofthe class label embedding vectors, and requires no additional training. We showthat this simple and direct method confers many of the advantages associatedwith more complex image embedding schemes, and indeed outperforms state of theart methods on the ImageNet zero-shot learning task.