Grafit: Learning fine-grained image representations with coarse labels

This paper tackles the problem of learning a finer representation than theone provided by training labels. This enables fine-grained category retrievalof images in a collection annotated with coarse labels only. Our network is learned with a nearest-neighbor classifier objective, and aninstance loss inspired by self-supervised learning. By jointly leveraging thecoarse labels and the underlying fine-grained latent space, it significantlyimproves the accuracy of category-level retrieval methods. Our strategy outperforms all competing methods for retrieving or classifyingimages at a finer granularity than that available at train time. It alsoimproves the accuracy for transfer learning tasks to fine-grained datasets,thereby establishing the new state of the art on five public benchmarks, likeiNaturalist-2018.