A Light CNN for Deep Face Representation with Noisy Labels

Convolutional neural network (CNN) models proposed for face recognition have grown steadily larger to better fit large amounts of training data. When training data are obtained from the internet, the labels are likely to be ambiguous and inaccurate. This paper presents a Light CNN framework to learn a compact embedding on large-scale face data with massive noisy labels. First, we introduce a variation of maxout activation, called Max-Feature-Map (MFM), into each convolutional layer of the CNN. Unlike maxout activation, which uses many feature maps to linearly approximate an arbitrary convex activation function, MFM does so via a competitive relationship. MFM can not only separate noisy and informative signals but also play the role of feature selection between two feature maps. Second, three networks are carefully designed to obtain better performance while reducing the number of parameters and computational costs. Lastly, a semantic bootstrapping method is proposed to make the predictions of the networks more consistent with noisy labels. Experimental results show that the proposed framework can utilize large-scale noisy data to learn a Light model that is efficient in computational cost and storage space. The learned single network with a 256-D representation achieves state-of-the-art results on various face benchmarks without fine-tuning. The code is released at https://github.com/AlfredXiangWu/LightCNN.
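
The competitive relationship behind MFM can be illustrated with a minimal sketch. It assumes that a layer's 2n feature maps are split into two halves and that map k is paired with map k + n, with the element-wise maximum kept; the abstract does not spell out the pairing, so that choice, the function name `max_feature_map`, and the sample values are illustrative assumptions rather than the released implementation.

```python
def max_feature_map(feature_maps):
    """Element-wise max over paired feature maps: 2n maps in, n maps out.

    Assumption: map k competes with map k + n (first half vs. second half).
    Each feature map is a 2-D list of floats with identical shape.
    """
    if len(feature_maps) % 2 != 0:
        raise ValueError("MFM needs an even number of feature maps")
    half = len(feature_maps) // 2
    first, second = feature_maps[:half], feature_maps[half:]
    return [
        [[max(a, b) for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(map_a, map_b)]
        for map_a, map_b in zip(first, second)
    ]


# Four hypothetical 2x2 feature maps are reduced to two.
maps = [
    [[1.0, -2.0], [0.5, 3.0]],
    [[0.0, 0.0], [4.0, -1.0]],
    [[-1.0, 5.0], [0.5, 2.0]],
    [[2.0, -3.0], [1.0, 1.0]],
]
out = max_feature_map(maps)
```

At every spatial position only the stronger of the two competing responses survives, so the output has half as many maps as the input. This is the sense in which the activation acts as feature selection between two feature maps.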