Learning from Failure: Training Debiased Classifier from Biased Classifier

Neural networks often learn to make predictions that overly rely on spurious correlations existing in the dataset, which causes the model to be biased. While previous work tackles this issue by using explicit labeling on the spuriously correlated attributes or presuming a particular bias type, we instead utilize a cheaper, yet generic form of human knowledge, which can be widely applicable to various types of bias. We first observe that neural networks learn to rely on the spurious correlation only when it is "easier" to learn than the desired knowledge, and such reliance is most prominent during the early phase of training. Based on these observations, we propose a failure-based debiasing scheme by training a pair of neural networks simultaneously. Our main idea is twofold: (a) we intentionally train the first network to be biased by repeatedly amplifying its "prejudice", and (b) we debias the training of the second network by focusing on samples that go against the prejudice of the biased network in (a). Extensive experiments demonstrate that our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets. Surprisingly, our framework even occasionally outperforms debiasing methods that require explicit supervision of the spuriously correlated attributes.
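The twofold idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it assumes a generalized cross-entropy (GCE) style loss as one plausible way to "amplify prejudice" (it concentrates gradients on samples the model already classifies confidently, i.e. the easy, bias-aligned ones), and a relative-difficulty weight that upweights samples the biased network fails on when training the second network. The function names and the hyperparameter `q` are illustrative choices, not prescribed by the abstract.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Per-sample cross-entropy given predicted class probabilities."""
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12)

def generalized_ce(probs, labels, q=0.7):
    """GCE-style loss: compared with plain CE, its gradient is larger on
    confidently-correct (easy) samples, so training the first network on
    it amplifies reliance on the spurious, easy-to-learn correlation."""
    p = probs[np.arange(len(labels)), labels]
    return (1.0 - p ** q) / q

def relative_difficulty(loss_biased, loss_debiased):
    """Weight for training the second network: close to 1 for samples the
    biased network gets wrong (bias-conflicting), close to 0 for samples
    it finds easy (bias-aligned)."""
    return loss_biased / (loss_biased + loss_debiased + 1e-12)

# Toy example: sample 0 is bias-aligned (biased net is confident and
# correct), sample 1 is bias-conflicting (biased net is wrong).
labels = np.array([0, 0])
probs_biased = np.array([[0.9, 0.1],
                         [0.2, 0.8]])
probs_debiased = np.array([[0.7, 0.3],
                           [0.6, 0.4]])

w = relative_difficulty(cross_entropy(probs_biased, labels),
                        cross_entropy(probs_debiased, labels))
# The bias-conflicting sample receives the larger weight, so the second
# network's training focuses on samples that go against the prejudice.
print(w)
```

In a full training loop, the biased network would be updated with the GCE-style loss while the debiased network is updated with cross-entropy reweighted by `w`, both computed per mini-batch.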