Domain-Adaptive Self-Supervised Pre-Training for Face & Body Detection in Drawings

Drawings are powerful means of pictorial abstraction and communication. Understanding diverse forms of drawings, including digital arts, cartoons, and comics, has been a major problem of interest for the computer vision and computer graphics communities. Although there are large amounts of digitized drawings from comic books and cartoons, they contain vast stylistic variations, which necessitate expensive manual labeling for training domain-specific recognizers. In this work, we show how self-supervised learning, based on a teacher-student network with a modified student network update design, can be used to build face and body detectors. Our setup allows exploiting large amounts of unlabeled data from the target domain when labels are provided for only a small subset of it. We further demonstrate that style transfer can be incorporated into our learning pipeline to bootstrap detectors using a vast amount of out-of-domain labeled images from natural images (i.e., images from the real world). Our combined architecture yields detectors with state-of-the-art (SOTA) and near-SOTA performance using minimal annotation effort. Our code can be accessed from https://github.com/barisbatuhan/DASS_Detector.
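For readers unfamiliar with teacher-student self-supervision, the sketch below shows a generic mean-teacher-style loop in which the teacher produces pseudo-labels on unlabeled drawings and is updated as an exponential moving average of the student. This is only an illustrative baseline, not the paper's modified student update; `Detector`, `weak_augment`, `strong_augment`, `detection_loss`, and `unlabeled_loader` are hypothetical placeholders.

```python
# Minimal teacher-student pseudo-labeling sketch (PyTorch); assumptions noted above.
import copy
import torch

def ema_update(teacher, student, momentum=0.999):
    """Update teacher weights as an exponential moving average of the student."""
    with torch.no_grad():
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)

# student = Detector()                       # hypothetical detection model
# teacher = copy.deepcopy(student)           # teacher starts as a copy of the student
# for images in unlabeled_loader:            # unlabeled target-domain drawings
#     pseudo_labels = teacher(weak_augment(images))   # teacher generates pseudo-labels
#     loss = detection_loss(student(strong_augment(images)), pseudo_labels)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
#     ema_update(teacher, student)           # teacher slowly tracks the student
```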