Constrained R-CNN: A general image manipulation detection model

Recently, deep learning-based models have exhibited remarkable performance for image manipulation detection. However, most of them suffer from the poor universality of handcrafted or predetermined features. Meanwhile, they focus only on manipulation localization and overlook manipulation classification. To address these issues, we propose a coarse-to-fine architecture named Constrained R-CNN for complete and accurate image forensics. First, the learnable manipulation feature extractor learns a unified feature representation directly from data. Second, the attention region proposal network effectively discriminates manipulated regions for the subsequent manipulation classification and coarse localization. Then, the skip structure fuses low-level and high-level information to refine the global manipulation features. Finally, the coarse localization information guides the model to further learn finer local features and segment out the tampered region. Experimental results show that our model achieves state-of-the-art performance. In particular, the F1 score is increased by 28.4%, 73.2%, and 13.3% on the NIST16, COVERAGE, and Columbia datasets, respectively.
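
The reported gains are stated in terms of the F1 score. As a minimal sketch, assuming the metric is computed pixel-wise between a predicted binary tampering mask and a ground-truth mask (a common convention for localization, though the abstract does not specify the protocol), it could be evaluated as follows. The function name `pixel_f1` and the nested-list mask format are hypothetical and chosen only for illustration.

```python
def pixel_f1(pred_mask, true_mask):
    """Pixel-level F1 between two binary masks.

    Both masks are assumed to be equal-shaped nested lists of 0/1 values,
    where 1 marks a pixel labeled as tampered.
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1      # tampered pixel correctly detected
            elif p and not t:
                fp += 1      # authentic pixel flagged as tampered
            elif t and not p:
                fn += 1      # tampered pixel missed
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined here as 0.0 when both masks are empty
    return 2 * tp / denom if denom else 0.0


# Toy 2x3 example: two true positives, one false positive, one false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
true = [[1, 0, 0],
        [0, 1, 1]]
score = pixel_f1(pred, true)
```

In this toy case the counts are TP = 2, FP = 1, FN = 1, so the score is 4/6. Returning 0.0 when both masks are empty is one possible convention; an evaluation protocol may handle that case differently.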