
Abstract

Interactive image synthesis from user-guided input is a challenging task when users wish to control the scene structure of a generated image with ease. Although remarkable progress has been made on layout-based image synthesis approaches, existing methods require high-precision inputs to produce realistic images in interactive scenarios; such inputs may need to be adjusted several times and are unfriendly to novice users. When the placement of bounding boxes is subject to perturbation, layout-based models suffer from "missing regions" in the constructed semantic layouts and hence undesirable artifacts in the generated images. In this work, we propose Panoptic Layout Generative Adversarial Networks (PLGAN) to address this challenge. PLGAN employs panoptic theory, which divides object categories into "stuff" with amorphous boundaries and "things" with well-defined shapes, so that stuff and instance layouts are constructed through separate branches and later fused into panoptic layouts. In particular, the stuff layouts can take amorphous shapes and fill the missing regions left by the instance layouts. We experimentally compare PLGAN with state-of-the-art layout-based models on the COCO-Stuff, Visual Genome, and Landscape datasets. The advantages of PLGAN are not only demonstrated visually but also verified quantitatively in terms of inception score, Fréchet inception distance, classification accuracy score, and coverage.
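The "missing regions" problem and the role of a stuff layout can be illustrated with a toy label-map example. This is only a minimal sketch and not the PLGAN architecture: the function names `instance_layout`, `fuse_panoptic`, and `coverage`, the box format, the label values, and the rule "instance labels take priority, stuff fills the rest" are all assumptions made for illustration. The abstract does not specify how the two branches are fused or how coverage is computed.

```python
import numpy as np

def instance_layout(boxes, height, width):
    """Rasterize (label, x0, y0, x1, y1) boxes into a label map.

    Label 0 marks pixels that no box covers (the "missing regions").
    """
    layout = np.zeros((height, width), dtype=np.int64)
    for label, x0, y0, x1, y1 in boxes:
        layout[y0:y1, x0:x1] = label
    return layout

def fuse_panoptic(instance_map, stuff_map):
    """Hypothetical fusion rule: keep instance labels, fill gaps with stuff."""
    return np.where(instance_map > 0, instance_map, stuff_map)

def coverage(layout):
    """Fraction of pixels that carry a non-zero label (illustrative metric)."""
    return float((layout > 0).mean())

H, W = 8, 8
# Two "thing" boxes that leave most of the canvas uncovered.
boxes = [(1, 0, 0, 4, 4), (2, 5, 5, 8, 8)]
inst = instance_layout(boxes, H, W)

# Assumed dense stuff layout (for example, label 10 for "sky") over the canvas.
stuff = np.full((H, W), 10, dtype=np.int64)
pan = fuse_panoptic(inst, stuff)

print("instance-only coverage:", coverage(inst))
print("fused coverage:", coverage(pan))
```

In this toy setting the instance-only layout leaves uncovered pixels wherever no box lands, while the fused layout is dense because the stuff labels occupy every gap.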
