LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

Recently, diffusion models have achieved great success in image synthesis. However, when it comes to layout-to-image generation, where an image often contains a complex scene of multiple objects, exerting strong control over both the global layout map and each detailed object remains a challenging task. In this paper, we propose a diffusion model named LayoutDiffusion that achieves higher generation quality and greater controllability than previous works. To overcome the difficult multimodal fusion of image and layout, we propose to construct a structural image patch with region information and transform the patched image into a special layout that is fused with the normal layout in a unified form. Moreover, a Layout Fusion Module (LFM) and Object-aware Cross Attention (OaCA) are proposed to model the relationships among multiple objects; both are designed to be object-aware and position-sensitive, allowing precise control over spatially related information. Extensive experiments show that LayoutDiffusion outperforms the previous SOTA methods on FID and CAS by a relative 46.35% and 26.70% on COCO-stuff, and 44.29% and 41.82% on VG. Code is available at https://github.com/ZGCTroy/LayoutDiffusion.
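To make the fusion idea concrete, the following is a minimal sketch (not the authors' exact implementation; all names, dimensions, and embedding choices are illustrative assumptions) of an object-aware cross attention step in PyTorch: image patch tokens, treated as a "special layout" with region information, attend to a unified token set that mixes layout objects (category plus bounding-box embeddings) with the position-aware patches themselves.

    # Assumed, illustrative sketch of object-aware cross attention;
    # see the official repo for the actual implementation.
    import torch
    import torch.nn as nn

    class ObjectAwareCrossAttention(nn.Module):
        def __init__(self, dim=256, num_heads=8, num_classes=200):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.box_proj = nn.Linear(4, dim)            # (x, y, w, h) -> position embedding
            self.cat_embed = nn.Embedding(num_classes, dim)  # object category embedding

        def forward(self, patch_tokens, patch_boxes, obj_classes, obj_boxes):
            # patch_tokens: (B, P, dim)  image patches serving as the "special layout"
            # patch_boxes:  (B, P, 4)    region of each patch within the image
            # obj_classes:  (B, O)       category ids of the layout objects
            # obj_boxes:    (B, O, 4)    normalized bounding boxes of the objects
            patch_pos = self.box_proj(patch_boxes)
            obj_tokens = self.cat_embed(obj_classes) + self.box_proj(obj_boxes)
            # Unified key/value set: layout objects plus position-aware patch tokens.
            kv = torch.cat([obj_tokens, patch_tokens + patch_pos], dim=1)
            q = patch_tokens + patch_pos
            fused, _ = self.attn(q, kv, kv)
            return fused                                  # (B, P, dim) fused patch features

The key design point illustrated here is that both modalities carry explicit position information (box embeddings for objects, region embeddings for patches), so the attention is position-sensitive as well as object-aware.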