DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents

Yilun Xu, Gabriele Corso, Tommi Jaakkola, Arash Vahdat, Karsten Kreis
Abstract

Diffusion models (DMs) have revolutionized generative learning. They utilize a diffusion process to encode data into a simple Gaussian distribution. However, encoding a complex, potentially multimodal data distribution into a single continuous Gaussian distribution arguably represents an unnecessarily challenging learning problem. We propose Discrete-Continuous Latent Variable Diffusion Models (DisCo-Diff) to simplify this task by introducing complementary discrete latent variables. We augment DMs with learnable discrete latents, inferred with an encoder, and train the DM and encoder end-to-end. DisCo-Diff does not rely on pre-trained networks, making the framework universally applicable. The discrete latents significantly simplify learning the DM's complex noise-to-data mapping by reducing the curvature of the DM's generative ODE. An additional autoregressive transformer models the distribution of the discrete latents, a simple step because DisCo-Diff requires only a few discrete variables with small codebooks. We validate DisCo-Diff on toy data, several image synthesis tasks, and molecular docking, and find that introducing discrete latents consistently improves model performance. For example, DisCo-Diff achieves state-of-the-art FID scores on class-conditioned ImageNet-64/128 datasets with an ODE sampler.
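
The abstract describes a concrete training setup: an encoder infers a few discrete latents from clean data, the DM is conditioned on those latents, and both are trained end-to-end. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; the network sizes, the toy MLP encoder and denoiser, the linear noising schedule, and the use of a straight-through Gumbel-softmax for the discrete bottleneck are all illustrative assumptions.

```python
# Hypothetical toy sketch of end-to-end DisCo-Diff-style training.
# Sizes and architectures are placeholders, not the paper's networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_LATENTS, CODEBOOK_SIZE, DATA_DIM, HIDDEN = 8, 16, 64, 256

class Encoder(nn.Module):
    """Infers a few discrete latents (small codebooks) from clean data x."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DATA_DIM, HIDDEN), nn.SiLU(),
            nn.Linear(HIDDEN, NUM_LATENTS * CODEBOOK_SIZE))

    def forward(self, x):
        logits = self.net(x).view(-1, NUM_LATENTS, CODEBOOK_SIZE)
        # Straight-through Gumbel-softmax keeps the discrete choice
        # differentiable, so encoder and DM can train end-to-end.
        return F.gumbel_softmax(logits, tau=1.0, hard=True)

class Denoiser(nn.Module):
    """Predicts the noise from noisy x_t, time t, and discrete latents z."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(NUM_LATENTS * CODEBOOK_SIZE, HIDDEN)
        self.net = nn.Sequential(
            nn.Linear(DATA_DIM + 1 + HIDDEN, HIDDEN), nn.SiLU(),
            nn.Linear(HIDDEN, DATA_DIM))

    def forward(self, x_t, t, z):
        z_emb = self.embed(z.flatten(1))
        return self.net(torch.cat([x_t, t[:, None], z_emb], dim=1))

encoder, denoiser = Encoder(), Denoiser()
opt = torch.optim.Adam(
    [*encoder.parameters(), *denoiser.parameters()], lr=1e-4)

x = torch.randn(32, DATA_DIM)                   # stand-in training batch
t = torch.rand(32)                              # diffusion time in [0, 1]
eps = torch.randn_like(x)
x_t = x * (1 - t[:, None]) + eps * t[:, None]   # simple linear noising

z = encoder(x)                                  # discrete latents from clean x
loss = F.mse_loss(denoiser(x_t, t, z), eps)     # standard denoising objective
opt.zero_grad(); loss.backward(); opt.step()
```

At generation time, the abstract states that a separately trained autoregressive transformer supplies the discrete latents before the DM's generative ODE is solved; the sketch above covers only the end-to-end training step.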