1. Tutorial Introduction

PixelFlow is an AI image generation project released in April 2025 by researchers from the University of Hong Kong and Adobe. It is a family of image generation models that operate directly in raw pixel space, in contrast to the mainstream latent-space models. The accompanying paper is "PixelFlow: Pixel-Space Generative Models with Flow".

This approach simplifies image generation by eliminating the need for a pre-trained variational autoencoder (VAE), which makes the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow keeps the computational cost of working in pixel space affordable. It achieves an FID of 1.98 on the 256×256 ImageNet class-conditional image generation benchmark. Qualitative text-to-image results show that PixelFlow performs well in image quality, artistry, and semantic control. The authors hope that this new paradigm will inspire and open up new opportunities for next-generation visual generation models.

This tutorial runs on a single RTX 4090 GPU.

👉 This project provides the following model:

class-to-image: achieves an FID of 1.98 on the 256×256 ImageNet class-conditional image generation benchmark.

Project Examples

2. Operation Steps

1. After starting the container, click the API address to open the web interface

If "Bad Gateway" is displayed, it means the model is initializing. Since the model is large, please wait about 1-2 minutes and refresh the page.
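The wait-and-refresh step can also be scripted. The following is a minimal sketch using only the Python standard library; the function names, the 180-second limit, and the 10-second interval are illustrative assumptions and not part of the tutorial's container. It polls the API address until the page answers with HTTP 200 instead of "Bad Gateway" (HTTP 502).

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # for example 502 while the model is still initializing
    except (urllib.error.URLError, OSError):
        return None


def wait_until_ready(url, probe=http_status, max_wait=180.0, interval=10.0,
                     sleep=time.sleep):
    """Poll url until it returns HTTP 200 or max_wait seconds of sleeping have passed.

    probe and sleep are injectable (hypothetical hooks) so the loop can be
    exercised without a network connection.
    """
    waited = 0.0
    while True:
        if probe(url) == 200:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

To use it, call wait_until_ready with the API address shown for your container and open the page in the browser once it returns True.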

2. Once the page has loaded, you can start generating images with the model

❗️Important usage tips:

ImageNet-1k Class: The class of the generated image must be chosen from the drop-down list; custom classes are not supported.
Dopri5 ODE: Enables the Dormand-Prince 5th-order adaptive step-size ODE solver. Turn it on when high-quality generation is required (e.g. when generating high-definition images).
Noise Shift: Controls the noise offset during generation. A larger value increases the noise intensity, making the results more random and diverse. A smaller value reduces the influence of the noise, keeping the results closer to the training data distribution (more conservative).
Classifier-free Guidance Scale: Controls how strongly the conditional input (such as a class label or text) influences the generated result. A higher value makes the output follow the condition more closely, while a lower value retains more randomness.
Num Inference Steps [stage 0]: The number of sampling steps the model runs during inference for that stage. More steps usually produce more refined results but increase computation time. [stage 0] is the first stage; the stage index plus 1 gives the stage's position, and there are four stages in total.
Seed: The random seed that controls the randomness of generation. With all other parameters unchanged, the same seed reproduces the same result, which is important for reproducibility.
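The behaviour of three of these settings can be illustrated with a small, self-contained sketch. It is a toy illustration under stated assumptions, not PixelFlow's actual code: apply_guidance uses the commonly cited classifier-free guidance formula, euler_solve integrates a toy ODE (dx/dt = -x) with fixed-size steps to show why more inference steps give a more accurate result, and initial_noise shows how a seed makes the starting noise reproducible. All function names are hypothetical.

```python
import math
import random


def apply_guidance(uncond, cond, scale):
    """Classifier-free guidance: start from the unconditional prediction and
    move toward the conditional one by a factor of scale."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def euler_solve(x0, num_steps):
    """Integrate the toy ODE dx/dt = -x from t=0 to t=1 with fixed Euler steps.

    The exact solution is x0 * exp(-1), so the error can be measured directly.
    """
    x, dt = x0, 1.0 / num_steps
    for _ in range(num_steps):
        x += dt * (-x)
    return x


def solver_error(num_steps, x0=1.0):
    """Absolute error of the Euler result against the exact solution."""
    return abs(euler_solve(x0, num_steps) - x0 * math.exp(-1.0))


def initial_noise(seed, size):
    """Draw Gaussian noise from a private generator so that the same seed
    always yields the same values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]
```

With scale set to 0 the guidance function returns the unconditional prediction, with 1 it returns the conditional one, and larger values push further in the direction of the condition. The error from solver_error shrinks as num_steps grows, which mirrors the quality versus time trade-off of Num Inference Steps. An adaptive solver such as Dopri5 instead chooses its step sizes automatically from an error estimate, which this fixed-step sketch does not attempt to reproduce.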

How to use

Citation Information

Thanks to GitHub user xxxjjjyyy1 for producing this tutorial. The project's citation information is as follows:

@article{chen2025pixelflow,
  title={PixelFlow: Pixel-Space Generative Models with Flow},
  author={Chen, Shoufa and Ge, Chongjian and Zhang, Shilong and Sun, Peize and Luo, Ping},
  journal={arXiv preprint arXiv:2504.07963},
  year={2025}
}