EvoSearch-codes: Evolutionary Algorithm Framework

1. Tutorial Introduction

EvoSearch-codes is an evolutionary search method released by the Hong Kong University of Science and Technology and the Kuaishou Kling team on May 1, 2025. It improves a model's generation quality by spending more computation at inference time, supports both image and video generation, and works with state-of-the-art diffusion-based and flow-based models. EvoSearch requires no training or gradient updates, achieves significantly better results on a range of tasks, and shows good scalability, robustness, and generalization. As test-time computation increases, EvoSearch shows that SD2.1 and Flux.1-dev have the potential to match or even exceed GPT-4o. For video generation, Wan 1.3B can also surpass Wan 14B and Hunyuan 13B, which shows the potential of test-time scaling as a complement to training-time scaling. The associated paper is "Scaling Image and Video Generation via Test-Time Evolutionary Search".
This tutorial runs on a single RTX A6000 GPU and provides three examples for testing: Wan Video Generation, SD Image Generation, and FLUX Image Generation.
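The idea of searching at test time without training or gradients can be illustrated with a generic evolutionary loop. The sketch below is not the EvoSearch implementation. It is a minimal, pure-Python example under stated assumptions: each candidate is a plain list of floats standing in for a noise latent, `toy_reward` is a hypothetical stand-in for a reward model that would score the generated image or video, and all function and parameter names are made up for illustration.

```python
import random


def toy_reward(candidate):
    # Hypothetical stand-in for a reward model: higher is better,
    # with the maximum (0.0) reached when every coordinate equals 0.5.
    return -sum((x - 0.5) ** 2 for x in candidate)


def evolutionary_search(reward_fn, dim=8, population_size=16, num_elites=4,
                        generations=10, mutation_scale=0.3, seed=0):
    """Gradient-free search: keep the best candidates, mutate them, repeat."""
    rng = random.Random(seed)
    # Initial population: random Gaussian vectors (stand-ins for noise latents).
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(population_size)]
    best_history = []
    for _ in range(generations):
        # Selection: rank by reward and keep the top candidates unchanged.
        ranked = sorted(population, key=reward_fn, reverse=True)
        elites = ranked[:num_elites]
        best_history.append(reward_fn(elites[0]))
        # Mutation: refill the population with perturbed copies of elites.
        children = []
        while len(elites) + len(children) < population_size:
            parent = rng.choice(elites)
            children.append([x + rng.gauss(0.0, mutation_scale) for x in parent])
        population = elites + children
    best = max(population, key=reward_fn)
    return best, best_history


best, history = evolutionary_search(toy_reward)
print("best reward per generation:", [round(r, 3) for r in history])
```

Because the elites are carried over unchanged, the best reward never decreases from one generation to the next. Raising `population_size` or `generations` spends more inference-time computation on the search, which is the trade-off the introduction describes.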
2. Project Examples

3. Operation Steps
1. After starting the container, click the API address to enter the web interface.

2. Usage steps
If "Bad Gateway" is displayed, it means the model is initializing. Since the model is large, please wait about 2-3 minutes and refresh the page.
2.1 Wan Video Generation
Tip: The video takes approximately 5-8 minutes to generate.

Parameter Description
- Advanced Settings
- Random Seed: Seed for the random number generator.
- Height: Height of the generated video.
- Width: Width of the generated video.
- Video duration: Duration of the generated video.
- Inference Steps: Number of inference steps.
- Guidance Scale: Controls how strongly the text prompt influences the generated video.
- Iteration: Number of iterations.
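To make the effect of the guidance scale concrete, the sketch below shows the standard classifier-free guidance combination commonly used by diffusion pipelines. The tutorial does not show its backend, so whether this exact formula is applied there is an assumption. The function name `apply_cfg` and the list-of-floats inputs are hypothetical and stand in for the model's unconditional and text-conditioned predictions.

```python
def apply_cfg(uncond, cond, guidance_scale):
    # Classifier-free guidance: start from the unconditional prediction and
    # move toward the text-conditioned one, scaled by guidance_scale.
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy prediction without the prompt
cond = [1.0, 3.0]    # toy prediction with the prompt

for scale in (0.0, 1.0, 2.0):
    print(scale, apply_cfg(uncond, cond, scale))
```

A scale of 0 ignores the prompt, a scale of 1 reproduces the conditioned prediction, and larger values push the result further in the direction the prompt suggests.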
2.2 SD Image Generation
Tip: It is best to write the prompt in English.

- Advanced Settings
- Random Seed: Seed for the random number generator.
- Image Size: Size of the generated image.
- Inference Steps: Number of inference steps.
- CFG Scale: Controls how strongly the text prompt influences the generated image.
- Iteration: Number of iterations.
2.3 FLUX Image Generation


Citation Information
The citation information for this project is as follows:
@misc{he2025scaling,
title={Scaling Image and Video Generation via Test-Time Evolutionary Search},
author={Haoran He and Jiajun Liang and Xintao Wang and Pengfei Wan and Di Zhang and Kun Gai and Ling Pan},
year={2025},
eprint={2505.17618},
archivePrefix={arXiv},
primaryClass={cs.CV}
}