HyperAI

Step1X-Edit: Image Editing Tool

Project Overview

This tutorial runs on a single RTX A6000 GPU.

Step1X-Edit is a state-of-the-art image editing model released by the StepFun team on April 25, 2025. It aims to deliver performance comparable to closed-source models such as GPT-4o and Gemini 2.0 Flash. Specifically, Step1X-Edit uses a multimodal LLM (MLLM) to process the reference image and the user's editing instruction, extracts latent embeddings, and feeds them to a diffusion image decoder to produce the target image. The model has 19B parameters in total (7B MLLM + 12B DiT) and offers three key capabilities: precise semantic parsing, identity consistency preservation, and high-precision region-level control. It supports 11 types of frequently used image editing tasks, such as text replacement, style transfer, material transformation, and character retouching.
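As a rough sanity check on the hardware choice, the parameter counts above can be turned into a weights-only memory estimate. This is a back-of-the-envelope sketch, not a measurement. The precisions listed are assumptions (the tutorial does not state which one the deployment uses), and the estimate ignores activations, the image autoencoder, and framework overhead. The RTX A6000 has 48 GB of GPU memory.

```python
# Parameter counts as stated in the overview: 7B MLLM + 12B DiT.
MLLM_PARAMS = 7e9
DIT_PARAMS = 12e9

# Bytes per parameter for a few common precisions (assumed, not from the tutorial).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_memory_gb(num_params, bytes_per_param):
    """Weights-only memory in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


total_params = MLLM_PARAMS + DIT_PARAMS  # 19e9

for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gb(total_params, nbytes):.0f} GB")
# fp32: 76 GB
# bf16: 38 GB
# fp8: 19 GB
```

Under these assumptions, 16-bit weights would fit on a single 48 GB card with some headroom, while 32-bit weights would not.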

Step1X-Edit is the first open-source system to achieve a deep fusion of MLLM and DiT, which greatly improves editing accuracy and image fidelity. On the recent image editing benchmark GEdit-Bench, Step1X-Edit leads existing open-source models in semantic consistency, image quality, and overall score, and is comparable to GPT-4o and Gemini 2.0 Flash. The method is described in the paper "Step1X-Edit: A Practical Framework for General Image Editing".

Step1X-Edit has the following core capabilities for natural language image editing tasks:

  • Precise semantic parsing: supports complex, combined instructions written in natural language. Instructions need no templates and can flexibly handle multi-round and multi-task editing. The model also supports recognizing, replacing, and reconstructing text in images.
  • Identity consistency preservation: after editing, the face, pose, and identity features are stably retained, which suits high-consistency scenarios such as virtual humans, e-commerce models, and social media images.
  • High-precision region-level control: supports targeted editing of text, materials, colors, and so on in designated regions while keeping the image style unified, providing finer-grained control.

Project Examples

Run Steps

1. After starting the container, click the API address to open the web interface.

If "Bad Gateway" is displayed, the model is still initializing. Because the model is large, wait about 1-2 minutes and then refresh the page.

2. Once the web page loads, you can interact with the model.
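The wait-and-refresh advice in step 1 can also be scripted. The following is a minimal sketch that polls an address until it stops answering with HTTP 502 (Bad Gateway). It uses only the Python standard library. The function names, the polling interval, and the timeout are illustrative assumptions, and the URL in the usage comment is a placeholder for the API address shown for your own container.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar errors.
        return None


def wait_until_ready(url, fetch=fetch_status, interval=10.0, max_wait=300.0,
                     sleep=time.sleep):
    """Poll url until it answers with something other than 502 Bad Gateway.

    Returns True once a non-502 status is seen, or False after roughly
    max_wait seconds of sleeping between attempts.
    """
    waited = 0.0
    while True:
        status = fetch(url)
        if status is not None and status != 502:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval


# Usage (placeholder address, replace with your container's API address):
# if wait_until_ready("https://<your-api-address>/"):
#     print("Web interface is up; open it in the browser.")
```

The `fetch` and `sleep` parameters are injectable only so the loop can be exercised without a network connection; in normal use the defaults are enough.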

Citation Information

Thanks to GitHub user zhangjunchang for deploying this tutorial. The project's citation information is as follows:

@article{liu2025step1x-edit,
      title={Step1X-Edit: A Practical Framework for General Image Editing}, 
      author={Shiyu Liu and Yucheng Han and Peng Xing and Fukun Yin and Rui Wang and Wei Cheng and Jiaqi Liao and Yingming Wang and Honghao Fu and Chunrui Han and Guopeng Li and Yuang Peng and Quan Sun and Jingwei Wu and Yan Cai and Zheng Ge and Ranchen Ming and Lei Xia and Xianfang Zeng and Yibo Zhu and Binxing Jiao and Xiangyu Zhang and Gang Yu and Daxin Jiang},
      journal={arXiv preprint arXiv:2504.17761},
      year={2025}
}