Date

8 months ago

Size

655.06 MB

Project Overview

This tutorial runs on a single RTX A6000 GPU.

Step1X-Edit is a state-of-the-art image editing model released by the StepFun team on April 25, 2025. It aims to deliver performance comparable to closed-source models such as GPT-4o and Gemini 2.0 Flash. Step1X-Edit uses a multimodal LLM (MLLM) to process the reference image and the user's editing instruction. It extracts latent embeddings from them and feeds these embeddings to a diffusion image decoder (DiT), which produces the target image. The model has 19B parameters in total (a 7B MLLM plus a 12B DiT) and offers three key capabilities: precise semantic parsing, identity-consistency preservation, and high-precision region-level control. It supports 11 types of frequently used image editing tasks, such as text replacement, style transfer, material transformation, and portrait retouching.
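As a rough sanity check on why one RTX A6000 (48 GB of VRAM) can host a 19B-parameter model, the sketch below multiplies the stated parameter counts (7B MLLM + 12B DiT) by the bytes per parameter of common precisions. It counts model weights only and ignores activations, the VAE, and framework overhead. The tutorial does not say which precision its deployment loads, so the dtype table here is an assumption for illustration.

```python
# Rough weight-memory estimate for Step1X-Edit's 19B parameters (7B MLLM + 12B DiT).
# Assumption: weights only; activations, VAE, and framework overhead are ignored.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}
GIB = 1024 ** 3

def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed just to hold num_params weights in the given dtype, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB

MLLM_PARAMS = 7e9
DIT_PARAMS = 12e9
A6000_VRAM_GIB = 48  # RTX A6000 ships with 48 GB; treated as ~48 GiB for this rough estimate

total = MLLM_PARAMS + DIT_PARAMS
for dtype in ("fp32", "bf16", "fp8"):
    need = weight_memory_gib(total, dtype)
    verdict = "fits" if need < A6000_VRAM_GIB else "does not fit"
    print(f"{dtype}: {need:.1f} GiB of weights -> {verdict} in {A6000_VRAM_GIB} GiB")

# fp32: 70.8 GiB of weights -> does not fit in 48 GiB
# bf16: 35.4 GiB of weights -> fits in 48 GiB
# fp8: 17.7 GiB of weights -> fits in 48 GiB
```

At 16-bit precision the weights alone take about 35 GiB, which leaves roughly 12 GiB of headroom on the card; full fp32 weights would not fit without offloading or quantization.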

Step1X-Edit is the first open-source platform to deeply integrate an MLLM with a DiT, significantly improving editing accuracy and image fidelity. On GEdit-Bench, a newly released image editing benchmark, Step1X-Edit outperforms existing open-source models in semantic consistency, image quality, and overall score, rivaling GPT-4o and Gemini 2.0 Flash. The accompanying research paper is "Step1X-Edit: A Practical Framework for General Image Editing" (arXiv:2504.17761).

Step1X-Edit has the following core capabilities for natural-language image editing:

Precise semantic parsing: supports complex, compound instructions written in natural language. Instructions need no templates and can flexibly handle multi-round and multi-task editing. The model can also recognize, replace, and reconstruct text in images.
Identity-consistency preservation: facial features, pose, and identity characteristics remain stable after editing, which suits high-consistency scenarios such as virtual humans, e-commerce models, and social media images.
High-precision region-level control: supports targeted editing of text, materials, colors, and other attributes in designated regions while keeping the overall image style consistent, providing finer-grained control.

Project Examples

Steps to run

1. After starting the container, click the API address to open the web interface.

If "Bad Gateway" is displayed, the model is still initializing. Because the model is large, wait about 1-2 minutes and then refresh the page.

2. Once the web page loads, you can interact with the model.
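The steps above rely on waiting and refreshing by hand while the page shows "Bad Gateway". If you would rather script that check, here is a minimal sketch using only the Python standard library. It assumes the web interface answers a plain GET on the API address with HTTP 200 once the model has loaded and with 502 (Bad Gateway) while it is still initializing. The function names `http_status` and `wait_until_ready`, the 180-second default timeout, and the 10-second polling interval are illustrative choices, not part of this tutorial.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET on url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError (it is a subclass);
        # a 502 here means the model is most likely still loading.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or socket timeout.
        return 0

def wait_until_ready(
    check: Callable[[], int],
    timeout_s: float = 180.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns 200 or timeout_s elapses; True means ready."""
    waited = 0.0
    while waited <= timeout_s:
        if check() == 200:
            return True
        sleep(interval_s)
        waited += interval_s
    return False
```

A typical call would be `wait_until_ready(lambda: http_status(api_url))`, where `api_url` is the API address shown for your container (a placeholder here). The status check is passed in as a function so the polling loop can be exercised without a live server.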

Exchange and discussion

🖌️ If you come across a high-quality project, leave us a message to recommend it! We have also set up a tutorial exchange group: scan the QR code and add the note [SD Tutorial] to join, discuss technical issues, and share your results.

Citation Information

Thanks to GitHub user zhangjunchang for deploying this tutorial. The project's reference information is as follows:

@article{liu2025step1x-edit,
      title={Step1X-Edit: A Practical Framework for General Image Editing}, 
      author={Shiyu Liu and Yucheng Han and Peng Xing and Fukun Yin and Rui Wang and Wei Cheng and Jiaqi Liao and Yingming Wang and Honghao Fu and Chunrui Han and Guopeng Li and Yuang Peng and Quan Sun and Jingwei Wu and Yan Cai and Zheng Ge and Ranchen Ming and Lei Xia and Xianfang Zeng and Yibo Zhu and Binxing Jiao and Xiangyu Zhang and Gang Yu and Daxin Jiang},
      journal={arXiv preprint arXiv:2504.17761},
      year={2025}
}

This notebook is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Related Notebooks

HunyuanOCR: Tencent Hunyuan End-to-End OCR

2 months ago

VibeVoice-Realtime TTS: Real-time Speech Synthesis Service

2 months ago

FLUX.2-dev: Image Generation and Editing Model

2 months ago

F5-E2 TTS Clones Any Sound in Just 3 Seconds

2 months ago

LongCat-Image-Edit-Interface: A Bilingual Text-Driven Image Editing System

2 months ago

Deploying Qwen-Image-Edit Using vLLM-Omni

6 days ago

SAM3: Visual Segmentation Model

2 months ago

DiagGym Diagnostic Agent

16 days ago

Krea-realtime-video: Real-time Video Generation Model

3 months ago
