Date

8 months ago

Size

48.17 MB

1. Tutorial Introduction

VIRES is a video instance repainting method guided by sketches and text. It was proposed in 2025 by the Camera Intelligence Laboratory of Peking University (Boxin Shi's team), together with OpenBayes Bayesian Computing and the team of Associate Professor Si Li from the Pattern Recognition Laboratory of the School of Artificial Intelligence at Beijing University of Posts and Telecommunications. It supports several editing operations on video subjects: repainting, replacement, generation, and removal. The method leverages the prior of a text-to-video model to maintain temporal consistency. It also proposes a Sequential ControlNet with a standardized self-scaling mechanism, which extracts structural layouts and adaptively captures high-contrast sketch details. In addition, the research team added a sketch attention mechanism to the DiT (diffusion transformer) backbone to interpret and inject fine-grained sketch semantics. Experimental results show that VIRES outperforms existing state-of-the-art methods in video quality, temporal consistency, condition alignment, and user ratings.

The related paper, "VIRES: Video Instance Repainting via Sketch and Text Guided Generation", was accepted to CVPR 2025.

This tutorial runs on a single A6000 GPU.

2. Project Examples

3. Operation steps

1. After starting the container, click the API address to open the web interface.

2. Once the page has loaded, you can start using the model.

If "Bad Gateway" is displayed, it means the model is initializing. Since the model is large, please wait about 2-3 minutes and refresh the page.
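Instead of refreshing by hand, the wait can be scripted. The following is a minimal sketch using only the Python standard library. It assumes the web interface answers with HTTP 200 once the model has loaded and with an error such as 502 before that; the function names, the polling interval, and the timeout are illustrative choices, not part of the tutorial.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 502 (Bad Gateway) while the model is still loading
    except (urllib.error.URLError, OSError):
        return None


def wait_until_ready(url, fetch=http_status, interval=15, max_wait=300, sleep=time.sleep):
    """Poll url until it answers with HTTP 200.

    Returns True once the page is ready, or False if max_wait seconds
    of sleeping have passed without a 200 response.
    """
    waited = 0
    while True:
        if fetch(url) == 200:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

To use it, pass the API address shown for your own container, for example `wait_until_ready(api_address)`, and open the page in the browser once it returns `True`.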

How to use

Parameter Description:

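CFG Guidance Scale: Classifier-free guidance (CFG) strength. Higher values make the result follow the conditioning more closely.
Number of Sampling Steps: Number of denoising steps used during sampling. More steps take longer to run.
Start Frame: The frame at which the edit starts.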
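To give an intuition for the CFG Guidance Scale, here is a minimal sketch of the standard classifier-free guidance formula, which blends an unconditional and a conditional model prediction. It is a generic illustration in plain Python, not code from VIRES; the function name `apply_cfg` and the list-based inputs are assumptions made for this example, and how VIRES applies guidance internally is not described in this tutorial.

```python
def apply_cfg(uncond, cond, scale):
    """Blend two predictions with classifier-free guidance.

    uncond: prediction made without the conditioning (list of floats)
    cond:   prediction made with the conditioning (list of floats)
    scale:  guidance scale; 0 keeps uncond, 1 keeps cond,
            values above 1 push further in the direction of cond
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]
```

With a scale of 1 the result equals the conditional prediction. Larger scales extrapolate past it, which generally strengthens adherence to the conditioning.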

Citation Information

@inproceedings{vires,
      title={VIRES: Video Instance Repainting via Sketch and Text Guided Generation},
      author={Weng, Shuchen and Zheng, Haojie and Zhang, Peixuan and Hong, Yuchen and Jiang, Han and Li, Si and Shi, Boxin},
      booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
      pages={28416--28425},
      year={2025}
}

This notebook is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.
