ComfyUI HunyuanCustom Video Generation Workflow Tutorial

1. Tutorial Introduction

This tutorial runs on a single RTX 4090, and generating a video takes about 10 minutes. For better generation quality, a GPU with 80GB of memory is recommended.

HunyuanCustom is a multimodal custom video generation framework released by the Tencent Hunyuan team on May 9, 2025. Built on the HunyuanVideo generation framework, it is a conditionally controllable generation model centered on subject consistency. It generates subject-consistent videos conditioned on text, image, audio, and video inputs. These multimodal capabilities support numerous downstream tasks. For example, given multiple pictures as input, HunyuanCustom can be used for virtual human advertising and virtual makeup trials. The accompanying paper is "HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation".

This workflow uses the following four model files:

  • hunyuan_video_custom_720p_fp8_scaled.safetensors
  • llava_llama3_fp16.safetensors
  • hunyuan_video_vae_bf16.safetensors
  • clip_l.safetensors
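
Before loading the workflow, it can help to confirm that all four files are in place. The following is a minimal sketch, not part of the original tutorial. The folder names (`diffusion_models`, `text_encoders`, `vae`) follow the common ComfyUI model layout and are an assumption here, as is the `find_missing_models` helper itself; adjust the mapping if your container stores the files elsewhere.

```python
from pathlib import Path

# Assumed layout: these folder names follow the usual ComfyUI convention.
# The tutorial itself does not say where the files are stored.
EXPECTED_FILES = {
    "diffusion_models": ["hunyuan_video_custom_720p_fp8_scaled.safetensors"],
    "text_encoders": ["llava_llama3_fp16.safetensors", "clip_l.safetensors"],
    "vae": ["hunyuan_video_vae_bf16.safetensors"],
}


def find_missing_models(models_root, expected=EXPECTED_FILES):
    """Return relative paths of expected model files that are not present."""
    root = Path(models_root)
    missing = []
    for folder, names in expected.items():
        for name in names:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing
```

Calling `find_missing_models("ComfyUI/models")` (a hypothetical path) returns an empty list when every file is found.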

2. Project Examples

Multimodal video customization

Various applications

3. Operation Steps

1. After starting the container, click the API address to enter the Web interface.

If "Bad Gateway" is displayed, the model is still initializing. Because the model is large, wait about 2-3 minutes and then refresh the page.
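
If you prefer not to refresh by hand, the wait can be scripted. The sketch below polls the API address until it stops returning an error such as 502 Bad Gateway. It is an illustration only: the function names, the 15-second interval, and the 20-attempt limit (about five minutes in total) are assumptions, and the URL you pass in is whatever API address your container shows.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # for example 502 while the model is still loading
    except (urllib.error.URLError, OSError):
        return None


def wait_until_ready(url, fetch=http_status, attempts=20, delay=15.0,
                     sleep=time.sleep):
    """Poll url until it answers 200. Return True if it did, else False."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # give the model more time before the next try
    return False
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without a live server.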

2. Functional Demonstration 

How to use

  1. The first time you clone the tutorial, import the workflow file manually to load it.
  2. Image-to-video generation
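
Before importing a workflow file, you may want to check which model files it refers to. The sketch below assumes the workflow is a JSON file, which is how ComfyUI saves workflows, and simply collects every string value ending in `.safetensors`. The helper name `referenced_model_files` is hypothetical, and the sketch makes no assumption about the exact node structure of the file.

```python
import json


def referenced_model_files(workflow):
    """Collect every string ending in .safetensors from parsed workflow JSON."""
    found = set()
    stack = [workflow]
    while stack:
        item = stack.pop()
        if isinstance(item, dict):
            stack.extend(item.values())
        elif isinstance(item, list):
            stack.extend(item)
        elif isinstance(item, str) and item.endswith(".safetensors"):
            found.add(item)
    return sorted(found)


def referenced_model_files_in(path):
    """Load a workflow JSON file from path and list the model files it names."""
    with open(path, encoding="utf-8") as fh:
        return referenced_model_files(json.load(fh))
```

Comparing the result with the four model files listed in the introduction shows whether the workflow expects anything that is not installed.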

Select image

Input Prompt 

Result Output 

4. Discussion

🖌️ If you come across a high-quality project, leave us a message to recommend it! We have also set up a tutorial discussion group. Scan the QR code and add the note [SD Tutorial] to join the group, discuss technical issues, and share your results.

Citation Information

The citation information for this project is as follows:

@misc{hu2025hunyuancustom,
      title={HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation}, 
      author={Teng Hu and Zhentao Yu and Zhengguang Zhou and Sen Liang and Yuan Zhou and Qin Lin and Qinglin Lu},
      year={2025},
      eprint={2505.04512},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.04512}, 
}
