2 months ago

Jiangning Zhang, Junwei Zhu, Zhenye Gan, Donghao Luo, Chuming Lin, Feifan Xu, Xu Peng, Jianlong Hu, Yuansen Liu, Yijia Hong, and 7 more

Abstract

We propose Soul, a multimodal-driven framework for high-fidelity long-term digital human animation. Soul generates semantically coherent videos from a single-frame portrait image, text prompts, and audio, achieving precise lip synchronization, vivid facial expressions, and robust identity preservation. To mitigate data scarcity, we construct Soul-1M, a dataset of 1 million finely annotated samples produced by a precise automated annotation pipeline and covering portrait, upper-body, full-body, and multi-person scenes. We also carefully curate Soul-Bench for comprehensive and fair evaluation of audio-/text-guided animation methods. The model is built on the Wan2.2-5B backbone and integrates audio-injection layers and multiple training strategies, together with threshold-aware codebook replacement, to ensure long-term generation consistency. Step/CFG distillation and a lightweight VAE are used to optimize inference efficiency, achieving an 11.4× speedup with negligible quality loss. Extensive experiments show that Soul significantly outperforms current leading open-source and commercial models on video quality, video-text alignment, identity preservation, and lip-synchronization accuracy, demonstrating broad applicability in real-world scenarios such as virtual anchors and film production.
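The abstract does not describe how step/CFG distillation is carried out, so the following is only a generic sketch of why such distillation can reduce inference cost. It shows the standard classifier-free guidance (CFG) combination, which needs two denoiser passes per sampling step, and counts passes for a guided many-step sampler versus a distilled few-step sampler that needs a single pass per step. The function names and the step counts are hypothetical and are not taken from the paper.

```python
def cfg_combine(cond, uncond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def forward_passes(num_steps, use_cfg):
    """Count denoiser calls; CFG needs a conditional and an unconditional pass per step."""
    return num_steps * (2 if use_cfg else 1)


# Hypothetical step counts, for illustration only (not from the paper).
teacher_calls = forward_passes(num_steps=40, use_cfg=True)
student_calls = forward_passes(num_steps=4, use_cfg=False)

# Guided prediction for toy two-element outputs.
guided = cfg_combine(cond=[1.0, 2.0], uncond=[0.0, 1.0], scale=2.0)
```

In this toy setting, a CFG-distilled student would be trained to output something close to `guided` in a single pass, and step distillation would reduce `num_steps`. The ratio of denoiser calls here is illustrative and should not be read as the source of the reported 11.4× figure, which also involves a lightweight VAE.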

Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation
