3 months ago

Guozhen Zhang Zixiang Zhou Teng Hu Ziqiao Peng Youliang Zhang Yi Chen Yuan Zhou Qinglin Lu Limin Wang

Abstract

Due to the lack of effective cross-modal modeling, existing open-source audio-video generation methods often exhibit compromised lip synchronization and insufficient semantic consistency. To mitigate these drawbacks, we propose UniAVGen, a unified framework for joint audio and video generation. UniAVGen is anchored in a dual-branch joint synthesis architecture, incorporating two parallel Diffusion Transformers (DiTs) to build a cohesive cross-modal latent space. At its heart lies an Asymmetric Cross-Modal Interaction mechanism, which enables bidirectional, temporally aligned cross-attention, thus ensuring precise spatiotemporal synchronization and semantic consistency. Furthermore, this cross-modal interaction is augmented by a Face-Aware Modulation module, which dynamically prioritizes salient regions in the interaction process. To enhance generative fidelity during inference, we additionally introduce Modality-Aware Classifier-Free Guidance, a novel strategy that explicitly amplifies cross-modal correlation signals. Notably, UniAVGen's robust joint synthesis design enables seamless unification of pivotal audio-video tasks within a single model, such as joint audio-video generation and continuation, video-to-audio dubbing, and audio-driven video synthesis. Comprehensive experiments validate that, with far fewer training samples (1.3M vs. 30.1M), UniAVGen delivers overall advantages in audio-video synchronization, timbre consistency, and emotion consistency.

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

3 months ago

Guozhen Zhang Zixiang Zhou Teng Hu Ziqiao Peng Youliang Zhang Yi Chen Yuan Zhou Qinglin Lu Limin Wang

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

3 months ago

Guozhen Zhang Zixiang Zhou Teng Hu Ziqiao Peng Youliang Zhang Yi Chen Yuan Zhou Qinglin Lu Limin Wang

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions | Papers | HyperAI

Command Palette

UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions

Guozhen Zhang Zixiang Zhou Teng Hu Ziqiao Peng Youliang Zhang Yi Chen Yuan Zhou Qinglin Lu Limin Wang

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions

Guozhen Zhang Zixiang Zhou Teng Hu Ziqiao Peng Youliang Zhang Yi Chen Yuan Zhou Qinglin Lu Limin Wang

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions

Guozhen Zhang Zixiang Zhou Teng Hu Ziqiao Peng Youliang Zhang Yi Chen Yuan Zhou Qinglin Lu Limin Wang

Abstract

Build AI with AI

HyperAI Newsletters