OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models

Abstract

Recent advances in video insertion based on diffusion models are impressive. However, existing methods rely on complex control signals but struggle with subject consistency, limiting their practical applicability. In this paper, we focus on the task of Mask-free Video Insertion and aim to resolve three key challenges: data scarcity, subject-scene equilibrium, and insertion harmonization. To address the data scarcity, we propose a new data pipeline, InsertPipe, constructing diverse cross-pair data automatically. Building upon our data pipeline, we develop OmniInsert, a novel unified framework for mask-free video insertion from both single and multiple subject references. Specifically, to maintain subject-scene equilibrium, we introduce a simple yet effective Condition-Specific Feature Injection mechanism to distinctly inject multi-source conditions and propose a novel Progressive Training strategy that enables the model to balance feature injection from subjects and source video. Meanwhile, we design the Subject-Focused Loss to improve the detailed appearance of the subjects. To further enhance insertion harmonization, we propose an Insertive Preference Optimization methodology to optimize the model by simulating human preferences, and incorporate a Context-Aware Rephraser module during reference to seamlessly integrate the subject into the original scenes. To address the lack of a benchmark for the field, we introduce InsertBench, a comprehensive benchmark comprising diverse scenes with meticulously selected subjects. Evaluation on InsertBench indicates OmniInsert outperforms state-of-the-art closed-source commercial solutions. The code will be released.
