a month ago

MATRIX: Mask Track Alignment for Interaction-aware Video Generation

Siyoon Jin Seongchan Kim Dahyun Chung Jaeho Lee Hyunwook Choi Jisu Nam Jiyoung Kim Seungryong Kim

Abstract

Video DiTs have advanced video generation, yet they still struggle to model multi-instance or subject-object interactions. This raises a key question: How do these models internally represent interactions? To answer this, we curate MATRIX-11K, a video dataset with interaction-aware captions and multi-instance mask tracks. Using this dataset, we conduct a systematic analysis that formalizes two perspectives of video DiTs: semantic grounding, via video-to-text attention, which evaluates whether noun and verb tokens capture instances and their relations; and semantic propagation, via video-to-video attention, which assesses whether instance bindings persist across frames. We find both effects concentrate in a small subset of interaction-dominant layers. Motivated by this, we introduce MATRIX, a simple and effective regularization that aligns attention in specific layers of video DiTs with multi-instance mask tracks from the MATRIX-11K dataset, enhancing both grounding and propagation. We further propose InterGenEval, an evaluation protocol for interaction-aware video generation. In experiments, MATRIX improves both interaction fidelity and semantic alignment while reducing drift and hallucination. Extensive ablations validate our design choices. Codes and weights will be released.
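The abstract does not state the form of the alignment objective, so the following is only an illustrative sketch of what "aligning attention with a mask track" could look like, not the authors' loss. It assumes a single text token whose attention has been normalized over four video patches, and a hypothetical penalty, `mask_alignment_loss`, equal to the negative log of the attention mass that falls inside the instance mask. All names and values here are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mask_alignment_loss(attn, mask, eps=1e-8):
    """Hypothetical penalty: negative log of the attention mass inside the mask.

    attn: attention weights over video patches (assumed to sum to 1).
    mask: binary instance mask over the same patches (1 = inside instance).
    """
    inside = sum(a for a, m in zip(attn, mask) if m)
    return -math.log(inside + eps)


# Toy setup (assumed): one noun token attending over 4 video patches,
# with the instance occupying the first two patches.
mask = [1, 1, 0, 0]
aligned = softmax([4.0, 4.0, 0.0, 0.0])      # attention mostly on the instance
misaligned = softmax([0.0, 0.0, 4.0, 4.0])   # attention mostly off the instance

loss_aligned = mask_alignment_loss(aligned, mask)
loss_misaligned = mask_alignment_loss(misaligned, mask)

# The penalty is smaller when attention concentrates inside the mask.
print(loss_aligned < loss_misaligned)
```

In this toy setting the penalty shrinks as attention concentrates on the masked patches, which is the qualitative behavior a mask-alignment regularizer would encourage. How MATRIX selects layers, weights the term, or handles multiple instances and frames is not covered by this sketch.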
