CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models

Quang-Binh Nguyen, Minh Luu, Quang Nguyen, Anh Tran, Khoi Nguyen

Abstract

Disentangling content and style from a single image, known as content-style decomposition (CSD), enables recontextualization of extracted content and stylization of extracted styles, offering greater creative flexibility in visual synthesis. While recent personalization methods have explored explicit content-style decomposition, they remain tailored for diffusion models. Meanwhile, Visual Autoregressive Modeling (VAR) has emerged as a promising alternative with a next-scale prediction paradigm, achieving performance comparable to that of diffusion models. In this paper, we explore VAR as a generative framework for CSD, leveraging its scale-wise generation process for improved disentanglement. To this end, we propose CSD-VAR, a novel method that introduces three key innovations: (1) a scale-aware alternating optimization strategy that aligns content and style representations with their respective scales to enhance separation, (2) an SVD-based rectification method to mitigate content leakage into style representations, and (3) an Augmented Key-Value (K-V) memory that enhances content identity preservation. To benchmark this task, we introduce CSD-100, a dataset specifically designed for content-style decomposition, featuring diverse subjects rendered in various artistic styles. Experiments demonstrate that CSD-VAR outperforms prior approaches, achieving superior content preservation and stylization fidelity.
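
The abstract does not spell out how the SVD-based rectification works, so the following is only a generic sketch of the idea of removing a content subspace from a style vector, not the procedure used in CSD-VAR. It assumes content information is summarized by the top-k right singular directions of a matrix of content embeddings; the function name `remove_content_subspace`, the parameter `k`, and the random stand-in data are all hypothetical.

```python
import numpy as np

def remove_content_subspace(style_vec, content_mat, k=1):
    """Hypothetical helper: project style_vec onto the orthogonal
    complement of the top-k right singular directions of content_mat."""
    # Rows of vt are orthonormal directions in embedding space,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(content_mat, full_matrices=False)
    basis = vt[:k]                          # shape (k, d)
    # Component of the style vector that lies inside the content subspace.
    leaked = basis.T @ (basis @ style_vec)  # shape (d,)
    return style_vec - leaked

# Random stand-in data (an assumption, not real embeddings).
rng = np.random.default_rng(0)
content = rng.normal(size=(8, 16))   # 8 content embeddings of dimension 16
style = rng.normal(size=16)          # one style embedding
rectified = remove_content_subspace(style, content, k=2)
```

After the projection, `rectified` has no component along the two dominant content directions, which is one simple way to read "mitigating content leakage". The choice of `k` trades off how much content is removed against how much style information is kept.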