WithAnyone: Towards Controllable and ID Consistent Image Generation

Abstract
Identity-consistent generation has become an important focus in text-to-image research, with recent models achieving notable success in producing images aligned with a reference identity. Yet, the scarcity of large-scale paired datasets containing multiple images of the same individual forces most approaches to adopt reconstruction-based training. This reliance often leads to a failure mode we term copy-paste, where the model directly replicates the reference face rather than preserving identity across natural variations in pose, expression, or lighting. Such over-similarity undermines controllability and limits the expressive power of generation. To address these limitations, we (1) construct a large-scale paired dataset MultiID-2M, tailored for multi-person scenarios, providing diverse references for each identity; (2) introduce a benchmark that quantifies both copy-paste artifacts and the trade-off between identity fidelity and variation; and (3) propose a novel training paradigm with a contrastive identity loss that leverages paired data to balance fidelity with diversity. These contributions culminate in WithAnyone, a diffusion-based model that effectively mitigates copy-paste while preserving high identity similarity. Extensive qualitative and quantitative experiments demonstrate that WithAnyone significantly reduces copy-paste artifacts, improves controllability over pose and expression, and maintains strong perceptual quality. User studies further validate that our method achieves high identity fidelity while enabling expressive controllable generation.
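To make the idea of a contrastive identity loss concrete, the sketch below shows one common formulation: an InfoNCE-style objective over face embeddings, where each generated image's embedding is pulled toward the paired reference of the same identity and pushed away from references of other identities in the batch. This is an illustrative assumption about how such a loss can be built, not the exact loss used by WithAnyone; the function name, batching scheme, and temperature value are hypothetical.

```python
import numpy as np

def contrastive_identity_loss(gen_emb, ref_emb, temperature=0.07):
    """Illustrative InfoNCE-style identity loss (not the paper's exact loss).

    gen_emb: (B, D) face embeddings of generated images.
    ref_emb: (B, D) face embeddings of paired references; row i of
             ref_emb is the positive for row i of gen_emb, all other
             rows act as in-batch negatives.
    """
    # L2-normalize so dot products are cosine similarities.
    g = gen_emb / np.linalg.norm(gen_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    logits = (g @ r.T) / temperature          # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    # Log-softmax over each row; positives sit on the diagonal.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Because the positives are only "same identity", not "same pixels", a loss of this shape rewards identity agreement without forcing the generated face to replicate the reference image, which is the balance between fidelity and variation the abstract describes.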