Alchemist: Turning Public Text-to-Image Data into Generative Gold

Startsev, Valerii ; Ustyuzhanin, Alexander ; Kirillov, Alexey ; Baranchuk, Dmitry ; Kastryulin, Sergey
Published: May 27, 2025

Abstract

Pre-training equips text-to-image (T2I) models with broad world knowledge, but this alone is often insufficient to achieve high aesthetic quality and alignment. Consequently, supervised fine-tuning (SFT) is crucial for further refinement. However, its effectiveness depends heavily on the quality of the fine-tuning dataset. Existing public SFT datasets frequently target narrow domains (e.g., anime or specific art styles), and the creation of high-quality, general-purpose SFT datasets remains a significant challenge. Current curation methods are often costly and struggle to identify truly impactful samples. This challenge is further complicated by the scarcity of public general-purpose datasets, as leading models often rely on large, proprietary, and poorly documented internal data, hindering broader research progress. This paper introduces a novel methodology for creating general-purpose SFT datasets by leveraging a pre-trained generative model as an estimator of high-impact training samples. We apply this methodology to construct and release Alchemist, a compact (3,350 samples) yet highly effective SFT dataset. Experiments demonstrate that Alchemist substantially improves the generative quality of five public T2I models while preserving diversity and style. Additionally, we release the fine-tuned models' weights to the public.
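The core selection idea — score candidate samples with a pre-trained model and keep only the highest-impact few — can be sketched in a few lines. This is an illustrative sketch only, not the paper's released code: `estimate_impact` is a hypothetical placeholder standing in for the paper's model-based quality estimator.

```python
import heapq
import random

def estimate_impact(sample):
    # Placeholder scorer: the paper uses a pre-trained generative model
    # to estimate a sample's training impact; a random stub is used here.
    return random.random()

def select_sft_subset(candidates, k=3350):
    """Keep the k candidates with the highest estimated training impact."""
    scored = ((estimate_impact(s), s) for s in candidates)
    top = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [s for _, s in top]

# Toy usage: curate a 3,350-sample SFT subset from a larger candidate pool.
pool = [f"sample_{i}" for i in range(10_000)]
subset = select_sft_subset(pool, k=3350)
print(len(subset))  # 3350
```

In practice the scorer, not the top-k selection, is the hard part; the paper's contribution is precisely how to use a pre-trained generative model as that estimator.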