3 years ago

Neural Artistic Style Transfer with Conditional Adversarial Networks

Pathirage N. Deelaka

Neural Style Transfer

Abstract

A neural artistic style transfer (NST) model can change the appearance of an ordinary image by adding the style of a famous image. Although the transformed images do not precisely resemble works by the artist of the respective style images, the generated images are aesthetically appealing. Typically, a trained NST model specializes in one style, and that style is represented by a single image. Generating an image in a new style is therefore a tedious process that requires fully retraining the model. In this paper, we present two methods that advance toward a style-image-independent neural style transfer model. In other words, the trained model could generate semantically accurate images for any input pair consisting of a content image and a style image. Our novel contribution is a unidirectional GAN model that ensures cyclic consistency through its architecture. This design also leads to a significantly smaller model size and more efficient training and validation phases.

One-sentence Summary

The authors propose a unidirectional generative adversarial network for style-image-independent neural style transfer that enforces cyclic consistency through its architecture, generates semantically accurate images for arbitrary content and style pairs without full retraining for each new style, and yields a compact model with efficient training.

Key Contributions

  • The paper introduces a style-image-independent neural style transfer framework that generates semantically accurate outputs for arbitrary content and style pairs without requiring full model retraining.
  • A novel unidirectional-GAN architecture is proposed that enforces cyclic consistency directly through its structural design, addressing the input limitations of traditional paired and single-image translation models.
  • This architectural formulation yields a substantially smaller model size while streamlining both the training and validation phases compared to conventional style transfer approaches.

Introduction

Neural style transfer enables computers to apply the artistic characteristics of one image to another, a capability that has become highly valuable for digital art creation and automated content generation. However, traditional models are typically locked to a single reference style, forcing users to retrain the entire network whenever a new aesthetic is desired. Prior GAN-based approaches also struggle with unpaired data requirements, restrictive single-input architectures, and the difficulty of cleanly separating content from style features. To overcome these bottlenecks, the authors develop a style-independent transfer framework using a unidirectional GAN that embeds cyclic consistency directly into the network structure. This architectural innovation removes the need for per-style retraining, substantially shrinks the model footprint, and delivers faster, more efficient training and validation cycles.

Method

The authors propose two distinct approaches to neural style transfer (NST) using generative adversarial networks (GANs), each with its own architecture and training paradigm designed to address limitations of conventional CNN-based methods. The first approach, referred to as the rGAN model, uses a conditional GAN framework in which a single generator is paired with two independent discriminators that separately assess the content and the style of the generated image. The generator, a U-Net with skip connections, takes both a content image and a style image as inputs. It encodes features from the content image and a fused local-global representation from the style image into a latent space, which the decoder then uses to reconstruct a style-transferred image. The content discriminator, a PatchGAN, evaluates the realism of local image patches to prevent aliasing artifacts and preserve the original color palette of the content image. The style discriminator, in contrast, is a wavelet convolutional neural network designed to capture global and local features across multiple resolutions for effective style extraction. The overall rGAN objective combines the adversarial losses from both discriminators with an L1 reconstruction loss to ensure perceptual clarity.
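The combined rGAN generator objective can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes a standard binary cross-entropy adversarial loss on discriminator logits, an unweighted sum of the two adversarial terms, and a hypothetical weight `lambda_l1` on the L1 term (the value 100 is borrowed from common pix2pix practice, not from the paper). The summary does not say what the L1 term compares the output against, so `target` is left as a generic argument; all function names are hypothetical.

```python
import numpy as np

def bce_with_logits(logits, target):
    """Mean binary cross-entropy computed from raw logits in a numerically stable form."""
    logits = np.asarray(logits, dtype=float)
    # max(x, 0) - x * t + log(1 + exp(-|x|)) equals BCE(sigmoid(x), t) without overflow
    return float(np.mean(np.maximum(logits, 0.0) - logits * target
                         + np.log1p(np.exp(-np.abs(logits)))))

def rgan_generator_loss(content_logits, style_logits, generated, target, lambda_l1=100.0):
    """Hypothetical rGAN generator objective: two adversarial terms plus a weighted L1 term.

    content_logits: PatchGAN output map for the generated image (one logit per patch)
    style_logits:   style-discriminator output for the generated image
    generated, target: image arrays of identical shape; the choice of `target` is assumed
    """
    adv_content = bce_with_logits(content_logits, 1.0)  # generator wants every patch judged "real"
    adv_style = bce_with_logits(style_logits, 1.0)      # and the style critic to output "real"
    l1 = float(np.mean(np.abs(np.asarray(generated) - np.asarray(target))))  # pixel-wise L1
    return adv_content + adv_style + lambda_l1 * l1
```

Because the PatchGAN term is averaged over its output map, every local patch contributes equally to the content signal, while `lambda_l1` trades adversarial sharpness against fidelity to the reconstruction target.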

The second approach introduces a significant architectural shift by eliminating the separate discriminator models. Instead, the generator is composed of three distinct parameter spaces: a content encoder, a style encoder, and a decoder. The content and style encoders extract latent feature vectors from their respective input images, and these features are fed into the decoder to generate the final style-transferred image. The key innovation lies in the training process: the same content and style encoders also serve as the discriminators. During training, the encoders are optimized to produce discriminative features for their respective inputs, which effectively makes them part of the adversarial process. The content encoder is trained with a pairwise marginal loss to ensure semantic consistency, while the style encoder is trained with a metric-learning objective that clusters embeddings of similar styles together. The generator is then trained to minimize the adversarial loss computed with these pre-trained encoders, so that the generated image reflects the target style while preserving the semantics of the content. Sharing parameters in this way reduces overall model complexity and improves training stability compared to the first method.
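The two encoder objectives can be illustrated with standard metric-learning losses. This sketch assumes that the "pairwise marginal loss" behaves like a classic margin-based contrastive loss on embedding pairs and that the style encoder's metric-learning objective is a triplet loss; the summary gives neither formula, so both choices, the margin values, and the function names are assumptions rather than the paper's definitions.

```python
import numpy as np

def contrastive_loss(z1, z2, same, margin=1.0):
    """Assumed pairwise margin loss for the content encoder.

    z1, z2: (N, D) embedding batches; same: (N,) flags, True if the pair shares content.
    Matching pairs are pulled together; non-matching pairs are pushed at least `margin` apart.
    """
    same = np.asarray(same, dtype=bool)
    d = np.linalg.norm(np.asarray(z1) - np.asarray(z2), axis=1)  # Euclidean distance per pair
    pos = np.where(same, d ** 2, 0.0)                              # penalize any gap for matches
    neg = np.where(~same, np.maximum(margin - d, 0.0) ** 2, 0.0)   # penalize non-matches inside margin
    return float(np.mean(pos + neg))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Assumed metric-learning loss for the style encoder.

    The anchor must be closer to a same-style positive than to a different-style
    negative by at least `margin`.
    """
    d_pos = np.linalg.norm(np.asarray(anchor) - np.asarray(positive), axis=1)
    d_neg = np.linalg.norm(np.asarray(anchor) - np.asarray(negative), axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))
```

Both losses shape the embedding spaces only. In the described scheme, the generator would afterwards be trained against the frozen encoders, for example by making the style embedding of its output land near that of the style image.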

Experiment

Two GAN-based neural style transfer approaches were evaluated, each assessing content and style separately (with dedicated discriminators in the first approach and with the encoders acting as discriminators in the second) and trained on distinct datasets, to validate training stability and artistic fidelity. The first approach converged consistently and transferred stylistic textures while preserving the original content colors without introducing visual artifacts. The second approach further demonstrated the benefits of dynamic batch sampling and matrix-based loss calculation, substantially reducing overfitting and mode collapse while producing images that emulate an artist's distinctive style rather than merely copying features of the reference image. Overall, the authors report that both methods outperform traditional CNN-based style transfer, delivering higher stylistic coherence and cleaner visual outputs.
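The summary does not define "dynamic batch sampling" or "matrix-based loss calculation". One common reading, offered here only as an assumption, is to compute all pairwise embedding distances in a batch as a single matrix and to apply a margin loss to every positive and negative pair defined by style labels, rather than to a fixed set of pre-sampled pairs. The NumPy sketch below illustrates that idea; the function names, labels, and margin are hypothetical.

```python
import numpy as np

def pairwise_distance_matrix(z):
    """All pairwise Euclidean distances of an (N, D) batch, computed with matrix operations."""
    z = np.asarray(z, dtype=float)
    sq = np.sum(z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)  # ||a||^2 + ||b||^2 - 2 a.b
    return np.sqrt(np.maximum(d2, 0.0))               # clamp small negatives caused by rounding

def batch_margin_loss(z, labels, margin=1.0):
    """Assumed matrix-based margin loss over every ordered pair in a batch (requires N >= 2).

    Same-label pairs are pulled together; different-label pairs are pushed at least
    `margin` apart. Each sample's pairing with itself is ignored.
    """
    labels = np.asarray(labels)
    d = pairwise_distance_matrix(z)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    pos = same & off_diag
    neg = ~same
    loss = (np.where(pos, d ** 2, 0.0)
            + np.where(neg, np.maximum(margin - d, 0.0) ** 2, 0.0))
    return float(loss.sum() / off_diag.sum())  # average over N * (N - 1) ordered pairs
```

Because every pair in the batch contributes, resampling batches each step changes which positives and negatives the model sees, which is one plausible way such a scheme could reduce overfitting to a fixed set of pairs.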

The authors compare their two proposed approaches with existing methods in a table, highlighting that both approaches support training without paired samples, preserve the original image color, and avoid introducing aliasing artifacts. The results show that the proposed methods maintain high style-transfer fidelity with few extraneous features or artifacts, particularly in terms of color palette and texture integration.

The evaluation compares the two proposed methods against existing approaches to assess their effectiveness in unpaired style transfer. The experiments indicate that both techniques preserve the original image colors and avoid aliasing artifacts while achieving high stylistic fidelity. Qualitatively, the methods show robust texture integration and keep the source color palette without introducing extraneous visual features, which the authors present as evidence that their approaches generate higher-quality style transfers.

