3 years ago

Neural Artistic Style Transfer with Conditional Adversarial Networks

Pathirage N. Deelaka

Neural Style Transfer

Abstract

A neural artistic style transformation (NST) model can modify the appearance of a simple image by adding the style of a famous image. Even though the transformed images do not look precisely like artworks by the artist of the respective style image, the generated images are appealing. Generally, a trained NST model specialises in one style, and a single image represents that style. Generating an image under a new style is therefore a tedious process that requires full model training. In this paper, we present two methods that step toward a style-image-independent neural style transfer model. In other words, the trained model could generate a semantically accurate image for any input pair of content and style images. Our novel contribution is a unidirectional-GAN model that ensures cyclic consistency through the model architecture. Furthermore, this leads to a much smaller model size and an efficient training and validation phase.

One-sentence Summary

The authors propose a unidirectional generative adversarial network for style-independent neural style transfer that enforces cyclic consistency through its architecture to eliminate full model retraining, yielding a compact design with efficient training that generates semantically accurate images across arbitrary content and style pairs.

Key Contributions

  • The paper introduces a style-image-independent neural style transfer framework that generates semantically accurate outputs for arbitrary content and style pairs without requiring full model retraining.
  • A novel unidirectional-GAN architecture is proposed that enforces cyclic consistency directly through its structural design, addressing the input limitations of traditional paired and single-image translation models.
  • This architectural formulation yields a substantially smaller model size while streamlining both the training and validation phases compared to conventional style transfer approaches.

Introduction

Neural style transfer enables computers to apply the artistic characteristics of one image to another, a capability that has become highly valuable for digital art creation and automated content generation. However, traditional models are typically locked to a single reference style, forcing users to retrain the entire network whenever a new aesthetic is desired. Prior GAN-based approaches also struggle with unpaired data requirements, restrictive single-input architectures, and the difficulty of cleanly separating content from style features. To overcome these bottlenecks, the authors develop a style-independent transfer framework using a unidirectional GAN that embeds cyclic consistency directly into the network structure. This architectural innovation removes the need for per-style retraining, substantially shrinks the model footprint, and delivers faster, more efficient training and validation cycles.

Method

The authors propose two distinct approaches to neural style transfer (NST) using generative adversarial networks (GANs), each with a unique architecture and training paradigm designed to address limitations of conventional CNN-based methods. The first approach, referred to as the rGAN model, employs a conditional GAN framework where a single generator is paired with two independent discriminators to separately assess the content and style of the generated image. The generator, based on a U-Net architecture with skip connections, takes both a content and a style image as inputs. It encodes features from the content image and a local-global fused representation from the style image into a latent space, which the decoder then uses to reconstruct a style-transferred image. The content discriminator, a PatchGAN, evaluates the realism of local image patches to prevent alias artifacts and preserve the original color palette of the content image. In contrast, the style discriminator is implemented as a wavelet convolutional neural network, designed to capture global and local features across multiple resolutions for effective style extraction. The overall objective function for the rGAN combines the adversarial losses from both discriminators with an L1 reconstruction loss to ensure perceptual clarity.
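The way the rGAN objective combines its three terms can be illustrated with a minimal sketch. This is not the authors' implementation: the binary cross-entropy form of the adversarial terms, the weight `lam`, the flat-list image representation, and the choice of `reference` image for the L1 term are all assumptions made for illustration, since the summary above does not specify them.

```python
import math

def bce(scores, target):
    """Mean binary cross-entropy of probabilities against a constant target label."""
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for p in scores:
        p = min(max(p, eps), 1.0 - eps)
        total += -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return total / len(scores)

def l1(a, b):
    """Mean absolute difference between two equally sized flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def generator_objective(content_patch_scores, style_scores,
                        generated, reference, lam=100.0):
    """Hypothetical generator loss: two adversarial terms plus weighted L1.

    content_patch_scores: per-patch probabilities from the content
        discriminator (PatchGAN-style, one score per local patch).
    style_scores: probabilities from the style discriminator.
    generated, reference: flat pixel lists; which image the L1 term
        compares against is an assumption of this sketch.
    lam: assumed weight of the L1 term.
    """
    adv_content = bce(content_patch_scores, 1.0)  # generator wants "real" on content
    adv_style = bce(style_scores, 1.0)            # generator wants "real" on style
    recon = l1(generated, reference)
    return adv_content + adv_style + lam * recon

# Example call with made-up discriminator outputs and pixel values
loss = generator_objective([0.7, 0.6, 0.8, 0.9], [0.5], [0.2, 0.4], [0.25, 0.35])
```

Averaging the content term over patch scores mirrors how a PatchGAN judges local regions rather than the whole image, while the style term is kept as a separate sum so that either discriminator can be reweighted independently.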

The second approach introduces a significant architectural shift by eliminating the separate discriminator models. Instead, the generator is composed of three distinct parameter spaces: a content encoder, a style encoder, and a decoder. The content and style encoders extract latent feature vectors from their respective input images. These features are then fed into the decoder to generate the final style-transferred image. The key innovation lies in the training process: the same content and style encoder models are used as discriminators. During training, the encoders are optimized to minimize a loss function that encourages them to produce discriminative features for their respective inputs, effectively training them as part of the adversarial process. The content encoder is trained using a pairwise marginal loss function to ensure semantic consistency, while the style encoder is trained under a metric learning objective to cluster embeddings of similar styles together. The generator is then trained to minimize the adversarial loss using these pre-trained encoders, ensuring that the generated image accurately reflects the style while preserving the content's semantics. This shared parameter space approach reduces the overall model complexity and improves training stability compared to the first method.
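The encoder objectives described above can be sketched as follows. The exact loss formulas are not given in this summary, so the sketch substitutes common stand-ins: a contrastive-style pairwise margin loss for the content encoder and a triplet loss for the style encoder's metric learning objective. The function names, margin values, and the embedding-distance form of the generator loss are all hypothetical.

```python
import math

def distance(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise_margin_loss(u, v, same, margin=1.0):
    """Assumed content-encoder loss: pull matching pairs together,
    push non-matching pairs at least `margin` apart."""
    d = distance(u, v)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Assumed style-encoder loss: an anchor style embedding should sit
    closer to a same-style embedding than to a different-style one."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)

def generator_loss(gen_content_emb, content_emb, gen_style_emb, style_emb):
    """Assumed generator loss: the encoders act as critics, so the generated
    image should embed close to the content image under the content encoder
    and close to the style image under the style encoder."""
    return distance(gen_content_emb, content_emb) + distance(gen_style_emb, style_emb)

# Example call with made-up two-dimensional embeddings
loss = generator_loss([0.1, 0.9], [0.0, 1.0], [0.4, 0.4], [0.5, 0.5])
```

Because the same encoders both feed the decoder and score its output, no separate discriminator weights are needed, which is consistent with the smaller model size the summary attributes to this approach.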

Experiment

Two GAN-based neural style transfer approaches were evaluated using separate style and content discriminators trained on distinct datasets to validate training stability and artistic fidelity. The first approach demonstrated consistent convergence and successfully transferred stylistic textures while preserving the original content colors without introducing visual artifacts. The second approach further validated the benefits of dynamic batch sampling and matrix-based loss calculation by significantly reducing overfitting and mode collapse while producing images that authentically emulate an artist's unique style rather than merely copying reference features. Overall, both methods outperform traditional CNN-based style transfer by delivering higher stylistic coherence and cleaner visual outputs.

The authors compare their two proposed approaches with existing methods in a table, highlighting that both support training without paired samples, preserve the original image color, and avoid introducing alias artifacts in generated images. The results indicate that the generated images maintain style fidelity with few extrinsic features or artifacts, particularly in color palette and texture integration.

The evaluation compares two proposed methods against existing approaches to assess their effectiveness in unpaired style transfer. The experiments validate that both techniques successfully preserve original image colors and eliminate aliasing artifacts while achieving high stylistic fidelity. Qualitatively, the methods demonstrate robust texture integration and maintain the source color palette without introducing extraneous visual features, confirming their overall superiority in generating high-quality style transfers.

