3 years ago

Neural Artistic Style Transfer Using Conditional Adversarial Networks

Pathirage N. Deelaka

Neural Style Transfer


Abstract

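Neural artistic style transfer (NST) models can change the appearance of a simple image by adding the style of a famous image. Although the transformed images do not exactly match works by the artist of the respective style image, the generated images are appealing. In general, a trained NST model specializes in one particular style, and a single image represents that style. Generating an image in a new style therefore requires a tedious process that includes full model training. This work proposes two methods that are a step toward a style-image-independent neural style transfer model; that is, the trained model can generate semantically accurate images for any input pair of content and style images. Our novel contribution is a unidirectional GAN model that guarantees cyclic consistency through the model architecture. This also greatly reduces the model size and enables efficient training and validation phases.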

One-sentence Summary

The authors propose a unidirectional generative adversarial network for style-independent neural style transfer that enforces cyclic consistency through its architecture to eliminate full model retraining, yielding a compact design with efficient training that generates semantically accurate images across arbitrary content and style pairs.

Key Contributions

  • The paper introduces a style-image-independent neural style transfer framework that generates semantically accurate outputs for arbitrary content and style pairs without requiring full model retraining.
  • A novel unidirectional-GAN architecture is proposed that enforces cyclic consistency directly through its structural design, addressing the input limitations of traditional paired and single-image translation models.
  • This architectural formulation yields a substantially smaller model size while streamlining both the training and validation phases compared to conventional style transfer approaches.

Introduction

Neural style transfer enables computers to apply the artistic characteristics of one image to another, a capability that has become highly valuable for digital art creation and automated content generation. However, traditional models are typically locked to a single reference style, forcing users to retrain the entire network whenever a new aesthetic is desired. Prior GAN-based approaches also struggle with unpaired data requirements, restrictive single-input architectures, and the difficulty of cleanly separating content from style features. To overcome these bottlenecks, the authors develop a style-independent transfer framework using a unidirectional GAN that embeds cyclic consistency directly into the network structure. This architectural innovation removes the need for per-style retraining, substantially shrinks the model footprint, and delivers faster, more efficient training and validation cycles.

Method

The authors propose two distinct approaches to neural style transfer (NST) using generative adversarial networks (GANs), each with a unique architecture and training paradigm designed to address limitations of conventional CNN-based methods. The first approach, referred to as the rGAN model, employs a conditional GAN framework where a single generator is paired with two independent discriminators to separately assess the content and style of the generated image. The generator, based on a U-Net architecture with skip connections, takes both a content and a style image as inputs. It encodes features from the content image and a local-global fused representation from the style image into a latent space, which the decoder then uses to reconstruct a style-transferred image. The content discriminator, a PatchGAN, evaluates the realism of local image patches to prevent alias artifacts and preserve the original color palette of the content image. In contrast, the style discriminator is implemented as a wavelet convolutional neural network, designed to capture global and local features across multiple resolutions for effective style extraction. The overall objective function for the rGAN combines the adversarial losses from both discriminators with an L1 reconstruction loss to ensure perceptual clarity.
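The combined rGAN objective described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the binary cross-entropy form of the adversarial terms, the weight `lambda_l1`, the choice of `reference` image for the L1 term, and all function names are hypothetical, since the summary does not specify them.

```python
import math


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for a single probability (clipped for stability)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def l1_loss(generated, reference):
    """Mean absolute difference between two flattened images."""
    return mean(abs(g - r) for g, r in zip(generated, reference))


def generator_objective(content_patch_probs, style_prob, generated, reference,
                        lambda_l1=10.0):
    """Generator loss = content adversarial + style adversarial + weighted L1.

    content_patch_probs: per-patch "real" probabilities from a PatchGAN-style
        content discriminator (assumed output format).
    style_prob: a single "real" probability from the style discriminator
        (assumed output format).
    lambda_l1: hypothetical weight on the reconstruction term.
    """
    # The generator wants both discriminators to output "real" (target 1).
    adv_content = mean(bce(p, 1.0) for p in content_patch_probs)
    adv_style = bce(style_prob, 1.0)
    recon = l1_loss(generated, reference)
    return adv_content + adv_style + lambda_l1 * recon
```

The loss falls toward zero when both discriminators are fooled and the output matches the reference, and it grows as either discriminator rejects the image or the pixels drift from the reference.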

The second approach introduces a significant architectural shift by eliminating the separate discriminator models. Instead, the generator is composed of three distinct parameter spaces: a content encoder, a style encoder, and a decoder. The content and style encoders extract latent feature vectors from their respective input images, and these features are fed into the decoder to generate the final style-transferred image. The key innovation lies in the training process: the same content and style encoder models also serve as discriminators. During training, the encoders are optimized to minimize a loss function that encourages them to produce discriminative features for their respective inputs, which makes them part of the adversarial process. The content encoder is trained using a pairwise marginal loss function to ensure semantic consistency, while the style encoder is trained under a metric learning objective to cluster embeddings of similar styles together. The generator is then trained to minimize the adversarial loss using these pre-trained encoders, so that the generated image reflects the style while preserving the content's semantics. This shared parameter space reduces the overall model complexity and improves training stability compared to the first method.
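To make the encoder objectives more concrete, here is a minimal sketch of a pairwise margin (contrastive-style) loss over embeddings, written in plain Python. It assumes the common contrastive form, in which same-label pairs are pulled together and different-label pairs are pushed at least `margin` apart; the summary does not give the authors' exact formula, so the loss form, the `margin` value, and the function names are all hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_margin_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive-style loss for one pair of embeddings (assumed form).

    same=True  -> penalize any distance between the two embeddings.
    same=False -> penalize only if the pair is closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


def batch_loss(embeddings, labels, margin=1.0):
    """Average the pairwise loss over all unordered pairs in a batch."""
    total, count = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            same = labels[i] == labels[j]
            total += pairwise_margin_loss(embeddings[i], embeddings[j],
                                          same, margin)
            count += 1
    return total / count if count else 0.0
```

With style labels, minimizing `batch_loss` clusters embeddings of the same style, which is the behavior the text attributes to the style encoder. With content labels, it encourages semantically matching images to share nearby embeddings.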

Experiment

Two GAN-based neural style transfer approaches were evaluated using separate style and content discriminators trained on distinct datasets to validate training stability and artistic fidelity. The first approach demonstrated consistent convergence and successfully transferred stylistic textures while preserving the original content colors without introducing visual artifacts. The second approach further validated the benefits of dynamic batch sampling and matrix-based loss calculation by significantly reducing overfitting and mode collapse while producing images that authentically emulate an artist's unique style rather than merely copying reference features. Overall, both methods outperform traditional CNN-based style transfer by delivering higher stylistic coherence and cleaner visual outputs.

The authors compare their two proposed approaches with existing methods in a table, highlighting that both approaches support training without paired samples, preserve the original image color, and avoid introducing alias artifacts in generated images. The results show that the proposed methods maintain a high degree of style transfer fidelity with minimal extraneous features or artifacts, particularly in terms of color palette and texture integration.

The evaluation compares the two proposed methods against existing approaches to assess their effectiveness in unpaired style transfer. The experiments indicate that both techniques preserve original image colors and avoid aliasing artifacts while achieving high stylistic fidelity. Qualitatively, the methods demonstrate robust texture integration and maintain the source color palette without introducing extraneous visual features, which supports the authors' claim of higher-quality style transfers.

