
Machine Learning Project Report

Real-Time Arbitrary Style Transfer Using Adaptive Instance Normalization
Rupesh Kumar
Roll No: 221DS032
CDS 26
MACS, NITK
April 13, 2025

Abstract
This project implements real-time neural style transfer using Adaptive Instance
Normalization (AdaIN). The model performs artistic style transfer by aligning feature
statistics between content and style images. Unlike traditional methods that are slow
and require style-specific training, this approach enables fast and flexible stylization of
arbitrary images using a single feed-forward model. The implementation is based on
PyTorch and follows the architecture proposed by Huang and Belongie. This report
outlines the methodology, training setup, loss functions, and results of the model.

1 Introduction
Neural Style Transfer (NST) is a technique that recomposes the content of one image in
the visual appearance or "style" of another. Traditional approaches, such as the method
introduced by Gatys et al. (2016), use a pre-trained convolutional neural network (e.g.,
VGG-19) to iteratively optimize a randomly initialized image so that its deep features match
those of a given content and style image. While this method yields visually impressive results,
it is computationally intensive and cannot be used in real-time applications.
Subsequent approaches attempted to accelerate the process by training feed-forward net-
works to mimic the optimization. However, these networks were either limited to specific
styles or required multi-style training and large models. To overcome these constraints, Xun
Huang and Serge Belongie introduced the concept of Adaptive Instance Normalization
(AdaIN) in 2017. AdaIN enables arbitrary style transfer in real-time using a single feed-
forward network by aligning the channel-wise mean and variance of content features to match
those of the style features.
This report is based on a PyTorch implementation of the AdaIN method, as provided in
the open-source repository: https://github.com/RUPESH-KUMAR01/AdaIn_style_transfer.

It includes detailed discussion of the architecture, loss functions, training procedures, and
evaluation results.

2 Methodology
The AdaIN model consists of three major components: a pre-trained encoder, the AdaIN
feature transformation layer, and a trainable decoder. The overall architecture is inspired
by the one proposed in the original AdaIN paper, and the implementation closely follows
the provided PyTorch code.

2.1 Encoder
The encoder is a truncated version of the VGG-19 network, stopping at layer relu4_1. It
is used to extract high-level feature representations of both the content and style images.
In the code, this is implemented by loading the pre-trained VGG-19 model and freezing its
parameters to avoid updates during training.
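
As a rough illustration, a frozen encoder of this kind might be assembled with torchvision as below; the layer index and weight-loading details are assumptions, and the repository may construct it differently.

    import torch
    import torchvision.models as models

    # Sketch: VGG-19 features truncated at relu4_1 (index 20 in torchvision's
    # vgg19().features) with all parameters frozen.
    def build_encoder() -> torch.nn.Module:
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features
        encoder = torch.nn.Sequential(*list(vgg.children())[:21])  # up to relu4_1
        for p in encoder.parameters():
            p.requires_grad = False
        return encoder.eval()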

2.2 Adaptive Instance Normalization (AdaIN)


The core novelty of this approach lies in the AdaIN layer. Given encoded content features
f_c and style features f_s, AdaIN aligns their channel-wise statistics as follows:

\mathrm{AdaIN}(f_c, f_s) = \sigma(f_s) \cdot \frac{f_c - \mu(f_c)}{\sigma(f_c)} + \mu(f_s)
Here, µ(·) and σ(·) represent the channel-wise mean and standard deviation, respectively.
This transformation modifies the content features so that they have the same statistical
distribution as the style features, allowing the decoder to generate stylized outputs.
In the code, this is implemented using simple PyTorch operations over the feature maps.
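
A minimal sketch of such an operation is shown below; variable names are illustrative and not necessarily those used in the repository.

    import torch

    def adain(content_feat: torch.Tensor, style_feat: torch.Tensor,
              eps: float = 1e-5) -> torch.Tensor:
        """Align channel-wise mean/std of content features to those of the style.

        Both inputs are feature maps of shape (N, C, H, W).
        """
        c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
        c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
        s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
        s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
        return (content_feat - c_mean) / c_std * s_std + s_mean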

2.3 Decoder
The decoder network is a symmetric convolutional network designed to reconstruct an image
from AdaIN-transformed features. It includes upsampling layers (nearest-neighbor interpola-
tion) followed by convolutional blocks. The decoder is trained from scratch while the encoder
remains fixed.
The goal of the decoder is to produce an output image that reflects the style of the
reference image while preserving the structure of the content image. In the implementation,
the decoder is trained using a joint content and style loss (described in the next section).
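
An abbreviated sketch of a decoder in this spirit is given below; the decoder in the repository has more convolutional blocks per stage, so this indicates the overall shape rather than the exact architecture.

    import torch.nn as nn

    # Sketch: nearest-neighbor upsampling interleaved with reflection-padded
    # 3x3 convolutions, mapping 512-channel relu4_1 features back to a 3-channel image.
    decoder = nn.Sequential(
        nn.ReflectionPad2d(1), nn.Conv2d(512, 256, 3), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.ReflectionPad2d(1), nn.Conv2d(256, 128, 3), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.ReflectionPad2d(1), nn.Conv2d(128, 64, 3), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.ReflectionPad2d(1), nn.Conv2d(64, 3, 3),
    )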

3 Training
3.1 Datasets
The training procedure uses two distinct datasets:

• Content images: Typically sampled from MS-COCO, a large dataset of natural
images.

• Style images: Sourced from artistic datasets such as WikiArt, which contain paintings
in various artistic styles.

Each training batch samples one content and one style image randomly. The images are
resized and normalized before being passed through the encoder.
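
A plausible preprocessing pipeline for this step, using torchvision transforms, is sketched below; the resize and crop sizes are assumptions and may differ from the repository's settings.

    from torchvision import transforms

    # Sketch: resize the shorter side, take a random crop, and convert to a
    # tensor in [0, 1]; the same transform is applied to content and style images.
    train_transform = transforms.Compose([
        transforms.Resize(512),
        transforms.RandomCrop(256),
        transforms.ToTensor(),
    ])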

3.2 Optimization
Only the decoder is trained. The encoder (VGG-19) is kept frozen. The Adam optimizer is
used with:

• Learning rate: 1 × 10⁻⁴

• Batch size: 8

• Number of iterations: 200,000

The training objective is to minimize a weighted combination of content and style losses,
encouraging the decoder to generate outputs that resemble the style image in appearance
but retain the content structure.

3.3 Training Pipeline


The training pipeline proceeds as follows (a code sketch of one iteration follows the list):

1. Load and normalize one content and one style image.

2. Extract their features using the encoder.

3. Apply AdaIN to obtain target features.

4. Decode the target features to an output image.

5. Compute content and style losses.

6. Backpropagate the total loss and update the decoder.
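
The sketch below strings these steps together for a single iteration. It assumes the adain function from Section 2.2, the content_loss and style_loss helpers defined in Section 4, and a hypothetical encode_layers helper that returns features from several encoder layers; image loading and normalization (step 1) are assumed to happen in the data loader.

    def train_step(encoder, decoder, optimizer, content, style, lam=10.0):
        """One training iteration; content and style are image batches (N, 3, H, W)."""
        f_c = encoder(content)                     # step 2: encode content
        f_s = encoder(style)                       #         and style images
        t = adain(f_c, f_s)                        # step 3: AdaIN target features
        out = decoder(t)                           # step 4: decode a stylized image

        l_c = content_loss(encoder(out), t)        # step 5: losses (Section 4)
        l_s = style_loss(encode_layers(out), encode_layers(style))
        loss = l_c + lam * l_s

        optimizer.zero_grad()                      # step 6: update only the decoder
        loss.backward()
        optimizer.step()
        return loss.item()

With only the decoder's parameters registered, the optimizer from Section 3.2 would be created as torch.optim.Adam(decoder.parameters(), lr=1e-4).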

4 Loss Functions
The training objective consists of two parts: content loss and style loss. Both are computed
using features extracted from the encoder.

4.1 Content Loss
The content loss ensures that the output image maintains the structure and semantics of the
content image. It is computed as the Euclidean distance between the AdaIN-transformed
features and the features of the output image:

L_c = \lVert f(g(t)) - t \rVert_2^2
where f is the encoder, g is the decoder, and t is the AdaIN-transformed target feature.
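
In PyTorch this term can be computed as a mean-squared error between the two feature maps; the sketch below assumes the caller passes f(g(t)) and t, and uses the elementwise mean as a common stand-in for the squared Euclidean distance.

    import torch.nn.functional as F

    def content_loss(out_features, target):
        """MSE between encoder features of the decoded image, f(g(t)), and the AdaIN target t."""
        return F.mse_loss(out_features, target)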

4.2 Style Loss


The style loss encourages the output image to have the same channel-wise mean and standard
deviation as the style image across multiple layers of the encoder:
L_s = \sum_i \left( \lVert \mu(\phi_i(I_{out})) - \mu(\phi_i(I_{style})) \rVert_2^2 + \lVert \sigma(\phi_i(I_{out})) - \sigma(\phi_i(I_{style})) \rVert_2^2 \right)

Here, ϕ_i denotes the features extracted from the i-th layer of the encoder.
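
A sketch of this term is given below, assuming each argument is a list of feature maps ϕ_i(I_out) and ϕ_i(I_style) taken from the same encoder layers (e.g. relu1_1 through relu4_1); the helper names are illustrative rather than taken from the repository.

    import torch
    import torch.nn.functional as F

    def mean_std(feat: torch.Tensor, eps: float = 1e-5):
        """Channel-wise mean and standard deviation of a (N, C, H, W) feature map."""
        return feat.mean(dim=(2, 3)), feat.std(dim=(2, 3)) + eps

    def style_loss(out_layers, style_layers):
        """Sum of mean/std mismatches over corresponding encoder layers."""
        loss = 0.0
        for phi_out, phi_style in zip(out_layers, style_layers):
            mu_o, sd_o = mean_std(phi_out)
            mu_s, sd_s = mean_std(phi_style)
            loss = loss + F.mse_loss(mu_o, mu_s) + F.mse_loss(sd_o, sd_s)
        return loss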

4.3 Total Loss


The final loss used during training is:

L_{total} = L_c + \lambda L_s
The hyperparameter λ balances content and style. In practice, λ = 10 gives good results.

5 Results
5.1 Stylized Output
The model is capable of generating high-quality stylized images that retain the spatial struc-
ture of the content image while adopting the visual appearance (colors, textures, brush-
strokes) of the style image. This is achieved without training a separate model for each
style, demonstrating the flexibility of AdaIN-based style transfer.
Figure 1 shows a sample result, where the content image has been transformed with
the style of a reference painting. The model generalizes well to unseen styles and produces
results in real-time, making it suitable for interactive applications.

Figure 1: Example of stylized image output. The output preserves the structure of the
content image while adopting the artistic characteristics of the style image.

5.2 Loss Curve


The training process shows stable convergence. The loss graph (Figure 2) plots the combined
content and style loss over training iterations. A consistent downward trend is observed,
indicating effective learning by the decoder.
As the model learns to balance the content and style objectives, the stylized outputs
improve in perceptual quality. Notably, the model avoids overfitting to particular styles
due to the arbitrary nature of AdaIN and the wide variety of training pairs sampled during
training.

Figure 2: Training loss over time. The total loss combines content and style objectives.

6 Conclusion
This project successfully reimplements the AdaIN-based style transfer architecture for real-
time stylization using PyTorch. The approach demonstrates how statistical alignment of
feature maps using AdaIN allows arbitrary style transfer without requiring per-style retrain-
ing. The encoder is fixed, and the decoder is trained to reconstruct stylized outputs using a
content-style trade-off loss.
The results confirm that AdaIN produces perceptually convincing stylized images that
preserve the content structure while adopting artistic styles. Compared to earlier methods,
this model is lightweight and fast enough for real-time use.
Future improvements could include temporal consistency for video stylization, user-
guided controls for blending styles, or improving high-frequency texture detail in stylized
results.

References
• Huang, Xun, and Serge Belongie. "Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization." arXiv preprint arXiv:1703.06868 (2017).

• Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "Image Style Transfer Using Convolutional Neural Networks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

• GitHub repository: https://github.com/RUPESH-KUMAR01/AdaIn_style_transfer
