Adjustable Real-time Style Transfer

1. Introduction

Style transfer: Synthesizing an image with content similar to a given image and style similar to another.

There are two main problems in style transfer on the existing methods. (i) The first weak point is that they generate only one stylization for a given content/style pair. (ii) One other issue of them is their high sensitivity to the hyper-parameters.

1.2 Goal

To solve these problems, the authors provide a novel mechanism that allows adjustment of crucial hyper-parameters, after the training and in real-time, through a set of manually adjustable parameters.

2. Background

2.1 Style transfer using deep networks

Style transfer can be formulated as generating a stylized image $p$ whose content is similar to a given content image $c$ and its style is close to another given style image $s$ .

$p = Ψ (c, s)$

The similarity in style can be vaguely defined as sharing the same spatial statistics in low-level features of a network (use VGG-16 in this paper), while similarity in content is roughly having a close Eculidean distance in high-level features.

The main idea is that the features obtained by the network contain information about the content of the input image while the correlation between these features represents its style.

In order to increase the similarity between two images, minimize the following distances between their extracted features:

$\begin{array}{l} L_{c}^{l} (p) = ‖ ϕ^{l} (p) - ϕ^{l} (s) ‖_{2}^{2} \dots E q . (1) \\ L_{c}^{l} (p) = ‖ G (ϕ^{l} (p)) - G (ϕ^{l} (s)) ‖_{F}^{2} \dots E q . (2) \end{array}$

where

$ϕ^{l} (x)$ : Activation of a pre-trained network at layer $l$ .
- $x$ : Given the input image.
$L_{c}^{l} (p),; L_{c}^{l} (p)$ : Content and style loss at layer $l$ respectively.
$G (ϕ^{l} (s))$ : Gram matrix associated with $ϕ^{l} (p)$ .
- Gram matrix $G_{i j}^{l} = \sum_{k} ϕ_{i k}^{l} ϕ_{j k}^{l}$ : Variance of RGB between image textures.

The total loss is calculated as a weighted sum of losses a set of content layers $C$ and style layers $S$ :

$L_{c} (p) = \sum_{l \in C} w_{c}^{l} L_{c}^{l} (p), L_{s} (p) = \sum_{l \in S} w_{s}^{l} L_{s}^{l} (p) \dots E q . (3)$

where $w_{c}^{l}, w_{s}^{l}$ are hyper-parameters to adjust the contribution of each layer to the loss. The problem is that these hyper-parameters have to be manually fine-tuned through try and error.
Finally, the objective of style transfer can be defined as:

$m i n_{p} (L_{c} (p) + L_{s} (p)) \dots E q . (4)$

2.2 Real-time feed-forward style transfer

We can solve the objective in Eq. (4) using the iterative method but it can be very slow and has to be repeated for any given input.
A much faster method is to directly train a deep network $T$ which maps a given content image $c$ to a stylized image $p$ . $T$ is a feed-forward CNN (parameterized by $θ$ ) with residual connections between down-sampling and up-sampling layers and is trained on content images like as:

$m i n_{θ} (L_{c} (T (c)) + L_{s} (T (c))) \dots E q . (5)$

The second problem from this is that this generates only one stylization for a pair of style and content images.

3. Proposed Method

To address the two issues that are mentioned above, the authors condition the generated stylized image on additional input parameters where each parameter controls the share of the loss from a corresponding layer.

As for figural style, they enable the users to adjust $w_{c}^{l}, w_{s}^{l}$ without retraining the model by replacing them with input parameters and conditioning the generated style images on these parameters:

$p = Φ (c, s, α_{c} α_{s})$

$α_{c}$ and $α_{s}$ are vectors of parameters where each element corresponds to a different layer in content layers $C$ and style layers $S$ respectively. $α_{c}^{l}$ and $α_{s}^{l}$ replace the hyper-parameters $w_{c}^{l}$ and $w_{s}^{l}$ :

$L_{c} (p) = \sum_{l \in C} α_{c}^{l} L_{c}^{l} (p), L_{s} (p) = \sum_{l \in S} α_{s}^{l} L_{s}^{l} (p) \dots E q . (6)$

To learn the effect of $α_{c}$ and $α_{s}$ on the objective, the authors use a technique called conditional instance normalization.
This method transforms the activations of a layer $x$ in the $T$ to a normalized activation $z$ which is conditioned on additional inputs $α_{=} [α_{c}, α_{s}]$ :

$z = γ_{α} (\frac{x - μ}{σ}) + β_{α} \dots E q . (7)$

where $μ$ and $σ$ are mean and standard deviation of activations at layer $x$ across spatial axes and $γ_{a l p h a}$ , $β_{α}$ are the trained mean and standard deviation this transformation. These parameters can be approximated using a second neural network (in here, MLP) which will be trained end-to-end with $T$ :

$γ_{α}, β_{α} = Λ (α_{c}, α_{s}) \dots E q . (8)$

Since $L^{l}$ can be different in scale, do normalize them using their exponential moving average as a normalizing factor, i.e. each $L^{l}$ will be normalized to:

$L^{l} (p) = \frac{\sum_{{} i \in C \cup S} \bar{L^{i}} (p)}{\bar{L^{l}} (p)} \dots E q . (9)$

where $\bar{L^{l}} (p)$ is the exponential moving average of $L^{l} (p)$ .

4. Experiment setting

They trained $T$ and $Λ$ jointly by sampling random values for $α$ from $U (0, 1)$ . And they used ImageNet as content images while using paintings from Kaggle Painter by Numbers and textures from Descibable Texture Dataset as style images for training.
Similar to previous approaches, they used the last feature set of conv3 as content layer $C$ and used the last feature set of conv2, conv3, conv4 layers from VGG-19 network as style layers $S$ . Since there is only one content layer, they fix $α_{c} = 1$ .

5. Experiment

Reference

Babaeizadeh, Mohammad, and Golnaz Ghiasi. "Adjustable real-time style transfer." arXiv preprint arXiv:1811.08560 (2018).

저작자표시

'AI paper review > Mobile-friendly' 카테고리의 다른 글

[MobileOne] An Improved One millisecond Mobile Backbone 논문 리뷰 (0)	2022.06.25
EfficientFormer: Vision Transformers at MobileNet Speed 논문 리뷰 (1)	2022.06.08
Lite Pose 논문 리뷰 (0)	2022.04.18
MobileViT 논문 리뷰 (0)	2022.03.28
EfficientNetv2 논문 리뷰 (0)	2022.03.24

1. Introduction
1.1 Motivation
1.2 Goal
2. Background
2.1 Style transfer using deep networks
2.2 Real-time feed-forward style transfer
3. Proposed Method
4. Experiment setting
5. Experiment
Reference

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

Adjustable Real-time Style Transfer

1. Introduction

1.1 Motivation

1.2 Goal

2. Background

2.1 Style transfer using deep networks

2.2 Real-time feed-forward style transfer

3. Proposed Method

4. Experiment setting

5. Experiment

Reference

'AI paper review > Mobile-friendly' 카테고리의 다른 글

티스토리툴바

개인정보

단축키

내 블로그

블로그 게시글

모든 영역