[A Neural Algorithm of Artistic Style] presented an interesting way to disentangle the style of a picture from its content.
Using the convolutional layers of a VGG network trained for classification, both the style and the content of an image can already be extracted well.
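A minimal sketch of how the fixed features might be collected, using torchvision's pretrained VGG-19 (the layer indices and the `extract_features` helper are assumptions chosen to match the conv layers the paper mentions; they are reused in the sketches below):

```python
import torch
import torchvision.models as models

# Assumed layer choices: conv4_2 for content, conv1_1..conv5_1 for style,
# expressed as indices into torchvision's vgg19().features Sequential.
CONTENT_LAYERS = {21}
STYLE_LAYERS = {0, 5, 10, 19, 28}

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # the network stays fixed; only the image will be optimized

def extract_features(img, layers):
    """Run img through VGG and collect the feature maps at the requested layer indices."""
    feats, x = {}, img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            feats[i] = x
    return feats
```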
Def 1:
Define two images to have the same content if they have the same filter responses at every layer.
In short:
$$Loss_{def1}(img_1,img_2)=\sum_{l=0}^n{||F_{img_1}^l-F_{img_2}^l||_2}$$
where $F_{img}^l$ are the feature maps of $img$ at the $l^{th}$ layer.
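Def 1 translates almost directly into code. A sketch, reusing the hypothetical `extract_features` helper above (the norm is the Frobenius norm over all elements of each feature map):

```python
def content_loss(img_1, img_2, layers=CONTENT_LAYERS):
    """Loss_def1: sum over layers of the L2 distance between feature maps."""
    f1 = extract_features(img_1, layers)
    f2 = extract_features(img_2, layers)
    return sum(torch.norm(f1[l] - f2[l]) for l in layers)
```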
Def 2:
Define two images to have the same style if they have the same matrix of correlations among filter responses at every layer.
- Why define styles like this? Simply to remove spatial information?
In short:
$$Loss_{def2}(img_1,img_2)=\sum_{l=0}^n{||corr(F_{img_1}^l)-corr(F_{img_2}^l)||_2}$$
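In the paper the correlation matrix is the Gram matrix of the vectorized feature maps (inner products between all pairs of filter responses). A sketch of Def 2 under that reading, reusing the helpers above (the normalization constant is an assumption for numerical convenience, not part of the definition):

```python
def gram_matrix(feat):
    """Correlations among filter responses: inner products of vectorized feature maps."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(img_1, img_2, layers=STYLE_LAYERS):
    """Loss_def2: sum over layers of the L2 distance between Gram matrices."""
    f1 = extract_features(img_1, layers)
    f2 = extract_features(img_2, layers)
    return sum(torch.norm(gram_matrix(f1[l]) - gram_matrix(f2[l])) for l in layers)
```

Because the Gram matrix sums over all spatial positions, it keeps track of which filters fire together but discards where they fire, which is one answer to the question above.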
Goal:
Find an image $X_{gen}$ that minimizes $Loss_{def1}(X_{content}, X_{gen})$ by gradient descent, where $X_{content}$ is the image providing the layout of the result and $X_{gen}$ is the generated image, initialized from white noise.
At the same time, $X_{gen}$ needs to minimize $Loss_{def2}(X_{style}, X_{gen})$, where $X_{style}$ is the image of a painting of some sort used to provide only the style.
The two losses are combined as a weighted sum, and the converged $X_{gen}$ is our final generated image (a code sketch follows below).
- Why use white noise?
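Putting the two objectives together, a sketch of the optimization described above: $X_{gen}$ starts from white noise and is updated by gradient descent on a weighted sum of the two losses (the weights `alpha` and `beta`, the step count, and the use of Adam are assumptions made for illustration):

```python
def style_transfer(x_content, x_style, steps=500, alpha=1.0, beta=1e3):
    """Optimize a white-noise image to match x_content in content and x_style in style."""
    x_gen = torch.randn_like(x_content, requires_grad=True)  # start from white noise
    opt = torch.optim.Adam([x_gen], lr=0.05)
    for _ in range(steps):
        opt.zero_grad()
        loss = alpha * content_loss(x_content, x_gen) + beta * style_loss(x_style, x_gen)
        loss.backward()
        opt.step()
    return x_gen.detach()
```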
Reference
Gatys, L. A., Ecker, A. S., and Bethge, M. (2015). A Neural Algorithm of Artistic Style.