Note: A Neural Algorithm of Artistic Style

[A Neural Algorithm of Artistic Style] presented an interesting way to disentangle the style of a picture from its content.

The convolutional layers of a VGG network trained only for object classification already capture both the content and the style of an image well, so no additional training is needed to extract them.

Def 1:
Two images are defined to have the same content if they have the same filter responses at every layer.

In short:
$$Loss_{def1}(img_1,img_2)=\sum_{l=0}^n{||F_{img_1}^l-F_{img_2}^l||_2}$$

where $F_{img}^l$ denotes the feature maps of $img$ at the $l^{th}$ layer, and the sum runs over layers $0$ to $n$.
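As a minimal sketch of $Loss_{def1}$, assuming the per-layer feature maps have already been computed (in the real method they come from a pretrained VGG network) and are given as NumPy arrays of shape (channels, height, width). The function name `content_loss` is hypothetical.

```python
import numpy as np


def content_loss(feats_1, feats_2):
    """Loss_def1: sum over layers of the L2 norm of the feature-map difference.

    feats_1, feats_2: lists with one array per layer, each of shape
    (channels, height, width). They are assumed to be precomputed.
    """
    if len(feats_1) != len(feats_2):
        raise ValueError("both images need feature maps for the same layers")
    # np.linalg.norm on an n-d array (ord=None) is the 2-norm of the flattened array.
    return sum(float(np.linalg.norm(f1 - f2)) for f1, f2 in zip(feats_1, feats_2))


# Two hypothetical layers of random "feature maps" for one image.
rng = np.random.default_rng(0)
content_feats = [rng.normal(size=(8, 16, 16)), rng.normal(size=(16, 8, 8))]
print(content_loss(content_feats, content_feats))  # 0.0: identical content
```

The loss is zero exactly when every layer's responses match, which is Def 1 stated as an optimization target.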

Def 2:
Two images are defined to have the same style if they have the same matrix of correlations among filter responses at every layer.

  • Why define style like this? Is it simply a way to discard spatial information, since the correlations are summed over all positions of a feature map?

In short:

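$$Loss_{def2}(img_1,img_2) = \sum_{l=0}^n{||corr(F_{img_1}^l)- corr(F_{img_2}^l)||_2}$$

where $corr(F^l)$ is the Gram matrix of layer $l$: with each feature map of that layer flattened into a row $F^l_i$, $corr(F^l)_{ij}=\sum_k F^l_{ik}F^l_{jk}$, the inner product of filter responses $i$ and $j$ over all spatial positions $k$.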
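The paper computes these correlations as a Gram matrix: flatten each channel of a layer and take inner products between channels. Below is a minimal sketch of that and of $Loss_{def2}$, assuming precomputed feature maps of shape (channels, height, width); the names `gram` and `style_loss` are hypothetical. The demo also bears on the question above: shuffling the spatial positions changes the image layout but leaves every Gram matrix unchanged.

```python
import numpy as np


def gram(feat):
    """Gram matrix of one layer's feature maps.

    feat has shape (channels, height, width). Each channel is flattened, so
    entry (i, j) is the inner product of filter responses i and j summed
    over all spatial positions; the result has shape (channels, channels).
    """
    f = feat.reshape(feat.shape[0], -1)  # (channels, height * width)
    return f @ f.T


def style_loss(feats_1, feats_2):
    """Loss_def2: sum over layers of the Frobenius norm of the Gram difference."""
    return sum(
        float(np.linalg.norm(gram(f1) - gram(f2))) for f1, f2 in zip(feats_1, feats_2)
    )


# Apply the same random permutation of spatial positions to every channel.
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8))
flat = feat.reshape(4, -1)
shuffled = flat[:, rng.permutation(flat.shape[1])].reshape(4, 8, 8)
print(style_loss([feat], [shuffled]))  # close to 0 (at most floating-point rounding)
```

Because the Gram matrix sums over positions, any rearrangement of where features occur is invisible to it. Only which features co-occur, and how strongly, survives.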

Goal:
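Use gradient descent to find an image $X_{gen}$ that minimizes $Loss_{def1}(X_{content}, X_{gen})$. Here $X_{content}$ is the image providing the layout of the result, and $X_{gen}$ is the resulting image, initialized with white noise.
At the same time, $X_{gen}$ needs to minimize $Loss_{def2}(X_{style}, X_{gen})$, where $X_{style}$ is an image of some painting used only to provide the style. In the paper, the two objectives are combined into one weighted sum of the content loss and the style loss, and the ratio of the two weights ($\alpha/\beta$) controls whether the result emphasizes content or style.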

The $X_{gen}$ reached at the end of the optimization is our final generated image.
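To make the optimization concrete, here is a toy sketch. It is not the paper's implementation. It assumes an identity "feature extractor": the image is treated as its own single-layer feature map of shape (channels, positions), so the gradients can be written by hand. The real method backpropagates through VGG instead. The sketch uses squared errors, as the paper does, rather than the plain norms above, because their gradients are simpler. The weights `alpha`/`beta`, the learning rate, the step count, and all function names are hypothetical.

```python
import numpy as np


def gram(f):
    # Gram matrix of a (channels, positions) feature matrix.
    return f @ f.T


def total_loss(x, p_content, a_style, alpha, beta):
    # Weighted sum of a squared content error and a squared Gram (style) error.
    content = np.sum((x - p_content) ** 2)
    style = np.sum((gram(x) - a_style) ** 2)
    return alpha * content + beta * style


def gradient(x, p_content, a_style, alpha, beta):
    # d/dx ||x - p||^2      = 2 (x - p)
    # d/dx ||x x^T - a||^2  = 4 (x x^T - a) x   (valid because a is symmetric)
    return 2 * alpha * (x - p_content) + 4 * beta * (gram(x) - a_style) @ x


def stylize(x_content, x_style, alpha=1.0, beta=1.0, lr=1e-3, steps=300, seed=0):
    rng = np.random.default_rng(seed)
    p_content = x_content                 # content target: the content image's "features"
    a_style = gram(x_style)               # style target: Gram matrix of the style image
    x = rng.normal(size=x_content.shape)  # X_gen starts as white noise
    history = [total_loss(x, p_content, a_style, alpha, beta)]
    for _ in range(steps):
        x = x - lr * gradient(x, p_content, a_style, alpha, beta)
        history.append(total_loss(x, p_content, a_style, alpha, beta))
    return x, history


# Hypothetical tiny "images": 3 channels, 16 positions each.
rng = np.random.default_rng(1)
x_content = rng.normal(size=(3, 16))
x_style = rng.normal(size=(3, 16))
x_gen, history = stylize(x_content, x_style)
print(history[0], history[-1])  # the combined loss goes down
```

The two targets generally conflict, so the result is a compromise between them. Raising `beta` relative to `alpha` pulls `x_gen` toward the style image's feature correlations at the expense of matching the content.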

  • Why use white noise?

Reference


  • Gatys, L. A., Ecker, A. S., and Bethge, M. (2015). A Neural Algorithm of Artistic Style.