Neural Style Transfer
Have you ever wondered about creating your own artistic pictures in a specific style? Neural Style Transfer lets you do exactly that.
It is possible to separate the style representation and the content representation learnt by a CNN during a computer vision task (e.g. an image recognition task).
In this blog, I am going to walk you through the fundamentals of Neural Style Transfer for a basic understanding.
Below is an illustration of Neural Style Transfer.
Neural Style Transfer (NST) creates a new image from existing images. We generate the new image by taking the content from one image and the style from another. It is like applying effects to an image.
In NST, we extract those effects (the style) from the second image and the main content from the first image, and combine them to form a new image.
That is a short overview of what Neural Style Transfer is.
From now on, I am going into a more detailed (but still high-level) version of how to generate an image from a content image (the first image) and a style image (the second image).
Below is the structure of the style transfer optimization.
As you can see in the picture above, we use two neural networks to update the input of a middle neural network: the first network processes the content image, the middle one holds the generated image, and the last one processes the style image.
Here comes the interesting part: how to extract content and style from images.
Content:- To get the content of an image, we take the activations of a specific layer from the content neural network. In the image above, we have taken the activations from layer 2. Here, content refers to the activations of a chosen layer.
In deep neural networks, the deeper we go, the more complex the shapes we capture in an image. In the initial layers we end up extracting low-level features such as edges; as we go deeper we end up with higher-level features such as the shapes of objects in the image. Below is an illustration of that.
As you can see, layer by layer our model builds features up from simple to complex structures.
Now the most important point: what do we mean by the style of an image here? Style is simply the correlation between the outputs of the filters in a layer. For example, if you apply 32 filters you end up with 32 feature matrices; we take those activations (features) and find the correlation between each pair of matrices, which gives us the style of that layer (the most frequently repeating patterns in the image). If the correlation for a specific pair of features is high, we can conclude that it is a frequent pattern in the image. We use those correlations for styling the generated image.
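To make that concrete, here is a toy NumPy sketch of computing those filter-to-filter correlations for one layer's output. The shapes and the random features are just assumptions for illustration, not values from an actual network:

import numpy as np

# Pretend this is the output of a layer with 32 filters.
height, width, n_filters = 4, 4, 32
features = np.random.rand(height, width, n_filters)

# Flatten the spatial dimensions: one column per filter's feature map.
F = features.reshape(-1, n_filters)   # shape (height*width, 32)

# Entry [i, j] of the correlation (style) matrix is the inner product
# between feature map i and feature map j; a large value means the two
# filters fire together often, i.e. a frequent pattern in the image.
style_matrix = F.T @ F                # shape (32, 32)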
I am not going to go very deep into the coding part, but I will cover the points one has to know in order to understand NST.
Below is a code snippet to understand our network architecture.
import numpy as np
import tensorflow as tf

# _conv2d_relu and _avgpool are helper functions (defined elsewhere) that
# build convolution + ReLU and average-pooling layers from pretrained
# VGG weights; the integer argument is the layer's index into the
# pretrained weight file.
graph = {}
# The input image itself is a tf.Variable, so it can be updated in training.
graph['input'] = tf.Variable(np.zeros((1, IMAGE_HEIGHT, IMAGE_WIDTH, COLOR_CHANNELS)), dtype='float32')
graph['conv1_1'] = _conv2d_relu(graph['input'], 0, 'conv1_1')
graph['conv1_2'] = _conv2d_relu(graph['conv1_1'], 2, 'conv1_2')
graph['avgpool1'] = _avgpool(graph['conv1_2'])
graph['conv2_1'] = _conv2d_relu(graph['avgpool1'], 5, 'conv2_1')
graph['conv2_2'] = _conv2d_relu(graph['conv2_1'], 7, 'conv2_2')
graph['avgpool2'] = _avgpool(graph['conv2_2'])
graph['conv3_1'] = _conv2d_relu(graph['avgpool2'], 10, 'conv3_1')
graph['conv3_2'] = _conv2d_relu(graph['conv3_1'], 12, 'conv3_2')
graph['conv3_3'] = _conv2d_relu(graph['conv3_2'], 14, 'conv3_3')
graph['conv3_4'] = _conv2d_relu(graph['conv3_3'], 16, 'conv3_4')
graph['avgpool3'] = _avgpool(graph['conv3_4'])
From the snippet you can see that the input image is a variable which gets updated during training. The most important point here is that we make all the weights and biases of the network constants, and only the input image is a variable. In effect, we are making the input the first layer of the network.
Initially, we create the generated image as a noisy copy of the content image, so that the output starts out slightly similar to the content image.
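Here is a minimal sketch of that initialization; the noise range and the blending ratio are common defaults in NST implementations, not fixed values from this post:

import numpy as np

def generate_noise_image(content_image, noise_ratio=0.6):
    # Random noise in the same shape as the content image.
    noise = np.random.uniform(-20, 20, content_image.shape).astype('float32')
    # Blend noise with the content image so optimization starts near it.
    return noise * noise_ratio + content_image * (1 - noise_ratio)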
Now we need to define cost functions over two pairs of activations: (content, generated) and (style, generated).
We get the activations of a layer by passing the content image through the network and reading that layer's output; the same goes for the style image. For the style image, however, we can take activations from multiple layers.
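As a sketch of how that extraction looks with the graph built above (TensorFlow 1.x session style; the particular layer choices here are assumptions for illustration):

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Feed the content image through the network by assigning it to the
# input variable, then read off the chosen layer's activations.
sess.run(graph['input'].assign(content_image))
content_activations = sess.run(graph['conv3_2'])

# Same for the style image, but over multiple layers.
sess.run(graph['input'].assign(style_image))
style_activations = [sess.run(graph[name])
                     for name in ('conv1_1', 'conv2_1', 'conv3_1')]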
Using the activations from both networks, we define the cost functions.
The total cost function for NST is:
J(G) = α · J_content(C, G) + β · J_style(S, G)
where α and β weight the content and style terms.
Let's talk about them one by one:
For the content loss, we calculate how similar the two images are. Up to a normalization constant, it is the sum of squared differences between the two sets of activations:
J_content(c, g) = ½ · Σ (c − g)²
In that function, 'g' stands for the generated image's activations and 'c' for the content image's activations. Our goal is to reduce this cost function so that the generated image becomes as similar as possible to the content image.
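In code, that cost is essentially one line. This is a sketch assuming c and g are activation tensors of the same shape, taken from the chosen content layer:

def content_cost(c, g):
    # Sum of squared differences between content and generated activations.
    return 0.5 * tf.reduce_sum(tf.square(c - g))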
The above is the style cost function. It has two parts: computing the gram matrix for the style image and for the generated image, and then computing the cost between them. Oh wait, what is a gram matrix? The gram matrix is nothing but the style matrix, in which we find the correlation of each feature with every other feature, as explained above in the meaning of style. After that, we try to reduce the style cost so that the generated image picks up as much style as possible from the style image.
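Here is a sketch of the style cost for a single layer, following the normalization from the original style transfer paper by Gatys et al.; s and g are assumed to be activation tensors of shape (1, height, width, n_channels):

def gram_matrix(a, n_channels):
    # One column per filter, then inner products between every pair of
    # feature maps: the correlations that define style.
    f = tf.reshape(a, (-1, n_channels))
    return tf.matmul(tf.transpose(f), f)

def style_cost(s, g, height, width, n_channels):
    gs = gram_matrix(s, n_channels)
    gg = gram_matrix(g, n_channels)
    # Normalization constant from the Gatys et al. formulation.
    norm = 4.0 * (n_channels ** 2) * ((height * width) ** 2)
    return tf.reduce_sum(tf.square(gs - gg)) / norm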
By optimizing the total cost function, with alpha and beta adjusting the amount of style and content we want in our generated image, we keep replacing the pixel values of the generated image, as shown in Figure 1.
The entire training process is responsible for updating the input image again and again, so that the cost keeps decreasing and the image becomes as similar as possible to both the content and the style images.
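Putting it together, here is a rough sketch of that loop in the same TensorFlow 1.x style as the snippet above. The weights and iteration count are arbitrary, and j_content and j_style are assumed to have been built by applying the cost functions above to the saved activations and the corresponding graph tensors:

ALPHA, BETA = 10, 40  # assumed content and style weights
total_cost = ALPHA * j_content + BETA * j_style

# Only graph['input'] is a tf.Variable, so the optimizer changes the
# generated image's pixels while the network weights stay constant.
train_step = tf.train.AdamOptimizer(2.0).minimize(total_cost)

sess.run(tf.global_variables_initializer())
sess.run(graph['input'].assign(generate_noise_image(content_image)))

for i in range(500):
    sess.run(train_step)

generated_image = sess.run(graph['input'])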
This blog is just a high-level overview of Neural Style Transfer. I hope you now understand what is what in Neural Style Transfer. Thank you!