Rejailbreak Kindle PaperWhite3 After Automatic Upgrade to 5.9.2/5.9.3/5.9.4/5.9.5/5.9.6/5.9.7

(Tested to be working with later versions)
My Kindle PW3 seems to have been automatically updated from 5.9.1 to 5.9.2, and I was unable to run KUAL anymore. To fix this, I performed the following steps. I am not sure whether all of them are necessary, but I did get everything back without losing any data or plugins.

  1. Applied the jailbreak hotfix Update_jailbreak_hotfix_1.14_nomax_install.bin (file found here). Installed it by copying it to the root of the Kindle's USB storage mounted on a computer and manually selecting the update option in the settings menu.

  2. Applied Update_KUALBooklet_v2.7_install.bin and update_kpvbooklet_0.6.6_install.bin (files found here and here). Installed them by copying them into /mrpackages and typing ;log mrpi into the search bar.

I am recording this process in the hope that it will be helpful. Do note that your device may not behave exactly like mine, and I am not responsible for any consequences to your device.

2017/11/15

Use System Memory Together with Graphic Memory in TensorFlow

By default, tensorflow uses GPU memory only and allocates all of it on startup. This is relatively fast, but it is a disaster on a graphics card with little memory, and I kept getting OOM errors from my model.

There are two ways to ease the issue: enable tensorflow to fall back to system memory, and stop it from allocating all GPU memory on startup.

    import tensorflow as tf

    config = tf.ConfigProto()
    # Allocate GPU memory on demand instead of grabbing it all on startup.
    config.gpu_options.allow_growth = True
    # Cap the fraction of GPU memory this process may claim.
    config.gpu_options.per_process_gpu_memory_fraction = 0.9
    with tf.Session(config=config) as sess:
        pass  # code to run...

The attribute config.gpu_options.per_process_gpu_memory_fraction specifies the fraction of total GPU memory the process is allowed to use before falling back to system memory.

2017/11/8

Thoughts For Possible New Network Structures

These are my own ideas about how the neural network architecture might be improved, kept as a note to remind me to implement them later.

Measure for Overfitting

TODO: Add a measure with respect to accuracy.
TODO: Compare $O$ with a normal CNN/DNN.

We do need some measure to see whether, and by how much, the network is overfitting the training data.

My first thought is to just use $O=LOSS(X_{test})-LOSS(X_{train})$. This seems to work: the value stays at around $0\pm0.04$, which means sometimes the network performs better on the training set ($O>0$) and sometimes the opposite ($O<0$).

However, after accuracy reaches $99.2\%$, positive values of the overfit index $O$ are observed more frequently and its average value also increases.

To compare this index with the overfitting behavior of other networks, I first need to somehow incorporate accuracy into the calculation of the index and then evaluate it on other architectures.
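
As a rough sketch, the index itself is trivial to compute given any routine that evaluates the loss on a dataset (the loss_fn callable and the commented usage below are placeholders, not an actual API):

    def overfit_index(loss_fn, train_data, test_data):
        """O = loss(test) - loss(train); positive means better on the training set."""
        X_train, y_train = train_data
        X_test, y_test = test_data
        return float(loss_fn(X_test, y_test) - loss_fn(X_train, y_train))

    # Hypothetical usage with a Keras-style model whose evaluate() returns the loss:
    # O = overfit_index(lambda X, y: model.evaluate(X, y, verbose=0),
    #                   (X_train, y_train), (X_test, y_test))
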
2017/11/21

Generator For Cheating a Network

What would happen if we first train a classifier to classify images, and then train another network that distorts an input image so as to maximize the change in the classifier's prediction while minimizing the distortion to the image?

This could be implemented using a new cost function for the second network.
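
As a rough TF 1.x sketch of what that cost function could look like (the classifier and generator definitions below are tiny stand-ins of my own, not the networks actually used, and the trade-off weight is arbitrary):

    import tensorflow as tf

    def classifier(x, reuse=None):
        # Stand-in for the pre-trained classifier (kept frozen while the
        # generator trains).
        with tf.variable_scope("classifier", reuse=reuse):
            h = tf.layers.flatten(x)
            return tf.layers.dense(h, 10)

    def generator(x):
        # Stand-in generator that outputs a perturbation shaped like the input.
        with tf.variable_scope("generator"):
            h = tf.layers.conv2d(x, 8, 3, padding="same", activation=tf.nn.relu)
            return tf.layers.conv2d(h, 1, 3, padding="same", activation=tf.tanh)

    images = tf.placeholder(tf.float32, [None, 28, 28, 1])
    perturbation = generator(images)
    distorted = images + perturbation

    orig_probs = tf.nn.softmax(classifier(images))
    new_probs = tf.nn.softmax(classifier(distorted, reuse=True))

    # Reward changing the classifier's prediction, penalize large distortions.
    prediction_shift = tf.reduce_mean(tf.squared_difference(orig_probs, new_probs))
    distortion_penalty = tf.reduce_mean(tf.square(perturbation))
    lam = 0.1  # trade-off weight, picked arbitrarily
    generator_loss = -prediction_shift + lam * distortion_penalty

    # Only the generator's variables are updated; the classifier stays fixed.
    gen_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="generator")
    train_op = tf.train.AdamOptimizer(1e-3).minimize(generator_loss, var_list=gen_vars)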

Implementing.
2017/10/27

I failed to implement this structure in tensorflow: instead of an actual value, I seemed to be getting the uninitialized tensor representation of the output, and I failed to replace the input of the original network with that tensor.
2017/10/29

I found the Generative Adversarial Network (GAN) model, and there isn't much of a difference between my idea and that model. However, using another network to probe the weak points of an existing network still seems like a possible way of improving it.
2017/10/30

This looks like a possible way of modifying the GAN model: while the discriminator and the generator in the original GAN work against each other, maybe we can make the generator create augmented data in the regions where the discriminator previously misclassified. Testing.
2017/10/31

I’m pretty sure this will work, but there is more to it. If I train a generator to make minimal modifications to a picture while cheating the discriminator, this may work. However, if I then use the modified images to retrain the network, things may not turn out the same… A badly written 6 might be twisted into something that even a human would read as an 8, and training the discriminator on that image with the label 6 is unlikely to improve its performance.

However, adding a regularization term forcing the generated images to differ from the training set may be a good idea for generators, hopefully producing more creative samples. To be tested, too.
2017/11/1

This is indeed different from a regular GAN. It looks like it works, but further experiments are needed.
2017/11/2

Seems to be working. Work paused for something else.
2017/11/8

This is called a Generative Poisoning Attack, and extensive research seems to have already been performed on it.
2017/11/8

Ambiguity Measure for Dropout

Since increasing ambiguity is a good thing in ensembled classifiers, and dropout is essentially a cheap way to ensemble networks, why don’t we add another regularization term to the cost function to increase the ambiguity under dropout?
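
One possible (untested) reading of this in TF 1.x: run two forward passes that share weights but draw independent dropout masks, and subtract a term that rewards disagreement between them. The network, placeholders and the weight beta below are all stand-ins of my own:

    import tensorflow as tf

    def net(x, keep_prob):
        # Small stand-in network; weights are shared across calls via AUTO_REUSE.
        with tf.variable_scope("net", reuse=tf.AUTO_REUSE):
            h = tf.layers.dense(x, 128, activation=tf.nn.relu)
            h = tf.nn.dropout(h, keep_prob)  # a fresh random mask for every call
            return tf.layers.dense(h, 10)

    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.placeholder(tf.int64, [None])

    logits_a = net(x, keep_prob=0.5)  # one thinned sub-network
    logits_b = net(x, keep_prob=0.5)  # another, independently sampled

    cross_entropy = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits_a))

    # "Ambiguity" here means disagreement between two dropout-sampled members.
    ambiguity = tf.reduce_mean(
        tf.squared_difference(tf.nn.softmax(logits_a), tf.nn.softmax(logits_b)))

    beta = 0.01  # regularization strength, picked arbitrarily
    loss = cross_entropy - beta * ambiguity  # reward ensemble disagreement
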
2017/11/1

Certainty Measure for the Result

From my observation, it is weird that a classifier network always gives a clear indication of the result and there is no indication of “Uncertain”. In other words, it seems unlikely for several of the output nodes, or none of them, to fire at the same time.

The above may not necessarily be true, but it is clear that when a classifier network is given a sample totally different from anything in the training set, the result still tends to be a weighted combination of the training classes. Would it be possible to add another class to contain everything else and feed in random noise?
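
A quick sketch of that idea, assuming hypothetical arrays X_train/y_train holding images scaled to [0, 1] for a 10-class problem (all sizes here are made up):

    import numpy as np

    num_real_classes = 10
    reject_class = num_real_classes          # index of the extra "everything else" class

    # Pad the training set with uniform noise labelled as the reject class.
    noise = np.random.uniform(0.0, 1.0, size=(1000,) + X_train.shape[1:])
    noise_labels = np.full(1000, reject_class)

    X_aug = np.concatenate([X_train, noise])
    y_aug = np.concatenate([y_train, noise_labels])
    # Then train an 11-way classifier on (X_aug, y_aug) instead of the 10-way one.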

Unlikely to be useful.
2017/11/1

Recursive Conv Net

What would happen if we made a convolutional neural network recursive? Would this somehow benefit video generation?

To be tested.
2017/10/27

It exists…
2017/10/30

Dynamic Cost Function

Is it possible to make the cost function dynamic, changing according to conditions, so that convergence could be faster? Or is it even possible to use an RNN to generate the cost function? How would that help? What changes would that bring about?
To be tested.
2017/10/27

Flip network

According to here, it looks like computers may add and subtract numbers at different speeds. If we flip the network so that all additions become subtractions and vice versa (then we would be maximizing the “cost” function), will it be any faster?

To be tested.
2017/10/27

to_categorical() missing 1 required positional argument nb_classes caused by tflearn version mismatch

TypeError: to_categorical() missing 1 required positional argument: 'nb_classes' was encountered while I was trying to run an example program built on tflearn.

The function to_categorical() used to require two arguments, the second of which, nb_classes, is an integer giving the total number of classes in the classification.

It was not until this PR that the second argument was no longer required, and all the examples were updated to the new usage.

However, as of today, the default version installed directly from pip is not new enough to support this, hence the error.

To solve it, either update to the newest version via pip install git+https://github.com/tflearn/tflearn.git, or just pass the class number explicitly.
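
For example, with the older pip release (the label list and class count below are made up):

    from tflearn.data_utils import to_categorical

    labels = [0, 3, 9]
    # Older tflearn releases require the class count to be passed explicitly:
    one_hot = to_categorical(labels, 10)   # one-hot array of shape (3, 10)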

2017/10/25

Note on 1x1 Convolutions

What does it do?

This is pretty straightforward: just like a normal convolution, it converts data of shape [channels, height, width] into [kernel_count, same_height, same_width], using a set of learnable parameters of shape [kernel_count, channel_count], one weight per input channel for each kernel. On a single channel this operation is just a meaningless scaling by a constant; it only becomes interesting when performed across channels, in which case it is usually called a cascaded cross-channel pooling (CCCP) layer.
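
For instance, a quick shape check in tensorflow (sizes are arbitrary, and note that tensorflow's layers use the NHWC layout rather than the channels-first layout written above):

    import tensorflow as tf

    # 64 input channels are mixed into 16 output channels; height and width
    # are untouched.
    x = tf.placeholder(tf.float32, [None, 32, 32, 64])
    y = tf.layers.conv2d(x, filters=16, kernel_size=1, strides=1)
    print(y.shape)   # (?, 32, 32, 16)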

What’s the purpose?

Across all the material I have read, this operation seems to be widely used for the following purposes:

Dimension augmentation/reduction.

This operation can map an input with any number of channels to an output with any number of channels while preserving the spatial size of the picture. Do be aware that this, especially when increasing the channel count, introduces a large number of extra parameters, possibly making the network more prone to overfitting.

Rescaling the last layer.

It was mentioned above that a 1x1 convolution on a single channel amounts to a plain scaling of the whole channel, which is usually unnecessary. However, if such a need exists, it can be met through this operation.

Increasing non-linearity.

Because it applies a non-linear mapping without drastically altering the input, the non-linearity of the network is increased without resorting to plain fully-connected layers, which destroy the relationships between nearby pixels. At the same time, the spatial size of the input is preserved.

The Network in Network (NIN) Structure

This concept first came to my attention while reading a research paper called Network in Network, which seems to be another powerful modification of CNNs. One of the key ideas the paper adopts is using a multilayer perceptron to replace the traditional convolution kernel.

As I wondered how this model could be implemented using the higher-level APIs of popular machine learning libraries without touching the lower-level code, the paper actually states that sliding a mini-MLP over the picture across the previous layer's channels is equivalent to cross-channel convolution with 1x1 kernels: a few 1x1 convolutional (CCCP) layers appended to a normal convolutional layer achieve the goal.
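
A sketch of such an mlpconv/CCCP block using only high-level layers (the filter counts and kernel sizes are arbitrary examples, not the paper's exact configuration):

    import tensorflow as tf

    def mlpconv(x, n_filters):
        # One ordinary convolution followed by two 1x1 convolutions, i.e. a
        # small MLP slid over the feature map.
        h = tf.layers.conv2d(x, n_filters, kernel_size=3, padding="same",
                             activation=tf.nn.relu)
        h = tf.layers.conv2d(h, n_filters, kernel_size=1, activation=tf.nn.relu)  # CCCP 1
        h = tf.layers.conv2d(h, n_filters, kernel_size=1, activation=tf.nn.relu)  # CCCP 2
        return h

    x = tf.placeholder(tf.float32, [None, 28, 28, 1])
    features = mlpconv(x, 32)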

That appending CCCP layers to a normal convolution layer makes the whole thing equivalent to an MLP sliding over the input like a convolution kernel is hard to believe at first. It confused me, and it was not until I unrolled the whole process that I could understand it.

Rather than 2-d convolution, 1-d convolution makes things more straightforward, and the same rule applies to convolution in any number of dimensions.

Suppose we have a 1-d input with 2 channels:





And we perform a normal convolution with one kernel of width 2:





After appending a 1x1 (in 1-d convolution, just 1) convolution layer with two kernels, it looks like:





And another 1x1 layer with two kernels added:





It’s not hard to see that this structure is indeed a sliding MLP: its input size is the kernel size of the convolution layer it is appended to, its depth is the number of 1x1 layers appended, and the width of each hidden layer is the kernel count of the corresponding 1x1 layer.
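
To convince myself, here is a small numeric check of the equivalence, with made-up sizes and weights (and a ReLU after the first stage in both paths):

    import numpy as np

    relu = lambda z: np.maximum(z, 0.0)
    rng = np.random.RandomState(0)
    x = rng.randn(10, 2)                 # length-10 1-d input with 2 channels
    W_conv = rng.randn(2, 2, 3)          # kernel width 2, 2 in-channels, 3 kernels
    W_cccp = rng.randn(3, 2)             # 1x1 layer: 3 channels -> 2 kernels

    # Path 1: ordinary 1-d convolution, ReLU, then a 1x1 cross-channel convolution.
    conv_out = np.stack([np.tensordot(x[i:i + 2], W_conv, axes=([0, 1], [0, 1]))
                         for i in range(9)])          # shape (9, 3)
    path1 = relu(conv_out) @ W_cccp                   # shape (9, 2)

    # Path 2: a 2-layer MLP slid over every window of width 2.
    W_mlp1 = W_conv.reshape(4, 3)                     # 4 inputs -> 3 hidden units
    path2 = np.stack([relu(x[i:i + 2].reshape(4) @ W_mlp1) @ W_cccp
                      for i in range(9)])

    print(np.allclose(path1, path2))                  # True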

The Inception module

The inception module was first used in the GoogLeNet architecture and proved to be really useful.

picture retrieved from this website

What this module does is really just concatenate the outputs of convolutions/poolings of different sizes and let the network choose which to use by itself. The advantage of this method is that the network becomes more resistant to shifts in the size of the target, and manually adjusting the kernel sizes is no longer required: most of the possible sizes needed are all here.

As a result, the 1x1 convolution naturally became one of the choices.
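
As a rough illustration, a heavily simplified inception-style block (filter counts are arbitrary, and the dimension-reducing 1x1 convolutions of the real GoogLeNet module are left out):

    import tensorflow as tf

    def inception_block(x):
        b1 = tf.layers.conv2d(x, 16, kernel_size=1, padding="same", activation=tf.nn.relu)
        b2 = tf.layers.conv2d(x, 16, kernel_size=3, padding="same", activation=tf.nn.relu)
        b3 = tf.layers.conv2d(x, 16, kernel_size=5, padding="same", activation=tf.nn.relu)
        b4 = tf.layers.max_pooling2d(x, pool_size=3, strides=1, padding="same")
        # Concatenate every branch along the channel axis and let training
        # decide which kernel sizes to rely on.
        return tf.concat([b1, b2, b3, b4], axis=-1)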

Ideas for Future Research

From what I have read, neurons that fire together build a connection with each other, and each of them becomes easier to fire the next time its related neurons fire.

I am thinking that neurons in regular DNNs have no knowledge of the states of the other neurons in the same layer, so maybe it would be possible to create such a relation, to somehow build “logic” into networks?

To do so, maybe we need another set of weights in each neuron, used to scale the states of the other neurons in the same layer and add them to the output, somewhat like this:
$$
\vec{y_{n}}=(\mathbf{W_n}^T \times \vec{y_{n-1}} + \vec{b_n})+\mathbf{W_{new}}^T (\mathbf{W_n}^T \times \vec{y_{n-1}} + \vec{b_n})
$$

If the formula is not displayed correctly, please allow insecure (HTTP) cross-site scripts in your browser.

The new weight matrix for the new output will be an n*n matrix, where n is the size of the layer's output.

The detailed implementation is to be researched.
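
For reference, the right-hand side can be factored (a quick rearrangement of the formula above, nothing new):
$$
\vec{y_{n}}=(\mathbf{I}+\mathbf{W_{new}}^T)(\mathbf{W_n}^T \times \vec{y_{n-1}} + \vec{b_n})
$$
so the extra weights amount to one more linear map stacked on top of the ordinary layer output.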

Edit: Okay… This seems to be just a really stupid way of adding an extra layer, and won't really make any difference from just adding one. (2017/10/27)

References:
http://blog.csdn.net/yiliang_/article/details/60468655
http://blog.csdn.net/mounty_fsc/article/details/51746111
http://jntsai.blogspot.com/2015/03/paper-summary-network-in-network-deep.html
https://www.zhihu.com/question/64098749

getopt() function on Raspberry Pi

The getopt() function provided by getopt.h normally returns -1 once there are no more arguments to parse. On ARM devices, however, it may appear to return 255 instead when the arguments run out, most likely because char is unsigned by default on ARM, so storing the -1 return value in a char variable turns it into 255 (to be verified).

2017/10/23

The Harmonic Series: How Come the Divergence?

IF FORMULAS ARE NOT DISPLAYED CORRECTLY, PLEASE ALLOW INSECURE (HTTP) CROSS-SITE SCRIPTS

What is the Harmonic Series?

From Wikipedia, Harmonic Series:

In mathematics, the harmonic series is the divergent infinite series: Its name derives from the concept of overtones, or harmonics in music: the wavelengths of the overtones of a vibrating string are $1/2, 1/3, 1/4$, etc., of the string’s fundamental wavelength. Every term of the series after the first is the harmonic mean of the neighboring terms; the phrase harmonic mean likewise derives from music.

It is basically a series defined like this:
$$
\sum_{x=1}^{\infty} \frac{1}{x} = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{\infty}
$$
and intuitively it feels like it should converge. However, it's actually the opposite.
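
A quick numeric illustration (not a proof): the partial sums never stop growing, but they grow roughly like $\ln(N)$, which is why convergence feels plausible:

    import math

    for n in (10, 1000, 1000000):
        partial_sum = sum(1.0 / x for x in range(1, n + 1))
        print(n, round(partial_sum, 3), round(math.log(n), 3))
    # Prints roughly 2.929 vs 2.303, 7.485 vs 6.908, 14.393 vs 13.816;
    # the sum keeps climbing, staying about 0.577 above ln(n).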

Why does it diverge?

Mathematical Perspective

There are multiple ways to think about it and I find the following the easiest to understand.

  1. Use $f(x)=\frac{1}{x}$ as a continuous stand-in for the terms of the series and take its antiderivative:
    $$
    \int \frac{1}{x} \, dx = \ln(x) + C
    $$
  2. Evaluating this over the improper integral from $1$ to $\infty$,
    $$
    \ln(x)\Big|_{1}^{\infty} = \infty
    $$
    , we can see that the integral, and with it the series (by the integral test), diverges to infinity.

Intuitive Understanding

Even though the divergence of this series is mathematically correct, it is really weird to think about. Here is an intuitive way of thinking about it.

For any infinite series, think of it as doing two things at once:

  • The fact that it is “infinite” tries to push its sum toward $\infty$, in other words, divergence.
  • The fact that each newly added term is altered (made smaller and smaller) tries to push the sum toward a finite value, thus convergence.

And here we have a race between the two forces: by comparing the speeds of change the two forces create, we know the result.

Let’s look at a similar series that looks a lot like ours but actually converges.
$$
\sum_{x=1}^{\infty} \frac{1}{x^n} = \frac{1}{1^n} + \frac{1}{2^n} + \frac{1}{3^n} + \ldots + \frac{1}{\infty^n}
$$
The $n$ here is the key, and in the Harmonic Series it equals 1. Here, let’s use $n=2$ as an example to create our new series.
$$
\sum_{x=1}^{\infty} \frac{1}{x^2} = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \ldots + \frac{1}{\infty^2}
$$
Over the course of adding the first term up to the $\infty^{th}$ term, looking at the first force, we have:

  • 1 -> $\infty$ with a step length of 1
    (one more term is added each iteration).

Over the same course, looking at the second force, the denominator of the terms goes:

  • 1 -> $\infty$ with a step length of $x^2-(x-1)^2$, which works out to $step_{x-1}+2$.
    (Since we are comparing the force that shrinks the terms, we can ignore the numerator of the fraction and track only the denominator, which makes the step ${step_{x-1}+2}$.)
    For example, the steps between terms 1, 2, 3 and 4 are 3, 5 and 7.

This way, we can compare the second force with the first:
$$
step_{x-1}+2 > 1
$$
which means that at every step the speed of convergence of this series is greater than the speed of divergence, making its overall behavior convergence.

Let’s look at another example series where n=0.5:
$$
\sum_{x=1}^{\infty} \frac{1}{x^{0.5}} = \frac{1}{1^{0.5}} + \frac{1}{2^{0.5}} + \frac{1}{3^{0.5}} + \ldots + \frac{1}{\infty^{0.5}}
$$
(The steps between terms 1, 2, 3 and 4 are about 0.41, 0.32 and 0.27.)
And compare the two forces of it:
$$
\sqrt{x}-\sqrt{x-1} < 1
$$
which means that at every step the speed of convergence of this series is smaller than the speed of divergence, making its overall behavior divergence.

Finally, time to deal with our $n=1$ series. Surprisingly, the speed of divergence turns out to be equal to the speed of convergence, which puts it right on the critical boundary that has yet to be classified.

However, we know that a series has to either converge or diverge. Think of it like this: a man is chasing another man who carries the finish line, and both run at the same speed. Because they move at the same speed, the finish line always stays ahead, so the chaser never reaches it; the sum never settles down, and the series diverges.

To draw a conclusion:
$$
\sum_{x=1}^{\infty} \frac{1}{x^n}
$$
converges if and only if $n>1$.

Extending the question

From the above we can tell that the Harmonic Series indeed diverges, but at the critical speed where any tiny change in its speed of divergence/convergence could knock it to the other side. What if we try to give it a knock?
This problem has already been discussed, and the modified series are called Depleted Harmonic Series.

It turns out that removing, for example, all the terms whose denominators contain a particular digit (such as 9) causes the series to converge.

Prime Numbers?

What about removing all the reciprocals of prime numbers? I heard from someone on a forum that it still diverges, but I haven’t read any proofs.

This sounds really counter-intuitive, because we would be excluding an infinite series from another infinite series (really? not proved yet…) whose divergence speed is so low that any reduction seems like it should tip it into convergence.

There is yet another possible way of thinking about this.

Due to the unknowns of prime numbers, I will replace them with “any sequence that gets sparser as it grows”. We know that the series made of all the pulled-out terms converges, so we are excluding a finite value from infinity, which still equals infinity.

I can almost immediately feel that a lot of my logic here is sloppy; please point it out if anything is wrong.

2017/10/17

Update: OK… my understanding seems to be a bit off. It’s still possible to extract a divergent series out of it and have the remainder still diverge…
2017/10/19

"libcusolver.so.8.0: cannot open shared object file" error while importing tensorflow

After installing the GPU version of tensorflow, I got this error while importing it:

>>> import tensorflow as tf
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcusolver.so.8.0: cannot open shared object file: No such file or directory

None of the solutions found online, like pointing the environment variables to the CUDA installation, worked.

I tried locate libcusolver.so and none of the paths referred to the version I wanted. This is because this build of tensorflow only works with CUDA 8.0 (hence the libcusolver.so.8.0 in the error message), but only a newer release of CUDA was installed.

Because I am using Ubuntu and installed CUDA through the PPA, I simply installed CUDA 8.0 with sudo apt-get install cuda-8-0 and everything worked.

2017/10/16