Adding time to Deep Learning, Jurgen Schmidhuber 2
Time lets the sun rise through dawn and fall into evening.
The Chinese characters for "the universe" carry both the idea of space and the idea of time.
The ancient Eastern sages did not see space as merely the room between objects;
they understood it through the lens of time. It is indeed a startling insight.
Now, turning our attention back to the world of Artificial Intelligence:
recognizing handwriting and images had been largely conquered through research on the cat's visual cortex,
the Neocognitron, and the CNN (Convolutional Neural Network).
Of course, the challenge never stops there. As they always do, humans set themselves another one.
To understand natural language and speech, we need the idea of 'time'.
The RNN (Recurrent Neural Network) embodies this concept:
it extends the original neural network to take time series into account.
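As a rough sketch of that idea (a minimal vanilla RNN step, not any particular paper's formulation; the dimensions and weight values here are invented for illustration), the network carries time by feeding its own previous state back in at each step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, chosen only for illustration
input_size, hidden_size = 3, 4

# Parameters of a vanilla RNN cell
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input-to-hidden
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden-to-hidden: this loop is where "time" lives
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One time step: the new state mixes the current input with the previous state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Run a short random sequence through time
h = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):
    h = rnn_step(x_t, h)

print(h.shape)  # (4,)
```

The same `W_hh` matrix is applied at every step, which is exactly what makes gradients through long sequences fragile, as discussed next.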
However, this RNN suffered from a problem: the vanishing gradient.
During training, the 'weights' play an important role.
When gradients are repeatedly multiplied by weights smaller than 1, they shrink toward 0 (vanishing);
when repeatedly multiplied by weights larger than 1, they grow without bound (exploding).
Either way, accuracy is not guaranteed.
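The arithmetic behind this is simple to demonstrate. The sketch below (a toy illustration, not an actual backpropagation) just multiplies a value by the same factor many times, which is what happens to a gradient passed back through many identical time steps:

```python
def repeated_product(factor, steps):
    """Multiply 1.0 by the same factor repeatedly, mimicking a gradient
    backpropagated through `steps` time steps with a fixed weight."""
    value = 1.0
    for _ in range(steps):
        value *= factor
    return value

vanishing = repeated_product(0.9, 100)  # factor below 1: shrinks toward 0
exploding = repeated_product(1.1, 100)  # factor above 1: grows without bound
print(vanishing, exploding)
```

After only 100 steps the first value is already vanishingly small and the second is in the tens of thousands, so the early parts of a long sequence either stop influencing learning or destabilize it.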
Thankfully, we had prior experience to draw on in solving this problem.
Movies, for instance, are sequences of still images; the afterimage of each frame carries over into the next, producing the effect of motion.
Based on this principle, Jurgen Schmidhuber and his student Sepp Hochreiter came up with
the LSTM (Long Short-Term Memory), a breakthrough for the RNN.
Sepp Hochreiter first conceived the idea in 1991, while writing his doctoral thesis,
and they formally presented it to the world in 1997.
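To make the "carrying over" idea concrete, here is a minimal sketch of a single LSTM step in NumPy. This is a simplification, not the 1997 paper's exact formulation: gate names follow the now-common convention, and all weights and dimensions are invented for illustration. The key point is the cell state `c`, which is updated additively so information can flow across many steps without being repeatedly squashed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: gates decide what to forget, what to write, what to expose."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z + bf)         # forget gate: keep or erase old memory
    i = sigmoid(Wi @ z + bi)         # input gate: admit new information
    o = sigmoid(Wo @ z + bo)         # output gate: expose memory to the output
    c_tilde = np.tanh(Wc @ z + bc)   # candidate update
    c = f * c_prev + i * c_tilde     # additive path through time (the "afterimage")
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(1)
input_size, hidden_size = 3, 4
def make():
    return rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
params = (make(), make(), make(), make(),
          np.ones(hidden_size),   # forget-gate bias near 1 keeps memory alive early in training
          np.zeros(hidden_size), np.zeros(hidden_size), np.zeros(hidden_size))

h = c = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)
```

Because the cell state is updated by addition rather than repeated matrix multiplication, gradients along it do not shrink as relentlessly, which is how the LSTM sidesteps the vanishing problem.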
Come to think of it, humans might be studying nature itself. We study the principles
of natural phenomena rather than artificial things.
Nevertheless, nothing is perfect. Thanks to the LSTM, the accuracy of the RNN improved dramatically,
but the many extra calculations brought a new problem: slow performance.
To solve this kind of problem, Kyunghyun Cho, a young scientist at New York University,
proposed a model called the GRU (Gated Recurrent Unit).
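A rough sketch of a single GRU step, again in NumPy, shows where the savings come from: two gates instead of the LSTM's three, and no separate cell state. This follows one common convention (papers differ on which of the two interpolation terms carries the update gate), and all names and dimensions are invented for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU step: update and reset gates, hidden state doubles as memory."""
    Wz, Wr, Wh, bz, br, bh = params
    z_in = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ z_in + bz)   # update gate: how much to refresh the state
    r = sigmoid(Wr @ z_in + br)   # reset gate: how much old state to consult
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]) + bh)
    return (1 - z) * h_prev + z * h_tilde  # interpolate old and candidate state

rng = np.random.default_rng(2)
input_size, hidden_size = 3, 4
def make():
    return rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
params = (make(), make(), make(),
          np.zeros(hidden_size), np.zeros(hidden_size), np.zeros(hidden_size))

h = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):
    h = gru_step(x_t, h, params)
print(h.shape)
```

With fewer gates there are fewer weight matrices to multiply per step, which is precisely the reduction in computation the GRU was aiming for.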