Deep Learning Initialization, Just as Important as Learning
A fine education in a good environment helps you grow into a great person.
However, beyond education and experience,
one's growth also depends heavily on the genes one is born with. Just like BTS!
The same applies to the field of Deep Learning.
In 1986, David E. Rumelhart and Geoffrey Everest Hinton came up with the backpropagation algorithm
and solved the problem of training multi-layer neural networks.
However, as the vanishing gradient problem appeared in deeper networks,
the field entered a long winter.
Until then, most research had focused on how to learn better
from a given initial value through the gradient descent method.
In 2006, an out-of-the-box idea from Geoffrey Hinton provided a hint for solving the problem.
In his paper 'A fast learning algorithm for deep belief nets',
Hinton highlighted the importance of initialization for better learning in deep neural networks.
He suggested initializing the weights by pretraining each pair of adjacent layers
as a Restricted Boltzmann Machine (RBM).
The RBM is an application of the Boltzmann Machine he had worked on in the 1980s.
This opened up the possibility of solving the vanishing gradient problem
and provided a clue for escaping the long winter.
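To give a rough sense of what initializing a layer with an RBM looks like in practice, here is a minimal, illustrative NumPy sketch of one-step contrastive divergence (CD-1); the function name, learning rate, and the omission of bias terms are simplifications for this post, not Hinton's original code.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hidden, lr=0.1, epochs=10, seed=0):
    # data: (n_samples, n_visible) array with values in [0, 1]
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    for _ in range(epochs):
        v0 = data
        # positive phase: hidden activations driven by the data
        h0_prob = sigmoid(v0 @ W)
        h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
        # negative phase: one reconstruction step (CD-1)
        v1_prob = sigmoid(h0 @ W.T)
        h1_prob = sigmoid(v1_prob @ W)
        # weight update from the difference of the two statistics
        W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(data)
    return W  # used as the initial weights of one layer of the deep net

Each layer is pretrained this way in turn: the hidden activations of one RBM become the input data for the next, and the whole stack is then fine-tuned with backpropagation.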
We will listen to Professor Sung Hun Kim for a further explanation.
Anyone studying Deep Learning in Korea will have heard of him.
In an online interview with Andrew Ng in 2017,
Geoffrey Hinton said that the accomplishment he was most proud of
was using RBMs to initialize the weights in 2006,
which opened up the possibility of solving the vanishing gradient problem.
Ultimately, in deep neural network training,
learning from good-quality data is important, just as good educational materials matter for human learning.
At the same time, the initial values of the network,
like the genes a person is born with, also have a big influence on learning.
In 2010, Xavier Glorot, from Yoshua Bengio's lab at the University of Montreal,
suggested a better initialization method in the paper
'Understanding the difficulty of training deep feedforward neural networks'.
The idea is to draw each layer's initial weights from a distribution
whose variance is scaled inversely to the number of input and output nodes of that layer.
With this initialization, now known as Xavier (or Glorot) initialization,
training became much faster and easier.
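As a rough illustration, here is a minimal NumPy sketch of the normal-distribution variant, not the exact recipe from the paper; the layer sizes below are made up.

import numpy as np

def xavier_init(n_in, n_out, seed=0):
    # Glorot & Bengio (2010), normal variant: the variance shrinks
    # as the number of input and output nodes grows.
    std = np.sqrt(2.0 / (n_in + n_out))
    return np.random.default_rng(seed).normal(0.0, std, size=(n_in, n_out))

W = xavier_init(784, 256)  # initial weights for a 784 -> 256 layer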
We will again listen to Professor Sung Hun Kim for a further explanation.
Five years later, in 2015, Kaiming He's team
from Microsoft Research Asia in Beijing came up with a new initialization method
in the paper 'Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification',
and their ResNet model, described in 'Deep Residual Learning for Image Recognition',
won that year's ILSVRC contest.
He initialization slightly alters Xavier initialization by dividing the number of input nodes by 2
in the scaling term, which doubles the variance of the initial weights.
This method proved to be much more effective with the ReLU activation function.
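A comparable sketch of He initialization, under the same assumptions as the Xavier snippet above; only the variance term changes.

import numpy as np

def he_init(n_in, n_out, seed=0):
    # He et al. (2015): variance 2 / n_in, i.e. the Xavier-style scaling
    # with the fan-in effectively halved, compensating for ReLU zeroing
    # out roughly half of the activations.
    std = np.sqrt(2.0 / n_in)
    return np.random.default_rng(seed).normal(0.0, std, size=(n_in, n_out))

W = he_init(784, 256)  # initial weights for a 784 -> 256 layer feeding a ReLU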
Today, the widely used Keras framework mainly relies on Xavier (Glorot) and He initialization.
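For example, in Keras both can be selected by name when defining a layer (Glorot uniform is the default for Dense layers); a minimal sketch with made-up layer sizes:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # He initialization for the hidden ReLU layer
    layers.Dense(256, activation="relu",
                 kernel_initializer="he_normal", input_shape=(784,)),
    # Xavier (Glorot) initialization for the output layer
    layers.Dense(10, activation="softmax",
                 kernel_initializer="glorot_uniform"),
])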