Evolution of Algorithms / Challenges and Solutions
When I started studying neural networks / deep learning, most things seemed relatively clear, and a simple fully connected feedforward network seemed to work for every problem. But as I learned more, I realized it is not as simple as I expected, and I started seeing so many different algorithmic tweaks (e.g., the choice of transfer function, the choice of optimization function, etc.) and so many different neural network structures (e.g., fully connected networks, CNNs, RNNs, etc.). Then this question came to my mind: why do we need to go through so many tweaks and selections? It turns out that there is no single 'fits all' algorithm. An algorithm that works well in one situation may not work well in another. To make neural networks work better in more diverse situations, many different tricks have been invented. As a result of these inventions over a long period of time, we now have neural networks with so many options to choose from.
Following is a brief summary of neural network models.
Event/Model | Year | Description
Perceptrons | 1958 | Single-layer neural networks for binary classification tasks.
Backpropagation | 1974-1986 | Training algorithm for multi-layer neural networks.
Convolutional Neural Networks (CNNs) | 1989 | LeNet-5 by Yann LeCun for image recognition tasks.
Long Short-Term Memory (LSTM) | 1997 | RNN variant for improved handling of long-term dependencies.
Deep Belief Networks (DBNs) | 2006 | Unsupervised pre-training with restricted Boltzmann machines.
ImageNet Challenge | 2012 | AlexNet revolutionized computer vision with deep CNNs.
Generative Adversarial Networks (GANs) | 2014 | Unsupervised learning with two competing neural networks.
Neural Machine Translation (NMT) | 2014 | Sequence-to-sequence models for improved machine translation.
Attention Mechanism | 2014 | Improved handling of long sequences in sequence-to-sequence models.
Transformer Models | 2017 | Attention-based models for large-scale pre-training and fine-tuning.
BERT | 2018 | Pre-trained model that set new benchmarks in NLP tasks.
Following is a list of important tricks that were invented to handle some of the early problems of neural networks (a few of these tricks are illustrated with short code sketches after the table).
Issue | Technique to solve the issue
Vanishing gradient | Replacing the classic sigmoid function with ReLU (Ref [1])
Slow convergence | Replacing classic GD (Gradient Descent) with SGD (Stochastic GD) (Ref [1])
Severe fluctuation in training due to SGD | Replacing SGD with mini-batch SGD (Ref [1])
Falling into local minima | Introducing adaptive learning rate algorithms (e.g., Adagrad, RMSProp, Momentum, Adam) (Ref [1])
Overfitting | Introducing early stopping, regularization, dropout (Ref [1])
Overly large fully connected layers in a CNN | Introducing a pooling layer before the fully connected network (Ref [1])
No memory | Introducing RNN (Recurrent Neural Network) (Ref [1])
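To make a few of these tricks more concrete, here is a minimal training-loop sketch, assuming PyTorch is available. It combines ReLU activations (vanishing gradient), mini-batch SGD via a DataLoader (fluctuation of plain SGD), the Adam optimizer (adaptive learning rate), dropout (overfitting), and early stopping on a toy dataset. The dataset, layer sizes, and hyperparameters are all illustrative assumptions, not values from Ref [1].

```python
# A minimal sketch, assuming PyTorch is installed. It demonstrates several
# tricks from the table above on a toy binary classification problem:
# ReLU instead of sigmoid, mini-batch SGD, Adam, dropout, and early stopping.
# All sizes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 1,000 random 20-dimensional samples with binary labels.
X = torch.randn(1000, 20)
y = (X.sum(dim=1) > 0).long()
train_ds = TensorDataset(X[:800], y[:800])
val_X, val_y = X[800:], y[800:]

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),          # ReLU instead of sigmoid -> mitigates vanishing gradients
    nn.Dropout(p=0.5),  # dropout -> mitigates overfitting
    nn.Linear(64, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # adaptive learning rate
loss_fn = nn.CrossEntropyLoss()

# Mini-batch SGD: the DataLoader feeds shuffled batches of 32 samples,
# smoothing out the severe per-sample fluctuation of pure SGD.
loader = DataLoader(train_ds, batch_size=32, shuffle=True)

best_val_loss, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    model.train()
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()

    # Early stopping: stop when the validation loss has not improved
    # for `patience` consecutive epochs.
    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(val_X), val_y).item()
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Early stopping at epoch {epoch}, val loss {val_loss:.4f}")
            break
```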
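The pooling trick can be shown in the same way. The sketch below (again assuming PyTorch, with made-up image and layer sizes) shows how a 2x2 max-pooling layer shrinks each feature map by a factor of four, which in turn cuts the number of weights in the fully connected layer that follows by roughly the same factor.

```python
# A minimal sketch, assuming PyTorch, of why pooling keeps the fully
# connected part of a CNN small. All sizes here are illustrative assumptions.
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)                      # one 28x28 grayscale image

conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)   # output: (1, 8, 28, 28)
pool = nn.MaxPool2d(kernel_size=2)                 # output: (1, 8, 14, 14)

feat_no_pool = conv(x).flatten(1)                  # 8*28*28 = 6272 features
feat_pooled  = pool(conv(x)).flatten(1)            # 8*14*14 = 1568 features

# A fully connected layer with 128 outputs would need:
#   without pooling: 6272 * 128 ~= 800k weights
#   with pooling:    1568 * 128 ~= 200k weights
fc = nn.Linear(feat_pooled.shape[1], 128)
print(feat_no_pool.shape, feat_pooled.shape, fc(feat_pooled).shape)
```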
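Finally, for the 'No memory' row: the sketch below (PyTorch again, with illustrative sizes) shows the hidden state that lets an RNN carry information from earlier time steps to later ones, something a plain feedforward network processing each step independently cannot do.

```python
# A minimal sketch, assuming PyTorch, of the "memory" an RNN adds.
# Sequence length and sizes are illustrative assumptions.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)

seq = torch.randn(1, 10, 4)   # one sequence of 10 time steps, 4 features each
out, h_last = rnn(seq)        # out: (1, 10, 8), h_last: (1, 1, 8)

# The output at step t depends on all inputs up to t through the hidden state,
# so the final hidden state summarizes the whole sequence.
print(out.shape, h_last.shape)
```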
Reference
[1] Deep Learning for Wireless Physical Layer: Opportunities and Challenges (YouTube)