Deep learning is receiving enormous attention after some recent breakthrough results. But what exactly happened to get us here?
The revolution (i.e., a real step change) we are witnessing is the combined effect of several developments:
1. Improvements in hardware acceleration and commoditization of GPU access: electrical engineers have increased computation speed by providing GPUs like the NVIDIA K40 and the associated CUDA parallel programming model. Furthermore, clusters of GPUs can now be rented on demand in the cloud at affordable prices (e.g., Amazon AWS EC2).
2. Increasing RAM availability: electrical engineers have increased integration density, so that computers with many gigabytes, or even several terabytes, of RAM are available.
3. Big data availability: enormous amounts of data (text, images, audio, video) are now available for training machine learning systems. The digitization of analog assets and the connection of machines over the Internet amplify this effect.
4. Progress in machine learning models, representations, and learning algorithms:
- Deep Belief Networks (DBN) (Hinton et al. 2006) and greedy pre-training: a disruptive change occurred when Hinton and co-workers discovered more effective and efficient training procedures for multi-layer neural networks, based on pre-training individual layers in a greedy, unsupervised fashion as Restricted Boltzmann Machines (RBM) (Bengio 2009: 6). Auto-encoders, a.k.a. auto-associators or Diabolo networks, and stacked auto-encoders (Bengio 2009: 45-47) are another family of models, discovered shortly after DBNs, that likewise exploit unsupervised training applied locally to intermediate layers of representation (see the auto-encoder sketch below);
- Word embeddings (Weston et al. 2008; Collobert and Weston 2008) encode word meaning and context as vectors derived from a word's co-text (the words surrounding it), and achieve competitive results in natural language processing tasks (see the co-occurrence sketch below);
- the Neocognitron and its descendants, Convolutional Neural Networks (CNN) (Bengio 2009: 43-45), are families of feed-forward models capable of sequence modeling, playing the role that HMM- or CRF-based sequence taggers play in traditional statistical modeling (see the convolution sketch below);
- Neural Network Language Models (NNLM) (Mnih and Hinton 2009) can replace traditional statistical word n-gram language models; and
- Recursive Neural Networks (RNN) (Goller and Küchler 1996; Socher, Manning, and Ng 2010) “operate on any hierarchical structure, combining child representations into parent representations” (Wikipedia). They are a generalization of the longer-known recurrent neural networks: a recursive neural network with the simpler structure of a (time-)linear chain is a recurrent one, combining the hidden representation of the previous time step with the current input into the representation for the current time step, in a feedback loop (see the composition sketch below). Socher and colleagues applied recursive networks to parsing, to learning relations between images and text, and to sentiment analysis.

In a previous revolution, when statistical methods caused a paradigm shift in natural language processing in the 1990s, the trigger was the success of statistics in speech recognition, a field pursued by electrical engineers and computer scientists. This time, the breakthrough comes from theoretical machine learning researchers within computer science.
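To make some of these developments concrete, a few minimal sketches follow. First, the greedy layer-wise idea behind stacked auto-encoders: each layer is trained, unsupervised, to reconstruct its own input, and its codes then become the input of the next layer. The dimensions, squared-error loss, and plain gradient descent below are illustrative assumptions, not the exact procedures of Hinton et al. (2006) or Bengio (2009).

```python
# Minimal sketch of greedy layer-wise auto-encoder pre-training (NumPy only).
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder_layer(data, hidden_dim, lr=0.1, epochs=50):
    """Train one auto-encoder layer to reconstruct its input; return encoder weights."""
    n, d = data.shape
    W = rng.normal(scale=0.1, size=(d, hidden_dim))   # encoder weights
    V = rng.normal(scale=0.1, size=(hidden_dim, d))   # decoder weights
    for _ in range(epochs):
        h = np.tanh(data @ W)          # encode
        recon = h @ V                  # decode (linear output)
        err = recon - data             # reconstruction error
        # Backpropagate the squared error through decoder and encoder.
        grad_V = h.T @ err / n
        grad_h = (err @ V.T) * (1 - h ** 2)
        grad_W = data.T @ grad_h / n
        V -= lr * grad_V
        W -= lr * grad_W
    return W

# Greedy stacking: each layer is trained on the codes of the layer below.
X = rng.normal(size=(256, 32))         # toy unlabeled data (an assumption)
codes, weights = X, []
for hidden_dim in (16, 8):             # two layers, trained one at a time
    W = train_autoencoder_layer(codes, hidden_dim)
    weights.append(W)
    codes = np.tanh(codes @ W)         # frozen features feed the next layer
```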
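The co-text idea behind word embeddings can be illustrated with simple co-occurrence counts: a word is represented by the words that appear around it. The toy corpus and window size below are assumptions for illustration; Collobert and Weston (2008) learn dense vectors rather than using raw counts.

```python
# Minimal sketch of co-occurrence word vectors (a count-based precursor view).
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the dog sat on the rug".split()
window = 2                                # context words on each side
cooc = defaultdict(Counter)
for i, word in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            cooc[word][corpus[j]] += 1    # count each context word

vocab = sorted(set(corpus))
vectors = {w: [cooc[w][c] for c in vocab] for w in vocab}
print(vectors["cat"])  # "cat" is represented by the words it co-occurs with
```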
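The core convolutional idea, one small filter shared across all positions of the input, can be sketched in a few lines; the filter values and input below are illustrative assumptions.

```python
# Minimal sketch of 1-D convolution: the same filter slides over a sequence,
# so one set of weights detects a local pattern at every position.
import numpy as np

x = np.array([0., 1., 2., 3., 2., 1., 0.])   # toy 1-D input
w = np.array([-1., 0., 1.])                  # shared local filter
feature_map = np.array([x[i:i + 3] @ w for i in range(len(x) - 2)])
print(feature_map)  # local responses; pooling would summarize them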
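Finally, the recursive-versus-recurrent relationship described above can be shown directly: one shared composition function maps two child vectors to a parent vector. Applied bottom-up over a tree it is a recursive network; applied along a time-linear chain it is a recurrent one. The dimensions and the tanh nonlinearity are assumptions for illustration.

```python
# Minimal sketch of recursive composition and its recurrent special case.
import numpy as np

rng = np.random.default_rng(1)
d = 4
W = rng.normal(scale=0.5, size=(2 * d, d))   # shared composition weights

def compose(left, right):
    """Combine two child representations into one parent representation."""
    return np.tanh(np.concatenate([left, right]) @ W)

# Recursive case: a small binary tree ((a b) (c d)).
a, b, c, d_vec = (rng.normal(size=d) for _ in range(4))
parent = compose(compose(a, b), compose(c, d_vec))

# Recurrent case: the same rule applied along a time-linear chain.
state = a
for x in (b, c, d_vec):
    state = compose(state, x)   # previous state + current input -> new state
```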
Y. Bengio (2009) “Learning Deep Architectures for AI,” Foundations and Trends in Machine Learning 2(1), pp. 1–127
G. E. Hinton, S. Osindero, and Y.-W. Teh (2006) “A fast learning algorithm for deep belief nets,” Neural Computation 18(7), pp. 1527–1554
R. Collobert and J. Weston (2008) “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML’08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 160–167, ACM
C. Goller and A. Küchler (1996) “Learning task-dependent distributed representations by backpropagation through structure,” in Proceedings of the IEEE International Conference on Neural Networks (ICNN’96)
A. Mnih and G. E. Hinton (2009) “A scalable hierarchical distributed language model,” in Advances in Neural Information Processing Systems 21 (NIPS’08), (D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, eds.), pp. 1081–1088
R. Socher, C. D. Manning, and A. Y. Ng (2010) “Learning continuous phrase representations and syntactic parsing with recursive neural networks,” in Proceedings of the NIPS-2010 Deep Learning and Unsupervised Feature Learning Workshop, pp. 1–9
J. Weston, F. Ratle, and R. Collobert (2008) “Deep learning via semi-supervised embedding,” in Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML’08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 1168–1175, ACM