Summary of "Deep Learning in Neural Networks: An Overview"
This essay summarizes Jürgen Schmidhuber's paper "Deep Learning in Neural Networks: An Overview," an extensive historical and technical survey of deep learning (DL) in neural networks (NNs) that spans supervised learning, unsupervised learning, and reinforcement learning (RL). With 888 references, the paper is an essential resource for researchers aiming to grasp the depth and breadth of DL and its applications.
Historical Context and Development
The paper traces the roots of DL, noting significant milestones over the decades. Early NN architectures date back to the 1940s, with roots in linear regression methods from the 1800s. A pivotal development came in the 1960s and 1970s with the gradient-computation techniques that underlie backpropagation (BP). Schmidhuber's review acknowledges contributions such as the Neocognitron (Fukushima, 1979), which introduced the architecture underlying convolutional neural networks (CNNs), and the Group Method of Data Handling (GMDH) networks (Ivakhnenko and Lapa, 1965), both of which laid the foundation for modern DL.
The development of BP from the 1960s to the 1980s was crucial for training multilayer NNs. However, the late 1980s and early 1990s brought recognition of fundamental obstacles: the vanishing and exploding gradient problems, which Sepp Hochreiter formally analyzed in his 1991 diploma thesis. These problems motivated much of the subsequent research that eventually made deep learning practical.
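To make the vanishing gradient problem concrete, the following minimal numpy sketch (an illustration, not code from the paper; the depth, dimensions, and 0.1 weight scale are arbitrary choices) shows how backpropagated error signals shrink exponentially when they are repeatedly multiplied by per-layer Jacobians whose norms are below one:

```python
import numpy as np

rng = np.random.default_rng(0)
depth = 50
grad = np.ones(10)  # hypothetical gradient arriving at the output layer

for _ in range(depth):
    # A small random matrix stands in for one layer's Jacobian; the 0.1
    # scale keeps its norm below 1, so each multiplication shrinks the signal.
    W = rng.normal(scale=0.1, size=(10, 10))
    grad = W.T @ grad

print(np.linalg.norm(grad))  # vanishingly small after 50 layers
```

With a weight scale large enough to push the Jacobian norms above one, the same loop exhibits the exploding variant of the problem.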
Supervised and Unsupervised Learning
The core of DL research initially centered on supervised learning (SL). The paper highlights innovations such as Long Short-Term Memory (LSTM) networks, introduced by Hochreiter and Schmidhuber (first described in 1995; the archival paper appeared in 1997), which overcame the vanishing gradient problem in training deep recurrent neural networks (RNNs). LSTM networks have demonstrated efficacy in a wide range of tasks, from speech recognition to time-series prediction.
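The sketch below shows a single step of an LSTM cell in the now-standard formulation with input, forget, and output gates (the forget gate was a later addition by Gers et al., 2000); the shapes and variable names are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*n_h, n_x + n_h), b has shape (4*n_h,)."""
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)                   # gate pre-activations
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    c = f * c_prev + i * np.tanh(g)               # additive cell-state update
    h = o * np.tanh(c)                            # gated output
    return h, c

# Toy usage with hypothetical sizes.
n_x, n_h = 3, 5
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * n_h, n_x + n_h))
b = np.zeros(4 * n_h)
h, c = lstm_step(rng.normal(size=n_x), np.zeros(n_h), np.zeros(n_h), W, b)
```

The additive update of the cell state c is the key design choice: error signals can flow through it over long time lags without the exponential decay sketched above.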
Schmidhuber also details the role of unsupervised learning (UL) in facilitating DL. Techniques like autoencoders (AEs) and deep belief networks (DBNs) showed that UL could be effectively used to pre-train deep networks, making subsequent supervised training more efficient. This approach became widely recognized around 2006 and played a significant role in reviving interest in deep learning.
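The pre-training idea can be illustrated with a single-layer autoencoder: train it to reconstruct unlabeled inputs, then reuse the encoder weights to initialize a supervised network. The sketch below uses toy data, a squared-error loss, and hand-derived gradients, all of which are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))                 # toy unlabeled data
lr = 0.01
W_enc = rng.normal(scale=0.1, size=(32, 16))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(16, 32))   # decoder weights

for _ in range(200):
    H = np.tanh(X @ W_enc)                     # encode
    X_hat = H @ W_dec                          # decode (linear)
    err = X_hat - X                            # reconstruction error
    # Gradients of the mean squared reconstruction loss, via the chain rule.
    grad_dec = H.T @ err / len(X)
    grad_enc = X.T @ ((err @ W_dec.T) * (1 - H**2)) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

# The trained W_enc could then initialize the first layer of a
# supervised network before fine-tuning with labels.
```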
Practical and Theoretical Implications
The paper underscores the practical implications of DL, emphasizing its success in various competitions and benchmarks. For instance, DL techniques have achieved superhuman performance in traffic sign recognition and visual pattern recognition, notably through the use of GPU-based implementations of CNNs and Max-Pooling CNNs (MPCNNs).
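As a small illustration of the pooling operation in MPCNNs, the sketch below implements non-overlapping 2x2 max-pooling of a single 2D feature map; the window size is an illustrative choice:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max-pooling of a 2D feature map."""
    h, w = feature_map.shape
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even dimensions"
    # Group pixels into 2x2 blocks, then keep each block's maximum.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.arange(16.0).reshape(4, 4)
print(max_pool_2x2(fm))  # [[ 5.  7.], [13. 15.]]
```

Discarding all but the strongest activation in each window gives the network a degree of translation invariance and reduces the spatial resolution passed to later layers.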
On the theoretical side, the paper introduces Credit Assignment Paths (CAPs), chains of possibly causal links between computational events, whose length measures the depth of a learning problem. It also discusses advanced optimization techniques, regularization methods, and search methods for low-complexity NNs that generalize well. These methods aim to curb overfitting and enhance the generalization capabilities of DL models.
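As one concrete example of such a regularization method, the sketch below applies L2 weight decay, a standard technique that penalizes large weights to favor simpler, better-generalizing networks; the learning rate, decay coefficient, and example values are arbitrary:

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad_loss, lr=0.01, weight_decay=1e-4):
    """One gradient step on loss + (weight_decay / 2) * ||w||^2."""
    return w - lr * (grad_loss + weight_decay * w)

w = np.array([2.0, -3.0, 0.5])
g = np.array([0.1, -0.2, 0.05])  # hypothetical gradient of the data loss
w = sgd_step_with_weight_decay(w, g)
print(w)
```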
Future Directions
Schmidhuber speculates on future developments in AI and DL. He anticipates that future deep NNs will incorporate selective attention mechanisms, learn to minimize computational costs, and evolve self-modularizing structures. The paper points out the potential of combining supervised and unsupervised learning techniques and the importance of search methods in program space for further advancements.
The paper also highlights universal problem solvers and universal RL machines that improve through experience and self-modification. Such methods aim to solve any well-defined problem in a theoretically optimal way, but they are not yet practical or commercially relevant.
Conclusion
Schmidhuber's paper provides an exhaustive overview of DL in NNs, covering historical developments, technical advancements, and the future potential of the field. It serves as a valuable reference for researchers and practitioners, offering insights into the complexities and achievements of DL. The detailed survey encourages a deeper understanding of the challenges and opportunities in making neural networks more efficient and widely applicable in various domains.