A Comprehensive Overview of the Evolutionary Path of Deep Learning Models
The paper "On the Origin of Deep Learning" by Haohan Wang and Bhiksha Raj provides an extensive review of the historical development of deep neural network models, from their conceptual origins to the state-of-the-art architectures dominating recent research. The authors meticulously trace the lineage of neural networks, highlighting the impactful moments and transitions that have shaped today's landscape of deep learning.
The authors organize the discourse chronologically, beginning with the philosophical underpinnings in Aristotle's associationism. The paper outlines how early theoretical explorations laid down principles still evident in today's machine learning, such as Hebbian learning. It further explores foundational milestones like the McCulloch-Pitts neuron, Rosenblatt's Perceptron, and the evolution of learning rules through Hebb's and Oja's contributions, portraying the incremental refinements that aimed to mimic cognitive processes.
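To make these early learning rules concrete, here is a minimal NumPy sketch of the plain Hebbian update and Oja's stabilized variant applied to a single linear neuron; the data, learning rate, and variable names are illustrative assumptions rather than details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # illustrative input patterns (500 samples, 3 features)
w_hebb = rng.normal(size=3)     # weights updated with the plain Hebbian rule
w_oja = w_hebb.copy()           # weights updated with Oja's stabilized rule
eta = 0.01                      # learning rate

for x in X:
    y_hebb = w_hebb @ x
    y_oja = w_oja @ x
    # Hebb's rule: strengthen weights when input and output fire together.
    # Unbounded: the weight norm tends to grow without limit.
    w_hebb += eta * y_hebb * x
    # Oja's rule adds a decay term that keeps the weight norm bounded;
    # the weights drift toward the first principal component of the data.
    w_oja += eta * y_oja * (x - y_oja * w_oja)

print("Hebbian weight norm:", np.linalg.norm(w_hebb))
print("Oja weight norm:    ", np.linalg.norm(w_oja))
```

The contrast in the two weight norms after training illustrates why Oja's correction term was an important stabilization of the original Hebbian idea.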
The narrative then turns to the transition from linear models to more complex architectures capable of capturing intricate patterns. This includes discussions of universal approximation properties and the motivations driving the development of multilayer models such as Multi-Layer Perceptrons (MLPs). The authors emphasize that although shallow networks can, in principle, approximate any continuous function, deeper networks can represent the same functions with far fewer units, an efficiency that is indispensable in practical applications.
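For concreteness, the snippet below sketches the forward pass of a one-hidden-layer MLP, the form covered by the universal approximation results the paper discusses; the layer sizes and activation choice are illustrative assumptions.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP forward pass."""
    h = np.tanh(x @ W1 + b1)   # hidden layer with a squashing nonlinearity
    return h @ W2 + b2         # linear output layer

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 2))            # a small batch of 2-dimensional inputs
W1 = rng.normal(size=(2, 16)) * 0.5    # 16 hidden units (illustrative width)
b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1)) * 0.5
b2 = np.zeros(1)

print(mlp_forward(x, W1, b1, W2, b2).shape)   # -> (4, 1)
```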
The discussion then shifts to individual deep learning architecture families:
- Generative Models: The authors examine the conceptual leap from Hopfield networks to Boltzmann Machines and their constrained variant, the Restricted Boltzmann Machine (RBM). The discussion covers how Hinton et al.'s Deep Belief Networks enabled layer-wise unsupervised pretraining, pushing deep generative models into mainstream AI research. The section also contrasts this family with the Deep Boltzmann Machine and looks ahead to later descendants such as generative adversarial networks (a minimal RBM training sketch follows this list).
- Convolutional Neural Networks (CNNs): The paper traces CNNs to their foundations in models of human visual processing, highlighting key innovations from the Neocognitron to the modern architectures credited with breakthroughs in vision tasks. Success in the ImageNet competition exemplifies CNNs' effectiveness, detailed through notable architectures like AlexNet, VGG, and ResNet, each iterating on prior ideas to scale network depth and improve architectural efficiency (a convolution sketch follows this list).
- Recurrent Neural Networks (RNNs): This section addresses networks tailored to sequential data, tracing development from basic recurrent architectures such as the Elman and Jordan networks to Long Short-Term Memory networks (LSTMs) and the more recent attention mechanisms. The narrative outlines the recurring theme of capturing long-range temporal dependencies while taming gradient instability, showcasing both established advances and ongoing research directions in sequence modeling (an LSTM cell sketch follows this list).
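As a companion to the generative-model discussion, here is a minimal sketch of one contrastive-divergence (CD-1) update for a binary RBM in NumPy; the dimensions, learning rate, and training vector are illustrative assumptions, and biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
n_visible, n_hidden, eta = 6, 4, 0.1
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # visible-to-hidden weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W):
    """One CD-1 step: positive phase, one Gibbs step, negative phase."""
    p_h0 = sigmoid(v0 @ W)                       # P(h = 1 | v0)
    h0 = (rng.random(p_h0.shape) < p_h0) * 1.0   # sample hidden units
    p_v1 = sigmoid(h0 @ W.T)                     # reconstruct visible units
    v1 = (rng.random(p_v1.shape) < p_v1) * 1.0
    p_h1 = sigmoid(v1 @ W)
    # Gradient estimate: <v h> under the data minus <v h> under the reconstruction.
    return np.outer(v0, p_h0) - np.outer(v1, p_h1)

v0 = (rng.random(n_visible) < 0.5) * 1.0         # an illustrative binary training vector
W += eta * cd1_step(v0, W)
```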
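The local, weight-shared filtering at the heart of CNNs can likewise be shown in a few lines; the toy image and kernel below are illustrative, and the loop-based implementation favors readability over speed.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2-D 'valid' cross-correlation: the same small kernel is slid
    over every location, so weights are shared across the whole image."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # illustrative 6x6 "image"
edge_kernel = np.array([[1.0, -1.0]])              # a toy horizontal edge detector
print(conv2d_valid(image, edge_kernel).shape)      # -> (6, 5)
```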
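Finally, the gating idea that lets LSTMs carry information across long sequences can be sketched as a single cell update; the dimensions, initialization, and input sequence are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, b):
    """One LSTM step: gates decide what to forget, write, and expose,
    so the cell state c can carry information across many time steps."""
    z = np.concatenate([x, h_prev]) @ W + b   # all four gate pre-activations at once
    f, i, o, g = np.split(z, 4)
    f, i, o, g = sigmoid(f), sigmoid(i), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g                    # additive cell-state update
    h = o * np.tanh(c)                        # hidden state exposed to the next layer
    return h, c

rng = np.random.default_rng(3)
n_in, n_hid = 5, 8
W = rng.normal(scale=0.1, size=(n_in + n_hid, 4 * n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(10):                           # run over an illustrative sequence
    h, c = lstm_cell(rng.normal(size=n_in), h, c, W, b)
print(h.shape, c.shape)                       # -> (8,) (8,)
```

The additive update of the cell state, rather than a repeated squashing, is what mitigates the vanishing-gradient problem the section describes.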
Throughout the paper, the authors account not only for architectural evolution but also for the optimization advances critical to deep learning's success. Refinements to backpropagation, adaptive optimizers such as Adam, and stabilizing techniques such as Batch Normalization are examined for their roles in improving training stability and efficiency.
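For concreteness, below is a minimal NumPy rendering of the Adam update mentioned above, applied to a toy quadratic objective; the hyperparameters follow the commonly cited defaults, and the objective itself is an illustrative assumption.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and its square (v), with bias correction for the early steps."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([5.0, -3.0])          # toy parameters
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 501):
    grad = 2 * theta                   # gradient of the toy objective ||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)                           # ends up close to [0, 0]
```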
The paper wisely refrains from sensationalizing the narrative, while acknowledging the transformative impact of deep learning across AI disciplines. By providing historical insight, the authors aim to inform the trajectory of future research, suggesting that the potential of deep networks lies both in understanding their complex history and in embracing interdisciplinary inspiration, such as from cognitive neuroscience.
In conclusion, the rigorous review and narrative clarity of this paper give researchers a valuable resource for appreciating how deep learning architectures evolved, while fostering inventive development within the neural network paradigm.