
Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review (1611.00740v5)

Published 2 Nov 2016 in cs.LG

Abstract: The paper characterizes classes of functions for which deep learning can be exponentially better than shallow learning. Deep convolutional networks are a special case of these conditions, though weight sharing is not the main reason for their exponential advantage.

Citations (554)

Summary

  • The paper shows that deep networks efficiently approximate compositional functions, avoiding the exponential complexity seen in shallow architectures.
  • It demonstrates that shallow networks require exponentially many units to achieve similar accuracy, underscoring their limitations in high dimensions.
  • The study provides theoretical insights into hierarchical composition and learning dynamics that enable deep networks to generalize better.

Understanding the Circumstances Where Deep Networks Excel Over Shallow Networks

This essay provides an in-depth analysis of CBMM Memo No. 058, focusing on when and why deep networks can outperform shallow ones. The paper by Poggio et al. characterizes the function classes for which deep architectures have an exponential advantage, offering critical insights into the function approximation capabilities of deep and shallow networks.

Theoretical Foundations and Function Classes

The paper organizes its theoretical inquiry into three foundational questions about deep neural networks (DNNs):

  1. Expressive Power: What function classes can DNNs approximate effectively?
  2. Learning Efficiency: Why does stochastic gradient descent (SGD) appear so effective in optimizing deep networks?
  3. Generalization Capability: How do DNNs manage to generalize beyond the performance of classical shallow networks, despite overparameterization?

While primarily addressing the first question, the paper builds upon seminal theoretical work to explore the approximation properties of neural networks. The authors confirm that deep networks mitigate the curse of dimensionality for specific compositional functions, unlike their shallow counterparts.

Approximation Power of Deep vs. Shallow Networks

The authors evaluate the degree of approximation by comparing shallow (one-hidden-layer) and deep networks. They focus on compositional functions, a class of functions whose structure aligns naturally with hierarchical architectures.

Key results include:

  • Shallow Networks: The approximation complexity is exponential in the dimensionality of the input space. Achieving a given accuracy in general requires a number of units that grows exponentially with the input dimension, which is precisely the curse of dimensionality.
  • Deep Networks: For compositional functions, deep networks whose architecture reflects the composition can achieve the same approximation accuracy with far fewer units, because each layer only needs to approximate low-dimensional constituent functions (see the bounds sketched below).
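As a rough guide (a sketch of the bounds proved by Poggio et al.; the exact norms, constants, and smoothness assumptions are as stated in the paper), for target accuracy $\epsilon$, input dimension $n$, and constituent functions of smoothness $m$, the required number of units $N$ scales as

$$
N_{\text{shallow}} = O\!\left(\epsilon^{-n/m}\right),
\qquad
N_{\text{deep (binary tree)}} = O\!\left((n-1)\,\epsilon^{-2/m}\right).
$$

The deep bound is exponential only in the constant dimensionality of the bivariate constituents, not in $n$.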

Insights from Compositionality

The paper provides an insightful characterization of compositional functions by structuring them in terms of hierarchical local compositions. Highlights include:

  • Hierarchically Local Functions: Certain function classes can be written as hierarchical compositions in which each constituent function depends on only a few variables and these constituents interact locally (a minimal sketch follows this list). This perspective naturally aligns with deep convolutional networks, where locality is realized through the small receptive fields of convolutional kernels.
  • Compositional Structure: A deep network need not match the compositional graph of the target function exactly; it suffices that the network's architecture contains a suitable subgraph mirroring that composition, in which case the approximation escapes the worst effects of the curse of dimensionality.
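The following minimal sketch (a hypothetical example, not taken from the paper) shows a hierarchically local compositional function on eight inputs: bivariate constituents composed along a binary tree, so a deep network mirroring the tree only ever has to approximate two-dimensional functions.

```python
import numpy as np

def h(a, b):
    # A generic smooth bivariate constituent function (chosen arbitrarily here).
    return np.tanh(a + 0.5 * b)

def compositional_f(x):
    # f(x1,...,x8) = h(h(h(x1,x2), h(x3,x4)), h(h(x5,x6), h(x7,x8)))
    # Each node of the binary tree sees only 2 inputs, regardless of n = 8.
    assert x.shape[-1] == 8
    level1 = [h(x[..., 2 * i], x[..., 2 * i + 1]) for i in range(4)]
    level2 = [h(level1[0], level1[1]), h(level1[2], level1[3])]
    return h(level2[0], level2[1])

# Evaluate on a small batch of random 8-dimensional inputs.
x = np.random.randn(5, 8)
print(compositional_f(x))
```

A shallow approximant must treat f as a generic 8-dimensional function, whereas a deep network with the same binary-tree connectivity only has to approximate the bivariate constituents h, which is the source of the exponential gap.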

Implications and Future Directions

The implications for DNN utility are substantial:

  1. Architectural Design: The findings indicate that the advantage of DNNs stems from architectures that mirror compositional hierarchies in the data, offering strategic guidance for designing deep learning models.
  2. Efficient Learning and Memory Utilization: The hierarchical architecture suggests that deep networks can act as efficient memories for compositional data, supporting storage and retrieval in a manner akin to hierarchical vector quantization.
  3. Potential for Advanced Applications: The observation that suitably structured (e.g., convolutional) architectures can bypass the curse of dimensionality opens avenues for applications with intrinsic compositional structure, such as vision and language (a toy calculation illustrating the size of the gap follows this list).
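As a toy calculation (assuming the rough unit-count scalings quoted earlier, with constants dropped), the gap between shallow and deep unit counts is already dramatic for modest input dimensions:

```python
def units_shallow(n, m, eps):
    # Rough scaling for a shallow network approximating a generic
    # n-dimensional function of smoothness m to accuracy eps: eps**(-n/m).
    return eps ** (-n / m)

def units_deep_binary_tree(n, m, eps):
    # Rough scaling for a deep network matching a binary-tree composition
    # of bivariate constituents of smoothness m: (n - 1) * eps**(-2/m).
    return (n - 1) * eps ** (-2 / m)

n, m, eps = 8, 2, 0.1
print(units_shallow(n, m, eps))            # ~1e4
print(units_deep_binary_tree(n, m, eps))   # ~70
```

With n = 8, m = 2, and eps = 0.1, the shallow scaling is on the order of 10,000 units while the deep scaling is on the order of 70, and the ratio grows exponentially as n increases.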

In conclusion, the discussion provided by Poggio et al. underscores the nuanced advantage of deep architectures, particularly for compositional functions. Future research might deepen understanding of the interaction among architectural depth, function complexity, and generalization in AI models, paving the way for more advanced models that align more closely with theoretical optimality.