- The paper demonstrates that deep networks compute functions with exponentially growing curvature in the chaotic regime.
- It employs Riemannian geometry and mean field theory to elucidate signal propagation and rapid decorrelation.
- Numerical simulations confirm that depth, rather than width, drives this exponential growth in expressive power.
Exponential Expressivity in Deep Neural Networks Through Transient Chaos
The paper under review provides a rigorous exploration of how deep neural networks (DNNs) can achieve exponential expressivity through the dynamics of signal propagation, specifically by employing principles of Riemannian geometry combined with the mean field theory of high-dimensional chaos. This research addresses fundamental questions about the innate capabilities of deep networks compared to shallow ones, namely the ability to compute complex nonlinear functions and to effectively disentangle intricate data manifolds within hidden spaces.
Core Contributions
The authors focus on the phase transition between order and chaos in generic deep networks with random weights. In the chaotic phase, such networks compute functions whose global curvature grows exponentially with depth but not with width, highlighting an expressive advantage conferred by depth alone.
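As a rough illustration of how this order/chaos criterion can be checked numerically, the sketch below assumes a tanh nonlinearity and illustrative weight and bias standard deviations (not the paper's experimental settings). It iterates the mean-field length map to its fixed point and tests whether the slope of the correlation map at c = 1 exceeds one, the paper's condition for chaos.

```python
# Sketch: classify a random tanh network as ordered or chaotic from the
# weight/bias standard deviations (sigma_w, sigma_b). The criterion follows
# the paper's mean-field condition chi_1 = sigma_w^2 E[phi'(sqrt(q*) z)^2] > 1;
# the numerical values below are illustrative, not taken from the paper.
import numpy as np

# Gauss-Hermite nodes/weights, normalized to give expectations over N(0, 1)
z, w = np.polynomial.hermite_e.hermegauss(60)
w = w / w.sum()

def fixed_point_q(sigma_w, sigma_b, iters=100):
    """Iterate the length map q -> sigma_w^2 E[tanh(sqrt(q) z)^2] + sigma_b^2."""
    q = 1.0
    for _ in range(iters):
        q = sigma_w**2 * np.sum(w * np.tanh(np.sqrt(q) * z)**2) + sigma_b**2
    return q

def chi_1(sigma_w, sigma_b):
    """Slope of the correlation map at c = 1; the network is chaotic if > 1."""
    q = fixed_point_q(sigma_w, sigma_b)
    dphi = 1.0 / np.cosh(np.sqrt(q) * z)**2   # derivative of tanh is sech^2
    return sigma_w**2 * np.sum(w * dphi**2)

for sw in (0.5, 1.0, 2.0, 4.0):
    x1 = chi_1(sw, 0.3)
    print(f"sigma_w = {sw}: chi_1 = {x1:.3f} ({'chaotic' if x1 > 1 else 'ordered'})")
```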
Strong Numerical Results:
- The paper proves, and confirms in simulations, that even random deep networks achieve an expressivity unattainable by any shallow counterpart. The mean field predictions show remarkable agreement with simulated finite networks.
Theoretical Implications:
- The paper extends beyond analyses of specific functions to reveal that any generic function computed by a deep network cannot be efficiently approximated by a shallow network.
- Utilization of Riemannian geometry enables a quantitative and geometric interpretation of the network's expressive power.
Methodology Overview
The authors study the dynamics of two closely related inputs and their rapid de-correlation as they propagate through the network, using a newly derived correlation map. A distinguishing feature of the approach is that, in the limit of large layer width, the pre-activations at each layer behave as Gaussian variables, so the propagation of squared lengths and of the correlation between inputs reduces to deterministic layer-to-layer iterative maps.
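A minimal sketch of iterating these mean-field maps, assuming a tanh nonlinearity and Gaussian weight/bias statistics with illustrative values of σ_w and σ_b (not the paper's exact settings), shows how the correlation between two nearby inputs evolves with depth:

```python
# Sketch: mean-field length and correlation maps for a random tanh network.
# sigma_w, sigma_b, and the quadrature scheme are illustrative choices.
import numpy as np

sigma_w, sigma_b = 2.5, 0.3          # chaotic regime (sigma_w^2 large enough)
phi = np.tanh

# Gauss-Hermite nodes/weights, normalized to expectations over N(0, 1)
z, w = np.polynomial.hermite_e.hermegauss(60)
w = w / w.sum()

def length_map(q):
    """q^{l+1} = sigma_w^2 E[phi(sqrt(q) z)^2] + sigma_b^2."""
    return sigma_w**2 * np.sum(w * phi(np.sqrt(q) * z)**2) + sigma_b**2

def corr_map(c, q):
    """Correlation map: joint expectation over two correlated Gaussians."""
    z1 = np.sqrt(q) * z[:, None]
    z2 = np.sqrt(q) * (c * z[:, None] + np.sqrt(1 - c**2) * z[None, :])
    ew = w[:, None] * w[None, :]
    return (sigma_w**2 * np.sum(ew * phi(z1) * phi(z2)) + sigma_b**2) / q

# Iterate the length map to its fixed point q*
q = 1.0
for _ in range(50):
    q = length_map(q)

# Track how the correlation between two nearby inputs evolves with depth
c = 0.99
for layer in range(10):
    c = corr_map(c, q)
    print(f"layer {layer + 1}: c = {c:.4f}")
```

In the chaotic regime this iteration drives the correlation away from 1 toward a fixed point well below it, which is the rapid de-correlation the paper describes.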
Key Findings
- Transient Chaos: The paper identifies an order-to-chaos transition controlled by the weight and bias variances. In the ordered phase nearby inputs converge as they propagate, while in the chaotic phase their hidden representations de-correlate exponentially fast with depth, which explains how deep networks can navigate large, complex function spaces.
- Exponential Curvature Growth: In the chaotic regime, a simple input manifold such as a circle is mapped to a curve whose length and curvature grow exponentially with depth, a form of hidden-space exploration that cannot be achieved by increasing width alone (illustrated in the sketch below).
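To make the curvature claim concrete, here is a small Monte Carlo sketch: a one-dimensional circle of inputs is propagated through a deep random tanh network in the chaotic regime and the Euclidean length of its image is tracked layer by layer. Width, depth, and the σ_w, σ_b values are illustrative assumptions, not the paper's exact experimental settings.

```python
# Sketch: map a 1D input circle through a deep random tanh network in the
# chaotic regime and track the Euclidean length of its image at each layer.
# Width, depth, and (sigma_w, sigma_b) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
width, depth = 1000, 10
sigma_w, sigma_b = 2.5, 0.3

# Great circle spanned by two orthonormal directions, scaled so q^0 ~ 1
u0, u1 = rng.standard_normal((2, width))
u0 /= np.linalg.norm(u0)
u1 -= u0 * (u0 @ u1)
u1 /= np.linalg.norm(u1)
thetas = np.linspace(0, 2 * np.pi, 500, endpoint=False)
h = np.sqrt(width) * (np.outer(np.cos(thetas), u0) + np.outer(np.sin(thetas), u1))

def curve_length(points):
    """Total Euclidean length of the closed curve traced by the rows."""
    diffs = np.diff(np.vstack([points, points[:1]]), axis=0)
    return np.linalg.norm(diffs, axis=1).sum()

print(f"input: length = {curve_length(h):.1f}")
for layer in range(depth):
    W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
    b = rng.standard_normal(width) * sigma_b
    h = np.tanh(h @ W.T + b)
    print(f"layer {layer + 1}: length = {curve_length(h):.1f}")
```

With only 500 sample points the polyline eventually undersamples the true image curve at great depth, so this is a qualitative illustration; the paper derives the exact mean-field growth rate analytically.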
Practical Implications
This framework not only advances theoretical understanding but also provides a model against which trained networks can be compared. By elucidating the fundamental properties of untrained networks, this research lays groundwork for exploring what differentiates trained networks' geometry and functionality.
Speculation on Future Developments
A natural future direction discussed is how training reshapes network geometry beyond the fixed properties of random weights. The analysis also prompts investigations into how these geometric properties affect learning trajectories, stability, and the optimization of learned representations.
Conclusion
The paper succeeds in formalizing the abstract capabilities associated with deep learning into measurable mathematical constructs. It advances the theoretical understanding of deep architectures while challenging the view that expressivity scales simply with parameter counts such as width. This contribution is valuable in the evolving landscape of neural network research, offering deeper insight into the intrinsic capabilities conferred by depth.
In conclusion, the approach adopted in this paper, combining geometry and dynamics, provides a robust framework for interpreting and exploiting the capabilities of DNNs. Such foundational understanding paves the way for designing more efficient architectures, potentially leading to new breakthroughs in AI applications.