
The Representation Theory of Neural Networks (2007.12213v2)

Published 23 Jul 2020 in cs.LG, cs.NE, math.RT, and stat.ML

Abstract: In this work, we show that neural networks can be represented via the mathematical theory of quiver representations. More specifically, we prove that a neural network is a quiver representation with activation functions, a mathematical object that we represent using a network quiver. Also, we show that network quivers gently adapt to common neural network concepts such as fully-connected layers, convolution operations, residual connections, batch normalization, pooling operations and even randomly wired neural networks. We show that this mathematical representation is by no means an approximation of what neural networks are as it exactly matches reality. This interpretation is algebraic and can be studied with algebraic methods. We also provide a quiver representation model to understand how a neural network creates representations from the data. We show that a neural network saves the data as quiver representations, and maps it to a geometrical space called the moduli space, which is given in terms of the underlying oriented graph of the network, i.e., its quiver. This results as a consequence of our defined objects and of understanding how the neural network computes a prediction in a combinatorial and algebraic way. Overall, representing neural networks through the quiver representation theory leads to 9 consequences and 4 inquiries for future research that we believe are of great interest to better understand what neural networks are and how they work.

Citations (26)

Summary

  • The paper introduces a novel algebraic framework that models neural networks through quiver representations, precisely capturing weights, activations, and computational flow.
  • It proves that isomorphic neural networks compute the same function, formalizing symmetries such as the positive scale invariance of ReLU networks.
  • The paper defines a moduli space for stable double-framed representations, offering new insights into network architecture, generalization, and training dynamics.

This paper introduces a novel mathematical framework for representing neural networks and the data they process using the theory of quiver representations. The core idea is that the structure and computations of a neural network can be precisely captured by algebraic objects derived from quivers (oriented graphs).

Neural Networks as Quiver Representations with Activations

  1. Network Quiver: A specific type of quiver (oriented graph) arranged in layers is defined, called a "network quiver" $Q$. It has designated input, bias, hidden, and output vertices. Hidden vertices have exactly one loop, while source and sink vertices have none.
  2. Neural Network Definition: A neural network is formally defined as a pair $(W, f)$, where:
    • $W$ is a "thin representation" of the delooped quiver $Q^\circ$ (the quiver $Q$ with all loops removed). This means each vertex is associated with a one-dimensional complex vector space $\mathbb{C}$, and each edge $\epsilon$ corresponds to a linear map $W_\epsilon: \mathbb{C} \to \mathbb{C}$ (multiplication by a complex weight). $W$ essentially captures the network's weights.
    • $f = (f_v)$ is a collection of activation functions, one for each hidden vertex $v$, associated with the loops in the original network quiver $Q$.
  3. Forward Pass: The standard forward pass computation is defined combinatorially: each hidden vertex sums its weighted inputs and applies its activation function, while output vertices simply sum their inputs (see the sketch after this list).
  4. Universality: This framework is shown to encompass various standard neural network components, including fully-connected layers, convolutional layers (with shared weights), pooling operations (average and max), batch normalization, residual connections, and even randomly wired networks. Each component corresponds to specific constraints on the network quiver structure (combinatorial architecture), the weights $W$ (weight architecture), and the activation functions $f$ (activation architecture).
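
Below is a minimal, self-contained sketch (not code from the paper) of this combinatorial forward pass on a toy network quiver. The vertex names, weights, and activation choices are illustrative assumptions, and real weights stand in for the complex weights of the formal definition.

```python
# Sketch of the combinatorial forward pass over a network quiver: vertices carry
# values, edges of the delooped quiver Q° carry weights, and each hidden vertex
# applies its own activation function (its loop) to the sum of incoming weighted
# values. Output vertices only sum their inputs.
import numpy as np

# Hypothetical toy quiver: 2 input vertices, 1 bias vertex, 2 hidden vertices, 1 output vertex.
input_vertices  = ["x1", "x2"]
bias_vertices   = ["b"]
hidden_vertices = ["h1", "h2"]       # listed in a topological (layer) order
output_vertices = ["y"]

# Edges of Q° as (source, target) -> weight: the thin representation W.
W = {
    ("x1", "h1"): 0.5,  ("x2", "h1"): -1.2, ("b", "h1"): 0.1,
    ("x1", "h2"): 2.0,  ("x2", "h2"): 0.3,  ("b", "h2"): -0.4,
    ("h1", "y"): 1.5,   ("h2", "y"): -0.7,
}

# One activation function per hidden vertex (the loops of the network quiver Q).
f = {"h1": lambda z: max(z, 0.0), "h2": np.tanh}

def forward(W, f, x):
    """Forward pass Psi(W, f)(x): sum weighted inputs at each vertex, apply f at hidden vertices."""
    values = {**{v: xv for v, xv in zip(input_vertices, x)}, **{v: 1.0 for v in bias_vertices}}
    for v in hidden_vertices:  # process hidden vertices in layer order
        pre = sum(w * values[s] for (s, t), w in W.items() if t == v)
        values[v] = f[v](pre)
    return [sum(w * values[s] for (s, t), w in W.items() if t == v) for v in output_vertices]

print(forward(W, f, x=[1.0, -2.0]))
```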

Isomorphisms and Network Function Invariance

  1. Isomorphism of Neural Networks: An isomorphism between two neural networks $(W, f)$ and $(V, g)$ over the same quiver $Q$ is defined based on an isomorphism $\tau: W \to V$ of the underlying thin quiver representations. This isomorphism $\tau$ consists of invertible linear maps (non-zero scalars $\tau_v$, one for each hidden vertex $v$). It requires commutative diagrams for both the weights (edges in $Q^\circ$) and the activation functions (loops in $Q$). The change of basis group $G$ consists of tuples of non-zero complex numbers $(\tau_v)$, one for each hidden vertex.
  2. Network Function Preservation (Theorem 4.10): The key theoretical result is that if two neural networks $(W, f)$ and $(V, g)$ are isomorphic, they compute the exact same function: $\Psi(W, f) = \Psi(V, g)$.
  3. Consequences:
    • There exist infinitely many different neural networks (with different weights and potentially different activation functions) that compute the same function. These networks are related by the action of the change of basis group $G$.
    • The well-known "positive scale invariance" of ReLU networks is shown to be a special case of this isomorphism theorem, where the change of basis $\tau_v$ consists of positive real numbers, leaving the ReLU activation function unchanged (illustrated in the sketch after this list).
  4. Teleportation: Isomorphisms can preserve the weight architecture (e.g., shared weights in convolutions) while changing the activation architecture. This process, termed "teleportation," yields a network with the same function but potentially different activation functions, which might affect the optimization landscape.
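
The following is a small numerical sketch of the ReLU special case, using an assumed dense 3→4→2 architecture: scaling the incoming weights of each hidden neuron by $\tau_v > 0$ and its outgoing weights by $1/\tau_v$ produces an isomorphic network with the same function.

```python
# Positive scale invariance as a change-of-basis action: for each hidden vertex v,
# pick tau_v > 0, rescale incoming weights by tau_v and outgoing weights by 1/tau_v.
# Since ReLU(tau * z) = tau * ReLU(z) for tau > 0, the network function is unchanged.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))   # illustrative 3 -> 4 -> 2 ReLU net
relu = lambda z: np.maximum(z, 0.0)
psi = lambda A, B, x: B @ relu(A @ x)                        # network function Psi(W, f)

tau = rng.uniform(0.5, 2.0, size=4)                          # one tau_v > 0 per hidden vertex
V1, V2 = np.diag(tau) @ W1, W2 @ np.diag(1.0 / tau)          # the isomorphic weights tau . W

x = rng.normal(size=3)
print(np.allclose(psi(W1, W2, x), psi(V1, V2, x)))           # True: same function
```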

Data Representation via Quiver Representations

  1. Data as Quiver Representations: The paper proposes representing a data sample $x$ not just as an input vector but through the computational state it induces in the network $(W, f)$. For a given input $x$, a new thin quiver representation $W_x^f$ of the delooped quiver $Q^\circ$ is constructed. The weights $(W_x^f)_\epsilon$ of this representation incorporate the original weights $W_\epsilon$, the input values $x_v$, and the activation outputs $a(W, f)_v(x)$ and pre-activations computed during the forward pass of $x$ through $(W, f)$.
  2. Key Property (Theorem 5.4): This data representation $W_x^f$ is constructed such that the neural network $(W_x^f, 1)$ (using identity activation functions), when fed a vector of ones, produces the same output as the original network $(W, f)$ on input $x$: $\Psi(W_x^f, 1)(1^d) = \Psi(W, f)(x)$ (see the numerical sketch after this list).
  3. Significance: This means the entire computational process for input $x$, including all intermediate feature map values, is encoded within the linear structure of the quiver representation $W_x^f$. The network's output depends only on the isomorphism class $[W_x^f]$ under the action of the group $G$.
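
The sketch below illustrates this key property under simplifying assumptions (a bias-free two-layer MLP with tanh activations and nonzero pre-activations); the edge-wise rescaling used here is one way to realize $W_x^f$ so that the stated equality holds, not a verbatim transcription of the paper's general construction.

```python
# Data as a quiver representation: absorb the input values into the input-to-hidden
# edge weights and the ratios activation/pre-activation into the hidden-to-output
# edge weights. The resulting network with identity activations, fed the all-ones
# vector, reproduces Psi(W, f)(x).
import numpy as np

rng = np.random.default_rng(1)
d, h, k = 3, 5, 2                                    # illustrative layer sizes
W1, W2 = rng.normal(size=(h, d)), rng.normal(size=(k, h))
f = np.tanh                                          # activation at every hidden vertex
x = rng.normal(size=d)

# Original forward pass Psi(W, f)(x).
pre = W1 @ x                                         # pre-activations at hidden vertices
act = f(pre)                                         # activation outputs a(W, f)_v(x)
out = W2 @ act

# Induced data representation W_x^f (edge-wise rescaling of W by the forward pass of x).
W1_x = W1 * x                                        # input-to-hidden edges absorb x_v
W2_x = W2 * (act / pre)                              # hidden-to-output edges absorb a_v / pre_v

# Theorem 5.4 check: Psi(W_x^f, 1)(1^d) = Psi(W, f)(x), with identity activations.
out_x = W2_x @ (W1_x @ np.ones(d))
print(np.allclose(out, out_x))                       # True
```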

The Moduli Space

  1. Double-Framed Representations: To study the space of these data representations, the concept of "double-framed thin quiver representations" $(\ell, \widetilde{W}, h)$ of the hidden quiver $\widetilde{Q}$ is introduced. $\widetilde{W}$ represents the weights between hidden neurons, while $\ell$ maps from the input space $\mathbb{C}^d$ to the first hidden layer and $h$ maps from the last hidden layer to the output space $\mathbb{C}^k$. There is a bijective correspondence between isomorphism classes $[W]$ on $Q^\circ$ and $[(\ell, \widetilde{W}, h)]$ on $\widetilde{Q}$.
  2. Stability: The data representations $W_x^f$, when viewed as double-framed representations, are shown to satisfy a "stability" condition from geometric invariant theory.
  3. Moduli Space Definition: The "moduli space" ${}_d\mathcal{M}_k(Q)$ is defined as the geometric space whose points are the isomorphism classes $[(\ell, \widetilde{W}, h)]$ of stable double-framed thin quiver representations. This space's structure depends only on the combinatorial architecture ($Q$) of the network.
  4. Dimension: The complex dimension of the moduli space is computed as the number of edges in the delooped quiver minus the number of hidden vertices: $\dim_{\mathbb{C}}({}_d\mathcal{M}_k(Q)) = \#\mathcal{E}^\circ - \#\widetilde{V}$. This dimension matches values previously linked empirically to generalization capacity in ReLU networks (a small arithmetic example follows this list).
  5. Network Function Decomposition: The network function universally decomposes through the moduli space. The input $x$ is first mapped to its corresponding isomorphism class $[W_x^f]$ in the moduli space via $\varphi(W, f)$, and then a map $\hat{\Psi}$ from the moduli space to the output space $\mathbb{C}^k$ computes the final prediction: $\Psi(W, f) = \hat{\Psi} \circ \varphi(W, f)$.
  6. Manifold Hypothesis: This framework provides a formalization of the manifold hypothesis, suggesting that the data manifold in the input space is mapped via $\varphi(W, f)$ into a corresponding structure within the moduli space, whose geometry can be studied using algebraic tools. Training dynamics can be viewed as evolving this mapped manifold within the fixed moduli space.
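
As a quick arithmetic illustration of the dimension formula, here is a small sketch for a fully-connected, bias-free MLP quiver; the layer widths are an illustrative assumption, not values from the paper.

```python
# Dimension of the moduli space: #edges of the delooped quiver Q° minus #hidden vertices.
def moduli_dim(widths):
    """widths = [d, n_1, ..., n_L, k]; bias-free fully-connected MLP quiver."""
    n_edges = sum(a * b for a, b in zip(widths, widths[1:]))   # all layer-to-layer edges
    n_hidden = sum(widths[1:-1])                               # hidden vertices in V~
    return n_edges - n_hidden

print(moduli_dim([784, 256, 128, 10]))   # 784*256 + 256*128 + 128*10 - (256 + 128)
```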

Overall Contributions and Implications

  • Provides a universal, exact algebraic language for describing feed-forward neural networks and their forward pass computations.
  • Reveals inherent algebraic symmetries (isomorphisms) leading to infinitely many network configurations computing the same function.
  • Offers a novel way to represent data samples as algebraic objects (quiver representations) that encode the network's processing.
  • Introduces the moduli space as a geometric object intrinsic to the network's architecture, through which all computations pass.
  • Connects neural network concepts like positive scale invariance, feature maps, pruning, and potentially training dynamics to established mathematical theories.
  • Opens avenues for future research, including exploring training in the moduli space, applying the framework to other architectures (RNNs, GANs), and potentially generating data augmentations or adversarial examples via algebraic manipulations.