
The Representation Theory of Neural Networks (2007.12213v2)

Published 23 Jul 2020 in cs.LG, cs.NE, math.RT, and stat.ML

Abstract: In this work, we show that neural networks can be represented via the mathematical theory of quiver representations. More specifically, we prove that a neural network is a quiver representation with activation functions, a mathematical object that we represent using a network quiver. Also, we show that network quivers gently adapt to common neural network concepts such as fully-connected layers, convolution operations, residual connections, batch normalization, pooling operations and even randomly wired neural networks. We show that this mathematical representation is by no means an approximation of what neural networks are as it exactly matches reality. This interpretation is algebraic and can be studied with algebraic methods. We also provide a quiver representation model to understand how a neural network creates representations from the data. We show that a neural network saves the data as quiver representations, and maps it to a geometrical space called the moduli space, which is given in terms of the underlying oriented graph of the network, i.e., its quiver. This results as a consequence of our defined objects and of understanding how the neural network computes a prediction in a combinatorial and algebraic way. Overall, representing neural networks through the quiver representation theory leads to 9 consequences and 4 inquiries for future research that we believe are of great interest to better understand what neural networks are and how they work.

Citations (26)

Summary

  • The paper introduces a novel algebraic framework that models neural networks through quiver representations, precisely capturing weights, activations, and computational flow.
  • It proves that isomorphic neural networks compute the same function, formalizing symmetries such as the positive scale invariance of ReLU networks.
  • The paper defines a moduli space for stable double-framed representations, offering new insights into network architecture, generalization, and training dynamics.

This paper introduces a novel mathematical framework for representing neural networks and the data they process using the theory of quiver representations. The core idea is that the structure and computations of a neural network can be precisely captured by algebraic objects derived from quivers (oriented graphs).

Neural Networks as Quiver Representations with Activations

  1. Network Quiver: A specific type of quiver (oriented graph) arranged in layers is defined, called a "network quiver" $Q$. It has designated input, bias, hidden, and output vertices. Hidden vertices have exactly one loop, while source and sink vertices have none.
  2. Neural Network Definition: A neural network is formally defined as a pair $(W, f)$, where:
    • $W$ is a "thin representation" of the delooped quiver $Q^\circ$ (the quiver $Q$ with all loops removed). This means each vertex is associated with a one-dimensional complex vector space $\mathbb{C}$, and each edge $\epsilon$ corresponds to a linear map $W_\epsilon: \mathbb{C} \to \mathbb{C}$ (multiplication by a complex weight). $W$ essentially captures the network's weights.
    • $f = (f_v)$ is a collection of activation functions, one for each hidden vertex $v$, associated with the loops in the original network quiver $Q$.
  3. Forward Pass: The standard forward pass computation is defined combinatorially: each hidden vertex sums its weighted inputs and applies its activation function, while output vertices simply sum their inputs (see the sketch after this list).
  4. Universality: This framework is shown to encompass various standard neural network components, including fully-connected layers, convolutional layers (with shared weights), pooling operations (average and max), batch normalization, residual connections, and even randomly wired networks. Each component corresponds to specific constraints on the network quiver structure (combinatorial architecture), the weights $W$ (weight architecture), and the activation functions $f$ (activation architecture).
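
Below is a minimal, self-contained sketch (not code from the paper) of this combinatorial forward pass on a toy network quiver. The vertex names, weights, and activation choices are illustrative assumptions, and real weights stand in for the complex weights of the formal definition.

```python
# Sketch of the combinatorial forward pass over a network quiver: vertices carry
# values, edges of the delooped quiver Q° carry weights, and each hidden vertex
# applies its own activation function (its loop) to the sum of incoming weighted
# values. Output vertices only sum their inputs.
import numpy as np

# Hypothetical toy quiver: 2 input vertices, 1 bias vertex, 2 hidden vertices, 1 output vertex.
input_vertices  = ["x1", "x2"]
bias_vertices   = ["b"]
hidden_vertices = ["h1", "h2"]       # listed in a topological (layer) order
output_vertices = ["y"]

# Edges of Q° as (source, target) -> weight: the thin representation W.
W = {
    ("x1", "h1"): 0.5,  ("x2", "h1"): -1.2, ("b", "h1"): 0.1,
    ("x1", "h2"): 2.0,  ("x2", "h2"): 0.3,  ("b", "h2"): -0.4,
    ("h1", "y"): 1.5,   ("h2", "y"): -0.7,
}

# One activation function per hidden vertex (the loops of the network quiver Q).
f = {"h1": lambda z: max(z, 0.0), "h2": np.tanh}

def forward(W, f, x):
    """Forward pass Psi(W, f)(x): sum weighted inputs at each vertex, apply f at hidden vertices."""
    values = {**{v: xv for v, xv in zip(input_vertices, x)}, **{v: 1.0 for v in bias_vertices}}
    for v in hidden_vertices:  # process hidden vertices in layer order
        pre = sum(w * values[s] for (s, t), w in W.items() if t == v)
        values[v] = f[v](pre)
    return [sum(w * values[s] for (s, t), w in W.items() if t == v) for v in output_vertices]

print(forward(W, f, x=[1.0, -2.0]))
```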

Isomorphisms and Network Function Invariance

  1. Isomorphism of Neural Networks: An isomorphism between two neural networks $(W, f)$ and $(V, g)$ over the same quiver $Q$ is defined based on an isomorphism $\tau: W \to V$ of the underlying thin quiver representations. This isomorphism $\tau$ consists of invertible linear maps (non-zero scalars $\tau_v$, one for each hidden vertex $v$). It requires commutative diagrams for both the weights (edges in $Q^\circ$) and the activation functions (loops in $Q$). The change of basis group $G$ consists of tuples of non-zero complex numbers $(\tau_v)$, one for each hidden vertex.
  2. Network Function Preservation (Theorem 4.10): The key theoretical result is that if two neural networks $(W, f)$ and $(V, g)$ are isomorphic, they compute the exact same function: $\Psi(W, f) = \Psi(V, g)$.
  3. Consequences:
    • There exist infinitely many different neural networks (with different weights and potentially different activation functions) that compute the same function. These networks are related by the action of the change of basis group $G$.
    • The well-known "positive scale invariance" of ReLU networks is shown to be a special case of this isomorphism theorem, where the change of basis $\tau_v$ consists of positive real numbers, leaving the ReLU activation function unchanged (illustrated in the sketch after this list).
  4. Teleportation: Isomorphisms can preserve the weight architecture (e.g., shared weights in convolutions) while changing the activation architecture. This process, termed "teleportation," yields a network with the same function but potentially different activation functions, which might affect the optimization landscape.
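
The following is a small numerical sketch of the ReLU special case, using an assumed dense 3→4→2 architecture: scaling the incoming weights of each hidden neuron by $\tau_v > 0$ and its outgoing weights by $1/\tau_v$ produces an isomorphic network with the same function.

```python
# Positive scale invariance as a change-of-basis action: for each hidden vertex v,
# pick tau_v > 0, rescale incoming weights by tau_v and outgoing weights by 1/tau_v.
# Since ReLU(tau * z) = tau * ReLU(z) for tau > 0, the network function is unchanged.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))   # illustrative 3 -> 4 -> 2 ReLU net
relu = lambda z: np.maximum(z, 0.0)
psi = lambda A, B, x: B @ relu(A @ x)                        # network function Psi(W, f)

tau = rng.uniform(0.5, 2.0, size=4)                          # one tau_v > 0 per hidden vertex
V1, V2 = np.diag(tau) @ W1, W2 @ np.diag(1.0 / tau)          # the isomorphic weights tau . W

x = rng.normal(size=3)
print(np.allclose(psi(W1, W2, x), psi(V1, V2, x)))           # True: same function
```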

Data Representation via Quiver Representations

  1. Data as Quiver Representations: The paper proposes representing a data sample $x$ not just as an input vector but through the computational state it induces in the network $(W, f)$. For a given input $x$, a new thin quiver representation $W_x^f$ of the delooped quiver $Q^\circ$ is constructed. The weights $(W_x^f)_\epsilon$ of this representation incorporate the original weights $W_\epsilon$, the input values $x_v$, and the activation outputs $a(W, f)_v(x)$ and pre-activations computed during the forward pass of $x$ through $(W, f)$.
  2. Key Property (Theorem 5.4): This data representation $W_x^f$ is constructed such that the neural network $(W_x^f, 1)$ (using identity activation functions), when fed a vector of ones, produces the same output as the original network $(W, f)$ on input $x$: $\Psi(W_x^f, 1)(1^d) = \Psi(W, f)(x)$ (see the numerical sketch after this list).
  3. Significance: This means the entire computational process for input $x$, including all intermediate feature map values, is encoded within the linear structure of the quiver representation $W_x^f$. The network's output depends only on the isomorphism class $[W_x^f]$ under the action of the group $G$.
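
The sketch below illustrates this key property under simplifying assumptions (a bias-free two-layer MLP with tanh activations and nonzero pre-activations); the edge-wise rescaling used here is one way to realize $W_x^f$ so that the stated equality holds, not a verbatim transcription of the paper's general construction.

```python
# Data as a quiver representation: absorb the input values into the input-to-hidden
# edge weights and the ratios activation/pre-activation into the hidden-to-output
# edge weights. The resulting network with identity activations, fed the all-ones
# vector, reproduces Psi(W, f)(x).
import numpy as np

rng = np.random.default_rng(1)
d, h, k = 3, 5, 2                                    # illustrative layer sizes
W1, W2 = rng.normal(size=(h, d)), rng.normal(size=(k, h))
f = np.tanh                                          # activation at every hidden vertex
x = rng.normal(size=d)

# Original forward pass Psi(W, f)(x).
pre = W1 @ x                                         # pre-activations at hidden vertices
act = f(pre)                                         # activation outputs a(W, f)_v(x)
out = W2 @ act

# Induced data representation W_x^f (edge-wise rescaling of W by the forward pass of x).
W1_x = W1 * x                                        # input-to-hidden edges absorb x_v
W2_x = W2 * (act / pre)                              # hidden-to-output edges absorb a_v / pre_v

# Theorem 5.4 check: Psi(W_x^f, 1)(1^d) = Psi(W, f)(x), with identity activations.
out_x = W2_x @ (W1_x @ np.ones(d))
print(np.allclose(out, out_x))                       # True
```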

The Moduli Space

  1. Double-Framed Representations: To study the space of these data representations, the concept of "double-framed thin quiver representations" $(\ell, \widetilde{W}, h)$ of the hidden quiver $\widetilde{Q}$ is introduced. $\widetilde{W}$ represents the weights between hidden neurons, while $\ell$ maps from the input space $\mathbb{C}^d$ to the first hidden layer and $h$ maps from the last hidden layer to the output space $\mathbb{C}^k$. There is a bijective correspondence between isomorphism classes $[W]$ on $Q^\circ$ and $[(\ell, \widetilde{W}, h)]$ on $\widetilde{Q}$.
  2. Stability: The data representations $W_x^f$, when viewed as double-framed representations, are shown to satisfy a "stability" condition from geometric invariant theory.
  3. Moduli Space Definition: The "moduli space" ${}_d\mathcal{M}_k(Q)$ is defined as the geometric space whose points are the isomorphism classes $[(\ell, \widetilde{W}, h)]$ of stable double-framed thin quiver representations. This space's structure depends only on the combinatorial architecture ($Q$) of the network.
  4. Dimension: The complex dimension of the moduli space is computed as the number of edges in the delooped quiver minus the number of hidden vertices: $\dim_{\mathbb{C}}({}_d\mathcal{M}_k(Q)) = \#\mathcal{E}^\circ - \#\widetilde{V}$. This dimension matches values previously linked empirically to generalization capacity in ReLU networks (a small arithmetic example follows this list).
  5. Network Function Decomposition: The network function universally decomposes through the moduli space. The input $x$ is first mapped to its corresponding isomorphism class $[W_x^f]$ in the moduli space via $\varphi(W, f)$, and then a map $\hat{\Psi}$ from the moduli space to the output space $\mathbb{C}^k$ computes the final prediction: $\Psi(W, f) = \hat{\Psi} \circ \varphi(W, f)$.
  6. Manifold Hypothesis: This framework provides a formalization of the manifold hypothesis, suggesting that the data manifold in the input space is mapped via $\varphi(W, f)$ into a corresponding structure within the moduli space, whose geometry can be studied using algebraic tools. Training dynamics can be viewed as evolving this mapped manifold within the fixed moduli space.
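
As a quick arithmetic illustration of the dimension formula, here is a small sketch for a fully-connected, bias-free MLP quiver; the layer widths are an illustrative assumption, not values from the paper.

```python
# Dimension of the moduli space: #edges of the delooped quiver Q° minus #hidden vertices.
def moduli_dim(widths):
    """widths = [d, n_1, ..., n_L, k]; bias-free fully-connected MLP quiver."""
    n_edges = sum(a * b for a, b in zip(widths, widths[1:]))   # all layer-to-layer edges
    n_hidden = sum(widths[1:-1])                               # hidden vertices in V~
    return n_edges - n_hidden

print(moduli_dim([784, 256, 128, 10]))   # 784*256 + 256*128 + 128*10 - (256 + 128)
```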

Overall Contributions and Implications

  • Provides a universal, exact algebraic language for describing feed-forward neural networks and their forward pass computations.
  • Reveals inherent algebraic symmetries (isomorphisms) leading to infinitely many network configurations computing the same function.
  • Offers a novel way to represent data samples as algebraic objects (quiver representations) that encode the network's processing.
  • Introduces the moduli space as a geometric object intrinsic to the network's architecture, through which all computations pass.
  • Connects neural network concepts like positive scale invariance, feature maps, pruning, and potentially training dynamics to established mathematical theories.
  • Opens avenues for future research, including exploring training in the moduli space, applying the framework to other architectures (RNNs, GANs), and potentially generating data augmentations or adversarial examples via algebraic manipulations.