A Functorial Formulation of Neighborhood Aggregating Deep Learning

Published 27 Apr 2026 in cs.LG and math.AT | (2604.24672v1)

Abstract: We provide a mathematical interpretation of convolutional (or message passing) neural networks by using presheaves and copresheaves of the set of continuous functions over a topological space. Based on this interpretation, we formulate a theoretical heuristic which elaborates a number of empirical limitations of these neural networks by using obstructions on such sets of continuous functions over a topological space to be sheaves or copresheaves.


Authors (4)

Summary

The paper presents a functorial formulation that recasts neighborhood aggregation as global sections of (co)presheaves to explain empirical limitations.
It models convolutional, graph, and recurrent architectures via sheaf and cosheaf theory, highlighting issues such as non-unique gluing and adversarial vulnerabilities.
The framework suggests future directions including hybrid architectures and enriched topological methods to overcome dataset dependency and capture global structures.

Functorial Foundations of Neighborhood Aggregating Deep Learning

Introduction

This paper, "A Functorial Formulation of Neighborhood Aggregating Deep Learning" (2604.24672), develops an algebraic-topological perspective on convolutional and message passing neural networks (MPNNs) by recasting their operations in terms of (co)presheaves, sheaves, and cosheaves of continuous functions over topological spaces. The authors detail how neural architectures based on neighborhood aggregation (such as CNNs, MPNNs, and RNNs) can be modeled as global sections of functors from the category of open subsets of a space $X$ to real vector spaces, built from skyscraper sheaves and cosheaves. They argue that several empirical limitations (non-unique gluing, adversarial sensitivity, dataset dependency, and lack of topological awareness) are rooted in functorial and sheaf-theoretic obstructions inherent in these architectures.

Sheaf and Cosheaf Theoretic Modeling

The foundational section provides the key categorical structures. Data aggregation in neighborhood-based deep learning is formalized by mapping the space $X$ (which may represent a grid/image, graph, or time index) into collections of vector spaces via presheaves and copresheaves of continuous functions. Skyscraper sheaves/cosheaves are central since they encode measurement at discrete points and naturally admit both covariant and contravariant functorial structure.
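For orientation, the standard textbook definitions behind these terms can be written as follows. The symbols $C$, $\rho_{U,V}$, $\mathcal{S}_{x,V_0}$, and $V_0$ are notation chosen here for illustration and may differ from the paper's own.

```latex
% Presheaf of continuous real-valued functions on X (contravariant in U):
C(U) = \{\, f : U \to \mathbb{R} \ \text{continuous} \,\}, \qquad
\rho_{U,V} : C(U) \to C(V), \quad f \mapsto f|_{V}
\quad \text{for open } V \subseteq U .

% Skyscraper sheaf at a point x \in X with stalk a real vector space V_0:
\mathcal{S}_{x,V_0}(U) =
\begin{cases}
V_0 & \text{if } x \in U, \\
0   & \text{if } x \notin U,
\end{cases}
\qquad
\mathcal{S}_{x,V_0}(U) \to \mathcal{S}_{x,V_0}(V) =
\begin{cases}
\mathrm{id}_{V_0} & \text{if } x \in V, \\
0                 & \text{otherwise.}
\end{cases}
```

The same assignment $U \mapsto \mathcal{S}_{x,V_0}(U)$ also carries maps in the opposite direction, $\mathcal{S}_{x,V_0}(V) \to \mathcal{S}_{x,V_0}(U)$ for $V \subseteq U$ (the identity when $x \in V$, the zero map otherwise), which is the sense in which a skyscraper object supports both a contravariant and a covariant structure.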

Figure 1: Construction of a pushforward of constant sheaves and cosheaves over discrete sets.

Figure 2: Construction of functors for the presheaf/copresheaf of continuous functions induced from a sheaf.

Obstructions to satisfying the sheaf/cosheaf axioms (locality and surjectivity) serve as theoretical heuristics for the empirical limitations of deep networks. The authors demonstrate that presheaves and copresheaves derived from skyscraper sheaves/cosheaves fail to satisfy locality/surjectivity in general, particularly when non-linear and non-factorizable layers are present. This is shown in detail with explicit mappings and their failure modes.
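As a reference point, the conditions can be stated in their standard form, assuming the "locality" and "surjectivity" named above are the usual sheaf and cosheaf conditions. Here $F$ is a presheaf, $G$ a copresheaf of vector spaces, and $\{U_i\}$ an open cover of $U$; the map names $\iota_{U_i,U}$ are illustrative.

```latex
% Sheaf axioms for a presheaf F and an open cover U = \bigcup_i U_i
\textbf{Locality:}\quad
s, t \in F(U),\ \ s|_{U_i} = t|_{U_i}\ \ \forall i
\ \Longrightarrow\ s = t .

\textbf{Gluing:}\quad
s_i \in F(U_i),\ \ s_i|_{U_i \cap U_j} = s_j|_{U_i \cap U_j}\ \ \forall i,j
\ \Longrightarrow\ \exists\, s \in F(U) \ \text{with}\ s|_{U_i} = s_i .

% Cosheaf condition for a copresheaf G with extension maps
% \iota_{U_i,U} : G(U_i) \to G(U): the following sequence is exact,
\bigoplus_{i,j} G(U_i \cap U_j) \longrightarrow
\bigoplus_{i} G(U_i) \xrightarrow{\ \sum_i \iota_{U_i,U}\ }
G(U) \longrightarrow 0 ,
% so in particular the summed extension map is surjective.
```

A failure of locality means two different global sections look identical on every patch; a failure of surjectivity means some global section is not a sum of locally supported pieces.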

Figure 3: The restriction map $res_{X,U}$ in presheaf construction.

Figure 4: Inclusion map $i_{U,X}$ for neighborhood aggregation.

Functorial Deep Learning: Definitions and Axioms

Neighborhood aggregating deep learning is mathematically cast as a functorial composition of layers, each defined as a function between products of local sections over open neighborhoods in $X$. Neighborhood aggregation is formalized by a set of axioms (locality, strictness, non-triviality, and distinctness) that specify how local information is aggregated and determine potential architectural limitations.

Figure 5: Summation by inclusion maps, defining the surjectivity condition of the cosheaf axiom.

Figure 6: A layer of a convolutional/message passing neural network, illustrating aggregation over neighborhoods.

Layers may factor through inclusion (linear aggregation) or fail to do so (non-linear layers, pooling, etc.). The deep learning algorithm thus corresponds to a global section of a (co)presheaf of continuous functions that is not necessarily a sheaf/cosheaf, which leads to fundamental limitations.
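The contrast between linear aggregation and pooling can be illustrated with a small numerical sketch. It uses additivity as a stand-in for "factoring through inclusion", which is an assumption made here for illustration rather than the paper's formal criterion; the function names and the toy neighborhoods are hypothetical.

```python
import numpy as np

def sum_aggregate(x, neighborhoods):
    """Linear aggregation: each output sums the inputs over one neighborhood."""
    return np.array([x[list(nb)].sum() for nb in neighborhoods])

def max_aggregate(x, neighborhoods):
    """Max pooling: each output is the largest input in one neighborhood."""
    return np.array([x[list(nb)].max() for nb in neighborhoods])

def is_additive(layer, x, y, neighborhoods):
    """Check layer(x + y) == layer(x) + layer(y) on one pair of inputs."""
    return bool(np.allclose(layer(x + y, neighborhoods),
                            layer(x, neighborhoods) + layer(y, neighborhoods)))

# Hypothetical overlapping neighborhoods on a 4-point domain.
neighborhoods = [(0, 1), (1, 2), (2, 3)]
x = np.array([1.0, -2.0, 3.0, 0.5])
y = np.array([-1.0, 4.0, -3.0, 2.0])

print(is_additive(sum_aggregate, x, y, neighborhoods))  # True
print(is_additive(max_aggregate, x, y, neighborhoods))  # False
```

Summation commutes with adding signals, so it behaves like a sum of extension maps; max pooling does not, which is one concrete way a layer can fall outside the linear case.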

Theoretical Derivation of Architectural Limitations

Four major empirical limitations are shown to derive from functorial properties:

Non-unique Gluing: Non-linear or non-factorizable layers can produce distinct global outputs that agree locally, violating locality. This models the misidentification of images or graphs with isomorphic local parts but differing global structures.
Adversarial Attacks: The lack of injectivity in aggregation maps implies vulnerability to perturbations that are indistinguishable locally but arbitrarily impactful globally, which gives a formal account of adversarial phenomena.
Dataset Dependency: Architectural choices constrain the classes of functions achievable, so optimal performance is dataset-specific and certain global functions are unreachable.
Topological Inferences: Sheafification removes local obstructions but produces flasque sheaves, whose higher cohomology vanishes, so topological invariants are not captured by neighborhood aggregators.

Figure 7: Presheaf failure on locality; distinct global sections cannot be distinguished locally.

Figure 8: Presheaf satisfiability of gluing; local sections can be patched together under certain conditions.

Figure 9: Copresheaf failure on surjectivity; global sections cannot be obtained by local aggregation.

Figure 10: Copresheaf satisfying gluing; local sections aggregate where surjectivity holds.

Figure 11: Sheafification erases geometric/topological obstructions, preventing discrimination of nontrivial invariants.
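The non-unique gluing item can be made concrete with a toy analogue. The sketch below is not the paper's construction: it assumes a hypothetical aggregator that only sees the multiset of width-2 windows of a circular binary signal, and shows two signals that agree on every local statistic while differing globally.

```python
from collections import Counter

def local_patches(seq, width=2):
    """All length-`width` windows of a circular sequence."""
    n = len(seq)
    return [tuple(seq[(i + k) % n] for k in range(width)) for i in range(n)]

def patch_bag_model(seq, width=2):
    """Toy local aggregator: keeps the multiset of patches, forgets positions."""
    return Counter(local_patches(seq, width))

def rotations(seq):
    """All cyclic shifts of a sequence, as a set of tuples."""
    return {tuple(seq[i:] + seq[:i]) for i in range(len(seq))}

a = [0, 0, 1, 1, 0, 1]
b = [0, 0, 1, 0, 1, 1]

same_local_view = patch_bag_model(a) == patch_bag_model(b)
same_global_signal = tuple(b) in rotations(a)

print(same_local_view)     # True: every local patch count matches
print(same_global_signal)  # False: b is not a cyclic shift of a
```

Any readout computed from `patch_bag_model` alone must assign `a` and `b` the same output, even though no cyclic shift maps one to the other.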

Applications to Canonical Architectures

Convolutional Neural Networks

CNNs are cast as global sections of a copresheaf over the Euclidean grid, with convolutional and fully connected layers factoring through inclusion, while max pooling explicitly violates inclusion. The framework proves sensitivity of CNNs to adversarial attacks and non-unique gluings, and formally validates the motivation behind capsule networks.

Graph Neural Networks and Weisfeiler-Lehman Kernels

Graph convolutional networks (GCNs), MPNNs, and WL kernels are modeled using presheaves/cosheaves over universal covers of graphs. The limitations derived here agree with depth bounds, performance ceilings, and adversarial sensitivity known for GNNs.
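A well-known instance of this ceiling is that 1-dimensional Weisfeiler-Lehman color refinement cannot separate a 6-cycle from two disjoint triangles, since every vertex in both graphs has degree 2. The following sketch uses a plain-Python refinement written for illustration (the function names are hypothetical and no graph library is assumed).

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """1-WL color refinement; colors are nested tuples, so no hashing is needed."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())

def n_components(adj):
    """Number of connected components, by depth-first search."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(adj[v])
    return count

hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

print(wl_histogram(hexagon) == wl_histogram(two_triangles))  # True
print(n_components(hexagon), n_components(two_triangles))    # 1 2
```

Both graphs are 2-regular and each connected component has the same universal cover (an infinite path), so an aggregator that only sees rooted neighborhoods receives identical inputs at every vertex.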

Recurrent Neural Networks

RNNs and LSTMs are recast in functorial terms over principal circle bundles, with time modeled as a universal cover. Their adversarial susceptibility is derived from the same functorial obstructions.

Beyond Neighborhood Aggregation: Prospects and Extensions

Attention-based architectures (Transformers) and neural ODEs do not satisfy all neighborhood aggregation axioms, which mitigates certain limitations, notably dataset dependency and adversarial vulnerability. Transformers project input spaces onto contractible or periodic topological spaces and employ global aggregation, which the functorial model handles more flexibly.

Sheaf-theoretic learning algorithms beyond skyscraper cosheaves (e.g., cellular sheaves, persistent homology) are identified as strong candidates for capturing global topological and geometric invariants. These approaches utilize non-flasque sheaves, providing injective restriction maps and nontrivial cohomological signatures, thereby overcoming the proven limitations.
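To show what a nontrivial cohomological signature looks like in the simplest case, the sketch below computes the cohomology dimensions of the constant cellular sheaf (stalk $\mathbb{R}$, identity restriction maps) on two small graphs: a 6-cycle and a pair of disjoint triangles. This is a minimal illustration with hypothetical function names, not the method proposed in the paper; richer cellular sheaves would replace the $\pm 1$ entries with learned or prescribed restriction maps.

```python
import numpy as np

def coboundary(n_vertices, edges):
    """Coboundary matrix: (delta f)(u, v) = f(v) - f(u) for each oriented edge."""
    d = np.zeros((len(edges), n_vertices))
    for row, (u, v) in enumerate(edges):
        d[row, u] = -1.0
        d[row, v] = 1.0
    return d

def cohomology_dims(n_vertices, edges):
    """Return (dim H^0, dim H^1) = (dim ker delta, dim coker delta)."""
    rank = int(np.linalg.matrix_rank(coboundary(n_vertices, edges)))
    return n_vertices - rank, len(edges) - rank

hexagon_edges = [(i, (i + 1) % 6) for i in range(6)]
triangle_edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

print(cohomology_dims(6, hexagon_edges))   # (1, 1)
print(cohomology_dims(6, triangle_edges))  # (2, 2)
```

The two graphs have the same vertex degrees everywhere, yet their $H^0$ (connected components) and $H^1$ (independent cycles) dimensions differ, which is the kind of global invariant a purely local aggregator does not see.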

Implications and Future Developments

The formalization provides a rigorous mathematical explanation for widespread empirical phenomena and architectural limitations in deep learning. The categorical analysis makes clear that the neighborhood aggregation paradigm cannot, in isolation, capture global structure or defend against adversarial perturbations. This suggests two future directions: hybrid architectures mixing local and global functors, and learning on topological or geometric data via richer sheaf and cosheaf structures. This framework also guides principled design of architectures for dynamic graphs, time series, and non-Euclidean domains where universal covers and principal bundles provide a geometric context for neural computation.

Conclusion

The functorial framework advanced in this paper bridges algebraic topology and deep learning, offering precise mathematical underpinnings for neighborhood aggregation and its limitations. The categorical formulation exposes intrinsic barriers arising from sheaf/cosheaf obstructions, explaining adversarial vulnerability, dataset dependency, and poor topological sensitivity. Addressing these limitations will require architectures that go beyond simple neighborhood aggregation, either via global functorial processing or by leveraging richer (co)sheaf structures with nontrivial cohomology.
