Deep Parameter Sharing in Neural Networks
- Deep parameter sharing is the systematic reuse of network weights across layers and tasks, guided by group symmetries that enforce equivariance and reduce redundancy.
- Methodologies such as dense and sparse designs tie weights via group orbits, ensuring controlled expressiveness and robust inductive biases.
- Applications span convolutional, graph, and permutation-invariant architectures, improving generalization and reducing sample complexity.
Deep parameter sharing refers to the systematic reuse or tying of network parameters across multiple layers, units, tasks, or agents within deep learning models, enabling inductive biases, enhanced generalization, and drastic reductions in the number of unique parameters. Unlike traditional parameter sharing in convolutional layers—where weights are shared spatially within one layer—deep parameter sharing operates across the network architecture, often motivated by underlying symmetries (such as group equivariances), domain priors, efficiency requirements, or multi-task settings. A central insight is that tying parameters according to the action of a symmetry group over the model's indices yields neural operations that are equivariant to that group, and only to that group, unifying and generalizing classical constructs such as convolutions, group convolutions, and permutation-invariant functions.
1. Mathematical Foundations: Equivariance and Parameter Symmetries
The foundational principle of deep parameter sharing is equivariance under group action. Consider a standard neural network layer $\phi(x) = \sigma(Wx)$ with weight matrix $W \in \mathbb{R}^{M \times N}$ and pointwise nonlinearity $\sigma$. For a group $G$ acting discretely on both input and output indices (with permutation matrices $P_g^{\mathrm{in}}$, $P_g^{\mathrm{out}}$), the layer is $G$-equivariant if and only if, for all $g \in G$ and all inputs $x$:

$$\phi\big(P_g^{\mathrm{in}} x\big) = P_g^{\mathrm{out}}\,\phi(x), \qquad \text{equivalently} \qquad P_g^{\mathrm{out}}\, W = W\, P_g^{\mathrm{in}}.$$

This property—that the function commutes with the group action—is enforced by embedding symmetries into $W$ via parameter sharing. Specifically, the pattern of tied parameters is constructed so that its symmetry group matches $G$: if the automorphism group of the associated parameter-sharing structure (often formalized as a colored bipartite graph) equals $G$, then the neural layer is guaranteed to be $G$-equivariant (Ravanbakhsh et al., 2017).
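As a concrete numerical check, the following minimal NumPy sketch (an illustration, not the paper's construction; the choice of the cyclic group $C_n$ of circular shifts, a circulant weight matrix, and $\tanh$ are assumptions made for the example) verifies that a weight matrix obeying the commutation constraint yields a layer that commutes with the group action, even through a pointwise nonlinearity.

```python
import numpy as np

def cyclic_perm(n, k):
    """Permutation matrix of the shift g_k in C_n: (P_k x)[i] = x[(i - k) % n]."""
    return np.roll(np.eye(n), k, axis=0)

n = 5
w = np.random.randn(n)                               # one free parameter per edge orbit of C_n
W = np.stack([np.roll(w, i) for i in range(n)])      # circulant weights: W[i, j] = w[(j - i) % n]

for k in range(n):
    P = cyclic_perm(n, k)                            # C_n acts identically on inputs and outputs
    assert np.allclose(P @ W, W @ P)                 # commutation: P_g^out W = W P_g^in
    x = np.random.randn(n)
    assert np.allclose(np.tanh(W @ (P @ x)),         # phi(P_g^in x)
                       P @ np.tanh(W @ x))           # equals P_g^out phi(x)
```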
2. Architectures and Parameter-Sharing Schemes
Two parameter-sharing schemes are introduced to induce the required symmetry on the weight matrix $W$:
- Dense Design: Every input is connected to every output (complete bipartite), and the group $G$ partitions the edge set into orbits. Each orbit receives a unique parameter; thus, entries of $W$ within the same orbit are tied. This construction, while always guaranteeing $G$-equivariance, can be highly redundant for small groups or structured data (a minimal construction sketch follows the formula below).
- Sparse Design: By leveraging the orbits of $G$ on the input and output index sets and a minimal generating set for the group, parameter sharing is imposed efficiently and only where necessary. Parameters are tied across the edges $(g \cdot n, g \cdot m)$ for $g \in G$, where $n$ and $m$ range over representatives of the input and output orbits. Under semi-regular actions, this guarantees that the automorphism group of the resulting structure is precisely $G$, yielding unique equivariance ("tight" parameter sharing).
These schemes are realized mathematically as a weight matrix whose entries are tied along edge orbits,

$$W_{m,n} = w_{\ell(m,n)},$$

where $\ell(m,n)$ indexes the parameter-sharing pattern (the orbit, or color, of edge $(m,n)$) defined by the group action.
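The following NumPy sketch illustrates the dense design under some simplifying assumptions (the group is given by explicit permutation generators, here the cyclic group $C_N$ acting identically on inputs and outputs; `edge_orbits` is a hypothetical helper, not from the source): edge orbits are found by closure under the generators, and each orbit receives one free parameter.

```python
import numpy as np
from itertools import product

def edge_orbits(gens_out, gens_in, M, N):
    """Color each edge (m, n) of the complete bipartite graph by its orbit
    under the joint generator action (m, n) -> (g_out[m], g_in[n])."""
    color = -np.ones((M, N), dtype=int)
    c = 0
    for m, n in product(range(M), range(N)):
        if color[m, n] < 0:
            stack = [(m, n)]                     # flood-fill one orbit
            while stack:
                a, b = stack.pop()
                if color[a, b] < 0:
                    color[a, b] = c
                    stack.extend((go[a], gi[b]) for go, gi in zip(gens_out, gens_in))
            c += 1
    return color, c

# Example: cyclic group C_N, acting the same way on input and output indices.
N = 4
gen = (np.arange(N) + 1) % N                     # generator: index i -> i + 1 (mod N)
color, n_colors = edge_orbits([gen], [gen], N, N)

params = np.random.randn(n_colors)               # one free parameter per edge orbit
W = params[color]                                # dense design: W[m, n] = w_{l(m, n)}

# The tied matrix commutes with every element of C_N (all powers of the generator).
Q = np.eye(N)[gen]                               # permutation matrix of the generator
P = np.eye(N)
for _ in range(N):
    assert np.allclose(P @ W, W @ P)
    P = Q @ P
```

For $C_N$ the tied matrix is circulant with $N$ free parameters, matching the standard circular-convolution case.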
3. Sensitivity and Uniqueness: Balancing Equivariance and Expressiveness
A significant result is that these parameter-sharing designs guarantee not only $G$-equivariance but also sensitivity to all transformations outside $G$. Provided parameters indexed by orbit/color are distinct and group actions are semi-regular, the layer's behavior is constrained only by the desired symmetry—there are no accidental additional invariances, preserving the model's discriminative power (Ravanbakhsh et al., 2017). This property is critical for applications requiring maximal expressiveness given known or hypothesized inductive biases.
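This can be illustrated numerically in an assumed toy setting ($G = C_n$, circulant weights with generic, distinct parameters; the setup is an illustration rather than the paper's experiment): every cyclic shift commutes with the tied weight matrix, while a transposition outside the group does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
w = rng.standard_normal(n)                          # generic, distinct per-orbit parameters
W = np.stack([np.roll(w, i) for i in range(n)])     # cyclic-equivariant (circulant) weights

def perm_matrix(p):
    return np.eye(n)[p]                             # row i is the unit vector e_{p[i]}

# Every cyclic shift commutes with W ...
for k in range(n):
    P = perm_matrix(np.roll(np.arange(n), k))
    assert np.allclose(P @ W, W @ P)

# ... but a transposition outside C_n generically does not: the layer remains
# sensitive to transformations the design did not ask to be equivariant to.
swap = np.arange(n)
swap[[0, 1]] = [1, 0]
assert not np.allclose(perm_matrix(swap) @ W, W @ perm_matrix(swap))
```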
4. Application Scope: Unification and Generalization
Deep parameter sharing via group-induced symmetries serves as a unifying framework for many neural architectures:
- Convolutional Layers: Standard convolutions correspond to translation-equivariant parameter-sharing.
- Group Convolutions and Steerable Filters: By choosing $G$ to be a rotation or reflection group, the resulting layers are equivariant to those transformations, enabling robust vision models.
- Graph Convolutions: Sharing induced by the automorphism group of a graph allows convolution-like operations to generalize to non-Euclidean domains.
- Deep Sets (Permutation-Equivariant Models): Parameter sharing via the symmetric group recovers and generalizes architectures for set-based or permutation-invariant data (see the sketch after this list).
- Automated Symmetry Discovery: If desired equivariances are unknown, the structure of the parameter-sharing pattern could, in principle, be discovered from data, pointing toward future research directions.
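To make the permutation-equivariant case concrete, here is a minimal Deep Sets-style sketch (the two-parameter form follows the standard construction for the symmetric group; the use of $\tanh$ and the variable names are illustrative): under $S_n$ there are only two edge orbits, the diagonal and the off-diagonal, so the tied weight matrix has exactly two free parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
lam, gam = rng.standard_normal(2)                           # one parameter per edge orbit of S_n
W = lam * np.eye(n) + gam * (np.ones((n, n)) - np.eye(n))   # diagonal vs. off-diagonal orbit

def layer(x):
    """Permutation-equivariant layer: phi(x) = tanh(W x)."""
    return np.tanh(W @ x)

for _ in range(5):                                  # check phi(P x) = P phi(x) for random P
    P = np.eye(n)[rng.permutation(n)]
    x = rng.standard_normal(n)
    assert np.allclose(layer(P @ x), P @ layer(x))
```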
5. Implications for Neural Network Design
Embedding symmetries via deep parameter sharing yields several key advantages:
- Reduced Sample Complexity: Designed invariances and equivariances are "hard-wired," which avoids the need for the network to learn them from data, thus requiring fewer samples.
- Interpretability and Control: Behavior under group transformation is well-characterized, facilitating both theoretical analysis and predictable model behaviors.
- Improved Generalization: Imposing the "right" parameter sharing limits model capacity to hypothesis spaces aligned with domain symmetries, enhancing generalization.
- Unified Perspective Across Models: Classical and modern architectures are recoverable as special cases of this group-theoretic, parameter-sharing principle.
6. Mathematical Characterization
The use of permutation matrices and colored bipartite graphs enables rigorous description of deep parameter sharing:
- The commutative-diagram condition $P_g^{\mathrm{out}}\, W = W\, P_g^{\mathrm{in}}$ encodes the equivariance constraint on weight matrices (checked numerically in the sketch after this list).
- The color (or orbit) assignments in the edge-colored graph correspond to tied parameter groups.
- Formulations for both dense and sparse sharing guarantee implementation of desired symmetry properties via combinatorial and group-theoretic arguments.
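As a small consistency check between the combinatorial and linear-algebraic views, the following sketch (an assumed toy setting with $G = C_n$; not taken from the source) writes the commutation constraint as a linear system in $\mathrm{vec}(W)$ and confirms that the dimension of its solution space equals the number of edge orbits, i.e., one free parameter per color.

```python
import numpy as np

# Equivariance as a linear constraint: P_g W - W P_g = 0 for each group generator g.
# With column-major vec, vec(P W) = kron(I, P) vec(W) and vec(W P) = kron(P.T, I) vec(W).
n = 4
gen = (np.arange(n) + 1) % n                         # generator of the cyclic group C_n
P = np.eye(n)[gen]                                   # its permutation matrix

K = np.kron(np.eye(n), P) - np.kron(P.T, np.eye(n))  # constraint matrix acting on vec(W)
nullity = n * n - np.linalg.matrix_rank(K)

print(nullity)  # 4: C_n has n edge orbits (the "diagonals" of W), i.e. a circulant
                # weight matrix with n free parameters, as in the dense design above
```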
7. Broader Context and Future Directions
The theoretical development of deep parameter sharing provides tools for:
- Extending to more general group actions (beyond permutations on finite index sets) and to other types of deep architectures (e.g., recurrent networks, transformers, graph neural networks).
- Semi-automated or data-driven symmetry discovery, where parameter-sharing structures could be learned rather than designed, paving the way for networks to adapt to domains with unknown or latent symmetries.
- Principled architecture search, where possible symmetry groups and their associated sharing patterns become axes along which to search for optimal network inductive biases.
As such, deep parameter sharing—when viewed through the lens of group theory and parameter symmetries—constitutes both a theoretical foundation for modern equivariant architectures and a practical blueprint for embedding domain priors in neural models (Ravanbakhsh et al., 2017).