Random Neuron Subsets in Deep Networks

Updated 27 December 2025
  • Random neuron subsets are collections of neural network units selected randomly or via combinatorial masks to enable initialization, pruning, and subspace learning.
  • Sufficiently overparameterized random networks, even with binary weights, contain subnetworks that universally approximate target ReLU networks, and well-distributed learnable subsets achieve match probability one.
  • Empirical studies demonstrate that well-distributed random subsets enhance ensemble performance and computational efficiency in tasks like image classification and 3D recognition.

Random neuron subsets are collections of neural network units (neurons) selected through randomization or combinatorial mechanisms, often in the context of initialization, masking, pruning, or subspace methods. The study of such subsets and their expressive power spans neural network theory, subset learning, and algorithmic design for deep architectures. Theoretical and empirical results demonstrate that random neuron subsets, even from overparameterized or binary-weight networks, can approximate any target ReLU network, enable high expressive capacity, and serve as the foundation for efficient randomized architectures and robust learning modules.

1. Formal Models and Definitions

Random neuron subsets arise in different neural settings, including overparameterized ReLU networks with binary weights, subset learning masks, and permutation-based subspaces in deep nets.

  • Random Binary Overparameterized Network (Sreenivasan et al., 2021): For the function class $\mathcal{F}$ of fully-connected ReLU networks $f: \mathbb{R}^{d_0} \to \mathbb{R}$,

$$f(x) = \sigma\big(W_\ell\, \sigma(W_{\ell-1} \cdots \sigma(W_1 x) \cdots)\big)$$

a random binary network $g$ is defined by

$$g(x) = \sigma\big( \epsilon' B_{\ell'}\, \sigma( B_{\ell'-1} \cdots \sigma(B_1 x) ) \big)$$

where each weight matrix $B_i$ has i.i.d. entries in $\{\pm 1\}$, and the layer widths and depth exceed those of the target network by only polylogarithmic factors.

  • Subset Learning via Allocation Mask (Schlisselberg et al., 10 Feb 2025): For a network with parameter vector $W \in \mathbb{R}^p$, a random neuron subset corresponds to selecting a subset $A \subset \{1, \dots, p\}$ of $r$ coordinates, with $W[A]$ learnable and $W[\bar{A}]$ fixed at random. The allocation $A$ functions as a binary mask indicating the active subset; a minimal training sketch of this setup appears after this list.
  • Neural Random Subspace (NRS) Layer (Cao et al., 2019): Here, subsets of input features or neurons are selected via random permutations, grouped convolutions, and channel-wise operations, yielding ensembles of subspaces through permutation-based feature groupings integrated into deep networks.
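
As a concrete illustration of the allocation-mask setting, the sketch below trains only a randomly selected subset of parameters of a small PyTorch network while the complement stays frozen at its random initialization. The helper names and the gradient-masking mechanism are illustrative choices, not the procedure of the cited paper.

```python
import torch
import torch.nn as nn

def make_allocation_masks(model: nn.Module, r_fraction: float = 0.1):
    """Sample, per parameter tensor, a binary mask marking the learnable
    subset A; the complement stays at its random initialization."""
    return {name: (torch.rand_like(p) < r_fraction).float()
            for name, p in model.named_parameters()}

def apply_masks_to_gradients(model: nn.Module, masks: dict):
    """Zero the gradients outside the allocation, so only W[A] is updated."""
    for name, p in model.named_parameters():
        if p.grad is not None:
            p.grad.mul_(masks[name])

# Usage sketch: a toy two-layer ReLU net fit to random data.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
masks = make_allocation_masks(model, r_fraction=0.1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)  # no weight decay, so masked weights never move

x, y = torch.randn(256, 32), torch.randn(256, 1)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    apply_masks_to_gradients(model, masks)  # keep the non-allocated weights at their random values
    opt.step()
```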

2. Expressive Power and Approximation Guarantees

Random neuron subsets in sufficiently wide networks exhibit remarkable universal approximation and expressive capacity.

With high probability, a random binary $\{\pm 1\}$ ReLU network $g$ contains a subnetwork $\tilde{g}$, formed by pruning neurons via entrywise binary masks, that uniformly approximates any target $f \in \mathcal{F}$ to error $\epsilon$ over the $\ell_2$ unit ball:

$$\sup_{\|x\|_2 \leq 1} \left| \tilde{g}(x) - f(x) \right| \leq \epsilon$$

when the architecture width and depth are only polylogarithmic blowups of the original: $m/d = O\!\left(\log^2(d\ell/\epsilon)\,\log\!\left(d\ell \log^2(d\ell/\epsilon)/\delta\right)\right)$ and $\ell'/\ell = O(\log(d\ell/\epsilon))$.
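
For a sense of scale, the short snippet below plugs concrete values into these blowup factors, using natural logarithms and ignoring the absolute constants hidden in the $O(\cdot)$ notation, so the numbers are only indicative.

```python
import math

def width_blowup(d, ell, eps, delta):
    """m/d up to constants: log^2(d*ell/eps) * log(d*ell*log^2(d*ell/eps)/delta)."""
    t = math.log(d * ell / eps)
    return t ** 2 * math.log(d * ell * t ** 2 / delta)

def depth_blowup(d, ell, eps):
    """ell'/ell up to constants: log(d*ell/eps)."""
    return math.log(d * ell / eps)

d, ell, eps, delta = 1000, 10, 1e-2, 1e-2
print(f"width blowup (up to constants): {width_blowup(d, ell, eps, delta):.0f}x")
print(f"depth blowup (up to constants): {depth_blowup(d, ell, eps):.1f}x")
```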

For subset learning, the expressive power of a neuron subset (allocation $A$) is quantified by the match probability:

$$MP(A; m) = \Pr\left[\, \exists\, W : M_W(X) = M_{W^*}(X),\ W[\bar{A}] \text{ fixed} \,\right]$$

Distributed allocations that spread learnable weights and avoid concentration in single rows or columns achieve $MP(A) = 1$ (universal expressivity), while clustered allocations yield $MP(A) = 0$. For large networks, random allocations transition sharply from $MP(A) \approx 0$ to $MP(A) = 1$ as they cover more neurons.
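
The dichotomy between distributed and clustered allocations can be probed numerically. The sketch below is a Monte-Carlo proxy of this illustration's own design (not the paper's estimator): for a toy two-layer linear map it fixes the entries outside the allocation at random values, runs gradient descent on the allocated entries, and counts how often the target map is reached; since gradient descent can miss feasible solutions, this only lower-bounds the match probability.

```python
import torch

def estimate_mp(mask1, mask2, d=8, trials=10, steps=4000, tol=1e-3):
    """Monte-Carlo proxy for MP(A) on a 2-layer linear map M = W2 @ W1:
    gradients of entries outside the allocation are zeroed, so those
    entries stay at their random initialization while GD fits the rest."""
    hits = 0
    for _ in range(trials):
        target = torch.randn(d, d) @ torch.randn(d, d)   # target map M* = W2* @ W1*
        W1 = torch.randn(d, d, requires_grad=True)
        W2 = torch.randn(d, d, requires_grad=True)
        opt = torch.optim.Adam([W1, W2], lr=1e-2)
        for _ in range(steps):
            opt.zero_grad()
            loss = ((W2 @ W1 - target) ** 2).mean()
            loss.backward()
            W1.grad.mul_(mask1)          # freeze W1 entries outside the allocation
            W2.grad.mul_(mask2)          # freeze W2 entries outside the allocation
            opt.step()
        hits += int(loss.item() < tol)
    return hits / trials

d = 8
distributed = (torch.rand(d, d) < 0.6).float()   # learnable entries spread over all rows/columns
clustered = torch.zeros(d, d)
clustered[:3, :] = 1.0                           # learnable entries confined to 3 rows
print("MP proxy, distributed allocation:", estimate_mp(distributed, distributed, d))
print("MP proxy, clustered allocation:  ", estimate_mp(clustered, clustered, d))
```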

Neural Random Subspace constructs an ensemble of "one-level trees" by permuting inputs and applying group convolutions with ReLU. The subspaces offer robust, end-to-end non-linear transformation capacity, matching or exceeding traditional random forest and higher-order pooling techniques in expressive power.

3. Pruning, Masking, and Combinatorial Constructions

The mechanism of extracting powerful subnetworks from random neuron sets hinges on combinatorial and algebraic constructions.

Integer weight values are represented in binary, and for each nonzero bit a "diamond gadget" (a fixed-size ReLU subnet with only $\pm 1$ weights) realizes the required scalar multiplication. Pruning away all but the needed gadgets delivers a subnetwork computing arbitrary integer-weighted layers. Polylogarithmic overparameterization ensures that every required gadget appears somewhere in the random binary network with high probability.
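
A toy sketch of the underlying arithmetic follows (a simplified stand-in, not the exact gadget of the cited paper): with only $\pm 1$ weights, four ReLU units compute $2x = \sigma(x) + \sigma(x) - \sigma(-x) - \sigma(-x)$, chaining $k$ such blocks multiplies by $2^k$, and summing the chains selected by the binary expansion of a positive integer weight $w$ realizes $w \cdot x$.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def double(x):
    """ReLU block with only +/-1 weights: 2x = relu(x) + relu(x) - relu(-x) - relu(-x)."""
    h = relu(np.array([x, x, -x, -x]))               # hidden layer, input weights +/-1
    return h @ np.array([1.0, 1.0, -1.0, -1.0])      # output layer, weights +/-1

def times_power_of_two(x, k):
    """Chain k doubling blocks to compute (2**k) * x."""
    for _ in range(k):
        x = double(x)
    return x

def times_integer(x, w):
    """Realize w * x for a positive integer w by summing the chains
    selected by the binary expansion of w."""
    bits = [i for i, b in enumerate(bin(w)[:1:-1]) if b == "1"]   # LSB first
    return sum(times_power_of_two(x, k) for k in bits)

x = -1.7
assert np.isclose(double(x), 2 * x)
assert np.isclose(times_integer(x, 13), 13 * x)      # 13 = 1101 in binary
```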

The optimal allocation strategy distributes the set of learnable weights across rows and columns, maximizing the number of non-linear constraints and ensuring generic (full-rank) polynomial systems for expressivity. Heuristic principles state that allocations covering more neurons or layer indices maximize match probability and learning capacity.

In NRS, a random permutation of the input vector followed by reshaping forms $M$ subspaces, to which grouped (often depthwise) convolutions and non-linearities are applied. The ensemble effect over randomly permuted subspaces increases robustness and empirical accuracy.
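
A simplified PyTorch sketch of this permute-group-transform pattern is given below; the class name and layer choices are illustrative and deliberately omit details of the published NRS module (e.g., spatial tiling via nH), keeping only the fixed random permutations, grouped transforms, and ReLU ensemble.

```python
import torch
import torch.nn as nn

class RandomSubspaceLayer(nn.Module):
    """Sketch of a random-subspace block: features are randomly permuted,
    split into groups of size n_per (one group = one random subspace), and
    each group gets its own small learned transform plus ReLU. Repeating
    with n_mul independent permutations yields an ensemble of subspaces."""
    def __init__(self, d: int, n_per: int = 4, n_mul: int = 8):
        super().__init__()
        assert d % n_per == 0
        # One fixed random permutation per ensemble member (not learned).
        self.register_buffer(
            "perms", torch.stack([torch.randperm(d) for _ in range(n_mul)])
        )
        # Grouped 1x1 convolutions: each group of n_per features is
        # transformed independently of the other groups.
        self.group_transforms = nn.ModuleList(
            [nn.Conv1d(d, d, kernel_size=1, groups=d // n_per) for _ in range(n_mul)]
        )
        self.out = nn.Linear(n_mul * d, d)

    def forward(self, x):                          # x: (batch, d)
        feats = []
        for perm, conv in zip(self.perms, self.group_transforms):
            z = x[:, perm].unsqueeze(-1)           # (batch, d, 1), permuted features
            feats.append(torch.relu(conv(z)).squeeze(-1))
        return self.out(torch.cat(feats, dim=1))   # aggregate the subspace ensemble

x = torch.randn(16, 32)
layer = RandomSubspaceLayer(d=32, n_per=4, n_mul=8)
print(layer(x).shape)                              # torch.Size([16, 32])
```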

4. Empirical Findings and Theoretical Extensions

Empirical and simulation-based evaluations support theoretical predictions regarding random neuron subset performance.

Experiments with linear recurrent and feedforward networks confirm that MP is nearly zero for allocations covering too few rows/neurons, rising rapidly to one as more are covered. In shallow ReLU networks on random and real data (e.g., MNIST), a subset covering roughly 40% of neurons already suffices for perfect match probability, consistent with the critical threshold predicted by theory.

Across UCI/MNIST vector datasets, fine-grained image classification, and 3D point cloud recognition, NRS modules outperform or match classical random forests, decision trees, and higher-order pooling while incurring negligible extra computational cost. A larger number and diversity of subspaces (nMul), moderate group sizes (nPer), and small spatial tiling (nH = 3) optimize performance.

  • Algorithmic Implications: While existence proofs for powerful subnetworks in random binary architectures are nonconstructive (finding such masks is combinatorially hard), the NRS layer and permutation-based methods are efficient, easily implementable, and compatible with standard backpropagation.

5. Architectural Integration and Applications

Random neuron subset mechanisms can be incorporated at varying levels into neural architectures to impart ensemble effects, robustness, and computational efficiency.

| Randomized Subset Technique | Integration Point | Key Empirical Impact |
|---|---|---|
| Diamond gadget pruning (Sreenivasan et al., 2021) | Overparameterized binary-weight networks | Universal approximation by random subnetworks |
| Allocation mask/pruning (Schlisselberg et al., 10 Feb 2025) | Subset learning in linear/ReLU nets | Expressivity depends on coverage/distribution |
| NRS module (Cao et al., 2019) | Input layers, after GAP, SE-module replacement | Ensemble-like accuracy gains, low cost |

In practice, Neural Random Subspace layers are used in 2D/3D vision, vector and regression tasks, added after global feature extraction or replacing fully connected/SE attention layers. The underlying random neuron subset logic can be adapted to promote modularity, ensemble diversity, and controlled parameter efficiency.

6. Implications, Limitations, and Future Directions

  • Amplitude Irrelevance: For binary-weight networks, the amplitude of initial weights is shown to be irrelevant for expressivity; only sign patterns and the masking mechanism ("supermask") matter (Sreenivasan et al., 2021).
  • Sharp Allocative Dichotomy: In linear network settings, the expressive power of a subset is a sharp function of allocation—widespread distribution is essential (Schlisselberg et al., 10 Feb 2025).
  • Algorithmic Hardness and Heuristics: While theoretical constructions are nonconstructive, structural heuristics (distribution coverage, polynomial constraint counting) effectively guide mask or subspace design.
  • Extension Directions: Further directions include structured (non-random) pruning, algorithmic mask recovery, and tightening overparameterization constants. Biological neural systems, where only a subset of synapses is modifiable, also motivate ongoing research in subset allocation and learning dynamics.

A plausible implication is that random neuron subsets, when properly overparameterized or distributed, can enable efficient, robust, and universality-guaranteed architectures in both theory and practice. Refining the allocation/selection process, balancing efficiency and expressivity, and algorithmically recovering optimal subnetworks remain central open problems.


