
StochasticNet: Stochastic Deep Networks

Updated 29 January 2026
  • The paper introduces a stochastic connectivity framework using probabilistic sampling that yields sparse neural architectures with enhanced efficiency and reduced overfitting.
  • It employs random graph models such as Erdős–Rényi, Barabási–Albert, and Watts–Strogatz to automatically generate neural network topologies prior to training.
  • Empirical results demonstrate that StochasticNet achieves competitive accuracy, lowers computational costs, and minimizes overfitting across standard benchmarks.

StochasticNet defines a class of deep neural network architectures characterized by the formation of network connectivity via stochastic processes. This approach instantiates the neural wiring—whether fully-connected or convolutional—by drawing edges according to a prescribed probabilistic model, resulting in sparse and randomly structured computation graphs. Beyond the reduction of parameter count and computational overhead, StochasticNet architectures demonstrate competitive or improved generalization ability, reduced overfitting, and operational speedups relative to conventionally dense networks. Grounded in experimental neuroscience findings that biological microcircuit connectivity can be statistically modeled as random formation, StochasticNet provides a mathematically tractable framework for deep network design, leveraging random graph theory and algorithmic sampling to instantiate architectures.

1. Stochastic Connectivity: Mathematical Framework

StochasticNet introduces network sparsity by sampling binary connectivity masks from independent Bernoulli trials, one per edge. For layer $\ell$ with $m_{\ell-1}$ preceding and $m_\ell$ subsequent neurons, the connectivity mask $C^{(\ell)}$ is defined as:

$$C^{(\ell)}_{ij} \sim \mathrm{Bernoulli}\bigl(p^{(\ell)}_{ij}\bigr)$$

$$P\bigl[C^{(\ell)}_{ij}=1\bigr] = p^{(\ell)}_{ij}, \qquad P\bigl[C^{(\ell)}_{ij}=0\bigr] = 1-p^{(\ell)}_{ij}$$

Uniform connectivity models set $p^{(\ell)}_{ij}=p$ for all $(i,j)$, while spatially aware (Gaussian) models adjust $p^{(\ell)}_{ij}$ according to distance from the receptive field center, e.g., for convolutional layers:

$$p^{(\ell)}_{u,v} = \frac{1}{Z}\exp\left(-\frac{(u-u_0)^2 + (v-v_0)^2}{2\sigma^2}\right)$$

The stochastic formation is executed once before training and then held fixed throughout both learning and inference. This distinguishes StochasticNet architectures from methods such as Dropout, which resample connections at each iteration. The expected sparsity at each layer is $\rho^{(\ell)} = \bar{p}^{(\ell)}$, the mean connection probability. Controlling $\{p^{(\ell)}_{ij}\}$ enables tuning of the trade-off between computational cost and model capacity (Shafiee et al., 2015, Shafiee et al., 2015).
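The mask sampling above can be sketched in NumPy. The function names, the layer sizes, and the choice of normalizing the Gaussian so that the receptive-field center connects with probability 1 (one concrete choice of $Z$) are illustrative assumptions, not the papers' exact procedure:

```python
import numpy as np

def uniform_mask(m_prev, m_next, p, rng):
    """Sample a binary connectivity mask with uniform edge probability p,
    drawn once before training and then held fixed."""
    return (rng.random((m_prev, m_next)) < p).astype(np.float32)

def gaussian_probs(height, width, sigma):
    """Spatial connection probabilities peaked at the receptive-field
    center (u0, v0); normalized so the center connects with probability 1."""
    u0, v0 = (height - 1) / 2.0, (width - 1) / 2.0
    u, v = np.mgrid[0:height, 0:width]
    p = np.exp(-((u - u0) ** 2 + (v - v0) ** 2) / (2.0 * sigma ** 2))
    return p / p.max()

rng = np.random.default_rng(0)
mask = uniform_mask(784, 256, p=0.5, rng=rng)       # empirical density near p
spatial = gaussian_probs(5, 5, sigma=1.5)           # 5x5 kernel probabilities
kernel_mask = (rng.random(spatial.shape) < spatial).astype(np.float32)
```

Because the draw happens once, the same `rng` seed reproduces the same architecture, which is what makes the sampled topology fixed across training and inference.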

2. Random Graph Models and Network Generators

The process of network instantiation in StochasticNet can be abstracted as a stochastic network generator $G: (\theta, s) \mapsto g \in \mathcal{N}$, where $\theta$ denotes hyperparameters (e.g., number of nodes, graph model parameters), $s$ is a pseudo-random seed, and $\mathcal{N}$ is the space of computation graphs. Three canonical random graph generators have been studied:

  • Erdős–Rényi (ER): Each possible edge is included independently with probability $p$. Notation: $\mathrm{ER}(N,p)$.
  • Barabási–Albert (BA): Edges are attached with probability proportional to target node degree ("preferential attachment"). Notation: $\mathrm{BA}(N,m)$.
  • Watts–Strogatz (WS): Begins with a ring lattice; edges are rewired at random with probability $\beta$ to introduce small-world properties. Notation: $\mathrm{WS}(N,K,\beta)$.

Transforming an undirected random graph into a neural network involves sorting nodes, constructing a DAG, aggregating input/output nodes, and mapping edges to neural operations (Xie et al., 2019).
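Using NetworkX, whose built-in generators correspond to the three models above, the generator abstraction and the undirected-to-DAG step can be sketched as follows. Orienting each edge from the lower- to the higher-indexed node is one simple way to obtain a DAG and is an assumption of this sketch, as are the parameter values:

```python
import networkx as nx

def random_graph_to_dag(g):
    """Orient each undirected edge from the lower- to the higher-indexed
    node; node order then doubles as a topological order, so the result
    is acyclic by construction."""
    dag = nx.DiGraph()
    dag.add_nodes_from(g.nodes)
    dag.add_edges_from((min(u, v), max(u, v)) for u, v in g.edges)
    return dag

# The three generators studied; the seed s makes instantiation reproducible.
er = nx.erdos_renyi_graph(32, p=0.2, seed=0)
ba = nx.barabasi_albert_graph(32, m=5, seed=0)
ws = nx.watts_strogatz_graph(32, k=4, p=0.25, seed=0)

dag = random_graph_to_dag(ws)
inputs  = [n for n in dag if dag.in_degree(n) == 0]   # fed by the stage input
outputs = [n for n in dag if dag.out_degree(n) == 0]  # aggregated at stage output
```

The `inputs`/`outputs` lists identify the nodes that must be wired to the aggregated input and output of a stage, corresponding to the input/output aggregation step described above.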

3. Architecture Instantiation and Algorithmic Sampling

Architectures such as LeNet-5-style ConvNets have been instantiated as StochasticNets by applying layerwise independent sampling of connectivity matrices. For each layer, edges from each input neuron to possible output neurons are sampled according to a Bernoulli trial with layerwise probability $p_l$ (uniform) or from a Gaussian distribution (reflecting convolutional spatial locality). A representative pseudocode algorithm samples and constructs connectivity masks $M^{(l)}$, enforcing a minimum number of inputs per filter via re-sampling for robustness (Shafiee et al., 2015).
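A minimal NumPy sketch of this layerwise sampling with re-sampling for a minimum fan-in; the function name, the `min_inputs` parameter, and the layer sizes are illustrative assumptions rather than the published algorithm verbatim:

```python
import numpy as np

def sample_layer_mask(n_in, n_out, p, min_inputs=1, rng=None):
    """Draw M^{(l)} once before training; any output unit (filter) that
    received fewer than `min_inputs` incoming edges is re-sampled until
    the minimum fan-in is met (assumes p * n_in comfortably exceeds it)."""
    rng = rng or np.random.default_rng()
    mask = rng.random((n_in, n_out)) < p
    for j in range(n_out):
        while mask[:, j].sum() < min_inputs:
            mask[:, j] = rng.random(n_in) < p
    return mask.astype(np.float32)

# Illustrative fully connected layer sizes, sampled layer by layer:
rng = np.random.default_rng(1)
masks = [sample_layer_mask(n_in, n_out, p=0.75, min_inputs=2, rng=rng)
         for n_in, n_out in [(784, 300), (300, 100), (100, 10)]]
```

Re-sampling whole columns keeps each draw an unbiased Bernoulli trial while guaranteeing that no output unit is disconnected from the layer below.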

In the RandWire framework, multiple stages are formed by sampling random graphs—typically with $N=32$ nodes per stage—and mapping them to blocks with fixed channel width $C$, ensuring model complexity scales linearly with $N$. Special input and output nodes aggregate fan-in/out at stage boundaries. The classifier head comprises global pooling, $1\times1$ convolution, and a fully connected output layer (Xie et al., 2019).
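A toy, framework-free sketch of evaluating one such stage: nodes are visited in index (topological) order, each aggregates its predecessors' outputs and applies a node operation. Real RandWire nodes use learned sigmoid-weighted aggregation and a ReLU–separable-convolution–BN unit; the unweighted mean and the generic `node_op` here are simplifying assumptions:

```python
import numpy as np

def run_stage(dag_edges, n_nodes, node_op, x_in):
    """Evaluate a RandWire-style stage: nodes with no predecessors read the
    stage input; every node applies node_op to the mean of its inputs; the
    outputs of sink nodes are averaged into the stage output."""
    preds = {v: [] for v in range(n_nodes)}
    succs = {v: [] for v in range(n_nodes)}
    for u, v in dag_edges:
        preds[v].append(u)
        succs[u].append(v)
    out = {}
    for v in range(n_nodes):                 # node index = topological order
        inputs = [out[u] for u in preds[v]] or [x_in]
        out[v] = node_op(np.mean(inputs, axis=0))
    sinks = [v for v in range(n_nodes) if not succs[v]]
    return np.mean([out[v] for v in sinks], axis=0)

relu = lambda z: np.maximum(z, 0.0)
y = run_stage([(0, 2), (1, 2), (0, 3), (2, 3)], 4, relu, np.ones(8))
```

Because every node carries the same channel width regardless of its degree, per-stage compute grows with the node count rather than with any particular wiring, which is what lets topology be varied independently of model size.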

4. Training Protocols and Hyperparameter Choices

StochasticNet architectures are trained using conventional protocols:

  • Optimizers: Stochastic gradient descent with momentum, learning rate decay, and weight decay, with settings scaled to the training regime.
  • Data Augmentation: Standard image classification augmentations; batch sizes comparable to baselines.
  • Regularization: Intrinsic from sparsity, with additional ablation regularization (random edge removal per batch in large regimes) and final-layer dropout as needed.
  • Training Regimes: Dense nets versus StochasticNets at various sparsity levels ($\rho$), typically $50\%$ to $90\%$ retained connections, are compared over standard benchmark datasets: ImageNet, CIFAR-10, MNIST, SVHN, STL-10 (Xie et al., 2019, Shafiee et al., 2015, Shafiee et al., 2015).
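The training loop itself is conventional; the only StochasticNet-specific detail is that the fixed mask multiplies both the weights at initialization and the gradient at every step, so masked-out connections never revive. A minimal NumPy sketch of one such layer (class name, He-style initialization, and hyperparameter values are assumptions):

```python
import numpy as np

class MaskedLinear:
    """Dense layer whose connectivity is fixed by a binary mask sampled
    once before training; masked weight entries stay exactly zero."""
    def __init__(self, n_in, n_out, mask, rng):
        self.mask = mask
        self.W = rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)) * mask

    def forward(self, x):
        self.x = x                            # cache input for backward pass
        return x @ self.W

    def sgd_step(self, grad_out, lr=0.01, weight_decay=1e-4):
        grad_W = self.x.T @ grad_out + weight_decay * self.W
        self.W -= lr * (grad_W * self.mask)   # masking the gradient keeps
                                              # the topology fixed

rng = np.random.default_rng(0)
mask = (rng.random((16, 4)) < 0.5).astype(np.float32)
layer = MaskedLinear(16, 4, mask, rng)
out = layer.forward(rng.normal(size=(8, 16)))
layer.sgd_step(np.ones_like(out))
```

In a framework with autograd the same effect is usually achieved by multiplying the weight tensor by a constant (non-trainable) mask inside the forward pass.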

Feature learning experiments also assess transfer of the learned representations to different datasets by freezing the feature extractor and evaluating classification performance (Shafiee et al., 2015).

5. Empirical Comparison and Performance

Quantitative results demonstrate that StochasticNet configurations maintain competitive accuracy under substantial sparsification. ImageNet experiments show RandWire-WS models achieving Top-1 accuracy of $74.7\pm0.25\%$ in the small regime ($\sim$580M FLOPs) and $80.1\pm0.19\%$ in the large regime ($\sim$7.9B FLOPs), matching or exceeding baselines including MobileNet, ResNet-50/101, ResNeXt-50/101, and NAS-discovered architectures at similar computational cost (Xie et al., 2019). Transfer to COCO object detection yields a mean Average Precision (AP) of $39.9$ for RandWire-WS ($C=109$), compared to $37.1$ for ResNet-50.

Experiments on SVHN and STL-10 report relative reductions in test error of $\sim1\%$ and $\sim4.5\%$, respectively, for StochasticNets with $75\%$ connectivity versus dense networks, as well as classification efficiency gains (up to $5\times$ speedup at $25\%$ connectivity) (Shafiee et al., 2015). Similarly, at $39\%$ connectivity, StochasticNets match dense baselines in error (e.g., $43.6\%$ vs. $43.4\%$ on CIFAR-10) and exhibit reduced overfitting (smaller train-test gap) (Shafiee et al., 2015).

Representative Results Table

| Model | Dataset | Test Error (%) | Relative Δ Error | Speedup (Ratio) |
|---|---|---|---|---|
| Dense ConvNet (100%) | SVHN | 8.0 | | 1.00 |
| StochasticNet (75%, Gaussian) | SVHN | 7.9 | –1% | 0.82 |
| Dense ConvNet (100%) | STL-10 | 43.9 | | 1.00 |
| StochasticNet (75%, Gaussian) | STL-10 | 41.6 | –4.5% | 0.82 |

6. Theoretical and Biological Motivation

The framework draws on classical random graph theory, treating artificial neural networks as instantiations of graphs $\mathcal{G}(\mathcal{V}, p_{ij})$, where edge existence is determined by independent Bernoulli trials (Erdős–Rényi–Kovalenko paradigm). This process mirrors findings from neuroscience, notably the work of Hill et al. (PNAS 2012), which models neocortical synaptic connectivity in biological circuits as random formation. StochasticNet operationalizes this principle for artificial networks, positing that stochastic initialization can produce architectures with desirable efficiency and generalization properties, even without post hoc regularization mechanisms (Shafiee et al., 2015).

7. Analysis, Insights, and Limitations

Empirical studies reveal several key insights for StochasticNet systems:

  • Flat Accuracy–Sparsity Regime: Test error remains stable down to $\sim75\%$ retained connections; degradation occurs primarily below $50\%$ retained connections.
  • Built-In Regularization: Sparsity induces regularization, manifested as reduced overfitting and lower train-test error gap.
  • Data Efficiency: StochasticNet feature extractors maintain performance with limited data (down to $10\%$ of the full training set for STL-10).
  • Instance Variance: Performance has low variance across different stochastic realizations; the probabilistic generator prior dominates accuracy.
  • Wiring Priors: Watts–Strogatz mechanisms with small-world topology outperform alternatives, with hub nodes crucial for accuracy.
  • Decoupling Compute and Topology: Fixing per-node channel widths allows direct attribution of performance differences to graph topology.
  • Comparison to NAS: Purely stochastic wiring via well-chosen generators can match or exceed performance of architectures found through exhaustive neural architecture search.

Identified limitations include static connectivity (non-adaptive post-instantiation), potential suboptimality in fixed sparse graphs, and absence of dynamic rewiring. Suggested extensions encompass adaptive edge probabilities, hybrid regularization schemes, hardware-optimized structured sparsity, and theoretical study of variance/generalization bounds under randomized sparsification (Shafiee et al., 2015, Shafiee et al., 2015, Xie et al., 2019).
