
Resonant Sparse Geometry Networks (RSGN)

Updated 27 January 2026
  • Resonant Sparse Geometry Networks (RSGN) are brain-inspired models defined by dynamically sparse connectivity and hyperbolic embeddings that capture hierarchical structure.
  • They leverage two adaptation timescales with fast gradient-driven activation propagation and slow, Hebbian-like plasticity to optimize connectivity.
  • RSGN achieves competitive performance with significantly fewer parameters than dense models, demonstrating efficiency on hierarchical and long-range dependency tasks.

Resonant Sparse Geometry Networks (RSGN) are a brain-inspired neural network architecture characterized by dynamically sparse, self-organizing connectivity, with computational nodes embedded in learned hyperbolic space. RSGN introduces two distinct timescales of adaptation: a fast, differentiable propagation of neural activations optimized via gradient descent, and a slow, local correlation-driven plasticity rule for adapting the network's connectivity. Connection strengths are a function of learned affinity, geodesic distance in hyperbolic space, and a hierarchical level bias. This approach yields input-dependent network graphs with competitive accuracy while offering significantly improved parameter efficiency compared to dense-attention models such as Transformers. RSGN’s design and efficacy are detailed in (Hays, 26 Jan 2026).

1. Structural Foundations and Dynamic Graph Construction

RSGN instantiates $N$ computational nodes with positions $p_i \in B^d$ in a $d$-dimensional Poincaré ball. Each node maintains both fast-changing states $h_i \in \mathbb{R}^{d_h}$ and slowly evolving parameters $\{p_i, \theta_i, \ell_i\}$ representing spatial position, activation threshold, and hierarchical level, respectively. Upon receiving an input, only a sparse subset (approximately 1–2%) of nodes is "ignited" via input-specific embedding and soft matching in hyperbolic space.
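The ignition step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the uniform node placement, the `ignite` function, and the hard top-$k$ selection standing in for soft matching are all assumptions; (Hays, 26 Jan 2026) specifies only distance-based matching at roughly 1–2% sparsity.

```python
import numpy as np

def poincare_distance(x, y):
    # Geodesic distance in the Poincare ball (formula given in Section 2).
    sq = np.sum((x - y) ** 2, axis=-1)
    denom = (1 - np.sum(x ** 2, axis=-1)) * (1 - np.sum(y ** 2, axis=-1))
    return np.arccosh(1 + 2 * sq / denom)

def ignite(input_pos, node_positions, frac=0.02):
    # "Ignite" the ~2% of nodes whose embedded positions best match the input.
    d = poincare_distance(input_pos[None, :], node_positions)
    k = max(1, int(frac * len(node_positions)))
    return np.argsort(d)[:k]  # indices of the k closest nodes

rng = np.random.default_rng(0)
positions = rng.uniform(-0.4, 0.4, size=(200, 2))  # 200 nodes in B^2 (illustrative)
active = ignite(rng.uniform(-0.4, 0.4, size=2), positions)
print(len(active))  # 4, i.e. 2% of 200 nodes
```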

Across $K$ discrete propagation steps, the network operates as follows:

  • Messages are passed along dynamically constructed sparse edges, selected by geodesic proximity, affinity, and hierarchical level bias.
  • Node activations $\alpha_i^{(t)}$ are updated using a smooth, differentiable threshold function.
  • Local inhibition normalizes activations within each node’s radius-$r$ hyperbolic neighborhood, supporting competitive interaction and preventing over-activation.
  • After $K$ steps, active node states are aggregated by a learned readout for output computation.
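The $K$-step loop above can be sketched as follows. This is a simplified sketch under stated assumptions: the weight matrix is dense rather than dynamically sparse, a global normalization stands in for radius-$r$ local inhibition, and LayerNorm is omitted from the state update.

```python
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def propagate(h, alpha, W, theta, K=3, beta=0.5, T=0.1):
    """K steps of message passing with a soft activation threshold
    (simplified: dense W, global normalization instead of local inhibition)."""
    for _ in range(K):
        msg = (W * alpha[None, :]) @ h                # messages weighted by sender activation
        alpha = sigma((alpha + beta * np.linalg.norm(msg, axis=1) - theta) / T)
        alpha = alpha / (alpha.sum() + 1e-8)          # crude stand-in for local inhibition
        h = h + msg                                   # residual state update (LayerNorm omitted)
    return (alpha[:, None] * h).sum(axis=0)           # activation-weighted readout pooling

N, dh = 50, 8
rng = np.random.default_rng(1)
out = propagate(rng.normal(size=(N, dh)), np.zeros(N),
                rng.uniform(0, 0.1, size=(N, N)), np.full(N, 0.5))
print(out.shape)  # (8,)
```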

Slow-timescale adaptation directly modulates the graph structure through Hebbian-like plasticity, updating connection affinities and thresholds and occasionally performing structural reconfiguration (pruning and growth of edges) in a reward-modulated manner.

2. Hyperbolic Embedding, Connection Weights, and Hierarchical Organization

Nodes are embedded in $B^d = \{x \in \mathbb{R}^d \mid \|x\| < 1\}$, with a hyperbolic metric tensor $g_x$ scaling distances such that the geodesic distance between two nodes $p_i, p_j$ is

$$d_\mathrm{hyp}(p_i, p_j) = \operatorname{arcosh}\!\left(1 + 2\,\frac{\|p_i - p_j\|^2}{(1 - \|p_i\|^2)(1 - \|p_j\|^2)}\right)$$

This embedding ensures that tree-structured or hierarchical regimes map with low distortion: leaves lie near the boundary; ancestors lie near the origin. Connection weights between nodes are defined as

$$w_{ij} = \sigma(u_i^\top v_j)\, \exp\!\left(-\frac{d_\mathrm{hyp}(p_i, p_j)}{\tau}\right) \phi(\ell_j - \ell_i)$$

Here, $\sigma$ is the sigmoid, $u_i, v_j$ are learned low-rank embeddings ($r \ll N$), $\tau$ is a learned distance-decay parameter, and $\phi(x) = \log(1 + e^{x+1})$ promotes connections that align with the hierarchy. This construction yields an effective receptive field whose size adapts to the input and local geometry, supporting efficient, context-sensitive computation.
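The weight formula can be transcribed directly; the embedding dimension, sample points, and parameter values below are illustrative assumptions.

```python
import numpy as np

def d_hyp(p_i, p_j):
    # Poincare-ball geodesic distance (Section 2).
    num = np.sum((p_i - p_j) ** 2)
    den = (1 - np.sum(p_i ** 2)) * (1 - np.sum(p_j ** 2))
    return np.arccosh(1 + 2 * num / den)

def edge_weight(u_i, v_j, p_i, p_j, l_i, l_j, tau=1.0):
    affinity = 1.0 / (1.0 + np.exp(-(u_i @ v_j)))   # sigma(u_i^T v_j)
    decay = np.exp(-d_hyp(p_i, p_j) / tau)          # geodesic distance decay
    level_bias = np.log1p(np.exp(l_j - l_i + 1.0))  # phi(l_j - l_i) = log(1 + e^{x+1})
    return affinity * decay * level_bias

rng = np.random.default_rng(2)
u, v = rng.normal(size=4), rng.normal(size=4)
p_near, p_far = np.array([0.1, 0.0]), np.array([0.9, 0.0])
w_near = edge_weight(u, v, np.zeros(2), p_near, l_i=0.0, l_j=1.0)
w_far = edge_weight(u, v, np.zeros(2), p_far, l_i=0.0, l_j=1.0)
print(w_near > w_far)  # True: with equal affinity and levels, the nearer node wins
```

Because $\phi$ is increasing in $\ell_j - \ell_i$, edges pointing down the hierarchy receive a larger multiplier, which is what biases the graph toward tree-consistent routing.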

3. Learning Dynamics: Fast Propagation and Structural Plasticity

Fast Timescale (Differentiable Activation Propagation)

At each time step $t$:

  1. Message passing: active neighbors $j \in \mathcal{A}^{(t)}$ contribute to node $i$'s pre-activation state via $w_{ij}$-weighted aggregation.
  2. Activations are updated by a differentiable soft threshold:

$$\alpha_i^{(t+1)} = \sigma\!\left(\frac{\alpha_i^{(t)} + \beta\,\|\tilde h_i^{(t+1)}\| - \theta_i}{T}\right)$$

where $\beta$ scales the contribution from incoming messages, and $T$ is a fixed temperature.

  3. State update: incorporates LayerNorm and residual connections, ensuring stability and normalization.
  4. Local inhibition: further regularizes activity within each node’s local hyperbolic radius.

Output is read via pooling over all node activations after $K$ steps, applying a learned function $f_\text{out}$ to the pooled state.

Slow Timescale (Hebbian Structural Adaptation)

After each batch, slow variables adapt via correlation-driven rules:

  • Affinity update: $a_{ij} = u_i^\top v_j$ is incremented in proportion to the product of average activations and task reward, i.e., $\Delta a_{ij} = \eta_a\, \bar\alpha_i\, \bar\alpha_j\, R$.
  • Threshold homeostasis: each $\theta_i$ shifts to maintain a target mean activation, $\Delta\theta_i = \eta_\theta (\bar\alpha_i - \alpha_{\rm target})$.
  • Structural pruning/sprouting: edges below a significance threshold are pruned; new edges may be grown between highly correlated but disconnected nodes.

The slow drift of positions and hierarchical levels analogously organizes the geometry to reflect task structure over time.
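The slow-timescale rules can be sketched as a single batch-level update. The learning rates, target activation rate, and pruning threshold below are illustrative assumptions, and the drift of positions and levels is omitted.

```python
import numpy as np

def slow_update(a, theta, alpha_bar, reward,
                eta_a=0.01, eta_th=0.01, alpha_target=0.02, prune_eps=1e-3):
    """One reward-modulated slow-timescale step (illustrative constants)."""
    # Hebbian affinity update: co-active pairs are strengthened when reward is high.
    a = a + eta_a * np.outer(alpha_bar, alpha_bar) * reward
    # Threshold homeostasis: push each node's mean activation toward the target.
    theta = theta + eta_th * (alpha_bar - alpha_target)
    # Structural pruning: drop edges whose affinity fell below significance.
    a = np.where(np.abs(a) < prune_eps, 0.0, a)
    return a, theta

N = 10
a = np.full((N, N), 0.05)
alpha_bar = np.zeros(N); alpha_bar[:2] = 0.5   # only nodes 0 and 1 were active
a2, th2 = slow_update(a, np.full(N, 0.5), alpha_bar, reward=1.0)
print(a2[0, 1] > a[0, 1], th2[0] > 0.5)  # True True: co-active edge strengthened,
                                         # over-active node's threshold raised
```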

4. Computational Complexity and Efficiency

RSGN achieves favorable computational efficiency relative to dense-attention architectures:

  • If the average number of active nodes $k \ll N$ and the average local neighbor count $m \ll N$, the dominant per-step cost is $O(k m d_h^2)$.
  • Empirically, $k, m \approx O(\sqrt{N})$, rendering a complete forward pass $O(N d_h^2)$, i.e., linear in the number of nodes.
  • In contrast, Transformer self-attention over $n$ tokens incurs $O(n^2 d)$ cost.
  • The result is input-dependent, sub-quadratic scaling of memory and computation.
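A back-of-the-envelope comparison makes the scaling gap concrete. The constants below are illustrative, not measured; the point is only the linear-versus-quadratic growth.

```python
# Rough operation counts under the scaling assumptions above (illustrative only).
def rsgn_cost(N, d_h, K):
    k = m = int(N ** 0.5)           # empirically k, m ~ O(sqrt(N))
    return K * k * m * d_h ** 2     # O(k m d_h^2) per step -> ~O(N d_h^2) total

def transformer_cost(n, d, layers):
    return layers * n ** 2 * d      # O(n^2 d) self-attention per layer

for N in (1_000, 10_000, 100_000):
    print(N, rsgn_cost(N, d_h=64, K=3), transformer_cost(n=N, d=64, layers=3))
```

Doubling $N$ roughly doubles the RSGN count but quadruples the attention count, which is the sub-quadratic advantage claimed above.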

The following table summarizes key architectural points:

| Mechanism | Property | Scalability |
|---|---|---|
| Sparse hyperbolic routing | Dynamic, input-driven locality | $O(Nk),\ k \ll N$ |
| Two-timescale adaptation | Plastic connectivity and weights | Linear in $N$ |
| Local inhibition | Prevents over-activation | $O(km)$ per step |

5. Experimental Benchmarks and Ablation Studies

Benchmark Tasks and Baselines

RSGN has been evaluated on:

  • Hierarchical sequence classification (20-class; input sequences with multi-scale structure and noise, random baseline 5%)
  • Long-range dependency classification (10-class; key signals at beginning/end of 128-token sequences with 112 distractor tokens, random baseline 10%)

Baselines include MLP, 2-layer bidirectional LSTM, standard 2-layer Transformer, and fixed-pattern Sparse Transformer.

Results Summary

| Model | Hierarchical Acc (%) | Params | Long-Range Acc (%) | Params |
|---|---|---|---|---|
| Transformer | 30.1 ± 0.2 | 403,348 | 100.0 ± 0.0 | 600,330 |
| RSGN (+Hebb) | 23.8 ± 0.2 | 41,672 | 96.5 ± 0.5 | 40,382 |
| RSGN (no Hebb) | 23.8 ± 0.1 | 41,672 | 96.1 ± 0.2 | 40,382 |
| LSTM | 18.1 ± 0.4 | 566,292 | 100.0 ± 0.0 | 563,722 |
| MLP | 16.0 ± 0.8 | 281,364 | — | — |
| Sparse Transformer | 15.9 ± 0.2 | 403,348 | — | — |

RSGN achieves 79% of the Transformer’s accuracy on hierarchical classification with approximately 10× fewer parameters, and 96.5% of Transformer performance on the long-range task with approximately 15× fewer parameters. Ablations demonstrate robustness across variation in node count and propagation steps. Removing Hebbian adaptation reduces long-range accuracy by ~0.4% and leaves hierarchical accuracy unchanged.

6. Advantages, Limitations, and Potential Extensions

Advantages

  • Parameter Efficiency: Comparable performance to dense-attention baselines with 10–15× fewer parameters.
  • Input-Dependent Routing: Adaptive, context-dependent sparsity rather than fixed dense connectivity.
  • Hierarchical Representation: Hyperbolic embedding enabling direct encoding of multi-scale and hierarchical structure.
  • Two-Timescale Learning: Combination of fast, end-to-end gradient descent and slow, local, reward-modulated plasticity permits continual structural adaptation.

Limitations

  • Absolute task accuracy is lower than that of the best-performing Transformer baselines on these benchmarks.
  • Current hardware (e.g., GPUs) is not optimized for sparse, asynchronous computation, which may limit realized speedup in practice; neuromorphic hardware may better align with RSGN's computational model.
  • Scalability to very large models and standard NLP or vision tasks has not been demonstrated.
  • Careful tuning of both fast and slow learning rates is required.

Prospective Research Directions

  • Hybrid architectures blending RSGN's dynamic sparse routing with attention modules.
  • Continual and online learning scenarios exploiting RSGN's structural plasticity.
  • Multimodal embeddings with distinct hyperbolic submanifolds.
  • Efficient inference on neuromorphic or event-based platforms.
  • Interfaces with biological or brain–computer interface systems leveraging sparse coding and reward-modulated adaptation.

7. Concluding Synthesis

Resonant Sparse Geometry Networks instantiate a biologically inspired computational paradigm that integrates sparse, geometry-driven connectivity, local inhibitory dynamics, and two-timescale adaptation. RSGN achieves sub-quadratic computational and memory complexity while flexibly adapting its computational graph to each input. Experimental results indicate strong parameter efficiency, interpretable multi-scale representations, and task-dependent adaptability. These findings suggest that sparse, hierarchical, and dynamically plastic architectures may represent a promising avenue for the development of efficient, biologically plausible neural models (Hays, 26 Jan 2026).
