Resonant Sparse Geometry Networks (RSGN)
- Resonant Sparse Geometry Networks (RSGN) are brain-inspired models defined by dynamically sparse connectivity and hyperbolic embeddings that capture hierarchical structure.
- They operate on two adaptation timescales: fast, gradient-driven activation propagation and slow, Hebbian-like plasticity that optimizes connectivity.
- RSGN achieves competitive performance with significantly fewer parameters than dense models, demonstrating efficiency on hierarchical and long-range dependency tasks.
Resonant Sparse Geometry Networks (RSGN) are a brain-inspired neural network architecture characterized by dynamically sparse, self-organizing connectivity, with computational nodes embedded in learned hyperbolic space. RSGN introduces two distinct timescales of adaptation: a fast, differentiable propagation of neural activations optimized via gradient descent, and a slow, local correlation-driven plasticity rule for adapting the network's connectivity. Connection strengths are a function of learned affinity, geodesic distance in hyperbolic space, and a hierarchical level bias. This approach yields input-dependent network graphs with competitive accuracy while offering significantly improved parameter efficiency compared to dense-attention models such as Transformers. RSGN’s design and efficacy are detailed in (Hays, 26 Jan 2026).
1. Structural Foundations and Dynamic Graph Construction
RSGN instantiates $N$ computational nodes with positions in a $d$-dimensional Poincaré ball $\mathbb{B}^d$. Each node $i$ maintains a fast-changing activation state alongside slowly evolving parameters: a spatial position $p_i$, an activation threshold $\theta_i$, and a hierarchical level $\ell_i$. Upon receiving an input, only a sparse subset (approximately 1–2%) of nodes is "ignited" via input-specific embedding and soft matching in hyperbolic space.
Across discrete propagation steps, the network operates as follows:
- Messages are passed along dynamically constructed sparse edges, selected according to geodesic proximity, affinity, and hierarchical level bias.
- Node activations are updated using a smooth, differentiable threshold function.
- Local inhibition normalizes activations within each node's radius-$r$ hyperbolic neighborhood, supporting competitive interaction and preventing over-activation.
- After $T$ steps, active node states are aggregated by a learned readout for output computation (a minimal sketch of this loop follows the list).
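A minimal NumPy sketch of this forward loop under stated assumptions: dense arrays are used for clarity, the ignition and inhibition steps are crude stand-ins for the paper's soft matching and local inhibition, and names such as `ignite_frac` are illustrative rather than from the source.

```python
import numpy as np

def rsgn_forward(match_scores, W, thresh, T=5, tau=0.5, ignite_frac=0.02):
    """Illustrative RSGN forward pass: ignition -> T propagation steps -> pooling.

    match_scores : (N,) input-specific soft-matching scores per node (assumed given)
    W            : (N, N) connection weights w_ij, zero where no edge exists
    thresh       : (N,) per-node activation thresholds theta_i
    """
    N = W.shape[0]
    # Ignition: activate only the top ~1-2% of nodes by match score.
    k = max(1, int(ignite_frac * N))
    a = np.zeros(N)
    a[np.argsort(match_scores)[-k:]] = 1.0

    for _ in range(T):
        m = W @ a                                      # message passing on sparse edges
        a = 1.0 / (1.0 + np.exp(-(m - thresh) / tau))  # differentiable soft threshold
        a = a / (1e-8 + a.max())                       # crude global stand-in for inhibition

    return a.mean()  # pooled state, to be fed to a learned readout
```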
Slow-timescale adaptation directly modulates the graph structure through Hebbian-like plasticity, updating connection affinities and thresholds, with occasional structural reconfiguration (pruning and growth of edges) in a reward-modulated manner.
2. Hyperbolic Embedding, Connection Weights, and Hierarchical Organization
Nodes are embedded in the Poincaré ball $\mathbb{B}^d$, whose hyperbolic metric tensor scales distances such that the geodesic distance between nodes $i$ and $j$ is

$$d_{\mathbb{B}}(p_i, p_j) = \operatorname{arcosh}\!\left(1 + \frac{2\,\lVert p_i - p_j\rVert^2}{\left(1 - \lVert p_i\rVert^2\right)\left(1 - \lVert p_j\rVert^2\right)}\right).$$

This embedding ensures that tree-structured or hierarchical regimes map with low distortion: leaves lie near the boundary, ancestors near the origin. Connection weights between nodes are defined as

$$w_{ij} = \sigma\!\left(u_i^{\top} v_j\right)\,\exp\!\left(-\lambda\, d_{\mathbb{B}}(p_i, p_j)\right)\, b(\ell_i, \ell_j).$$

Here, $\sigma$ is the sigmoid, $u_i, v_j$ are learned low-rank embeddings, $\lambda$ is a learned distance-decay parameter, and $b(\ell_i, \ell_j)$ promotes connections that align with the hierarchy. This construction yields an effective receptive field size that adapts to input and local geometry, supporting efficient, context-sensitive computation.
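A small sketch of this weight construction, assuming the standard Poincaré-ball distance formula above; the specific Gaussian form of the level bias `b` is an illustrative assumption, since the source states only that it favors hierarchy-aligned connections.

```python
import numpy as np

def poincare_dist(p, q, eps=1e-7):
    """Geodesic distance between points p, q inside the unit Poincare ball."""
    sq = np.sum((p - q) ** 2)
    denom = (1.0 - np.sum(p ** 2)) * (1.0 - np.sum(q ** 2))
    return np.arccosh(1.0 + 2.0 * sq / (denom + eps))

def connection_weight(u_i, v_j, p_i, p_j, lvl_i, lvl_j, lam=1.0, lvl_sigma=1.0):
    """w_ij = sigmoid(u_i . v_j) * exp(-lam * d_B(p_i, p_j)) * b(lvl_i, lvl_j)."""
    affinity = 1.0 / (1.0 + np.exp(-np.dot(u_i, v_j)))             # learned low-rank affinity
    decay = np.exp(-lam * poincare_dist(p_i, p_j))                 # geodesic distance decay
    bias = np.exp(-((lvl_i - lvl_j) ** 2) / (2 * lvl_sigma ** 2))  # assumed level bias
    return affinity * decay * bias
```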
3. Learning Dynamics: Fast Propagation and Structural Plasticity
Fast Timescale (Differentiable Activation Propagation)
At each time step $t$:
- Message passing: Active neighbors contribute to node $i$'s pre-activation state $m_i^{t} = \sum_{j \in \mathcal{N}(i)} w_{ij}\, a_j^{t}$ via $w_{ij}$-weighted aggregation.
- Activations are updated by a differentiable soft threshold:

  $$a_i^{t+1} = \operatorname{sigmoid}\!\left(\frac{\beta\, m_i^{t} - \theta_i}{\tau}\right)$$

  where $\beta$ scales the contribution from incoming messages and $\tau$ is a fixed temperature.
- State update: Incorporates LayerNorm and residual connections, ensuring stability and normalization.
- Local inhibition further regularizes activity within each node’s local hyperbolic radius.
Output is read via pooling over all node activations after $T$ steps, applying a learned function to the pooled state.
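A per-step sketch of this update with radius-based inhibition; the precomputed neighbor lists and the divisive normalization are assumptions standing in for the paper's exact inhibition rule, and the LayerNorm/residual terms are omitted for brevity.

```python
import numpy as np

def propagate_step(a, W, thresh, neighbors, beta=1.0, tau=0.5):
    """One fast-timescale step: messages, soft threshold, local inhibition.

    neighbors : list of index arrays; neighbors[i] holds the indices within
                hyperbolic radius r of node i, including i itself
    """
    m = W @ a                                                 # w_ij-weighted aggregation
    a_new = 1.0 / (1.0 + np.exp(-(beta * m - thresh) / tau))  # soft threshold
    out = np.empty_like(a_new)
    for i, nbrs in enumerate(neighbors):
        # Divisive normalization within the radius-r neighborhood.
        out[i] = a_new[i] / (1e-8 + a_new[nbrs].sum())
    return out
```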
Slow Timescale (Hebbian Structural Adaptation)
After each batch, slow variables adapt via correlation-driven rules:
- Affinity update: $A_{ij}$ is incremented in proportion to the product of average activations and task reward, i.e., $\Delta A_{ij} = \eta_A\, R\, \bar{a}_i\, \bar{a}_j$.
- Threshold homeostasis: Each $\theta_i$ shifts so as to maintain a target mean activation $a^{\ast}$, i.e., $\theta_i \leftarrow \theta_i + \eta_\theta\left(\bar{a}_i - a^{\ast}\right)$.
- Structural pruning/sprouting: Edges whose affinities fall below a significance threshold are pruned; new edges may be grown between highly correlated but currently unconnected nodes.
The slow drift of positions and hierarchical levels analogously organizes the geometry to reflect task structure over time.
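A sketch of these slow updates with illustrative learning rates and a scalar batch reward; the sprouting step is only indicated, and none of the rate constants come from the source.

```python
import numpy as np

def slow_update(A, thresh, a_bar, reward, eta_a=0.01, eta_t=0.01,
                target=0.05, prune_eps=1e-3):
    """Reward-modulated Hebbian adaptation applied after each batch.

    A      : (N, N) affinities underlying the connection weights
    thresh : (N,) activation thresholds theta_i
    a_bar  : (N,) batch-averaged activations
    reward : scalar task reward R
    """
    A = A + eta_a * reward * np.outer(a_bar, a_bar)  # Delta A_ij = eta * R * a_i * a_j
    thresh = thresh + eta_t * (a_bar - target)       # homeostasis toward target activity
    A[np.abs(A) < prune_eps] = 0.0                   # prune insignificant edges
    # Sprouting: new edges between highly correlated, unconnected nodes (omitted).
    return A, thresh
```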
4. Computational Complexity and Efficiency
RSGN achieves favorable computational efficiency relative to dense-attention architectures:
- With $k$ the average number of active nodes and $\bar{m}$ the average local neighbor count, the dominant per-step cost is $O(k\,\bar{m})$.
- Empirically, $k \ll N$ and $\bar{m} \ll N$ render a complete forward pass $O(T\,k\,\bar{m})$, i.e., effectively linear in the number of nodes $N$.
- In contrast, Transformer self-attention over $n$ tokens incurs $O(n^2)$ cost.
- The result is input-dependent, sub-quadratic memory and computation scaling, as the short calculation after this list illustrates.
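A back-of-the-envelope comparison under assumed sizes; all numbers are illustrative, and the RSGN count ignores feature width for simplicity.

```python
# Illustrative operation counts; none of these sizes come from the source.
N, T = 2000, 5              # nodes, propagation steps
k = int(0.02 * N)           # ~2% of nodes ignited
m_bar = 8                   # average local neighbor count
rsgn_ops = T * k * m_bar    # O(T k m_bar) forward pass

n, d_model = 128, 64        # tokens and width for a Transformer layer
attn_ops = n * n * d_model  # O(n^2 d) self-attention

print(f"RSGN ~{rsgn_ops:,} ops vs. attention ~{attn_ops:,} ops")
# RSGN ~1,600 ops vs. attention ~1,048,576 ops
```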
The following table summarizes key architectural points:
| Mechanism | Property | Scalability |
|---|---|---|
| Sparse hyperbolic routing | Dynamic, input-driven locality | $O(k\,\bar{m})$ per step |
| Two-timescale adaptation | Plastic connectivity and weights | Linear in $N$ |
| Local inhibition | Prevents over-activation | $O(k\,\bar{m})$ per step |
5. Experimental Benchmarks and Ablation Studies
Benchmark Tasks and Baselines
RSGN has been evaluated on:
- Hierarchical sequence classification (20-class; input sequences with multi-scale structure and noise, random baseline 5%)
- Long-range dependency classification (10-class; key signals at beginning/end of 128-token sequences with 112 distractor tokens, random baseline 10%)
Baselines include MLP, 2-layer bidirectional LSTM, standard 2-layer Transformer, and fixed-pattern Sparse Transformer.
Results Summary
| Model | Hierarchical Accuracy (%) | Params | Long-Range Acc (%) | Params |
|---|---|---|---|---|
| Transformer | 30.1 ± 0.2 | 403,348 | 100.0 ± 0.0 | 600,330 |
| RSGN (+Hebb) | 23.8 ± 0.2 | 41,672 | 96.5 ± 0.5 | 40,382 |
| RSGN (no Hebb) | 23.8 ± 0.1 | 41,672 | 96.1 ± 0.2 | 40,382 |
| LSTM | 18.1 ± 0.4 | 566,292 | 100.0 ± 0.0 | 563,722 |
| MLP | 16.0 ± 0.8 | 281,364 | — | — |
| Sparse Transformer | 15.9 ± 0.2 | 403,348 | — | — |
RSGN achieves 79% of the Transformer's accuracy in hierarchical classification using approximately 10× fewer parameters, and 96.5% of Transformer performance on the long-range task with approximately 15× fewer parameters. Ablations demonstrate robustness across variation in node count and propagation steps. Removing Hebbian adaptation leaves hierarchical accuracy unchanged (23.8% in both settings) and reduces long-range accuracy by roughly 0.4 points (96.5% vs. 96.1%).
6. Advantages, Limitations, and Potential Extensions
Advantages
- Parameter Efficiency: Competitive performance relative to dense-attention baselines with 10–15× fewer parameters.
- Input-Dependent Routing: Adaptive, context-dependent sparsity rather than fixed dense connectivity.
- Hierarchical Representation: Hyperbolic embedding enabling direct encoding of multi-scale and hierarchical structure.
- Two-Timescale Learning: Combination of fast, end-to-end gradient descent and slow, local, reward-modulated plasticity permits continual structural adaptation.
Limitations
- Absolute task accuracy is lower than that of the best-performing Transformer baselines on these benchmarks.
- Current hardware (e.g., GPUs) is not optimized for sparse, asynchronous computation, which may limit realized speedup in practice; neuromorphic hardware may better align with RSGN's computational model.
- Scalability to very large models and standard NLP or vision tasks has not been demonstrated.
- Careful tuning of both fast and slow learning rates is required.
Prospective Research Directions
- Hybrid architectures blending RSGN's dynamic sparse routing with attention modules.
- Continual and online learning scenarios exploiting RSGN's structural plasticity.
- Multimodal embeddings with distinct hyperbolic submanifolds.
- Efficient inference on neuromorphic or event-based platforms.
- Interfaces with biological or brain–computer interface systems leveraging sparse coding and reward-modulated adaptation.
7. Concluding Synthesis
Resonant Sparse Geometry Networks instantiate a biologically inspired computational paradigm that integrates sparse, geometry-driven connectivity, local inhibitory dynamics, and two-timescale adaptation. RSGN achieves sub-quadratic computational and memory complexity while flexibly adapting its computational graph to each input. Experimental results indicate strong parameter efficiency, interpretable multi-scale representations, and task-dependent adaptability. These findings suggest that sparse, hierarchical, and dynamically plastic architectures may represent a promising avenue for the development of efficient, biologically plausible neural models (Hays, 26 Jan 2026).