Sparsey: Hierarchical Sparse Memory Model
- Sparsey is an unsupervised hierarchical associative memory model that employs sparse distributed representations (SDRs) to learn spatial and spatiotemporal patterns in a single trial.
- Its design uses fixed-time, non-iterative, Hebbian-style learning with winner-take-all modules, ensuring efficient storage and mitigating catastrophic forgetting.
- The model's architecture, featuring layered macs with critical periods and overlap-based inference, unifies episodic and semantic memory with biological plausibility.
Sparsey is an unsupervised, hierarchical associative memory model implementing single-trial learning of both spatial and spatiotemporal patterns through sparse distributed representations (SDRs). Its design departs fundamentally from mainstream artificial neural network architectures by leveraging fixed-time, non-iterative learning and retrieval, superposed memories, and mechanisms that interleave episodic and semantic memory as emergent properties of the same representational substrate. Sparsey offers operational efficiency, robust memory capacity, and biological plausibility, overcoming limitations associated with localist coding and catastrophic forgetting (Rinkus et al., 2017, Rinkus, 2016, Rinkus, 2018, Rinkus, 2017).
1. Hierarchical Architecture and Representational Substrate
Sparsey's architecture is organized as a hierarchy of layers, each composed of multiple macrocolumns, or "macs". Every mac comprises $Q$ Winner-Take-All (WTA) competitive modules (CMs), each CM comprising $K$ binary units. The architectural blueprint is motivated by biological neocortex models, where macrocolumns correspond to local population codes and minicolumns relate to competitive subfields (Rinkus et al., 2017, Rinkus, 2016, Rinkus, 2018).
A Sparsey instance consists of a hierarchy of levels:
- Level 0: Raw binary input (pixels, feature vectors, or spike patterns).
- Level 1: Macs receive small, overlapping local receptive fields from Level 0.
- Level $j$ (for $j \geq 2$): Each mac at level $j$ draws bottom-up input from neighboring macs at level $j-1$, horizontal (lateral) input from macs at the same level, and top-down input from macs at level $j+1$.
Codes are SDRs: for each mac, precisely one unit is active in each CM, producing codes of constant sparsity ($Q$ active units) in a field of $Q \times K$ units. The number of possible codes per mac is $K^Q$, enabling a combinatorially large capacity (Rinkus et al., 2017, Rinkus, 2017).
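To make this combinatorics concrete, the following minimal Python sketch builds one mac as $Q$ CMs of $K$ binary units and draws a valid SDR code; the parameter values `Q = 8` and `K = 8` are illustrative choices, not taken from the cited papers.

```python
import numpy as np

Q = 8   # number of WTA competitive modules (CMs) per mac (illustrative value)
K = 8   # number of binary units per CM (illustrative value)

def random_code(rng: np.random.Generator) -> np.ndarray:
    """Draw one SDR code: exactly one active unit per CM, i.e. Q winners in total."""
    code = np.zeros((Q, K), dtype=np.uint8)
    winners = rng.integers(0, K, size=Q)            # one winner index per CM
    code[np.arange(Q), winners] = 1
    return code

rng = np.random.default_rng(0)
phi = random_code(rng)

assert phi.sum() == Q                               # constant sparsity: Q of Q*K units active
print("active units:", phi.sum(), "of", Q * K)
print("possible codes per mac: K**Q =", K ** Q)     # combinatorial capacity
```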
2. Sparse Distributed Representations and Similarity Mapping
SDRs in Sparsey are structurally sparse: each code $\phi(x)$ for input $x$ is a subset of units with $|\phi(x)| = Q$, featuring exactly one winner in each of the $Q$ CMs. Codes are compared using their intersection size, $|\phi(x) \cap \phi(y)|$.
The core mapping preserves similarity:
- Inputs that are similar in the original space are mapped to codes sharing more active units, i.e., similarity is represented by code overlap.
- This mapping is implemented via the Code Selection Algorithm (CSA), which dynamically modulates competition within each mac so that code overlap is proportional to input similarity.
Formally, the CSA computes a total drive $V(i)$ for each unit $i$ by summing its normalized bottom-up, lateral, and top-down inputs; these drives are then passed through a familiarity-dependent nonlinearity that sets win probabilities within each CM. Here, $G$ is a global mac-level familiarity measure, $\mu_q$ and $\sigma_q$ are the mean and standard deviation of the drives within CM $q$, and the expansivity of the nonlinearity increases with $G$, sharpening selection for highly familiar patterns (Rinkus et al., 2017, Rinkus, 2016, Rinkus, 2017).
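A minimal sketch of this selection step is given below. It assumes a simple softmax-style nonlinearity whose sharpness grows with a familiarity value `G`; the exact normalization and nonlinearity of the published CSA differ in detail, so this is an illustration rather than the algorithm as specified in the cited papers.

```python
import numpy as np

def csa_select(V: np.ndarray, G: float, rng: np.random.Generator) -> np.ndarray:
    """
    Simplified sketch of the CSA's winner-selection step.
    V: (Q, K) matrix of total drives (summed, normalized bottom-up/lateral/top-down input).
    G: global mac-level familiarity in [0, 1]; higher G -> sharper, more deterministic selection.
    Returns a (Q, K) binary code with exactly one winner per CM.
    """
    Q, K = V.shape
    beta = 0.2 + 20.0 * G                                # expansivity grows with familiarity (assumed form)
    code = np.zeros((Q, K), dtype=np.uint8)
    for q in range(Q):
        z = (V[q] - V[q].mean()) / (V[q].std() + 1e-9)   # standardize drives within the CM
        p = np.exp(beta * z)
        p /= p.sum()
        code[q, rng.choice(K, p=p)] = 1                  # probabilistic winner-take-all draw
    return code
```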
3. Storage, Learning Dynamics, and Critical Periods
Sparsey implements single-trial, Hebbian-style learning for both episodic and statistical structure:
- When a mac selects a code $\phi(x)$ for input $x$, all afferent weights (bottom-up, horizontal, and top-down) into the winning units are potentiated (set to 1 if binary). No prior associations are overwritten.
- Synaptic permanence and decay mechanisms, along with event-triggered critical periods, prevent catastrophic forgetting and bound memory plasticity locally:
- Each mac tracks the fraction of potentiated synapses; once a threshold is reached, the mac's plasticity is "frozen", addressing the stability-plasticity dilemma (Rinkus, 2018).
- At higher levels, metaplasticity mechanisms operate by enabling synaptic permanence to grow with repeated co-activation, slowing decay of weights encoding genuinely recurrent patterns.
- Codes coexist in perfect superposition within the same synapse matrix; prior episodes are not erased when new codes are formed, and statistical regularities accumulate as overlap patterns.
Learning and retrieval are both fixed-time with respect to the number of stored items, scaling only with network architecture (i.e., number of macs and units per mac) (Rinkus et al., 2017, Rinkus, 2018).
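The sketch below illustrates this single-trial, Hebbian-style storage rule together with a mac-local critical-period freeze, under simplified assumptions: only binary bottom-up weights are modeled, and a hypothetical `freeze_fraction` threshold stands in for the papers' permanence and metaplasticity machinery.

```python
import numpy as np

class MacMemory:
    """Minimal sketch of one mac's bottom-up weight matrix with single-trial Hebbian storage."""

    def __init__(self, n_inputs: int, Q: int, K: int, freeze_fraction: float = 0.5):
        self.W = np.zeros((n_inputs, Q * K), dtype=np.uint8)   # binary synapses, all initially 0
        self.freeze_fraction = freeze_fraction                 # hypothetical critical-period threshold
        self.frozen = False

    def store(self, x: np.ndarray, code: np.ndarray) -> None:
        """Potentiate (set to 1) all synapses from active inputs to the winning units."""
        if self.frozen:
            return                                             # plasticity frozen after critical period
        active_inputs = np.flatnonzero(x)
        winners = np.flatnonzero(code.ravel())
        self.W[np.ix_(active_inputs, winners)] = 1             # superposed storage; nothing is erased
        if self.W.mean() >= self.freeze_fraction:
            self.frozen = True                                 # mac-local end of critical period
```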
4. Emergence of Episodic and Semantic Memory
Episodic memory (EM) in Sparsey is defined by the set of all SDR codes formed across episodes and stored in superposition. Semantic memory (SM) emerges automatically as the structure of code overlaps: similarity among codes embodies higher-order statistical structure across the stored inputs (Rinkus et al., 2017):
- No explicit SM module nor rehearsal/replay mechanism is required.
- The overlap pattern among codes encodes semantic class structure and generative knowledge as a side effect of single-trial storage.
- Partial input (pattern completion) or partial code clamping during retrieval leads the CSA to reinstate the most probable hypotheses, enabling generative (fill-in) reconstructions or class inference.
This dual emergence exemplifies a unification of EM and SM within the same substrate and mechanism, distinct from architectures that separate episodic and semantic modules (e.g., contemporary deep memory networks).
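As a simple illustration of how semantic structure can be read off stored episodic codes, the sketch below computes the pairwise overlap matrix of a set of SDR codes (represented as binary arrays, as in the earlier sketches); block structure in this matrix is the emergent class/similarity information described above.

```python
import numpy as np

def overlap(code_a: np.ndarray, code_b: np.ndarray) -> int:
    """Intersection size of two SDR codes (number of shared active units)."""
    return int(np.sum(code_a & code_b))

def overlap_matrix(codes: list) -> np.ndarray:
    """Pairwise code overlaps; blocks of high overlap reflect class/semantic similarity."""
    n = len(codes)
    M = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            M[i, j] = overlap(codes[i], codes[j])
    return M
```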
5. Retrieval, Recognition, and Representation of Uncertainty
Retrieval in Sparsey is implemented by repeating the same CSA process as in coding:
- Given a query input $x$, its code $\phi(x)$ is computed; the similarity to every stored episode $y$ is the overlap $|\phi(x) \cap \phi(y)|$.
- A best-match index is returned, or multiple competing hypotheses are handled by tie-aware schemes that can propagate ambiguous state.
- Recognition and generative fill-in both exploit the graded similarity encoded by SDR overlap.
The SDR active for a given input simultaneously represents the most-likely hypothesis and a coarsely ranked likelihood distribution over all stored codes, providing a compatible notion of distributed probabilistic inference without scalar probabilities or rate coding (Rinkus, 2017). Uncertainty and ambiguity are naturally encoded and handled through code intersections and modulated competition.
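A minimal sketch of overlap-based retrieval, assuming stored codes are kept as binary arrays: the query's code is compared against each stored code, yielding both a best match and the full graded overlap profile. Note that Sparsey itself reads this graded information directly out of the superposed weights in fixed time rather than scanning a list of codes; the explicit loop here is only for illustration.

```python
import numpy as np

def retrieve(query_code: np.ndarray, stored_codes: list) -> tuple:
    """
    Rank stored episodes by SDR overlap with the query's code.
    Returns (best_index, overlaps): the best-match index plus the graded
    overlap profile, which serves as a coarse likelihood ranking over hypotheses.
    """
    overlaps = np.array([int(np.sum(query_code & c)) for c in stored_codes])
    best = int(np.argmax(overlaps))     # ties could instead be propagated as ambiguous state
    return best, overlaps
```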
6. Computational Efficiency and Scalability
A central property of Sparsey is that both storage (learning) and retrieval are fixed-time operations. Complexity per mac per time step is constant, determined only by the mac's size and fan-in, with no dependence on the number of stored codes. Thus, as storage load increases, operation latencies remain bounded (Rinkus et al., 2017, Rinkus, 2018, Rinkus, 2017):
- Full-model operations traverse each mac in parallel, yielding overall fixed-time complexity per input, given a fixed architecture.
- No iterative search over memory is required—CSA operates in a single pass.
- The model is highly parallelizable, as each mac may operate largely independently in its computation.
A comparative summary is provided below:
| Operation | Sparsey Complexity | Implication |
|---|---|---|
| Storage/Learning | Fixed time per mac; independent of # stored codes | Bound does not grow with # of episodes |
| Retrieval | Fixed time per mac; independent of # stored codes | Enables scalability |
| Deep Nets (reference) | Iterative; grows with data & epochs | Catastrophic forgetting risk, high data movement |
7. Empirical Results and Benchmark Performance
Sparsey has demonstrated competitive performance on several benchmarks employing single-trial, unsupervised learning (Rinkus et al., 2017, Rinkus, 2018):
MNIST Spatial Classification
- Input: binary images (preprocessed).
- Architecture: 2-level; 672 L1 macs.
- Training: 2,000 samples (200/class); one-pass, single-trial.
- Test: up to 7,000 samples.
- Accuracy: .
- Training time: 220s on single CPU (no GPU).
Weizmann Video Action Recognition
- Input: 42x60 cropped frames, skeletonized, reduced to 10 frames/video.
- Architecture: 3-level; L1: 216 macs, L2: 54 macs.
- Training: 540 video snippets.
- Accuracy: 67% (SOTA = 100% at the time of writing).
- Training time: s on single CPU.
These results demonstrate that Sparsey achieves substantial representational efficiency and rapid learning, with moderate but not state-of-the-art classification accuracy on benchmarks. The key strengths are in speed, fixed-time operation, and avoidance of catastrophic forgetting—qualities critical for long-lived, scalable associative memories (Rinkus et al., 2017, Rinkus, 2018).
8. Relation to Probabilistic Coding and Theoretical Significance
Sparsey's coding and inference mechanisms contrast with traditional probabilistic population coding (PPC) theories, which generally employ continuous-valued, densely distributed codes, graded synapses, and rate coding. In Sparsey:
- Codes are SDRs (structurally sparse); units and synapses are fundamentally binary.
- Probability distributions over stored hypotheses are implicitly represented by SDR code overlaps rather than explicit scalar probabilities.
- Noise is a controlled resource, modulating the bias/variance tradeoff via the global familiarity measure $G$: high $G$ sharpens code reinstatement (completion), while low $G$ increases capacity through pattern separation (Rinkus, 2017); see the sketch after this list.
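A small numeric illustration of this tradeoff, reusing the assumed familiarity-modulated softmax from the CSA sketch above (not the exact published nonlinearity): a low `G` flattens the within-CM win probabilities, favoring pattern separation, while a high `G` concentrates them on the most-driven unit, favoring completion.

```python
import numpy as np

def win_probs(v: np.ndarray, G: float) -> np.ndarray:
    """Win probabilities within one CM under a familiarity-modulated softmax (assumed form)."""
    beta = 0.2 + 20.0 * G
    z = (v - v.mean()) / (v.std() + 1e-9)
    p = np.exp(beta * z)
    return p / p.sum()

v = np.array([0.9, 0.6, 0.5, 0.4])          # drives of K=4 units in one CM
print(np.round(win_probs(v, G=0.0), 2))     # low familiarity: flatter distribution -> pattern separation
print(np.round(win_probs(v, G=1.0), 2))     # high familiarity: peaked on the most-driven unit -> completion
```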
Sparsey provides a plausible mechanistic account for probabilistic inference and lifelong learning in cortical circuits characterized by cell assemblies, fixed-time synaptic operations, and biologically plausible learning dynamics.
References:
- (Rinkus et al., 2017) Superposed Episodic and Semantic Memory via Sparse Distributed Representation
- (Rinkus, 2016) Sparsey: Event Recognition via Deep Hierarchical Sparse Distributed Codes
- (Rinkus, 2018) Sparse distributed representation, hierarchy, critical periods, metaplasticity: the keys to lifelong fixed-time learning and best-match retrieval
- (Rinkus, 2017) A Radically New Theory of how the Brain Represents and Computes with Probabilities