Sparsey: Hierarchical Sparse Memory Model
- Sparsey is an unsupervised hierarchical associative memory model that employs sparse distributed representations (SDRs) to learn spatial and spatiotemporal patterns in a single trial.
- Its design uses fixed-time, non-iterative, Hebbian-style learning with winner-take-all modules, ensuring efficient storage and mitigating catastrophic forgetting.
- The model's architecture, featuring layered macs with critical periods and overlap-based inference, unifies episodic and semantic memory with biological plausibility.
Sparsey is an unsupervised, hierarchical associative memory model implementing single-trial learning of both spatial and spatiotemporal patterns through sparse distributed representations (SDRs). Its design departs fundamentally from mainstream artificial neural network architectures by leveraging fixed-time, non-iterative learning and retrieval, superposed memories, and mechanisms that interleave episodic and semantic memory as emergent properties of the same representational substrate. Sparsey offers operational efficiency, robust memory capacity, and biological plausibility, overcoming limitations associated with localist coding and catastrophic forgetting (Rinkus et al., 2017, Rinkus, 2016, Rinkus, 2018, Rinkus, 2017).
1. Hierarchical Architecture and Representational Substrate
Sparsey's architecture is organized as a hierarchy of layers, each composed of multiple macrocolumns, or "macs". Every mac comprises $Q$ Winner-Take-All (WTA) competitive modules (CMs), each CM comprising $K$ binary units. The architectural blueprint is motivated by biological neocortex models, where macrocolumns correspond to local population codes and minicolumns relate to competitive subfields (Rinkus et al., 2017, Rinkus, 2016, Rinkus, 2018).
A Sparsey instance consists of a hierarchy of levels:
- Level 0: Raw binary input (pixels, feature vectors, or spike patterns).
- Level 1: Macs receive small, overlapping local receptive fields from Level 0.
- Level $j$ (for $j \geq 2$): Each mac at level $j$ draws bottom-up input from neighboring macs at level $j-1$, horizontal (lateral) input from macs at the same level, and top-down input from macs at level $j+1$.
Codes are SDRs: for each mac, precisely one unit is active in each CM, producing codes of constant sparsity ($Q$ active units) in a field of $Q \times K$ units. The number of possible codes per mac is $K^Q$, enabling a combinatorially large capacity (Rinkus et al., 2017, Rinkus, 2017).
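To make this combinatorics concrete, the following minimal Python sketch builds one mac as $Q$ CMs of $K$ binary units and draws a valid SDR code; the parameter values `Q = 8` and `K = 8` are illustrative choices, not taken from the cited papers.

```python
import numpy as np

Q = 8   # number of WTA competitive modules (CMs) per mac (illustrative value)
K = 8   # number of binary units per CM (illustrative value)

def random_code(rng: np.random.Generator) -> np.ndarray:
    """Draw one SDR code: exactly one active unit per CM, i.e. Q winners in total."""
    code = np.zeros((Q, K), dtype=np.uint8)
    winners = rng.integers(0, K, size=Q)            # one winner index per CM
    code[np.arange(Q), winners] = 1
    return code

rng = np.random.default_rng(0)
phi = random_code(rng)

assert phi.sum() == Q                               # constant sparsity: Q of Q*K units active
print("active units:", phi.sum(), "of", Q * K)
print("possible codes per mac: K**Q =", K ** Q)     # combinatorial capacity
```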
2. Sparse Distributed Representations and Similarity Mapping
SDRs in Sparsey are structurally sparse: each code $\phi(x)$ for input $x$ is a subset of units with $|\phi(x)| = Q$, featuring exactly one winner in each of the $Q$ CMs. Codes are compared using their intersection size, $|\phi(x) \cap \phi(y)|$.
The core mapping preserves similarity:
- Inputs that are similar in the original space are mapped to codes sharing more active units, i.e., similarity is represented by code overlap.
- This mapping is implemented via the Code Selection Algorithm (CSA), which dynamically modulates competition within each mac so that code overlap is proportional to input similarity.
Formally, the CSA computes a total drive $V(i)$ for each unit $i$ by summing its normalized bottom-up, lateral, and top-down inputs; these drives are then passed through a familiarity-dependent nonlinearity that sets win probabilities within each CM. Here, $G$ is a global mac-level familiarity measure, $\mu_q$ and $\sigma_q$ are the mean and standard deviation of the drives within CM $q$, and the expansivity of the nonlinearity increases with $G$, sharpening selection for highly familiar patterns (Rinkus et al., 2017, Rinkus, 2016, Rinkus, 2017).
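A minimal sketch of this selection step is given below. It assumes a simple softmax-style nonlinearity whose sharpness grows with a familiarity value `G`; the exact normalization and nonlinearity of the published CSA differ in detail, so this is an illustration rather than the algorithm as specified in the cited papers.

```python
import numpy as np

def csa_select(V: np.ndarray, G: float, rng: np.random.Generator) -> np.ndarray:
    """
    Simplified sketch of the CSA's winner-selection step.
    V: (Q, K) matrix of total drives (summed, normalized bottom-up/lateral/top-down input).
    G: global mac-level familiarity in [0, 1]; higher G -> sharper, more deterministic selection.
    Returns a (Q, K) binary code with exactly one winner per CM.
    """
    Q, K = V.shape
    beta = 0.2 + 20.0 * G                                # expansivity grows with familiarity (assumed form)
    code = np.zeros((Q, K), dtype=np.uint8)
    for q in range(Q):
        z = (V[q] - V[q].mean()) / (V[q].std() + 1e-9)   # standardize drives within the CM
        p = np.exp(beta * z)
        p /= p.sum()
        code[q, rng.choice(K, p=p)] = 1                  # probabilistic winner-take-all draw
    return code
```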
3. Storage, Learning Dynamics, and Critical Periods
Sparsey implements single-trial, Hebbian-style learning for both episodic and statistical structure:
- When a mac selects a code $\phi(x)$ for input $x$, all afferent weights (bottom-up, horizontal, and top-down) into the winning units are potentiated (set to 1 if binary). No prior associations are overwritten.
- Synaptic permanence and decay mechanisms, along with event-triggered critical periods, prevent catastrophic forgetting and bound memory plasticity locally:
- Each mac tracks the fraction of potentiated synapses; once a threshold is reached, the mac's plasticity is "frozen", addressing the stability-plasticity dilemma (Rinkus, 2018).
- At higher levels, metaplasticity mechanisms operate by enabling synaptic permanence to grow with repeated co-activation, slowing decay of weights encoding genuinely recurrent patterns.
- Codes coexist in perfect superposition within the same synapse matrix; prior episodes are not erased when new codes are formed, and statistical regularities accumulate as overlap patterns.
Learning and retrieval are both fixed-time with respect to the number of stored items, scaling only with network architecture (i.e., number of macs and units per mac) (Rinkus et al., 2017, Rinkus, 2018).
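The sketch below illustrates this single-trial, Hebbian-style storage rule together with a mac-local critical-period freeze, under simplified assumptions: only binary bottom-up weights are modeled, and a hypothetical `freeze_fraction` threshold stands in for the papers' permanence and metaplasticity machinery.

```python
import numpy as np

class MacMemory:
    """Minimal sketch of one mac's bottom-up weight matrix with single-trial Hebbian storage."""

    def __init__(self, n_inputs: int, Q: int, K: int, freeze_fraction: float = 0.5):
        self.W = np.zeros((n_inputs, Q * K), dtype=np.uint8)   # binary synapses, all initially 0
        self.freeze_fraction = freeze_fraction                 # hypothetical critical-period threshold
        self.frozen = False

    def store(self, x: np.ndarray, code: np.ndarray) -> None:
        """Potentiate (set to 1) all synapses from active inputs to the winning units."""
        if self.frozen:
            return                                             # plasticity frozen after critical period
        active_inputs = np.flatnonzero(x)
        winners = np.flatnonzero(code.ravel())
        self.W[np.ix_(active_inputs, winners)] = 1             # superposed storage; nothing is erased
        if self.W.mean() >= self.freeze_fraction:
            self.frozen = True                                 # mac-local end of critical period
```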
4. Emergence of Episodic and Semantic Memory
Episodic memory (EM) in Sparsey is defined by the set of all SDR codes formed across episodes and stored in superposition. Semantic memory (SM) emerges automatically as the structure of code overlaps: similarity among codes embodies higher-order statistical structure across the stored inputs (Rinkus et al., 2017):
- No explicit SM module nor rehearsal/replay mechanism is required.
- The overlap pattern among codes encodes semantic class structure and generative knowledge as a side effect of single-trial storage.
- Partial input (pattern completion) or partial code clamping during retrieval leads the CSA to reinstate the most probable hypotheses, enabling generative (fill-in) reconstructions or class inference.
This dual emergence exemplifies a unification of EM and SM within the same substrate and mechanism, distinct from architectures that separate episodic and semantic modules (e.g., contemporary deep memory networks).
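As a simple illustration of how semantic structure can be read off stored episodic codes, the sketch below computes the pairwise overlap matrix of a set of SDR codes (represented as binary arrays, as in the earlier sketches); block structure in this matrix is the emergent class/similarity information described above.

```python
import numpy as np

def overlap(code_a: np.ndarray, code_b: np.ndarray) -> int:
    """Intersection size of two SDR codes (number of shared active units)."""
    return int(np.sum(code_a & code_b))

def overlap_matrix(codes: list) -> np.ndarray:
    """Pairwise code overlaps; blocks of high overlap reflect class/semantic similarity."""
    n = len(codes)
    M = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            M[i, j] = overlap(codes[i], codes[j])
    return M
```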
5. Retrieval, Recognition, and Representation of Uncertainty
Retrieval in Sparsey is implemented by repeating the same CSA process as in coding:
- Given a query input $x$, its code $\phi(x)$ is computed; the similarity to every stored episode $y$ is the overlap $|\phi(x) \cap \phi(y)|$.
- A best-match index is returned, or multiple competing hypotheses are handled by tie-aware schemes that can propagate ambiguous state.
- Recognition and generative fill-in both exploit the graded similarity encoded by SDR overlap.
The SDR active for a given input simultaneously represents the most-likely hypothesis and a coarsely ranked likelihood distribution over all stored codes, providing a compatible notion of distributed probabilistic inference without scalar probabilities or rate coding (Rinkus, 2017). Uncertainty and ambiguity are naturally encoded and handled through code intersections and modulated competition.
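A minimal sketch of overlap-based retrieval, assuming stored codes are kept as binary arrays: the query's code is compared against each stored code, yielding both a best match and the full graded overlap profile. Note that Sparsey itself reads this graded information directly out of the superposed weights in fixed time rather than scanning a list of codes; the explicit loop here is only for illustration.

```python
import numpy as np

def retrieve(query_code: np.ndarray, stored_codes: list) -> tuple:
    """
    Rank stored episodes by SDR overlap with the query's code.
    Returns (best_index, overlaps): the best-match index plus the graded
    overlap profile, which serves as a coarse likelihood ranking over hypotheses.
    """
    overlaps = np.array([int(np.sum(query_code & c)) for c in stored_codes])
    best = int(np.argmax(overlaps))     # ties could instead be propagated as ambiguous state
    return best, overlaps
```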
6. Computational Efficiency and Scalability
A central property of Sparsey is that both storage (learning) and retrieval are fixed-time operations. Complexity per mac per time step is constant, determined only by the mac's size and fan-in, with no dependence on the number of stored codes. Thus, as storage load increases, operation latencies remain bounded (Rinkus et al., 2017, Rinkus, 2018, Rinkus, 2017):
- Full-model operations traverse each mac in parallel, yielding overall fixed-time complexity per input, given a fixed architecture.
- No iterative search over memory is required—CSA operates in a single pass.
- The model is highly parallelizable, as each mac may operate largely independently in its computation.
A comparative summary is provided below:
| Operation | Sparsey Complexity | Implication |
|---|---|---|
| Storage/Learning | Fixed time per mac; independent of # stored codes | Bound does not grow with # of episodes |
| Retrieval | Fixed time per mac; independent of # stored codes | Enables scalability |
| Deep Nets (reference) | Iterative; grows with data & epochs | Catastrophic forgetting risk, high data movement |
7. Empirical Results and Benchmark Performance
Sparsey has demonstrated competitive performance on several benchmarks employing single-trial, unsupervised learning (Rinkus et al., 2017, Rinkus, 2018):
MNIST Spatial Classification
- Input: binary images (preprocessed).
- Architecture: 2-level; 672 L1 macs.
- Training: 2,000 samples (200/class); one-pass, single-trial.
- Test: up to 7,000 samples.
- Accuracy: .
- Training time: 220s on single CPU (no GPU).
Weizmann Video Action Recognition
- Input: 42x60 cropped frames, skeletonized, reduced to 10 frames/video.
- Architecture: 3-level; L1: 216 macs, L2: 54 macs.
- Training: 540 video snippets.
- Accuracy: 67% (SOTA = 100% at the time of writing).
- Training time: s on single CPU.
These results demonstrate that Sparsey achieves substantial representational efficiency and rapid learning, with moderate but not state-of-the-art classification accuracy on benchmarks. The key strengths are in speed, fixed-time operation, and avoidance of catastrophic forgetting—qualities critical for long-lived, scalable associative memories (Rinkus et al., 2017, Rinkus, 2018).
8. Relation to Probabilistic Coding and Theoretical Significance
Sparsey's coding and inference mechanisms contrast with traditional probabilistic population coding (PPC) theories, which generally employ continuous-valued, densely distributed codes, graded synapses, and rate coding. In Sparsey:
- Codes are SDRs (structurally sparse); units and synapses are fundamentally binary.
- Probability distributions over stored hypotheses are implicitly represented by SDR code overlaps rather than explicit scalar probabilities.
- Noise is a controlled resource, modulating the bias/variance tradeoff via the global familiarity measure $G$: high $G$ sharpens code reinstatement (completion), while low $G$ increases capacity through pattern separation (Rinkus, 2017); see the sketch after this list.
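A small numeric illustration of this tradeoff, reusing the assumed familiarity-modulated softmax from the CSA sketch above (not the exact published nonlinearity): a low `G` flattens the within-CM win probabilities, favoring pattern separation, while a high `G` concentrates them on the most-driven unit, favoring completion.

```python
import numpy as np

def win_probs(v: np.ndarray, G: float) -> np.ndarray:
    """Win probabilities within one CM under a familiarity-modulated softmax (assumed form)."""
    beta = 0.2 + 20.0 * G
    z = (v - v.mean()) / (v.std() + 1e-9)
    p = np.exp(beta * z)
    return p / p.sum()

v = np.array([0.9, 0.6, 0.5, 0.4])          # drives of K=4 units in one CM
print(np.round(win_probs(v, G=0.0), 2))     # low familiarity: flatter distribution -> pattern separation
print(np.round(win_probs(v, G=1.0), 2))     # high familiarity: peaked on the most-driven unit -> completion
```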
Sparsey provides a plausible mechanistic account for probabilistic inference and lifelong learning in cortical circuits characterized by cell assemblies, fixed-time synaptic operations, and biologically plausible learning dynamics.
References:
- (Rinkus et al., 2017) Superposed Episodic and Semantic Memory via Sparse Distributed Representation
- (Rinkus, 2016) Sparsey: Event Recognition via Deep Hierarchical Sparse Distributed Codes
- (Rinkus, 2018) Sparse distributed representation, hierarchy, critical periods, metaplasticity: the keys to lifelong fixed-time learning and best-match retrieval
- (Rinkus, 2017) A Radically New Theory of how the Brain Represents and Computes with Probabilities