State-Based Memory Updating
- State-based memory updating is a framework where an explicit memory state is continuously revised using a function that integrates prior context with incoming data.
- It employs mechanisms such as blending, calibration, and maximal selection to efficiently update memory representations in dynamic systems.
- Empirical studies demonstrate improved sample efficiency, stability, and scalability in applications ranging from deep reinforcement learning to cognitive modeling.
State-based memory updating refers to frameworks and algorithms in which an explicit internal state—typically a vector, matrix, or compositional algebraic structure—summarizes the relevant memory of a system, agent, or process, and is incrementally revised as new events, data, or sensory inputs are received. These approaches construct, maintain, and modify such a state directly, rather than storing sequences verbatim or treating memory retrieval as independent of ongoing update dynamics. State-based schemes appear in probabilistic inference, cognitive modeling, neuroscience, deep reinforcement learning, continual learning, and systems engineering. This article provides a comprehensive account of the key formal models, algorithms, computational properties, empirical evidence, and theoretical principles that underlie state-based memory updating in contemporary research.
1. Formal Definitions and Core Mechanisms
A state-based memory update is any process in which the memory at time $t$ is computed as a function of the previous memory state and the new input or context, i.e.,

$$m_t = f(m_{t-1}, x_t),$$

where $m_t$ may be a vector, a matrix, or a higher-order tensor and $x_t$ denotes the incoming observation or context. The design of $f$ governs retention, erasure, and integration of new and past information.
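As a minimal illustration of this abstraction (a sketch whose names, shapes, and decay-and-blend rule are chosen for illustration rather than drawn from any single cited model), the update is simply a fold of $f$ over the input stream:

```python
import numpy as np

def update(memory: np.ndarray, x: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """One generic state-based update m_t = f(m_{t-1}, x_t).

    Here f is a simple decay-and-blend rule; the models below substitute
    richer functions (max-updates, Hadamard calibration, bundling, ...).
    """
    return decay * memory + (1.0 - decay) * x

def run(stream: list, dim: int = 8) -> np.ndarray:
    memory = np.zeros(dim)      # m_0: empty memory state
    for x in stream:            # incrementally revise the state instead of storing the stream
        memory = update(memory, x)
    return memory

if __name__ == "__main__":
    stream = [np.random.randn(8) for _ in range(100)]
    print(run(stream))
```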
Examples from Contemporary Models
- Iterative Working Memory Update: In models of cognitive architecture, working memory (the focus of attention, FoA) is updated by blending the old state with activated retrievals from long-term memory:

  $$x_{t+1} = \phi\big(\lambda\, x_t + (1-\lambda)\, W x_t\big),$$

  where $x_t$ represents the current FoA state, $W$ is an associative weight matrix, $\phi$ is a sparsifying function, and $\lambda$ controls persistence (Reser, 2022).
- State-based Episodic Memory (SEM) in MARL: SEM maintains a lookup table indexed by a projected global state. After each episode, SEM updates each visited state's entry with the highest observed return-to-go, ignoring joint actions (see the sketch after this list). Update rule:

  $$H\big(\phi(s_t)\big) \leftarrow \max\Big(H\big(\phi(s_t)\big),\, R_t\Big), \qquad R_t = \sum_{k \ge t} \gamma^{\,k-t} r_k,$$

  with $\phi$ the state projection and $R_t$ the discounted return-to-go accumulated during the episode (Ma et al., 2021).
- Hadamard Matrix Memory (SHM): SHM for partially observable RL updates a memory matrix via an elementwise (Hadamard) product and calibrated additive updates:

  $$M_t = C_t \odot M_{t-1} + N_t,$$

  where $C_t$ is a calibration ("forget/strengthen") matrix and $N_t$ is a new-information matrix (Le et al., 14 Oct 2024).
- Non-associative Algebraic Bundling: For high-dimensional sequence memory, states are bundled recursively with a non-associative, noise-injecting "bundling" operator $\oplus$,

  $$S_t = S_{t-1} \oplus x_t,$$

  whose order sensitivity yields distinctive recency and primacy dynamics: left- and right-folded bundles (the L-state and R-state) weight late and early items differently (Reimann, 13 May 2025).
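A minimal sketch of the SEM-style table update, assuming a random-projection key, a discretized index, and per-episode return-to-go computation (the projection, rounding, and dictionary layout here are illustrative choices, not the exact implementation of Ma et al., 2021):

```python
import numpy as np

class StateEpisodicMemory:
    """State-indexed episodic table with max-over-return ("best-ever") updates."""

    def __init__(self, state_dim: int, key_dim: int = 4, gamma: float = 0.99):
        self.proj = np.random.randn(key_dim, state_dim)   # random projection phi
        self.gamma = gamma
        self.table = {}                                   # phi(s) -> best return-to-go

    def _key(self, state: np.ndarray) -> tuple:
        return tuple(np.round(self.proj @ state, 1))      # discretized projected state

    def write_episode(self, states: list, rewards: list) -> None:
        ret, returns = 0.0, []
        for r in reversed(rewards):                       # discounted return-to-go R_t
            ret = r + self.gamma * ret
            returns.append(ret)
        for state, R in zip(states, reversed(returns)):
            k = self._key(state)
            self.table[k] = max(self.table.get(k, float("-inf")), R)  # monotonic max update

    def read(self, state: np.ndarray) -> float:
        return self.table.get(self._key(state), 0.0)
```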
2. Algorithmic Structures and Update Rules
The choice of state representation and update function determines tractability, representational scope, and the types of temporal and contextual dependencies that can be modeled.
Architectures and Their Updates
| Model Type | State | Update Mechanism |
|---|---|---|
| Episodic Table | Lookup table | Max-over-return batch |
| Hebbian Memory | Matrix | Outer-product / decay |
| Map-based Replay | Graph | Prototype insertion/merge |
| Neural Working Mem | Vector | Blended activation |
| Hadamard Memory | Matrix | Elementwise update |
| Non-Assoc. Bundle | Vector (L/R bundles) | Order-sensitive sum |
- Episodic Table: Each observed state is mapped (by random projection or learned embedding) to a compact index; the table records only the highest return-to-go ever observed from any visit to that state (Ma et al., 2021).
- Self-Organizing Map-based Replay: States are represented as nodes in a growing network; similar states are merged according to distance and habituation criteria, and statistics (e.g., action/reward averages) are updated with decay factors (Hafez et al., 2023).
- Non-associative Bundling: Temporal order is encoded not by position markers but by the sequence of noise-injected, non-associative additions, yielding separate primacy and recency traces (Reimann, 13 May 2025); see the sketch after this list.
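A minimal sketch of such order-sensitive bundling, assuming a normalized noisy sum as the operator; the precise operator and parameters of Reimann (13 May 2025) may differ, but the fold-order sensitivity is the point:

```python
import numpy as np

rng = np.random.default_rng(0)

def bundle(a: np.ndarray, b: np.ndarray, noise: float = 0.1) -> np.ndarray:
    """Toy non-associative bundling: normalized sum plus fresh noise.

    Because noise is injected and the result renormalized at every call,
    (a + b) + c differs from a + (b + c): the fold order is written into
    the resulting state.
    """
    v = a + b + noise * rng.standard_normal(a.shape)
    return v / np.linalg.norm(v)

def left_fold(items):
    """L-state: later items are bundled in last and survive most faithfully (recency)."""
    state = items[0]
    for x in items[1:]:
        state = bundle(state, x)
    return state

def right_fold(items):
    """R-state: the earliest item sits outermost and survives most faithfully (primacy)."""
    state = items[-1]
    for x in reversed(items[:-1]):
        state = bundle(x, state)
    return state

items = [rng.standard_normal(256) for _ in range(10)]
L, R = left_fold(items), right_fold(items)
sims = lambda s: [round(float(s @ (x / np.linalg.norm(x))), 2) for x in items]
print("recency gradient:", sims(L))   # later items more similar to the L-state
print("primacy gradient:", sims(R))   # earlier items more similar to the R-state
```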
3. Theoretical Properties and Complexity
Complexity and Scalability
- Tabular SEM vs. SAEM: In multi-agent RL, state-only episodic memory (SEM) indexes values by state alone, removing the joint-action dimension from the table; relative to state-and-joint-action episodic memory (SAEM), whose storage and update costs grow with the size of the joint action space, this reduces both the storage cost and the time per EM update (Ma et al., 2021).
- Memory Compression via Maps: Map-based replay shrinks the required number of nodes by merging similar states, achieving 40–80% memory savings with only mild performance degradation under typical activation-threshold settings (Hafez et al., 2023).
Convergence and Stability
- Hadamard Memory: Randomization in the calibration matrix prevents the vanishing/exploding product problem: the cumulative elementwise product of calibration factors neither collapses toward 0 nor diverges, enabling operation over long sequences (Le et al., 14 Oct 2024); a toy illustration follows this list.
- Overestimation-Resistant Targets: SEM’s monotonic “best-ever” update yields stable targets less prone to overestimation bias, as past poor returns are never replayed (Ma et al., 2021).
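A toy numerical illustration of the stability claim above, assuming unit-magnitude, random-sign calibration entries as a stand-in for SHM's randomized calibration (the 0.9 and 1.1 baselines are likewise illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def calibration_product(steps: int, mode: str, shape=(16, 16)) -> float:
    """Mean |C_T * ... * C_1| (elementwise): the factor scaling M_0 after T updates.

    'shrink'    : fixed entries 0.9  -> the product vanishes exponentially
    'grow'      : fixed entries 1.1  -> the product explodes exponentially
    'randomized': random-sign, unit-magnitude entries -> magnitude stays at 1
    """
    prod = np.ones(shape)
    for _ in range(steps):
        if mode == "shrink":
            C = np.full(shape, 0.9)
        elif mode == "grow":
            C = np.full(shape, 1.1)
        else:
            C = rng.choice([-1.0, 1.0], size=shape)
        prod = prod * C
    return float(np.abs(prod).mean())

for mode in ("shrink", "grow", "randomized"):
    print(mode, calibration_product(200, mode))
```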
4. Readout, Retrieval, and Decision Mechanisms
- Greedy and Blended Decoding: In non-associative bundling models, retrieval amounts to measuring mutual information between a cue and both the recency (L-state) and primacy (R-state) bundles, then combining them with task-specific weights (Reimann, 13 May 2025).
- Winner-Take-All Working Memory: Iterative working memory models update the current state but enforce a fixed capacity via a sparsifying "Top-$k$" operator, which shapes the similarity structure of successive states (Reser, 2022); see the sketch after this list.
- Threshold Policy in Memory Access: In sampling control, the optimal decision to query shared memory is made by comparing a client's local age variable $\Delta_t$ to a derived threshold $\tau$, yielding a stationary, deterministic optimal policy (Ramani et al., 22 Apr 2024):

  $$\pi^*(\Delta_t) = \mathbb{1}\{\Delta_t \ge \tau\} \quad \text{(query memory iff the local age reaches the threshold)}.$$
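A minimal sketch of the capacity-limited, blended working-memory update from the Winner-Take-All item above, assuming the convex-blend form of Section 1 with a Top-k sparsifier (the exact functional form and parameters of Reser, 2022 may differ):

```python
import numpy as np

def top_k(v: np.ndarray, k: int) -> np.ndarray:
    """Winner-take-all sparsifier: keep the k largest entries, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(v, -k)[-k:]
    out[idx] = v[idx]
    return out

def wm_update(foa: np.ndarray, W: np.ndarray, lam: float = 0.5, k: int = 8) -> np.ndarray:
    """One iterative working-memory step: blend persistence with associative
    retrieval from long-term memory, then enforce a fixed capacity via Top-k."""
    blended = lam * foa + (1.0 - lam) * (W @ foa)
    return top_k(blended, k)

rng = np.random.default_rng(0)
dim = 64
W = 0.05 * rng.standard_normal((dim, dim))   # toy associative weight matrix
foa = top_k(rng.standard_normal(dim), 8)     # initial focus-of-attention state
for _ in range(5):
    foa = wm_update(foa, W)
print(np.count_nonzero(foa))                 # capacity stays fixed at k
```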
5. Applications Across Domains
- Multi-Agent and Deep RL: State-based EM methods improve sample efficiency and stability, especially in complex cooperative tasks (e.g., SMAC benchmarks), and alleviate catastrophic forgetting (Ma et al., 2021, Hafez et al., 2023, Le et al., 14 Oct 2024).
- Continual Learning and Retrieval: For continually evolving corpora, DSI++ employs parameter-efficient, state-updating indices with additional generative rehearsal (“pseudo-query” replay) to mitigate forgetting and preserve access to both old and new knowledge (Mehta et al., 2022).
- Cognitive and Theoretical Models: State-based iterative and algebraic mechanisms explicate classical memory effects (recency, primacy, serial position curves) and suggest implementations for both AGI and neuroscientific modeling (Reser, 2022, Reimann, 13 May 2025, Howard, 2022).
6. Extensions and Variants
Directions for further development, as documented in the literature, include:
- Soft Update and Priority Rules: Rather than strict max-based updates, SEM and map-based methods can employ soft or probabilistically weighted updates, and buffer/eviction policies can be tuned beyond frequency-based removal (Ma et al., 2021, Hafez et al., 2023); a minimal contrast of hard and soft updates follows this list.
- Learned Representations: Substituting random projections with learned state-embeddings (e.g., via neural networks) allows more expressive or task-specific keys for indexing episodic tables (Ma et al., 2021).
- Action-Conditional and Non-cooperative Extensions: Modifying SEM to store (state,action)-based values enables richer policies; extensions to adversarial or mixed-payoff scenarios are plausible (Ma et al., 2021).
- Memory State Composition: The explicit algebra of recency (L-state) and primacy (R-state) bundles allows flexible linear combination, enabling adaptation to distinct retrieval, prediction, or planning tasks (Reimann, 13 May 2025).
- Stable Architectures for Long-horizon Tasks: Memory-augmented agents with Hadamard updates or scale-invariant context representations handle deep credit assignment and temporal reasoning over hundreds of steps (Le et al., 14 Oct 2024, Howard, 2022).
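A minimal contrast of the strict max-based rule with one hypothetical soft alternative (the exponential-moving-average form and the alpha parameter are illustrative, not taken from the cited papers):

```python
def hard_update(old: float, new_return: float) -> float:
    """SEM-style 'best-ever' rule: keep the maximum return observed so far."""
    return max(old, new_return)

def soft_update(old: float, new_return: float, alpha: float = 0.1) -> float:
    """Soft alternative: exponential moving average toward new returns,
    trading the overestimation-resistance of the max rule for adaptivity
    when returns are noisy or the environment drifts."""
    return (1.0 - alpha) * old + alpha * new_return

# After observing returns [2.0, 5.0, 1.0] for the same projected state:
v_hard, v_soft = 0.0, 0.0
for R in [2.0, 5.0, 1.0]:
    v_hard = hard_update(v_hard, R)
    v_soft = soft_update(v_soft, R)
print(v_hard, v_soft)   # 5.0 vs. a smoothed value below 5.0
```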
7. Empirical Outcomes and Open Challenges
- Sample Efficiency: Introduction of explicit state-based episodic or map-style memory demonstrably accelerates convergence and raises win rates in challenging RL benchmarks, with median win-rate improvements reaching several-fold on hard tasks (Ma et al., 2021, Hafez et al., 2023, Le et al., 14 Oct 2024).
- Memory–Performance Trade-offs: Substantial memory reduction is attainable with mild performance loss if receptive field tuning (activation/habituation thresholds) is appropriately set (Hafez et al., 2023).
- Forgetting and Stability: Flatter optimization trajectories and replay mechanisms (e.g., generative “pseudo-query” rehearsal) can preserve prior knowledge and reduce catastrophic forgetting in sequence models (Mehta et al., 2022).
- Neurobiological Plausibility: Several mathematically formalized update rules (iterative blending, scale-invariant kernels, non-associative algebras) directly correspond to empirically documented neural memory phenomena (e.g., cortical sustained firing, hippocampal “time cells,” recency/primacy in recall) (Reser, 2022, Howard, 2022, Reimann, 13 May 2025).
Ongoing challenges include jointly optimizing speed, flexibility, and precision in state-based updating under partial observability, adversarial shifts, or non-stationarity, as well as integrating modular and scalable state structures for next-generation artificial agents.