GENESIS: Integrating Episodic and Semantic Memory

Updated 3 July 2026

The paper introduces GENESIS, a unified framework that integrates episodic and semantic memory using generative models and resource-constrained mechanisms.
It details a modular architecture with episodic storage, semantic modeling, dynamic integration, and explicit capacity control based on rate–distortion theory.
Experimental frameworks demonstrate improved generalization, efficient recall, and structured narrative construction through hybrid memory representations.

The Generative Episodic-Semantic Integration System (GENESIS) is a class of computational architectures and theoretical frameworks unifying episodic and semantic memory through generative, resource-constrained mechanisms. GENESIS models formalize memory as the interaction of compressive, lossy encoding, generative reconstruction, and active integration of both specific episodes and structured semantic knowledge. Various implementations—spanning rate–distortion-theoretic latent variable models, lifelong content-addressable expert systems, neuro-inspired object-centric architectures, and knowledge graph-plus-dynamics approaches—capture core properties of human and artificial memory, such as generalization, systematic distortions, temporal narrative construction, object compositionality, and efficient recall. This article synthesizes the architecture, algorithms, mathematical foundations, behavioral phenomena, and experimental evidence for GENESIS frameworks.

1. Unified Architectural Principles

GENESIS models are built upon architectures that explicitly dissociate and interconnect episodic and semantic memory components, generally with four major elements:

Episodic module: Stores and retrieves lossy, compressed, context-rich traces of individual experiences. Implementations include vector memories addressed by compressed latent codes (D'Alessandro et al., 17 Oct 2025, Pickett et al., 2016, Nagy et al., 2018), content-addressable stores for sequential traces (Pickett et al., 2016), and dynamic graph nodes for episodic events (Jiang et al., 6 Jan 2026).
Semantic module: Encodes statistical regularities, categories, and event schemas. Most models instantiate this as a deep generative model (e.g., a β-VAE, mixture of experts, vector-quantized autoencoder, or object-centric scene generator) that learns the domain’s semantics through generative reconstruction objectives (D'Alessandro et al., 17 Oct 2025, Nagy et al., 2018, Engelcke et al., 2021, Fayyaz et al., 2021).
Integration mechanism: Governs the dynamic interaction between episodic retrieval and semantic completion or generalization. Integration may take the form of RAG-style pipelines (D'Alessandro et al., 17 Oct 2025, Rajesh et al., 10 Nov 2025), recurrent interplay in generative decoding (Fayyaz et al., 2021), or spreading-activation mechanisms over a hybrid episodic-semantic graph (Jiang et al., 6 Jan 2026).
Capacity and resource control: Explicit rate–distortion-theoretic constraints or associative structure ensure finite memory, graceful degradation, and trade-offs between memorability, fidelity, and generalization (D'Alessandro et al., 17 Oct 2025, Nagy et al., 2018).

This four-part organization recurs in both the theoretical core and the algorithmic workflows of recent GENESIS frameworks.

2. Mathematical Foundations and Objective Functions

All major GENESIS models share a foundation in latent-variable generative models regularized by capacity (rate) constraints. The principal objective, derived from classical rate–distortion theory (Nagy et al., 2018, D'Alessandro et al., 17 Oct 2025), is: $L_{\text{RD}} = R + \beta D$ where the rate $R$ is typically the Kullback-Leibler divergence between approximate posterior and semantic prior ( $\mathrm{KL}[q_\phi(z|x) \| p(z)]$ ), and the distortion $D$ is the negative log-likelihood of reconstruction under the generative model ( $-\log p_\theta(x|z)$ ). Multi-system architectures such as those in (D'Alessandro et al., 17 Oct 2025) introduce compound losses for each module, e.g.,

$\mathcal{L}_{\text{cortical}} = \mathbb{E}_{q(z|x)}[-\log p(x|z,e)] + \beta_c |\mathrm{KL}[q(z|x)\|p(z)] - C_c|$

$\mathcal{L}_{\text{hippo}} = \mathbb{E}_{q(u|h)}[-\log p(h|u)] + \beta_h |\mathrm{KL}[q(u|h)\|p(u)] - C_h|$

where $C_c, C_h$ are explicit capacity constraints (in nats), and $\beta_c, \beta_h$ control adherence to the capacity regime (D'Alessandro et al., 17 Oct 2025). Rate–distortion trade-offs result in a parametric continuum from episodic veridicality ( $C \rightarrow \infty$ ) to semantic generalization and systematic gist distortions ( $C \rightarrow 0$ ) (Nagy et al., 2018).
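The capacity-controlled losses above can be made concrete with a small numerical sketch. It assumes a diagonal Gaussian posterior and a standard normal prior, so the KL term has a closed form, and it takes the reconstruction negative log-likelihood as a precomputed number. The function names and the example values are hypothetical and are not taken from the cited papers.

```python
import math

def gaussian_kl(mu, log_var):
    """KL[N(mu, diag(exp(log_var))) || N(0, I)] in nats (closed form)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def capacity_loss(recon_nll, mu, log_var, beta, capacity):
    """Distortion plus beta * |rate - capacity|, following the form of the module losses."""
    rate = gaussian_kl(mu, log_var)
    return recon_nll + beta * abs(rate - capacity), rate

# Illustrative values only (assumed, not from any experiment).
mu = [0.5, -1.0]
log_var = [0.0, 0.0]
loss, rate = capacity_loss(recon_nll=3.0, mu=mu, log_var=log_var, beta=10.0, capacity=2.0)
print(f"rate = {rate:.3f} nats, loss = {loss:.3f}")
```

In this form the penalty is zero only when the rate equals the capacity target, so raising `capacity` permits more detailed (episodic) codes, while lowering it pushes the code toward the prior and hence toward gist-like reconstructions.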

For sequential domains and growing associative memories, further mathematical components include content-addressable vector stores, k-nearest neighbor algorithms, and key-based associative retrieval, realized with differentiable or tree-based look-up structures supporting efficient access (Pickett et al., 2016, D'Alessandro et al., 17 Oct 2025).
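As a rough illustration of key-based associative retrieval, the sketch below implements a growing store with a brute-force k-nearest-neighbor lookup over Euclidean distance. The class name `EpisodicStore`, its methods, and the toy keys are assumptions made for this example. A practical system would likely replace the linear scan with a tree-based or differentiable look-up structure.

```python
import math

class EpisodicStore:
    """Growing key-value memory with brute-force k-nearest-neighbor lookup."""

    def __init__(self):
        self.keys = []
        self.values = []

    def write(self, key, value):
        # Append a new trace; the store grows with experience.
        self.keys.append(list(key))
        self.values.append(value)

    def nearest(self, query, k=1):
        """Return the k (distance, value) pairs whose keys are closest to the query."""
        scored = sorted(
            (math.dist(query, key), i) for i, key in enumerate(self.keys)
        )[:k]
        return [(d, self.values[i]) for d, i in scored]

# Toy usage with hypothetical two-dimensional latent keys.
store = EpisodicStore()
store.write([0.0, 0.0], "breakfast")
store.write([1.0, 1.0], "commute")
store.write([5.0, 5.0], "meeting")
hits = store.nearest([0.9, 1.2], k=2)
print([value for _, value in hits])
```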

3. Algorithmic Realizations and Retrieval Mechanisms

Episodic Storage and Retrieval

Episodes are encoded as compressed latent traces and stored in associative memory. Retrieval is performed using query vectors—derived from current percepts, temporal cues, or incomplete memory traces—matched to stored keys via cosine similarity or Euclidean distance. Successful variants utilize attention-weighted aggregation over the top-K matches, reconstructing the perceptual experience by generative decoding (D'Alessandro et al., 17 Oct 2025, Pickett et al., 2016).
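The retrieval step described above can be sketched as follows: score stored keys by cosine similarity, keep the top-K, and blend their values with softmax attention weights. The function names, the `temperature` parameter, and the toy data are assumptions for illustration. The generative decoder that would turn the blended latent back into a percept is not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def retrieve(query, keys, values, k=2, temperature=0.1):
    """Softmax-weighted average of the values attached to the top-k keys."""
    scored = sorted(
        ((cosine(query, key), i) for i, key in enumerate(keys)), reverse=True
    )[:k]
    top = max(s for s, _ in scored)  # subtract the max for numerical stability
    weights = [math.exp((s - top) / temperature) for s, _ in scored]
    total = sum(weights)
    blended = [0.0] * len(values[0])
    for w, (_, i) in zip(weights, scored):
        for d, v in enumerate(values[i]):
            blended[d] += (w / total) * v
    return blended, [i for _, i in scored]

# Toy memory: three keys, each paired with a one-hot stored latent.
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
blended, used = retrieve([1.0, 0.0], keys, values, k=2)
print(used, blended)
```

A lower `temperature` concentrates the weight on the single best match, while a higher one averages more evenly across the top-K traces.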

Semantic Modeling and Completion

Semantic modules are generative models, typically variational autoencoders (VAEs), β-VAEs, or VQ-VAEs, trained to optimize marginal or conditional likelihoods over inputs, learning disentangled (e.g., color, category) or compositional latent spaces (Nagy et al., 2018, D'Alessandro et al., 17 Oct 2025, Fayyaz et al., 2021). For object-centric integration, clustering or stick-breaking processes generate unordered slots for a variable number of objects (GENESIS-v2) (Engelcke et al., 2021).
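To make the stick-breaking idea concrete, the sketch below shows the generic construction: each slot takes a fraction of whatever "stick" remains, and the last slot receives the remainder, so the weights always sum to one. This is a scalar illustration only. It does not reproduce the clustering procedure of Engelcke et al., where such weights would be computed per pixel to form object masks, and the function name and input fractions are assumed for the example.

```python
def stick_breaking(fractions):
    """Map K-1 fractions in [0, 1] to K non-negative weights that sum to 1."""
    weights, remaining = [], 1.0
    for f in fractions:
        weights.append(remaining * f)   # this slot claims a share of what is left
        remaining *= (1.0 - f)          # the rest is passed on to later slots
    weights.append(remaining)           # final slot absorbs the remainder
    return weights

# Three fractions yield four slot weights.
w = stick_breaking([0.5, 0.5, 0.2])
print(w, sum(w))
```

Because the number of fractions can vary from input to input, the same construction accommodates scenes with different numbers of objects.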

Semantic completion at recall fills in incompletely retrieved or partially attended episodic traces by conditional sampling from the