Personal and System Memory
- Personal and System Memory is a domain that defines methods and architectures for encoding, storing, and retrieving information in both human and AI agents.
- It combines granular personal memory (episodic and semantic) with system memory (transient and persistent) to support dialogue continuity, planning, and reasoning.
- Advanced techniques like digital trace reconstruction, MemoryOS architectures, and multimodal data fusion drive improvements in precision, efficiency, and personalization.
Personal and System Memory encompasses the methodologies, architectures, and algorithms by which both humans and intelligent systems encode, store, retrieve, and utilize information from past experiences, external events, or interactions. This domain spans granular memory components in biological and computational agents, hybrid systems that fuse multimodal inputs with structured knowledge, and frameworks enabling both immediate context retention and long-term personal knowledge management.
1. Definitional Distinctions: Personal vs. System Memory
Personal memory refers to the retention and retrieval of information specific to an individual, embracing both episodic experiences (autobiographical events) and semantic information (facts, skills, traits, preferences). In AI systems, personal memory manifests as user-specific context—dialogue histories, individualized profiles, lifelog data, and learned preferences integrated via explicit mechanisms (dialogs, scripts, retrieval from databases) or implicit means (parametric encoding in neural weights).
System memory, by contrast, denotes the collective state, knowledge, and internal outputs stored and managed by an artificial system, often toward supporting global reasoning, planning, action execution, or historical record-keeping. It encompasses both transient structures (working memory, dialog context) and durable artifacts (intermediate results of chain-of-thought reasoning, system-level episodic logs, and procedural memories) (Wu et al., 22 Apr 2025).
The distinction is analytically actionable and foundational in taxonomies such as the 3D-8Q Memory framework, which classifies memory in AI systems across object (personal/system), form (parametric/non-parametric), and time (short/long-term) axes (Wu et al., 22 Apr 2025). This typology clarifies the heterogeneous roles, storage mechanisms, and retrieval requirements of memory within AI agents and human-centric systems.
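The three classification axes can be made concrete in a short sketch; the axis values follow the taxonomy's object/form/time split, while the class and variable names below are illustrative assumptions rather than the paper's notation:

```python
from dataclasses import dataclass
from enum import Enum

class Obj(Enum):
    PERSONAL = "personal"
    SYSTEM = "system"

class Form(Enum):
    PARAMETRIC = "parametric"
    NON_PARAMETRIC = "non-parametric"

class Span(Enum):
    SHORT_TERM = "short-term"
    LONG_TERM = "long-term"

@dataclass(frozen=True)
class MemoryQuadrant:
    """One cell of the 3D-8Q classification: object x form x time."""
    obj: Obj
    form: Form
    span: Span

# Enumerating all 2 x 2 x 2 combinations yields the eight quadrants.
quadrants = [MemoryQuadrant(o, f, s) for o in Obj for f in Form for s in Span]
assert len(quadrants) == 8

# Example: a user's dialogue history kept verbatim in a vector store is
# personal, non-parametric, long-term memory.
dialogue_history = MemoryQuadrant(Obj.PERSONAL, Form.NON_PARAMETRIC, Span.LONG_TERM)
```

Classifying each memory store of an agent into one of these quadrants makes its storage mechanism and retrieval requirements explicit.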
2. Computation and Modeling of Personal Memory
Explicit reconstruction of personal memory from digital traces utilizes methods from event segmentation, script theory, and probabilistic evidence aggregation. The approach described in (Kalokyri et al., 2020) leverages Personal Digital Traces (PDTs) harvested from emails, calendar entries, transactions, social media, and location signals to reconstruct coherent episodic narratives via predefined scripts. Each “script” models the prototypical flow of an everyday activity (e.g., “Eating_Out”), annotated along six contextual dimensions: who, what, where, when, why, how.
The central algorithm operates by instantiating candidate episodes from PDTs, merging candidates based on temporal/spatial proximity, and scoring episode likelihood using a product-of-complement rule to aggregate probabilistic evidence.
Episodes with multiple corroborating PDTs are ranked higher, permitting robust narrative reconstruction even in noisy or incomplete digital environments. The method demonstrates high precision (notably for “when”/“where” aspects) and augments human recall—users retrieved more events via system-integrated reconstruction than unaided memory.
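The product-of-complement aggregation step can be sketched as follows; the function name and confidence values are hypothetical, and the full pipeline also performs candidate merging and ranking:

```python
from math import prod

def episode_score(evidence_confidences):
    """Product-of-complement (noisy-OR) aggregation: an episode supported by
    independent evidence items with confidences p_i scores 1 - prod(1 - p_i),
    so each corroborating trace strictly raises the score."""
    return 1.0 - prod(1.0 - p for p in evidence_confidences)

# A calendar entry (confidence 0.6) plus a matching card transaction (0.7)
# corroborate an "Eating_Out" episode more strongly than either trace alone:
# 1 - (1 - 0.6) * (1 - 0.7) = 0.88.
single = episode_score([0.6])
combined = episode_score([0.6, 0.7])
assert combined > single
```

This is why multiply-corroborated episodes rank above singly-evidenced ones even when each individual trace is noisy.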
Reminiscence systems such as those built on the ACT-R cognitive architecture further personalize retrieval by modeling the activation and utility of memory chunks, dynamically updating parameters (activation, reward, utility) in direct response to users' affective states and verbal outputs (Sakai et al., 2022).
In this paradigm, the system continuously adapts stimuli and retrieval strategies not only to recall facts but to optimize for emotional well-being and personalized reminiscence therapy.
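The activation side of this can be illustrated with ACT-R's standard base-level learning equation, B_i = ln(Σ_j t_j^(-d)); the affect-driven reward and utility updates described above are omitted here, and the timestamps and decay value are illustrative:

```python
import math

def base_level_activation(retrieval_times, now, decay=0.5):
    """ACT-R base-level learning: B_i = ln(sum_j t_j^-d), where t_j is the
    time elapsed since the j-th past retrieval of chunk i and d is the decay
    rate (conventionally 0.5). Recent, frequent retrievals raise activation,
    making the chunk more likely to be recalled."""
    return math.log(sum((now - t) ** -decay for t in retrieval_times))

# A chunk rehearsed recently and often outranks a stale, once-seen one.
recent = base_level_activation([90.0, 95.0, 99.0], now=100.0)
stale = base_level_activation([10.0], now=100.0)
assert recent > stale
```

A reminiscence system can then bias stimulus selection toward chunks whose activation (and, in the full model, affect-weighted utility) is highest.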
3. Integration in Dialogue Systems, Assistant Frameworks, and Memory Operating Systems
AI dialogue systems and personal assistants require both immediate retention and long-term contextualization. Several frameworks operationalize this duality:
- Incorporation of personal memory into knowledge-grounded conversation leverages latent variable modeling and variational inference to couple user persona/history with external knowledge selection, closing the loop between memory selection and response generation via dual learning schemes (Fu et al., 2022).
- The MemoryOS architecture (Kang et al., 30 May 2025) implements a hierarchical storage system for AI agents: Short-Term Memory (STM; immediate dialogue context), Mid-Term Memory (MTM; topic-segmented boards), and Long-Term Personal Memory (LPM; evolving user traits and knowledge). Dynamic updating (FIFO and segment “heat scores”) governs migration across layers—segment inclusion and promotion utilize metrics such as cosine/keyword similarity and visitation/recency-weighted heat.
- PAC (Pluto and Charon) (Ouyang et al., 20 Aug 2024) enhances edge-device LLM fine-tuning with collaborative memory efficiency—Parallel Adapters allow model updates to be performed exclusively on lightweight adapter layers (decoupled from frozen backbone), enabled by an activation cache. This breaks the resource barrier for personalized LLM adaptation, achieving up to 8.64× speedup and 88.16% reduction in memory footprint compared to conventional PEFT techniques.
- Layered personal memory systems (e.g., SECOND ME (Wei et al., 11 Mar 2025)) organize user data across raw (L0), natural language (L1), and parametric (L2) layers. Memory is not merely stored but dynamically parameterized, allowing for latent, context-rich representations that adapt to evolving user requirements and support context-aware, proactive assistance across digital ecosystems.
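The MemoryOS-style layering described above can be sketched as a toy three-tier store; the heat formula, threshold, and class names here are illustrative assumptions, not the paper's exact design:

```python
import time
from collections import deque

class MemoryOSSketch:
    """Toy three-tier store in the spirit of MemoryOS: STM is a FIFO of
    recent turns; MTM segments accumulate a recency/visit-weighted "heat";
    segments that get hot enough are promoted toward long-term personal
    memory (LPM)."""

    def __init__(self, stm_capacity=8, heat_threshold=5.0, half_life=3600.0):
        self.stm = deque(maxlen=stm_capacity)   # FIFO short-term memory
        self.mtm = {}                           # topic -> segment dict
        self.lpm = {}                           # promoted long-term knowledge
        self.heat_threshold = heat_threshold
        self.half_life = half_life

    def add_turn(self, topic, turn, now=None):
        if now is None:
            now = time.time()
        self.stm.append(turn)                   # oldest turn evicted by maxlen
        seg = self.mtm.setdefault(topic, {"turns": [], "visits": 0, "last": now})
        seg["turns"].append(turn)
        seg["visits"] += 1
        seg["last"] = now
        if self._heat(seg, now) >= self.heat_threshold:
            self.lpm[topic] = list(seg["turns"])  # promote hot segment to LPM

    def _heat(self, seg, now):
        # Visit count discounted by recency: inactive segments cool off
        # exponentially with the configured half-life.
        return seg["visits"] * 0.5 ** ((now - seg["last"]) / self.half_life)
```

Repeatedly discussing one topic drives its segment's heat past the threshold, at which point its content migrates into the long-term layer; the real system additionally uses cosine/keyword similarity for segment assignment.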
4. Multimodal, Augmented, and Contextually Enriched Memory
Memory-centric approaches increasingly rely on multimodal augmentation and structured contextual enrichment. Smart assistants and lifelogging systems blend visual, textual, audio, and sensor data, supported by state-of-the-art models and relational abstractions:
- The OmniQuery system (Li et al., 12 Sep 2024) structures and augments raw captured memories (photos, videos, screenshots) using atomic (direct metadata/objects), composite (events inferred via temporal clustering and few-shot LLM prompting), and semantic (habit/pattern-level) context. The retrieval pipeline encompasses multi-source embedding similarity (vector cosine), sliding window composite event detection, and chain-of-thought prompted answer synthesis—yielding a 71.5% accuracy, outperforming standard RAG baselines for complex personal queries.
- Grounded memory systems for personal assistants (Ocker et al., 9 May 2025) fuse vision-LLMs for captioning/disambiguation, LLMs for entity normalization, and hybrid knowledge graph/vector embedding stores. The integration supports deep semantic retrieval plus graph expansion (e.g., PageRank) and graph queries for explicit relation extraction, ensuring robust memory traceability and relational reasoning.
- Privacy-preserving assistants such as e-ViTA (Chollet et al., 2 Jan 2024) balance locally retained conversation workspace vectors with sensor data fusion (on-device and cloud/edge), applying speaker diarization (speaker embedding clustering) to ensure recall and retrieval are tightly mapped to user identity and context.
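The embedding-similarity retrieval step common to these pipelines can be sketched as follows; the toy 3-dimensional vectors stand in for a real encoder's output, and the entry texts are invented:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, memory, top_k=2):
    """Rank captured memories (atomic, composite, or semantic entries alike)
    by cosine similarity of their embeddings to the query embedding."""
    ranked = sorted(memory, key=lambda m: cosine(query_vec, m["embedding"]),
                    reverse=True)
    return ranked[:top_k]

memory = [
    {"text": "dinner with Sam", "embedding": [0.9, 0.1, 0.0]},
    {"text": "flight receipt",  "embedding": [0.0, 0.2, 0.9]},
    {"text": "beach photo",     "embedding": [0.1, 0.9, 0.1]},
]
hits = retrieve([1.0, 0.0, 0.0], memory, top_k=1)
assert hits[0]["text"] == "dinner with Sam"
```

Systems like OmniQuery layer composite-event detection and chain-of-thought answer synthesis on top of this basic similarity ranking.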
5. Benchmarking, Evaluation, and Taxonomies
Evaluating memory effectiveness in AI agents necessitates controlled, scalable benchmarks and principled frameworks:
- MemSim (Zhang et al., 30 Sep 2024) offers a Bayesian Relation Network (BRNet) driving efficient simulation of user profiles, applying ancestral sampling to generate diverse, hierarchically structured data. Combined with a causal message and QA generation mechanism (enforced by hints), MemSim minimizes LLM hallucinations, supporting datasets such as MemDaily for evaluating various memory configurations (FuLLMem, ReceMem, RetrMem, etc.), with metrics grounded in effectiveness (accuracy) and efficiency (adaptation/response time).
- The PerLTQA dataset and framework (Du et al., 26 Feb 2024) explicitly separates semantic and episodic personal memory, applying BERT-based classification to select the relevant memory type, followed by retrieval and synthesis via LLMs. This aligns personal memory management with best practices in QA system performance and context-sensitive answer generation.
- The 3D-8Q taxonomy (Wu et al., 22 Apr 2025) describes memory systems along object, form, and time axes, resulting in eight quadrants categorizing personal/system and parametric/non-parametric/short/long-term memory. This typology underpins structured analysis and systematic evolution of AI memory architectures.
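Ancestral sampling over a relation network, the core generative step attributed to BRNet, can be illustrated with a toy two-node network: sample each attribute in topological order, conditioning children on already-sampled parents. The attribute names and probability tables below are invented for illustration, not taken from MemSim:

```python
import random

def sample_profile(rng):
    """Ancestral sampling over a tiny profile network: occupation is a root
    node; commute mode is a child whose distribution depends on it."""
    profile = {}
    profile["occupation"] = rng.choice(["student", "engineer"])
    if profile["occupation"] == "student":
        # Students mostly bike (toy conditional probability table).
        profile["commute"] = rng.choices(["bike", "car"], weights=[0.8, 0.2])[0]
    else:
        profile["commute"] = rng.choices(["bike", "car"], weights=[0.3, 0.7])[0]
    return profile

rng = random.Random(0)
profiles = [sample_profile(rng) for _ in range(1000)]

# Sampled children respect their parents' conditional distribution:
# the empirical bike rate among students should sit near 0.8.
students = [p for p in profiles if p["occupation"] == "student"]
bike_rate = sum(p["commute"] == "bike" for p in students) / len(students)
```

Scaling this pattern over many correlated attributes yields the diverse yet internally consistent user profiles that benchmarks like MemDaily are built from.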
6. Hardware and Architectural Advances in System Memory
Underlying all software and algorithmic approaches are advances and bottlenecks in physical memory architectures. Two recent perspectives have emerged:
- Memory-centric computing (Mutlu et al., 1 May 2025) advocates shifting computation directly into memory structures (Processing In Memory, PIM) to mitigate performance, energy, and reliability bottlenecks that conventional processor-centric paradigms face—especially as DRAM scaling hits physical limits (RowHammer, RowPress, VRD, data retention).
- System architectures leveraging compute-memory nodes (Liu et al., 28 Aug 2025) propose explicit, tightly coupled local memory (using advanced 2.5D/3D integration) accessible at micrometer scale, with critical data assigned by software to the lowest-latency, highest-bandwidth, lowest-energy tier. Hierarchical tiers (SRAM, HBM, DRAM) explicitly separate hot/cold data placement, reducing energy per bit and enabling scalable performance.
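The software-directed hot/cold placement described above can be sketched as a greedy assignment of the most-accessed data to the fastest tier; the tier capacities, data-item names, and access counts are illustrative:

```python
# Tiers ordered fastest/smallest to slowest/largest: (name, capacity in items).
TIERS = [("SRAM", 1), ("HBM", 2), ("DRAM", 100)]

def place(items):
    """items: {name: access_count}. Greedily fill tiers hottest-first, so
    the most frequently touched data lands in the lowest-latency,
    lowest-energy-per-bit tier."""
    placement = {}
    ranked = sorted(items, key=items.get, reverse=True)
    i = 0
    for tier, cap in TIERS:
        for name in ranked[i:i + cap]:
            placement[name] = tier
        i += cap
    return placement

accesses = {"kv_cache": 900, "weights": 500, "embeddings": 300,
            "logs": 5, "checkpoints": 1}
layout = place(accesses)
assert layout["kv_cache"] == "SRAM" and layout["checkpoints"] == "DRAM"
```

Real systems refine this with dynamic migration as access patterns shift, but the principle is the same: latency, bandwidth, and energy per bit all improve when hot data stays in the tier physically closest to compute.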
7. Open Problems, Future Directions, and Challenges
Open challenges in the field involve:
- Scalability and streaming: Developing agile, stream-oriented memory systems that fuse multimodal data continuously, rather than in batches (Wu et al., 22 Apr 2025).
- Automation: Enabling self-refining memory models that evolve via both personal and system-side updates across ever-changing contexts and user requirements.
- Privacy: Balancing detailed personal memory retention with both individual and collective privacy, especially as agents move toward cross-domain memory sharing.
- Hardware-software co-design: Realizing memory-centric computation demands coordination between OS/memory controllers, software frameworks, and physical memory chip design (Mutlu et al., 1 May 2025, Liu et al., 28 Aug 2025).
A plausible implication is that future intelligent agents and assistive systems will be defined by the sophistication, adaptability, and privacy-preserving capabilities of their memory architectures—integrating advances in algorithmic reasoning, multimodal fusion, and hardware design into seamless platforms for lifelong personal and system memory management.