Forgetful but Faithful: A Cognitive Memory Architecture and Benchmark for Privacy-Aware Generative Agents (2512.12856v1)
Abstract: As generative agents become increasingly sophisticated and deployed in long-term interactive scenarios, their memory management capabilities emerge as a critical bottleneck for both performance and privacy. Current approaches either maintain unlimited memory stores, leading to computational intractability and privacy concerns, or employ simplistic forgetting mechanisms that compromise agent coherence and functionality. This paper introduces the Memory-Aware Retention Schema (MaRS), a novel framework for human-centered memory management in generative agents, coupled with six theoretically-grounded forgetting policies that balance performance, privacy, and computational efficiency. We present the Forgetful but Faithful Agent (FiFA) benchmark, a comprehensive evaluation framework that assesses agent performance across narrative coherence, goal completion, social recall accuracy, privacy preservation, and cost efficiency. Through extensive experimentation involving 300 evaluation runs across multiple memory budgets and agent configurations, we demonstrate that our hybrid forgetting policy achieves superior performance (composite score: 0.911) while maintaining computational tractability and privacy guarantees. Our work establishes new benchmarks for memory-budgeted agent evaluation and provides practical guidelines for deploying generative agents in resource-constrained, privacy-sensitive environments. The theoretical foundations, implementation framework, and empirical results contribute to the emerging field of human-centered AI by addressing fundamental challenges in agent memory management that directly impact user trust, system scalability, and regulatory compliance.
Practical Applications
Immediate Applications
The following applications can be deployed now using the paper’s Memory‑Aware Retention Schema (MaRS), forgetting policies, privacy layer, and FiFA benchmark to improve coherence, cost, and privacy in generative agents.
- Privacy‑aware enterprise copilots and CRM assistants (software, sales, customer support)
- Use MaRS’s typed memory (episodic, semantic, social, task) with the Hybrid forgetting policy to retain customer preferences, social context, and active tasks under explicit token budgets; sensitivity scores and provenance records support right‑to‑be‑forgotten (RTBF) audits.
- Tools/products/workflows: “Memory Governance SDK” that wraps vector stores; retention dashboards showing why items were kept/evicted; FiFA‑based acceptance tests for vendor selection; per‑tenant policy templates.
- Assumptions/dependencies: Reliable PII/sensitivity detection (DLP classifiers), integration with existing RAG/vector DBs, policy tuning for budget sizes and sensitivity weights.
- Healthcare virtual scribes and patient portals (healthcare)
- Summarize episodic encounters into semantic facts (diagnoses, preferences) and accelerate decay of sensitive social/identity details; maintain audit trails for provenance and consent; optionally apply differential‑privacy‑aware (DP‑aware) randomization to borderline decisions near retention thresholds.
- Tools/products/workflows: EHR‑integrated “Forgetful Mode” with reflection‑summary pipelines; compliance reports demonstrating deletion/summary events; FiFA metrics (privacy leakage per turn) in QA.
- Assumptions/dependencies: HIPAA/GDPR alignment, clinical NER/ontology mapping for sensitivity scoring, clinician review loops for summaries.
- Contact center bots and wealth management advisors (finance)
- Apply LRU + priority‑decay for time‑sensitive tasks and durable semantic facts; enforce budgets to reduce latency/cost while improving social recall; produce auditable deletion logs for regulators.
- Tools/products/workflows: Retention policy library with “risk profiles” (low/medium/high sensitivity); monthly FiFA audits for leakage and cost efficiency; regulator‑ready provenance exports.
- Assumptions/dependencies: Legal record‑retention rules per jurisdiction; robust provenance capture across tools; mapping of “task urgency” to business SLAs.
- Educational tutors and learning platforms (education)
- Track student goals and progress in task memory; compress episodic dialogues into semantic summaries of skills while pruning stale or sensitive social details; maintain coherence across sessions within budgets.
- Tools/products/workflows: Student memory profiles with adjustable retention sliders; teacher‑visible audit trails explaining forgetting decisions; FiFA coherence and goal‑completion checks in release criteria.
- Assumptions/dependencies: Age‑appropriate privacy policies (COPPA/FERPA), calibrated importance scoring for curriculum goals, high‑quality summarization models.
- Software engineering copilots and issue triage assistants (software/devtools)
- Use task memory for issue dependencies, deadlines, and code artifacts; reflection‑summary to compress long threads; privacy‑weighted eviction for secrets/credentials seen in logs.
- Tools/products/workflows: CI/CD “memory budget” checks; secrets detectors feeding sensitivity scores; FiFA cost metrics to cap token spend per PR/incident.
- Assumptions/dependencies: Secret scanners, repository provenance capture, embedding indices for code/context similarity.
- Personal assistants with user‑controlled forgetting (daily life, consumer software)
- Expose memory types and a “Forget” button per item; show why something is kept (importance/recency/task urgency) and when it will decay; enable DP‑aware randomized retention at user‑selected privacy levels.
- Tools/products/workflows: Memory timeline UI; retention score explanations (recency, activation, sensitivity); downloadable audit history.
- Assumptions/dependencies: Simple sensitivity scoring UX, accessible provenance (email/calendar/tool sources), reliable summarization to preserve utility after deletion.
- Data protection and compliance audits for deployed agents (policy, governance)
- Use FiFA’s leakage and coherence metrics to validate agent behavior under explicit budgets; demonstrate RTBF via MaRS deletion/summary logs; adopt sensitivity‑weighted retention as “data minimization by design.”
- Tools/products/workflows: Internal audit harness running FiFA scenarios; policy parameter catalogs (λ_privacy, age decay, minimum task floors); regulatory reporting of memory events.
- Assumptions/dependencies: Agreement on leakage rubrics, storage of audit logs separate from user content, role‑based access to memory governance tools.
- Procurement and vendor evaluation under memory constraints (industry operations)
- Benchmark competing agent platforms using FiFA across budgets; select Hybrid/reflective policies where composite scores are highest; require provenance‑aware memory APIs in RFPs.
- Tools/products/workflows: Standardized FiFA test suites; scorecards with narrative coherence, social recall, privacy leakage, and token costs; contract clauses on retention governance.
- Assumptions/dependencies: Access to configurable memory policies and audit streams from vendors; reproducible simulation environments.
- Multi‑tenant SaaS with region‑specific retention rules (software/cloud)
- Partition budgets and policies per tenant/region; enforce accelerated decay and summary‑only retention for sensitive social nodes; provide auditable, per‑region deletion proofs.
- Tools/products/workflows: Policy engine with jurisdictional templates; per‑tenant memory partitions; FiFA‑inspired privacy SLA metrics.
- Assumptions/dependencies: Geofenced storage/control planes, tenant isolation guarantees, legal consultation for retention horizons.
- Robotics and smart home agents with privacy‑aware memory (robotics, consumer IoT)
- Typed memory separates episodic sensor traces from semantic home facts; reflection‑summary compresses routine observations; sensitivity weights speed forgetting of personal/social details.
- Tools/products/workflows: On‑device memory budgeting; “private zones” configuration for accelerated decay; audit logs for household admins.
- Assumptions/dependencies: Edge inference constraints, local embeddings and indices, user consent management.
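For the finance entry's "LRU + priority‑decay" pairing, one plausible reading is a cache whose entries carry a priority that decays with time since last access, evicting the entry with the lowest decayed priority and breaking ties in least‑recently‑used order. The following is a minimal sketch of that reading only; the class name `PriorityDecayLRU`, the geometric decay, and the logical clock are assumptions, not the paper's definition.

```python
from collections import OrderedDict

class PriorityDecayLRU:
    """Hypothetical LRU + priority-decay store with a token budget.
    Each entry's priority decays geometrically with time since last access;
    when over budget, the entry with the lowest decayed priority is evicted,
    with ties broken in least-recently-used order."""

    def __init__(self, budget, decay=0.9):
        self.budget = budget
        self.decay = decay
        self.clock = 0
        self.entries = OrderedDict()  # key -> (tokens, base_priority, last_access)

    def _effective(self, key):
        _, priority, last = self.entries[key]
        return priority * self.decay ** (self.clock - last)

    def touch(self, key):
        """Record an access: refresh the timestamp and mark as most recently used."""
        self.clock += 1
        tokens, priority, _ = self.entries[key]
        self.entries[key] = (tokens, priority, self.clock)
        self.entries.move_to_end(key)

    def put(self, key, tokens, priority):
        """Insert an entry, then evict until the budget holds; return evicted keys."""
        self.clock += 1
        self.entries[key] = (tokens, priority, self.clock)
        self.entries.move_to_end(key)
        evicted = []
        while sum(t for t, _, _ in self.entries.values()) > self.budget:
            # min() returns the first minimum in iteration order, i.e. the least
            # recently used entry among equal decayed priorities.
            victim = min(self.entries, key=self._effective)
            del self.entries[victim]
            evicted.append(victim)
        return evicted

cache = PriorityDecayLRU(budget=100, decay=0.5)
cache.put("kyc_fact", 40, priority=1.0)
cache.put("rate_quote", 30, priority=0.6)
cache.put("smalltalk", 30, priority=0.2)
cache.touch("kyc_fact")
evicted = cache.put("deadline", 20, priority=0.9)
print(evicted)  # ['smalltalk']
```

Here `smalltalk` is not the least recently used entry (`rate_quote` is older); it is evicted because its decayed priority is lowest, which is what distinguishes this hybrid from plain LRU.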
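The healthcare and personal‑assistant entries mention DP‑aware randomization of retention decisions near thresholds. As an illustration, the sketch below swaps in classic randomized response for the binary keep/forget bit of items whose score falls within a hypothetical `margin` of the threshold; it is not claimed to be the paper's mechanism. Reporting the true bit with probability e^ε/(1+e^ε) is ε‑differentially private for that single bit, but items outside the band are decided deterministically, so this sketch carries no end‑to‑end privacy guarantee.

```python
import math
import random

def keep_probability(epsilon):
    """Probability that randomized response reports the true keep/forget bit."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def dp_retain(score, threshold, epsilon, margin=0.1, rng=random):
    """Hypothetical DP-aware retention decision.

    Items whose score is more than `margin` away from `threshold` are decided
    deterministically. Inside the band, the true decision (keep iff
    score >= threshold) is reported with probability e^eps / (1 + e^eps) and
    flipped otherwise, which is classic randomized response on that bit.
    """
    keep = score >= threshold
    if abs(score - threshold) > margin:
        return keep
    return keep if rng.random() < keep_probability(epsilon) else not keep

# A borderline item is kept in roughly 73% of runs at epsilon = 1.
rng = random.Random(7)
decisions = [dp_retain(0.52, 0.5, epsilon=1.0, rng=rng) for _ in range(1000)]
```

A smaller ε pushes the keep probability toward 0.5 (more privacy, noisier retention); a user‑selected privacy level could map directly onto ε.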
Long‑Term Applications
The following applications require additional research, scaling, standardization, or deeper integration before broad deployment.
- Certified memory governance standards for agents (policy, standards)
- Formalize MaRS‑like schemas and FiFA‑style metrics into industry standards (ISO/IEEE) for “forgetting‑by‑design,” with audited provenance, sensitivity scoring, and DP‑aware retention.
- Tools/products/workflows: Certification programs; conformance test suites; public scorecards.
- Assumptions/dependencies: Multi‑stakeholder consensus, regulator buy‑in, interoperable audit formats.
- Runtime differential privacy frameworks for streaming memory (software, privacy tech)
- Mature DP at the policy boundary (exponential mechanism, composition accounting across sessions) with utility bounds tuned to agent tasks; automated privacy budget management.
- Tools/products/workflows: DP orchestration libraries for memory operations; policy simulators for privacy‑utility trade‑offs.
- Assumptions/dependencies: Strong theoretical guarantees under non‑i.i.d. interaction streams, scalable composition tracking, user‑level privacy preference models.
- Adaptive policy selection via online learning (software, research)
- Layer contextual bandits/meta‑learning over forgetting policies to adapt λ_privacy, decay rates, and summary thresholds per user/task, with regret guarantees.
- Tools/products/workflows: Policy learners observing FiFA‑like feedback; auto‑tuning pipelines that adjust memory budgets.
- Assumptions/dependencies: Stable reward proxies (coherence, goal completion), safe exploration constraints, guardrails against privacy regressions.
- Integrated machine unlearning + memory‑layer forgetting (ML, governance)
- Combine parameter‑level unlearning for training data with MaRS‑level runtime deletion/summarization for interaction logs to deliver end‑to‑end RTBF.
- Tools/products/workflows: Cross‑layer deletion orchestrators; proofs of influence removal; regulator‑ready evidence bundles.
- Assumptions/dependencies: Efficient certified unlearning at scale, robust provenance linking training and runtime artifacts.
- Sector‑specific governance: healthcare and finance deployments (healthcare, finance)
- Hospital‑ and bank‑grade agents with strict retention governance, cryptographic deletion proofs, and domain‑specific sensitivity ontologies.
- Tools/products/workflows: Domain ontologies for sensitivity/importance; red‑team leakage exercises; regulator APIs.
- Assumptions/dependencies: Institutional change management, integration with legacy record‑retention systems, third‑party audits.
- Memory hardware/edge acceleration for governed retention (hardware, robotics, IoT)
- Specialized indices and compression on edge devices to support MaRS policies within tight latency and energy budgets; on‑device DP noise generation.
- Tools/products/workflows: Edge memory controllers; hardware‑assisted provenance tagging; secure wipe primitives.
- Assumptions/dependencies: Hardware ecosystem support, standardized interfaces, verifiable deletion.
- Multi‑agent memory ecosystems with governed sharing (software, collaboration tools)
- Shared semantic stores where agents contribute summaries under budgets; governed cross‑agent retrieval respecting provenance and sensitivity constraints.
- Tools/products/workflows: Memory‑sharing protocols; tenant‑aware access control; inter‑agent audit trails.
- Assumptions/dependencies: Access governance models, conflict resolution for provenance, scalable central indices.
- Public‑sector records and FOIA‑aware agents (government, policy)
- Agents that separate public records from private interactions, with explainable forgetting and exportable provenance to meet FOIA/records laws.
- Tools/products/workflows: Records management integrations; “explain‑my‑memory” reports for citizens; policy sandboxes.
- Assumptions/dependencies: Statutory mapping to memory types, clear retention mandates, citizen privacy safeguards.
- Safety‑critical robots and vehicles with intentional forgetting (robotics, automotive)
- Policies that ensure only safety‑relevant traces persist; episodic sensor data summarized to semantic maps; auditable deletion to reduce surveillance risks.
- Tools/products/workflows: Safety case templates incorporating memory governance; incident investigation pipelines with bounded retention.
- Assumptions/dependencies: Regulatory acceptance, rigorous distortion bounds for summaries, integration with functional safety standards (ISO 26262).
- Marketplaces and insurance for privacy‑risk in agents (finance, cybersecurity)
- Products pricing “privacy leakage risk” using FiFA‑like metrics; warranties for memory governance; incident response tied to audit trails.
- Tools/products/workflows: Risk scoring APIs; governance SLAs; insurance underwriting models.
- Assumptions/dependencies: Actuarial data on leakage incidents, standardized metrics, legal frameworks.
- Lifelong learning assistants with stable identity and discretion (consumer, education)
- Agents that accompany users for years, retaining durable semantic knowledge while intentionally forgetting sensitive or obsolete details, with user‑controlled policies.
- Tools/products/workflows: Long‑horizon memory planners; personal retention profiles; migration tools across devices/providers.
- Assumptions/dependencies: Portability of provenance and audits, sustainable storage models, user trust and control paradigms.
- Benchmarking culture shift: memory competence leaderboards (academia, industry)
- Widespread adoption of FiFA‑like suites that report coherence, social recall, privacy leakage, and cost; policy‑aware leaderboards complementing task accuracy.
- Tools/products/workflows: Open datasets/simulators for long‑horizon interactions; standardized judging rubrics; community challenges.
- Assumptions/dependencies: Agreement on metrics and protocols, reproducibility infrastructure, shared baselines.
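The runtime‑DP entry names the exponential mechanism and composition accounting across sessions. A minimal sketch of both follows: the mechanism samples a candidate (here, a hypothetical choice among memory representations) with probability proportional to exp(ε·u/(2Δu)), and the accountant uses basic sequential composition, under which per‑decision ε values simply add. Tighter accountants (advanced composition, Rényi DP) exist; the utility function, its sensitivity Δu = 1, and the names `exponential_mechanism` and `BasicCompositionAccountant` are assumptions for illustration.

```python
import math
import random

def exponential_mechanism(candidates, utility, epsilon, sensitivity=1.0, rng=random):
    """Sample one candidate with probability proportional to
    exp(epsilon * utility(c) / (2 * sensitivity))."""
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    top = max(scores)  # subtracting the max avoids overflow in exp()
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]

class BasicCompositionAccountant:
    """Tracks cumulative epsilon across sessions under basic sequential
    composition (per-decision epsilons add) and refuses to exceed the total."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

# Hypothetical per-session decision: which representation of a conversation to keep.
accountant = BasicCompositionAccountant(total_epsilon=1.0)
options = ["full transcript", "topic summary", "redacted summary"]
relevance = {"full transcript": 0.9, "topic summary": 0.7, "redacted summary": 0.5}
accountant.charge(0.5)  # charge before releasing the decision
choice = exponential_mechanism(options, relevance.get, epsilon=0.5)
```

Charging the accountant before each release means a session that would overspend the budget fails closed rather than silently degrading the guarantee.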
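For adaptive policy selection, the simplest form of the contextual‑bandit idea keeps one bandit per discrete context and treats each forgetting policy as an arm. The sketch below uses UCB1, which has known logarithmic regret bounds for bounded stochastic rewards; the class name `PerContextUCB1`, the context keys, the policy labels, and the reward values are hypothetical, and a real deployment would feed in a FiFA‑like composite score in [0, 1] as the reward, subject to the exploration constraints and privacy guardrails listed above.

```python
import math
from collections import defaultdict

class PerContextUCB1:
    """Hypothetical policy selector: one UCB1 bandit per discrete context
    (e.g. a user segment or task type), choosing among forgetting policies
    from a scalar reward in [0, 1]."""

    def __init__(self, policies, c=2.0):
        self.policies = list(policies)
        self.c = c  # c = 2 gives the classic UCB1 bonus sqrt(2 ln t / n)
        self.counts = defaultdict(lambda: defaultdict(int))
        self.means = defaultdict(lambda: defaultdict(float))

    def select(self, context):
        counts, means = self.counts[context], self.means[context]
        for p in self.policies:  # try every policy once per context first
            if counts[p] == 0:
                return p
        total = sum(counts[p] for p in self.policies)
        return max(self.policies,
                   key=lambda p: means[p] + math.sqrt(self.c * math.log(total) / counts[p]))

    def update(self, context, policy, reward):
        """Incrementally update the running mean reward for (context, policy)."""
        n = self.counts[context][policy] + 1
        self.counts[context][policy] = n
        self.means[context][policy] += (reward - self.means[context][policy]) / n

# Illustrative deterministic rewards, not results from the paper.
bandit = PerContextUCB1(["lru", "hybrid", "fifo"])
true_reward = {"lru": 0.6, "hybrid": 0.9, "fifo": 0.5}
for _ in range(2000):
    p = bandit.select("support_chat")
    bandit.update("support_chat", p, true_reward[p])
```

Keeping a separate table per context is easy to audit but shares nothing across contexts; feature‑based methods such as LinUCB or Thompson sampling with a linear model would generalize across users at the cost of a harder‑to‑explain selection rule.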