The Right to Be Remembered: Preserving Maximally Truthful Digital Memory in the Age of AI (2510.16206v2)
Abstract: Since the rapid expansion of LLMs, people have begun to rely on them for information retrieval. While traditional search engines display ranked lists of sources shaped by search engine optimization (SEO), advertising, and personalization, LLMs typically provide a synthesized response that feels singular and authoritative. While both approaches carry risks of bias and omission, LLMs may amplify the effect by collapsing multiple perspectives into one answer, reducing users' ability or inclination to compare alternatives. This concentrates power over information in a few LLM vendors whose systems effectively shape what is remembered and what is overlooked. As a result, certain narratives, individuals, or groups may be disproportionately suppressed, while others are disproportionately elevated. Over time, this creates a new threat: the gradual erasure of those with limited digital presence, and the amplification of those already prominent, reshaping collective memory. To address these concerns, this paper presents the concept of the Right To Be Remembered (RTBR), which encompasses minimizing the risk of AI-driven information omission, embracing the right to fair treatment, and ensuring that generated content is maximally truthful.
Explain it Like I'm 14
Overview
This paper introduces a new idea called the Right To Be Remembered (RTBR). It argues that as artificial intelligence (especially large language models, or LLMs) becomes the main way people find information, we need to make sure our digital memories are kept truthful, complete, and fair. The authors worry that AI systems can accidentally leave out important voices and facts—especially from people or places with smaller online footprints—so they propose RTBR to help protect what humanity knows and remembers.
Key questions the paper asks
To make this clear, here are the main questions the authors explore:
- How do AI systems (like chatbots) shape what we remember and forget online?
- Why do some people, communities, and scientific work disappear from digital memory more than others?
- What does “maximally truthful” AI look like, and how can we build it?
- How should the “Right to Be Remembered” fit with the “Right to Be Forgotten” (the legal right to erase personal data)?
- What design and policy changes are needed so AI keeps memory honest, inclusive, and useful?
Methods and approach (explained simply)
This isn’t a lab experiment paper—it’s a careful, big-picture review and proposal. The authors:
- Explain how LLMs work: An LLM is like a super-fast writer trained by reading huge amounts of text. It predicts the next word in a sentence. Most are built using a “transformer,” a kind of model that pays attention to many parts of a text at once (like a reader who can remember both the start and end of a book while reading the middle).
- Describe Retrieval-Augmented Generation (RAG): This is like asking a librarian to fetch sources while the AI writes its answer, so it can quote evidence instead of guessing (a minimal retrieve-then-generate sketch appears after this list).
- Point out real-world problems:
- Link rot: Web pages disappear or move, so sources vanish over time (like books going missing from a library).
- Bias in training data: If the internet has fewer works from some countries, languages, or groups, the AI will know less about them.
- Vendor choices: Companies decide what data to include, how to filter answers, and whether to show multiple viewpoints or just one. These choices shape what gets remembered.
- Summarize research about “truth signals” inside models:
- “Truth direction”: Think of it as a kind of internal truth compass—some studies suggest models have activation patterns that tend to point toward correct answers.
- “Local intrinsic dimension”: True answers seem to be “simpler shapes” inside the model’s brain, while made-up answers look more tangled.
- Compare RTBR with the “Right to Erasure” (GDPR’s “Right to Be Forgotten”): Erasing data from an AI is very hard because knowledge is blended into the model (like trying to remove one drop of dye from a swimming pool without changing the water). They discuss “machine unlearning,” which tries to remove specific information but can damage the model’s overall usefulness.
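To make the RAG idea above concrete, here is a minimal, self-contained sketch of the retrieve-then-generate loop. Everything in it is illustrative: the toy corpus, the bag-of-words "embedding", and the stubbed generator stand in for a real embedding model and LLM call, and none of it comes from the paper itself.

```python
# Minimal retrieval-augmented generation (RAG) loop: retrieve evidence first,
# then hand it to the generator so the answer can cite sources instead of guessing.
# The embedding and generation steps are stubs; a real system would call an
# embedding model and an LLM here.

from collections import Counter
import math

CORPUS = {
    "doc-1": "Link rot causes a substantial portion of cited web pages to disappear within a few years.",
    "doc-2": "Retrieval-augmented generation grounds model answers in retrieved evidence.",
    "doc-3": "Transformers use self-attention to capture short- and long-range dependencies.",
}

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2):
    q = embed(query)
    ranked = sorted(CORPUS.items(), key=lambda kv: cosine(q, embed(kv[1])), reverse=True)
    return ranked[:k]

def generate_with_citations(query: str) -> str:
    evidence = retrieve(query)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in evidence)
    # A real system would send `context` plus `query` to an LLM; here we just
    # return the grounding prompt to show what the generator would see.
    return f"Question: {query}\nEvidence:\n{context}\nAnswer (cite [doc-id]):"

if __name__ == "__main__":
    print(generate_with_citations("Why do cited web pages vanish over time?"))
```

A production pipeline would swap in dense embeddings, a vector index, and an actual model call, but the shape of the loop is the same: retrieve evidence first, then generate an answer grounded in it.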
Main findings and why they matter
Here are the paper’s main points, explained in everyday language:
- AI answers feel authoritative but can hide what’s missing. When a chatbot gives a single, smooth answer, users may not realize some viewpoints or facts were left out. This makes popular narratives stronger and weakens less-visible ones.
- Digital memory is fragile. Over time, links break, pages get taken down, older formats become unreadable, and content gets de-ranked. If AI relies on this unstable web, important parts of knowledge can fade away.
- Visibility is unequal. Work from certain regions, languages, or communities is underrepresented online. That means AI systems are more likely to overlook it, further reducing its presence in the future.
- Vendors have power over memory. Companies choose training data, moderation rules, and interface designs (do you see one answer or several?). These decisions quietly decide who gets remembered.
- Truthfulness needs more than facts. The authors say “maximally truthful” AI should include:
- Accuracy: Being factually correct.
- Honesty: Saying what the model “really believes” based on its training, not just pleasing the prompt.
- Provenance: Showing where information comes from (citations and credit).
- Uncertainty: Saying “I don’t know” when evidence is weak.
- RTBR vs Right to Erasure: For foundational AI systems (big general-purpose models), the authors argue society’s need to preserve a complete and accurate record should usually outweigh individual requests to erase truthful information—especially for historical and scientific knowledge, and particularly after a person’s death.
- Practical design ideas:
- Layered provenance: Give quick answers up front, but let users open a deeper trail of sources and credits underneath (like expanding footnotes); a sketch of one possible layered answer structure appears after this list.
- Multiple perspectives and calibrated confidence: Show uncertainty and different viewpoints when the topic is complex or debated.
- Retrieval and preservation: Strengthen tools that fetch and protect sources, including work in less dominant languages and older formats.
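As a rough illustration of the "layered provenance" and "calibrated confidence" ideas above, the snippet below shows one possible answer structure: a short summary up front, with perspectives, sources, a confidence estimate, and stated unknowns underneath. The field names and values are invented for illustration; they are not a standard proposed by the paper.

```python
# One way a "layered" answer could be represented: a short answer up front, with
# provenance, alternative perspectives, and an uncertainty estimate available
# underneath. All field names and values here are illustrative placeholders.

import json

answer = {
    "summary": "Most cited web pages stay reachable for a few years, but a substantial share decay.",
    "confidence": 0.62,  # calibrated probability, surfaced to the user
    "perspectives": [
        {"viewpoint": "archival studies", "claim": "link rot is pervasive and accelerating"},
        {"viewpoint": "platform operators", "claim": "redirects and mirrors recover much lost content"},
    ],
    "provenance": [
        {"source_id": "doi:10.0000/example", "snippet": "(placeholder excerpt)", "retrieved": "2024-05-01"},
    ],
    "unknowns": ["decay rates for non-English and paywalled sources are poorly measured"],
}

print(json.dumps(answer, indent=2, ensure_ascii=False))
```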
What this could mean for the future
If we adopt RTBR as a guiding principle:
- AI systems would aim to protect our shared memory, not just deliver convenient answers. That means actively preserving diverse voices, crediting contributors, and signaling uncertainty.
- Designers and policymakers would treat digital memory as a public good—like clean water or public libraries—something we must maintain for everyone.
- Laws (like GDPR) and AI standards might evolve to balance privacy with the need to keep history and knowledge complete and accessible.
- Future generations would inherit a richer, more truthful record of human experience, helping science, culture, and community stories continue and grow.
In short, the paper argues that remembering well—fairly, fully, and truthfully—is essential for AI to help humanity learn, correct mistakes, and make better decisions. The Right To Be Remembered is a call to build AI that not only answers questions today but also protects the foundations of knowledge for tomorrow.
Knowledge Gaps
Unresolved gaps and open questions
Below is a concise, actionable list of knowledge gaps, limitations, and open questions the paper leaves unresolved.
Conceptual and definitional gaps
- Precise operational definition of “Right to Be Remembered (RTBR)” and its scope (who/what is covered, thresholds for inclusion, duration, and mechanisms of enforcement).
- Formalization of “maximal truthfulness” (e.g., a measurable objective function, metrics, and benchmarks that combine accuracy, honesty, provenance, inclusivity, and uncertainty).
- Clear criteria for resolving conflicts between RTBR, safety policies, and the suppression of harmful, illegal, or defamatory content.
- Framework to distinguish remembrance of contributions from preservation of misinformation, propaganda, or manipulated content.
Empirical evidence and measurement
- Longitudinal, quantitative evidence that LLMs cause or accelerate erasure compared to search engines (user behavior, answer diversity, citation persistence, narrative coverage).
- Metrics to quantify “visibility inequality” in AI outputs (e.g., a silencing index across languages, geographies, institutions, and communities).
- Robust methods to measure the impact of link rot on LLM/RAG answers over time and to evaluate mitigation strategies.
- Controlled studies on whether single-answer interfaces reduce users’ comparison of alternatives and increase omission bias.
- Cross-lingual and cross-cultural evaluations of remembrance (coverage and fidelity for non-English sources and marginalized communities).
- Empirical tests that internal “truth directions” and local intrinsic dimension (LID) signals generalize across models, domains, and adversarial prompts.
Technical architecture and implementation
- A concrete system design for layered provenance in LLMs that preserves fast answers while carrying forward scholarly citation chains and credit.
- Methods to embed machine-readable epistemic provenance (researchers, labs, communities) into model outputs beyond content authenticity (e.g., C2PA-like for text).
- Retrieval/indexing strategies that systematically surface underrepresented, non-digitized, or poorly indexed sources (including multilingual and “grey literature”).
- Standardized metadata, ontologies, and APIs to make archives, institutional repositories, and community memory banks machine-actionable for RAG.
- Algorithms for calibrated abstention that reliably detect “unknowns” without disproportionately suppressing minority or rare perspectives.
- Integration of internal truthfulness signals (truth direction, LID) into decoding and ranking pipelines for real-time hallucination suppression.
- Methods to track and preserve “citation chains” during synthesis (e.g., lineage graphs maintained through prompt, retrieval, and generation).
- Tooling to version model outputs and their supporting evidence over time for auditability and historical reconstruction.
Governance, policy, and legal reconciliation
- Concrete proposals to reconcile RTBR with GDPR Article 17 (Right to Erasure) beyond high-level prioritization (e.g., scoped exceptions, balancing tests, adjudication processes).
- A due-process mechanism for contested remembrance requests (who decides inclusion/exclusion, appeal pathways, transparency obligations).
- Post-mortem privacy and consent protocols (family/estate rights, cultural norms, jurisdictional differences) to guide ingestion of deceased individuals’ digital legacies.
- Licensing and copyright guidance for integrating archival and proprietary materials (including consent models for non-public or community-held data).
- Vendor accountability frameworks and audit standards to ensure RTBR-compliant training, retrieval, and moderation (including third-party oversight).
- International harmonization challenges (divergent legal regimes, data sovereignty, and cross-border data flows).
Equity, ethics, and community participation
- Mechanisms to prevent RTBR from platforming harmful actors or strategic “visibility gaming” while still preserving legitimate minority narratives.
- Community co-governance models (Global South, indigenous, disability, and other groups) to set inclusion rules, correct misrepresentation, and steward their archives.
- Ethical guidelines for balancing remembrance with contextual integrity (e.g., sensitive histories, stigmatizing data, and shifting norms over time).
- Procedures for soliciting and maintaining corrections, retractions, and counter-narratives to avoid freezing historical errors.
- Safeguards against adversarial flooding, censorship, and information operations that exploit RTBR to distort collective memory.
Operational feasibility and sustainability
- Cost, compute, and storage assessments for large-scale remembrance infrastructure (archives, multilingual ingestion, ongoing provenance maintenance).
- Environmental impact analysis (carbon footprint of persistent archiving, continuous re-indexing, and retraining for remembrance-compliant models).
- Incentive design for vendors to adopt remembrance-friendly architectures (business models, regulatory incentives, public procurement standards).
- Maintenance and decay handling (backups, mirroring, format migration, and resilience against domain lapses) to counter long-term link rot.
Machine unlearning and model maintenance
- Technical pathways to accommodate legally mandated erasures without degrading model integrity (granular unlearning, modular knowledge compartments).
- Trade-off analysis between RTBR and unlearning accuracy (utility loss, bias shifts, and emergent gaps) and criteria for proportionality.
- Strategies to prevent catastrophic forgetting of marginalized content during routine fine-tuning and model updates.
User experience and behavior
- Interface designs that present multiple perspectives and uncertainty signals without overwhelming users or harming utility.
- Experiments to determine how provenance depth, credit visibility, and uncertainty cues affect trust, learning, and knowledge retention.
- Personalization safeguards to ensure RTBR does not devolve into echo chambers or reification of a user’s prior exposures.
Security and integrity of digital memory
- Verification pipelines for authenticity of archived text and multimedia (e.g., extensions of C2PA/EKILA to scholarly and social content).
- Robustness to deepfakes, synthetic text laundering, and metadata tampering in provenance chains.
- Detection and mitigation of coordinated campaigns that aim to manipulate “what is remembered” for political or commercial gain.
Practical Applications
Immediate Applications
These applications can be deployed today using existing methods, standards, and workflows that the paper synthesizes or recommends.
- Bold, layered provenance in AI outputs
- Sectors: software, education, publishing, journalism
- What: Ship LLM UX that defaults to concise answers with a one-click “Show Sources/Attribution” panel listing citations, contributor IDs (e.g., ORCID), and evidence snippets
- Tools/Products/Workflows: C2PA-style metadata embeddings; JSON-LD provenance; DOI/ORCID linking; EKILA-like attribution concepts adapted to text; expandable citations and “citation chain explorer”
- Assumptions/Dependencies: Vendor UX adoption; reliable citation resolution (Crossref/DataCite APIs); legal review for citation display
- Diversity-aware answer mode (multi-perspective synthesis)
- Sectors: search, education, media platforms
- What: Add a toggle to present multiple perspectives, including minority/underrepresented sources, rather than a single authoritative answer—especially for contested topics
- Tools/Products/Workflows: Diversity-aware IR re-ranking (a toy re-ranking sketch appears after this list); fairness constraints in retrieval; source clustering by viewpoint; user-controllable “diversity slider”
- Assumptions/Dependencies: Editorial policy; acceptance of trade-offs between brevity and pluralism; labeled or inferred diversity signals
- Calibrated “I don’t know” and abstention in production assistants
- Sectors: healthcare, legal, finance, enterprise support
- What: Enable selective answering with uncertainty thresholds so models abstain when evidence is insufficient and suggest next steps (e.g., consult a human, retrieve primary sources)
- Tools/Products/Workflows: Confidence calibration; selective prediction; “know-what-you-know” probes; guardrails and human-in-the-loop escalation (an abstention sketch appears after this list)
- Assumptions/Dependencies: Regulatory tolerance for abstention; business KPIs that value safety over coverage; monitoring to prevent over-abstention
- Link rot mitigation in retrieval and training pipelines
- Sectors: LLM vendors, publishers, libraries, archives
- What: Snapshot cited pages at ingestion; replace dead links with archival mirrors; maintain durable perma-links
- Tools/Products/Workflows: Internet Archive/Memento APIs; Perma.cc; CI link checkers; canonicalization policies (a snapshot-lookup sketch appears after this list)
- Assumptions/Dependencies: License compatibility; archive availability; storage budgets
- RTBR-aware RAG: long-tail and non-English coverage by design
- Sectors: software, education, scientific tools
- What: Expand retrieval indices to include non-English, older, and under-digitized materials; weight recall of long-tail sources; expose language/source diversity in results
- Tools/Products/Workflows: Multilingual embedding models; dedicated corpora from institutional repositories; language-aware rerankers
- Assumptions/Dependencies: Access to multilingual/legacy collections; OCR and normalization quality; compute overhead
- Attribution-preserving content creation workflows
- Sectors: publishing, newsrooms, marketing, academia
- What: Require AI-assisted content to embed traceable provenance and credit; maintain scholarly-style references in web articles and PDFs
- Tools/Products/Workflows: CMS plugins for C2PA/JSON-LD; template policies; automatic bib generation from DOIs; provenance lints in CI
- Assumptions/Dependencies: Editorial buy-in; reader UX considerations; training for staff
- Bias and “memory equity” dashboards for model/data governance
- Sectors: AI governance, MLOps, compliance
- What: Track representation coverage by language, geography, and institution; monitor link rot rates; flag topic areas with sparse evidence
- Tools/Products/Workflows: Data profiling; sampling coverage analytics; evaluation sets across cultures; periodic audits
- Assumptions/Dependencies: Access to data lineage; agreement on equity metrics; privacy-preserving reporting
- Internal truthfulness monitors to flag likely hallucinations
- Sectors: enterprise platforms, safety teams
- What: Use model-internal signals (e.g., “truth direction,” local intrinsic dimension) to flag outputs for extra verification before delivery
- Tools/Products/Workflows: Activation probes; LID estimators; secondary verification pass; risk labeling in logs (a probe-scoring sketch appears after this list)
- Assumptions/Dependencies: Access to model internals or surrogate probes; validation against ground truth; performance impact assessment
- Institutional archiving and DOI adoption drives
- Sectors: academia, NGOs, government agencies
- What: Ensure outputs receive DOIs; deposit artifacts in stable repositories; add multilingual abstracts; schema.org markup for discovery
- Tools/Products/Workflows: Crossref/DataCite registration; LOCKSS/Portico; institutional repositories; Save Page Now automations
- Assumptions/Dependencies: Funding and staffing; policy mandates; coordination with libraries/archives
- Procurement checklists for AI systems with RTBR requirements
- Sectors: public sector, regulated industries
- What: Require provenance, abstention capabilities, multi-perspective mode, and archiving strategies in RFPs and vendor assessments
- Tools/Products/Workflows: Model cards with RTBR sections; data statements; acceptance tests covering inclusivity and uncertainty
- Assumptions/Dependencies: Policy adoption; vendor ecosystem readiness; auditing capacity
- End-user prompt practices for epistemic hygiene
- Sectors: daily life, journalism, education
- What: Provide prompt templates/extensions that request sources, ask for multiple perspectives, and ask the model to state confidence and unknowns
- Tools/Products/Workflows: Browser extensions; prompt libraries; LMS and newsroom playbooks
- Assumptions/Dependencies: User training; compatibility across LLM providers
- Data donation and post-mortem digital legacy programs
- Sectors: civic tech, memorial services, libraries
- What: Offer consent-based programs to preserve personal archives for historical research and AI training, with clear governance
- Tools/Products/Workflows: Consent management portals; personal data stores; standardized deposit agreements
- Assumptions/Dependencies: Legal clarity; trust frameworks; ethical review boards
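For the "diversity-aware answer mode" item above, the following toy sketch shows one way a greedy re-ranker could trade relevance against repetition of languages and regions among already-selected sources. The scores, metadata fields, and the 0.3 diversity weight are made up for illustration; a real pipeline would take retriever scores and catalogued source metadata.

```python
# Greedy diversity-aware re-ranking: pick the next source by relevance minus a
# penalty for repeating languages/regions already selected. All values are
# synthetic placeholders.

sources = [
    {"id": "s1", "lang": "en", "region": "US", "relevance": 0.92},
    {"id": "s2", "lang": "en", "region": "US", "relevance": 0.90},
    {"id": "s3", "lang": "es", "region": "MX", "relevance": 0.81},
    {"id": "s4", "lang": "sw", "region": "KE", "relevance": 0.74},
]

def rerank(sources, k=3, diversity_weight=0.3):
    selected = []
    remaining = list(sources)
    while remaining and len(selected) < k:
        def score(s):
            # Penalize attributes (language, region) already represented in the selection.
            overlap = sum(1 for t in selected if t["lang"] == s["lang"] or t["region"] == s["region"])
            return s["relevance"] - diversity_weight * overlap
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

print([s["id"] for s in rerank(sources)])  # e.g. ['s1', 's3', 's4']
```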
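For the calibrated abstention item, here is a minimal sketch of selective answering: answer only when a calibrated confidence clears a threshold, otherwise abstain and point to escalation. The threshold and confidence values are placeholders; real systems would calibrate confidence first (for example via temperature scaling) and tune the threshold per domain.

```python
# Selective answering: respond only when calibrated confidence clears a threshold;
# otherwise abstain and suggest escalation. Values below are illustrative.

ABSTAIN_THRESHOLD = 0.75

def answer_or_abstain(question: str, draft_answer: str, confidence: float) -> str:
    if confidence >= ABSTAIN_THRESHOLD:
        return f"{draft_answer} (confidence {confidence:.2f})"
    return (
        "I don't know enough to answer reliably "
        f"(confidence {confidence:.2f}). Consider consulting primary sources or a human expert."
    )

print(answer_or_abstain("Is drug X safe in pregnancy?", "Evidence is mixed.", 0.41))
print(answer_or_abstain("What does GDPR Article 17 cover?", "The right to erasure.", 0.93))
```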
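For the link rot mitigation item, the sketch below checks whether a cited URL already has an archived snapshot via the Internet Archive's public Wayback availability endpoint, so a citation can carry an archival fallback. The endpoint and response shape reflect its publicly documented behavior at the time of writing; network availability, error handling, and the Save Page Now follow-up request are assumptions left out of the sketch.

```python
# At citation time, look up an archived snapshot for a cited URL and record it so
# the citation can fall back to the archive if the live page rots. Uses the
# Internet Archive's public Wayback "availability" endpoint (response shape assumed
# from current public documentation).

import json
import urllib.parse
import urllib.request
from typing import Optional

def closest_snapshot(url: str) -> Optional[str]:
    api = "https://archive.org/wayback/available?url=" + urllib.parse.quote(url, safe="")
    with urllib.request.urlopen(api, timeout=10) as resp:
        data = json.load(resp)
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest and closest.get("available") else None

if __name__ == "__main__":
    cited = "https://example.com/"
    snapshot = closest_snapshot(cited)
    print(snapshot or f"No snapshot found; consider requesting one (e.g., Save Page Now) for {cited}")
```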
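For the internal truthfulness monitors item, this toy sketch scores an output by projecting a hidden-state vector onto a pre-fitted linear "truth direction" probe and flags low scores for verification. The probe weights, bias, threshold, and activations are synthetic; fitting a real probe requires white-box access and labeled true/false statements, and the cited research does not guarantee that such probes generalize across models and domains.

```python
# Flagging likely hallucinations with a pre-fitted linear "truth direction" probe:
# project a hidden-state vector onto the probe direction and route low-scoring
# outputs to extra verification. All numbers below are synthetic stand-ins.

import math

TRUTH_DIRECTION = [0.8, -0.2, 0.5, 0.1]  # learned probe weights (illustrative)
BIAS = -0.1
VERIFY_BELOW = 0.5

def truth_score(hidden_state) -> float:
    logit = sum(w * h for w, h in zip(TRUTH_DIRECTION, hidden_state)) + BIAS
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> pseudo-probability of "truthful"

def route(output_text: str, hidden_state) -> str:
    score = truth_score(hidden_state)
    if score < VERIFY_BELOW:
        return f"FLAG for verification (truth score {score:.2f}): {output_text}"
    return f"PASS (truth score {score:.2f}): {output_text}"

print(route("The treaty was signed in 1854.", [0.9, 0.1, 0.4, 0.0]))
print(route("The treaty was signed on the Moon.", [-0.7, 0.9, -0.3, 0.2]))
```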
Long-Term Applications
These applications likely require further research, standardization, scaling, or regulatory change before widespread deployment.
- RTBR-aligned foundation model training
- Sectors: AI research, vendors
- What: Co-train models on token prediction plus objectives for provenance fidelity, calibrated abstention, and memory equity constraints
- Tools/Products/Workflows: Multi-objective loss functions; representation regularizers tied to “truth direction”; retrieval-grounded training loops (a toy combined-loss sketch appears after this list)
- Assumptions/Dependencies: Access to high-quality, provenance-rich corpora; scalable training methods; open evaluation benchmarks
- Global, federated digital memory infrastructure
- Sectors: libraries, archives, standards bodies, AI labs
- What: Build a tamper-evident, open, multilingual repository of human contributions used as a canonical training and retrieval backbone
- Tools/Products/Workflows: Content authenticity (C2PA); persistent identifiers (DOI/ORCID/ISNI); interoperable knowledge graphs; decentralized storage
- Assumptions/Dependencies: International coordination; sustainable funding; governance to prevent capture
- Legal reconciliation of RTBR and Right to Erasure
- Sectors: policymakers, regulators, civil society
- What: Define exceptions, safe harbors, and post-mortem norms for preservation; clarify standards for “machine unlearning” obligations
- Tools/Products/Workflows: Model law templates; regulatory guidance; privacy-by-design with archival exceptions
- Assumptions/Dependencies: Political consensus; cross-jurisdiction harmonization; stakeholder engagement
- Surgical, provenance-aware machine unlearning
- Sectors: AI safety, research
- What: Develop methods to remove specific personal data without degrading general knowledge or inducing distortions
- Tools/Products/Workflows: Parameter editing; targeted forgetting with constraint satisfaction; inference-time masking with verified provenance checks
- Assumptions/Dependencies: New theory and benchmarks; compute budgets; risk mitigation for collateral forgetting
- Memory equity standards and certification
- Sectors: standards bodies, auditors, enterprises
- What: Establish measurable inclusivity/coverage criteria and third-party audits; certify “RTBR-compliant” systems
- Tools/Products/Workflows: ISO-style standards; public scorecards; red-team evaluations across cultures and languages
- Assumptions/Dependencies: Agreement on metrics; independent auditors; incentives for compliance
- Education: epistemic literacy by default
- Sectors: K–12, higher ed, professional training
- What: Integrate curricula and tools that teach multi-perspective analysis, source tracing, uncertainty interpretation, and AI limitations
- Tools/Products/Workflows: Classroom assistants that surface primary sources; “source density meters”; debate-mode AI tutors
- Assumptions/Dependencies: Curriculum approvals; teacher training; equitable tech access
- Healthcare evidence assistants with archival resilience
- Sectors: healthcare
- What: Clinical AI that always surfaces original trials (including non-English/older studies), registers null results, and abstains when evidence is weak
- Tools/Products/Workflows: RAG tied to clinical registries; multilingual EBM corpora; uncertainty-calibrated recommendations
- Assumptions/Dependencies: Regulatory approval; liability frameworks; integration with EHRs
- Finance and legal AI with audit-grade provenance
- Sectors: finance, legal, compliance
- What: Advisory systems that deliver decisions with verifiable document trails and abstain under ambiguity, enabling regulator-ready audits
- Tools/Products/Workflows: Immutable evidence ledgers; per-decision provenance packets; continuous verification pipelines
- Assumptions/Dependencies: Standardized audit formats; regulator endorsement; secure data handling
- Consumer-grade Personal Memory Vaults
- Sectors: consumer software, privacy tech
- What: Personal data stores with user-governed licensing to contribute to collective memory and AI training, including post-mortem directives
- Tools/Products/Workflows: Secure personal data pods; granular consent; micro-licensing and revenue-sharing
- Assumptions/Dependencies: Trustworthy identity and consent infrastructure; market demand; privacy guarantees
- Data cooperatives to uplift underrepresented corpora
- Sectors: NGOs, cultural institutions, philanthropic funders
- What: Community-led digitization and corpus curation for underrepresented languages and regions with equitable licensing
- Tools/Products/Workflows: Participatory data governance; localized OCR/ASR; multilingual annotation programs
- Assumptions/Dependencies: Funding; community leadership; data sovereignty agreements
- Open SDKs for “citation-first” LLM development
- Sectors: developer tooling, open source
- What: Provide libraries that make provenance embedding, abstention, diversity re-ranking, and archival mirroring first-class primitives
- Tools/Products/Workflows: Open-source packages; reference UIs; evaluation harnesses for RTBR metrics
- Assumptions/Dependencies: Community stewardship; compatibility across providers; maintainability
- Independent RTBR auditors and marketplaces
- Sectors: assurance, marketplaces, enterprises
- What: Create a market of third-party RTBR audits and continuous monitoring services for LLMs and retrieval systems
- Tools/Products/Workflows: Black-box and white-box audit suites; bias/coverage probes; SLA-backed monitoring
- Assumptions/Dependencies: Clear demand from buyers; standardized reports; access to systems under test
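As a very rough illustration of the "RTBR-aligned foundation model training" idea, the toy function below combines a next-token loss with penalties for missing provenance and poor calibration. The component terms, their ranges, and the weights are invented for illustration; the paper proposes the goal, not this formula.

```python
# A toy multi-objective training signal in the spirit of "RTBR-aligned" training:
# combine the usual next-token loss with penalties for missing provenance and for
# miscalibrated confidence. All terms and weights are illustrative placeholders.

def rtbr_loss(next_token_loss: float,
              provenance_recall: float,   # fraction of claims with resolvable citations (0..1)
              calibration_error: float,   # e.g., expected calibration error (0..1)
              w_prov: float = 0.5,
              w_cal: float = 0.5) -> float:
    provenance_penalty = 1.0 - provenance_recall
    return next_token_loss + w_prov * provenance_penalty + w_cal * calibration_error

# Example: a fluent batch that cites poorly is penalized relative to one that cites well.
print(rtbr_loss(next_token_loss=2.1, provenance_recall=0.3, calibration_error=0.12))  # higher
print(rtbr_loss(next_token_loss=2.1, provenance_recall=0.9, calibration_error=0.05))  # lower
```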
Notes on cross-cutting assumptions and dependencies:
- Data rights and licensing must permit preservation and attribution while respecting privacy and cultural data sovereignty.
- Many capabilities (e.g., truthfulness probes) are easier with white-box access; black-box alternatives may need proxies and carry higher uncertainty.
- Performance, latency, and cost trade-offs are real—provenance and multi-perspective features add overhead that must be engineered carefully.
- Organizational incentives must align with safety and inclusivity (e.g., KPIs valuing accuracy and equity, not just speed or engagement).
- Standardization (C2PA extensions for text, provenance schemas, memory equity metrics) will accelerate interoperability and adoption.
Glossary
- Abstention mechanisms: Design strategies that enable models to withhold answers when evidence is insufficient. "Such abstention mechanisms are essential to a conception of maximal truthfulness that prioritizes honesty over fluency."
- Activation space: The high-dimensional space of a neural network’s internal activations used to represent information during processing. "a 'truth direction' in activation space"
- Algorithmic de-ranking: Automated lowering of content visibility or ranking by platform algorithms. "algorithmic de-ranking"
- Algorithmic omission: Systematic exclusion of certain information caused by algorithmic processes or design. "algorithmic omission"
- Attribution trails: Embedded metadata chains that record credit and sourcing for generated content. "metadata and attribution trails can be embedded directly into outputs."
- C2PA content authenticity standard: An industry standard for embedding and verifying provenance and authenticity of digital content. "the C2PA content authenticity standard"
- Calibrated probabilities: Probabilities adjusted to reflect true confidence levels, often used to signal “knowing” versus “not knowing.” "calibrated probabilities of 'knowing' versus 'not knowing'"
- Content moderation takedowns: Removals of online material by platforms to enforce policies or regulations. "content moderation takedowns"
- Data controllers: Entities that determine purposes and means of processing personal data under data protection law. "mandates that data controllers erase personal data"
- Data lineage: Documentation of the origins, transformations, and flow of data through systems. "Current solutions largely capture data lineage and content authenticity"
- Digital Object Identifiers (DOIs): Persistent identifiers used to uniquely reference digital scholarly works. "Digital Object Identifiers (DOIs)"
- EKILA: An initiative for synthetic media provenance and attribution in generative art. "initiatives such as EKILA for digital art"
- Epistemic humility: A norm of acknowledging uncertainty and limits in knowledge claims. "cultivating norms of epistemic humility"
- Epistemic integrity: The alignment of system design and outputs with reliable, verifiable knowledge. "optimize for epistemic integrity"
- Epistemic justice: Fair representation and recognition within knowledge systems, avoiding unjust exclusion or bias. "touching on recognition, fairness and epistemic justice in the digital age."
- Foundational AI: Base-level AI systems or models that encode broad knowledge and underpin many downstream applications. "in the context of foundational AI, the collective RTBR including ensuring maximal truthfulness and historical accuracy must take precedence"
- Foundational models: Large pretrained models that serve as general-purpose bases for varied tasks. "foundational models increasingly shape the epistemic foundations of future generations"
- General Data Protection Regulation (GDPR): The EU’s comprehensive data protection law governing personal data processing. "General Data Protection Regulation (GDPR), officially termed the 'Right to Erasure'"
- Hallucination: Confident but incorrect or fabricated content generated by AI models. "ideally reducing hallucination and providing verifiable citations."
- Jurisprudence: The theory and case law framework guiding legal interpretation and rights. "Emerging from European jurisprudence"
- Link rot: The phenomenon of URLs becoming inaccessible over time due to web content decay. "Empirical studies of 'link rot' show that a substantial portion of online content disappears within a few years"
- Local intrinsic dimension analysis: A method for characterizing the dimensionality of local regions in model representations. "Using local intrinsic dimension analysis, investigators found that truthful responses are encoded in more compact, lower-dimensional activation patterns"
- Machine unlearning: Techniques for removing specific learned information from model parameters post-training. "a new phenomenon: 'machine unlearning'"
- Manifolds: Mathematical spaces that locally resemble Euclidean space; used to describe structure in high-dimensional representations. "hallucinations are scattered across higher-dimensional manifolds"
- MASK framework: A benchmark to assess whether models state what they “believe,” distinguishing honesty from accuracy. "the MASK framework, which attempts to measure whether models state what they 'believe,'"
- Parameter space: The space of all model parameters where learned patterns are stored and represented. "distributed patterns in parameter space"
- Probing studies: Analytical techniques that use auxiliary models or tasks to investigate internal representations in neural networks. "Probing studies have identified what has been called a 'truth direction'"
- Provenance: The documented origin and history of information or content, enabling traceability and credit. "Provenance can be layered so that immediate answers remain accessible"
- Reinforcement learning with human feedback (RLHF): A fine-tuning method that aligns model outputs with human preferences via feedback-driven rewards. "fine-tune models through reinforcement learning with human feedback (RLHF)"
- Retrieval-augmented generation (RAG): A method that augments generative models with retrieved evidence to improve factuality. "Retrieval-augmented generation (RAG) was developed to address the limitations"
- Right To Be Remembered (RTBR): A proposed right asserting the preservation and fair representation of contributions in digital memory. "this paper presents a concept of the Right To Be Remembered (RTBR)"
- Right to be forgotten: A legal concept allowing individuals to have personal data delisted or removed from search results. "commonly known as the 'right to be forgotten'."
- Right to Erasure: GDPR Article 17 right requiring deletion of personal data under specified conditions. '"Right to Erasure", which mandates that data controllers erase personal data'
- Search engine optimization (SEO): Techniques aimed at improving the visibility and ranking of content in search engines. "search engine optimization (SEO)"
- Self-attention mechanism: The transformer component that relates tokens to one another to capture dependencies. "whose self-attention mechanism allows the network to capture both short- and long-range dependencies"
- Self-supervised learning objective: A training setup where models learn from unlabeled data by predicting parts of the input (e.g., next tokens). "using a self-supervised learning objective: given a sequence of tokens, the model predicts the probability of the next token"
- Transformer architecture: A neural network architecture based on attention mechanisms, widely used for LLMs. "built on the transformer architecture"
- Truth direction: A geometric direction in model representation space correlated with factual correctness. '"truth direction" in activation space'