
Inference-Aware & Privacy-Preserving Deletion in Databases

Published 31 Mar 2026 in cs.DB | (2604.00326v1)

Abstract: Deletion is a fundamental database operation, yet modern systems often fail to provide the privacy guarantee that users expect from it. A deleted value may disappear from query results and even from physical storage, yet remain inferable from dependencies, derived data, or traces exposed by the deletion event itself. Meaningful deletion, therefore, requires more than logical removal or physical erasure; it requires a privacy guarantee that limits what remains inferable after deletion. In this paper, we take an inference-centric view of deletion, focusing on two leakage channels: leakage from the post-deletion state and leakage from the deletion pattern itself. We use this lens to distinguish logical, physical, and semantic deletion, organize the design space of deletion operations, and highlight open research challenges for building deletion mechanisms with meaningful privacy guarantees in database systems.

Summary

  • The paper demonstrates that standard deletion methods leave residual inferential traces due to functional dependencies and derived artifacts.
  • It proposes a formal framework that quantifies inference risks by modeling both the visible state and deletion patterns within dependency constraints.
  • The study outlines a design space advocating for calibrated deletion strategies and machine-checkable attestations to meet evolving privacy regulations.

Inference-Centric, Privacy-Preserving Deletion in Databases

Motivation and Problem Scope

The paper "Inference-Aware & Privacy-Preserving Deletion in Databases" (2604.00326) critiques the prevailing treatment of deletion as a storage or query-removal primitive and argues for an inference-centric semantic foundation. It demonstrates that, because of functional, conditional, and statistical dependencies, as well as derived artifacts (e.g., materialized views, ML models), inferential support for a deleted value often persists in the database even after logical or physical deletion. The authors systematically dissect two principal leakage channels: residual-state leakage (information inferable from the remaining data and artifacts) and pattern leakage (information leaked through the auxiliary deletions or traces generated by deletion actions themselves).

This view is motivated by regulatory obligations (GDPR, CCPA, LGPD, etc.) and widely held user expectations, elevating deletion guarantees from operational hygiene to core privacy requirements. The paper makes explicit that existing deletion mechanisms, both logical (SQL DELETE, cascading deletes, TTL expiry) and physical (VACUUM, compaction, purging), do not soundly translate into privacy guarantees against adversaries with auxiliary knowledge of the schema, dependencies, and system observability.

The Taxonomy of Deletion Semantics

The authors decompose the deletion problem into three layers:

  • Logical deletion: Removes target rows/tuples from the visible database state but does not control for information remaining in dependencies or derived artifacts (e.g., PostgreSQL DELETE semantics).
  • Physical deletion: Eventually reclaims the underlying storage (e.g., PostgreSQL VACUUM, LSM compaction), but this does not suppress the inferential signals left in non-updated artifacts and surviving data. Delays between logical and physical deletion create a timeline in which exposure persists.
  • Semantic (inference-aware) deletion: Targets the quantification and control of post-deletion inference, seeking to ensure that the adversary’s belief about the deleted value does not increase due to observable state alterations (including deletion-induced actions).

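The gap between logical and semantic deletion can be shown with a small sketch (the schema, names, and values here are illustrative, not from the paper): a logical DELETE removes the tuple from the visible state, yet an aggregate computed before the deletion, standing in for a materialized view, still lets an adversary reconstruct the deleted value.

```python
import sqlite3

# Illustrative schema; names and values are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INT)")
con.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [("ada", "eng", 120), ("bob", "eng", 100), ("eve", "ops", 90)])

# A derived artifact computed before deletion (stands in for a materialized view).
dept_totals = dict(con.execute(
    "SELECT dept, SUM(salary) FROM employees GROUP BY dept"))

# Logical deletion: the tuple vanishes from the visible state...
con.execute("DELETE FROM employees WHERE name = 'ada'")
assert con.execute(
    "SELECT COUNT(*) FROM employees WHERE name = 'ada'").fetchone()[0] == 0

# ...but the stale artifact plus the surviving rows reconstruct the deleted value.
surviving_eng = con.execute(
    "SELECT COALESCE(SUM(salary), 0) FROM employees WHERE dept = 'eng'").fetchone()[0]
leaked_salary = dept_totals["eng"] - surviving_eng
print(leaked_salary)  # ada's "deleted" salary
```

Semantic deletion would require either invalidating `dept_totals` or accepting and bounding this residual inference.
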
Residual-state leakage arises when, even after a cell is set to NULL or a tuple is deleted, dependencies permit near-certain reconstruction of the deleted value (e.g., inferring SalaryBand from the Level and Manager attributes). Pattern leakage arises when the set of auxiliary deletions needed to enforce privacy is itself informative (e.g., selectively nullifying columns reveals which dependency path was predictive).
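The SalaryBand example above can be sketched in a few lines (rows and attribute values are invented for illustration): an adversary who knows the dependency (Level, Manager) → SalaryBand rebuilds the mapping from surviving cells and applies it to the nulled cell.

```python
# Hypothetical rows; one employee's SalaryBand has been nulled on "deletion".
rows = [
    {"name": "ada", "level": "L5", "manager": "kim", "salary_band": "B3"},
    {"name": "bob", "level": "L5", "manager": "kim", "salary_band": "B3"},
    {"name": "eve", "level": "L4", "manager": "raj", "salary_band": "B2"},
    {"name": "mallory", "level": "L5", "manager": "kim", "salary_band": None},
]

# Rebuild the functional dependency (Level, Manager) -> SalaryBand
# from the cells that survived deletion.
fd = {(r["level"], r["manager"]): r["salary_band"]
      for r in rows if r["salary_band"] is not None}

# Apply it to the nulled cell, recovering the erased value.
target = next(r for r in rows if r["salary_band"] is None)
inferred = fd[(target["level"], target["manager"])]
print(inferred)
```

Note the pattern-leakage flip side: if a system nullified Level as well, that auxiliary deletion itself would signal which dependency path was predictive.
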

Formal Framework for Post-Deletion Inference

The proposed model explicitly axiomatizes the observables (V, P), where V is the post-deletion visible state (tuples, views, outputs) and P is the pattern of deletions or related traces that an adversary can exploit. The adversary's inference is characterized via a prior π and posterior π′, with guarantees aimed at bounding the posterior update resulting from observing (V, P), often referenced against pre-insertion baseline knowledge. The semantics lens M (dependency model, constraints, or causal structure) determines which inference paths are covered by the privacy contract and distinguishes explicitly modeled risk from slack (unmodeled) inference risk.

This model generalizes classical information flow control and statistical inference control frameworks ([needham-inference-control], [adam89-statdb]), allowing for explicit cost/utility-leakage trade-offs via targeted auxiliary deletion or artifact invalidation. The authors advocate for bounded-leakage contracts: semantics where the DBMS ensures that the increase in adversarial confidence is upper bounded, rather than guaranteeing complete erasure (which is often infeasible in dependency-rich environments).
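One illustrative (not the paper's own) Bayesian reading of a bounded-leakage contract: the adversary updates a prior π over the deleted value's domain using the likelihood of the observed (V, P), and the contract requires that no value's posterior probability exceed its prior by more than a fixed factor. All numbers below are invented.

```python
from math import isclose

# Illustrative prior over the deleted value's domain.
prior = {"B2": 0.5, "B3": 0.5}

# Adversary's likelihood of the observed (state, pattern) pair under each
# candidate value, as induced by the dependencies the semantics lens M covers.
likelihood = {"B2": 0.2, "B3": 0.6}

# Bayesian posterior after observing (V, P).
z = sum(prior[v] * likelihood[v] for v in prior)
posterior = {v: prior[v] * likelihood[v] / z for v in prior}

def satisfies_bound(prior, posterior, max_ratio):
    """Bounded-leakage check: each posterior may exceed its prior by at most
    a factor of max_ratio (analogous to an e^epsilon-style multiplicative bound)."""
    return all(posterior[v] <= max_ratio * prior[v] for v in prior)

assert isclose(sum(posterior.values()), 1.0)
print(posterior["B3"], satisfies_bound(prior, posterior, max_ratio=2.0))
```

A system enforcing such a contract would plan auxiliary deletions or artifact invalidations until the check passes, trading utility for a certified bound rather than attempting complete erasure.
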

Design Space and Open Research Directions

The paper presents a design space for deletion mechanisms, mapping existing and emerging systems along multiple dimensions:

  • Rollback Objective: Does deletion aim for pre-insertion equilibrium or weaker rollback targets?
  • Observables Controlled: Does the mechanism only suppress direct access, or does it also account for leakage via patterns and traces?
  • Semantics Model: Does the contract cover all dependencies (ideal), or only a monitored subset (model-relative), with unmodeled slack?
  • Recoverability, Expiry, Implementation Layer: Is deletion reversible, scheduled or on-demand, and carried out at the storage or the application layer, and how are maintenance traces exposed?

Most current systems (see DELF, K9db, Lethe) only partially address these axes, tending to over-delete (removing large volumes of related data/artifacts) to avoid residual inference, but rarely offering fine-grained, certified privacy cost-utility trade-offs. The taxonomy identifies significant open technical gaps:

  • Joint control of residual state and deletion-pattern leakage (Areas 1, 2): Ensuring privacy does not come only from aggressive deletion (which destroys utility) but through calibrated pattern management and possibly cover-action injection.
  • Weighted Dependency Models (Area 3): Moving beyond binary dependency enforcement (where dependencies are either assumed deterministic or disregarded), towards a paradigm where leakage is quantitatively attributed (e.g., probabilistic, learned dependencies).
  • Evolution and Attestation: As data, dependencies, and artifact topology evolve (dynamic models, new constraints, new derived artifacts), privacy guarantees must be dynamically maintained and composed across system components. The authors call for machine-checkable, composable deletion attestations that certify both the timeliness of deletion and a bound on residual inferential exposure.
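A minimal sketch of what a machine-checkable deletion attestation might carry (the paper does not fix a concrete format; every field name here is hypothetical): the deleted subject, the dependency-model version it was checked against, a claimed leakage bound, the auxiliary deletions performed, and a canonical digest so attestations can be logged and composed.

```python
from dataclasses import dataclass
import hashlib
import json
import time

@dataclass(frozen=True)
class DeletionAttestation:
    """Hypothetical attestation record; field names are illustrative."""
    subject: str              # identifier of the deleted item
    model_version: str        # version of the dependency model M checked against
    completed_at: float       # when physical reclamation finished
    leakage_bound: float      # claimed upper bound on posterior increase
    auxiliary_deletions: tuple  # extra cells removed to cut inference paths

    def digest(self) -> str:
        # Canonicalize fields so the digest is stable across processes.
        payload = json.dumps(self.__dict__, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode()).hexdigest()

att = DeletionAttestation(
    subject="employee/42/salary_band",
    model_version="fd-graph-v7",
    completed_at=time.time(),
    leakage_bound=1.5,
    auxiliary_deletions=("employee/42/level",),
)
print(att.digest()[:8], att.leakage_bound <= 2.0)
```

A verifier (user-side or regulator-side) would check the digest against a tamper-evident log and compare `leakage_bound` to the policy in force.
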

Theoretical and Practical Implications

This work supports a shift from best-effort, reactive deletion to proactive, certified, inference-aware privacy mechanisms—aligning technical implementation with the semantics of regulatory and user privacy demands. It points out that reasoning about privacy-preserving deletion in data systems is fundamentally a form of inference control, with rich connections to differential privacy (DP), information flow, and security-by-design, but must also account for database-specific phenomena: dependencies, derived artifacts, and multi-layered observability surfaces.

Practically, realization of these semantics demands advances in:

  • Automated discovery and maintenance of dependency models (including soft, contextual, and data-driven constraints).
  • Efficient inference-aware deletion and auxiliary mitigation planning, minimizing utility loss (cf. [Chakraborty2024], [Makhija2025]).
  • End-to-end attestation languages and protocols, certifying to both users and regulators the semantic privacy of deletions.

Theoretically, the extension to weighted, probabilistic, and workload-scoped dependency contracts raises new complexity-theoretic, learning-theoretic, and compositionality questions. Managing deletion in an ecosystem with evolving, correlated artifacts and adversarial observability represents a significant new research frontier.

Conclusion

The paper reframes deletion from a physical/logical operation to a semantic policy problem, requiring precise modeling of inference risks and explicit, auditable guarantees. This perspective extends and sharpens privacy management in modern database systems by integrating inference control, artifact-aware dependency management, and compliance-oriented attestation. As dependency discovery, ML integration, and data regulation all intensify, the authors’ framework points toward the necessary architectural underpinnings for future deletion-compliant, privacy-centric DBMS design, and highlights critical, unsolved technical challenges for the research community (2604.00326).
