Foundational challenge of quantifying privacy in textual data

Determine a unified, context-aware definition and an operational methodology for quantifying privacy sensitivity in textual data. The absence of such a definition, together with the inherently contextual nature of privacy, currently impedes consistent measurement and evaluation.

Background

The paper motivates its approach by noting that privacy in text lacks a unified definition and is highly contextual, which complicates measurement. While formal frameworks like differential privacy address specific threat models, they do not capture human-perceived sensitivity across diverse contexts. This foundational gap underpins the need for human-aligned evaluators and motivates the authors’ distillation-based method.
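For contrast with human-perceived sensitivity, the kind of formal guarantee the paragraph alludes to can be made precise. The following is the standard ε-differential-privacy definition (general background, not a formula from this paper):

```latex
% A randomized mechanism M satisfies \varepsilon-differential privacy if,
% for all neighboring datasets D, D' differing in a single record and
% for every measurable set of outputs S,
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S].
```

The guarantee is worst-case over neighboring datasets and says nothing about which textual content humans would judge sensitive, which is the gap the paper targets.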

Although the paper proposes a practical evaluator distilled from Mistral Large 3 and validates it against human annotations, the broader conceptual question of how to define and quantify privacy in text in a unified, context-sensitive way remains open.
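Validating an evaluator against human annotations typically reduces to measuring agreement between model-assigned and human-assigned sensitivity scores. A minimal sketch of one such check, using Pearson correlation on hypothetical per-text scores (the score values and 1-5 scale below are illustrative assumptions, not the paper's data or protocol):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical sensitivity ratings (1-5 scale) for the same five texts:
model_scores = [4.5, 1.2, 3.8, 2.0, 4.9]
human_scores = [4.0, 1.0, 3.5, 2.5, 5.0]

# A value near 1.0 would indicate strong model-human alignment.
print(f"model-human agreement (Pearson r): {pearson_r(model_scores, human_scores):.3f}")
```

In practice one would also report rank-based agreement (e.g., Spearman or inter-annotator kappa), since absolute sensitivity scales are hard to calibrate across annotators.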

References

Quantifying privacy in textual data remains an open challenge due to the absence of a unified definition and the inherently contextual nature of privacy \citep{bambauer2022privacy, tesfay2016challenges}.

Distilling Human-Aligned Privacy Sensitivity Assessment from Large Language Models (2603.29497 - Loiseau et al., 31 Mar 2026) in Introduction, first paragraph