Fine-grained alignment to distinguish malicious re-identification from legitimate analysis

Develop fine-grained alignment mechanisms for LLM-based agents that can distinguish malicious re-identification attempts from legitimate cross-source analytical reasoning, thereby addressing inference-driven linkage without unduly suppressing benign analysis.

Background

The paper evaluates mitigation via a privacy-aware system prompt and finds a concrete privacy–utility trade-off: the prompt substantially reduces linkage risk, but some models over-refuse, harming legitimate cross-source reasoning and task performance.
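To make the trade-off concrete, the sketch below shows one way a privacy-aware system prompt and a coarse refusal check might be wired together. The prompt text, message format, keyword heuristic, and example queries are all illustrative assumptions, not the paper's actual mitigation; the point is that a coarse filter blocks the malicious linkage query and the benign aggregate query alike, which is exactly the over-refusal the paper reports.

```python
# Hypothetical sketch, not the paper's implementation: prompt wording,
# helper names, and the keyword heuristic are illustrative assumptions.

PRIVACY_SYSTEM_PROMPT = (
    "You are a data-analysis assistant. You may perform cross-source "
    "aggregate analysis, but you must refuse to link records, writing "
    "style, or other weak cues to a specific named individual."
)

def build_messages(user_query: str) -> list[dict]:
    """Prepend the privacy-aware system prompt to a chat-style request."""
    return [
        {"role": "system", "content": PRIVACY_SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]

# A coarse keyword filter standing in for coarse alignment.
REFUSAL_TRIGGERS = ("identify", "who is", "real name")

def coarse_refusal(user_query: str) -> bool:
    """Refuse whenever any trigger keyword appears, regardless of intent."""
    q = user_query.lower()
    return any(trigger in q for trigger in REFUSAL_TRIGGERS)

malicious = "Identify the real name of user @anon123 from these posts."
benign = "Identify which age groups appear in both survey datasets."

# Both queries trip the coarse filter: privacy is protected, but the
# benign aggregate analysis is over-refused, costing task utility.
```

Both `coarse_refusal(malicious)` and `coarse_refusal(benign)` return `True` here, mirroring how a blunt refusal policy suppresses legitimate analysis along with the attack.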

Motivated by this trade-off, the authors explicitly state that creating finer-grained alignment capable of differentiating harmful re-identification from acceptable analysis remains unresolved.
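One way to picture the finer-grained distinction the authors call for: instead of keying on surface keywords, a check could ask whether the query targets a specific individual. The toy classifier below is an assumption-laden illustration of that direction, not a proposed solution; the regex, categories, and example queries are all hypothetical.

```python
# Illustrative only: a toy "finer-grained" check that refuses based on
# whether the query aims at a specific individual, rather than on surface
# keywords. Pattern and examples are assumptions, not the paper's method.
import re

INDIVIDUAL_TARGET = re.compile(
    r"@\w+|real name|this (person|user)|named individual",
    re.IGNORECASE,
)

def finer_grained_refusal(user_query: str) -> bool:
    """Refuse only when the query appears to target a specific individual."""
    return bool(INDIVIDUAL_TARGET.search(user_query))

malicious = "Identify the real name of user @anon123 from these posts."
benign = "Identify which age groups appear in both survey datasets."

# The re-identification attempt is refused; the aggregate query passes.
```

Even this intent-oriented heuristic is brittle (paraphrased attacks slip through; some benign queries mention individuals), which is why the paper frames fine-grained alignment as an open problem rather than a pattern-matching exercise.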

References

More fine-grained alignment mechanisms that distinguish malicious re-identification from legitimate analytical reasoning remain an open challenge.

From Weak Cues to Real Identities: Evaluating Inference-Driven De-Anonymization in LLM Agents (2603.18382 - Ko et al., 19 Mar 2026) in Section 6 (Limitations)