Ground-truth prevalence of linkable cases in modern text-rich traces

Determine the total number of truly linkable cases in the modern text-rich trace settings examined—specifically the Anthropic Interviewer dataset and anonymized ChatGPT conversation logs—to enable a well-defined population-level linkage success rate.

Background

In modern digital trace settings, the agent retrieves auxiliary evidence during the run, making linkage open-ended and dependent on publicly available corroboration.

Because the true number of linkable instances is not known, the authors report Confirmed Linkage Count (CLC) rather than a population-level success rate, explicitly noting the unknown denominator.
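
The gap between the two metrics can be made concrete with a minimal sketch. The function names, the None convention for an unknown denominator, and the sample values below are illustrative assumptions, not the authors' code or results. The count is always computable from confirmed outcomes, while the rate needs the total number of truly linkable cases.

```python
from typing import Optional, Sequence


def confirmed_linkage_count(confirmed: Sequence[bool]) -> int:
    """Count the runs whose proposed linkage was confirmed (hypothetical helper)."""
    return sum(1 for outcome in confirmed if outcome)


def linkage_success_rate(clc: int, total_linkable: Optional[int]) -> Optional[float]:
    """Return clc / total_linkable, or None when the denominator is unknown.

    total_linkable stands for the ground-truth number of truly linkable
    cases, which is the quantity this problem asks to determine.
    """
    if total_linkable is None:
        return None  # rate is not well defined without the denominator
    if total_linkable <= 0:
        raise ValueError("total_linkable must be positive")
    if clc > total_linkable:
        raise ValueError("clc cannot exceed total_linkable")
    return clc / total_linkable


# Made-up outcomes for illustration only, not data from the paper.
outcomes = [True, False, True, True]
clc = confirmed_linkage_count(outcomes)

rate_unknown = linkage_success_rate(clc, None)  # denominator unknown
rate_known = linkage_success_rate(clc, 12)      # hypothetical ground truth of 12
```

Under this sketch, only the count is reportable today; the rate becomes available once a ground-truth value can be supplied for total_linkable.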

References

As in the AOL case study, the total number of truly linkable cases in these modern trace settings is unknown, so a population-level linkage success rate is not well defined.

From Weak Cues to Real Identities: Evaluating Inference-Driven De-Anonymization in LLM Agents (2603.18382 - Ko et al., 19 Mar 2026) in Subsection 7.3 (Evaluation Metric)