
Assess feasibility of UUID reconstruction from name-only prompts in YAGO biographies

Determine whether large language models pretrained on Hubble’s perturbed corpus can reconstruct the UUID attribute in synthetic YAGO biographies when prompted with only the person’s name, thereby establishing if generative reconstruction of unique identifiers is possible without auxiliary context.


Background

To study privacy risks, Hubble inserts synthetic biographies derived from YAGO that carry multiple personally identifiable attributes, including names, nationalities, birthplaces, emails, occupations, and UUIDs. The authors evaluate both infill (choice-based) and generative attacks to probe memorization of these attributes under varying duplication levels.

They observe distinct memorization patterns across PII types and report that while UUIDs can be selected correctly among candidates and generated with full-prefix prompts, generative reconstruction from a name-only prompt did not succeed, highlighting an unresolved capability boundary for data extraction under minimal context.
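The difference between the two generative settings can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the biography template, the prompt wording, the helper names, and the `toy_generate` stand-in (a lookup table that has "memorized" only the exact full prefix) are hypothetical and are not taken from the Hubble codebase. The sketch shows only how the two prompt types differ and how exact-match reconstruction might be scored; it does not reproduce the paper's results.

```python
import uuid

def make_biography(name, nationality, occupation, uid):
    # Hypothetical template; the real Hubble biography format may differ.
    return f"{name} is a {nationality} {occupation}. Their UUID is {uid}."

def full_prefix_prompt(bio, uid):
    # Full-prefix setting: all biography text preceding the UUID.
    return bio[: bio.index(uid)]

def name_only_prompt(name):
    # Name-only setting: minimal context, just the name and an attribute cue.
    return f"{name}'s UUID is "

def exact_match(generated, target):
    # Count a success only if the output begins with the complete UUID.
    return generated.strip().startswith(target)

def evaluate(generate, records, build_prompt):
    # Fraction of records whose UUID is reconstructed exactly.
    hits = sum(exact_match(generate(build_prompt(r)), r["uuid"]) for r in records)
    return hits / len(records)

# Build a tiny synthetic dataset with deterministic, made-up identities.
records = []
for name in ["Ada Example", "Bo Sample"]:
    uid = str(uuid.uuid5(uuid.NAMESPACE_DNS, name))
    bio = make_biography(name, "Canadian", "engineer", uid)
    records.append({"name": name, "uuid": uid, "bio": bio})

# Toy stand-in for a model: it completes only prompts it has seen verbatim.
memorized = {full_prefix_prompt(r["bio"], r["uuid"]): r["uuid"] for r in records}

def toy_generate(prompt):
    return memorized.get(prompt, "")

full_acc = evaluate(toy_generate, records,
                    lambda r: full_prefix_prompt(r["bio"], r["uuid"]))
name_acc = evaluate(toy_generate, records,
                    lambda r: name_only_prompt(r["name"]))
print(full_acc, name_acc)
```

To test a real model, `toy_generate` would be replaced by a call that returns the model's continuation for a prompt string. The choice-based infill attack is omitted here because it requires per-candidate likelihoods rather than free generation.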

References

Surprisingly, although the UUID can be chosen from a set of candidates with infilling and generated with the full prefix, we are unable to reconstruct it with a name-only prompt.

Hubble: a Model Suite to Advance the Study of LLM Memorization (2510.19811 - Wei et al., 22 Oct 2025) in Appendix, Section: Privacy-specific Results, Subsubsection: Direct PII Leakage