Feasibility of defenses that hide the memorization signature without harming utility
Determine whether any training-time modifications can conceal the architecture-invariant memorization signature from membership inference attacks without degrading model performance, or whether gradient-based training of language models entails a fundamental tradeoff between learning utility and leakage of training-data information.
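To make the attack side of this question concrete, the following is a minimal sketch (not from the paper) of a loss-threshold membership inference attack: training-set members tend to incur lower loss than unseen examples, so an attacker can threshold per-example loss to guess membership. The losses here are simulated with assumed Gaussian distributions rather than queried from a real model; the separation between the two distributions stands in for the memorization signature a defense would need to suppress.

```python
import numpy as np

# Illustrative assumption: members (seen in training) have lower mean loss
# than non-members. These distributions are synthetic stand-ins.
rng = np.random.default_rng(0)
member_loss = rng.normal(loc=2.0, scale=0.5, size=1000)     # seen during training
nonmember_loss = rng.normal(loc=3.0, scale=0.5, size=1000)  # held out

def mia_advantage(member_loss, nonmember_loss, threshold):
    """True-positive rate minus false-positive rate of the threshold attack."""
    tpr = np.mean(member_loss < threshold)     # members correctly flagged
    fpr = np.mean(nonmember_loss < threshold)  # non-members wrongly flagged
    return tpr - fpr

# Sweep thresholds; a large best-case advantage indicates a strong
# membership signal. A successful defense would drive this toward zero
# while keeping member loss (i.e., model utility) low.
thresholds = np.linspace(0.0, 5.0, 101)
best = max(mia_advantage(member_loss, nonmember_loss, t) for t in thresholds)
print(f"best attack advantage: {best:.2f}")
```

The tradeoff posed above appears directly in this toy setup: lowering member loss further (better fit to training data) widens the gap between the two distributions and increases the attacker's best-case advantage.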
References
An open question is whether training modifications can hide the signature without destroying model performance, or whether this represents a fundamental tradeoff between learning from data and leaking information about that data.
— Learning the Signature of Memorization in Autoregressive Language Models
(2604.03199 - Ilić et al., 3 Apr 2026) in Discussion, Defense Implications subsection