Feasibility of defenses that hide the memorization signature without harming utility

Determine whether any training-time modification can conceal the architecture-invariant memorization signature from membership inference attacks without degrading model performance, or whether gradient-based training of language models entails a fundamental tradeoff between learning utility and leakage of training-data information.

Background

The authors show that membership inference signals transfer across disparate architectures, implying the vulnerability is tied to gradient-based optimization rather than model design, which undermines architecture-specific defenses.
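
For concreteness, the sketch below is a minimal loss-threshold membership test of the kind whose signal is at issue here. It is a generic Yeom-style attack, not the paper's own method, and the gpt2 model, helper names, and threshold value are assumptions chosen purely for illustration.

```python
# Minimal loss-threshold membership inference sketch (illustrative assumptions only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_loss(model, tokenizer, text: str) -> float:
    """Average per-token negative log-likelihood of `text` under the model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def flag_member(model, tokenizer, text: str, threshold: float = 2.0) -> bool:
    """Flag `text` as a likely training member when its loss is anomalously low.

    The transfer result means a threshold (or attack model) calibrated on one
    architecture keeps separating members from non-members on another, because
    the signature is a product of gradient-based optimization, not model design.
    """
    return sequence_loss(model, tokenizer, text) < threshold

tok = AutoTokenizer.from_pretrained("gpt2")                 # assumed target model
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
print(flag_member(lm, tok, "A candidate string whose membership we want to test."))
```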

While differential privacy provides formal protections, it significantly reduces utility at practical noise levels. The open question is whether alternative training modifications can effectively mask the signature without similar performance costs, or if a fundamental privacy–utility tradeoff is unavoidable.
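
As a reference point for that utility cost, here is a minimal sketch of what DP-SGD-style training adds to an update step: per-example gradient clipping followed by Gaussian noise. The toy linear model, clip norm, and noise multiplier are assumptions for illustration, not settings from the paper; the point is that the noise required for meaningful guarantees directly corrupts the averaged gradient, which is where the utility loss comes from.

```python
# Minimal DP-SGD-style update sketch (toy model and hyperparameters are assumptions).
import torch
from torch import nn

def dp_sgd_step(model, loss_fn, xb, yb, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step: clip each example's gradient, sum, add noise, then average."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(xb, yb):
        # Per-example gradient, so each example's influence can be bounded.
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)  # clip to clip_norm
        for s, g in zip(summed, grads):
            s += g * scale

    with torch.no_grad():
        for p, s in zip(params, summed):
            # Gaussian noise calibrated to the clipping bound masks any single
            # example's contribution; it is also what erodes utility.
            noise = torch.randn_like(p) * noise_multiplier * clip_norm
            p -= lr * (s + noise) / len(xb)

# Toy usage: an assumed linear classifier standing in for a language model.
model = nn.Linear(10, 2)
xb, yb = torch.randn(8, 10), torch.randint(0, 2, (8,))
dp_sgd_step(model, nn.CrossEntropyLoss(), xb, yb)
```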

References

"An open question is whether training modifications can hide the signature without destroying model performance, or whether this represents a fundamental tradeoff between learning from data and leaking information about that data."

Learning the Signature of Memorization in Autoregressive Language Models (2604.03199 - Ilić et al., 3 Apr 2026), in the Discussion, Defense Implications subsection