Lightweight observational-setting provenance tests with comparable token efficiency
Design independence tests for attributing text to a specific training run in the observational setting, where only generated text is available, that are computationally lighter than the shuffled-transcript retraining test statistic $obs$ (Algorithm “Training models on shuffled transcript”) while achieving similar token complexity, i.e., requiring a comparable number of observed tokens from the generated text to attain high power.
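To make "independence test" concrete, the following is a minimal sketch of a generic rank-correlation permutation test between the position of each example in a training transcript and a per-example score derived from the observed text. It is not the shuffled-transcript algorithm and not a proposed solution to the open problem. The function names, the one-sided direction of the test, and the idea of a single scalar score per training example are all assumptions made for illustration.

```python
import random


def ranks(values):
    """Return 0-based ranks; ties are broken by position (adequate for a sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation of two equal-length sequences (length >= 2)."""
    rx, ry = ranks(x), ranks(y)
    mean = (len(x) - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    # Both rank vectors are permutations of 0..n-1, so they share one variance.
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var


def permutation_p_value(order_index, scores, num_permutations=999, seed=0):
    """One-sided permutation p-value for H0: scores are independent of order.

    order_index: hypothetical position of each example in the training transcript.
    scores: hypothetical per-example statistic computed from the observed text.
    """
    rng = random.Random(seed)
    observed = spearman(order_index, scores)
    shuffled = list(order_index)
    hits = 0
    for _ in range(num_permutations):
        rng.shuffle(shuffled)
        if spearman(shuffled, scores) >= observed:
            hits += 1
    # The +1 terms keep the p-value valid (never exactly zero).
    return (hits + 1) / (num_permutations + 1)
```

In this sketch the cost is dominated by computing the scores, which is where a lightweight test would need to avoid retraining; token complexity then corresponds to how much observed text is needed before the p-value becomes reliably small.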
References
Particularly in the observational setting, where Alice cannot simply spend more queries to obtain more tokens, designing more lightweight tests than $obs$ with similar token complexity is an important open problem.
— Blackbox Model Provenance via Palimpsestic Membership Inference
(2510.19796 - Kuditipudi et al., 22 Oct 2025) in Section 5 (Discussion)