Minimum-variance unbiased estimator for mean under synthetic contamination

Characterize the minimum-variance unbiased estimator (MVUE) for the mean μ of an arbitrary d-dimensional distribution in the synthetic contamination model. In this model, the average observed at round t is X_t = α Y_{t-1} + (1−α) μ + U_t, where U_t is zero-mean noise, and the estimator Y_t is formed from past observations via nonuniform cross-round weights.
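
As a worked sketch of why any fixed weighting that sums to one stays unbiased in this model, the recursion can be written out as below. The weights w_{t,s}, the uncontaminated first round X_0 = μ + U_0, and the requirement that the weights do not depend on the data are assumptions made here for illustration; the source statement does not spell them out.

```latex
\begin{aligned}
X_0 &= \mu + U_0, \qquad Y_0 = X_0, \\
X_t &= \alpha\, Y_{t-1} + (1-\alpha)\,\mu + U_t, \qquad t \ge 1, \\
Y_t &= \sum_{s=0}^{t} w_{t,s}\, X_s, \qquad \sum_{s=0}^{t} w_{t,s} = 1, \\
\mathbb{E}[Y_{t-1}] = \mu \;&\Longrightarrow\; \mathbb{E}[X_t] = \alpha\mu + (1-\alpha)\mu = \mu
\;\Longrightarrow\; \mathbb{E}[Y_t] = \sum_{s=0}^{t} w_{t,s}\,\mu = \mu .
\end{aligned}
```

Under these assumptions, unbiasedness alone does not pin down the weights, so the open question is which unbiased estimator attains the smallest variance.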

Background

The paper analyzes mean estimation when each round’s data is partially contaminated by synthetic examples generated from the previous estimate, with the contamination rate given by the parameter α. It gives an exact variance formula for uniform weighting and shows that the uniformly weighted estimator is generally not the MVUE, even for distributions whose sample mean is the MVUE in the i.i.d. setting.
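
The suboptimality of uniform weighting can be illustrated with a small, self-contained calculation. The sketch below is not the paper's variance formula. It assumes a scalar mean (d = 1), independent noise terms U_0, ..., U_T with a common variance sigma2, an uncontaminated first round X_0 = μ + U_0, and estimators of the form Y_t = Σ_{s≤t} w[t][s] X_s with fixed weights summing to one in each row. The function names `uniform_weights` and `final_variance` are hypothetical. Because every X_t and Y_t is then μ plus a linear combination of the noise terms, Var(Y_T) can be computed exactly by tracking those coefficients.

```python
def uniform_weights(n):
    """Row t gives equal weight 1/(t+1) to rounds 0..t (hypothetical helper)."""
    return [[1.0 / (t + 1)] * (t + 1) + [0.0] * (n - t - 1) for t in range(n)]


def final_variance(weights, alpha, sigma2=1.0):
    """Exact Var(Y_T) under the assumptions stated in the lead-in.

    weights[t][s] is the weight that Y_t places on X_s for s <= t.
    Each row is assumed to sum to one so that Y_t is unbiased.
    """
    n = len(weights)
    x_coef = []  # x_coef[t][k]: coefficient of U_k in X_t - mu
    y_coef = []  # y_coef[t][k]: coefficient of U_k in Y_t - mu
    for t in range(n):
        # X_t - mu = alpha * (Y_{t-1} - mu) + U_t, with no contamination at t = 0.
        x = [0.0] * n
        x[t] = 1.0
        if t > 0:
            x = [xi + alpha * yi for xi, yi in zip(x, y_coef[t - 1])]
        x_coef.append(x)
        # Y_t - mu is the weighted combination of the X_s - mu for s <= t.
        y = [sum(weights[t][s] * x_coef[s][k] for s in range(t + 1))
             for k in range(n)]
        y_coef.append(y)
    # Independent noise with common variance: Var(Y_T) = sigma2 * sum of squares.
    return sigma2 * sum(c * c for c in y_coef[-1])


if __name__ == "__main__":
    alpha = 0.5
    uniform = final_variance(uniform_weights(2), alpha)
    tilted = final_variance([[1.0, 0.0], [0.6, 0.4]], alpha)
    print("uniform weights:", uniform)
    print("tilted weights: ", tilted)
```

With two rounds and α = 0.5, uniform weights give a variance of 0.8125 σ², while placing weight 0.6 on the clean first round and 0.4 on the contaminated second round gives about 0.80 σ². This only shows that uniform weighting can be beaten within this linear family under the stated assumptions. It says nothing about what the MVUE actually is, which remains the open problem.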

The authors explicitly identify fully characterizing the MVUE in this contamination framework as an open problem, beyond the demonstrated suboptimality of uniform weighting and specific improvements via alternative weighting schemes.

References

Interesting open problems for mean estimation include fully characterizing the minimum variance unbiased estimator, and allowing the mean to depend on a vector of covariates instead of remaining fixed in every round.

Learning from Synthetic Data: Limitations of ERM (2601.15468 - Amin et al., 21 Jan 2026) in Conclusion