
Principled noise and data-augmentation schedules for denoising autoencoders

Develop theoretically grounded schedules for the input corruption noise level as a function of training time, and for data augmentation (the number of independent noisy realizations drawn per clean input), when training denoising autoencoders, with the objective of minimizing the final mean squared reconstruction error at a specified test noise level.
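
To fix notation, a minimal formalization of the objective is sketched below. The conventions are assumed for illustration (Δ(t) for the corruption variance at training time t, k(t) for the number of noisy realizations per clean input, f_θ for the autoencoder, Δ_test for the evaluation noise level) and are not taken verbatim from the paper:

```latex
% Sketch under the assumed notation above.
% Per-step training loss, averaged over k(t) independent corruptions of each clean input:
\mathcal{L}_t(\theta) \;=\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\,
  \frac{1}{k(t)} \sum_{a=1}^{k(t)}
  \bigl\| x - f_\theta\bigl(x + \sqrt{\Delta(t)}\,\xi^{a}\bigr) \bigr\|^{2},
  \qquad \xi^{a} \sim \mathcal{N}(0, I_d) \ \text{i.i.d.}
% The schedules (\Delta(\cdot), k(\cdot)) are the quantities to be chosen so as to minimize
% the reconstruction error of the trained network at the specified test noise level:
\mathcal{E}_{\mathrm{test}}
  \;=\; \mathbb{E}_{x,\xi}\,
  \bigl\| x - f_{\theta_{\mathrm{final}}}\bigl(x + \sqrt{\Delta_{\mathrm{test}}}\,\xi\bigr) \bigr\|^{2}.
```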


Background

The paper analyzes denoising autoencoders using a high-dimensional teacher–student framework and demonstrates that optimized noise and batch-augmentation schedules can substantially improve performance over constant heuristics. Despite empirical practices such as scheduled denoising and non-uniform sampling in diffusion models, a principled theoretical derivation of optimal noise and augmentation schedules has been lacking.

This problem calls for the formal development and justification of optimal corruption-level and augmentation schedules tailored to denoising autoencoders, connecting the schedules to the data statistics, the architecture, and the training dynamics so that improved reconstruction performance can be guaranteed.
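
To make the objects concrete, the following is a minimal, self-contained sketch of where such schedules enter a denoising-autoencoder training loop. It is not the paper's model or method: the tied-weight linear autoencoder, the schedule forms noise_schedule and k_schedule, and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch only: a toy tied-weight linear DAE trained by online SGD,
# with hypothetical noise and augmentation schedules. Nothing here is taken from
# the referenced paper; schedule forms and hyperparameters are placeholders.
import numpy as np

rng = np.random.default_rng(0)
d, n, T, lr = 20, 1000, 2000, 1e-3
X = rng.normal(size=(n, d))                  # stand-in for clean training data
W = 0.01 * rng.normal(size=(d, d))           # tied weights: f(x) = W.T @ W @ x
sigma_test = 0.5                             # specified test noise level

def noise_schedule(t):
    # Assumed form: anneal the training corruption std from 1.0 toward sigma_test.
    return sigma_test + (1.0 - sigma_test) * (1 - t / T)

def k_schedule(t):
    # Assumed form: increase the number of noisy realizations per clean input over time.
    return 1 + int(4 * t / T)

for t in range(T):
    x = X[rng.integers(n)]                   # one clean example per step
    sigma, k = noise_schedule(t), k_schedule(t)
    grad = np.zeros_like(W)
    for _ in range(k):                       # average the gradient over k corruptions
        x_noisy = x + sigma * rng.normal(size=d)
        err = W.T @ (W @ x_noisy) - x        # reconstruction residual
        # Gradient of ||x - W.T W x_noisy||^2 with respect to W:
        grad += 2 * np.outer(W @ x_noisy, err) + 2 * np.outer(W @ err, x_noisy)
    W -= lr * grad / k

# Final evaluation at the specified test noise level.
X_noisy = X + sigma_test * rng.normal(size=X.shape)
recon = X_noisy @ W.T @ W                    # row-wise application of f
mse = np.mean(np.sum((X - recon) ** 2, axis=1))
print(f"test MSE per sample at noise {sigma_test}: {mse:.3f}")
```

In the optimal-control reading of the problem statement, noise_schedule and k_schedule would be the control variables optimized against the final test error, rather than fixed ad hoc as in this sketch.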

References

However, identifying principled noise schedules and data augmentation strategies remains largely an open problem.

A statistical physics framework for optimal learning (2507.07907 - Mignacco et al., 10 Jul 2025) in Section 4.3 (Denoising autoencoder)