Why Heuristic Weighting Works: A Theoretical Analysis of Denoising Score Matching (2508.01597v1)
Abstract: Score matching enables the estimation of the gradient of a data distribution, a key component in denoising diffusion models used to recover clean data from corrupted inputs. In prior work, a heuristic weighting function has been used for the denoising score matching loss without formal justification. In this work, we demonstrate that heteroskedasticity is an inherent property of the denoising score matching objective. This insight leads to a principled derivation of optimal weighting functions for generalized, arbitrary-order denoising score matching losses, without requiring assumptions about the noise distribution. Among these, the first-order formulation is especially relevant to diffusion models. We show that the widely used heuristical weighting function arises as a first-order Taylor approximation to the trace of the expected optimal weighting. We further provide theoretical and empirical comparisons, revealing that the heuristical weighting, despite its simplicity, can achieve lower variance than the optimal weighting with respect to parameter gradients, which can facilitate more stable and efficient training.
Sponsor
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.