Frequency estimation via spectrogram-based losses using gradient descent

Determine whether frequency estimation from audio signals can be achieved reliably by gradient-descent optimization that minimizes spectrogram-based loss functions in differentiable synthesizer sound matching, and identify the conditions under which such optimization converges to accurate frequency values.

Background

The paper introduces DiffMoog, a differentiable modular synthesizer and an accompanying sound matching platform optimized via spectral losses. A critical sub-task for sound matching is estimating frequency (pitch) directly through gradient descent on spectrogram-based distances.

The authors note that this frequency estimation sub-task remains open, in agreement with prior work on the limitations of spectral audio distances for pitch. Previous studies have often sidestepped the issue by imposing assumptions or by using external pitch estimators such as CREPE. The present work uses neither, which underscores that gradient-based frequency estimation with spectrogram losses is unresolved.
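One way to see the difficulty is a toy experiment, sketched below in plain Python. It is not DiffMoog code and does not reproduce the paper's experiments. It assumes a single Hann-windowed sine frame, a mean-squared distance between magnitude spectra, and a central finite difference standing in for automatic differentiation. The sample rate, frame length, learning rate, and all function names are illustrative choices, not values taken from the paper.

```python
import math

SAMPLE_RATE = 4000.0  # Hz, illustrative assumption
N = 128               # frame length in samples, illustrative assumption

# Periodic Hann window and DFT twiddle tables, precomputed once.
WINDOW = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / N) for n in range(N)]
COS = [math.cos(2.0 * math.pi * m / N) for m in range(N)]
SIN = [math.sin(2.0 * math.pi * m / N) for m in range(N)]


def magnitude_spectrum(freq_hz):
    """One-sided DFT magnitudes of a Hann-windowed, unit-amplitude sine."""
    frame = [WINDOW[n] * math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE)
             for n in range(N)]
    mags = []
    for k in range(N // 2 + 1):
        re = sum(frame[n] * COS[(k * n) % N] for n in range(N))
        im = -sum(frame[n] * SIN[(k * n) % N] for n in range(N))
        mags.append(math.hypot(re, im))
    return mags


def spectral_loss(freq_hz, target_mags):
    """Mean squared difference between candidate and target magnitudes."""
    cand = magnitude_spectrum(freq_hz)
    return sum((c - t) ** 2 for c, t in zip(cand, target_mags)) / len(target_mags)


def loss_gradient(freq_hz, target_mags, eps=1e-2):
    """Central finite difference of the loss with respect to frequency (Hz)."""
    hi = spectral_loss(freq_hz + eps, target_mags)
    lo = spectral_loss(freq_hz - eps, target_mags)
    return (hi - lo) / (2.0 * eps)


def gradient_descent(init_hz, target_mags, lr=10.0, steps=40):
    """Plain fixed-step gradient descent on the frequency parameter."""
    freq = init_hz
    for _ in range(steps):
        freq -= lr * loss_gradient(freq, target_mags)
    return freq


if __name__ == "__main__":
    target = magnitude_spectrum(500.0)  # target pitch: 500 Hz
    for init in (490.0, 1200.0):        # one nearby start, one distant start
        grad = loss_gradient(init, target)
        final = gradient_descent(init, target)
        print(f"init={init:7.1f} Hz  grad={grad:+.2e}  final={final:8.2f} Hz")
```

Under these assumptions, the nearby start moves toward the target, while the distant start barely moves: its spectral peak does not overlap the target's, so the loss is almost flat and the gradient carries little information about which direction to go. This matches the kind of limitation the quoted prior work attributes to spectral distances, but it is a simplified illustration rather than an explanation given in the paper.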

References

We note that in the context of sound matching, the sub-task of frequency estimation through gradient descent techniques via minimizing spectrogram-based losses is an intrinsic challenge that remains open, as we discovered through our own experimentation.

— DiffMoog: a Differentiable Modular Synthesizer for Sound Matching (2401.12570 - Uzrad et al., 2024) in Section 1 (Introduction)
