Differentiable All-pole Filters for Time-varying Audio Systems (2404.07970v4)
Abstract: Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous works, they cannot accurately reflect the gradient of the original system. We alleviate this difficulty by re-expressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and compressor. We make our code and audio samples available and provide the trained audio effect and synth models in a VST plugin at https://diffapf.github.io/web/.
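To make the idea of backpropagating "through the filter itself" concrete, here is a minimal NumPy sketch. It assumes the direct-form recursion y[n] = x[n] - sum_i a_i[n] y[n-i] with zero initial state and derives the gradients by the chain rule. Under that assumption, the gradient with respect to the input is another all-pole recursion, run backwards in time with time-shifted coefficients. The function names `allpole_forward` and `allpole_backward` are hypothetical, and this is an illustration rather than the authors' released implementation.

```python
import numpy as np


def allpole_forward(x, a):
    """Time-varying all-pole filter (assumed form, zero initial state).

    x: (N,) input signal
    a: (N, M) float coefficients, a[n, i-1] multiplies y[n-i] at time n
    Computes y[n] = x[n] - sum_{i=1..M} a[n, i-1] * y[n-i].
    """
    N, M = a.shape
    y = np.zeros(N)
    for n in range(N):
        acc = x[n]
        for i in range(1, M + 1):
            if n - i >= 0:
                acc -= a[n, i - 1] * y[n - i]
        y[n] = acc
    return y


def allpole_backward(grad_y, a, y):
    """Gradients of a scalar loss w.r.t. x and a, given grad_y = dL/dy.

    The input gradient u is itself an all-pole recursion, run in reverse:
        u[n] = grad_y[n] - sum_{i=1..M} a[n+i, i-1] * u[n+i]
    and the coefficient gradient follows as dL/da[n, i-1] = -u[n] * y[n-i].
    """
    N, M = a.shape
    u = np.zeros(N)
    for n in range(N - 1, -1, -1):
        acc = grad_y[n]
        for i in range(1, M + 1):
            if n + i < N:
                acc -= a[n + i, i - 1] * u[n + i]
        u[n] = acc

    grad_a = np.zeros_like(a)
    for i in range(1, M + 1):
        # Only samples n >= i have a past output y[n-i] to multiply.
        grad_a[i:, i - 1] = -u[i:] * y[:-i]
    return u, grad_a
```

Because the backward pass reuses the same recursive structure, no unrolled computation graph needs to be stored by an automatic differentiation framework. The explicit Python loops are for readability only; a practical version would run the recursion in compiled code.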