
Differentiable All-pole Filters for Time-varying Audio Systems (2404.07970v4)

Published 11 Apr 2024 in eess.AS, cs.LG, and cs.SD

Abstract: Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous works, they cannot accurately reflect the gradient of the original system. We alleviate this difficulty by re-expressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and compressor. We make our code and audio samples available and provide the trained audio effect and synth models in a VST plugin at https://diffapf.github.io/web/.


Summary

  • The paper presents a novel backpropagation algorithm for time-varying all-pole filters that integrates recursive audio models into end-to-end learning frameworks.
  • It computes exact gradients using time-domain reversing operations and custom filtering to overcome limitations of traditional frame-based and frequency domain methods.
  • The approach accurately models phaser, synthesizer, and compressor systems, demonstrating improved numerical results and real-time operational viability.

Efficient Differentiable Time-Varying All-Pole Filters for Audio Modeling

Introduction

Infinite impulse response (IIR) filters are central to many time-varying audio processing systems, such as phasers, synthesizers, and dynamics processors. Integrating recursive IIR filters into differentiable signal processing pipelines has long been difficult because their sample-by-sample recursion is costly to unroll in automatic differentiation frameworks. Prior work therefore resorted to frequency-domain evaluation or frame-based approximations, which introduce limitations of their own, including artifacts and poor generalization when the trained filter is later run sample by sample in real time. In contrast, this work develops an efficient backpropagation algorithm for a time-varying all-pole filter, removing the historical impediments to end-to-end training of systems with recursive filter structures.
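To make the object of study concrete, a time-varying all-pole filter computes each output sample from the current input and K feedback terms whose coefficients change every sample. A minimal sketch (one common sign convention, assumed here rather than taken from the paper) illustrates why naive autodiff struggles: the per-sample loop below would put every iteration on the autodiff tape.

```python
import numpy as np

def time_varying_allpole(x, A):
    """Time-varying all-pole filter (one common convention, assumed here):
        y[n] = x[n] - sum_k A[n, k] * y[n - k - 1]
    x: input signal of length N; A: (N, K) array of per-sample coefficients.
    The recursion makes each output depend on all previous outputs, which is
    what a naive autodiff framework must unroll step by step."""
    N, K = A.shape
    y = np.zeros(N)
    for n in range(N):
        acc = x[n]
        for k in range(K):
            if n - k - 1 >= 0:
                acc -= A[n, k] * y[n - k - 1]
        y[n] = acc
    return y
```

With constant coefficients this reduces to an ordinary (time-invariant) all-pole filter, which is the special case most prior differentiable approaches target.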

Proposed Methodology

The core contribution of this paper is the derivation and implementation of a backpropagation algorithm for time-varying all-pole filters that can be incorporated into differentiable digital signal processing workflows without approximation. This approach, termed the time-domain (TD) method, computes exact gradients for every filter parameter efficiently, enabling end-to-end training of systems with recursive elements. Rather than unrolling the recursion on the autodiff tape, the backward pass is expressed in closed form using time-reversal operations and further filtering passes, so the same efficient filter routine serves both the forward and backward computations, improving speed in both directions.
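The key identity, sketched here for the time-invariant special case rather than the paper's full time-varying derivation, is that the adjoint of an all-pole filter is the same filter run backwards in time: the gradient with respect to the input is obtained by time-reversing the output gradient, filtering it through the same all-pole filter, and reversing again; the coefficient gradients then follow from inner products with delayed outputs.

```python
import numpy as np

def allpole(x, a):
    """Time-invariant all-pole filter: y[n] = x[n] - sum_k a[k] * y[n-k-1]."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] - sum(a[k] * y[n - k - 1]
                          for k in range(len(a)) if n - k - 1 >= 0)
    return y

def allpole_backward(dy, y, a):
    """Exact gradients w.r.t. the input x and coefficients a, using the
    adjoint identity: the transpose of an all-pole filter is the same
    filter applied to the time-reversed signal.
      dL/dx    = flip(allpole(flip(dL/dy), a))
      dL/da[k] = -sum_n dL/dx[n] * y[n-k-1]
    """
    dx = allpole(dy[::-1], a)[::-1]
    da = np.array([-np.dot(dx[k + 1:], y[:len(y) - k - 1])
                   for k in range(len(a))])
    return dx, da
```

Because the backward pass is itself just filtering plus time reversal, it runs at the same cost as the forward pass, instead of the per-sample tape traversal a framework's autodiff would perform.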

Applications and Results

The effectiveness of the proposed time-varying all-pole filter implementation is demonstrated on three representative audio processing systems: a phaser, a time-varying subtractive synthesizer, and a feed-forward compressor. The results underscore the method's ability to model, accurately and efficiently, the complex time-varying behavior characteristic of analog audio devices. Notably, systems trained with the proposed approach transition seamlessly to real-time, sample-by-sample operation without the generalization issues often observed with frequency-domain or frame-based approximations.

For each application, training produced models that closely approximated the behavior of the target analog systems, yielding strong numerical results and clear improvements over traditional approaches. The phaser and synthesizer models in particular highlighted the method's capacity to capture nuanced dynamic changes over time, a quality often lost in frequency-domain approximations.
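As one illustration of where such a filter appears in practice, a feed-forward compressor's gain smoothing ("ballistics") is commonly a one-pole filter whose coefficient switches between attack and release values depending on the gain trajectory, i.e., a first-order time-varying all-pole filter. The following is a generic textbook-style sketch, not the paper's exact model; all parameter names and default values here are illustrative assumptions.

```python
import numpy as np

def compressor_gain(x, threshold_db=-20.0, ratio=4.0,
                    alpha_attack=0.9, alpha_release=0.999):
    """Feed-forward compressor gain with one-pole ballistics (illustrative).
    The smoother g[n] = (1 - a[n]) * t[n] + a[n] * g[n-1] is a first-order
    time-varying all-pole filter: its coefficient a[n] switches between the
    attack and release time constants depending on the gain trajectory."""
    level_db = 20.0 * np.log10(np.abs(x) + 1e-8)
    # static gain curve: attenuate level above threshold by the ratio
    over = np.maximum(level_db - threshold_db, 0.0)
    target_db = -over * (1.0 - 1.0 / ratio)
    g_db = np.zeros_like(x)
    prev = 0.0
    for n in range(len(x)):
        # gain must drop -> attack constant; gain recovering -> release
        a = alpha_attack if target_db[n] < prev else alpha_release
        prev = (1.0 - a) * target_db[n] + a * prev
        g_db[n] = prev
    return 10.0 ** (g_db / 20.0)  # linear gain to multiply the input by
```

Making this smoother differentiable, with its signal-dependent coefficient switching, is exactly the kind of recursive structure the paper's TD backpropagation handles without frame-based approximation.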

Implications and Future Directions

The implications of this research extend beyond the immediate enhancements in modeling accuracy and computational efficiency for time-varying audio effects. By providing a means to integrate recursive filters directly into differentiable learning frameworks, this work opens new avenues for the exploration of complex audio processing systems within an end-to-end learning context. Future developments might include extending the approach to a broader range of filter types or exploring its application in other domains where recursive signal processing is prevalent.

Moreover, the successful application of the proposed method in an end-to-end trainable plugin format demonstrates its practical viability and sets the stage for further innovations in the development of highly realistic, learnable audio effects and synthesizers. As the field continues to evolve, the intertwining of deep learning techniques with traditional signal processing methodologies promises to yield increasingly sophisticated and versatile audio processing tools.

Conclusion

This paper represents a significant step forward in integrating recursive audio processing structures within end-to-end trainable frameworks. The proposed efficient backpropagation algorithm for time-varying all-pole filters addresses longstanding challenges and opens up new possibilities for modeling and designing advanced audio effects and systems. As the boundaries between traditional signal processing and machine learning continue to blur, the methodologies presented in this work will likely play a foundational role in shaping the future of audio processing research and development.