
Differentiable All-pole Filters for Time-varying Audio Systems (2404.07970v4)

Published 11 Apr 2024 in eess.AS, cs.LG, and cs.SD

Abstract: Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous works, they cannot accurately reflect the gradient of the original system. We alleviate this difficulty by re-expressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and compressor. We make our code and audio samples available and provide the trained audio effect and synth models in a VST plugin at https://diffapf.github.io/web/.


Summary

  • The paper presents a novel backpropagation algorithm for time-varying all-pole filters that integrates recursive audio models into end-to-end learning frameworks.
  • It computes exact gradients using time-domain reversing operations and custom filtering to overcome limitations of traditional frame-based and frequency domain methods.
  • The approach accurately models phaser, synthesizer, and compressor systems, demonstrating improved numerical results and real-time operational viability.

Efficient Differentiable Time-Varying All-Pole Filters for Audio Modeling

Introduction

Infinite impulse response (IIR) filters are central to many time-varying audio processing systems, such as phasers, synthesizers, and dynamics processors. Integrating recursive IIR filters into differentiable signal processing pipelines has long been difficult because their sample-by-sample recursion is costly to unroll in automatic differentiation frameworks. Prior work therefore resorted to frequency-domain evaluation or frame-based approximations, which introduce limitations of their own, including artifacts and poor generalization when the trained filter is later run sample by sample in real time. In contrast, this work develops an efficient backpropagation algorithm for a time-varying all-pole filter, removing the historical impediments to end-to-end training of systems with recursive filter structures.
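To make the object of study concrete, a time-varying all-pole filter computes each output sample from the current input and K feedback terms whose coefficients change every sample. A minimal sketch (one common sign convention, assumed here rather than taken from the paper) illustrates why naive autodiff struggles: the per-sample loop below would put every iteration on the autodiff tape.

```python
import numpy as np

def time_varying_allpole(x, A):
    """Time-varying all-pole filter (one common convention, assumed here):
        y[n] = x[n] - sum_k A[n, k] * y[n - k - 1]
    x: input signal of length N; A: (N, K) array of per-sample coefficients.
    The recursion makes each output depend on all previous outputs, which is
    what a naive autodiff framework must unroll step by step."""
    N, K = A.shape
    y = np.zeros(N)
    for n in range(N):
        acc = x[n]
        for k in range(K):
            if n - k - 1 >= 0:
                acc -= A[n, k] * y[n - k - 1]
        y[n] = acc
    return y
```

With constant coefficients this reduces to an ordinary (time-invariant) all-pole filter, which is the special case most prior differentiable approaches target.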

Proposed Methodology

The core contribution of this paper is the derivation and implementation of a backpropagation algorithm for time-varying all-pole filters that can be incorporated into differentiable digital signal processing workflows without approximation. This approach, termed the time-domain (TD) method, computes exact gradients for every filter parameter efficiently, enabling end-to-end training of systems with recursive elements. Rather than unrolling the recursion on the autodiff tape, the backward pass is expressed in closed form using time-reversal operations and further filtering passes, so the same efficient filter routine serves both the forward and backward computations, improving speed in both directions.
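The key identity, sketched here for the time-invariant special case rather than the paper's full time-varying derivation, is that the adjoint of an all-pole filter is the same filter run backwards in time: the gradient with respect to the input is obtained by time-reversing the output gradient, filtering it through the same all-pole filter, and reversing again; the coefficient gradients then follow from inner products with delayed outputs.

```python
import numpy as np

def allpole(x, a):
    """Time-invariant all-pole filter: y[n] = x[n] - sum_k a[k] * y[n-k-1]."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] - sum(a[k] * y[n - k - 1]
                          for k in range(len(a)) if n - k - 1 >= 0)
    return y

def allpole_backward(dy, y, a):
    """Exact gradients w.r.t. the input x and coefficients a, using the
    adjoint identity: the transpose of an all-pole filter is the same
    filter applied to the time-reversed signal.
      dL/dx    = flip(allpole(flip(dL/dy), a))
      dL/da[k] = -sum_n dL/dx[n] * y[n-k-1]
    """
    dx = allpole(dy[::-1], a)[::-1]
    da = np.array([-np.dot(dx[k + 1:], y[:len(y) - k - 1])
                   for k in range(len(a))])
    return dx, da
```

Because the backward pass is itself just filtering plus time reversal, it runs at the same cost as the forward pass, instead of the per-sample tape traversal a framework's autodiff would perform.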

Applications and Results

The effectiveness of the proposed time-varying all-pole filter implementation is demonstrated on three representative audio processing systems: a phaser, a time-varying subtractive synthesizer, and a feed-forward compressor. The results underscore the method's ability to model, accurately and efficiently, the complex time-varying behavior characteristic of analog audio devices. Notably, systems trained with the proposed approach transition seamlessly to real-time, sample-by-sample operation without the generalization issues often observed with frequency-domain or frame-based approximations.

For each application, training produced models that closely approximated the behavior of the target analog systems, yielding strong numerical results and clear improvements over traditional approaches. The phaser and synthesizer models in particular highlighted the method's capacity to capture nuanced dynamic changes over time, a quality often lost in frequency-domain approximations.
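As one illustration of where such a filter appears in practice, a feed-forward compressor's gain smoothing ("ballistics") is commonly a one-pole filter whose coefficient switches between attack and release values depending on the gain trajectory, i.e., a first-order time-varying all-pole filter. The following is a generic textbook-style sketch, not the paper's exact model; all parameter names and default values here are illustrative assumptions.

```python
import numpy as np

def compressor_gain(x, threshold_db=-20.0, ratio=4.0,
                    alpha_attack=0.9, alpha_release=0.999):
    """Feed-forward compressor gain with one-pole ballistics (illustrative).
    The smoother g[n] = (1 - a[n]) * t[n] + a[n] * g[n-1] is a first-order
    time-varying all-pole filter: its coefficient a[n] switches between the
    attack and release time constants depending on the gain trajectory."""
    level_db = 20.0 * np.log10(np.abs(x) + 1e-8)
    # static gain curve: attenuate level above threshold by the ratio
    over = np.maximum(level_db - threshold_db, 0.0)
    target_db = -over * (1.0 - 1.0 / ratio)
    g_db = np.zeros_like(x)
    prev = 0.0
    for n in range(len(x)):
        # gain must drop -> attack constant; gain recovering -> release
        a = alpha_attack if target_db[n] < prev else alpha_release
        prev = (1.0 - a) * target_db[n] + a * prev
        g_db[n] = prev
    return 10.0 ** (g_db / 20.0)  # linear gain to multiply the input by
```

Making this smoother differentiable, with its signal-dependent coefficient switching, is exactly the kind of recursive structure the paper's TD backpropagation handles without frame-based approximation.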

Implications and Future Directions

The implications of this research extend beyond the immediate enhancements in modeling accuracy and computational efficiency for time-varying audio effects. By providing a means to integrate recursive filters directly into differentiable learning frameworks, this work opens new avenues for the exploration of complex audio processing systems within an end-to-end learning context. Future developments might include extending the approach to a broader range of filter types or exploring its application in other domains where recursive signal processing is prevalent.

Moreover, the successful application of the proposed method in an end-to-end trainable plugin format demonstrates its practical viability and sets the stage for further innovations in the development of highly realistic, learnable audio effects and synthesizers. As the field continues to evolve, the intertwining of deep learning techniques with traditional signal processing methodologies promises to yield increasingly sophisticated and versatile audio processing tools.

Conclusion

This paper represents a significant step forward in integrating recursive audio processing structures within end-to-end trainable frameworks. The proposed efficient backpropagation algorithm for time-varying all-pole filters addresses longstanding challenges and opens up new possibilities for modeling and designing advanced audio effects and systems. As the boundaries between traditional signal processing and machine learning continue to blur, the methodologies presented in this work will likely play a foundational role in shaping the future of audio processing research and development.