DiffMoog: a Differentiable Modular Synthesizer for Sound Matching (2401.12570v1)
Abstract: This paper presents DiffMoog, a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments. Being differentiable, it allows integration into neural networks, enabling automated sound matching to replicate a given audio input. Notably, DiffMoog provides modulation capabilities (FM/AM), low-frequency oscillators (LFOs), filters, envelope shapers, and the ability for users to create custom signal chains. We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework. This framework utilizes a novel signal-chain loss and an encoder network that self-programs its outputs to predict DiffMoog's parameters based on the user-defined modular architecture. Moreover, we provide insights and lessons learned towards sound matching using differentiable synthesis. Combining robust sound capabilities with a holistic platform, DiffMoog stands as a premier asset for expediting research in audio synthesis and machine learning.
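To illustrate the core idea behind a differentiable synthesizer, the sketch below (not DiffMoog's actual API; the oscillator, loss, and parameter values are illustrative assumptions) shows how writing a synth module with differentiable operations lets gradients from an audio loss flow back to the synth's parameters, which is what makes automated sound matching by gradient descent possible.

```python
# Minimal sketch: recover an FM modulation index by backpropagating a
# spectral loss through a differentiable oscillator (toy example, not DiffMoog code).
import torch

SR = 16000                                  # sample rate (Hz)
t = torch.arange(SR) / SR                   # 1 second of time samples

def fm_oscillator(carrier_hz, mod_hz, mod_index):
    # Simple FM: sin(2*pi*fc*t + I*sin(2*pi*fm*t)); differentiable in mod_index.
    return torch.sin(2 * torch.pi * carrier_hz * t
                     + mod_index * torch.sin(2 * torch.pi * mod_hz * t))

def spec_mag(x):
    # Magnitude STFT used as a simple spectral representation for the loss.
    return torch.stft(x, n_fft=1024, window=torch.hann_window(1024),
                      return_complex=True).abs()

# "Target" sound produced with a hidden modulation index we want to recover.
target = fm_oscillator(440.0, 110.0, torch.tensor(3.0))

# Learnable synth parameter, optimized by gradient descent through the synth.
mod_index = torch.tensor(0.5, requires_grad=True)
optim = torch.optim.Adam([mod_index], lr=0.05)

for step in range(500):
    pred = fm_oscillator(440.0, 110.0, mod_index)
    loss = (spec_mag(pred) - spec_mag(target)).pow(2).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()

print(f"recovered modulation index: {mod_index.item():.2f}")  # should move toward 3.0
```

In the full framework described in the paper, an encoder network (rather than direct per-sound optimization as above) predicts the parameters of the user-defined module chain, and the loss is computed on the synthesizer's differentiable output.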