
Vocal Timbre Effects with Differentiable Digital Signal Processing (2306.10886v1)

Published 19 Jun 2023 in cs.SD and eess.AS

Abstract: We explore two approaches to creatively altering vocal timbre using Differentiable Digital Signal Processing (DDSP). The first approach is inspired by classic cross-synthesis techniques. A pretrained DDSP decoder predicts a filter for a noise source and a harmonic distribution, based on pitch and loudness information extracted from the vocal input. Before synthesis, the harmonic distribution is modified by interpolating between the predicted distribution and the harmonics of the input. We provide a real-time implementation of this approach in the form of a Neutone model. In the second approach, autoencoder models are trained on datasets consisting of both vocal and instrument training data. To apply the effect, the trained autoencoder attempts to reconstruct the vocal input. We find that there is a desirable "sweet spot" during training, where the model has learned to reconstruct the phonetic content of the input vocals, but is still affected by the timbre of the instrument mixed into the training data. After further training, that effect disappears. A perceptual evaluation compares the two approaches. We find that the autoencoder in the second approach is able to reconstruct intelligible lyrical content without any explicit phonetic information provided during training.
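The interpolation step of the first approach can be illustrated with a minimal sketch. The abstract does not state how the interpolation is computed, so the linear blend and renormalization below are assumptions, and the names `interpolate_harmonics`, `predicted`, `measured`, and `mix` are hypothetical rather than taken from the paper or the DDSP library.

```python
import numpy as np


def interpolate_harmonics(predicted, measured, mix):
    """Blend two harmonic distributions (assumed linear interpolation).

    predicted: harmonic distribution from the decoder, shape (..., n_harmonics)
    measured:  harmonic distribution of the vocal input, same shape
    mix:       0.0 keeps the predicted distribution, 1.0 keeps the input's
    """
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    if predicted.shape != measured.shape:
        raise ValueError("distributions must have the same shape")
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")

    blended = (1.0 - mix) * predicted + mix * measured
    # Renormalize so the harmonic amplitudes sum to 1 along the last axis.
    total = blended.sum(axis=-1, keepdims=True)
    return blended / np.maximum(total, 1e-12)


# Illustrative values only: one frame with three harmonics.
decoder_frame = [0.5, 0.3, 0.2]
vocal_frame = [0.2, 0.2, 0.6]
halfway = interpolate_harmonics(decoder_frame, vocal_frame, mix=0.5)
```

In a sketch like this, `mix` plays the role of the effect's control parameter: moving it from 0 to 1 shifts the harmonic content from the instrument model's prediction toward the input vocal.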

Authors (2)
  1. David Südholt (3 papers)
  2. Cumhur Erkut (6 papers)