DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation (2208.04756v2)

Published 9 Aug 2022 in cs.SD and eess.AS

Abstract: A vocoder is a conditional audio generation model that converts acoustic features such as mel-spectrograms into waveforms. Taking inspiration from Differentiable Digital Signal Processing (DDSP), we propose a new vocoder named SawSing for singing voices. SawSing synthesizes the harmonic part of singing voices by filtering a sawtooth source signal with a linear time-variant finite impulse response filter whose coefficients are estimated from the input mel-spectrogram by a neural network. As this approach enforces phase continuity, SawSing can generate singing voices without the phase-discontinuity glitch of many existing vocoders. Moreover, the source-filter assumption provides an inductive bias that allows SawSing to be trained on a small amount of data. Our experiments show that SawSing converges much faster and outperforms state-of-the-art generative adversarial network and diffusion-based vocoders in a resource-limited scenario with only 3 training recordings and a 3-hour training time.
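The harmonic branch described in the abstract can be sketched as two steps. First, a sawtooth source is generated from an integrated phase, so that phase carries over smoothly from one frame to the next. Second, that source is filtered with a different FIR filter in each frame. The sketch below is a minimal illustration, not the authors' implementation, and it rests on several assumptions. The frame-rate f0 contour is taken as a given input (the abstract does not say where it comes from). The per-frame FIR coefficients `fir_frames` are supplied directly in place of the neural network's estimate from the mel-spectrogram. The time variation is realized by frame-wise convolution with overlap-add. The function names, hop size, and sample rate are all hypothetical.

```python
import numpy as np


def sawtooth_from_f0(f0_frames, hop, sr):
    """Naive (non-band-limited) sawtooth driven by a frame-rate f0 contour in Hz.

    Assumption: f0 is held constant within each hop-sized frame.
    """
    # Upsample f0 to the audio rate by repeating each frame value.
    f0 = np.repeat(np.asarray(f0_frames, dtype=float), hop)
    # Integrate instantaneous frequency (in cycles). Because the phase is a
    # running sum, it never jumps at frame boundaries: this is the phase continuity.
    phase = np.cumsum(f0 / sr)
    # Map the fractional phase in [0, 1) to a ramp in [-1, 1).
    return 2.0 * (phase - np.floor(phase)) - 1.0


def ltv_fir_filter(x, fir_frames, hop):
    """Filter x with one FIR filter per hop-sized frame, using overlap-add.

    fir_frames has shape (n_frames, taps). Row i filters samples [i*hop, (i+1)*hop).
    Assumes len(x) == n_frames * hop.
    """
    n_frames, taps = fir_frames.shape
    y = np.zeros(n_frames * hop + taps - 1)
    for i in range(n_frames):
        seg = x[i * hop:(i + 1) * hop]
        # Convolve this frame's samples with this frame's filter. The tail
        # (taps - 1 samples) spills into the next frame and is summed there.
        y[i * hop:i * hop + len(seg) + taps - 1] += np.convolve(seg, fir_frames[i])
    # Drop the final tail so the output length matches the input.
    return y[:n_frames * hop]


if __name__ == "__main__":
    sr, hop, n_frames, taps = 24000, 256, 100, 32  # hypothetical settings
    f0 = np.linspace(200.0, 400.0, n_frames)       # hypothetical rising pitch contour
    # Hypothetical stand-in for the network output: a moving-average low-pass in every frame.
    fir = np.full((n_frames, taps), 1.0 / taps)
    harmonic = ltv_fir_filter(sawtooth_from_f0(f0, hop, sr), fir, hop)
    print(harmonic.shape)
```

Here the source carries all the pitch and phase information, and the filter only shapes the spectral envelope. This split is the source-filter inductive bias that the abstract credits for training on little data. A practical system would also band-limit the sawtooth to avoid aliasing, which this sketch omits.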

Authors (8)
  1. Da-Yi Wu (6 papers)
  2. Wen-Yi Hsiao (11 papers)
  3. Fu-Rong Yang (2 papers)
  4. Oscar Friedman (1 paper)
  5. Warren Jackson (1 paper)
  6. Scott Bruzenak (1 paper)
  7. Yi-Wen Liu (29 papers)
  8. Yi-Hsuan Yang (89 papers)
Citations (22)