Noise Morphing for Audio Time Stretching (2312.14586v1)
Abstract: This letter introduces a method that improves the quality of audio time stretching by decomposing a sound into sines, transients, and noise and by improving how the noise component is processed. Established methods can already time-stretch sines and transients with high quality, but prior research has offered no robust solution for the noise, or residual, component. The proposed method combines sound decomposition with existing techniques for audio spectral resynthesis. The time-stretched noise component is obtained by morphing its time-interpolated spectral magnitude with a white-noise excitation signal. The method is simple and efficient, and it yields high audio quality. The results of a subjective experiment show that this approach outperforms current state-of-the-art methods at all evaluated stretch factors, and the improvement is most pronounced at extreme stretch factors. The proposed method holds promise for a wide range of applications involving slow-motion media content, such as music and sports video production.
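The noise-morphing step described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the noise component has already been separated from the sines and transients. It also makes several illustrative choices: an STFT size of 1024, a hop of 256, a periodic Hann window, and linear interpolation of magnitudes between neighbouring analysis frames. The "morph" is modeled as multiplying the interpolated magnitude by the STFT of power-normalized white noise. The function names `hann`, `stft`, `istft`, and `stretch_noise` are hypothetical.

```python
import numpy as np


def hann(n_fft):
    """Periodic Hann window; its square overlap-adds to a constant at hop n_fft/4."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)


def stft(x, n_fft=1024, hop=256):
    """Windowed STFT; returns a complex array of shape (frames, n_fft // 2 + 1)."""
    win = hann(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, n_fft=1024, hop=256):
    """Weighted overlap-add; the signal edges fade because fewer windows cover them."""
    win = hann(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1) * win
    out = np.zeros(n_fft + hop * (X.shape[0] - 1))
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
    return out / (np.sum(win ** 2) / hop)  # constant sum of overlapped squared windows


def stretch_noise(noise, alpha, n_fft=1024, hop=256, seed=0):
    """Hypothetical noise-morphing sketch: stretch `noise` by the factor `alpha`."""
    mag = np.abs(stft(noise, n_fft, hop))              # analysis magnitudes (M, K)
    n_in = mag.shape[0]
    n_out = int(np.floor((n_in - 1) * alpha)) + 1      # more frames at the same hop
    pos = np.arange(n_out) / alpha                     # fractional input-frame index
    lo = np.minimum(np.floor(pos).astype(int), n_in - 1)
    hi = np.minimum(lo + 1, n_in - 1)
    frac = (pos - lo)[:, None]
    mag_interp = (1 - frac) * mag[lo] + frac * mag[hi]  # linear interpolation in time

    # White-noise excitation long enough to yield exactly n_out frames.
    rng = np.random.default_rng(seed)
    excitation = rng.standard_normal(n_fft + hop * (n_out - 1))
    W = stft(excitation, n_fft, hop)
    W /= np.sqrt(np.sum(hann(n_fft) ** 2))             # unit expected power per bin

    # Morph: impose the interpolated magnitude on the white-noise spectrum.
    return istft(mag_interp * W, n_fft, hop)
```

For a stretch factor `alpha`, the sketch produces about `alpha` times as many frames at an unchanged hop, so the output is roughly `alpha` times longer. Each output frame carries a spectral magnitude interpolated from the two nearest analysis frames. The phase comes entirely from fresh white noise, so the result remains noise-like at any stretch factor in this sketch.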