
On the Parameter Estimation of Sinusoidal Models for Speech and Audio Signals (2401.01255v1)

Published 2 Jan 2024 in eess.AS and eess.SP

Abstract: In this paper, we examine the parameter estimation performance of three well-known sinusoidal models for speech and audio. The first is the standard Sinusoidal Model (SM), which is based on the Fast Fourier Transform (FFT). The second is the Exponentially Damped Sinusoidal Model (EDSM), which was proposed in the last decade and uses a subspace method for parameter estimation. The third is the extended adaptive Quasi-Harmonic Model (eaQHM), which was recently proposed for AM-FM decomposition and estimates the signal parameters by Least Squares on a set of basis functions that adapt to the local characteristics of the signal. The parameter estimation of each model is briefly described, and its performance is compared to that of the others in terms of signal reconstruction accuracy versus window size on a variety of synthetic signals, and versus the number of sinusoids on real signals. The latter include highly non-stationary signals, such as singing voices and guitar solos. The advantages and disadvantages of each model are presented on synthetic signals, and the application to real signals is then discussed. In conclusion, the eaQHM outperforms the EDSM for medium-to-large analysis window sizes, whereas the EDSM yields higher reconstruction accuracy for smaller analysis windows. Thus, a future research direction appears to be merging the adaptivity of the eaQHM with the parameter estimation robustness of the EDSM in a new paradigm for high-quality analysis and resynthesis of general audio signals.
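
As an informal illustration of the first model described above, here is a minimal sketch of FFT-based sinusoidal analysis and resynthesis in the spirit of the standard SM. It is not the paper's implementation: the Hann window, the crude peak picking without frequency interpolation, and the frame length in the toy example are assumptions made purely for illustration.

```python
import numpy as np

def sm_frame_analysis(frame, fs, n_sines=10):
    """Estimate amplitude, frequency, and phase for the n_sines strongest
    spectral peaks of one analysis frame (crude FFT peak picking)."""
    n = len(frame)
    win = np.hanning(n)
    spec = np.fft.rfft(frame * win)
    mag = np.abs(spec)
    # Local maxima of the magnitude spectrum, strongest first.
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] > mag[k - 1] and mag[k] > mag[k + 1]]
    peaks = sorted(peaks, key=lambda k: mag[k], reverse=True)[:n_sines]
    freqs = np.array([k * fs / n for k in peaks])
    # Divide out the window's coherent gain to recover peak amplitudes.
    amps = np.array([2.0 * mag[k] / win.sum() for k in peaks])
    phases = np.array([np.angle(spec[k]) for k in peaks])
    return amps, freqs, phases

def sm_frame_synthesis(amps, freqs, phases, n, fs):
    """Resynthesize the frame as a sum of stationary sinusoids."""
    t = np.arange(n) / fs
    return sum(a * np.cos(2 * np.pi * f * t + p)
               for a, f, p in zip(amps, freqs, phases))

# Toy usage on a two-tone test frame (both tones fall exactly on FFT bins
# here, so the crude estimator behaves well; real audio needs interpolation).
fs, n = 16000, 1024
t = np.arange(n) / fs
x = 0.8 * np.cos(2 * np.pi * 500 * t + 0.3) + 0.3 * np.cos(2 * np.pi * 1500 * t)
amps, freqs, phases = sm_frame_analysis(x, fs, n_sines=2)
x_hat = sm_frame_synthesis(amps, freqs, phases, n, fs)
print("estimated frequencies (Hz):", np.round(freqs, 1))
```

The core step of the eaQHM, fitting amplitudes by Least Squares on a set of sinusoidal basis functions, can be sketched in the same spirit. The snippet below fits complex amplitudes on fixed complex-exponential bases at given candidate frequencies; the iterative adaptation of the bases to the local AM-FM characteristics of the signal, which is the defining feature of the eaQHM, is omitted here.

```python
def ls_basis_fit(frame, fs, freqs):
    """Least-squares fit of complex amplitudes on complex-exponential basis
    functions at the given frequencies (no adaptive refinement)."""
    n = len(frame)
    t = np.arange(n) / fs
    E = np.exp(2j * np.pi * np.outer(t, np.asarray(freqs, dtype=float)))
    B = np.hstack([E, np.conj(E)])  # conjugate pair per frequency => real-valued fit
    coeffs, *_ = np.linalg.lstsq(B, frame.astype(complex), rcond=None)
    return (B @ coeffs).real        # least-squares reconstruction of the frame
```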

