
Efficient bandwidth extension of musical signals using a differentiable harmonic plus noise model (2311.07363v3)

Published 13 Nov 2023 in cs.SD and eess.AS

Abstract: The task of bandwidth extension addresses the generation of missing high frequencies of audio signals based on knowledge of the low-frequency part of the sound. This task applies to various problems, such as audio coding or audio restoration. In this article, we focus on efficient bandwidth extension of monophonic and polyphonic musical signals using a differentiable digital signal processing (DDSP) model. Such a model is composed of a neural network part with relatively few parameters trained to infer the parameters of a differentiable digital signal processing model, which efficiently generates the output full-band audio signal. We first address bandwidth extension of monophonic signals, and then propose two methods to explicitly handle polyphonic signals. The benefits of the proposed models are first demonstrated on monophonic and polyphonic synthetic data against a baseline and a deep-learning-based ResNet model. The models are next evaluated on recorded monophonic and polyphonic data, for a wide variety of instruments and musical genres. We show that all proposed models surpass a higher-complexity deep learning model on an objective metric computed in the frequency domain. A MUSHRA listening test confirms the superiority of the proposed approach in terms of perceptual quality.
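
As a rough illustration of the harmonic plus noise idea named in the title, the sketch below synthesizes a stationary tone as a sum of sinusoids at integer multiples of a fundamental, plus a noise component. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `harmonic_plus_noise` and its parameters are hypothetical, the parameters are fixed by hand rather than inferred by a neural network, and the noise is unfiltered white noise for simplicity.

```python
import numpy as np


def harmonic_plus_noise(f0, harmonic_amps, noise_gain,
                        sample_rate=16000, duration=0.5, seed=0):
    """Synthesize a stationary harmonic-plus-noise signal (hypothetical helper).

    f0            -- fundamental frequency in Hz
    harmonic_amps -- amplitudes of harmonics 1..K
    noise_gain    -- linear gain applied to white Gaussian noise
    """
    n = int(sample_rate * duration)
    t = np.arange(n) / sample_rate

    amps = np.asarray(harmonic_amps, dtype=float)
    freqs = f0 * np.arange(1, len(amps) + 1)

    # Silence any harmonic at or above the Nyquist frequency to avoid aliasing.
    amps = np.where(freqs < sample_rate / 2, amps, 0.0)

    # Sum of sinusoids: one row per harmonic, summed over the harmonic axis.
    harmonic = (amps[:, None]
                * np.sin(2 * np.pi * freqs[:, None] * t[None, :])).sum(axis=0)

    # Assumption: plain white noise; a DDSP-style model would typically shape it.
    rng = np.random.default_rng(seed)
    noise = noise_gain * rng.standard_normal(n)

    return harmonic + noise


# A band-limited tone keeps only the first few harmonics; a full-band
# version of the same note adds higher harmonics and a little noise.
low_band = harmonic_plus_noise(440.0, [1.0, 0.5, 0.25], noise_gain=0.0)
full_band = harmonic_plus_noise(440.0, [1.0, 0.5, 0.25, 0.12, 0.06, 0.03],
                                noise_gain=0.01)
```

In this framing, bandwidth extension amounts to estimating the amplitudes of the missing upper harmonics and the noise level from the low band, which is the role the abstract assigns to the neural network part of the model.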
