Mono-to-stereo through parametric stereo generation (2306.14647v1)

Published 26 Jun 2023 in cs.SD, cs.LG, and eess.AS

Abstract: Generating a stereophonic presentation from a monophonic audio signal is a challenging open task, especially if the goal is to obtain realistic spatial imaging with a specific panning of sound elements. In this work, we propose to convert mono to stereo by predicting parametric stereo (PS) parameters using both nearest neighbor and deep network approaches. In combination with PS, we also propose to model the task with generative approaches, which makes it possible to synthesize multiple, equally plausible stereo renditions from the same mono signal. To achieve this, we consider both autoregressive and masked token modelling approaches. We provide evidence that the proposed PS-based models outperform a competitive classical decorrelation baseline and that, within a PS prediction framework, modern generative models outshine equivalent non-generative counterparts. Overall, our work positions both PS and generative modelling as strong and appealing methodologies for mono-to-stereo upmixing. A discussion of the limitations of these approaches is also provided.
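The decoding side of this approach can be made concrete. Given per-band parametric stereo parameters, e.g. inter-channel intensity differences (IID) and inter-channel coherence (ICC), a stereo pair is resynthesized from the mono signal plus a decorrelated copy. Below is a minimal sketch of that step; the function name `ps_synthesize`, the random-phase decorrelator, and the nearest-band parameter mapping are illustrative assumptions rather than the paper's actual decoder, which follows the MPEG-4 parametric stereo formulation.

```python
import numpy as np
from scipy.signal import stft, istft

def ps_synthesize(mono, iid_db, icc, fs=44100, nfft=1024, hop=512):
    """Resynthesize a stereo pair from a mono signal using per-band
    PS-style parameters (simplified illustrative sketch).

    iid_db: (n_bands, n_frames) inter-channel intensity difference in dB
    icc:    (n_bands, n_frames) inter-channel coherence in [0, 1]
    Assumes one parameter frame per STFT frame.
    """
    _, _, M = stft(mono, fs=fs, nperseg=nfft, noverlap=nfft - hop)
    # Crude decorrelator: a fixed random phase rotation per frequency bin
    # (real PS decoders use all-pass filters instead).
    rng = np.random.default_rng(0)
    D = M * np.exp(1j * rng.uniform(-np.pi, np.pi, size=M.shape[0]))[:, None]

    # Map the coarse parameter bands onto STFT bins (nearest-band lookup).
    n_bins, n_bands = M.shape[0], iid_db.shape[0]
    band = np.minimum(np.arange(n_bins) * n_bands // n_bins, n_bands - 1)
    iid = 10.0 ** (iid_db[band, :] / 20.0)   # linear L/R amplitude ratio
    c = icc[band, :]
    s = np.sqrt(np.maximum(1.0 - c ** 2, 0.0))

    # Energy-preserving panning gains from IID; coherent + decorrelated mix
    # (a simplification of the full PS mixing matrices).
    gl = iid / np.sqrt(1.0 + iid ** 2)
    gr = 1.0 / np.sqrt(1.0 + iid ** 2)
    L = gl * (c * M + s * D)
    R = gr * (c * M - s * D)

    _, left = istft(L, fs=fs, nperseg=nfft, noverlap=nfft - hop)
    _, right = istft(R, fs=fs, nperseg=nfft, noverlap=nfft - hop)
    return np.stack([left, right])
```

In this framing, the learning problem reduces to predicting `iid_db` and `icc` from the mono input, whether by nearest neighbor lookup or by a deep network; the stereo waveform itself is never regressed directly.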
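For the generative variant, one concrete reading of masked token modelling is MaskGIT-style iterative parallel decoding over a sequence of quantized PS-parameter tokens: start fully masked, and at each step commit the most confident sampled tokens while re-masking the rest. A minimal sketch, assuming the PS parameters have been quantized into discrete tokens and that `predict_logits` stands in for a trained masked token model conditioned on the mono signal (both are assumptions, not the paper's exact setup):

```python
import numpy as np

def masked_decode(predict_logits, n_tokens, vocab_size, steps=8, mask_id=-1, seed=None):
    """MaskGIT-style iterative decoding (illustrative sketch).

    predict_logits(tokens) -> (n_tokens, vocab_size) array of logits,
    where still-masked positions in `tokens` carry `mask_id`.
    """
    rng = np.random.default_rng(seed)
    tokens = np.full(n_tokens, mask_id, dtype=np.int64)
    for step in range(steps):
        logits = predict_logits(tokens)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        # Sample a candidate token and its confidence at every position.
        sampled = np.array([rng.choice(vocab_size, p=p) for p in probs])
        conf = probs[np.arange(n_tokens), sampled]
        conf[tokens != mask_id] = np.inf  # already-committed tokens stay fixed
        # Cosine schedule: commit a growing fraction of the most confident tokens.
        keep = int(np.ceil(n_tokens * (1.0 - np.cos(np.pi * (step + 1) / (2 * steps)))))
        fixed = np.argsort(-conf)[:max(keep, step + 1)]
        tokens[fixed] = np.where(tokens[fixed] == mask_id, sampled[fixed], tokens[fixed])
    return tokens
```

Because each call draws fresh samples, repeated runs with different seeds yield multiple, equally plausible stereo parameter sequences for the same mono input, which is the property the abstract emphasizes.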

