Mono-to-stereo through parametric stereo generation (2306.14647v1)
Abstract: Generating a stereophonic presentation from a monophonic audio signal is a challenging open task, especially if the goal is to obtain a realistic spatial image with a specific panning of sound elements. In this work, we propose to convert mono to stereo by predicting parametric stereo (PS) parameters using both nearest-neighbor and deep network approaches. In combination with PS, we also propose to model the task with generative approaches, which allows synthesizing multiple, equally plausible stereo renditions from the same mono signal. To achieve this, we consider both autoregressive and masked token modelling approaches. We provide evidence that the proposed PS-based models outperform a competitive classical decorrelation baseline and that, within a PS prediction framework, modern generative models outshine equivalent non-generative counterparts. Overall, our work positions both PS and generative modelling as strong and appealing methodologies for mono-to-stereo upmixing. A discussion of the limitations of these approaches is also provided.
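The abstract builds on parametric stereo (PS), where a stereo signal is reconstructed from a mono downmix plus a few perceptual parameters, typically an inter-channel intensity difference (IID) and an inter-channel correlation (ICC). As an illustrative sketch (not the paper's model, and full-band rather than the per-band, filter-bank processing a real PS decoder uses), the snippet below synthesizes stereo by panning the mono signal with IID-derived gains and mixing in a crudely decorrelated copy to hit a target ICC; `decorrelate` and `ps_upmix` are hypothetical helper names:

```python
import numpy as np

def decorrelate(mono, delay=441):
    """Crude decorrelator via a short delay (a real PS decoder
    uses all-pass filters instead)."""
    d = np.zeros_like(mono)
    d[delay:] = mono[:-delay]
    return d

def ps_upmix(mono, iid_db, icc):
    """Toy full-band parametric-stereo synthesis.

    iid_db: inter-channel intensity difference in dB (left over right)
    icc:    target inter-channel correlation in [0, 1]
    """
    # Energy-normalized per-channel gains from the intensity difference.
    c = 10.0 ** (iid_db / 20.0)
    gl = c / np.sqrt(1.0 + c**2)
    gr = 1.0 / np.sqrt(1.0 + c**2)
    # Blend the mono signal with its decorrelated copy so that the
    # channel correlation equals icc (assuming the copy is uncorrelated
    # with the original and has equal energy).
    a = np.sqrt((1.0 + icc) / 2.0)
    b = np.sqrt((1.0 - icc) / 2.0)
    d = decorrelate(mono)
    left = gl * (a * mono + b * d)
    right = gr * (a * mono - b * d)
    return np.stack([left, right])
```

With `L = g_l(a m + b d)` and `R = g_r(a m - b d)`, and `d` uncorrelated with `m` at equal energy, the channel correlation works out to `a² − b² = icc` and the energy ratio to `g_l/g_r`, which is exactly what the IID and ICC parameters encode.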