
Active Restoration of Lost Audio Signals Using Machine Learning and Latent Information (2111.10891v4)

Published 21 Nov 2021 in eess.AS, cs.LG, and cs.SD

Abstract: Digital audio signal reconstruction of a lost or corrupt segment using deep learning algorithms has been explored intensively in recent years. Nevertheless, traditional methods based on linear interpolation, phase coding and tone insertion are still in vogue. However, we found no prior work on reconstructing audio signals through a fusion of dithering, steganography, and machine learning regressors. Therefore, this paper proposes combining steganography, halftoning (dithering), and state-of-the-art shallow and deep learning methods. The results (including comparisons against SPAIN, autoregressive, deep-learning-based, graph-based, and other methods) are evaluated with three different metrics. These results show that the proposed solution is effective and can enhance audio signal reconstruction by exploiting the side information (e.g., a latent representation) that steganography provides. Moreover, this paper proposes a novel framework for reconstruction from heavily compressed embedded audio data using halftoning (i.e., dithering) and machine learning, which we term HCR (halftone-based compression and reconstruction). This work may trigger interest in optimising this approach and/or transferring it to other domains (e.g., image reconstruction). Compared to existing methods, we show improved inpainting performance in terms of signal-to-noise ratio (SNR), the objective difference grade (ODG) and Hansen's audio quality metric. In particular, our proposed framework outperformed the learning-based methods (D2WGAN and SG) and the traditional statistical algorithms (e.g., SPAIN, TDC, WCP).
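The halftoning (dithering) component the abstract refers to can be illustrated with a minimal 1-D error-diffusion quantizer in the spirit of Floyd and Steinberg's algorithm (reference 11 below). This is a sketch under our own assumptions, not the paper's implementation: the function name, the number of quantization levels, and the single-neighbour error diffusion are illustrative choices.

```python
import numpy as np

def dither_1d(signal, levels=4):
    """Illustrative 1-D error-diffusion quantizer (Floyd-Steinberg style).

    Each sample is snapped to one of `levels` uniform levels in [-1, 1],
    and the quantization error is pushed onto the next sample, so the
    running average of the coarse output still tracks the input.
    """
    x = np.asarray(signal, dtype=float)
    step = 2.0 / (levels - 1)        # spacing between quantization levels
    out = np.empty_like(x)
    err = 0.0                        # error carried forward from the previous sample
    for i in range(len(x)):
        v = x[i] + err               # add the diffused error before quantizing
        k = round((v + 1.0) / step)  # nearest level index
        q = float(np.clip(k * step - 1.0, -1.0, 1.0))
        out[i] = q
        err = v - q                  # residual to diffuse onto the next sample
    return out

# Toy usage: a low-amplitude sine wave reduced to 2-bit resolution.
t = np.linspace(0, 1, 8000, endpoint=False)
x = 0.5 * np.sin(2 * np.pi * 440 * t)
y = dither_1d(x, levels=4)
```

Because the residual is carried forward rather than discarded, the mean of `y - x` stays near zero even though each individual sample is coarsely quantized, which is what makes such a representation a plausible heavily compressed carrier for later learned reconstruction.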

References (20)
  1. P.P. Ebner and A. Eltelt, “Audio inpainting with generative adversarial network,” arXiv preprint arXiv:2003.07704, 2020.
  2. A. Marafioti, N. Perraudin, N. Holighaus, and P. Majdak, “A context encoder for audio inpainting,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 12, pp. 2362–2372, 2019.
  3. G. M. Khan and N. M. Khan, “Real-time lossy audio signal reconstruction using novel sliding based multi-instance linear regression/random forest and enhanced CGPANN,” Neural Processing Letters, pp. 1–29, 2020.
  4. B.-K. Lee and J.-H. Chang, “Packet loss concealment based on deep neural networks for digital speech transmission,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 2, pp. 378–387, 2015.
  5. N. M. Khan and G. M. Khan, “Audio signal reconstruction using cartesian genetic programming evolved artificial neural network (CGPANN),” in 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2017, pp. 568–573.
  6. R. Sperschneider, J. Sukowski, and G. Marković, “Delay-less frequency domain packet-loss concealment for tonal audio signals,” in 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2015, pp. 766–770.
  7. O. Mokrý, P. Záviška, P. Rajmic and V. Veselý, “Introducing SPAIN (SParse Audio INpainter),” 2019 27th European Signal Processing Conference (EUSIPCO), 2019, pp. 1-5, doi: 10.23919/EUSIPCO.2019.8902560.
  8. S. Kitic, N. Bertin, and R. Gribonval, “Sparsity and cosparsity for audio declipping: A flexible non-convex approach,” in Proc. 12th International Conference on Latent Variable Analysis and Signal Separation, Liberec, Czech Republic, Aug. 2015.
  9. A. J. E. M. Janssen, R. N. J. Veldhuis, and L. B. Vries, “Adaptive interpolation of discrete-time signals that can be modeled as autoregressive processes,” IEEE Trans. Acoustics, Speech and Signal Processing, vol. 34, no. 2, pp. 317–330, 1986.
  10. M. Hasannezhad, W.-P. Zhu, and B. Champagne, “A novel low-complexity attention-driven composite model for speech enhancement,” in 2021 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2021, pp. 1–5.
  11. R. W. Floyd and L. Steinberg, “An adaptive algorithm for spatial greyscale,” in Proceedings of the Society for Information Display, vol. 17, no. 2, pp. 75–77, 1976.
  12. T.-H. Kim and S. I. Park, “Deep context-aware descreening and rescreening of halftone images,” ACM Transactions on Graphics, vol. 37, no. 4, pp. 1–12, 2018.
  13. Y. Li, J.B. Huang, N. Ahuja, M.H. Yang, “Deep Joint Image Filtering.” In: B. Leibe, J. Matas, N. Sebe, M. Welling. (eds) Computer Vision – ECCV’16. 2016. Lecture Notes in Computer Science, vol 9908. Springer, Cham.
  14. S.K. Dasari, A. Cheddad, P. Andersson, “Random Forest Surrogate Models to Support Design Space Exploration in Aerospace Use-Case,” In: MacIntyre J., Maglogiannis I., Iliadis L., Pimenidis E. (Eds.) Artificial Intelligence Applications and Innovations. AIAI 2019, Vol 559, pp. 532–544, 2019, Springer.
  15. L. Sun, J. Du, L. Dai, and C. Lee, “Multiple-target deep learning for LSTM-RNN based speech enhancement,” in Proc. Hands-free Speech Communications and Microphone Arrays (HSCMA’17). IEEE, San Francisco, 2017, pp. 136–140.
  16. P. Yogarajah, J. Condell, K. Curran, P. McKevitt, and A. Cheddad, “A dynamic threshold approach for skin tone detection in colour images,” International Journal of Biometrics, vol. 4, no. 1, pp. 38, 2012.
  17. J.M. Lilly, “Element Analysis: A Wavelet-Based Method for Analysing Time-Localized Events in Noisy Time Series.” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 473, no. 2200 (April 30, 2017): 20160776. https://doi.org/10.1098/rspa.2016.0776.
  18. A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” J. Math. Imag. Vis., vol. 40, pp. 120–145, 2011.
  19. P. Combettes and J. Pesquet, “Proximal splitting methods in signal processing,” Fixed-Point Algorithms Inverse Problems Sci. Eng., vol. 49, pp. 185–212, 2011.
  20. A. Adler, V. Emiya, M. G. Jafari, M. Elad, R. Gribonval, and M. D. Plumbley, “Audio Inpainting,” IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 3, pp. 922–932, Mar. 2012.
Citations (1)
