
Monaural speech enhancement on drone via Adapter based transfer learning (2405.10022v1)

Published 16 May 2024 in eess.AS and cs.SD

Abstract: Monaural speech enhancement on drones is challenging because the ego-noise from the rotating motors and propellers leads to extremely low signal-to-noise ratios at onboard microphones. Although recent masking-based deep neural network methods excel at monaural speech enhancement, they struggle in the challenging drone-noise scenario. Furthermore, existing drone noise datasets are limited, causing models to overfit. Considering the harmonic nature of drone noise, this paper proposes a frequency-domain bottleneck adapter to enable transfer learning. Specifically, the adapter's parameters are trained on drone noise while the parameters of the pre-trained Frequency Recurrent Convolutional Recurrent Network (FRCRN) are kept fixed. Evaluation results demonstrate that the proposed method effectively enhances speech quality. Moreover, it is a more efficient alternative to fine-tuning models for various drone types, which typically requires substantial computational resources.
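The bottleneck-adapter idea described in the abstract can be sketched in isolation. The following is a minimal NumPy illustration, not the paper's implementation: the function name, weight shapes, and zero initialization of the up-projection are assumptions for illustration. In the paper, such adapters are inserted into the frozen pre-trained FRCRN and only the adapter parameters are trained on drone noise.

```python
import numpy as np

def bottleneck_adapter(x, w_down, w_up):
    """Frequency-domain bottleneck adapter (illustrative sketch).

    x      : (frames, freq_bins) features from the frozen backbone
    w_down : (freq_bins, bottleneck) trainable down-projection
    w_up   : (bottleneck, freq_bins) trainable up-projection
    """
    h = np.maximum(x @ w_down, 0.0)  # down-project along frequency, ReLU
    return x + h @ w_up              # up-project and add residual connection

# Example: with w_up initialized to zeros, the adapter is exactly the
# identity, so training starts from the pre-trained network's behavior.
x = np.random.randn(100, 257)            # 100 frames, 257 frequency bins
w_down = np.random.randn(257, 16) * 0.01 # 16-dim bottleneck (assumed size)
w_up = np.zeros((16, 257))               # zero init => identity at start
y = bottleneck_adapter(x, w_down, w_up)
```

Initializing the up-projection to zero is a common adapter convention: it guarantees the adapted network initially matches the frozen backbone, and gradient updates then learn only the drone-noise-specific correction.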

References (23)
  1. “Robust acoustic source localization of emergency signals from micro air vehicles,” in IEEE/RSJ Int. Conf. Intell. Robots Syst. IEEE, 2012, pp. 4737–4742.
  2. “Deep learning models for single-channel speech enhancement on drones,” IEEE Access, vol. 11, pp. 22993–23007, 2023.
  3. “DREGON: Dataset and methods for UAV-embedded sound source localization,” in IEEE/RSJ Int. Conf. Intell. Robots Syst. IEEE, 2018, pp. 1–8.
  4. L. Wang and A. Cavallaro, “Ear in the sky: Ego-noise reduction for auditory micro aerial vehicles,” in IEEE Int. Conf. Adv. Video Signal Based Surveill. IEEE, 2016, pp. 152–158.
  5. L. Wang and A. Cavallaro, “Acoustic sensing from a multi-rotor drone,” IEEE Sens. J., vol. 18, no. 11, pp. 4570–4582, 2018.
  6. I. Cohen, “Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging,” IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 466–475, 2003.
  7. S. S. Haykin, Adaptive filter theory, Pearson Education India, 2002.
  8. “ICASSP 2022 Deep Noise Suppression Challenge,” in IEEE Int. Conf. Acoust. Speech Signal Process., 2022.
  9. K. Tan and D. Wang, “A convolutional recurrent neural network for real-time speech enhancement,” in Interspeech, 2018, vol. 2018, pp. 3229–3233.
  10. “DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement,” arXiv preprint arXiv:2008.00264, 2020.
  11. “FRCRN: Boosting feature representation using frequency recurrence for monaural speech enhancement,” in IEEE Int. Conf. Acoust. Speech Signal Process. IEEE, 2022, pp. 9281–9285.
  12. L. Wang and A. Cavallaro, “Deep learning assisted time-frequency processing for speech enhancement on drones,” IEEE Trans. Emerg. Topics Comput. Intell., vol. 5, no. 6, pp. 871–881, 2020.
  13. “AIRA-UAS: An evaluation corpus for audio processing in unmanned aerial system,” in 2018 International Conference on Unmanned Aircraft Systems (ICUAS). IEEE, 2018, pp. 836–845.
  14. “Audio-visual sensing from a quadcopter: dataset and baselines for source localization and sound enhancement,” in IEEE/RSJ Int. Conf. Intell. Robots Syst. IEEE, 2019, pp. 5320–5325.
  15. “Learning multiple visual domains with residual adapters,” Advances in Neural Information Processing Systems (NeurIPS), vol. 30, 2017.
  16. “Parameter-efficient transfer learning for NLP,” in International Conference on Machine Learning (ICML). PMLR, 2019, pp. 2790–2799.
  17. G. Sinibaldi and L. Marino, “Experimental analysis on the noise of propellers for small UAV,” Appl. Acoust., vol. 74, no. 1, pp. 79–88, 2013.
  18. “Spherical array based drone noise measurements and modelling for drone noise reduction via propeller phase control,” in IEEE Workshop Appl. Signal Process. Audio Acoust. IEEE, 2021, pp. 286–290.
  19. “Complex ratio masking for monaural speech separation,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 24, no. 3, pp. 483–492, 2015.
  20. “SDR – half-baked or well done?,” in IEEE Int. Conf. Acoust. Speech Signal Process. IEEE, 2019, pp. 626–630.
  21. “LibriSpeech: An ASR corpus based on public domain audio books,” in IEEE Int. Conf. Acoust. Speech Signal Process. IEEE, 2015, pp. 5206–5210.
  22. “Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs,” in IEEE Int. Conf. Acoust. Speech Signal Process. IEEE, 2001, vol. 2, pp. 749–752.
  23. J. Jensen and C. H. Taal, “An algorithm for predicting the intelligibility of speech masked by modulated noise maskers,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 24, no. 11, pp. 2009–2022, 2016.
