
Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-linked Inputs (2303.08342v2)

Published 15 Mar 2023 in cs.SD and eess.AS

Abstract: Autonomous soundscape augmentation systems typically use trained models to pick optimal maskers to effect a desired perceptual change. While acoustic information is paramount to such systems, contextual information, including participant demographics and the visual environment, also influences acoustic perception. Hence, we propose modular modifications to an existing attention-based deep neural network, to allow early, mid-level, and late feature fusion of participant-linked, visual, and acoustic features. Ablation studies on module configurations and corresponding fusion methods using the ARAUS dataset show that contextual features improve the model performance in a statistically significant manner on the normalized ISO Pleasantness, to a mean squared error of $0.1194\pm0.0012$ for the best-performing all-modality model, against $0.1217\pm0.0009$ for the audio-only model. Soundscape augmentation systems can thereby leverage multimodal inputs for improved performance. We also investigate the impact of individual participant-linked factors using trained models to illustrate improvements in model explainability.
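
The abstract describes modular early, mid-level, and late fusion of acoustic, visual, and participant-linked features in an attention-based regressor for normalized ISO Pleasantness. Below is a minimal PyTorch sketch of what those three fusion points can look like; all module names, feature dimensions, the attention configuration, and the output-averaging choice for late fusion are illustrative assumptions, not the authors' architecture.

```python
# Sketch of early / mid-level / late fusion of three modalities.
# Feature dimensions and layer choices are assumptions for illustration.
import torch
import torch.nn as nn


class MultimodalFusionRegressor(nn.Module):
    """Predicts a single pleasantness score from three modality features."""

    def __init__(self, audio_dim=128, visual_dim=64, participant_dim=16,
                 hidden_dim=128, fusion="late"):
        super().__init__()
        self.fusion = fusion
        if fusion == "early":
            # Early fusion: concatenate raw features before any encoding.
            self.encoder = nn.Sequential(
                nn.Linear(audio_dim + visual_dim + participant_dim, hidden_dim),
                nn.ReLU(),
            )
            self.head = nn.Linear(hidden_dim, 1)
        else:
            # Per-modality encoders, shared by mid-level and late fusion.
            self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
            self.visual_enc = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.ReLU())
            self.part_enc = nn.Sequential(nn.Linear(participant_dim, hidden_dim), nn.ReLU())
            if fusion == "mid":
                # Mid-level fusion: attention across the stacked modality embeddings.
                self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
                self.head = nn.Linear(hidden_dim, 1)
            else:
                # Late fusion: one prediction per modality, averaged at the output.
                self.heads = nn.ModuleList([nn.Linear(hidden_dim, 1) for _ in range(3)])

    def forward(self, audio, visual, participant):
        if self.fusion == "early":
            z = self.encoder(torch.cat([audio, visual, participant], dim=-1))
            return self.head(z)
        a = self.audio_enc(audio)
        v = self.visual_enc(visual)
        p = self.part_enc(participant)
        if self.fusion == "mid":
            tokens = torch.stack([a, v, p], dim=1)        # (batch, 3, hidden)
            fused, _ = self.attn(tokens, tokens, tokens)  # self-attention over modalities
            return self.head(fused.mean(dim=1))
        preds = [head(z) for head, z in zip(self.heads, (a, v, p))]
        return torch.stack(preds, dim=0).mean(dim=0)


if __name__ == "__main__":
    model = MultimodalFusionRegressor(fusion="mid")
    audio = torch.randn(8, 128)       # e.g. pooled acoustic embedding
    visual = torch.randn(8, 64)       # e.g. pooled visual-scene embedding
    participant = torch.randn(8, 16)  # e.g. demographic / noise-sensitivity scores
    pleasantness = model(audio, visual, participant)  # shape (8, 1), trained with MSE loss
    print(pleasantness.shape)
```

Such a model would be trained with a mean-squared-error objective against the normalized ISO Pleasantness labels, which is the metric reported in the abstract; how the per-modality features are extracted in the paper is not specified here.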

