Multi-level graph learning for audio event classification and human-perceived annoyance rating prediction (2312.09952v1)
Abstract: The WHO's report on environmental noise estimates that 22 million people suffer from chronic annoyance related to noise caused by audio events (AEs) from various sources. Annoyance may lead to health issues and adverse effects on metabolic and cognitive systems. In cities, monitoring noise levels alone does not reveal which AEs are noticeable, let alone how they relate to annoyance. To enable annoyance-related monitoring, this paper proposes a graph-based model that identifies AEs in a soundscape and explores the relations between diverse AEs and the human-perceived annoyance rating (AR). Specifically, it proposes a lightweight multi-level graph learning (MLGL) model, based on local and global semantic graphs, that simultaneously performs audio event classification (AEC) and human annoyance rating prediction (ARP). Experiments show that: 1) MLGL, with 4.1 M parameters, improves AEC and ARP results by using semantic node information in local and global context-aware graphs; 2) MLGL captures the relations between coarse- and fine-grained AEs and AR well; 3) statistical analysis of MLGL results shows that some AEs from different sources correlate significantly with AR, which is consistent with previous research on human perception of these sound sources.
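The abstract does not say how the correlation between AEs and AR is computed. As a rough illustration only, the sketch below computes Pearson and Spearman correlation coefficients between a per-clip event score and an annoyance rating. The data values and the names `event_prob` and `annoyance` are hypothetical and are not taken from the paper, and the sketch does not include a significance test.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical example: predicted probability of one audio event per clip,
# and the annoyance rating assigned to the same clip (made-up values).
event_prob = [0.1, 0.4, 0.35, 0.8, 0.9]
annoyance = [2.0, 4.5, 4.0, 7.5, 8.0]

r = pearson(event_prob, annoyance)
rho = spearman(event_prob, annoyance)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

A coefficient near +1 would suggest that clips where the event is more prominent tend to receive higher annoyance ratings; deciding whether such a relation is significant requires a separate test on real data.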