Decoding Envelope and Frequency-Following EEG Responses to Continuous Speech Using Deep Neural Networks (2312.09768v1)

Published 15 Dec 2023 in eess.AS and cs.SD

Abstract: The electroencephalogram (EEG) offers a non-invasive means by which a listener's auditory system may be monitored during continuous speech perception. Reliable auditory-EEG decoders could facilitate the objective diagnosis of hearing disorders, or find applications in cognitively-steered hearing aids. Previously, we developed decoders for the ICASSP Auditory EEG Signal Processing Grand Challenge (SPGC). These decoders aimed to solve the match-mismatch task: given a short temporal segment of EEG recordings and two candidate speech segments, the task is to identify which of the two speech segments is temporally aligned, or matched, with the EEG segment. The decoders made use of cortical responses to the speech envelope, as well as speech-related frequency-following responses, to relate the EEG recordings to the speech stimuli. Here we comprehensively document the methods by which the decoders were developed. We extend our previous analysis by exploring the association between speaker characteristics (pitch and sex) and classification accuracy, and provide a full statistical analysis of the final performance of the decoders as evaluated on a held-out portion of the dataset. Finally, the generalisation capabilities of the decoders are characterised by evaluating them on an entirely different dataset, which contains EEG recorded under a variety of speech-listening conditions. The results show that the match-mismatch decoders achieve accurate and robust classification, and that they can even serve as auditory attention decoders without additional training.
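The match-mismatch task described in the abstract can be sketched as a two-alternative forced choice over similarity scores. The snippet below is a minimal illustration only, not the paper's method: it assumes the EEG and speech segments have already been mapped into a shared embedding space (the paper uses trained deep neural networks for this), and it substitutes a simple cosine similarity for the learned scoring function. All function and variable names here are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened feature vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_mismatch(eeg_embedding, speech_a, speech_b):
    """Two-alternative forced choice: return 0 if speech_a is judged to be
    the segment temporally aligned (matched) with the EEG segment, else 1.

    Inputs are assumed to be embeddings of the EEG and candidate speech
    segments; cosine similarity stands in for a learned scoring function.
    """
    score_a = cosine_similarity(eeg_embedding, speech_a)
    score_b = cosine_similarity(eeg_embedding, speech_b)
    return 0 if score_a >= score_b else 1
```

Classification accuracy on such a task is then simply the fraction of EEG segments for which the matched speech segment receives the higher score, with 50% corresponding to chance level.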
