
Relating EEG to continuous speech using deep neural networks: a review (2302.01736v4)

Published 3 Feb 2023 in eess.AS and cs.SD

Abstract: Objective. When a person listens to continuous speech, a corresponding response is elicited in the brain and can be recorded using electroencephalography (EEG). Linear models are presently used to relate the EEG recording to the corresponding speech signal, and the ability of such models to find a mapping between the two signals is used as a measure of neural tracking of speech. These models are limited in that they assume a linear EEG-speech relationship, omitting the nonlinear dynamics of the brain. As an alternative, deep learning models have recently been used to relate EEG to continuous speech. Approach. This paper reviews and comments on deep-learning-based studies that relate EEG to continuous speech in single- or multiple-speaker paradigms. We point out recurrent methodological pitfalls and the need for a standard benchmark of model analysis. Main results. We gathered 29 studies. The main methodological issues we found are biased cross-validation, data leakage leading to overfitted models, and data sizes disproportionate to model complexity. In addition, we address requirements for a standard benchmark of model analysis, such as public datasets, common evaluation metrics, and good practices for the match-mismatch task. Significance. We present a review summarizing the main deep-learning-based studies that relate EEG to speech, while addressing methodological pitfalls and important considerations for this newly expanding field. Our study is particularly relevant given the growing application of deep learning in EEG-speech decoding.
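The linear baseline the abstract describes can be sketched as a "backward" decoder: ridge regression from time-lagged EEG channels to the speech envelope, with the Pearson correlation between reconstructed and actual envelope serving as the neural-tracking measure. The sketch below uses synthetic data standing in for real EEG recordings; all sizes and the regularization value are illustrative assumptions, not values from the reviewed studies.

```python
import numpy as np

# Hedged sketch of a linear backward model (envelope reconstruction).
# The "EEG" is simulated as delayed, noisy copies of the envelope.
rng = np.random.default_rng(0)
n_samples, n_channels, n_lags = 2000, 8, 16

envelope = rng.standard_normal(n_samples)                      # stand-in speech envelope
eeg = np.stack([np.roll(envelope, ch) for ch in range(n_channels)], axis=1)
eeg += 0.5 * rng.standard_normal(eeg.shape)                    # additive "neural" noise

def lagged_design(x, n_lags):
    """Stack time-lagged copies of each channel into a design matrix."""
    cols = [np.roll(x, lag, axis=0) for lag in range(n_lags)]
    return np.concatenate(cols, axis=1)

X = lagged_design(eeg, n_lags)                                 # (n_samples, n_channels * n_lags)

# Ridge regression: w = (X^T X + lambda * I)^-1 X^T y
lam = 1e2
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)
reconstruction = X @ w

# Neural-tracking measure: correlation between reconstructed and
# actual envelope (near zero for unrelated EEG/speech pairs).
r = np.corrcoef(reconstruction, envelope)[0, 1]
```

In real use the decoder is trained and evaluated on disjoint data (ideally disjoint subjects or stories), which is exactly where the cross-validation and data-leakage pitfalls the review discusses arise.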
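The match-mismatch task mentioned in the abstract can likewise be sketched in a few lines: for each EEG segment, the model scores a time-aligned ("matched") speech segment against a segment drawn from elsewhere ("mismatched"), and accuracy is the fraction of trials where the matched segment scores higher (chance level 0.5). Here the "model" is a plain correlation on synthetic data, purely to illustrate the evaluation protocol, not any reviewed architecture.

```python
import numpy as np

# Hedged sketch of the match-mismatch evaluation on synthetic segments.
rng = np.random.default_rng(1)
n_trials, seg_len = 200, 64

correct = 0
for _ in range(n_trials):
    envelope = rng.standard_normal(seg_len)
    eeg_feature = envelope + rng.standard_normal(seg_len)  # noisy "response" to the envelope
    mismatched = rng.standard_normal(seg_len)              # unrelated speech segment

    # Score each candidate segment against the EEG feature.
    score_match = np.corrcoef(eeg_feature, envelope)[0, 1]
    score_mismatch = np.corrcoef(eeg_feature, mismatched)[0, 1]
    correct += score_match > score_mismatch

accuracy = correct / n_trials  # chance level is 0.5
```

One of the good practices the review addresses is how the mismatched segment is drawn (e.g., its temporal distance from the matched one), since that choice directly affects task difficulty and comparability across studies.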

Authors (7)
  1. Corentin Puffay (4 papers)
  2. Bernd Accou (5 papers)
  3. Lies Bollens (2 papers)
  4. Mohammad Jalilpour Monesi (5 papers)
  5. Jonas Vanthornhout (6 papers)
  6. Hugo Van hamme (59 papers)
  7. Tom Francart (21 papers)
Citations (31)