Towards Decoding Brain Activity During Passive Listening of Speech (2402.16996v1)

Published 26 Feb 2024 in cs.HC, cs.LG, cs.SD, eess.AS, and q-bio.NC

Abstract: The aim of the study is to investigate the complex mechanisms of speech perception and ultimately decode the electrical changes occurring in the brain while listening to speech. We attempt to decode heard speech from intracranial electroencephalographic (iEEG) data using deep learning methods. The goal is to aid the advancement of brain-computer interface (BCI) technology for speech synthesis and, hopefully, to provide an additional perspective on the cognitive processes of speech perception. This approach diverges from the conventional focus on speech production and instead investigates neural representations of perceived speech. This angle opens up a complex perspective, potentially allowing us to study more sophisticated neural patterns. Leveraging deep learning models, the research aimed to establish a connection between these intricate neural activities and the corresponding speech sounds. Although the approach has not yet achieved a breakthrough, the research sheds light on the potential of decoding neural activity during speech perception. Our current efforts can serve as a foundation, and we are optimistic about expanding and improving upon this work to move closer to more advanced BCIs and a better understanding of the processes underlying perceived speech and its relation to spoken speech.
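The abstract describes mapping iEEG activity to the corresponding speech acoustics but does not spell out the model here. As a hedged illustration only (not the paper's method, which uses deep learning), a common starting baseline for this kind of neural-to-acoustic decoding is a linear ridge regression from per-electrode neural features to spectrogram frames. The sketch below uses entirely synthetic data; the feature choice (e.g. high-gamma band power) and all dimensions are assumptions for demonstration.

```python
import numpy as np

# Illustrative linear decoding baseline on synthetic data. The real task
# maps iEEG recordings to heard-speech acoustics; here a hidden linear
# mapping plus noise stands in, so the decoder has something to recover.

rng = np.random.default_rng(0)
n_frames, n_channels, n_mels = 500, 64, 20

# Synthetic "neural" features (e.g. high-gamma power per electrode per frame).
X = rng.standard_normal((n_frames, n_channels))

# Synthetic target spectrogram from a hidden linear map plus noise.
W_true = rng.standard_normal((n_channels, n_mels))
Y = X @ W_true + 0.1 * rng.standard_normal((n_frames, n_mels))

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

W = fit_ridge(X, Y)
Y_hat = X @ W

# Decoding quality: mean per-mel-bin Pearson correlation between the
# decoded and true spectrograms (a common evaluation in this literature).
corr = float(np.mean(
    [np.corrcoef(Y[:, k], Y_hat[:, k])[0, 1] for k in range(n_mels)]
))
print(round(corr, 3))
```

A deep model (e.g. a convolutional or recurrent network) replaces the linear map when, as the paper suggests, the neural-to-acoustic relationship is too intricate for linear methods, but a baseline like this is useful for sanity-checking features and evaluation before training anything larger.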
