NeuSpeech: Decode Neural signal as Speech (2403.01748v3)

Published 4 Mar 2024 in cs.CL and cs.AI

Abstract: Decoding language from brain dynamics is an important open direction in brain-computer interface (BCI) research, especially given the rapid growth of LLMs. Compared with invasive signals, which require electrode implantation surgery, non-invasive neural signals (e.g. EEG, MEG) have attracted increasing attention for their safety and generality. However, exploration remains inadequate in three respects: 1) previous methods mainly focus on EEG, and none address the problem with MEG, which offers better signal quality; 2) prior works have predominantly used "teacher forcing" during generative decoding, which is impractical; 3) prior works are mostly "BART-based" rather than fully auto-regressive, although fully auto-regressive models perform better in other sequence tasks. In this paper, we explore brain-to-text translation of MEG signals in a speech-decoding formulation. We are the first to investigate a cross-attention-based "Whisper" model for generating text directly from MEG signals without teacher forcing. Our model achieves BLEU-1 scores of 60.30 and 52.89 without pretraining or teacher forcing on two major datasets (GWilliams and Schoffelen). We also conduct a comprehensive study of how the speech-decoding formulation performs on neural decoding tasks, covering pretraining initialization, training and evaluation set splitting, augmentation, and scaling laws. Code is available at https://github.com/NeuSpeech/NeuSpeech1
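The reported results are BLEU-1 scores. As a rough illustration of what BLEU-1 measures, the sketch below computes sentence-level BLEU-1 against a single reference: clipped unigram precision multiplied by a brevity penalty. It is a minimal sketch under stated assumptions (lowercased whitespace tokenization, one reference, sentence-level scoring); the tokenizer, aggregation (sentence vs. corpus level) and tooling behind the 60.30 and 52.89 figures are not specified on this page, and the function name `bleu1` is hypothetical.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 against one reference (hypothetical helper).

    Assumption: lowercased whitespace tokenization; not necessarily the
    tokenizer or aggregation used for the paper's reported scores.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Clipped unigram matches: each candidate word is credited at most as
    # many times as it appears in the reference.
    matches = sum(min(n, ref_counts[w]) for w, n in Counter(cand).items())
    precision = matches / len(cand)
    # Brevity penalty: candidates not longer than the reference are scaled
    # by exp(1 - r/c), which equals 1 when the lengths match.
    c, r = len(cand), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * precision


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(round(100 * bleu1("the cat sat on the mat", ref), 2))  # 100.0
    # Repeating a common word is clipped and also hit by the brevity penalty.
    print(round(100 * bleu1("the the the", ref), 2))
```

Clipping is what stops a decoder from scoring well by repeating frequent words, which matters for neural-signal decoders that tend to collapse onto common tokens.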
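The abstract calls teacher-forced decoding impractical. The toy sketch below illustrates why evaluating under teacher forcing can overstate quality: with teacher forcing each step sees the ground-truth previous token, whereas in free-running (auto-regressive) decoding the model consumes its own outputs, so one early mistake derails everything after it. The lookup-table "model" and all names here (`TOY_MODEL`, `teacher_forced`, `free_running`, `token_accuracy`) are hypothetical stand-ins, not the paper's MEG model or evaluation code.

```python
from typing import Dict, List

BOS, EOS = "<s>", "</s>"

# Hypothetical toy decoder: predicts the next token from the previous token
# only. The single wrong entry ("a" -> "dog") stands in for an imperfect
# neural-signal-conditioned model.
TOY_MODEL: Dict[str, str] = {
    BOS: "a", "a": "dog", "cat": "sat", "sat": "on", "on": "mats",
    "dog": "ran", "ran": "away",
}


def predict_next(prev: str) -> str:
    """Return the toy model's next token; unknown contexts end the sentence."""
    return TOY_MODEL.get(prev, EOS)


def teacher_forced(reference: List[str]) -> List[str]:
    # Each step is conditioned on the ground-truth previous token,
    # so an early mistake never affects later steps.
    inputs = [BOS] + reference[:-1]
    return [predict_next(tok) for tok in inputs]


def free_running(max_len: int) -> List[str]:
    # Each step is conditioned on the model's own previous output,
    # as at real inference time, so mistakes compound.
    out: List[str] = []
    prev = BOS
    for _ in range(max_len):
        tok = predict_next(prev)
        if tok == EOS:
            break
        out.append(tok)
        prev = tok
    return out


def token_accuracy(pred: List[str], reference: List[str]) -> float:
    # Position-wise match rate, normalized by the reference length.
    hits = sum(p == r for p, r in zip(pred, reference))
    return hits / len(reference)


if __name__ == "__main__":
    ref = ["a", "cat", "sat", "on", "mats"]
    tf = teacher_forced(ref)
    fr = free_running(len(ref))
    print(tf, token_accuracy(tf, ref))  # ['a', 'dog', 'sat', 'on', 'mats'] 0.8
    print(fr, token_accuracy(fr, ref))  # ['a', 'dog', 'ran', 'away'] 0.2
```

The same model scores 0.8 under teacher forcing but 0.2 when decoding freely, which is the gap that makes free-running evaluation the more realistic setting.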

Authors (8)
  1. Yiqian Yang (12 papers)
  2. Yiqun Duan (34 papers)
  3. Qiang Zhang (466 papers)
  4. Renjing Xu (72 papers)
  5. Hui Xiong (244 papers)
  6. Hyejeong Jo (5 papers)
  7. Jinni Zhou (16 papers)
  8. Won Hee Lee (5 papers)
Citations (2)
