NeuSpeech: Decode Neural signal as Speech (2403.01748v3)
Abstract: Decoding language from brain dynamics is an important open direction in the realm of brain-computer interface (BCI), especially considering the rapid growth of LLMs. Compared to invasive signals, which require electrode implantation surgery, non-invasive neural signals (e.g. EEG, MEG) have attracted increasing attention because of their safety and generality. However, the exploration is not adequate in three aspects: 1) previous methods mainly focus on EEG, and none of them addresses this problem on MEG, which has better signal quality; 2) prior works have predominantly used "teacher-forcing" during generative decoding, which is impractical; 3) prior works are mostly "BART-based" rather than fully auto-regressive, although fully auto-regressive models perform better in other sequence tasks. In this paper, we explore the brain-to-text translation of MEG signals in a speech-decoding formulation. Here we are the first to investigate a cross-attention-based "whisper" model for generating text directly from MEG signals without teacher forcing. Our model achieves impressive BLEU-1 scores of 60.30 and 52.89 without pretraining and teacher forcing on two major datasets (GWilliams and Schoffelen). This paper conducts a comprehensive review to understand how the speech-decoding formulation performs on neural decoding tasks, including pretraining initialization, training and evaluation set splitting, augmentation, and scaling law. Code is available at https://github.com/NeuSpeech/NeuSpeech1.
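The distinction between teacher-forced and free-running (fully auto-regressive) decoding drawn in the abstract can be illustrated with a toy example. The sketch below is not the NeuSpeech model: `TOY_MODEL`, `predict_next`, `decode_teacher_forced`, and `decode_free_running` are hypothetical names, and the "decoder" is assumed to be a simple lookup table that predicts the next token from the previous one. It only shows why teacher forcing needs the ground-truth text at every step, which is not available at inference time.

```python
from typing import Dict, List

BOS, EOS = "<bos>", "<eos>"

# Hypothetical toy "decoder": predicts the next token from the previous token only.
TOY_MODEL: Dict[str, str] = {
    BOS: "the",
    "the": "dog",  # deliberate mistake: the reference continues with "cat"
    "dog": "ran",
    "ran": EOS,
    "cat": "sat",
    "sat": EOS,
}


def predict_next(prev_token: str) -> str:
    """Return the toy model's prediction for the token following prev_token."""
    return TOY_MODEL.get(prev_token, EOS)


def decode_teacher_forced(reference: List[str]) -> List[str]:
    """Feed the ground-truth previous token at every step (requires the reference)."""
    inputs = [BOS] + reference
    return [predict_next(tok) for tok in inputs]


def decode_free_running(max_len: int = 10) -> List[str]:
    """Feed the model's own previous prediction back in (no reference needed)."""
    output: List[str] = []
    prev = BOS
    for _ in range(max_len):
        nxt = predict_next(prev)
        if nxt == EOS:
            break
        output.append(nxt)
        prev = nxt
    return output


reference = ["the", "cat", "sat"]
teacher_forced = decode_teacher_forced(reference)
free_running = decode_free_running()
print("teacher-forced:", teacher_forced)
print("free-running:  ", free_running)
```

In this toy setting the teacher-forced output recovers "sat" after the wrong prediction "dog", because the ground-truth token "cat" is fed in at the next step. The free-running output carries the mistake forward, so scores obtained with teacher forcing can look better than what the decoder produces when it has only its own predictions to condition on.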
Authors: Yiqian Yang, Yiqun Duan, Qiang Zhang, Renjing Xu, Hui Xiong, Hyejeong Jo, Jinni Zhou, Won Hee Lee