A Silent Speech Decoding System from EEG and EMG with Heterogenous Electrode Configurations (2506.13835v1)

Published 16 Jun 2025 in q-bio.QM, cs.LG, and q-bio.NC

Abstract: Silent speech decoding, which performs unvocalized human speech recognition from electroencephalography/electromyography (EEG/EMG), increases accessibility for speech-impaired humans. However, data collection is difficult and performed using varying experimental setups, making it nontrivial to collect a large, homogeneous dataset. In this study we introduce neural networks that can handle EEG/EMG with heterogeneous electrode placements and show strong performance in silent speech decoding via multi-task training on large-scale EEG/EMG datasets. We achieve improved word classification accuracy in both healthy participants (95.3%), and a speech-impaired patient (54.5%), substantially outperforming models trained on single-subject data (70.1% and 13.2%). Moreover, our models also show gains in cross-language calibration performance. This increase in accuracy suggests the feasibility of developing practical silent speech decoding systems, particularly for speech-impaired patients.

Summary

  • The paper introduces multi-task neural networks that decode silent speech from heterogeneous EEG/EMG data using novel tokenization techniques.
  • The model, which combines HTNet and Conformer architectures, achieved 95.3% word classification accuracy in healthy subjects and 54.5% in a patient with a neurodegenerative condition, across varied electrode configurations.
  • This non-invasive system has promising implications for real-world communication aids for individuals with speech disabilities and advances cross-subject transferability in brain-machine interfaces (BMIs).

Silent Speech Decoding from EEG and EMG: An Analysis of Heterogeneous Electrode Configurations

The paper under review presents an innovative approach to silent speech decoding from electroencephalography (EEG) and electromyography (EMG) signals. It addresses a critical challenge: providing communication solutions for individuals with speech disabilities, such as those with amyotrophic lateral sclerosis (ALS) or those who have undergone laryngectomy. In contrast to invasive brain-machine interfaces (BMIs), which require surgical implantation, the non-invasive methods used in this research offer a more accessible alternative.

Key Contributions

The primary contribution of this paper is the development of neural networks capable of handling EEG/EMG data from heterogeneous electrode configurations through multi-task training. This is achieved through several tokenization techniques: global average pooling, electrode-specific and subject-specific tokenizers, and a novel on-the-fly kernel. These transform data from different electrode configurations into a consistent format suitable for deep neural network (DNN) processing.
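To make the on-the-fly idea concrete, the sketch below shows one plausible implementation: a small hypernetwork maps each electrode's position to a mixing vector, so recordings with any number of electrodes can be pooled into fixed-width tokens. All names and dimensions here (OnTheFlyTokenizer, d_model, the layer sizes) are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch: montage-agnostic tokenization via an on-the-fly kernel.
import torch
import torch.nn as nn

class OnTheFlyTokenizer(nn.Module):
    """Map (batch, n_electrodes, time) signals plus per-electrode 3-D
    coordinates to (batch, time, d_model) tokens, for any n_electrodes."""
    def __init__(self, d_model: int = 128):
        super().__init__()
        # Hypernetwork: electrode (x, y, z) position -> mixing weights.
        self.kernel_net = nn.Sequential(
            nn.Linear(3, 64), nn.GELU(), nn.Linear(64, d_model)
        )

    def forward(self, x: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T); positions: (C, 3) in a shared head coordinate frame.
        w = self.kernel_net(positions)            # (C, d_model)
        tokens = torch.einsum("bct,cd->btd", x, w)
        return tokens / x.shape[1]                # average over electrodes

# Two sessions with different montages share the same weights.
tok = OnTheFlyTokenizer()
out_64 = tok(torch.randn(2, 64, 256), torch.randn(64, 3))
out_32 = tok(torch.randn(2, 32, 256), torch.randn(32, 3))
print(out_64.shape, out_32.shape)  # both torch.Size([2, 256, 128])
```

Global average pooling falls out as the special case where the mixing weights ignore position; electrode- and subject-specific tokenizers would instead look the weights up per electrode or per subject.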

In terms of implementation, the researchers constructed a model combining HTNet and Conformer architectures, which not only accommodates diverse electrode arrangements but also leverages large-scale datasets for training. This model architecture includes a robust tokenizer and position encoder, followed by a Conformer that processes EEG and EMG tokens into latents used for task-specific predictions.
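The overall dataflow could be sketched as follows, using torchaudio's Conformer as a stand-in for the paper's Conformer blocks and taking already-tokenized input (e.g., from a montage-agnostic tokenizer like the one above). The dimensions, learned positional embedding, and mean-pooled classification head are assumptions for illustration.

```python
# Hedged sketch: tokens -> position encoding -> Conformer -> word logits.
import torch
import torch.nn as nn
from torchaudio.models import Conformer

class SilentSpeechDecoder(nn.Module):
    def __init__(self, d_model: int = 128, n_words: int = 32, max_len: int = 1024):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)    # learned positions
        self.encoder = Conformer(input_dim=d_model, num_heads=4, ffn_dim=256,
                                 num_layers=6, depthwise_conv_kernel_size=31)
        self.head = nn.Linear(d_model, n_words)          # task-specific head

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, T, d_model), e.g. the output of a tokenizer as above.
        B, T, _ = tokens.shape
        tokens = tokens + self.pos_emb(torch.arange(T, device=tokens.device))
        lengths = torch.full((B,), T, dtype=torch.long, device=tokens.device)
        latents, _ = self.encoder(tokens, lengths)       # (B, T, d_model)
        return self.head(latents.mean(dim=1))            # (B, n_words) logits

model = SilentSpeechDecoder()
print(model(torch.randn(2, 256, 128)).shape)             # torch.Size([2, 32])
```

Multi-task training would attach one such head per task on top of the shared tokenizer-plus-Conformer trunk, matching the paper's description of latents feeding task-specific predictions.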

Results

The experimental evaluations demonstrate substantial improvements in word classification accuracy for both healthy individuals and a patient with a neurodegenerative condition. Notably, the cross-validated accuracies reach 95.3% for healthy participants and 54.5% for the patient when training on the mixed dataset ('all'), which pools the various types of EEG/EMG data; the corresponding models trained on single-subject data reach only 70.1% and 13.2%, respectively.

The results demonstrate the efficacy of the proposed tokenization methods and underscore the difficulties that varying electrode placements pose for EEG/EMG signal processing. Additionally, the paper highlights that models pretrained on large datasets provide substantial zero-shot performance and calibration advantages on unseen subjects, including cross-language calibration.
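In practice, calibrating a pretrained model to an unseen subject could look like the loop below: freeze the shared trunk and fit only the lightweight, subject-facing parameters on a few labeled trials. The parameter split (tokenizer only), optimizer, and step count are assumptions rather than the paper's recipe, and `model` is assumed to expose a `tokenizer` submodule as in the earlier sketches.

```python
# Hedged sketch: few-shot subject calibration with a frozen pretrained trunk.
import torch

def calibrate(model, calib_loader, steps: int = 200, lr: float = 1e-3):
    # Freeze everything, then unfreeze only the subject-facing tokenizer.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.tokenizer.parameters():
        p.requires_grad = True

    opt = torch.optim.AdamW(model.tokenizer.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    it = iter(calib_loader)
    for _ in range(steps):
        try:
            x, positions, y = next(it)     # a few labeled calibration trials
        except StopIteration:
            it = iter(calib_loader)
            x, positions, y = next(it)
        opt.zero_grad()
        loss = loss_fn(model(x, positions), y)
        loss.backward()
        opt.step()
    return model
```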

Implications and Future Directions

The implications of this research are twofold: practical and theoretical. Practically, the findings suggest a significant step towards developing accessible, reliable silent speech decoding systems, particularly for populations unable to vocalize due to physiological limitations. Theoretically, the paper advances understanding of transferability in BMIs through multi-subject, multi-configuration EEG/EMG data handling.

Moving forward, this research could be extended by exploring multilingual pretraining datasets to enhance cross-linguistic applicability. Collecting larger datasets from patients with varying speech impairments would also help refine model robustness and usability in real-world scenarios. Integration with LLMs could further enable silent speech sentence decoding, broadening the application spectrum of non-invasive brain-machine interface systems.
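One simple form such LLM integration could take, purely as an illustration and not something the paper implements, is rescoring candidate word sequences from the EEG/EMG classifier with a causal language model. The model choice (gpt2 via Hugging Face transformers) and the interpolation weight alpha are assumptions.

```python
# Hedged sketch: LM rescoring of candidate sentences from the word decoder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def lm_log_prob(sentence: str) -> float:
    """Total log-probability of a sentence under the language model."""
    ids = tok(sentence, return_tensors="pt").input_ids
    out = lm(ids, labels=ids)          # loss = mean negative log-likelihood
    return -out.loss.item() * (ids.shape[1] - 1)

def rescore(candidates, alpha: float = 0.5):
    """Pick the (sentence, decoder_score) pair with the best mixed score."""
    return max(candidates,
               key=lambda c: (1 - alpha) * c[1] + alpha * lm_log_prob(c[0]))

# e.g. hypothetical beam candidates from the silent speech decoder:
print(rescore([("i want water", -1.2), ("eye want water", -1.0)])[0])
```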

In conclusion, this work marks significant progress toward practical silent speech decoding, laying the groundwork for further exploration of complex, heterogeneous EEG/EMG data.