Brain Prompt GPT: Neural-to-Text Decoding

Updated 6 April 2026

Brain Prompt GPT (BP-GPT) is a neural decoding framework that converts brain imaging data into open-vocabulary text using prompt-conditioned large language models.
It uses a two-stage process in which text-derived optimal prompts are aligned with brain-derived prompts through a contrastive loss for accurate neural-to-text translation.
Empirical results show improved semantic metrics, indicating BP-GPT’s potential for advancing brain–computer interfaces and neurological diagnostic applications.

Brain Prompt GPT (BP-GPT) is a class of neural decoding frameworks that use brain-derived representations as prompts for LLMs, enabling open-vocabulary and contextually rich text generation directly from brain imaging data. BP-GPT was introduced as a solution to the limitations of small-set or closed-vocabulary brain–computer interface (BCI) decoders by leveraging pretrained autoregressive LLMs in a prompt-based architecture for direct, continuous neural-to-text translation. Core to BP-GPT’s methodology is the contrastive alignment of brain- and text-derived prompts, facilitating robust encoding of semantic information from brain activity into the prompt space of LLMs (Chen et al., 2024, Chen et al., 21 Feb 2025). Recent BP-GPT variants have been extended to multi-modal neural data and have inspired architectural generalizations across neurological disease prediction and multi-modal neural–language modeling (Yang et al., 2024, Xu et al., 12 Apr 2025).

1. Conceptual Foundations and Motivation

BP-GPT emerged to bypass the inherent limitations of existing BCI decoders that operate over small, manually curated vocabularies or rely on direct mapping from neural activity to text space without leveraging the generative prior of pretrained LLMs. In contrast, BP-GPT reconceptualizes neural decoding as an open-vocabulary, prompt-conditioned text generation problem: neural measurement windows (e.g., fMRI, MEG) are mapped into the high-dimensional prompt space used by LLMs such as GPT-2, effectively harnessing LLMs’ linguistic priors for rich semantic reconstruction. This design enables:

End-to-end, open-set text generation with no vocabulary restriction.
Modality-bridging by aligning noisy, high-dimensional brain signals with optimal, text-derived prompts.
Direct exploitation of large-scale LLMs’ semantic, syntactic, and contextual modeling capacity.
Modularity, as more powerful LLMs or new prompt architectures can be substituted into the framework (Chen et al., 2024, Chen et al., 21 Feb 2025).

2. Core Architectural Elements and Mathematical Framework

A canonical BP-GPT system comprises two principal stages:

Stage 1: Text-to-Text “Optimal Prompt” Learning

The ground-truth text stimulus for each neural data window is encoded (typically with a frozen BERT or Llama encoder).
The encoded text representation is mapped (via a learnable transformation) into a sequence of prompt embeddings $\mathit{P}^T_i \in \mathbb{R}^{k \times d}$, where $k$ is the prompt length and $d$ is the embedding width of the LLM.
These text-derived prompts are prepended to the input of an LLM (e.g., GPT-2), and the model is trained (usually with cross-entropy loss) to reconstruct the original text.
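
The Stage 1 mapping can be illustrated with a minimal shape-level sketch. It assumes the "learnable transformation" is a single linear layer and that the text encoder yields one pooled vector per window; neither detail is confirmed by the source. The names `make_linear_map`, `text_to_prompt`, and `prepend_prompt` are hypothetical, and random weights stand in for trained parameters.

```python
import random

def make_linear_map(in_dim, k, d, seed=0):
    """Random weights for a hypothetical linear map from in_dim to k*d values."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(k * d)]

def text_to_prompt(text_vec, weights, k, d):
    """Project one pooled text encoding into k prompt embeddings of width d."""
    flat = [sum(w * x for w, x in zip(row, text_vec)) for row in weights]
    # Reshape the flat k*d vector into k rows of d values each.
    return [flat[j * d:(j + 1) * d] for j in range(k)]

def prepend_prompt(prompt, token_embeddings):
    """Place the prompt embeddings in front of the token embeddings fed to the LLM."""
    return prompt + token_embeddings

# Toy sizes (assumptions): 6-dim text encoding, k = 4 prompt slots, d = 8.
weights = make_linear_map(in_dim=6, k=4, d=8)
prompt = text_to_prompt([1.0] * 6, weights, k=4, d=8)
sequence = prepend_prompt(prompt, [[0.0] * 8 for _ in range(3)])
print(len(prompt), len(prompt[0]), len(sequence))
```

In a real system the LLM would consume `sequence` as input embeddings and the cross-entropy loss would be computed only over the text positions; that part is omitted here.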

Stage 2: Brain-to-Text Prompt Alignment and Decoding

Brain-derived signals for each window (e.g., an fMRI volume $x^B_i$) are encoded by a neural encoder, typically a multi-layer Transformer, into a “brain prompt” $\mathit{P}^B_i$.
This brain prompt is used as the prefix for the LLM, which autoregressively generates text.
Crucially, $\mathit{P}^B_i$ is optimized to be contrastively aligned with the “optimal” $\mathit{P}^T_i$ through a cosine-similarity-based contrastive loss, which reduces the domain gap between brain-derived and text-derived prompts.

Total loss (for brain-to-text training):

$L = L_\mathrm{brain} + \alpha L_C$

where $L_\mathrm{brain}$ is the autoregressive text decoding loss, $L_C$ is the prompt alignment loss (computed with temperature $\tau$), and $\alpha$ is the trade-off weight between the two terms (Chen et al., 2024, Chen et al., 21 Feb 2025).
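
The source does not spell out the exact form of $L_C$. One common choice consistent with the description (cosine similarity, temperature $\tau$) is an InfoNCE-style loss in which each brain prompt must identify its own text prompt among the others in the batch. The sketch below assumes that form, assumes each $k \times d$ prompt has been flattened to a single vector, and uses hypothetical function names and an arbitrary default $\tau = 0.1$.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def contrastive_loss(brain_prompts, text_prompts, tau=0.1):
    """InfoNCE-style alignment loss (assumed form, not taken from the source).

    Row i of brain_prompts is the positive match for row i of text_prompts;
    all other text prompts in the batch act as negatives.
    """
    n = len(brain_prompts)
    total = 0.0
    for i in range(n):
        logits = [cosine(brain_prompts[i], text_prompts[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / n

def total_loss(l_brain, l_c, alpha=1.0):
    """L = L_brain + alpha * L_C, as in the formula above."""
    return l_brain + alpha * l_c

text = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], text)
mismatched = contrastive_loss([[0.0, 1.0], [1.0, 0.0]], text)
print(aligned < mismatched)
```

Brain prompts that point in the same direction as their paired text prompts yield a loss near zero, while swapped pairs are penalized heavily; this is the behaviour the alignment term is meant to encourage.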

3. Training Regimen and Inference Pipeline

Data Preparation: Brain signals are windowed (e.g., 20 s non-overlapping segments for fMRI), with each segment paired with its corresponding transcribed text. Only pre-defined ROIs (commonly bilateral auditory cortex in fMRI studies) are used for voxel selection.
Two-Stage Supervision: Training proceeds in two successive steps. First, the text-to-text path is optimized to yield text prompts that are maximally predictive of the given stimulus; then, the brain-to-text encoder is trained to predict those prompts from the corresponding neural data, while the decoder LLM is optionally fine-tuned for modality transfer.
Inference: At test time, the trained brain encoder outputs a prompt which is injected as a prefix into the LLM. Text is generated token-by-token. Stopping criteria include explicit word count prediction (via a word-rate model) or insertion of a special token “\$” at window boundaries (Chen et al., 2024, Chen et al., 21 Feb 2025).
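
The windowing and stopping steps above can be sketched in a few lines. This is an illustration under stated assumptions: the transcript is taken to be a list of `(onset_seconds, word)` pairs, generated output is treated as a list of string tokens, and the helper names are hypothetical. Only the 20 s window length and the "$" stop token come from the text.

```python
def window_transcript(words, window_s=20.0):
    """Group (onset_seconds, word) pairs into non-overlapping windows.

    Returns a dict mapping window index to the text spoken in that window.
    """
    windows = {}
    for onset, word in words:
        idx = int(onset // window_s)
        windows.setdefault(idx, []).append(word)
    return {idx: " ".join(ws) for idx, ws in sorted(windows.items())}

def truncate_at_stop(tokens, stop_token="$"):
    """Keep generated tokens up to, but not including, the first stop token."""
    out = []
    for tok in tokens:
        if tok == stop_token:
            break
        out.append(tok)
    return out

def truncate_at_word_count(tokens, n_words):
    """Alternative stopping rule: keep the number of words predicted by a word-rate model."""
    return tokens[:max(0, n_words)]

transcript = [(0.5, "i"), (19.9, "went"), (20.0, "home"), (41.2, "late")]
print(window_transcript(transcript))
print(truncate_at_stop(["i", "went", "$", "home"]))
```

The two truncation helpers correspond to the two stopping criteria named above; in practice the stop token would be checked during generation rather than after it.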

4. Quantitative Performance and Empirical Analysis

BP-GPT, as evaluated on continuous open-vocabulary auditory fMRI decoding tasks, achieves substantial improvements on semantic-oriented metrics:

Method (subject)	BLEU-1	METEOR	BERTScore
Tang et al. (UTS01)	0.2331	0.1621	0.8077
BP-GPT (UTS01)	0.2159	0.2082	0.8320
Relative change (%)	–7.4	+28.4	+3.1

Analogous gains are observed for multiple subjects and across datasets. Although BLEU-1 drops modestly, METEOR increases by as much as 4.61 absolute points (up to +28% relative) and BERTScore improves by up to +3.1 points; all differences are reported as statistically significant under a paired t-test over windows (Chen et al., 2024, Chen et al., 21 Feb 2025).
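
As a quick check on how the relative and absolute figures relate, the UTS01 scores from the table can be recomputed directly. The helper names below are hypothetical. Recomputing from the rounded table entries gives a BERTScore change of roughly +3.0%, slightly below the reported +3.1%; the gap presumably reflects rounding of the published scores, though the source does not say.

```python
baseline = {"BLEU-1": 0.2331, "METEOR": 0.1621, "BERTScore": 0.8077}  # Tang et al., UTS01
bp_gpt = {"BLEU-1": 0.2159, "METEOR": 0.2082, "BERTScore": 0.8320}    # BP-GPT, UTS01

def relative_change_pct(new, old):
    """Relative change of new over old, in percent."""
    return 100.0 * (new - old) / old

def absolute_points(new, old):
    """Absolute change expressed in points on a 0-100 scale."""
    return 100.0 * (new - old)

for metric in baseline:
    rel = relative_change_pct(bp_gpt[metric], baseline[metric])
    pts = absolute_points(bp_gpt[metric], baseline[metric])
    print(f"{metric}: {rel:+.1f}% relative, {pts:+.2f} points absolute")
```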

Ablation studies show that:

Removal of the contrastive prompt alignment loss significantly degrades METEOR and BERTScore.
Fine-tuning the LLM further boosts all metrics by 1–2% relative.
Use of special stopping tokens outperforms word-rate estimators (+7–10% on METEOR).
Increasing the prompt length $k$ monotonically improves performance.

5. Extensions, Limitations, and Generalizations

Primary Limitations:

The temporal granularity of brain signals (e.g., 20 s windowing for fMRI) blurs rapid transitions, and absence of punctuation necessitates heuristic segmentation.
ROIs are typically restricted to auditory cortex, omitting higher-order language areas.
Datasets remain limited in size for the main BP-GPT evaluation (Chen et al., 2024, Chen et al., 21 Feb 2025).

Extensions and Generalizations:

BP-GPT principles have been adapted for disease identification in neurological populations via prompt-enhanced GNNs (“BrainPrompt”), where LLM-derived ROI, subject, and disease prompts are injected into graph neural models for joint multimodal integration (Xu et al., 12 Apr 2025).
Multi-modal LLM frameworks (e.g., NeuGPT) extend BP-GPT ideas to cover EEG, MEG, ECoG, fNIRS, and fMRI, using modality-marked discrete token representations and Transformer fusion, improving text-decoding metrics (BLEU-1 and ROUGE-1F) in MEG-to-text translation (Yang et al., 2024).

Limitations and ongoing challenges:

Robust generalization across modalities, in particular to brain signals with higher temporal resolution (EEG/MEG), remains to be demonstrated.
BP-GPT so far relies on fixed text encoders; fine-tuning text encoders or integrating self-supervised contrastive alignment may increase transferability.
Automated or dynamic prompt engineering, beyond fixed-length embeddings, is an area for further research.

6. Scientific and Practical Significance

BP-GPT’s core advance is the direct and modular use of neural representations as LLM prompts, which enables:

Open-vocabulary, semantically aware decoding from high-dimensional brain data.
Modular interfacing, where newer, larger-capacity LLMs may be substituted without retraining upstream encoders.
A generic template for aligning non-linguistic data streams with pre-trained generative models, with potential applications in real-time, high-level BCI, neurological diagnostics, and cognitive neuroscience.

A plausible implication is that, as larger-scale LLMs and multi-modal transformers proliferate, BP-GPT–style architectures may converge with next-generation BCIs for direct, online semantic decoding, and for broader applications in clinical and cognitive assessment (Chen et al., 2024, Chen et al., 21 Feb 2025, Yang et al., 2024, Xu et al., 12 Apr 2025).