Auditory Brain Passage Retrieval
- Auditory Brain Passage Retrieval is a neural IR technique that maps EEG signals from listening tasks to passage-level representations without textual queries.
- It employs a dual-encoder architecture with an EEG encoder using transformer layers and a frozen BERT text encoder, comparing embeddings via normalized dot product.
- This approach improves accessibility for users with disabilities; with cross-sensory training it raises MRR and Hit@k over single-modality models and surpasses a BM25 lexical baseline on auditory queries.
Auditory Brain Passage Retrieval (ABPR) is the research area and applied methodology in which electroencephalography (EEG) signals, specifically captured during auditory perception (i.e., listening to speech), are directly mapped to passage-level representations for Information Retrieval (IR) without intermediate textual query translation. ABPR builds on the broader architectural foundation of Brain Passage Retrieval (BPR) but utilizes auditory stimuli rather than the previously dominant visual presentations. This line of inquiry addresses the cognitive and accessibility challenges in query formulation, enabling users—especially those with visual/motor disabilities or using voice-driven interfaces—to retrieve relevant passages purely via brain activity accompanying listening tasks (McGuire et al., 20 Jan 2026).
1. Conceptual Foundations and Motivation
Traditional IR systems depend on explicit text queries, a process prone to inaccuracies due to cognitive translation overhead and limited accessibility for specific populations. BPR circumvents this by projecting brain signals and text passages into a shared embedding space where relevance can be assessed by direct similarity, thus reflecting the user's latent information need more naturally (McGuire et al., 20 Jan 2026).
Previous BPR research exclusively used EEG acquired during visual (reading) tasks (McGuire et al., 2024). However, this neglected the needs of visually impaired users or voice-based interaction settings. Introducing auditory EEG as a query modality addresses these gaps, directly supporting scenarios where users are listening rather than reading, making BPR systems more widely applicable (McGuire et al., 20 Jan 2026).
2. Methodological Framework
ABPR is implemented as a dual-encoder dense retrieval system. The architecture comprises two encoders:
- EEG Encoder ($E_{\text{EEG}}$): Maps preprocessed, word-synchronous EEG segments $X \in \mathbb{R}^{L \times F}$ (where $L$ is the number of words and $F = C \times T$ the flattened features with $C$ channels and $T$ timepoints) into $d$-dimensional query embeddings. $X$ is transformed via a learned projection, processed by stacked Transformer layers, and pooled to yield a single normalized vector (McGuire et al., 20 Jan 2026).
- Text Encoder ($E_{\text{text}}$): Adopts a frozen BERT-base-uncased model. Each candidate passage is encoded into the same $d$-dimensional space by extracting hidden states from BERT (optionally pooled via several strategies).
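The input side of the EEG encoder can be illustrated with a minimal sketch, assuming each word-level segment arrives as a channels-by-timepoints grid. The sizes, the random weights, and the helper names `flatten_segment` and `linear_project` are hypothetical and chosen only for illustration. The Transformer layers and pooling step are omitted; this shows only the flattening and learned projection.

```python
import random

def flatten_segment(segment):
    """Flatten one word-level EEG segment (channels x timepoints) into a feature vector."""
    return [value for channel in segment for value in channel]

def linear_project(features, weight, bias):
    """Apply y = W x + b, where W is given as a list of output rows."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weight, bias)]

# Hypothetical sizes, far smaller than any real recording setup.
rng = random.Random(0)
n_words, n_channels, n_timepoints, d_model = 4, 3, 5, 8

# Synthetic EEG: one (channels x timepoints) grid per word.
eeg = [[[rng.gauss(0.0, 1.0) for _ in range(n_timepoints)]
        for _ in range(n_channels)]
       for _ in range(n_words)]

# Randomly initialised projection standing in for the learned one.
weight = [[rng.gauss(0.0, 0.1) for _ in range(n_channels * n_timepoints)]
          for _ in range(d_model)]
bias = [0.0] * d_model

# One d_model-dimensional token per word, ready for a sequence model.
tokens = [linear_project(flatten_segment(seg), weight, bias) for seg in eeg]
print(len(tokens), len(tokens[0]))
```

Each word thus becomes one token in the sequence passed to the encoder, so the sequence length equals the number of words in the query span.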
Query-passage relevance is assessed by normalized dot product:
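$$s(q, p) = \frac{q^\top p}{\lVert q \rVert_2 \, \lVert p \rVert_2}$$
where $q$ is the pooled EEG query embedding and $p$ is the pooled passage embedding.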
The system is trained with a cross-modal InfoNCE-style contrastive loss, where in-batch negatives act as distractors; because the text encoder is frozen, only the EEG encoder's parameters are updated (McGuire et al., 20 Jan 2026).
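The normalized dot product scoring described above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the three-dimensional embeddings and the function names are assumptions chosen for readability.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]

def score(query_emb, passage_emb):
    """Normalized dot product (cosine similarity) between two embeddings."""
    q = l2_normalize(query_emb)
    p = l2_normalize(passage_emb)
    return sum(a * b for a, b in zip(q, p))

def rank_passages(query_emb, passage_embs):
    """Return passage indices sorted from most to least similar to the query."""
    scores = [score(query_emb, p) for p in passage_embs]
    return sorted(range(len(passage_embs)), key=lambda i: scores[i], reverse=True)

# Toy embeddings (illustrative values only).
eeg_query = [0.9, 0.1, 0.0]
passages = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
ranking = rank_passages(eeg_query, passages)
print(ranking)  # [1, 2, 0]
```

Because both vectors are normalized, the score depends only on the angle between them, so differences in embedding magnitude between the EEG and text encoders do not affect the ranking.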
3. Pooling Strategies for Query Construction
Four pooling operations are compared for both EEG and text streams:
- CLS Pooling: Prepends a learned [CLS] token; the final-layer output at that position is taken as the query/passage representation.
- Mean Pooling: Computes the mean across sequence outputs.
- Max Pooling: Element-wise maximum across sequence positions.
- Multi-vector Pooling (ColBERT-style): Retains all sequence-level vectors for late interaction comparison.
CLS pooling achieves the highest retrieval performance across modalities and configurations (McGuire et al., 20 Jan 2026).
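The four pooling operations above can be sketched on a toy sequence of encoder outputs. This assumes the [CLS] output sits at position 0 and that multi-vector scoring uses the sum-of-maxima (MaxSim) form from ColBERT; the function names and values are illustrative, and vector normalization is omitted for brevity.

```python
def cls_pool(states):
    """CLS pooling: take the output at position 0, where the [CLS] token is assumed to sit."""
    return list(states[0])

def mean_pool(states):
    """Mean pooling: average each dimension across sequence positions."""
    n = len(states)
    return [sum(column) / n for column in zip(*states)]

def max_pool(states):
    """Max pooling: element-wise maximum across sequence positions."""
    return [max(column) for column in zip(*states)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_states, passage_states):
    """Late interaction: each query vector is matched to its best passage vector, then summed."""
    return sum(max(dot(q, p) for p in passage_states) for q in query_states)

# Toy sequence of three 2-dimensional output vectors (illustrative values only).
states = [[1.0, 0.0], [0.0, 2.0], [3.0, -1.0]]
print(cls_pool(states))   # [1.0, 0.0]
print(max_pool(states))   # [3.0, 2.0]
print(mean_pool(states))
print(maxsim_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 2.0]]))  # 3.0
```

The first three strategies reduce a sequence to one vector that can be scored with a single dot product, whereas the multi-vector variant keeps every position and defers the comparison to scoring time.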
4. Datasets and Experimental Protocol
Two datasets are leveraged for systematic comparison:
| Dataset | Subjects | Modality | Trials (train/validation) | Task | Preprocessing |
|---|---|---|---|---|---|
| Alice | 49 | Auditory (listening) | 1,200 / 300 | Listen to "Alice in Wonderland", word-aligned EEG, Inverse Cloze Task masking | Band-pass filtering, ICA, epoching |
| Nieuwland | 51 | Visual (reading) | 1,200 / 300 | Read narrative passages, word-by-word | Same as Alice pipeline |
Each trial comprises a query span (30% of a passage, selected and masked out of the passage text) and a candidate passage (McGuire et al., 20 Jan 2026). Lexical overlap between the two datasets, measured by Jaccard similarity, is minimal, ensuring that cross-sensory performance reflects conceptual learning, not surface cues.
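The trial construction and the overlap check can be sketched as follows. This is a minimal sketch under stated assumptions: the `[MASK]` placeholder, the per-word replacement, the contiguous span starting at a caller-supplied index, and the function names are all hypothetical, not details confirmed by the source.

```python
def make_ict_trial(passage_words, start, span_fraction=0.3, mask_token="[MASK]"):
    """Split a passage into a query span and a masked candidate passage.

    Assumption: the span is contiguous and each masked word is replaced
    by one placeholder token.
    """
    span_len = max(1, round(len(passage_words) * span_fraction))
    end = min(start + span_len, len(passage_words))
    query_span = passage_words[start:end]
    masked_passage = (passage_words[:start]
                      + [mask_token] * (end - start)
                      + passage_words[end:])
    return query_span, masked_passage

def jaccard(words_a, words_b):
    """Jaccard similarity between the vocabularies of two word lists."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

passage = "alice was beginning to get very tired of sitting by".split()
query_span, masked_passage = make_ict_trial(passage, start=2)
print(query_span)       # ['beginning', 'to', 'get']
print(masked_passage)
print(jaccard(["a", "b", "c"], ["b", "c", "d"]))  # 0.5
```

In the actual setup the EEG recorded while the participant perceived the query span serves as the query, so the span's words never reach the retriever as text.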
5. Cross-Sensory Training and Evaluation Metrics
ABPR applies both single-modality and cross-sensory training (combining auditory and visual datasets). The training objective maximizes similarity between genuine (EEG, passage) pairs and minimizes it for in-batch negatives:
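$$\mathcal{L} = -\frac{1}{B} \sum_{i=1}^{B} \log \frac{\exp\left(s(q_i, p_i)/\tau\right)}{\sum_{j=1}^{B} \exp\left(s(q_i, p_j)/\tau\right)}$$
where $s(q_i, p_j)$ is the normalized dot product between EEG query $i$ and passage $j$, $B$ is the batch size, and $\tau$ is a temperature hyperparameter.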
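An in-batch contrastive objective of this kind can be sketched over a precomputed similarity matrix. The temperature value of 0.1 and the function name are assumptions for illustration; the source does not state the hyperparameters used.

```python
import math

def info_nce(sim_matrix, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    sim_matrix[i][j] is the similarity between EEG query i and passage j;
    the diagonal holds the genuine pairs and every other entry in a row
    is an in-batch negative. The temperature default is an assumption.
    """
    total = 0.0
    for i, row in enumerate(sim_matrix):
        logits = [s / temperature for s in row]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(sim_matrix)

aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uninformative = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(info_nce(aligned))
print(info_nce(uninformative))  # equals log(3), the chance-level loss for batch size 3
```

When every passage looks equally similar, the loss equals the logarithm of the batch size; it approaches zero as genuine pairs are scored well above the in-batch negatives.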
Performance is assessed using Mean Reciprocal Rank (MRR) and Hit@k on the held-out test set:
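$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}, \qquad \mathrm{Hit@}k = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \mathbb{1}\left[\mathrm{rank}_i \le k\right]$$
where $Q$ is the set of evaluation queries and $\mathrm{rank}_i$ is the 1-based position of the relevant passage in the ranking for query $i$.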
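Both metrics follow directly from the rank at which each query's relevant passage is retrieved. The sketch below uses made-up ranks purely to show the arithmetic.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank, where rank is the 1-based position of the relevant passage."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hit_at_k(ranks, k):
    """Fraction of queries whose relevant passage appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical ranks for four queries (not taken from the paper).
ranks = [1, 3, 2, 12]
print(round(mean_reciprocal_rank(ranks), 4))  # 0.4792
print(hit_at_k(ranks, 1))                     # 0.25
print(hit_at_k(ranks, 10))                    # 0.75
```

MRR rewards placing the relevant passage near the top of the list, while Hit@k only records whether it falls within the cutoff.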
Empirical results (CLS pooling):
| Model | MRR | Hit@1 | Hit@10 |
|---|---|---|---|
| Audio-only (CLS) | 0.362 | 0.220 | 0.668 |
| Visual-only (CLS) | 0.139 | 0.074 | 0.262 |
| Cross-sensory (audio eval) | 0.474 | 0.314 | 0.858 |
| Cross-sensory (visual eval) | 0.256 | 0.141 | 0.515 |
On auditory evaluation, cross-sensory training yields relative gains of 31% in MRR, 43% in Hit@1, and 28% in Hit@10 over the audio-only model, the strongest single-modality baseline. The cross-sensory model also surpasses the BM25 lexical baseline on auditory queries: MRR 0.474 (EEG) vs. 0.428 (BM25) (McGuire et al., 20 Jan 2026).
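The reported percentages are relative improvements and can be checked against the table values for auditory evaluation. The short calculation below uses only the numbers quoted in this section; the helper name is illustrative.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline

# Values from the results table (CLS pooling, auditory evaluation).
audio_only = {"MRR": 0.362, "Hit@1": 0.220, "Hit@10": 0.668}
cross_sensory = {"MRR": 0.474, "Hit@1": 0.314, "Hit@10": 0.858}

gains = {metric: round(100 * relative_gain(cross_sensory[metric], audio_only[metric]))
         for metric in audio_only}
print(gains)  # {'MRR': 31, 'Hit@1': 43, 'Hit@10': 28}

# Relative MRR margin of the cross-sensory EEG model over BM25 on auditory queries.
bm25_margin = round(100 * relative_gain(0.474, 0.428), 1)
print(bm25_margin)  # 10.7
```

The gains are therefore relative to the audio-only scores, not absolute differences in metric points; the absolute MRR difference is 0.112.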
6. Practical and Theoretical Implications
- Accessibility and Interface Design: ABPR enables voice-based and non-traditional IR, directly benefiting visually impaired and motor-impaired users, and provides robust performance even under severe passage masking (where BM25 performance degrades) (McGuire et al., 20 Jan 2026).
- Robustness: The model is insensitive to low lexical overlap, making it less brittle to vocabulary shift than text-based retrieval.
- Neural Query Semantics: Auditory EEG queries encode semantic intent at the neural level, going beyond the literal word-level matching of conventional IR methods.
- Data Efficiency: Cross-sensory training leverages scarce data by sharing information across modalities, outperforming auditory- or visual-only models.
7. Limitations and Future Research Directions
- Task Paradigm: Current ABPR systems operate under passive comprehension (listening/reading), rather than real-world active search/query formation; this may limit generalizability to spontaneous or interactive search (McGuire et al., 20 Jan 2026).
- Dataset Diversity: Results are based on two non-overlapping text sources, so sensory or text-diversity confounds cannot be fully excluded.
- Model Optimization: The text encoder is frozen; joint fine-tuning or adapter-based training could yield additional improvements.
- Scalability: Real-time inference and on-device deployment are currently unexplored, and validation on real-world corpora remains open.
- Active Querying: Recording EEG during active, spontaneous queries and expanding to other neuroimaging modalities (MEG, fNIRS) are high-priority directions.
In summary, Auditory Brain Passage Retrieval demonstrates that EEG recorded during natural listening can be used as a direct, modality-agnostic query for neural IR, exhibiting robust cross-modal generalization and establishing neural queries as competitive or superior to strong text-based baselines. This validates neural IR pipelines for practical, accessible interface scenarios and suggests broad avenues for further multimodal, real-time, and task-adaptive advancements (McGuire et al., 20 Jan 2026).