InMind: Cognitive & Neural Decoding Framework
- InMind is a dual-focus framework that evaluates LLMs on individualized reasoning in social deduction games and decodes visual signals from fMRI using a subject–object disentanglement approach.
- The LLM component measures how well models capture personalized, temporally anchored reasoning through tasks like player identification, trace attribution, and role inference in structured game settings.
- The $i$MIND neural pipeline employs a self-supervised ViT-MAE, orthonormal basis factorization, and dual decoding to achieve state-of-the-art cross-subject generalization and interpretable neural representation.
InMind encompasses both a cognitively anchored evaluation framework for LLMs in capturing and applying individualized human reasoning styles in social deduction games (SDGs), and the $i$MIND neural decoding architecture for subject-invariant decoding of visual signals from fMRI. The two approaches address distinct dimensions of subjectivity and individuation in cognition (one in behavioral strategy, the other in neural representation and decoding), exemplifying convergent themes in computational cognitive science and neural engineering (Li et al., 22 Aug 2025, Yin et al., 22 Sep 2025).
1. Conceptual Foundation and Motivation
InMind (LLM): The InMind framework emerges from limitations in existing theory-of-mind (ToM) benchmarks that typically restrict evaluation to global plausibility of intent judgments or false-belief attribution. These settings fail to probe whether LLMs genuinely internalize the style and trajectory of a specific individual's reasoning as leveraged in real-time, sequential social contexts. SDGs like Avalon, with their transparency of utterances, sequential moves, and evolving private/public states, present a testbed for observing and evaluating actual individualized reasoning behaviors (Li et al., 22 Aug 2025).
$i$MIND (Neural Decoding): In parallel, the $i$MIND (Insightful Multi-subject Invariant Neural Decoding) model is introduced to address a central difficulty of cross-subject neural decoding from fMRI: the signal is dominated by subject-specific variability, which both limits generalization and obscures interpretation of brain-based visual processing. The $i$MIND architecture is designed to explicitly factorize, decode, and interpret both individual- and object-level components of neural representations, enabling scalable, interpretable, and generalizable neural decoding within and across subjects (Yin et al., 22 Sep 2025).
2. InMind LLM Framework: Methodology and Task Structure
The InMind framework is built around a formal structured representation of each game session with three parts: a mapping from players to roles, round-level records (utterances, game state, and strategy trace), and a global session reflection. Data are collected in two modes: Observer (no direct participation) and Participant (active player).
Dual-layer cognitive annotations are attached: (1) round-level strategy traces capturing quasi-veridical records of evolving beliefs, intentions, and inference processes; and (2) a high-level reflective summary interpreting key events and meta-strategic assessments.
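The session structure described above can be pictured as a small data model. The following is a minimal sketch only: the class names, field names, and the toy Avalon example are hypothetical and are not taken from the InMind release.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Round:
    """One game round: utterances, visible game state, and the strategy trace."""
    utterances: List[str]
    game_state: Dict[str, object]
    strategy_trace: Optional[str] = None  # round-level annotation, if recorded


@dataclass
class Session:
    """One annotated session. All field names here are hypothetical."""
    mode: str                      # "observer" or "participant"
    roles: Dict[str, str]          # player ID -> role
    rounds: List[Round] = field(default_factory=list)
    reflection: str = ""           # global post-game reflection

    def __post_init__(self) -> None:
        if self.mode not in ("observer", "participant"):
            raise ValueError(f"unknown mode: {self.mode!r}")

    def traces(self) -> List[str]:
        """Return the recorded round-level strategy traces in round order."""
        return [r.strategy_trace for r in self.rounds if r.strategy_trace is not None]


# Toy example with invented content.
session = Session(
    mode="observer",
    roles={"P1": "Merlin", "P2": "Assassin"},
    rounds=[
        Round(["P1: I trust P3."], {"quest": 1}, "P2 deflects too early."),
        Round(["P2: Reject this team."], {"quest": 2}),  # no trace recorded
    ],
    reflection="P2's early deflection was the key tell.",
)
print(session.traces())
```

Keeping the round-level traces separate from the session-level reflection mirrors the two annotation layers, so either layer can be masked or withheld independently when building evaluation inputs.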
Evaluation is operationalized as a two-stage pipeline:
- Capturing: Profile induction from observer-mode annotation using a “ProfilePrompt”—free-form profiles summarizing temporally diffuse reasoning styles.
- Applying: Given a subject profile and a participant-mode session, the model is evaluated on four tasks:
| Task | Description | Metric(s) |
|---|---|---|
| Player Identification | Rank the true subject in anonymized session | Top-1 accuracy |
| Reflection Alignment | Fill in masked IDs in post-game reflection | Exact match |
| Trace Attribution | Map masked trace segments to correct IDs per round | Match accuracy; adaptation |
| Role Inference | Infer player roles at each round, strict or grouped labeling | Strict/group accuracy |
General-purpose LLMs and reasoning-enhanced models (DeepSeek-R1, QwQ, O3-mini) are compared under zero-shot prompting with enforced structured output; all data consist of transcribed Mandarin voice chat (Li et al., 22 Aug 2025).
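The metrics in the task table are simple to state in code. The sketch below is illustrative rather than the benchmark's scoring script: the function names are hypothetical, and the good/evil grouping used for relaxed role scoring is an assumption about what "grouped labeling" means.

```python
from typing import Dict, Optional, Sequence


def top_k_accuracy(rankings: Sequence[Sequence[str]],
                   targets: Sequence[str], k: int = 1) -> float:
    """Fraction of sessions whose true subject appears among the top-k ranked players."""
    hits = sum(1 for ranked, target in zip(rankings, targets) if target in ranked[:k])
    return hits / len(targets)


def exact_match(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of masked slots filled with exactly the gold player ID."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def role_accuracy(predicted: Dict[str, str], gold: Dict[str, str],
                  groups: Optional[Dict[str, str]] = None) -> float:
    """Strict accuracy compares role labels; passing `groups` compares coarse groups."""
    def norm(role: str) -> str:
        return groups[role] if groups is not None else role
    return sum(norm(predicted[p]) == norm(gold[p]) for p in gold) / len(gold)


# Assumed coarse grouping by alignment (not specified in the text).
ALIGNMENT = {"Merlin": "good", "Percival": "good", "Loyal Servant": "good",
             "Morgana": "evil", "Assassin": "evil"}

# Toy data: two sessions, true subject is P1 in both.
rankings = [["P2", "P1", "P3"], ["P1", "P4", "P2"]]
targets = ["P1", "P1"]
gold_roles = {"P1": "Merlin", "P2": "Assassin", "P3": "Loyal Servant"}
pred_roles = {"P1": "Percival", "P2": "Assassin", "P3": "Loyal Servant"}

print(top_k_accuracy(rankings, targets, k=1))            # top-1
print(top_k_accuracy(rankings, targets, k=3))            # top-3
print(role_accuracy(pred_roles, gold_roles))             # strict
print(role_accuracy(pred_roles, gold_roles, ALIGNMENT))  # grouped
```

In the toy data, confusing Merlin with Percival counts as an error under strict scoring but not under grouped scoring, which is why grouped accuracy can only be equal to or higher than strict accuracy.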
3. $i$MIND Neural Decoding Pipeline: Architecture and Objectives
The $i$MIND model is a three-stage end-to-end pipeline for multi-subject fMRI decoding:
- Self-supervised ViT-MAE Pretraining: The input is a flattened, uniformly padded voxel vector, split into non-overlapping 64-voxel patches. The model employs a 12-layer ViT encoder and an 8-layer transformer decoder. Masked patches (75%) are reconstructed with a voxel-wise mean-squared error loss, yielding shared neural features.
- Subject–Object Disentanglement: Each patch embedding is factorized via a learned orthonormal basis, with an explicit decomposition into a subject-specific component and an object-specific component.
- Dual Decoding: Biometric and Semantic
- Biometric (subject ID): Pool the subject-specific component and apply a linear classifier to output logits over the subjects.
- Semantic (object classification): Freeze CLIP visual features; cross-attend CLIP tokens with the object-specific component via multi-head attention, and classify pooled outputs.
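The pretraining step in the list above (patching, 75% masking, and a mean-squared error on the masked patches) can be sketched in plain Python. This is a toy illustration under stated assumptions: zero padding, uniform random masking, and all function names are choices made here, and the encoder and decoder themselves are omitted.

```python
import random
from typing import List, Sequence

PATCH_SIZE = 64     # voxels per patch, as stated in the text
MASK_RATIO = 0.75   # fraction of patches hidden from the encoder


def patchify(voxels: Sequence[float], patch_size: int = PATCH_SIZE) -> List[List[float]]:
    """Zero-pad a flattened voxel vector and split it into non-overlapping patches."""
    pad = (-len(voxels)) % patch_size
    padded = list(voxels) + [0.0] * pad
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]


def sample_mask(num_patches: int, ratio: float = MASK_RATIO, seed: int = 0) -> List[int]:
    """Pick the indices of the patches to mask (uniformly at random, seeded)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_patches), int(num_patches * ratio)))


def masked_mse(target: Sequence[Sequence[float]],
               recon: Sequence[Sequence[float]],
               masked: Sequence[int]) -> float:
    """Mean squared error over the voxels of the masked patches only."""
    if not masked:
        raise ValueError("at least one patch must be masked")
    total, count = 0.0, 0
    for i in masked:
        for t, r in zip(target[i], recon[i]):
            total += (t - r) ** 2
            count += 1
    return total / count


# Toy input: 500 synthetic "voxels" -> padded to 512 -> 8 patches of 64.
voxels = [float(i % 7) for i in range(500)]
patches = patchify(voxels)
masked = sample_mask(len(patches))

# Stand-in for a decoder output: the target shifted by a constant error of 1.
recon = [[v + 1.0 for v in patch] for patch in patches]
print(len(patches), len(masked), masked_mse(patches, recon, masked))
```

Restricting the loss to masked patches follows the usual masked-autoencoder recipe: the visible patches are inputs, so scoring them would reward copying rather than inference.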
The objective function in the dual decoding phase combines classification losses (cross-entropy or binary cross-entropy) for subject and object with an orthonormality regularization term on the learned basis (Yin et al., 22 Sep 2025).
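To make the role of the orthonormal basis concrete, the sketch below shows one possible form of the factorization and the regularizer. It rests on assumptions that go beyond the text: the basis vectors are partitioned into a subject block and an object block, the penalty is the squared Frobenius norm of the Gram matrix minus the identity, and the loss terms are combined with a hypothetical weight. The decomposition used in the paper may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthonormality_penalty(basis: Sequence[Sequence[float]]) -> float:
    """Squared Frobenius norm of (B B^T - I), where the rows of B are basis vectors."""
    penalty = 0.0
    for i, bi in enumerate(basis):
        for j, bj in enumerate(basis):
            target = 1.0 if i == j else 0.0
            penalty += (dot(bi, bj) - target) ** 2
    return penalty


def decompose(z: Sequence[float], basis: Sequence[Sequence[float]],
              num_subject: int) -> Tuple[Vector, Vector]:
    """Project z onto the first `num_subject` basis vectors (assumed subject block)
    and onto the remaining vectors (assumed object block)."""
    def project(vectors: Sequence[Sequence[float]]) -> Vector:
        out = [0.0] * len(z)
        for b in vectors:
            coeff = dot(z, b)
            out = [o + coeff * x for o, x in zip(out, b)]
        return out
    return project(basis[:num_subject]), project(basis[num_subject:])


def dual_objective(subject_loss: float, object_loss: float,
                   penalty: float, weight: float = 0.1) -> float:
    """Classification losses plus a weighted orthonormality term (weight is hypothetical)."""
    return subject_loss + object_loss + weight * penalty


# An orthonormal 2-D basis (rotation by 30 degrees) and a skewed 3-D one.
theta = math.pi / 6
rotation = [[math.cos(theta), math.sin(theta)], [-math.sin(theta), math.cos(theta)]]
skewed = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

z = [1.0, 2.0]
subject_part, object_part = decompose(z, rotation, num_subject=1)
print(subject_part, object_part)
print(orthonormality_penalty(rotation), orthonormality_penalty(skewed))
```

When the basis is exactly orthonormal and spans the embedding space, the two parts sum back to the original embedding and are orthogonal to each other; the penalty term pushes a learned basis toward that regime.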
4. Empirical Evaluation and Comparative Results
InMind (LLM):
The primary case study is on Avalon (6-player, Mandarin, transcribed voice chat). Dataset statistics: 30 sessions, 884 utterances, 160 traces, and 30 reflections. Quantitative results highlight several constraints:
- Player identification: General LLMs achieve top-1 accuracy near baseline (0.16 for GPT-4o), with DeepSeek-R1 marginally higher (0.24). Top-3 accuracy remains modest. BERT-based cosine matching performs comparably, indicating frequent reliance on surface lexical cues.
- Reflection alignment: With explicit trace input, models reach about 80% accuracy; without it, accuracy drops to about 30%, showing strong anchoring dependency.
- Trace attribution: Incremental gains from prior-trace context are small or negative (e.g., for GPT-4o), indicating limited true adaptive reasoning.
- Role inference: Strict match accuracy is roughly 30–40%, and relaxed grouping roughly 60–70%. Reasoning-enhanced models outperform general-purpose LLMs, but performance remains far from ceiling (Li et al., 22 Aug 2025).
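The point about surface lexical cues in player identification can be illustrated with a similarity-ranking baseline. The sketch below is a stand-in, not the BERT-based matcher mentioned above: it replaces sentence embeddings with bag-of-words counts, uses an English toy example with whitespace tokenization (which would not suit the Mandarin data), and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def bow(text: str) -> Counter:
    """Bag-of-words vector; a crude substitute for a sentence embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    shared = a.keys() & b.keys()
    num = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return num / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_players(profile: str, utterances_by_player: Dict[str, List[str]]) -> List[str]:
    """Rank anonymized players by lexical similarity between profile and utterances."""
    profile_vec = bow(profile)
    scores = {
        pid: cosine(profile_vec, bow(" ".join(utterances)))
        for pid, utterances in utterances_by_player.items()
    }
    return sorted(scores, key=lambda pid: scores[pid], reverse=True)


# Invented profile and utterances.
profile = "cautious player who questions early votes"
players = {
    "A": ["I question that early vote", "let us stay cautious"],
    "B": ["just approve the team already"],
}
ranking = rank_players(profile, players)
print(ranking)
```

A ranker like this has no model of how a player reasons across rounds, so an LLM that scores no better than it is plausibly matching vocabulary rather than reasoning style.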
$i$MIND (Neural Decoding):
Using the NSD dataset (8 subjects, 10,000 images/subject, split into train and test sets), $i$MIND delivers:
| Method | mAP | AUC | Hamming | Subject-ID ACC | Generalization (mAP) |
|---|---|---|---|---|---|
| Single-subject ViT/MLP | .24–.26 | .82–.85 | — | — | — |
| CLIP-MUSED | .258 | .877 | — | — | — |
| $i$MIND (fMRI only) | .310 | .913 | .027 | .999 | .784 (holdout) |
| $i$MIND (fMRI+image) | .784 | .984 | .012 | .999 | .790 (all) |
According to the reported results, $i$MIND achieves state-of-the-art performance, removes scalability limits, and generalizes robustly across subjects (Yin et al., 22 Sep 2025).
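For readers less familiar with the multi-label metrics in the table, the sketch below computes mean average precision and Hamming loss on toy data. It follows one common convention (per-class average precision averaged over classes, and a 0.5 decision threshold), which is an assumption here; the paper's exact evaluation protocol may differ, and AUC is omitted for brevity.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of precision values at the rank of each positive sample."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_rows: Sequence[Sequence[float]],
                           label_rows: Sequence[Sequence[int]]) -> float:
    """Average the per-class AP over classes (rows are samples, columns are classes)."""
    per_class = [
        average_precision(class_scores, class_labels)
        for class_scores, class_labels in zip(zip(*score_rows), zip(*label_rows))
    ]
    return sum(per_class) / len(per_class)


def hamming_loss(score_rows: Sequence[Sequence[float]],
                 label_rows: Sequence[Sequence[int]],
                 threshold: float = 0.5) -> float:
    """Fraction of (sample, class) slots where the thresholded prediction is wrong."""
    wrong, total = 0, 0
    for scores, labels in zip(score_rows, label_rows):
        for s, y in zip(scores, labels):
            wrong += int((1 if s >= threshold else 0) != y)
            total += 1
    return wrong / total


# Toy data: 4 samples, 2 object classes.
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.1]]
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
print(mean_average_precision(scores, labels))
print(hamming_loss(scores, labels))
```

Higher is better for mAP and lower is better for Hamming loss, which is why the two columns move in opposite directions in the table.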
5. Key Insights and Theoretical Contributions
InMind (LLM):
- Standard LLMs default to lexical mimicry and shallow pattern recognition rather than temporally consistent, individualized reasoning. Reflection and role inference tasks cannot be satisfied unless models receive explicit round-level traces, i.e., temporal anchoring is not learned in the absence of direct cues.
- Reasoning-enhanced models, notably DeepSeek-R1, manifest partial style-sensitive reasoning: backward inference, hedging/certainty modulation, and context-consistent assignments improve, but absolute scores remain low. Performance gains under grouped scoring suggest that coarse-grained cognitive traits are more easily aligned than precise role attribution.
$i$MIND (Neural Decoding):
- Produces interpretable voxel–object activation fingerprints by tracing Grad-CAM attributions through ViT layers, yielding visualization of region-selective activations (e.g., consistent ventral stream responses to “horse”/“bird,” strong social stimulus activation for “person”).
- Reveals clear subject-specific attention dynamics during rapid (3 s) visual exposure: shared and residual attention maps highlight both universally salient object detection (e.g., “chair”) and idiosyncratic focal patterns (e.g., “cup” receiving elevated attention by specific subjects correlated with high recognition probability).
- Clustering voxels by mean and standard deviation of activation exposes functional specialization: “bystanders,” “discriminators,” and “supporters” for semantic decoding roles (Yin et al., 22 Sep 2025).
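The voxel clustering in the last item can be sketched as a two-feature clustering problem. The code below is an assumption-laden illustration: it uses plain k-means with three clusters and a deterministic initialization, the activations are synthetic, and it does not assign the paper's role names to the resulting clusters.

```python
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def voxel_features(activations: Sequence[Sequence[float]]) -> List[Point]:
    """activations[t][v] is voxel v's response on trial t; return (mean, std) per voxel."""
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in zip(*activations)]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Lloyd's algorithm, initialized from the first k points for reproducibility."""
    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = (statistics.fmean(m[0] for m in members),
                              statistics.fmean(m[1] for m in members))
    return labels


# Synthetic activations: 4 trials x 6 voxels. Voxels 0 and 3 are weak and stable,
# voxels 1 and 4 are strong and stable, voxels 2 and 5 vary strongly across trials.
activations = [
    [0.0, 2.0, 0.0, 0.1, 2.1, 2.0],
    [0.1, 2.1, 2.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 0.1, 2.1, 2.0],
    [0.1, 2.1, 2.0, 0.0, 2.0, 0.0],
]
features = voxel_features(activations)
cluster_ids = kmeans(features, k=3)
print(cluster_ids)
```

Separating voxels by both mean and variability distinguishes units that respond weakly throughout from those that respond strongly but uniformly and those whose response changes with the stimulus, which is the kind of distinction the functional groupings above rely on.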
6. Limitations and Prospects
LLM Evaluation:
- Current LLMs lack the ability to internalize individualized reasoning styles without explicit temporal and strategic annotation. Dynamic adaptation remains shallow, with models frequently treating sequential rounds as independent.
- Future work in InMind aims to scale to diverse SDGs, automate strategy profile induction to mitigate annotation bias, and integrate memory/belief tracking modules to maintain cross-round coherence. Extensions to cooperation and negotiation scenarios are proposed as crucial domains for contextually adaptive inference (Li et al., 22 Aug 2025).
Neural Decoding:
- $i$MIND establishes a foundation for more interpretable and generalizable neural decoding, but practical constraints (e.g., requirement of large fMRI datasets, pre-defined object basis dimensionality) persist.
- A plausible implication is that learned subject–object disentanglement and interpretability frameworks could inform future low-shot or online neural decoding domains, and illuminate the neural basis of visual and attentional idiosyncrasy at population scale (Yin et al., 22 Sep 2025).
References:
(Li et al., 22 Aug 2025) InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
(Yin et al., 22 Sep 2025) $i$MIND: Insightful Multi-subject Invariant Neural Decoding