
Mentalese: Symbolic Thought Language

Updated 2 December 2025
  • Mentalese is a hypothesized mental language characterized by discrete, contentful tokens and systematic rules that enable flexible reasoning and abstraction.
  • Recent models like NLoTM and ORION operationalize Mentalese with neural and symbolic techniques, achieving up to 99.1% out-of-distribution accuracy and 4–16× more compact reasoning traces.
  • Large language models reveal latent mentalese codes in belief tracking and social reasoning, offering actionable insights for transparent, efficient AI system design.

Mentalese denotes a hypothesized symbolic, compositional mental language in which cognitive processes unfold, as articulated by the Language of Thought Hypothesis (LOTH). It is characterized by discrete, contentful tokens (mental primitives) and syntactic rules for composing “mental sentences.” Recent advances in machine learning have enabled the formalization, empirical investigation, and partial operationalization of Mentalese in neural architectures, LLMs, and reasoning systems.

1. Theoretical Foundations: Language of Thought Hypothesis

The Language of Thought Hypothesis, proposed by Fodor (1975), posits that cognition occurs over a language-like system—Mentalese—distinct from natural language (Wu et al., 2 Feb 2024, Tanmay et al., 28 Nov 2025). According to LOTH, the mental lexicon comprises atomic symbols encoding basic concepts or properties (“mental words”) that combine compositionally into structures governed by systematic rules (“mental grammar”). This compositionality and systematicity enable flexible reasoning, generalization, and conceptual abstraction. Mentalese is thus discrete, symbolic, and admits hierarchical and rule-based assembly of representations. Empirical cognitive science points to the acquisition of structured conceptual representations from both linguistic and non-linguistic (e.g., visual) experiences, supporting the emergence of LOTH-like mental coding.

2. Formalization in Neural and Algorithmic Models

Recent machine learning research has instantiated Mentalese in discrete neural architectures and symbolic reasoning formats:

  • Neural Language of Thought Models (NLoTM): NLoTM demonstrates how a neural system can discover and operate over discrete, composable representations aligned with objects and properties in visual scenes (Wu et al., 2 Feb 2024). Using a Semantic Vector-Quantized Variational Autoencoder (SVQ-VAE), input data are mapped into contentful slots, each partitioned into factor-level codebook entries (proxies for Mentalese primitives). An Autoregressive LoT Prior generates sequences of these tokens—“mental sentences”—that encode compositional scene structure. The learned tokens support high out-of-distribution accuracy, hierarchically organized abstraction, and interpretability.
  • ORION and Structured Symbolic Reasoning: ORION introduces a symbolic “chain-of-thought” (CoT) format modeled after Mentalese in the context of mathematical and planning tasks (Tanmay et al., 28 Nov 2025). Each reasoning step is serialized as “OPERATION:expression;” (with operations such as SET, CALC, EQ, SOLVE, and ANS), forming compact, executable reasoning chains. This approach yields 4–16× compression relative to standard natural-language CoT and matches or exceeds prior models in accuracy at substantially lower computational cost.
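To make the “OPERATION:expression;” format concrete, here is a minimal interpreter for such a trace. The operation names (SET, CALC, ANS) come from the description above, but their exact semantics here are illustrative assumptions, not ORION's actual specification.

```python
# Hypothetical interpreter for an ORION-style "OPERATION:expression;" trace.
# The operation names are taken from the text; the binding and evaluation
# rules below are illustrative assumptions.

def run_trace(trace: str) -> float:
    """Execute a semicolon-delimited symbolic reasoning trace."""
    env: dict = {}
    answer = None
    for step in filter(None, (s.strip() for s in trace.split(";"))):
        op, _, expr = step.partition(":")
        if op == "SET":                    # bind a constant:  SET:x=3
            name, _, value = expr.partition("=")
            env[name.strip()] = eval(value, {}, env)
        elif op == "CALC":                 # compute and bind: CALC:y=x*2+1
            name, _, rhs = expr.partition("=")
            env[name.strip()] = eval(rhs, {}, env)
        elif op == "ANS":                  # emit the answer:  ANS:y
            answer = eval(expr, {}, env)
    return answer

result = run_trace("SET:x=3; CALC:y=x*2+1; ANS:y")  # → 7
```

Note how a trace like this is far shorter than a natural-language chain of thought expressing the same steps, which is the intuition behind the reported 4–16× compression.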

3. Internal Mentalese in LLMs

Empirical work identifies latent symbolic subspaces within the activations of LLMs, interpreted as internal “mentalese” governing social reasoning, belief-tracking, and Theory-of-Mind (ToM) tasks (Zhu et al., 28 Feb 2024). Probing the intermediate representations of models such as Mistral-7B reveals codes for self- and other-beliefs that linear probes decode with high accuracy. Manipulating these latent states via targeted interventions reliably induces changes in social inference behaviors, demonstrating both the functional and potentially causal role of these internal mentalese codes in model reasoning.
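The linear-probing idea above can be sketched in a few lines. The activations below are synthetic stand-ins (in the actual studies they come from transformer attention heads), and the probe is a plain least-squares readout; the point is only that a linearly encoded belief variable is recoverable by a linear map.

```python
# Minimal sketch of linear belief-state probing. Activations are synthetic;
# the "belief direction" w_true plays the role of a linearly encoded
# self/other-belief variable inside a model's residual stream.
import numpy as np

rng = np.random.default_rng(0)
d = 64                                   # activation dimensionality (assumed)
w_true = rng.normal(size=d)              # ground-truth belief direction

# Synthetic activations whose projection on w_true encodes a binary belief.
X = rng.normal(size=(2000, d))
y = (X @ w_true > 0).astype(float)       # 1 = "believes P", 0 = "believes not-P"

# Linear probe: least-squares readout of the belief label from activations.
w_probe, *_ = np.linalg.lstsq(X, y - 0.5, rcond=None)
pred = (X @ w_probe > 0).astype(float)
accuracy = (pred == y).mean()            # high accuracy => linearly decodable
```

If the belief variable were not linearly encoded, no choice of `w_probe` could reach high accuracy, which is why linear decodability is treated as evidence for symbol-like internal codes.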

| Model/Framework | Mentalese Instantiation | Modality | Purpose |
|---|---|---|---|
| NLoTM | Discrete codebooks, “mental sentences” | Visual scenes | Scene generation, compositional abstraction, generalization |
| ORION | Symbolic CoT (“OP:exp;” syntax) | Math/text | Compressed, verifiable, and efficient symbolic reasoning |
| Mistral-7B (LLM probing) | Internal ToM/belief codes | Text | Tracking and manipulating beliefs (Theory-of-Mind) |
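The codebook lookup that gives NLoTM-style models their discrete tokens can be sketched as a nearest-neighbor quantization step: each continuous feature vector is snapped to its closest codebook entry, whose index serves as a “Mentalese primitive”. The shapes and codebook size below are illustrative, not NLoTM's actual hyperparameters.

```python
# Sketch of vector-quantized codebook lookup (the discretization step in
# SVQ-VAE-style models). Codebook size and dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))      # 16 discrete primitives, each dim 8

def quantize(z: np.ndarray):
    """Map continuous vectors z of shape (n, 8) to nearest codebook entries."""
    # Pairwise squared distances between inputs and all codebook entries.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)              # discrete token ids
    return idx, codebook[idx]            # ids + quantized vectors

z = rng.normal(size=(4, 8))
tokens, z_q = quantize(z)                # a 4-token "mental sentence"
```

A sequence of such token ids is what the autoregressive prior models as a “mental sentence”: a discrete, composable description of the scene.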

4. Properties and Empirical Evaluation

Mentalese, in its modern implementations, is defined by several critical properties:

  • Discreteness and Symbolic Form: Neural and algorithmic models use codebooks or symbolic operators, enforcing discrete, contentful tokens (Wu et al., 2 Feb 2024, Tanmay et al., 28 Nov 2025).
  • Compositionality: Representations can be systematically recombined for new conceptual configurations. NLoTM achieves 99.1% out-of-distribution accuracy on tasks requiring novel combinations of primitives, vastly outperforming patchwise or continuous alternatives (Wu et al., 2 Feb 2024).
  • Executability and Faithfulness: In ORION, each symbolic step is necessary and sufficient for deriving the answer, and sequences are verifiable by formal semantics (Tanmay et al., 28 Nov 2025).
  • Compression and Efficiency: Symbolic Mentalese expressions are up to 16× shorter and reduce inference/training cost by an order of magnitude without significant loss of accuracy (Tanmay et al., 28 Nov 2025).
  • Transparency and Steerability: Internal belief codes in LLMs admit linear readout and targeted intervention, revealing their internal “mentalese” structure (Zhu et al., 28 Feb 2024).

5. Model Architectures and Training Procedures

  • NLoTM: Utilizes the SVQ-VAE for object-centric and factorized coding, with codebooks serving as Mentalese primitives. The Autoregressive LoT Prior generates compositional token sequences, enabling scene synthesis and facilitating generalization beyond training distributions (Wu et al., 2 Feb 2024).
  • ORION: Employs a two-stage pipeline: supervised fine-tuning on explicit Mentalese traces, followed by reinforcement learning (RLVR) with correctness-based reward and Shorter Length Preference Optimization (SLPO). SLPO dynamically rewards correct, concise traces, yielding optimal efficiency without arbitrary length constraints (Tanmay et al., 28 Nov 2025).
  • LLM ToM Probing: Feature extraction relies on attention head activations, with linear probes mapping activations to belief states. Interventions adjust activation patterns in the belief-coding subspace, modulating behavior on ToM tasks (Zhu et al., 28 Feb 2024).
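The intervention step described in the last bullet can be sketched as activation steering: shift an activation along a probe-derived belief direction and observe the readout change. The direction and magnitudes below are synthetic; real interventions edit attention head activations inside the model.

```python
# Illustrative activation-steering sketch: push a synthetic activation along
# a unit "belief direction" and watch the linear readout flip.
import numpy as np

rng = np.random.default_rng(1)
d = 32
belief_dir = rng.normal(size=d)
belief_dir /= np.linalg.norm(belief_dir)       # unit steering vector

def readout(h: np.ndarray) -> int:
    """Linear belief readout: 1 if the activation encodes 'believes P'."""
    return int(h @ belief_dir > 0)

h = -0.5 * belief_dir + 0.1 * rng.normal(size=d)  # encodes "believes not-P"
before = readout(h)                               # 0
h_steered = h + 1.0 * belief_dir                  # intervene along belief axis
after = readout(h_steered)                        # 1: belief readout flips
```

That a small, targeted shift in one subspace reliably changes downstream behavior is what licenses the causal (not merely correlational) interpretation of these internal codes.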

6. Broader Implications and Open Research Questions

The operationalization of Mentalese in neural systems provides empirical support for the existence of structured, symbolic-like internal languages in both artificial and potentially biological cognition. Key implications and questions include:

  • The possibility of bridging low-level neural representations with high-level symbolic reasoning, supporting theories of human conceptual abstraction and transfer (Wu et al., 2 Feb 2024).
  • Demonstration that neural models, when appropriately structured or probed, possess interpretable, compositional codes amenable to intervention and verification (Zhu et al., 28 Feb 2024).
  • Open challenges in extending hybrid schemes for continuous and relational factors, aligning learned codes with neural (biological) representations, and scaling symbolic grammars to richer domains with inductive biases for causal and relational reasoning (Wu et al., 2 Feb 2024).
  • The prospect of harnessing Mentalese for transparent, efficient, and verifiable AI systems that replicate core human cognitive properties (Tanmay et al., 28 Nov 2025).

7. Empirical Benchmarks and Comparative Performance

Across mathematical reasoning, scene generation, and social cognition, Mentalese-based systems delineate the current frontier for compositional abstraction and efficiency. For example:

  • ORION achieves up to 5× lower inference latency and 7–9× lower training cost versus baseline CoT models, while maintaining 90–98% of their accuracy (Tanmay et al., 28 Nov 2025).
  • NLoTM exceeds previous object-centric models in generation fidelity (FID = 43.1 vs. 60–65 on CLEVR-Hard) and out-of-distribution accuracy (up to 99.1%) (Wu et al., 2 Feb 2024).
  • Internal belief spaces in LLMs achieve up to ~95% accuracy for self-belief and ~80% for other-belief decoding, supporting high diagnostic and intervention utility in ToM (Zhu et al., 28 Feb 2024).

Mentalese thus serves both as a theoretical construct and as a practical methodology for compositional cognition in artificial and natural agents.
