Three-Tiered Understanding Framework
- The three-tiered conception of understanding is a hierarchical framework that defines distinct levels of understanding, from basic feature mapping to advanced mechanistic reasoning.
- Tier 1 focuses on surface description, Tier 2 on reliable causal inference, and Tier 3 on principled representation and algorithmic justification.
- Evaluation metrics such as accuracy, consistency, and verifiability guide the assessment of both human cognition and AI systems.
A three-tiered conception of understanding organizes the notion of “understanding” into a hierarchy of levels, each marking a qualitatively distinct capability or epistemic achievement. The framework is applied across philosophy, cognitive science, and the evaluation of artificial intelligence systems, including large language models (LLMs). Despite terminological diversity, the research surveyed here anchors the tiers in successively deeper forms of explanation, representational structure, and computational capacity. Across theoretical, empirical, and mechanistic perspectives, the scheme states conditions under which a system (human, scientific, or artificial) can be said to move from surface mapping to a principled, mechanistic, or algorithmic grasp of phenomena.
1. Taxonomies of the Three Tiers
Multiple lines of research have formalized the three-tiered structure, each mapping the tiers to distinct forms of understanding:
| Source | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|
| Storks et al. (TRIP) | Scene Description | Causal Inference | Evidentiary Justification |
| Beckmann & Queloz | Conceptual Understanding | State-of-the-World Understanding | Principled Understanding |
| Moore | Understanding-as-Mapping | Understanding-as-Reliability | Understanding-as-Representation |
| Hubert | Understanding-What | Understanding-Why | Understanding-How (Mechanistic) |
| Graham & Granger (G-A) | CFG/Stack: Sequence Prediction | IXG/Nested Stack: Syntax | CSG/Multi-Stack: Symbolic Logic |
Each taxonomy articulates transitions from superficial or “surface-level” abilities—mere discrimination of plausible scenarios, input-output mapping, or surface grammaticality—toward progressively more robust causal, mechanistic, or computationally principled capacities (Storks et al., 2021, Beckmann et al., 7 Jul 2025, Moore, 2022, Hubert, 2021, Graham et al., 5 Mar 2025).
2. Formal Criteria and Mechanistic Realizations
The tiers are usefully distinguished by their formal characterizations and associated evaluation protocols.
- Tier 1: Mapping, Description, or Feature Extraction. At this level, a system either maps inputs to outputs (that is, it realizes an input-output function) or recognizes salient features as directions in a latent space, yielding behavioral performance or surface factual discrimination. No claim is made about internal unification or compositional structure. Empirical evaluation typically uses behavioral benchmarks or plausibility judgments (Storks et al., 2021, Moore, 2022, Beckmann et al., 7 Jul 2025).
- Tier 2: Reliability, Causal Inference, or State Tracking. This tier requires not just correct mapping, but consistent and robust performance across a class of queries, typically measured as reliability (with a performance threshold) (Moore, 2022). Mechanistically, models develop contingent factual connections and internal mechanisms for causal inference—e.g., conflict detection in narratives, MLP factual projections in LLMs, or nested stack memory supporting syntactic hierarchy (Beckmann et al., 7 Jul 2025, Graham et al., 5 Mar 2025). Systems must explain why some outcome is warranted, often by identifying breakpoints or contradictions in narrative or causal structure (Storks et al., 2021, Hubert, 2021).
- Tier 3: Representation, Justification, Mechanistic or Principled Reasoning. Full understanding is attributed only when the system's internal representations align structurally with human-like representations (quantified by a representational distance), or when mechanistic, compositional circuits realize general principles (Moore, 2022, Beckmann et al., 7 Jul 2025). At this level, a system no longer relies on rote or surface reliability but generates correct outputs via general algorithms, circuit-based explanations, or model-based manipulations of internal state, which enables counterfactuals, abstraction, and transfer (Hubert, 2021, Graham et al., 5 Mar 2025). In empirical settings, this is tested via verifiability conditions (complete chain-of-evidence logic), causal interventions, circuit discovery, or advanced grammar benchmarks.
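The contrast between Tier 2 reliability and Tier 3 representational alignment can be made concrete with a minimal sketch. Everything specific here is an assumption for illustration: the threshold value `0.9`, the function names, and above all the choice of distance. The distance used by Moore (2022) is not specified in this article, so the sketch substitutes a representational-similarity style measure (one minus the Pearson correlation between the two systems' pairwise-distance profiles), which is zero when both systems arrange the same items with the same relative geometry.

```python
import math
from itertools import combinations


def reliability(outcomes):
    """Fraction of correct answers over a class of queries (Tier 2 style)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(outcomes) / len(outcomes)


def meets_reliability(outcomes, threshold=0.9):
    """Hypothetical threshold check; 0.9 is an arbitrary illustrative value."""
    return reliability(outcomes) >= threshold


def pairwise_distances(reps):
    """Euclidean distance between every pair of item representations."""
    return [math.dist(a, b) for a, b in combinations(reps, 2)]


def pearson(xs, ys):
    """Plain Pearson correlation, written out to stay dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("pairwise distances must not all be equal")
    return cov / math.sqrt(vx * vy)


def representation_distance(reps_a, reps_b):
    """Stand-in distance (an assumption): 1 - correlation of distance profiles."""
    return 1.0 - pearson(pairwise_distances(reps_a), pairwise_distances(reps_b))


# Two systems answering the same 20 queries: 19 correct, 1 wrong.
outcomes = [True] * 19 + [False]

# Representations of the same three items in two systems.
reps_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
reps_b = [(0.0, 0.0), (3.0, 0.0), (0.0, 6.0)]   # same geometry, rescaled
reps_c = [(0.0, 0.0), (5.0, 0.0), (0.0, 1.0)]   # different geometry
```

On these toy inputs the behavioral check passes regardless of which representation is used, while the stand-in distance separates `reps_b` (near zero against `reps_a`) from `reps_c`. That is the sense in which reliability alone does not settle the Tier 3 question.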
3. Cognitive and Computational Correspondences
The three-tiered conception is anchored in both philosophical (epistemic) and computational (mechanistic or formal-grammar) traditions:
- Cognitive Analogs: Tier 1 parallels surface event recognition (“What happened?”); Tier 2, causal chain construction (“Why did it happen?”); Tier 3, micro-level mechanistic explanation (“How, in virtue of system structure, did this outcome arise?”) (Storks et al., 2021, Hubert, 2021).
- Computational Hierarchies: In grammar/automata theory, Tier 1 corresponds to context-free grammars (single stack), Tier 2 to indexed grammars (nested stacks), and Tier 3 to context-sensitive grammars or linear-bounded automata (multi-stack or bounded-tape) (Graham et al., 5 Mar 2025). Functional realization in transformers aligns: small models manage sequence mapping/feature extraction, large models develop inter-feature factual structure, and only highly augmented architectures (IALLMs) reliably support algorithmic, logical, or multi-schematic competences.
- Neural Mechanism: Mechanistic interpretability demonstrates that transformers implement these tiers via successively more compositional and reusable internal mechanisms: first, low-dimensional features; then, fact and state-tracking MLPs; finally, specific subnetworks (circuits) for algorithmic tasks (Beckmann et al., 7 Jul 2025).
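The role of memory structure in the grammar hierarchy can be illustrated with two small recognizers. This is a sketch under stated assumptions, not a construction taken from Graham et al.: the function names are hypothetical, and the second language, a^n b^n c^n, is chosen only because it is a standard example of a language that is not context-free, so a single stack does not suffice. Which tier it occupies in the cited hierarchy is not asserted here.

```python
def accepts_balanced(s: str) -> bool:
    """Recognize balanced brackets (a context-free language) with one stack."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in s:
        if ch in "([":
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            return False
    return not stack


def accepts_anbncn(s: str) -> bool:
    """Recognize a^n b^n c^n for n >= 1 using two stacks.

    Each 'a' is pushed on both stacks; the b-block drains the first stack
    and the c-block drains the second, so all three counts must agree.
    """
    stack_b, stack_c = [], []
    phase = "a"
    for ch in s:
        if ch == "a" and phase == "a":
            stack_b.append(ch)
            stack_c.append(ch)
        elif ch == "b" and phase in ("a", "b"):
            phase = "b"
            if not stack_b:
                return False  # more b's than a's, or no a's at all
            stack_b.pop()
        elif ch == "c" and phase in ("b", "c"):
            if phase == "b" and stack_b:
                return False  # fewer b's than a's
            phase = "c"
            if not stack_c:
                return False  # more c's than a's
            stack_c.pop()
        else:
            return False  # wrong symbol or wrong order
    return phase == "c" and not stack_b and not stack_c


print(accepts_balanced("([])"), accepts_balanced("([)]"))   # True False
print(accepts_anbncn("aabbcc"), accepts_anbncn("aabbc"))    # True False
```

The second recognizer needs the extra stack because the count of a's has to be checked twice, once against the b's and once against the c's, and popping a single stack destroys the count after the first comparison.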
4. Evaluation and Metrics Across Tiers
Multi-tiered frameworks require correspondingly granular metrics:
- Single-task accuracy: Fraction of cases where the surface label (e.g., plausible story) is correct.
- Consistency: Fraction of cases where both the outcome and the intermediate judgment (e.g., the causal contradiction) are correctly flagged.
- Verifiability: Fraction of cases where the output, the causal pinpoint, and all explanatory attributes (e.g., entity physical states) match ground truth. By construction, verifiability ≤ consistency ≤ accuracy, and true understanding would bring verifiability close to accuracy (Storks et al., 2021).
- Reliability and representation distance: Used to distinguish behavioral consistency from representational alignment (Moore, 2022).
Empirical studies illustrate that state-of-the-art systems often achieve high Tier 1 accuracy, moderate Tier 2 consistency, and poor Tier 3 verifiability, indicating a gap between output reliability and genuine explanatory depth (Storks et al., 2021, Moore, 2022, Beckmann et al., 7 Jul 2025).
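The nesting of accuracy, consistency, and verifiability follows directly from their definitions, as a minimal sketch shows. The record layout and field names below are illustrative assumptions, not the TRIP data format; each flag records whether one level of the prediction matched ground truth for one example.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One evaluated example; all field names are hypothetical."""
    label_correct: bool     # surface label, e.g. which story is plausible
    conflict_correct: bool  # intermediate causal contradiction flagged correctly
    states_correct: bool    # all explanatory attributes match ground truth


def tiered_metrics(preds):
    """Return (accuracy, consistency, verifiability) as fractions of all examples."""
    n = len(preds)
    if n == 0:
        raise ValueError("need at least one prediction")
    accurate = sum(p.label_correct for p in preds)
    # Consistency additionally requires the intermediate judgment to be right.
    consistent = sum(p.label_correct and p.conflict_correct for p in preds)
    # Verifiability additionally requires every explanatory attribute to be right.
    verifiable = sum(
        p.label_correct and p.conflict_correct and p.states_correct for p in preds
    )
    return accurate / n, consistent / n, verifiable / n


preds = [
    Prediction(True, True, True),
    Prediction(True, True, False),
    Prediction(True, False, False),
    Prediction(False, False, False),
]
acc, con, ver = tiered_metrics(preds)
print(acc, con, ver)  # 0.75 0.5 0.25
```

Because each metric adds a condition to the previous one, the three values can only decrease from left to right. The size of the drop is what exposes a system that gets the label right without the supporting explanation.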
5. Human vs. Machine Forms of Understanding
While human cognition and current AI architectures both exhibit tiered progression, there are critical divergences:
- Mechanism Accumulation: LLMs often solve tasks via “bags of heuristics”—redundant parallel subnetworks—rather than the parsimonious, unified mechanisms favored by human cognition. This phenomenon of parallel mechanisms distinguishes machine understanding as potentially “richer and stranger,” lacking a norm for parsimony (Beckmann et al., 7 Jul 2025).
- Dependency Structure: Human learning often proceeds from concepts → facts → principles, analogous to the model's progression from feature extraction to factual connections to general circuits (Beckmann et al., 7 Jul 2025, Hubert, 2021).
- Computational Boundaries: Empirically, transformers only cross from Tier 1 to Tier 2 at large scale; transition to robust Tier 3 (e.g., full context-sensitive inference) still requires explicit architectural or memory augmentation (multi-stack, RL integration, external scratchpads) (Graham et al., 5 Mar 2025).
6. Applications, Experimental Probes, and Open Directions
Tiered frameworks have motivated rigorous experimental and methodological practices:
- Datasets: TRIP enables direct, multi-tier evaluation of commonsense and physical reasoning with dense annotation at all levels (Storks et al., 2021).
- Probing and Intervention: Linear probes, circuit ablation, and causal mediation experiments target specific representations or mechanisms to test for Tier 3-type alignment (Moore, 2022, Beckmann et al., 7 Jul 2025).
- Computational Benchmarks: Tasks drawn from grammar-automata hierarchy stress-test model competence at each tier, revealing phase transitions and architectural bottlenecks (Graham et al., 5 Mar 2025).
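The logic of an ablation experiment can be sketched on a toy model. The model below is entirely hypothetical: it is a dictionary of named components whose outputs are summed, with two redundant copies of the same heuristic standing in for parallel mechanisms. It is not a transformer and not any cited paper's setup; it only shows why ablating one component at a time can miss a mechanism that has a backup.

```python
def sign_heuristic(x):
    """Vote +1 for positive inputs and -1 otherwise."""
    return 1.0 if x > 0 else -1.0


# Hypothetical toy model: component outputs are summed into one score.
components = {
    "sign_a": sign_heuristic,
    "sign_b": sign_heuristic,   # redundant copy of the same heuristic
    "inert": lambda x: 0.0,     # contributes nothing to the task
}


def accuracy(components, ablated, inputs):
    """Accuracy on the task 'is x positive?' with some components removed."""
    active = [f for name, f in components.items() if name not in ablated]
    correct = 0
    for x in inputs:
        score = sum(f(x) for f in active)
        correct += (score > 0) == (x > 0)
    return correct / len(inputs)


inputs = [-3, -1, 2, 5]
print(accuracy(components, set(), inputs))                 # 1.0
print(accuracy(components, {"sign_a"}, inputs))            # 1.0
print(accuracy(components, {"sign_a", "sign_b"}, inputs))  # 0.5
```

Removing `sign_a` alone leaves accuracy unchanged because `sign_b` compensates. Only the joint ablation reveals that the behavior depends on this mechanism, which is one reason redundant parallel mechanisms complicate attributing a single principled circuit to a model.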
Critical open challenges include formalizing the precise representation-space distance sufficient for human-like understanding, quantifying the computational and data resources required to cross tiers, and developing neuro-inspired architectural innovations to close Tier 2–3 gaps. There is also an emerging imperative to move beyond a binary conception of “does the model understand?” to a detailed mapping of how distinct, often non-human, forms of machine understanding are instantiated (Beckmann et al., 7 Jul 2025, Moore, 2022).
7. Philosophical and Scientific Significance
The three-tiered conception underpins philosophical distinctions between description, explanation, and mechanistic modeling. It provides epistemic criteria against which understanding—human or artificial—can be rigorously evaluated, eschewing both mere behavioral sufficiency and the uncritical elevation of internal structure. Mechanistic and representational depth emerge as epistemic ideals, necessary for robust causal inference, generalization, and explanatory power (Hubert, 2021, Moore, 2022, Beckmann et al., 7 Jul 2025).
In sum, tiered frameworks enable a scientifically principled approach to measuring, engineering, and theorizing about understanding across systems, disciplines, and computational substrates. They offer both conceptual clarity and actionable guidance for the ongoing development and assessment of advanced reasoning systems.