ARCHE: Latent Reasoning Chain Extraction

Updated 23 November 2025
  • ARCHE is a paradigm that extracts continuous reasoning chains from the hidden states of neural models, offering a model-agnostic view of inferential processes.
  • It employs recurrent operators, activation steering, and cross-modal techniques to align textual and visual inputs for improved model explainability.
  • Empirical benchmarks show ARCHE enhances reasoning fidelity and efficiency while highlighting challenges in translating latent chains to human-readable rationales.

Latent Reasoning Chain Extraction (ARCHE) is a paradigm for surfacing, modeling, and evaluating the internal multi-step inferential processes of large language and multimodal models. Unlike classical chain-of-thought (CoT) techniques based on explicit natural language traces, ARCHE focuses on extracting—or inducing—structured reasoning chains directly from models' latent representations, thereby unmasking the hidden geometry of model cognition across text, vision, and cross-modal domains (Li et al., 16 Nov 2025, Zhu et al., 8 Jul 2025, Pham et al., 18 Aug 2025, Ma et al., 4 Nov 2025, Kuzina et al., 2 Oct 2025, Zhang et al., 2024, Zhang et al., 18 Feb 2025, Chen et al., 2019).

1. Conceptual Foundations and Definitions

ARCHE formalizes the extraction of reasoning chains that are latent (i.e., continuous and never explicitly tokenized) within a model's internal state evolution. Given an input (such as a scientific text, visual prompt, or question), ARCHE seeks to decompose the model's solution into a stepwise trajectory, each step corresponding to a discrete logical, semantic, or perceptual operation but represented in the continuous space of neural activations or hidden states. In text-based LLMs, this is typically the sequence $C = (h_1, h_2, \ldots, h_T)$, with each $h_t \in \mathbb{R}^d$ representing a "thought vector" summarizing the model's internal state after $t$ inference steps (Zhu et al., 8 Jul 2025, Ma et al., 4 Nov 2025, Pham et al., 18 Aug 2025).
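
To ground the notion of a thought-vector chain, the sketch below captures a layerwise latent chain from a small transformer stack via forward hooks. The architecture, dimensions, and the choice of the final token position are illustrative assumptions, not a prescribed ARCHE protocol:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, n_layers, seq_len = 64, 4, 10

# A stack of transformer layers standing in for the model under study.
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    for _ in range(n_layers)
)

# Forward hooks capture each layer's output at the final token position,
# treating layerwise activations as successive steps of the latent chain.
chain = []
hooks = [
    layer.register_forward_hook(
        lambda module, inputs, output: chain.append(output[:, -1, :].detach())
    )
    for layer in layers
]

h = torch.randn(1, seq_len, d_model)   # encoded input sequence
for layer in layers:
    h = layer(h)                       # forward pass fires the hooks

for hook in hooks:
    hook.remove()

C = torch.stack(chain)                 # (n_layers, 1, d_model): one h_t per step
print(C.shape)
```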

A key insight is that such latent reasoning chains can be directly extracted, manipulated, or analyzed without resorting to language-based traces or manual annotation, thereby enabling model-agnostic introspection and efficient inference (Zhang et al., 2024, Kuzina et al., 2 Oct 2025, Li et al., 16 Nov 2025).

2. Mathematical and Algorithmic Formulations

Latent reasoning chains are framed in terms of recurrent or iterative operators over a model's hidden states. In the most general setup (Zhu et al., 8 Jul 2025):

  • The latent chain $C$ is a sequence $(h_1, \ldots, h_T)$ with each $h_t = f_\theta(h_{t-1}, x)$, starting from the encoded input $h_0 = g_0(x)$.
  • The chain can span different granularities, including layerwise activations ($h_t^{(l)}$), block-level loops, or cross-modal fusion modules.
  • The chain-extraction process can seek the MAP trajectory $C^* = \arg\max_C P(C \mid H)$, where $H$ is the set of all observed activations; the search is either explicit (activation-based looping, e.g., "vertical" Universal Transformers) or implicit (hidden-state propagation, as in linear memory models). A minimal sketch of this iterative extraction follows the list.
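
A minimal sketch of the recurrence above, assuming a toy contractive operator in place of $f_\theta$ and a simple convergence test as the stopping rule (both illustrative, not any published ARCHE implementation):

```python
import numpy as np

def extract_latent_chain(f_theta, g_0, x, max_steps=16, tol=1e-4):
    """Roll out a latent reasoning chain h_t = f_theta(h_{t-1}, x).

    Stops early when successive thought vectors stop changing,
    a simple stand-in for dynamic-stopping criteria.
    """
    h = g_0(x)                      # h_0: encoded input
    chain = [h]
    for _ in range(max_steps):
        h_next = f_theta(h, x)      # one latent reasoning step
        chain.append(h_next)
        if np.linalg.norm(h_next - h) < tol:  # converged
            break
        h = h_next
    return chain

# Toy instantiation: a contractive affine operator so the chain converges.
rng = np.random.default_rng(0)
d = 8
W = 0.5 * rng.standard_normal((d, d)) / np.sqrt(d)
f_theta = lambda h, x: np.tanh(W @ h + x)
g_0 = lambda x: np.zeros_like(x)

x = rng.standard_normal(d)
chain = extract_latent_chain(f_theta, g_0, x)
print(f"extracted chain of {len(chain)} thought vectors")
```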

In vision-language contexts, as exemplified by MCOUT (Pham et al., 18 Aug 2025) and CoCoVa (Ma et al., 4 Nov 2025), latent thoughts $h_t$ (or $z_t$) are jointly refined through interactions between textual and visual modules, with iteration implemented via appended latent tokens, cross-attention, and feedback over output sequences.
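
One latent refinement step of this kind can be sketched with standard cross-attention, letting latent thought tokens query visual features; the module layout and dimensions are illustrative and do not reproduce the MCOUT or CoCoVa architectures:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n_latent, n_visual, steps = 64, 4, 49, 3

cross_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
ffn = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))

z = torch.randn(1, n_latent, d)      # latent thought tokens z_t
v = torch.randn(1, n_visual, d)      # visual patch features

for t in range(steps):
    # Latents query the visual features, then a feed-forward update.
    attended, _ = cross_attn(query=z, key=v, value=v)
    z = z + attended                 # residual refinement: z_t -> z_{t+1}
    z = z + ffn(z)

print(z.shape)                       # refined latent chain state
```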

ARCHE methods can be divided by supervision and extraction regime:

  • Pure unsupervised extraction: Probing or clustering hidden activations (Ma et al., 4 Nov 2025, Pham et al., 18 Aug 2025).
  • Steering via activation differences: Computing and injecting "reasoning direction" vectors (e.g., $\mu_{\text{CoT}} - \mu_{\text{Baseline}}$) to induce chain-of-thought behavior (Zhang et al., 2024); a sketch of this procedure follows the list.
  • Distillation from explicit CoT: Student models supervised on the compressed internal state trajectories of teacher models that generate full chain-of-thoughts (Kuzina et al., 2 Oct 2025).
  • Pseudogold extraction: Heuristically generating and aligning explicit reasoning chains to serve as weak supervision (e.g., NER-graph-based chain extraction for multi-hop QA) (Chen et al., 2019).
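
For the activation-difference regime, a minimal sketch of computing and injecting a steering vector, assuming activations have already been collected at a chosen layer and token position (the collection protocol, dimensions, and scaling factor here are illustrative):

```python
import numpy as np

def steering_vector(acts_cot, acts_base):
    """Mean-difference "reasoning direction": mu_CoT - mu_Baseline.

    Both inputs: arrays of shape (n_examples, d) of hidden activations
    collected at a chosen layer and position (assumed protocol).
    """
    return acts_cot.mean(axis=0) - acts_base.mean(axis=0)

def inject(h, v, alpha=1.0):
    """Add the scaled steering vector to a hidden state at inference."""
    return h + alpha * v

# Toy data standing in for collected activations.
rng = np.random.default_rng(0)
d = 16
acts_cot = rng.standard_normal((100, d)) + 0.5   # shifted "CoT" activations
acts_base = rng.standard_normal((100, d))

v = steering_vector(acts_cot, acts_base)
h = rng.standard_normal(d)
h_steered = inject(h, v, alpha=2.0)              # alpha tuned per layer/task
```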

3. Modalities and Model Architectures

While ARCHE originated in textual LLMs, recent work extends the methodology across unimodal and multimodal domains:

Table: Variants of ARCHE Across Modalities

Modality          Chain Representation      Extraction Method
----------------  ------------------------  ----------------------------------
Text              Hidden states $h_t$       Recurrence, steering, distillation
Vision-Language   Latents $z_t$             Cross-modal attention, LQ-Former
Multi-hop QA      Discrete chain of facts   Graph-based extraction, heuristics

4. Training Objectives, Losses, and Extraction Procedures

Training and extraction protocols in ARCHE implementations vary by architecture:

  • Auxiliary and Answer Losses: MCOUT employs auxiliary cross-entropy losses at each latent step combined with a final answer loss, $\mathcal{L}_{\text{total}} = \sum_{k=1}^{N_t} \mu \, \mathcal{L}_{\text{aux}}^{(k)} + \mathcal{L}_{\text{final}}$ (Pham et al., 18 Aug 2025); a minimal sketch of this objective follows the list.
  • Contrastive and Diffusion Losses: CoCoVa employs symmetric InfoNCE contrastive losses and diffusion-based visual reconstruction to enforce cross-modal alignment and ensure the chain can recover both the image and text (Ma et al., 4 Nov 2025).
  • KV-Cache Distillation: KaVa aligns a student’s latent tokens with a compressed teacher KV-cache using a combined L1/MSE loss over keys and values, along with teacher/student cross-entropy losses and CODI hidden state matching (Kuzina et al., 2 Oct 2025).
  • Activation Steering: ARCHE, realized as an activation intervention, injects the computed steering vector at selected layers and token positions, with hyperparameters tuned for optimal performance (Zhang et al., 2024).
  • Quality Filtering and Self-training: SERT discovers latent chains in small models by zero-shot sampling, multi-step filtering (length, repetition, PPL), and self-finetuning to move probability mass to high-quality latent reasoning paths (Zhang et al., 18 Feb 2025).
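
As a concrete rendering of the MCOUT-style objective above, the following sketch sums per-step auxiliary cross-entropy losses with a final answer loss; the shared-target setup and the value of $\mu$ are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def mcout_style_loss(aux_logits, final_logits, targets, mu=0.1):
    """L_total = sum_k mu * L_aux^(k) + L_final (cross-entropy per step).

    aux_logits: list of N_t tensors, each (batch, vocab), per-step heads.
    final_logits: (batch, vocab), answer head.
    targets: (batch,), gold answer token ids.
    """
    loss = F.cross_entropy(final_logits, targets)
    for logits_k in aux_logits:
        loss = loss + mu * F.cross_entropy(logits_k, targets)
    return loss

# Toy usage with random logits.
torch.manual_seed(0)
batch, vocab, n_steps = 2, 10, 3
aux = [torch.randn(batch, vocab) for _ in range(n_steps)]
final = torch.randn(batch, vocab)
y = torch.randint(0, vocab, (batch,))
print(mcout_style_loss(aux, final, y).item())
```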

Latent chain extraction often involves saving the chain of hidden states during inference and, for interpretability, optionally projecting them back into token or semantic space via dedicated decoders or rationale heads (Pham et al., 18 Aug 2025, Ma et al., 4 Nov 2025).
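
Projecting saved thought vectors back toward token space can be sketched with a simple linear rationale head; the head below is a hypothetical decoder for illustration, not a specific published component:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, vocab, T = 32, 100, 5

rationale_head = nn.Linear(d, vocab)   # maps thought vectors to token logits

chain = torch.randn(T, d)              # saved latent chain (h_1, ..., h_T)
logits = rationale_head(chain)         # (T, vocab)
tokens = logits.argmax(dim=-1)         # greedy projection to token ids
print(tokens.tolist())                 # one token per latent step
```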

5. Benchmarking, Evaluation Protocols, and Empirical Results

Explicit evaluation of latent chain extraction is addressed in ARCHE Bench (Li et al., 16 Nov 2025), which requires models to reconstruct Reasoning Logic Trees (RLTs) over scientific texts:

  • Output Format: Directed acyclic graphs, with nodes labeled by evidence (from the Introduction or cited works) and each step marked by its Peircean inference mode (deduction, induction, or abduction).
  • Metrics:
    • Entity Coverage (EC): Fraction of key entities from the gold-standard Introduction present in the RLT (a toy computation is sketched after this list).
    • Reasoning Edge Accuracy (REA): Proportion of inference steps judged logically valid by ensemble model voting.
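
A toy rendering of the EC metric, assuming exact substring matching between entity strings and node labels (real implementations presumably use more robust entity matching):

```python
def entity_coverage(gold_entities, rlt_nodes):
    """EC: fraction of gold Introduction entities that appear in the RLT.

    gold_entities: set of entity strings from the gold Introduction.
    rlt_nodes: iterable of node label strings in the reconstructed tree.
    """
    gold = {e.lower() for e in gold_entities}
    found = {e for e in gold if any(e in node.lower() for node in rlt_nodes)}
    return len(found) / len(gold) if gold else 0.0

# Toy usage.
gold = {"dopamine", "reward prediction error", "striatum"}
nodes = ["Dopamine neurons encode reward prediction error", "V1 responses"]
print(f"EC = {entity_coverage(gold, nodes):.2f}")   # 2 of 3 covered
```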

No current model achieves more than 50% REA; even the strongest models illustrate the difficulty of balancing broad entity coverage with stepwise logical rigor.

Additional ARCHE variants show strong performance gains and efficiency:

  • MCOUT achieves up to an 8.23% accuracy improvement and an 8.27-point BLEU gain over baselines on multimodal QA (Pham et al., 18 Aug 2025).
  • KaVa yields nearly explicit-CoT accuracy on high-complexity arithmetic and NL traces but with 89% lower inference compute (Kuzina et al., 2 Oct 2025).
  • ARCHE steering is as effective as natural-language CoT on GSM8K, MMLU, and AI2's ARC, with minimal computational overhead (Zhang et al., 2024).
  • In multi-hop question answering, ARCHE-style discrete chain extraction with context-aware encoding delivers state-of-the-art accuracy and F1 on WikiHop and HotpotQA without any gold annotated supporting facts (Chen et al., 2019).

6. Interpretability, Analysis, and Open Challenges

ARCHE makes the internal inferential processes of neural models tractable by rendering the reasoning trajectory explicit, continuous, and analyzable (Zhu et al., 8 Jul 2025, Li et al., 16 Nov 2025). Empirical probing (e.g., t-SNE/PCA of latent chains), qualitative analyses (attention focus, rationale projections), and quantitative association (latent-token/text/image alignment via SVM or MLP classifiers) validate that extracted latent chains encode semantically meaningful, stepwise transformations (Ma et al., 4 Nov 2025, Pham et al., 18 Aug 2025).
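
Such probing can be reproduced with standard tooling; a minimal PCA projection of a (synthetic) latent chain looks like this:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
T, d = 20, 64

# Synthetic latent chain drifting through activation space.
chain = np.cumsum(rng.standard_normal((T, d)) * 0.1, axis=0)

# Project thought vectors to 2-D to inspect the reasoning trajectory.
proj = PCA(n_components=2).fit_transform(chain)
for t, (x, y) in enumerate(proj[:3]):
    print(f"step {t}: ({x:+.2f}, {y:+.2f})")
```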

However, foundational challenges remain:

  • The mapping from continuous chains to human-readable steps is often non-invertible; reliable decoding into natural-language "rationales" remains under development.
  • Latent chain fidelity depends sensitively on model architecture, training supervision, and the choice of extraction protocol.
  • No current approach reliably achieves complete and logically valid reasoning decomposition in technically rigorous domains, as ARCHE Bench highlights (Li et al., 16 Nov 2025).

A plausible implication is that future research must combine improved chain-to-language projections, richer paradigm-aligned supervision, and broader benchmarking to close the gap between implicit model cognition and explicit, auditable reasoning.

7. Extensions and Future Directions

ARCHE is rapidly evolving from purely analytical tooling (probing hidden states for interpretability) into training and evaluation frameworks that shape how next-generation models internalize and expose their reasoning. Extensions under active exploration include:

  • Task-conditioned steering vectors, per-head activation-geometry interventions, and mixtures for finer control (Zhang et al., 2024).
  • Infinite-depth and diffusion-based latent reasoning to enable reversible, globally consistent chains (Zhu et al., 8 Jul 2025, Ma et al., 4 Nov 2025).
  • Larger and multi-domain benchmarks, improved dynamic stopping, and interpretability protocols (Li et al., 16 Nov 2025).
  • Integration of ARCHE into end-to-end learning pipelines for scientific discovery, multi-step planning, and complex multimodal reasoning.

By establishing a unified latent reasoning extraction methodology that is model-agnostic, scalable, and empirically validated, ARCHE provides a principled foundation for both probing and advancing the rigor of neural reasoning systems across diverse problem domains.
