Transformer Programs: Interpretable Computation

Updated 8 March 2026

Transformer Programs are algorithmic artifacts extracted from transformer networks, enabling interpretable and verifiable computations.
They leverage formal languages like C-RASP and RASP to structure operations and disentangle complex neural processes.
They integrate learning, symbolic synthesis, and SMT-based verification to achieve robust, human-readable, and efficient program extraction.

Transformer Programs are formally structured, intrinsically interpretable algorithmic artifacts encoded and, more importantly, explicitly extractable from transformer neural networks. This paradigm contrasts with conventional transformers whose learned weights typically encode programs in a highly entangled, uninterpretable manner. By leveraging restricted architectural designs or discrete operational languages—such as C-RASP or RASP—Transformer Programs enable interpretable, verifiable, and often formally analyzable computation within transformer models. Research from 2023–2026 has established both the theoretical foundations and practical algorithms for learning, extracting, verifying, and synthesizing transformer-level programs across symbolic, numeric, and algorithmic domains (Friedman et al., 2023, Strobl et al., 2024, Zhang et al., 9 Jan 2026, Jiang et al., 18 Feb 2026).

1. Formal Languages for Transformer Programs: C-RASP and RASP

The foundational formalism for transformer programs is C-RASP (Counting Restricted Access Sequence Processing). A C-RASP program consists of a composition of rules over a finite word, each rule being either Boolean or counting, with explicit semantics for position-local and windowed operations. The program operates over finite sequences via layered applications of these rules, with the output determined by the final Boolean rule evaluated at the last position.

C-RASP syntax includes:

Boolean rules built from the connectives (∧, ∨, ¬), symbol predicates over the alphabet, and comparisons between total or windowed counts.
Counting rules as first-order expressions using $\#$ (total or windowed occurrence count) and compositional arithmetic.

Formally, a C-RASP program is a sequence of rules $P = (R_1, R_2, \dots, R_k)$, where each rule $R_i$ is either a Boolean rule $B_i \coloneqq e^{B}_{i}$ or a counting rule $C_i \coloneqq e^{C}_{i}$. Semantically, for an input word $w = a_1a_2\cdots a_n$, the value of each rule at position $j$ is computed by recursively composing the values of earlier rules and the base predicates.
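The following minimal Python interpreter sketch makes the rule-by-rule semantics concrete. The rule encoding and the helper names `symbol`, `count`, `pointwise`, and `accepts` are hypothetical. Windowed counts are assumed to cover the last `window` positions. Because `pointwise` accepts arbitrary Python functions, staying inside C-RASP means restricting it to addition, subtraction, constants, comparisons, and Boolean connectives.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical encoding (ours, not from the cited papers): a rule maps the input
# word and the values of earlier rules to one value per position.
Env = Dict[str, list]
Rule = Callable[[str, Env], list]


def symbol(a: str) -> Rule:
    """Base predicate Q_a: true exactly at positions holding symbol a."""
    return lambda w, env: [c == a for c in w]


def count(pred: str, window: Optional[int] = None) -> Rule:
    """Counting operator #: at position i, the number of positions j <= i where
    Boolean rule `pred` holds; with `window`, only j in (i - window, i] count."""
    def run(w: str, env: Env) -> List[int]:
        prefix = [0]  # prefix[i] = number of true values among the first i positions
        for v in env[pred]:
            prefix.append(prefix[-1] + int(v))
        out = []
        for i in range(len(w)):
            lo = 0 if window is None else max(0, i + 1 - window)
            out.append(prefix[i + 1] - prefix[lo])
        return out
    return run


def pointwise(fn: Callable, *names: str) -> Rule:
    """Position-local combination of earlier rules (arithmetic, comparisons, and/or/not)."""
    return lambda w, env: [fn(*vals) for vals in zip(*(env[n] for n in names))]


def accepts(program: List[Tuple[str, Rule]], w: str) -> bool:
    """Evaluate rules in order; the last rule, read at the last position, is the output."""
    if not w:
        raise ValueError("this sketch only defines acceptance for nonempty words")
    env: Env = {}
    for name, rule in program:
        env[name] = rule(w, env)
    return bool(env[program[-1][0]][-1])


# Dyck-1: the depth #( - #) is never negative and ends at zero.
DYCK1 = [
    ("open", symbol("(")),
    ("close", symbol(")")),
    ("n_open", count("open")),
    ("n_close", count("close")),
    ("depth", pointwise(lambda o, c: o - c, "n_open", "n_close")),
    ("neg", pointwise(lambda d: d < 0, "depth")),
    ("n_neg", count("neg")),
    ("accept", pointwise(lambda d, k: d == 0 and k == 0, "depth", "n_neg")),
]

for word in ["(())", "()()", "())(", "(()"]:
    print(word, accepts(DYCK1, word))  # True, True, False, False
```

The Dyck-1 program shows how rules compose. The count `depth` feeds the Boolean rule `neg`, which is itself counted again as `n_neg`, so a violation anywhere in the prefix is still visible at the last position.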

Expressivity: C-RASP characterizes precisely the class of sequence tasks that can be solved by length-generalizable transformers with softmax attention and standard positional encodings: all languages recognized by such transformers correspond to a C-RASP program (Jiang et al., 18 Feb 2026). RASP and its extensions (B-RASP, B-RASP[pos], S-RASP) are expressive enough to describe first-order rational and polyregular sequence-to-sequence functions, and can be compiled directly into transformer architectures (Strobl et al., 2024).

2. Learning and Extraction of Transformer Programs

Transformer Programs are distinguished by their learnability, either through architectural constraint and discrete optimization or through symbolic synthesis from data. Two primary methodologies have emerged:

a) Mechanistically Constrained Learning (Friedman et al., 2023, Zhang et al., 9 Jan 2026)

Transformers are deliberately structured so that every attention head and MLP module corresponds one-to-one to an interpretable RASP-style operation.

Residual stream disentanglement: Each variable is assigned a distinct subspace of the hidden state; programs are assembled by concatenating these variable subspaces layer-wise.
Gated attention and hard predicates: Gumbel-Softmax or temperature-annealed sampling ensures that each head only routes information along specific, discrete circuits.
Program decompilation: After convergence, heads and MLPs can be systematically converted to human-readable code (e.g., Python), reflecting the precise logical or arithmetic function learned.
Extraction and simplification: Attention primitives are extracted by hypothesis testing over routing indices, while arithmetic is recovered using symbolic regression and analytic simplification (Zhang et al., 9 Jan 2026).
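Here is a minimal sketch of two ingredients from the list above. It assumes a toy head that chooses among three hypothetical position predicates. During training, a Gumbel-Softmax sample relaxes the discrete choice. After training, the head is decompiled by keeping the argmax predicate and running it as RASP-style `select`/`aggregate` code, with mean aggregation as in RASP's numerical aggregate. The predicate set, logits, and helper names are illustrative only. They are not the parameterization used by Friedman et al. (2023) or Zhang et al. (9 Jan 2026).

```python
import math
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical candidate predicates a single head may learn to use.
PREDICATES: Dict[str, Callable[[int, int], bool]] = {
    "eq": lambda k, q: k == q,
    "leq": lambda k, q: k <= q,
    "lt": lambda k, q: k < q,
}


def gumbel_softmax(logits: Sequence[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + Gumbel(0, 1) noise) / tau)."""
    noisy = [(logit - math.log(-math.log(max(rng.random(), 1e-12)))) / tau for logit in logits]
    top = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def select(keys: Sequence[int], queries: Sequence[int], pred) -> List[List[bool]]:
    """RASP-style selector: row q marks the keys that query q attends to."""
    return [[pred(k, q) for k in keys] for q in queries]


def aggregate(selector: List[List[bool]], values: Sequence[float], default: float = 0.0) -> List[float]:
    """Numerical aggregate: mean of the selected values (default if none selected)."""
    out = []
    for row in selector:
        chosen = [v for v, s in zip(values, row) if s]
        out.append(sum(chosen) / len(chosen) if chosen else default)
    return out


rng = random.Random(0)
logits = [0.1, 3.0, -1.0]  # stand-in for trained logits over PREDICATES
for tau in (5.0, 1.0, 0.1):  # lower temperature -> samples closer to one-hot
    print(tau, [round(p, 3) for p in gumbel_softmax(logits, tau, rng)])

# Decompile: keep the most likely predicate and run the head as discrete code.
names = list(PREDICATES)
chosen = names[max(range(len(logits)), key=lambda i: logits[i])]
tokens = "abaa"
positions = list(range(len(tokens)))
is_a = [1.0 if t == "a" else 0.0 for t in tokens]
frac_a = aggregate(select(positions, positions, PREDICATES[chosen]), is_a)
print(chosen, frac_a)  # fraction of 'a' in each prefix
```

The decompiled head is plain code. With the `leq` predicate, each query averages over its own prefix, so the head computes the running fraction of `a` tokens.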

b) Symbolic Synthesis via Simulated Annealing (Jiang et al., 18 Feb 2026)

Transformer programs in C-RASP are synthesized directly from examples using local search:

Program shape definition: A bounded space of candidate program trees (number of Boolean/counting rules, maximum constant, etc.).
Search and mutation: Programs are locally mutated (syntactic rewrites, resampling, or micro-edits).
Objective: A loss combining misclassification count (on labeled examples), program size, and unreachable rules, with a hard priority on correctness.
Optimization: Simulated annealing, with reheating and pruning, searches for a minimal program consistent with the data.
Verification: The synthesized C-RASP program is translated to a Lustre synchronous dataflow program, and its correctness is established via SMT-based model checking with Kind 2.

Empirically, this synthesis+verification pipeline achieves perfect accuracy and program minimization on a suite of regular, context-free, and counting languages within seconds to minutes (Jiang et al., 18 Feb 2026).
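The local-search loop described above can be sketched with a deliberately tiny, hypothetical program shape: a single comparison between a count difference and a constant. The loss follows the stated priorities. A misclassification costs more than any size saving, and |K| stands in for program size. Mutations mirror resampling and micro-edits. Reheating, pruning, unreachable-rule penalties, and the Kind 2 verification step are omitted. This is a toy illustration, not the search space or implementation of Jiang et al. (18 Feb 2026).

```python
import math
import random
from typing import Callable, Dict, List, Tuple

# Toy program shape (an assumption, far smaller than real C-RASP shapes):
# accept w iff (#a(w) - #b(w)) OP K, with |K| <= MAX_CONST.
OPS: Dict[str, Callable[[int, int], bool]] = {
    "<": lambda d, k: d < k,
    "<=": lambda d, k: d <= k,
    "==": lambda d, k: d == k,
    ">=": lambda d, k: d >= k,
    ">": lambda d, k: d > k,
}
MAX_CONST = 3
Program = Tuple[str, int]
Example = Tuple[str, bool]


def accepts(prog: Program, w: str) -> bool:
    op, k = prog
    return OPS[op](w.count("a") - w.count("b"), k)


def loss(prog: Program, data: List[Example]) -> int:
    """Misclassifications dominate; |K| is a stand-in for program size."""
    errors = sum(accepts(prog, w) != label for w, label in data)
    return 1000 * errors + abs(prog[1])


def mutate(prog: Program, rng: random.Random) -> Program:
    op, k = prog
    move = rng.randrange(3)
    if move == 0:  # resample the whole program within the shape
        return rng.choice(list(OPS)), rng.randint(-MAX_CONST, MAX_CONST)
    if move == 1:  # micro-edit: swap the comparison operator
        return rng.choice(list(OPS)), k
    # micro-edit: nudge the constant, clamped to the shape
    return op, max(-MAX_CONST, min(MAX_CONST, k + rng.choice((-1, 1))))


def anneal(data: List[Example], steps: int = 3000, t0: float = 50.0,
           cooling: float = 0.995, seed: int = 0) -> Program:
    rng = random.Random(seed)
    cur = ("==", MAX_CONST)
    cur_loss = loss(cur, data)
    best, best_loss = cur, cur_loss
    temp = t0
    for _ in range(steps):
        cand = mutate(cur, rng)
        cand_loss = loss(cand, data)
        delta = cand_loss - cur_loss
        # Metropolis rule: always accept improvements, sometimes accept regressions.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur, cur_loss = cand, cand_loss
            if cur_loss < best_loss:
                best, best_loss = cur, cur_loss
        temp = max(temp * cooling, 1e-3)  # geometric cooling, no reheating
    return best


examples = [("aab", True), ("abb", False), ("ab", False), ("aaa", True),
            ("bb", False), ("a", True), ("ba", False)]
print(anneal(examples))
```

On these examples, the only zero-loss program in the shape is `('>', 0)`, which accepts iff #a > #b (strict majority). The program `('>=', 1)` is also consistent with the data, but it costs more because of its larger constant.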

3. Verification, Minimization, and Formal Methods

A unique advantage of transformer programs expressed in C-RASP (or RASP) is the availability of efficient, algorithmic formal verification. Core elements include:

Encoding to Lustre: Each C-RASP rule is directly compiled to a small set of Boolean/integer equations with time-indexed semantics.
Assertion properties: Properties such as language inclusion, equivalence, or emptiness are embedded as Lustre safety properties (□A).
Model checking: Kind 2, backed by SMT solvers such as Z3 and cvc5, either proves the property or returns a counterexample for further learning or repair.
Program minimization and constrained learning: The pipeline cycles between synthesis (consistent with the examples and all counterexamples found so far), verification, and refinement, yielding provably minimal and specification-conforming transformer programs.

This combination brings decidability to important practical fragments, even though general emptiness/inclusion for arbitrary C-RASP (due to windowed counts and unbounded constants) is undecidable (Jiang et al., 18 Feb 2026).
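Two pieces of this workflow can be sketched in dependency-free Python. All helper names below are hypothetical.

- `to_lustre` shows the time-indexed encoding. Position j of the word becomes step j of a stream, and a prefix count becomes an accumulator written with Lustre's `->` and `pre`.
- `equivalence_observer` wraps two such nodes in an observer carrying a Kind 2 `--%PROPERTY` annotation. Every step is the last position of some input prefix, so proving □ok means the two programs agree on every nonempty word over {a, b}.
- `bounded_counterexample` is a brute-force stand-in, not Kind 2. It only illustrates that equivalence becomes trivially decidable once word length is bounded, and how a counterexample feeds the refinement loop. Kind 2 instead reasons symbolically over unbounded streams.

Check the emitted syntax against the Kind 2 documentation for your version.

```python
from itertools import product
from typing import Callable, Dict, Optional


def to_lustre(node: str, counts: Dict[str, str], output: str) -> str:
    """Emit a Lustre node for a counting program over the alphabet {a, b}.

    The Boolean input `a` is true when the current symbol is 'a' (false for 'b').
    Each entry of `counts` becomes a running count, the stream analogue of a
    prefix count #; `output` is a Boolean Lustre expression over those counts.
    """
    lines = [f"node {node} (a : bool) returns (accept : bool);",
             f"var {', '.join(counts)} : int;",
             "let"]
    for name, pred in counts.items():
        inc = f"(if {pred} then 1 else 0)"
        lines.append(f"  {name} = {inc} -> pre({name}) + {inc};")
    lines += [f"  accept = {output};", "tel"]
    return "\n".join(lines)


def equivalence_observer(left: str, right: str) -> str:
    """Observer node whose Kind 2 property states that two programs always agree."""
    return "\n".join([
        "node check (a : bool) returns (ok : bool);",
        "let",
        f"  ok = ({left}(a) = {right}(a));",
        "  --%PROPERTY ok;",
        "tel",
    ])


def bounded_counterexample(p: Callable[[str], bool], q: Callable[[str], bool],
                           alphabet: str, max_len: int) -> Optional[str]:
    """Shortest word of length 1..max_len on which p and q disagree, or None."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if p(w) != q(w):
                return w
    return None


def spec(w: str) -> bool:  # strict majority of 'a'
    return w.count("a") > w.count("b")


def candidate(w: str) -> bool:  # a too-permissive synthesized guess
    return w.count("a") >= w.count("b")


print(to_lustre("majority", {"ca": "a", "cb": "not a"}, "ca > cb"))
print(bounded_counterexample(spec, candidate, "ab", 4))  # -> ab (a tie, accepted only by candidate)
```

In the refinement loop, the returned word `ab` would be added to the examples with label false before synthesis runs again.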

4. Empirical Performance and Benchmarks

Transformer programs—via both neural and symbolic extraction—demonstrate robust, scalable performance on algorithmic language tasks, regular and context-free language recognition, and symbolic regression benchmarks.

Key results include:

Perfect or near-perfect accuracy on classically hard languages (Dyck-1, $a^nb^nc^n$ , majority, piecewise testable).
Competitive or superior performance vs. unconstrained transformer baselines in tasks requiring length generalization and mechanistic interpretability.
Fast verification/minimization cycles: Synthesis and property checking complete in seconds to a few minutes on hundreds of examples; timeouts occur only for languages that are not C-RASP-expressible (Jiang et al., 18 Feb 2026).

Benchmark	Synthesis Accuracy	Program Minimization (rules / time)	Constrained Learning (rules / time)
Dyck-1	100%	21 / 27.7 s	12 / –
Majority	100%	9 / 4.8 s	1 / 61.4 s
$a^*b^*$	100%	11 / 7.7 s	1 / 10.2 s
$a^nb^nc^n$	100%	22 / 88.7 s	2 / –

This illustrates both the expressivity and tractability of direct program synthesis and minimization pipelines.

5. Impact on Interpretability and Explainable AI

By concretely linking transformer architectures to symbolic program representations, transformer programs enable a new scientific workflow for the algorithmic analysis, re-use, and certification of neural models:

Faithful extraction: Programs are extracted in a fully mechanistic, lossless manner from learned transformers.
Transparency: Each operator, count, or route is human-interpretable; debugging and modification are tractable at the program level.
Certifiable correctness: SMT-based verification allows formal guarantees about the properties of programs synthesized or internalized by transformers.
Reproducibility: Discrete program representations can be shared, compared, and independently validated, bridging the gap between formal verification and machine learning.

This establishes a blueprint for deploying transformer-based systems in high-assurance, safety-critical, or regulatory-constrained environments.

6. Limitations and Future Directions

While transformer programs markedly advance the interpretability of neural computation, several open limitations and extensions remain:

Undecidability: General verification is undecidable for C-RASP with unrestricted windowed counts or arbitrary constants. Nonetheless, for realistic, bounded program shapes, practical synthesis/verification is tractable.
Expressivity: Not all tasks are C-RASP-expressible. Under idealized assumptions, unconstrained transformers can be Turing-complete, which far exceeds the power of this fragment. Universal expressivity requires explicit handling of external memory or unbounded recursion (Jiang et al., 18 Feb 2026).
Optimization: Learning discrete parameters at scale requires sophisticated discrete optimization strategies (beyond simulated annealing or Gumbel-Softmax), especially for complex or high-dimensional tasks (Friedman et al., 2023, Zhang et al., 9 Jan 2026).
Integration with conventional neural architectures: Scaling to multimodal, continuous, or highly compositional settings demands further architectural enhancements and possibly hybrid symbolic-neural techniques.

The current research frontier focuses on scaling up expressivity without sacrificing verification, enriching the underlying program languages (conditional counting, dataflow, recursion), and generalizing extraction to arbitrary transformer checkpoints.

References:

(Friedman et al., 2023) Learning Transformer Programs
(Strobl et al., 2024) Transformers as Transducers
(Zhang et al., 9 Jan 2026) Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer
(Jiang et al., 18 Feb 2026) Synthesis and Verification of Transformer Programs
