Boolean RASP (B-RASP): Formal Transformer Model

Updated 12 February 2026

Boolean RASP (B-RASP) is a formal schema that defines transformer capabilities using straight-line Boolean operations and hard-attention lookups.
It precisely characterizes star-free regular languages and FO-rational transductions by mapping logical formulas and automata constructs to transformer mechanisms.
Extensions like B-RASP[pos] and S-RASP expand its expressiveness by incorporating position arithmetic and prefix-sum operations for advanced string transductions.

Boolean RASP (B-RASP) is a formal programming schema introduced to analytically relate the representational capabilities of masked hard-attention transformers to those of classical models in formal language theory. Positioned as a syntactic and operational intermediate between linear temporal logic (LTL), star-free regular languages, and transformer architectures, B-RASP enables a precise characterization of what transformers with specific architectural restrictions can and cannot compute. The framework has been developed in two lines of research, which focus on language recognition (Yang et al., 2023) and on string transduction (Strobl et al., 2024), respectively.

1. Formal Definition and Syntax

A B-RASP program processes strings over a finite alphabet $\Sigma$ and manipulates Boolean or symbol vectors indexed by input positions. The construction is strictly straight-line, with no branching or looping: each subsequent vector depends only on previously defined vectors. There are two primary instruction families:

Position-wise Boolean Combinations: Given Boolean vectors $P_1,\ldots,P_k$, one can define a new vector by applying any Boolean formula $\varphi(P_1(i),\ldots,P_k(i))$ at each position $i$. For each alphabet symbol $a\in\Sigma$, the initial vector is defined by $Q_a(i)=1$ iff $w_i=a$.
Hard-Attention Lookup: The key primitive simulates masked hard attention in transformers. For each position $i$, it takes a Boolean mask $M(i,j)$ restricting the eligible positions $j$ (e.g., to the strict past $j<i$ or the strict future $j>i$), a Boolean score predicate $S(i,j)$, a value predicate $V(i,j)$, and a default $D(i)$, and defines:

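$P_{k+1}(i) = \begin{cases} V(i,\max B_i) & \text{(rightmost lookup)} \\ V(i,\min B_i) & \text{(leftmost lookup)} \end{cases} \qquad \text{if } B_i\neq\varnothing,$

and $P_{k+1}(i) = D(i)$ otherwise, where $B_i = \{\, j \mid M(i,j)=1 \text{ and } S(i,j)=1 \,\}$. Because scores are Boolean, $B_i$ is exactly the set of maximum-scoring unmasked positions whenever it is non-empty (Yang et al., 2023).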

Compressed Symbol Output (Transductions, Extended): Beyond recognition, Strobl et al. (2024) extend B-RASP with symbol-valued output vectors in which each position may emit a string of bounded length (possibly empty), thus enabling order-preserving string-to-string transductions.

Semantically, the instructions are evaluated in order over an input string $w$: each vector is computed at every position before the next instruction is evaluated, and acceptance (or output) is read off a designated final vector.
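The lookup semantics above can be made executable with a small interpreter. The Python sketch below is our own illustration, not code from Yang et al. (2023) or Strobl et al. (2024): the helpers `symbol_vector` and `lookup` are hypothetical names, positions are 0-indexed (so the document's $Y(n)$ corresponds to `y[n - 1]`), and predicates are plain Python functions of `(i, j)`. The example program accepts strings ending in $ab$.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def symbol_vector(w: str, a: str) -> List[bool]:
    """Initial vector Q_a: Q_a[i] is True iff w[i] == a."""
    return [c == a for c in w]


def lookup(n: int,
           mask: Callable[[int, int], bool],
           score: Callable[[int, int], bool],
           value: Callable[[int, int], T],
           default: Callable[[int], T],
           rightmost: bool = True) -> List[T]:
    """Hard-attention lookup (hypothetical helper).

    B_i = {j : mask(i, j) and score(i, j)}. If B_i is non-empty, the result at i
    is value(i, max B_i) (rightmost) or value(i, min B_i) (leftmost);
    otherwise it is default(i).
    """
    result: List[T] = []
    for i in range(n):
        b_i = [j for j in range(n) if mask(i, j) and score(i, j)]
        if b_i:
            j = max(b_i) if rightmost else min(b_i)
            result.append(value(i, j))
        else:
            result.append(default(i))
    return result


def ends_in_ab(w: str) -> bool:
    """Straight-line program: each vector depends only on earlier ones."""
    n = len(w)
    q_a = symbol_vector(w, "a")
    q_b = symbol_vector(w, "b")
    # prev_a[i]: the symbol just before i is an a. Rightmost j with the
    # strict-past mask j < i and constant score 1; default False at i = 0.
    prev_a = lookup(n,
                    mask=lambda i, j: j < i,
                    score=lambda i, j: True,
                    value=lambda i, j: q_a[j],
                    default=lambda i: False)
    # Position-wise Boolean combination.
    y = [qb and pa for qb, pa in zip(q_b, prev_a)]
    # Accept iff the designated output vector is true at the last position.
    return n > 0 and y[n - 1]


print(ends_in_ab("aab"), ends_in_ab("abba"))  # True False
```

Passing predicates as functions keeps the interpreter close to the formal definition; a vectorized version would be faster but would obscure the one-instruction-at-a-time evaluation order.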

2. Expressive Power: Recognition and Transduction

B-RASP’s expressiveness is exactly characterized in both the recognition and the transduction settings:

Recognition (Star-Free, LTL-Definable Languages): B-RASP recognizes exactly the star-free regular languages (those definable by regular expressions built with union, concatenation, and complement but without the Kleene star), equivalently the languages accepted by counter-free finite automata, and exactly those definable in LTL. Every LTL formula $\phi$ over atomic predicates $Q_a$ and Boolean/temporal operators translates to a B-RASP program computing the same predicate at every position, and vice versa (Yang et al., 2023).
Transduction (First-Order Rational, FO-Rat): In the sequence-to-sequence setting, B-RASP programs (with compressed-output registers) compute exactly the first-order rational transductions, i.e., string-to-string mappings definable by order-preserving first-order transducers. Specifically:

$\mathrm{B\text{-}RASP} = \mathrm{FO}\text{-}\mathrm{Rat}$

as established by Strobl et al. (2024). Each such mapping can be decomposed into a composition of aperiodic sequential transducers (reading left-to-right or right-to-left) built from two-state components, and each component is implemented in B-RASP with Boolean state vectors and hard-attention operations.
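As an illustration of the LTL-to-B-RASP direction, the sketch below compiles the past operator "since" into a single rightmost lookup and checks it exhaustively against the LTL semantics on short inputs. It assumes the strict reading ($\phi\,\mathbf{S}\,\psi$ holds at $i$ iff some $j<i$ satisfies $\psi$ and $\phi$ holds at every $k$ with $j<k<i$); the function names are hypothetical, and the construction is a standard one rather than a quotation of Yang et al. (2023).

```python
from itertools import product
from typing import List, Sequence


def since_ltl(phi: Sequence[bool], psi: Sequence[bool]) -> List[bool]:
    """Reference semantics of strict (phi S psi) at every position i."""
    n = len(phi)
    return [any(psi[j] and all(phi[k] for k in range(j + 1, i))
                for j in range(i))
            for i in range(n)]


def since_brasp(phi: Sequence[bool], psi: Sequence[bool]) -> List[bool]:
    """One rightmost lookup: mask j < i, score psi(j) or not phi(j),
    value psi(j), default False."""
    n = len(phi)
    # Position-wise Boolean vector used as the attention score.
    score = [p or not f for f, p in zip(phi, psi)]
    out: List[bool] = []
    for i in range(n):
        b_i = [j for j in range(i) if score[j]]  # strict-past mask j < i
        out.append(psi[max(b_i)] if b_i else False)
    return out


# Exhaustive check on all pairs of Boolean vectors of length 0..6.
for n in range(7):
    for phi in product([False, True], repeat=n):
        for psi in product([False, True], repeat=n):
            assert since_ltl(phi, psi) == since_brasp(phi, psi)
print("lookup agrees with LTL 'since' on all inputs up to length 6")
```

The key step is the score: the rightmost earlier position where $\psi$ holds or $\phi$ fails is the only place that can decide the formula, and its $\psi$ value is the answer.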

3. Relationship to Transformers and Temporal Logic

B-RASP is designed as an intermediate language whose straight-line, attention-style semantics aligns closely with masked hard-attention transformer circuits and which is logically equivalent to specific fragments of temporal logic. Together, these correspondences yield the following equivalences:

Model/Language	Characterized class
B-RASP	Star-free languages (equivalently, counter-free DFAs and LTL)
Strict-mask hard-attention transformers	Same as B-RASP
B-RASP with compressed output	FO-rational (order-preserving) transductions
Masked average-hard-attention transformers	Can simulate S-RASP (see §4)

Any B-RASP program over Boolean vectors can be simulated by a strict-mask hard-attention transformer: Boolean vectors are encoded in real-valued activations, each attention operation is realized by a suitably constructed self-attention layer, and position-wise Boolean operations are computed by small ReLU feed-forward networks. Conversely, any such transformer (without position embeddings, or with finite-image position embeddings once B-RASP is extended with the corresponding position predicates) can be emulated by a B-RASP program, because its activations take finitely many values and can therefore be encoded by Boolean vectors (Yang et al., 2023; Strobl et al., 2024).
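The sketch below illustrates, in plain Python, how one lookup can be realized by a single hard-attention step followed by a small ReLU computation. It is a simplified illustration, not the construction of Yang et al. (2023). It assumes the normal form mentioned in §6 in which score and value depend only on $j$, a strict-past mask, rightmost tie-breaking among maximal scores, and the convention (our assumption) that attending over an empty set of positions returns the zero vector. All helper names are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def relu(x: float) -> float:
    return max(0.0, x)


# Boolean operations on {0.0, 1.0}, each a tiny ReLU network.
def ffn_and(x: float, y: float) -> float:
    return relu(x + y - 1.0)


def ffn_or(x: float, y: float) -> float:
    return 1.0 - relu(1.0 - x - y)


def ffn_not(x: float) -> float:
    return 1.0 - x


def attend_rightmost(scores: Sequence[float], allowed: Sequence[bool],
                     payload: Sequence[List[float]], width: int) -> List[float]:
    """Unique hard attention: among allowed positions, pick the highest score,
    breaking ties toward the rightmost; zeros if no position is allowed."""
    best = -1
    for j, ok in enumerate(allowed):
        if ok and (best < 0 or scores[j] >= scores[best]):
            best = j
    return list(payload[best]) if best >= 0 else [0.0] * width


def lookup_layer(S: Sequence[float], V: Sequence[float],
                 D: Sequence[float]) -> List[float]:
    """Attention carries [S(j), V(j)]; the FFN then selects V or the default."""
    n = len(S)
    payload = [[S[j], V[j]] for j in range(n)]
    out = []
    for i in range(n):
        allowed = [j < i for j in range(n)]  # strict-past mask
        found, val = attend_rightmost(S, allowed, payload, width=2)
        # (found AND val) OR (NOT found AND default)
        out.append(ffn_or(ffn_and(found, val), ffn_and(ffn_not(found), D[i])))
    return out


def lookup_reference(S, V, D) -> List[float]:
    """B-RASP semantics: rightmost j < i with S(j) = 1, else the default."""
    out = []
    for i in range(len(S)):
        b_i = [j for j in range(i) if S[j] == 1.0]
        out.append(V[max(b_i)] if b_i else D[i])
    return out


bits = [0.0, 1.0]
for n in range(5):
    for S in product(bits, repeat=n):
        for V in product(bits, repeat=n):
            for D in product(bits, repeat=n):
                assert lookup_layer(S, V, D) == lookup_reference(S, V, D)
print("attention + ReLU layer matches the lookup semantics")
```

Carrying the score bit $S(j)$ alongside the value lets the feed-forward step tell a genuine match apart from a tie among zero scores, which is exactly when the default must be used.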

4. Extensions: Position Arithmetic and Prefix Sum

Two principal extensions enhance B-RASP’s expressiveness:

B-RASP[pos]: Introduces integer vectors, built-in position access $pos(i)=i$, clipped addition and subtraction, comparisons, and attention predicates that refer to positions. This extension contains all first-order regular functions, a strict superset of the FO-rational transductions. For example, the copy-first-half function, which maps $w$ to its first $\lfloor|w|/2\rfloor$ symbols, is not FO-rational but can be implemented in B-RASP[pos].
S-RASP: Augments B-RASP with prefix-sum operations, permitting additional arithmetic over sequences. S-RASP contains all first-order polyregular functions, such as squaring a string or producing marked squares of progressively longer prefixes.

Both extensions reach strictly broader transduction classes than plain B-RASP, and both can be simulated by suitably constructed transformer variants; in particular, Strobl et al. (2024) show that masked average-hard-attention transformers can simulate S-RASP.
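The two extensions can be illustrated with short Python sketches. Both are our own simplifications under stated assumptions, not the definitions from Strobl et al. (2024). The hypothetical `copy_first_half` uses a position vector, one strict-future lookup to fetch the last position, and one comparison per position, with each position emitting at most one symbol. The hypothetical `prefix_sum` shows only the prefix-sum primitive itself (assuming the inclusive convention $\sum_{j\le i}$), not the full set of S-RASP operations.

```python
from typing import List


def copy_first_half(w: str) -> str:
    """B-RASP[pos]-style sketch of w -> first floor(|w|/2) symbols of w."""
    n = len(w)
    pos = list(range(n))  # built-in pos(i) = i
    # last[i]: rightmost j > i (strict-future mask) gives pos(j) = n - 1;
    # at the final position no j > i exists, so the default pos(i) is used.
    last = [pos[max(range(i + 1, n))] if i + 1 < n else pos[i] for i in range(n)]
    # Keep position i iff i < floor(n/2), i.e. 2*(i + 1) <= n = last[i] + 1,
    # written with additions only.
    keep = [pos[i] + pos[i] + 2 <= last[i] + 1 for i in range(n)]
    # Compressed output: each position emits either its symbol or nothing.
    return "".join(w[i] if keep[i] else "" for i in range(n))


def prefix_sum(x: List[int]) -> List[int]:
    """Prefix-sum primitive (inclusive): out[i] = x[0] + ... + x[i]."""
    out, total = [], 0
    for v in x:
        total += v
        out.append(total)
    return out


print(copy_first_half("abcde"))                            # ab
print(prefix_sum([1 if c == "a" else 0 for c in "abaa"]))  # [1, 1, 2, 3]
```

Note that every position needs the input length, which plain B-RASP cannot compare against positions; this is why copy-first-half falls outside FO-rational.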

5. Illustrative Program Examples

B-RASP enables concise, compositional programs for a range of formal language tasks:

Example	Description	B-RASP Sketch
Ends-in-$a$	Accepts strings ending in $a$	$Y(i):=Q_a(i)$; accept iff $Y(n)=1$
Exists-$a$	Accepts if $a$ occurs anywhere	$P(i)$ := rightmost lookup with mask $j<i$, score $Q_a(j)$, value $1$, default $0$; $Y(i):=Q_a(i)\lor P(i)$; accept iff $Y(n)=1$
Dyck-1 Brackets (depth 2)	Recognize correctly nested pairs up to depth 2	Uses matching lookups and local consistency checks
Rotate-right (transduction)	$w=a_0a_1\ldots a_{n-1} \mapsto a_{n-1}a_0a_1\ldots a_{n-2}$	Rightmost/leftmost symbol lookups, with defaults for out-of-range positions (Strobl et al., 2024)
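Two rows of the table can be run directly. The Python sketch below uses a hypothetical helper `rightmost` that implements the rightmost lookup defined in §1, with 0-indexed positions. The programs for Exists-$a$ (a strict-past lookup combined with the current position) and rotate-right (fetch the last symbol, then shift by one) are one possible construction under these assumptions, not necessarily the ones given by Yang et al. (2023) or Strobl et al. (2024).

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def rightmost(n: int,
              mask: Callable[[int, int], bool],
              score: Callable[[int, int], bool],
              value: Callable[[int, int], T],
              default: Callable[[int], T]) -> List[T]:
    """Rightmost lookup: value(i, max B_i) if B_i is non-empty, else default(i),
    where B_i = {j : mask(i, j) and score(i, j)}."""
    out: List[T] = []
    for i in range(n):
        b_i = [j for j in range(n) if mask(i, j) and score(i, j)]
        out.append(value(i, max(b_i)) if b_i else default(i))
    return out


def exists_a(w: str) -> bool:
    n = len(w)
    q_a = [c == "a" for c in w]
    # P(i): some strictly earlier position holds an a (mask j < i, score Q_a(j)).
    p = rightmost(n, lambda i, j: j < i, lambda i, j: q_a[j],
                  lambda i, j: True, lambda i: False)
    # Y(i) = Q_a(i) or P(i): an a occurs at or before i.
    y = [qa or pi for qa, pi in zip(q_a, p)]
    return n > 0 and y[n - 1]


def rotate_right(w: str) -> str:
    n = len(w)
    # last[i]: symbol at the final position (rightmost j > i; default w_i at i = n-1).
    last = rightmost(n, lambda i, j: j > i, lambda i, j: True,
                     lambda i, j: w[j], lambda i: w[i])
    # out[i]: previous symbol (rightmost j < i); position 0 falls back to last[0].
    out = rightmost(n, lambda i, j: j < i, lambda i, j: True,
                    lambda i, j: w[j], lambda i: last[i])
    return "".join(out)


print(exists_a("bbab"), exists_a("bbb"))  # True False
print(rotate_right("abcd"))               # dabc
```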

For more involved languages such as the Dyck-1 bracket language of depth 2, B-RASP programs combine sequences of attention lookups with Boolean consistency checks across positions. Detailed intermediate vectors and registers, together with tabulated step-by-step semantics, are given in §4.2 of Yang et al. (2023).

6. Masking, Position Embeddings, and Depth Hierarchies

Masking: B-RASP uses “strict masking” (a position can attend only to the strict past or the strict future, never to itself), under which its expressiveness matches the star-free languages. Relaxing to non-strict masks (the “usual” transformer masking, $j \leq i$ or $j \geq i$) reduces expressiveness to the stutter-invariant star-free languages, i.e., those whose membership is unchanged when a symbol is repeated in place, so that, e.g., $ab$ and $aab$ are either both in the language or both out (Yang et al., 2023).
Position Embeddings: For position embeddings $\theta_n$ with finite image, strict-mask hard-attention transformers using those embeddings are equivalent to B-RASP extended with predicates $P_\theta$ on the embeddings, and to LTL extended with the corresponding monadic predicates. Sinusoidal embeddings with rational angles extend recognition to exactly the regular languages in $AC^0$; arbitrary finite-image embeddings correspond to $LTL[Mon]$ (LTL with all monadic numeric predicates).
Depth and Hierarchy: Expressive power grows with attention depth for B-RASP, LTL, and masked hard-attention transformers, and the resulting hierarchy does not collapse. For each $k$, there is a star-free language from the $\mathit{STAIR}$ family that no masked unique hard-attention transformer (MUHAT) of depth $k$ recognizes but one of depth $2(k+1)$ does. Thus,

$\mathrm{MUHAT}_{\leq k} \subsetneq \mathrm{MUHAT}_{\leq 2(k+1)}$

reflecting a true hierarchy within the star-free languages (Yang et al., 2023).

Complexity: The translation of arbitrary score predicates $S(i,j)$ to forms depending only on $j$ may incur exponential blow-up; however, value predicates can be made unary without such cost.
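Two of these points can be checked on small examples. The Python sketch below is illustrative only, and its function names are hypothetical. `contains_aa` uses a strict-past lookup to read the immediately preceding symbol, so it separates the stutter pair $ab$ and $aab$; by the masking result above, this star-free language is therefore not recognizable with non-strict masks alone. `even_length` assumes a hypothetical rational-angle sinusoidal embedding $\theta(i)=\cos(\pi i)$, whose image is $\{1,-1\}$, and accepts the even-length strings, a regular language in $AC^0$ that is not star-free.

```python
import math


def contains_aa(w: str) -> bool:
    n = len(w)
    q_a = [c == "a" for c in w]
    # Strict-past lookup with constant score: the rightmost j < i is i - 1.
    prev_a = [q_a[i - 1] if i > 0 else False for i in range(n)]
    z = [qa and pa for qa, pa in zip(q_a, prev_a)]
    # Exists-style lookup: some j < i has z(j), or z(i) itself holds.
    seen = [z[i] or any(z[j] for j in range(i)) for i in range(n)]
    return n > 0 and seen[n - 1]


def even_length(w: str) -> bool:
    n = len(w)
    if n == 0:
        return True  # no last position to read; the empty string is even
    # Hypothetical finite-image embedding theta(i) = cos(pi * i) in {1, -1}.
    odd = [math.cos(math.pi * i) < 0 for i in range(n)]
    # Accept iff the last position (index n - 1) is odd, i.e. n is even.
    return odd[n - 1]


print(contains_aa("ab"), contains_aa("aab"))    # False True
print(even_length("abab"), even_length("aba"))  # True False
```

The position predicate "$i$ is odd" is read from the embedding rather than computed by attention, which is how finite-image embeddings add modular counting that plain B-RASP lacks.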

7. Role as Theoretical Intermediary and Limitations

B-RASP provides a transparent, compositional, and mechanically simulatable language that sits between high-level logic specifications and low-level transformer circuits. This alignment enables:

Direct translation of LTL formulas and automata-based constructions into B-RASP programs.
Rigorous proofs of transformer capabilities and limitations, particularly with respect to masking strategies, position encoding schemes, and network depth.

Notably, plain B-RASP supports neither position arithmetic nor general counting, so it cannot express languages or transductions that require them (e.g., copying the first half of the input, which depends on comparing positions with the input length); these require the extensions B-RASP[pos] or S-RASP (Strobl et al., 2024).

The B-RASP model has become central in the formal analysis of deep sequence models, providing an effective bridge between symbolic automata theory and the operational semantics of contemporary neural architectures. For comprehensive program examples, normal-form lemmas, and proof details, see (Yang et al., 2023) for language recognition and (Strobl et al., 2024) for transduction results.


References

Yang et al. (2023). Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages.

Strobl et al. (2024). Transformers as Transducers.
