Boolean RASP (B-RASP): Formal Transformer Model

Updated 12 February 2026
  • Boolean RASP (B-RASP) is a formal schema that defines transformer capabilities using straight-line Boolean operations and hard-attention lookups.
  • It precisely characterizes star-free regular languages and FO-rational transductions by mapping logical formulas and automata constructs to transformer mechanisms.
  • Extensions like B-RASP[pos] and S-RASP expand its expressiveness by incorporating position arithmetic and prefix-sum operations for advanced string transductions.

Boolean RASP (B-RASP) is a formal programming schema introduced to analytically relate the representational capabilities of masked hard-attention transformers and classical models in formal language theory. Positioned as a syntactic and operational intermediate between linear temporal logic (LTL), star-free regular languages, and transformer architectures, B-RASP enables precise characterization of what transformers with specific architectural restrictions can and cannot compute. The framework has been systematically developed and extended in two influential lines of research, each focusing on the recognition and transduction power of such neural models (Yang et al., 2023, Strobl et al., 2024).

1. Formal Definition and Syntax

A B-RASP program processes strings over a finite alphabet $\Sigma$ and manipulates Boolean or symbol vectors indexed by input positions. The construction is strictly straight-line, with no branching or looping: each subsequent vector depends only on previously defined vectors. There are two primary instruction families:

  • Position-wise Boolean Combinations: Given Boolean vectors $P_1,\ldots,P_k$, one can define a new vector by applying any Boolean formula $\varphi(P_1(i),\ldots,P_k(i))$. For alphabet symbols $a\in\Sigma$, initial vectors are defined as $Q_a(i)=1$ iff $w_i=a$.
  • Hard-Attention Lookup: The key primitive simulates the masked hard attention mechanism in transformers. For position $i$, and for a Boolean mask $M(i,j)$ (restricting eligible $j$, e.g., to the strict past or future), a Boolean score predicate $S(i,j)$, a value predicate $V(i,j)$, and default $D(i)$, B-RASP defines:

$$P_{k+1}(i) = \begin{cases} V(i,\max B_i) & \text{(rightmost)} \\ V(i,\min B_i) & \text{(leftmost)} \end{cases} \qquad \text{if } B_i\neq\varnothing,$$

and $P_{k+1}(i) = D(i)$ otherwise, where $B_i = \{j\mid M(i,j)=1,\ S(i,j)=1,\ S(i,j)\ge S(i,j')\ \forall j'\in U_i\}$ (Yang et al., 2023). Here $U_i$ is read as the set of positions $j'$ unmasked for $i$, i.e., those with $M(i,j')=1$; under that reading, the last condition is already implied by $S(i,j)=1$ when the score is Boolean.

  • Compressed Symbol Output (Transductions, Extended): Beyond recognition, Strobl et al. (2024) introduce B-RASP programs with symbol output registers, possibly mapping to bounded-length substrings, thus enabling general order-preserving string-to-string transductions.

Semantically, the program is evaluated on an input string $w$ by instantiating each vector in its order of definition; acceptance or output is then determined by the designated final register.
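The two instruction families above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`initial_vector`, `lookup`), the 0-indexed positions, and the representation of predicates as Python callables are all assumptions made for the example.

```python
from typing import Callable, List

def initial_vector(w: str, a: str) -> List[bool]:
    """Q_a(i) is true iff the i-th symbol of w equals a (0-indexed here)."""
    return [c == a for c in w]

def lookup(n: int,
           mask: Callable[[int, int], bool],
           score: Callable[[int, int], bool],
           value: Callable[[int, int], bool],
           default: Callable[[int], bool],
           rightmost: bool = True) -> List[bool]:
    """Hard-attention lookup: for each i, pick the rightmost (or leftmost)
    unmasked j whose score holds and return value(i, j); else default(i)."""
    out = []
    for i in range(n):
        # B_i: positions that are unmasked for i and satisfy the score predicate.
        candidates = [j for j in range(n) if mask(i, j) and score(i, j)]
        if candidates:
            j = max(candidates) if rightmost else min(candidates)
            out.append(value(i, j))
        else:
            out.append(default(i))
    return out

w = "abca"
q_a = initial_vector(w, "a")
q_b = initial_vector(w, "b")

# Position-wise Boolean combination: "the symbol at i is a or b".
a_or_b = [x or y for x, y in zip(q_a, q_b)]

# Lookup under a strict-past mask: "some a occurs strictly before i".
seen_a = lookup(len(w),
                mask=lambda i, j: j < i,
                score=lambda i, j: q_a[j],
                value=lambda i, j: True,
                default=lambda i: False)
```

Because every vector is computed once from earlier vectors, the program stays straight-line: there is no control flow beyond the fixed order of definitions.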

2. Expressive Power: Recognition and Transduction

B-RASP’s expressiveness is exactly characterized in both the recognition and transduction settings by major theorems:

  • Recognition (Star-Free, LTL-Definable Languages): B-RASP captures precisely the class of star-free regular languages (those expressible without use of the Kleene star in regular expressions), equivalently the languages accepted by counter-free finite automata, and exactly those definable in LTL. Every LTL formula $\phi$ over atomic predicates $Q_a$ and Boolean/temporal operators translates to a B-RASP program computing the same predicate at every position, and vice versa (Yang et al., 2023).
  • Transduction (First-Order Rational, FO-Rat): In the sequence-to-sequence setting, B-RASP programs (with compressed-output registers) compute exactly the first-order rational transductions, i.e., string-to-string mappings definable by order-preserving first-order transducers. Specifically:

$$\mathrm{B\text{-}RASP} = \mathrm{FO}\text{-}\mathrm{Rat}$$

as established in (Strobl et al., 2024). Each such mapping can be decomposed into a pair of aperiodic two-state sequential transducers and implemented in B-RASP with strictly composed Boolean state registers and hard-attention operations.
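To make the LTL-to-B-RASP direction concrete, the sketch below expresses one temporal operator as a single rightmost lookup and checks it against the operator's textbook semantics. It assumes the strict-past reading of "since" (the witness lies strictly before the current position); the function names and the exhaustive check are illustrative choices, not taken from the cited papers.

```python
from itertools import product

def since_lookup(p, q):
    """Strict 'p since q' as one rightmost lookup under a strict-past mask:
    attend to the rightmost j < i where q(j) holds or p(j) fails,
    return q(j) there, and default to False when no such j exists."""
    out = []
    for i in range(len(p)):
        candidates = [j for j in range(i) if q[j] or not p[j]]
        out.append(q[max(candidates)] if candidates else False)
    return out

def since_reference(p, q):
    """Direct semantics: some j < i has q(j), and p holds at every k
    with j < k < i."""
    return [
        any(q[j] and all(p[k] for k in range(j + 1, i)) for j in range(i))
        for i in range(len(p))
    ]

def agree_up_to(max_len):
    """Exhaustively compare both definitions on all Boolean vector pairs."""
    for n in range(max_len + 1):
        for bits in product([False, True], repeat=2 * n):
            p, q = list(bits[:n]), list(bits[n:])
            if since_lookup(p, q) != since_reference(p, q):
                return False
    return True
```

The lookup works because the rightmost position where either `q` holds or `p` fails decides the outcome: if `q` holds there, `p` has held ever since; if not, `p` has been broken after every earlier `q`.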

3. Relationship to Transformers and Temporal Logic

B-RASP is designed as an intermediate language whose straight-line, attention-style semantics aligns closely with masked hard-attention transformer circuits, and which is logically equivalent to specific fragments of temporal logic. This establishes robust equivalence relations:

| Model/Language | Equivalence Class |
| --- | --- |
| B-RASP | Star-free languages, Counter-free DFA, LTL |
| Strict-mask hard-attention transformers | B-RASP |
| B-RASP with compressed output | FO-rational (order-preserving) transductions |
| Masked average-hard-attention transformers | FO-rational transductions |

Any B-RASP program over Boolean vectors can be simulated by a strict-mask hard-attention transformer whose depth grows with the nesting depth of the program's attention lookups: Boolean vectors are encoded in real activations, each lookup is realized by a self-attention head with suitably constructed score and value projections, and Boolean combinations are computed by shallow ReLU FFNs. Conversely, any such transformer (with or without finite-image position embeddings) can be emulated by a B-RASP program via bit-encoded activations and simulated Boolean-lookup sequences (Yang et al., 2023, Strobl et al., 2024).
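The FFN half of this simulation rests on a standard fact: with Booleans encoded as the reals 0.0 and 1.0, the basic connectives are affine maps composed with a single ReLU. The sketch below shows that encoding; it is the generic textbook construction and is not claimed to be the exact gadget used in the cited proofs.

```python
def relu(x: float) -> float:
    return max(0.0, x)

def NOT(x: float) -> float:
    # Affine only: flips 0.0 and 1.0.
    return 1.0 - x

def AND(x: float, y: float) -> float:
    # x + y - 1 is positive only when both inputs are 1.0.
    return relu(x + y - 1.0)

def OR(x: float, y: float) -> float:
    # 1 - x - y is positive only when both inputs are 0.0.
    return 1.0 - relu(1.0 - x - y)

def truth_table(gate):
    """Evaluate a two-input gate on all 0/1 inputs, in order 00, 01, 10, 11."""
    return [gate(x, y) for x in (0.0, 1.0) for y in (0.0, 1.0)]
```

Since any Boolean formula is a composition of these gates, a position-wise Boolean combination can be computed from 0/1 activations by stacking such ReLU units.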

4. Extensions: Position Arithmetic and Prefix Sum

Two principal extensions enhance B-RASP’s expressiveness:

  • B-RASP[pos]: Introduces integer vectors, built-in position access $pos(i)=i$, clipped register-wise addition/subtraction, comparisons, and attention predicates referencing positions. This extension contains all first-order regular functions, a strict superset of the FO-rational transductions. For example, the copy-first-half function, which maps $w$ to its first $\lfloor|w|/2\rfloor$ symbols, is not FO-rational but can be implemented in B-RASP[pos].
  • S-RASP: Augments B-RASP with prefix-sum operations, permitting additional arithmetic over sequences. S-RASP contains all first-order polyregular functions, and can express transductions such as squaring a string or producing marked squares of progressively longer prefixes.

These extensions respectively map to broader transduction classes and can be simulated by appropriately architected transformer variants (e.g., masked average-hard attention models).
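The following sketch gives reference semantics for the copy-first-half example and for a prefix-sum primitive. It is only a specification-level illustration: the test `(i + 1) + (i + 1) <= n` is one way to phrase the half-length check using nothing but position addition and comparison, and it is an assumption of this example rather than the program given by Strobl et al. (2024). Positions are 0-indexed.

```python
from itertools import accumulate
from typing import List

def copy_first_half(w: str) -> str:
    """Keep the first floor(|w|/2) symbols of w."""
    n = len(w)
    # Position i (0-indexed) is kept iff doubling its 1-indexed position
    # does not exceed the length: addition and comparison only.
    return "".join(c for i, c in enumerate(w) if (i + 1) + (i + 1) <= n)

def prefix_sum(values: List[int]) -> List[int]:
    """Running total: entry i is the sum of values[0..i]."""
    return list(accumulate(values))

# Example: number of a's seen up to and including each position.
count_a = prefix_sum([1 if c == "a" else 0 for c in "abaa"])
```

The half-length test depends on the global length of the input, which is the kind of position arithmetic that plain B-RASP lacks.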

5. Illustrative Program Examples

B-RASP enables concise, compositional programs for a range of formal language tasks:

| Example | Description | B-RASP Sketch |
| --- | --- | --- |
| Ends-in-$a$ | Accepts strings ending in $a$ | $Y(i):=Q_a(i)$; accept iff $Y(n)=1$ |
| Exists-$a$ | Accepts if $a$ occurs anywhere | $P(i):=[j\leq i]_{\blacktriangleright}(Q_a(j),0)$, $Y(i):=P(i)$; accept iff $P(n)=1$ |
| Dyck-1 Brackets (depth 2) | Recognize correctly nested pairs up to depth 2 | Uses matching lookup and local consistency checks |
| Rotate-right (transduction) | $w=a_0a_1\ldots a_{n-1} \mapsto a_{n-1}a_0a_1\ldots a_{n-2}$ | Implements right/leftmost symbol lookups with default/out-of-bounds symbols (Strobl et al., 2024) |

For more involved languages (e.g., the Dyck-1 bracket language of depth 2), B-RASP programs utilize sequences of attention lookups and Boolean consistency checks across positions. Detailed intermediate vectors and registers, as well as tabulated step semantics, are provided in §4.2 of (Yang et al., 2023).
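The three simplest examples above can be run with a small lookup helper. The sketch below is one possible reading of the table entries, under several assumptions: positions are 0-indexed, acceptance is read off the last position, the empty string is rejected, and the rotate-right program (a future-masked lookup feeding the default of a past-masked lookup) is a construction chosen for this illustration rather than the one in Strobl et al. (2024).

```python
def lookup(n, mask, score, value, default, rightmost=True):
    """For each i, return value(i, j) at the rightmost (or leftmost) unmasked
    j whose score holds, or default(i) if there is no such j."""
    out = []
    for i in range(n):
        candidates = [j for j in range(n) if mask(i, j) and score(i, j)]
        if candidates:
            j = max(candidates) if rightmost else min(candidates)
            out.append(value(i, j))
        else:
            out.append(default(i))
    return out

def ends_in_a(w: str) -> bool:
    y = [c == "a" for c in w]          # Y(i) := Q_a(i)
    return bool(w) and y[-1]           # accept iff Y holds at the last position

def exists_a(w: str) -> bool:
    q_a = [c == "a" for c in w]
    # P(i): is there an a at some j <= i (non-strict past mask), default 0.
    p = lookup(len(w), lambda i, j: j <= i, lambda i, j: q_a[j],
               lambda i, j: True, lambda i: False)
    return bool(w) and p[-1]

def rotate_right(w: str) -> str:
    n = len(w)
    always = lambda i, j: True
    # last(i): rightmost symbol strictly after i; default is w[i] itself.
    last = lookup(n, lambda i, j: j > i, always,
                  lambda i, j: w[j], lambda i: w[i])
    # Output at i: the symbol just before i; position 0 falls back to last(0).
    out = lookup(n, lambda i, j: j < i, always,
                 lambda i, j: w[j], lambda i: last[i])
    return "".join(out)
```

For instance, `rotate_right("abcd")` yields `"dabc"`: every position copies its left neighbor, and position 0, which has no left neighbor, takes the default carrying the final symbol.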

6. Masking, Position Embeddings, and Depth Hierarchies

  • Masking: B-RASP supports “strict masking” (positions can attend only to strict past or future, not themselves), which ensures expressiveness matches the star-free languages. Relaxing to non-strict masks (“usual” transformer masking, $j \leq i$ or $j \geq i$) collapses expressiveness to the stutter-invariant star-free languages (languages invariant under repeated symbols) (Yang et al., 2023).
  • Position Embeddings: Assuming position embeddings $\theta_n$ with finite image, the equivalence between strict-mask hard-attention transformers, B-RASP with predicates on position embeddings $P_\theta$, and LTL with added monadic predicates holds. With rational sinusoidal embeddings, the recognized class is the regular languages in $AC^0$; arbitrary finite-image PEs correspond to $LTL[Mon]$ (LTL with all monadic numeric predicates).
  • Depth and Hierarchy: The expressive power of B-RASP, LTL, and masked hard-attention transformers strictly increases with depth of attention/computation. For each $k$, there exists $\mathit{STAIR}_{2k+1}$, a star-free language expressible with depth $2k+1$ but not $2k$. Thus,

$$\mathrm{MUHAT}_{\leq k} \subsetneq \mathrm{MUHAT}_{\leq 2(k+1)}$$

reflecting a true hierarchy within star-free languages (Yang et al., 2023).

  • Complexity: The translation of arbitrary score predicates $S(i,j)$ to forms depending only on $j$ may incur exponential blow-up; however, value predicates can be made unary without such cost.
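Stutter-invariance, mentioned in the masking item above, is easy to test empirically. The sketch below assumes the usual definition (membership does not change when runs of a repeated symbol are collapsed to a single occurrence); the helper names and the two sample languages are illustrative choices, not examples from the cited papers.

```python
from itertools import groupby, product

def destutter(w: str) -> str:
    """Collapse each run of identical adjacent symbols to one symbol."""
    return "".join(symbol for symbol, _ in groupby(w))

def is_stutter_invariant_on(language, words) -> bool:
    """Check that membership agrees between each word and its collapsed form."""
    return all(language(w) == language(destutter(w)) for w in words)

def exists_a(w: str) -> bool:
    return "a" in w

def contains_aa(w: str) -> bool:
    return "aa" in w

# All strings over {a, b} of length at most 4.
sample = ["".join(t) for n in range(5) for t in product("ab", repeat=n)]
```

On this sample, `exists_a` passes the check while `contains_aa` fails it (`"aa"` is accepted but its collapsed form `"a"` is not), so under the characterization stated above only the former would be within reach of non-strict masking.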

7. Role as Theoretical Intermediary and Limitations

B-RASP provides a transparent, compositional, and mechanically simulatable language that sits between high-level logic specifications and low-level transformer circuits. This alignment enables:

  • Direct translation of LTL formulas and automata-based constructions into B-RASP programs.
  • Rigorous exhibition of transformer limitations and capabilities, particularly with respect to masking strategies, position encoding schemes, and network depth.

Notably, plain B-RASP does not support position arithmetic or general counting, and thus cannot express languages or transductions requiring these capabilities (e.g., global half-length tests), except when extended (B-RASP[pos], S-RASP) (Strobl et al., 2024).

The B-RASP model has become central in the formal analysis of deep sequence models, providing an effective bridge between symbolic automata theory and the operational semantics of contemporary neural architectures. For comprehensive program examples, normal-form lemmas, and proof details, see (Yang et al., 2023) for language recognition and (Strobl et al., 2024) for transduction results.
