Axial Grammar: Deterministic Hierarchical Parsing
- Axial grammar is a formal syntactic framework that embeds multi-dimensional hierarchical structure in flat token sequences via a finite set of ranked separators.
- It enables one-pass, deterministic parsing by mapping each token to a coordinate in a multi-dimensional grid without requiring recursion or backtracking.
- This framework is applied in Memelang for LLM-generated vector-relational queries, offering low token redundancy and precise slot assignment for efficient IR generation.
Axial grammar is a formal syntactic framework that constructs multi-dimensional hierarchical structure in a flat token sequence by using a finite set of ranked separators. In this framework, iterated placement of these separators in the token stream induces an $n$-dimensional indexing, enabling one-pass, deterministic, and parenthesis-free parsing. The concept was introduced in the context of Memelang, a domain-specific language (DSL) for LLM-generated vector-relational queries, to serve as a compact intermediate representation (IR) suitable for direct LLM emission (Holt, 18 Dec 2025).
1. Formal Definition of Axial Grammar
Let $G$ be a context-free grammar. The terminal vocabulary $\Sigma$ partitions into non-separator tokens $A$ (such as identifiers, literals, comparators) and a finite set of separator tokens $S = \{s_0, \dots, s_{n-1}\}$, each with an associated rank via a ranking function $r : S \to \{0, \dots, n-1\}$.
The minimal Memelang-style EBNF for $G$ is:
- Atoms, Comparisons, Tags, Variables, etc.
No parentheses or nested surface productions appear; hierarchy is exclusively created by the interleaving of separator tokens of different ranks. The start symbol is typically . For the Memelang instantiation, typical separators are double semicolon (rank 2, Matrix), single semicolon (rank 1, Vector), and whitespace (rank 0, Limit).
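The separator ranking described above can be sketched as a small tokenizer. This is a minimal illustration, not Memelang's actual lexer: the names `TOKEN_RE`, `rank`, and `tokenize` are hypothetical, and the sketch assumes atoms never contain whitespace or semicolons (so quoted strings are out of scope).

```python
import re

# Assumed separator set for a Memelang-style instantiation:
# ";;" -> rank 2 (Matrix), ";" -> rank 1 (Vector), whitespace -> rank 0 (Limit).
# ";;" is listed before ";" so the longer separator wins.
TOKEN_RE = re.compile(r";;|;|\s+|[^\s;]+")


def rank(token):
    """Return the separator rank of a token, or None if it is an atom."""
    if token == ";;":
        return 2
    if token == ";":
        return 1
    if token.isspace():
        return 0
    return None


def tokenize(text):
    """Split flat text into (token, rank) pairs in stream order."""
    return [(tok, rank(tok)) for tok in TOKEN_RE.findall(text)]


tokens = tokenize("movies year <1970; title _;;")
atoms = [tok for tok, r in tokens if r is None]
ranks = [r for _, r in tokens]
```

Each token is classified in isolation, which is what lets the later coordinate pass run without lookahead.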
2. Coordinate Assignment and Parsing Mechanism
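Parsing a flat stream $t_1 t_2 \dots t_T$ proceeds by associating each position $i$ with an $n$-tuple coordinate $c = (c_{n-1}, \dots, c_0) \in \mathbb{N}^n$.
- Initialization: $c = (0, \dots, 0)$.
- Upon encountering a separator of rank $k$, transform $c$ by: $c_k \leftarrow c_k + 1$ and $c_j \leftarrow 0$ for all $j < k$; components with $j > k$ are unchanged.
- If $t_i$ is a non-separator token, assign it the current coordinate $c$; if $t_i$ is a separator, set $c$ to the transformed tuple before reading $t_{i+1}$.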
Each non-separator token is thus unambiguously mapped to a coordinate in $\mathbb{N}^n$. This enables the projection of the linear parse into a multi-dimensional grid.
The entire parsing process requires only a single left-to-right pass, with per-token constant work given that $n$ is a small constant (typically $n = 3$). No stack, lookahead, or backtracking is required. Complexity is $O(nT)$ with $n$ constant, hence linear in the stream length $T$.
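The coordinate update can be sketched in a few lines. The function name `assign_coordinates` and the `(text, rank)` input format are illustrative assumptions; the rule itself (increment the counter at the separator's rank, reset all lower counters) is the one that produces the coordinates in the worked example of Section 4.

```python
def assign_coordinates(tokens, n=3):
    """Map each atom to an n-tuple coordinate in one left-to-right pass.

    tokens: iterable of (text, rank) pairs, with rank None for atoms.
    Returns a list of (text, coordinate) pairs, where the coordinate is
    written highest axis first, e.g. (Matrix, Vector, Limit) for n = 3.
    """
    c = [0] * n  # c[k] is the running counter for axis k
    placed = []
    for text, k in tokens:
        if k is None:
            # Atom: record the current coordinate, leave counters untouched.
            placed.append((text, tuple(reversed(c))))
        else:
            # Separator of rank k: advance axis k, reset every lower axis.
            c[k] += 1
            for j in range(k):
                c[j] = 0
    return placed


# Token stream for "movies year <1970; title _;;" with separator ranks.
stream = [
    ("movies", None), (" ", 0), ("year", None), (" ", 0), ("<1970", None),
    (";", 1), (" ", 0), ("title", None), (" ", 0), ("_", None), (";;", 2),
]
placed = assign_coordinates(stream)
```

The only state is the counter list `c`, so memory use is $O(n)$ regardless of stream length.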
3. Key Properties and Theoretical Guarantees
Determinism and unambiguity are immediate consequences of the coordinate-assignment mechanism: the total ordering on ranks, atomicity of separators, and coordinate-update rules ensure that every token sequence yields a unique hierarchical parse. Specifically, parenthesis-free uniqueness holds, as separator-token positioning and rank assignment force a fixed tree structure of depth up to $n$, circumventing the exponential ambiguity typical of parenthesized grammars.
Expressiveness covers all finite ordered trees of depth $\le n$, as each separator rank encodes a branching level. At the same time, the grammar is token-sparse: for $n = 3$, only three special tokens serve as separators, replacing parentheses, braces, and clause head keywords common in other DSLs.
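One way to see the expressiveness and uniqueness claims together is a round trip for $n = 3$: flatten a three-level nested list into a separator-delimited token list, then rebuild it in a single pass. The sketch below is an illustration under stated assumptions, not part of Memelang: `serialize` and `parse` are hypothetical names, separators are placed only between siblings, every list is non-empty, and no atom is itself a separator string.

```python
SEPS = [" ", ";", ";;"]  # list index = separator rank


def serialize(tree):
    """Flatten [matrix][vector][atom] nesting into a flat token list."""
    out = []
    for m, matrix in enumerate(tree):
        if m:
            out.append(";;")  # rank 2 between matrices
        for v, vector in enumerate(matrix):
            if v:
                out.append(";")  # rank 1 between vectors
            for a, atom in enumerate(vector):
                if a:
                    out.append(" ")  # rank 0 between atoms
                out.append(atom)
    return out


def parse(tokens, n=3):
    """Rebuild the nesting in one pass using only n counters."""
    c = [0] * n
    root = []
    for tok in tokens:
        if tok in SEPS:
            k = SEPS.index(tok)
            c[k] += 1
            c[:k] = [0] * k  # reset all lower axes
        else:
            node = root
            # Descend through the higher axes, creating lists on demand.
            for axis in range(n - 1, 0, -1):
                while len(node) <= c[axis]:
                    node.append([])
                node = node[c[axis]]
            node.append(tok)
    return root


tree = [[["movies", "year", "<1970"], ["title", "_"]], [["actors"]]]
flat = serialize(tree)
rebuilt = parse(flat)
```

Because `parse` never branches on anything other than the current token's rank, a given flat stream can only ever rebuild to one tree.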
A plausible implication is that this framework is especially well-suited for LLM-emittable IRs, optimizing for low generation entropy and syntactic regularity by reducing the token set and enforcing slot roles via coordinates.
4. Concrete Example and Grid Recovery
Consider the Memelang query:
    movies year <1970; title _;;

Tokenized, with atoms marked [A] and separators marked by rank:
- movies [A], ␣ [r=0], year [A], ␣ [r=0], <1970 [A], ; [r=1], ␣ [r=0], title [A], ␣ [r=0], _ [A], ;; [r=2]
Parsing proceeds as follows (axes: M = Matrix, V = Vector, L = Limit):
| step | token | rank | coordinate (M,V,L) |
|---|---|---|---|
| 1 | movies | atom | (0,0,0) |
| 2 | ␣ | r=0 | (0,0,1) |
| 3 | year | atom | (0,0,1) |
| 4 | ␣ | r=0 | (0,0,2) |
| 5 | <1970 | atom | (0,0,2) |
| 6 | ; | r=1 | (0,1,0) |
| 7 | ␣ | r=0 | (0,1,1) |
| 8 | title | atom | (0,1,1) |
| 9 | ␣ | r=0 | (0,1,2) |
| 10 | _ | atom | (0,1,2) |
| 11 | ;; | r=2 | (1,0,0) |
Cell recovery re-indexes axis 0 from the end of each vector, so the last atom of a vector occupies index 0. After cell recovery, table slots, column slots, and value slots map as:
- (0,0,2): [movies] → Table
- (0,0,1): [year] → Column
- (0,0,0): [<1970] → Value
- (0,1,1): [title] → Column (inherits Table)
- (0,1,0): [_] → Value
This enables extraction of two vector predicates, and by coordinate-inheritance, redundant slot emission is avoided.
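A minimal sketch of cell recovery with carry-forward follows. It assumes, as in the slot list above, that slots are indexed from the end of each vector (Value = 0, Column = 1, Table = 2) and that a vector with fewer than three atoms inherits its missing higher slots from the previous vector. The name `recover_cells`, the three-atom limit, and the choice to restrict inheritance to vectors within the same matrix are assumptions of this sketch.

```python
SLOT_NAMES = ("Value", "Column", "Table")  # slot index 0, 1, 2


def recover_cells(placed):
    """Group atoms by (Matrix, Vector) and assign right-aligned slot roles.

    placed: list of (atom, (M, V, L)) pairs in stream order.
    Returns a list of ((M, V), {slot_name: atom}) pairs.
    """
    vectors = {}
    for atom, (m, v, _limit) in placed:
        vectors.setdefault((m, v), []).append(atom)

    rows = []
    prev, prev_matrix = {}, None
    for (m, v), atoms in vectors.items():  # dicts keep insertion order
        if m != prev_matrix:
            prev, prev_matrix = {}, m  # assumed: no carry-over between matrices
        row = {}
        for idx, name in enumerate(SLOT_NAMES):
            if idx < len(atoms):
                row[name] = atoms[-1 - idx]  # stated slot, counted from the end
            elif name in prev:
                row[name] = prev[name]  # omitted slot, carried forward
        rows.append(((m, v), row))
        prev = row
    return rows


placed = [
    ("movies", (0, 0, 0)), ("year", (0, 0, 1)), ("<1970", (0, 0, 2)),
    ("title", (0, 1, 1)), ("_", (0, 1, 2)),
]
rows = recover_cells(placed)
```

Only slots above the ones a vector states are inherited, so `title _` picks up the Table from the first vector but not its Column or Value.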
5. Comparative Advantages over Parenthesis-Based DSLs
Axial grammar establishes several advantages:
- Token-sparseness: only $n$ separator tokens are required, yielding lower surface entropy and less token redundancy.
- Streaming and single-pass parsing: no stack, lookahead, or matching parentheses are ever required; each atom is immediately slotted into a known hierarchical position.
- Deterministic slotting: coordinates guarantee fixed semantic roles, minimizing syntax variation encountered by downstream models or compilers.
- Suitability for LLM IRs: the structure supports implicit context carry-forward and scoped variable binding, minimizing repetition often induced by clause-based syntaxes.
This suggests that generative LLMs benefit from both reduced surface complexity and deterministic downstream mapping, particularly in tool-use scenarios.
6. Memelang Instantiation and Compilation
Memelang demonstrates a concrete application of axial grammar with axes:
- Axis 2 (Matrix): separator ";;"
- Axis 1 (Vector): separator ";"
- Axis 0 (Limit): separator whitespace
Within this system, axis-0 index 2 is interpreted as the Table slot, index 1 as Column slot, and index 0 as Value slot (predicates, constants, tags).
Features of the Memelang instantiation include:
- Carry-forward semantics: omitted cells inherit previously stated Table/Column.
- Inline value tags (e.g., :sum, :min, :asc, :grp).
- A cell map $E: \mathbb{N}^3 \to$ SlotContent from coordinates to slot contents, with compilation schemes emitting standard parameterized SQL (with optional pgvector operators).
The above Memelang query, for example, compiles as:
    SELECT t0.year, t0.title
    FROM movies AS t0
    WHERE t0.year < 1970;
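To make the compilation step concrete, here is a toy compiler from recovered slots to parameterized SQL. It is a sketch under strong assumptions and not Memelang's actual compiler: `compile_rows` and `COMPARISON` are hypothetical names, only a single table is supported, `_` is treated as "select the column without filtering", values are a comparator followed by a literal, and `%s` is used as the placeholder in the style of the DB-API "format" paramstyle.

```python
import re

# Assumed value syntax: a comparator immediately followed by a literal.
COMPARISON = re.compile(r"^(<=|>=|!=|<|>|=)(.+)$")


def compile_rows(rows):
    """Compile slot dicts into (sql, params) for a single-table query.

    rows: list of dicts with "Table", "Column", and "Value" keys.
    Identifiers are interpolated as-is; a real compiler would validate them.
    """
    tables = {row["Table"] for row in rows}
    if len(tables) != 1:
        raise ValueError("this sketch handles exactly one table")
    table = tables.pop()

    select, where, params = [], [], []
    for row in rows:
        column = f"t0.{row['Column']}"
        select.append(column)
        value = row["Value"]
        if value == "_":
            continue  # wildcard: project the column, add no predicate
        match = COMPARISON.match(value)
        if match is None:
            raise ValueError(f"unsupported value: {value!r}")
        op, literal = match.groups()
        where.append(f"{column} {op} %s")
        params.append(int(literal) if literal.isdigit() else literal)

    sql = f"SELECT {', '.join(select)} FROM {table} AS t0"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql + ";", params


rows = [
    {"Table": "movies", "Column": "year", "Value": "<1970"},
    {"Table": "movies", "Column": "title", "Value": "_"},
]
sql, params = compile_rows(rows)
```

Substituting the single parameter into the placeholder yields the query shown above; keeping the literal out of the SQL string is what makes the output parameterized.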
7. Summary and Formal Significance
Axial grammar provides a deterministic, parenthesis-free, and highly compact mechanism for embedding hierarchical structure in flat token streams. Its coordinate indexing via ranked separators supports single-pass parsing and explicit slot assignment, making it well-aligned with the requirements of LLM-emittable IRs and streaming query compilation. Memelang illustrates its efficacy in the domain of hybrid vector-relational querying, where a small separator vocabulary and fixed coordinate roles yield robust, low-entropy, and high-parsability representations for both human and machine agents (Holt, 18 Dec 2025).