Axial Grammar: Deterministic Hierarchical Parsing

Updated 23 December 2025

Axial grammar is a formal syntactic framework that embeds multi-dimensional hierarchical structure in flat token sequences via a finite set of ranked separators.
It enables one-pass, deterministic parsing by mapping each token to a coordinate in a multi-dimensional grid without requiring recursion or backtracking.
This framework is applied in Memelang for LLM-generated vector-relational queries, offering low token redundancy and precise slot assignment for efficient IR generation.

Axial grammar is a formal syntactic framework that constructs multi-dimensional hierarchical structure in a flat token sequence by using a finite set of ranked separators. In this framework, iterated placement of these separators in the token stream induces an $n$-dimensional indexing, enabling one-pass, deterministic, and parenthesis-free parsing. The concept was introduced in the context of Memelang, a domain-specific language (DSL) for LLM-generated vector-relational queries, to serve as a compact intermediate representation (IR) suitable for direct LLM emission (Holt, 18 Dec 2025).

1. Formal Definition of Axial Grammar

Let $G = (N, T, P, S_0)$ be a context-free grammar with start symbol $S_0$. The terminal vocabulary $T = A \cup S$ is partitioned into non-separator tokens $A$ (such as identifiers, literals, and comparators) and a finite set of separator tokens $S = \{s_0, s_1, \dots, s_{n-1}\}$, each assigned a rank by a ranking function $\rho: S \rightarrow \{0, \dots, n-1\}$ with $\rho(s_r) = r$.

The minimal Memelang-style EBNF for $n=3$ is:

$\text{Query} := (\text{Matrix}\ s_2)+$
$\text{Matrix} := \text{Vector}\ (s_1\ \text{Vector})^*$
$\text{Vector} := \text{Limit}\ (s_0\ \text{Limit})^*$
$\text{Limit} :=$ Atoms, Comparisons, Tags, Variables, etc.

No parentheses or nested surface productions appear; hierarchy is exclusively created by the interleaving of separator tokens of different ranks. The start symbol is typically $\text{Query}$. For the Memelang instantiation, typical separators are double semicolon ";;" (rank 2, Matrix), single semicolon ";" (rank 1, Vector), and whitespace (rank 0, Limit).
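The split of the terminal vocabulary into atoms and ranked separators can be sketched as a small lexer. This is a minimal illustration, not Memelang's actual tokenizer: the regular expression, the `SEPARATOR_RANK` table, and the `tokenize` function are assumptions introduced here, and each run of whitespace is treated as a single rank-0 separator.

```python
import re

# Separator ranks as described for the Memelang instantiation.
# The names and the regex below are illustrative assumptions.
SEPARATOR_RANK = {";;": 2, ";": 1}

# ";;" is listed before ";" so the longer separator wins.
TOKEN_RE = re.compile(r";;|;|\s+|[^\s;]+")

def tokenize(text):
    """Split text into (token, rank) pairs; rank is None for atoms."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        tok = match.group()
        if tok.isspace():
            tokens.append((" ", 0))                    # whitespace run: rank 0
        elif tok in SEPARATOR_RANK:
            tokens.append((tok, SEPARATOR_RANK[tok]))  # ";" or ";;"
        else:
            tokens.append((tok, None))                 # atom
    return tokens

tokens = tokenize("movies year <1970; title _;;")
```

Each element of `tokens` pairs the surface token with its rank, so a later pass only has to branch on whether the rank is `None`.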

2. Coordinate Assignment and Parsing Mechanism

Parsing a flat stream $\tau = t_1 t_2 \dots t_T$ proceeds by associating each position $k$ with an $n$-tuple coordinate $i^{(k)} \in \mathbb{N}^n$.

Initialization: $i^{(0)} = (0, \ldots, 0)$.
Upon encountering a separator $t_k \in S$ of rank $r = \rho(t_k)$, the current coordinate $i$ is updated by $U_r$, which increments axis $r$, resets all lower axes to zero, and leaves higher axes unchanged:

$(U_r(i))_m = \begin{cases} i_m & m > r \\ i_r + 1 & m = r \\ 0 & m < r \end{cases}$

If $t_k \in A$, assign coordinate $i^{(k)} = i^{(k - 1)}$; if $t_k \in S$, set $i^{(k)} = U_{\rho(t_k)}(i^{(k-1)})$.

Each non-separator token is thus unambiguously mapped to a coordinate in $\mathbb{N}^n$. This enables the projection of the linear parse into a multi-dimensional grid.

The entire parsing process requires only a single left-to-right pass. Each token costs at most $O(n)$ work (one coordinate update), so total complexity is $O(T \cdot n)$, which is linear in $T$ because $n$ is a small constant (typically $n=3$). No stack, lookahead, or backtracking is required.
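The update rule $U_r$ and the single pass can be sketched in a few lines. The function names `update` and `assign_coordinates` are hypothetical, and coordinates are stored highest rank first, as (M, V, L) for $n=3$, purely as a display convention chosen for this sketch.

```python
def update(coord, rank):
    """Apply U_rank: increment axis `rank`, zero lower axes, keep higher ones.

    `coord` is ordered highest rank first, e.g. (M, V, L) for n = 3,
    so axis `rank` lives at tuple position n - 1 - rank.
    """
    n = len(coord)
    new = list(coord)
    pos = n - 1 - rank
    new[pos] += 1
    for j in range(pos + 1, n):   # lower-ranked axes sit to the right
        new[j] = 0
    return tuple(new)

def assign_coordinates(tokens, n=3):
    """One left-to-right pass over (token, rank) pairs; rank None marks an atom."""
    coord = (0,) * n
    out = []
    for tok, rank in tokens:
        if rank is not None:      # separator: move to a new cell
            coord = update(coord, rank)
        out.append((tok, coord))  # atoms keep the current coordinate
    return out

# Token stream for "movies year <1970; title _;;", written out by hand.
stream = [
    ("movies", None), (" ", 0), ("year", None), (" ", 0), ("<1970", None),
    (";", 1), (" ", 0), ("title", None), (" ", 0), ("_", None), (";;", 2),
]
coords = assign_coordinates(stream)
```

The loop keeps only the current coordinate as state, which is why no stack or lookahead is needed.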

3. Key Properties and Theoretical Guarantees

Determinism and unambiguity are immediate consequences of the coordinate-assignment mechanism: the total ordering on ranks, atomicity of separators, and coordinate-update rules ensure that every token sequence yields a unique hierarchical parse. Specifically, parenthesis-free uniqueness holds, as separator-token positioning and rank assignment force a fixed tree structure of depth up to $n$, circumventing the exponential ambiguity typical of parenthesized grammars.

Expressiveness covers all finite ordered trees of depth $\leq n$, as each separator rank encodes a branching level. At the same time, the grammar is token-sparse: for $n=3$, only three special tokens serve as separators, replacing parentheses, braces, and clause head keywords common in other DSLs.
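The correspondence between depth-3 ordered trees and flat strings can be illustrated with a round trip. This is a simplified sketch under stated assumptions: separators are placed only between siblings (no trailing terminator), every node has at least one child, atoms are non-empty and contain neither spaces nor semicolons, and the names `encode` and `decode` are hypothetical.

```python
def encode(query):
    """Flatten a Query -> Matrix -> Vector -> atom tree into one string."""
    return ";;".join(
        ";".join(" ".join(vector) for vector in matrix)
        for matrix in query
    )

def decode(text):
    """Recover the nested lists by splitting on each separator rank in turn."""
    return [
        [vector.split(" ") for vector in matrix.split(";")]
        for matrix in text.split(";;")
    ]

# Two matrices: the first has two vectors, the second has one.
tree = [[["a", "b"], ["c"]], [["d", "e", "f"]]]
flat = encode(tree)
```

Here `flat` is the single string `a b;c;;d e f`, and `decode(flat)` returns the original nested lists, with one separator rank per tree level.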

A plausible implication is that this framework is especially well-suited for LLM-emittable IRs, optimizing for low generation entropy and syntactic regularity by reducing the token set and enforcing slot roles via coordinates.

4. Concrete Example and Grid Recovery

Consider the Memelang query:

movies year <1970; title _;;

The token stream, annotated by separator rank, is:

movies [A], ␣ [r=0], year [A], ␣ [0], <1970 [A], ; [1], ␣ [0], title [A], ␣ [0], _ [A], ;; [2]

Parsing proceeds as follows (axes: M = Matrix, V = Vector, L = Limit):

step	token	rank	coordinate (M,V,L)
1	movies	atom	(0,0,0)
2	␣	r=0	(0,0,1)
3	year	atom	(0,0,1)
4	␣	r=0	(0,0,2)
5	<1970	atom	(0,0,2)
6	;	r=1	(0,1,0)
7	␣	r=0	(0,1,1)
8	title	atom	(0,1,1)
9	␣	r=0	(0,1,2)
10	_	atom	(0,1,2)
11	;;	r=2	(1,0,0)

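After cell recovery, each atom is assigned a table, column, or value slot. The roles follow the convention of Section 6 (Limit index 2 = Table, 1 = Column, 0 = Value), which appears to amount to counting Limit positions from the right end of each vector rather than reusing the left-to-right parse index from the table above:

(0,0,2): [movies] → Table
(0,0,1): [year] → Column
(0,0,0): [<1970] → Value
(0,1,1): [title] → Column (inherits Table)
(0,1,0): [_] → Value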

The grid thus yields two vector predicates, and because the second vector inherits its Table slot from the first, the table name is emitted only once.
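Slot recovery with inheritance can be sketched as follows. This is an illustration under assumptions, not Memelang's implementation: roles are assigned by counting atoms from the right end of each vector (Value, then Column, then Table), a vector is assumed to hold at most three atoms, missing Table or Column slots are copied from the previous vector, and the names `ROLES` and `recover_slots` are hypothetical.

```python
ROLES = ("value", "column", "table")   # Limit index 0, 1, 2

def recover_slots(atoms):
    """atoms: (token, (M, V, L)) pairs in stream order, separators excluded.

    Returns one {"table", "column", "value"} dict per vector.
    """
    vectors = {}
    for tok, (m, v, _l) in atoms:
        vectors.setdefault((m, v), []).append(tok)   # insertion order is kept

    rows, previous = [], {}
    for toks in vectors.values():
        # Carry-forward: start from the previous vector's Table and Column.
        row = {k: previous[k] for k in ("table", "column") if k in previous}
        # Count positions from the right: last atom is the Value slot.
        for index, tok in enumerate(reversed(toks)):
            row[ROLES[index]] = tok
        rows.append(row)
        previous = row
    return rows

atoms = [
    ("movies", (0, 0, 0)), ("year", (0, 0, 1)), ("<1970", (0, 0, 2)),
    ("title", (0, 1, 1)), ("_", (0, 1, 2)),
]
rows = recover_slots(atoms)
```

For the example query, the second row receives `movies` as its table even though the token appears only in the first vector.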

5. Comparative Advantages over Parenthesis-Based DSLs

Axial grammar establishes several advantages:

Token-sparseness: only $n$ separator tokens are required, yielding lower surface entropy and less token redundancy.
Streaming and single-pass parsing: no stack, lookahead, or matching parentheses are ever required; each atom is immediately slotted into a known hierarchical position.
Deterministic slotting: coordinates guarantee fixed semantic roles, minimizing syntax variation encountered by downstream models or compilers.
Suitability for LLM IRs: the structure supports implicit context carry-forward and scoped variable binding, minimizing repetition often induced by clause-based syntaxes.

This suggests that generative LLMs benefit from both reduced surface complexity and deterministic downstream mapping, particularly in tool-use scenarios.

6. Memelang Instantiation and Compilation

Memelang demonstrates a concrete application of axial grammar with $n=3$ axes:

Axis 2 (Matrix): separator ";;"
Axis 1 (Vector): separator ";"
Axis 0 (Limit): separator whitespace

Within this system, axis-0 index 2 is interpreted as the Table slot, index 1 as Column slot, and index 0 as Value slot (predicates, constants, tags).

Features of this instantiation include:

Carry-forward semantics: omitted cells inherit previously stated Table/Column.
Inline value tags (e.g., :sum, :min, :asc, :grp, :$x): directly annotate value cells for groupings, aggregates, ordering, and variable binding.
Relative references (“@”, “^”): resolve via coordinate locations in the grid to prior values.

A full query is assembled as a 3D coordinate-indexed map $E:\mathbb{N}^3 \to \text{SlotContent}$, with compilation schemes emitting standard parameterized SQL (with optional pgvector operators).

The above Memelang query, for example, compiles as:

SELECT t0.year, t0.title
  FROM movies AS t0
 WHERE t0.year < 1970;

More complex queries, such as aggregations and vector-distance predicates, are handled uniformly by the grid structure and tagging semantics.
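For the simple query above, the compilation step might look roughly like the toy function below. Everything in it is an assumption made for illustration: the `compile_rows` name, the treatment of `_` as "select without filtering", the comparator pattern, the single-table restriction, and the `%s` placeholder style. Tags, relative references, joins, and vector operators are not handled.

```python
import re

# Leading comparator followed by an operand, e.g. "<1970".
COMPARISON = re.compile(r"(<=|>=|!=|<|>|=)(.+)")

def compile_rows(rows):
    """rows: dicts with "table", "column", "value"; returns (sql, params).

    Identifiers are interpolated directly, so a real compiler would have
    to validate or quote them; only values become bound parameters.
    """
    table = rows[0]["table"]            # toy restriction: one table only
    select, where, params = [], [], []
    for row in rows:
        column = f"t0.{row['column']}"
        select.append(column)
        match = COMPARISON.fullmatch(row["value"])
        if match:                       # "_" and bare values add no filter here
            op, operand = match.groups()
            where.append(f"{column} {op} %s")
            params.append(int(operand) if operand.isdigit() else operand)
    sql = f"SELECT {', '.join(select)} FROM {table} AS t0"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

rows = [
    {"table": "movies", "column": "year", "value": "<1970"},
    {"table": "movies", "column": "title", "value": "_"},
]
sql, params = compile_rows(rows)
```

The resulting statement has the same shape as the SQL shown above, except that the literal 1970 is passed as a bound parameter instead of being inlined.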

7. Summary and Formal Significance

Axial grammar provides a deterministic, parenthesis-free, and highly compact mechanism for embedding hierarchical structure in flat token streams. Its coordinate indexing via ranked separators supports single-pass parsing and explicit slot assignment, making it well-aligned with the requirements of LLM-emittable IRs and streaming query compilation. Memelang illustrates its efficacy in the domain of hybrid vector-relational querying, where a small separator vocabulary and fixed coordinate roles yield robust, low-entropy, and high-parsability representations for both human and machine agents (Holt, 18 Dec 2025).

References

Memelang: An Axial Grammar for LLM-Generated Vector-Relational Queries (2025)
