
Memelang: Axial DSL for LLM Queries

Updated 23 December 2025
  • Memelang is a deterministic, axial multi-dimensional DSL that uses rank-specific separators to reconstruct a three-dimensional grid for precise query mapping.
  • It features implicit context carry-forward, parse-time variable binding, and inline tags for aggregation, grouping, and ordering, streamlining SQL compilation.
  • By integrating relational and vector-aware operations, Memelang enhances LLM pipelines with minimal syntax and clear semantic slot assignment.

Memelang is a deterministic, axial, multi-dimensional grammar and a compact domain-specific language (DSL) designed as an intermediate representation (IR) in which LLMs emit complex relational and vector-relational database queries. Using a linear token sequence encoded with rank-specific separators, Memelang recovers an explicit three-dimensional grid structure in a single left-to-right parse pass. Each “coordinate axis” in this grid is mapped to a semantic role, enabling unambiguous assignment of tokens to table, column, and value slots, thus supporting tool-augmented LLM pipelines with minimal surface syntax and maximal downstream determinism. The Memelang system includes mechanisms for implicit context carry-forward, parse-time variable binding, inline grouping/aggregation/ordering, and compilation to parameterized SQL, optionally targeting vector-aware backends such as PostgreSQL with pgvector (Holt, 18 Dec 2025).

1. Formal Structure of Memelang’s Axial Grammar

Memelang instantiates the general axial grammar framework with $n=3$ axes: Matrix (2), Vector (1), and Limit (0). The token vocabulary $\Sigma$ is partitioned into atoms $A$ (non-separators) and separators $S$. Separator tokens receive a rank via $\rho: S \to \{0,1,2\}$, assigning whitespace ($\rho(\text{ws}^+)=0$), single semicolon ($\rho(\text{";"})=1$), and double semicolon ($\rho(\text{";;"})=2$). Parsing proceeds with a scan state $i=(i_2,i_1,i_0)$, incremented and reset according to the rank of each encountered separator. Each non-separator token is mapped uniquely into a coordinate in $\mathbb{N}^3$, yielding the grid $C_\tau(x)$ of cell contents for stream $\tau$.

Memelang's EBNF grammar—abbreviated—includes:

  • Query ::= ( Matrix “;;” )+
  • Matrix ::= Vector ( “;” Vector )*
  • Vector ::= Limit ( WS Limit )*
  • Limit ::= [left] [cmp] right | left (cmp right)?
  • Term ::= atom | (mod atom) | (atom mod atom)
  • Inline tags (e.g., “:sum”, “:grp”, “:asc”) modify values.

Three slots per vector are assigned via Axis 0 (Limit): $x_0=2$ (Table), $x_0=1$ (Column), $x_0=0$ (Value/Predicate), after right-alignment local reversal.

2. Streaming Parsing and Coordinate Indexing

The parser makes a single left-to-right pass over the token stream. A separator of rank $r$ updates the scan coordinate as follows: all lower axes ($j<r$) are reset to zero, axis $r$ is incremented, and higher axes remain unchanged. Each atom is written into cell $C(i)$ at the current coordinate $i$. There is no need for parentheses or indentation; the ranked separators are sufficient to recover the three-dimensional structure.
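The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not the reference lexer: the names `TOKEN_RE` and `parse_grid` are hypothetical, every whitespace-delimited chunk is treated as a single atom, and whitespace adjacent to a semicolon is assumed to be absorbed into that separator. The coordinates it returns are raw scan coordinates, before the right-alignment reversal that yields the Table/Column/Value numbering.

```python
import re
from collections import defaultdict

# Separator ranks from the text: whitespace -> 0, ";" -> 1, ";;" -> 2.
# Assumption: whitespace next to a semicolon belongs to the semicolon.
TOKEN_RE = re.compile(
    r'(?P<r2>\s*;;\s*)|(?P<r1>\s*;\s*)|(?P<r0>\s+)'
    r'|(?P<atom>(?:"[^"]*"|[^\s;"])+)'
)

def parse_grid(text):
    """Assign each atom a (matrix, vector, limit) coordinate in one pass."""
    i = [0, 0, 0]                 # scan state stored as [i_0, i_1, i_2]
    grid = defaultdict(list)      # coordinate -> atoms written to that cell
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup == "atom":
            grid[(i[2], i[1], i[0])].append(m.group())
            continue
        r = int(m.lastgroup[1])   # rank of the separator just read
        for j in range(r):        # reset every lower axis
            i[j] = 0
        i[r] += 1                 # advance the separator's own axis
    return dict(grid)

grid = parse_grid('movies year <1970; title _;;')
# grid[(0, 0, 0)] == ['movies'] and grid[(0, 1, 1)] == ['_']
```

Because the scan state is the only mutable data, the same loop works on a token stream that arrives incrementally.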

Once parsing is complete, a small record is constructed for each non-empty cell using a “cell interpreter” $g$, which parses local token structure and accumulates inline tags and binding information. This coordinate system enables deterministic slot interpretation and downstream compositional compilation.

3. Semantic Roles, Carry-Forward, and Variable Binding

Each axis is mapped to a fixed role: Matrix (query/subquery block), Vector (column/predicate specification), and Limit (slot type). Within each (Matrix, Vector) slice, the last three non-empty positions are assigned to Table, Column, and Value. Carry-forward is applied along the Vector axis for Table and Column slots: if a slot is empty, its most recent non-empty value is inherited from the previous Vector in the same Matrix.
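A minimal sketch of the right-alignment and carry-forward step follows, assuming each Vector has already been reduced to a list of atoms in scan order. The function name `assign_slots` and the dictionary layout are illustrative choices, not part of Memelang.

```python
def assign_slots(vectors):
    """vectors: one list of atoms per Vector of a single Matrix, in scan order."""
    rows = []
    prev = {"table": None, "column": None}
    for atoms in vectors:
        # Right-align: the last atom is the Value, the one before it the
        # Column, and the one before that the Table; missing slots are None.
        padded = [None] * (3 - len(atoms)) + list(atoms[-3:])
        table, column, value = padded
        row = {
            # Carry-forward applies to Table and Column only, never to Value.
            "table": table if table is not None else prev["table"],
            "column": column if column is not None else prev["column"],
            "value": value,
        }
        rows.append(row)
        prev = row
    return rows

rows = assign_slots([["movies", "year", "<1970"], ["title", "_"]])
# rows[1] == {"table": "movies", "column": "title", "value": "_"}
```

The second Vector supplies only a Column and a Value, so its Table slot is filled from the first Vector.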

Variable binding is achieved with the inline tag “:$v$” placed in the Value slot, binding variable $v$ to that coordinate. Subsequently, any occurrence of “$v$” in the same Matrix references the bound cell, supporting relative self-joins, nested queries, or parameter reuse.
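One way to picture the binding pass is sketched below, using the `:$a` and `$a` spellings that appear in the co-star example in Section 6. It assumes that table aliases have already been assigned and that a reference resolves to the bound cell's `alias.column` expression; both are assumptions made for illustration, as is the name `resolve_bindings`.

```python
import re

BIND_RE = re.compile(r':\$(\w+)')       # ":$a" binds the variable a
REF_RE = re.compile(r'(?<!:)\$(\w+)')   # "$a" not preceded by ":" refers to a

def resolve_bindings(rows):
    """rows: dicts with 'alias', 'column', 'value' for one Matrix, in order."""
    bound = {}
    out = []
    for row in rows:
        value = row["value"]
        m = BIND_RE.search(value)
        if m:
            # Remember which column the variable names, then drop the tag
            # so that only the predicate remains in the Value slot.
            bound[m.group(1)] = f'{row["alias"]}.{row["column"]}'
            value = BIND_RE.sub("", value, count=1)
        # Replace each later reference with the column it was bound to.
        value = REF_RE.sub(lambda r: bound[r.group(1)], value)
        out.append({**row, "value": value})
    return out

resolved = resolve_bindings([
    {"alias": "t0", "column": "actor", "value": ':$a="Bruce Willis"'},
    {"alias": "t1", "column": "actor", "value": "!=$a"},
])
# resolved[1]["value"] == "!=t0.actor"
```

A reference to a variable that was never bound raises `KeyError` in this sketch, which keeps the failure visible at parse time.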

4. Inline Tags, Aggregation, Grouping, and Ordering

Memelang provides inline tags as Value-slot prefixes to enable query plan specification:

  • Aggregates: :sum, :cnt, :min, :max, :avg, :last (compile to corresponding SQL aggregation functions)
  • Grouping key: :grp (marks column as GROUP BY key)
  • Ordering: :asc, :des (specifies ORDER BY)
  • Variable binding: :$v$ (binds $v$ to the cell)

When a Value slot contains $[\text{“:f}_1\text{”}, \ldots, \text{“:f}_m\text{”}, t_1, \ldots, t_k]$, tags are separated from the core term, and the interpreter processes each, emitting the necessary aggregation, grouping, or ordering logic. Group keys are propagated for SELECT and GROUP BY clauses, while ordering tags trigger primary/secondary ORDER BYs on aggregate or raw expressions.

5. Reference Implementation: Lexer, Parser, and SQL Compiler

The Memelang reference pipeline includes:

  1. Lexer: Splits tokens on separators (whitespace, ";", ";;"), comparators, colon, commas, quoted literals, and variables, yielding a flat token stream.
  2. Parser: Executes the streaming coordinate assignment and populates the $C(x)$ cell map.
  3. Cell interpreter: For each cell, parses as (Table?), (Column?), (Value+tags?), extracting SQL fragment, aggregation, grouping, ordering, and binding information.
  4. Carry-forward and binding passes: Fill Table/Column slots where necessary and resolve variable bindings.
  5. IR→SQL compiler:
      • Enumerates unique Table instances, assigns aliases (t₀, t₁, ...).
      • Constructs FROM/JOIN clauses based on Table aliasing and self-join markers ("@").
      • WHERE clause comprises Value slot predicates with comparator.
      • SELECT clause is built from those marked for projection, applying aggregation/wrapping based on grouping.
      • GROUP BY collects all columns tagged :grp.
      • ORDER BY is constructed from slots with :asc/:des.
      • LIMIT is included if specified via a meta-mode Vector.
      • Vector similarity operators ("<=>" etc.) compile to PostgreSQL pgvector operators.

All literal values become parameterized SQL variables in the order they appear, and vector expressions are mapped to “$...::VECTOR” forms for backend compatibility.

6. Annotated Examples

Scalar filter with carry-forward

Memelang:

    movies year <1970; title _;;

The coordinate assignment yields:

| (Matrix, Vector, Limit) | Token(s) | Slot |
|-------------------------|-------------|------------------|
| (0,0,2) | [movies] | Table@V0 |
| (0,0,1) | [year] | Column@V0 |
| (0,0,0) | [<1970] | Value |
| (0,1,2) | [] → movies | Table@V1 (cf) |
| (0,1,1) | [title] | Column@V1 |
| (0,1,0) | [_] | Value (wildcard) |

This compiles to:

    SELECT t0.year, t0.title
    FROM movies AS t0
    WHERE t0.year < $1

with parameter $1=1970.
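The scalar-filter example above can be reproduced with a small sketch of the compile step. It covers only aliasing, projection, and comparator predicates; joins, tags, and vector operators are left out. The helper names `split_comparator` and `compile_select`, and the rule that a digit-only literal becomes an integer parameter, are assumptions for illustration rather than the reference compiler's behavior.

```python
import re

CMP_RE = re.compile(r'(<=|>=|!=|<|>|=)(.+)')

def split_comparator(value):
    """Split a Value slot such as '<1970' into ('<', 1970)."""
    m = CMP_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"no comparator in {value!r}")
    cmp_, literal = m.groups()
    return cmp_, int(literal) if literal.isdigit() else literal.strip('"')

def compile_select(rows):
    """rows: dicts with 'table', 'column', 'value', after carry-forward."""
    aliases = {}                        # table name -> alias, first-seen order
    select, where, params = [], [], []
    for row in rows:
        alias = aliases.setdefault(row["table"], f"t{len(aliases)}")
        col = f'{alias}.{row["column"]}'
        select.append(col)
        if row["value"] != "_":         # "_" is the wildcard: project only
            cmp_, literal = split_comparator(row["value"])
            params.append(literal)      # literals become $1, $2, ... in order
            where.append(f"{col} {cmp_} ${len(params)}")
    sql = "SELECT " + ", ".join(select)
    sql += "\nFROM " + ", ".join(f"{t} AS {a}" for t, a in aliases.items())
    if where:
        sql += "\nWHERE " + " AND ".join(where)
    return sql, params

sql, params = compile_select([
    {"table": "movies", "column": "year", "value": "<1970"},
    {"table": "movies", "column": "title", "value": "_"},
])
# params == [1970]; sql matches the three-line statement shown above
```

Keeping literals in a separate parameter list, rather than interpolating them into the SQL string, is what makes the output safe to hand to a prepared-statement API.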

Vector predicate and ordering

Memelang:

    movies description <=>"robot":asc<0.35; title _;;

This parses to expressions with inline ordering tag :asc, compiled as:

    SELECT (t0.description <=> $1::VECTOR) AS col0, t0.title AS col1
    FROM movies AS t0
    WHERE (t0.description <=> $1::VECTOR) < $2
    ORDER BY (t0.description <=> $1::VECTOR) ASC

with parameters $1=embedding("robot"), $2=0.35.

Co-star self-join with binding

Memelang:

    roles actor :$a="Bruce Willis"; movie _;
    @ @ @; actor !=$a;;

Cell (0,0,1) binds $a$, the “@”s initiate self-joins, and all Table/Column/Value slots are inherited through carry-forward and binding passes.

Grouped join, vector predicate, limit

Memelang:

    movies year <1980; description <=>"war"<=$sim; title :grp; roles movie @; rating :min:des; %m lim 12;;

Parsed tags :grp and :min:des determine grouping and order. Output:

    SELECT
      MAX(t0.year) AS col0,
      MAX(t0.description <=> $1::VECTOR) AS col1,
      t0.title AS col2,
      MAX(t1.movie) AS col3,
      MIN(t1.rating) AS col4
    FROM movies AS t0, roles AS t1
    WHERE t0.year < $2
      AND (t0.description <=> $1::VECTOR) <= $3
      AND t1.movie = t0.title
    GROUP BY t0.title
    ORDER BY MIN(t1.rating) DESC
    LIMIT 12

with $1=embedding("war"), $2=1980, $3=$sim.
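The way :grp and :min:des shape the SELECT, GROUP BY, and ORDER BY clauses in this output can be sketched as follows. The function `plan_clauses` and the `AGG_SQL` table are hypothetical. Mapping :cnt to COUNT is an assumption, :last is omitted because it has no single standard SQL counterpart, and the MAX fallback for untagged columns in a grouped query is inferred from the example output rather than stated as a rule in the text.

```python
# Assumed tag-to-function mapping; ":last" is deliberately left out.
AGG_SQL = {"sum": "SUM", "cnt": "COUNT", "min": "MIN", "max": "MAX", "avg": "AVG"}

def plan_clauses(cols):
    """cols: list of (sql_expr, tags) pairs, e.g. ('t1.rating', ['min', 'des'])."""
    grouped = any("grp" in tags for _, tags in cols)
    select, group_by, order_by = [], [], []
    for n, (expr, tags) in enumerate(cols):
        agg = next((AGG_SQL[t] for t in tags if t in AGG_SQL), None)
        if "grp" in tags:
            group_by.append(expr)       # group keys are projected unwrapped
            out = expr
        elif agg:
            out = f"{agg}({expr})"
        elif grouped:
            out = f"MAX({expr})"        # fallback observed in the example
        else:
            out = expr
        select.append(f"{out} AS col{n}")
        if "asc" in tags:
            order_by.append(f"{out} ASC")
        elif "des" in tags:
            order_by.append(f"{out} DESC")
    return select, group_by, order_by

select, group_by, order_by = plan_clauses([
    ("t0.year", []),
    ("t0.description <=> $1::VECTOR", []),
    ("t0.title", ["grp"]),
    ("t1.movie", []),
    ("t1.rating", ["min", "des"]),
])
# group_by == ["t0.title"]; order_by == ["MIN(t1.rating) DESC"]
```

Ordering is applied to the wrapped expression, which is why the example orders by MIN(t1.rating) rather than by the raw column.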

7. Context and Significance

Memelang’s axial grammar enables deterministic, context-minimal parsing for LLM-emitted relational and vector-relational queries, reducing verbosity, surface ambiguity, and post-processing overhead. Its coordinate-indexed slot assignments, along with mechanisms for context carry-forward, variable binding, and inline manipulation of group/aggregate/order semantics, permit LLMs to emit queries whose parse trees and execution plans require no ambiguity resolution or second-pass transformation. Deployments benefit from transparent mapping to parameterized SQL, composable joins/self-joins, full vector operator support for pgvector, and programmatic suitability for streaming tool-use settings. The formal framework also sets a foundation for further research on higher-rank DSLs and deterministic interface grammars for LLM tool-use (Holt, 18 Dec 2025).

