
Autumn DSL: Robust PEG Parsing

Updated 29 October 2025
  • Autumn DSL is a powerful PEG parsing library that extends standard PEG capabilities through comprehensive left-recursion support and explicit associativity handling.
  • It incorporates an innovative expression cluster mechanism to efficiently parse multiple operator types while ensuring predictable precedence and performance.
  • The library offers extensibility through modular customization, enabling robust syntax tree generation, error handling, and memoization for complex grammars.

Autumn is a general-purpose Parsing Expression Grammar (PEG) library designed to address core practical limitations of the PEG formalism, particularly in the context of left-recursive grammar rules, associativity, precedence, and efficient operator handling. PEGs specify languages by way of recursive-descent parsers, providing closure under composition, built-in disambiguation, and a close match to programmer intuition. Autumn extends the PEG paradigm by delivering robust, extensible parsing capabilities required for both language engineering and domain-specific languages (DSLs), with efficiency and maintainability as central design goals.

1. Left-Recursion Handling

Autumn supports direct, indirect, and hidden left-recursion. Most traditional PEG parsers either prohibit left-recursion entirely or support only direct instances, forcing grammar authors to rewrite rules in right-recursive form. Hidden left-recursion arises in rules such as X = Y? X, where the optional Y may match nothing and leave X recursing at the same input position. The parser allows explicit choice of associativity (left or right) via annotations such as @left_recur, which removes the need for grammar refactoring and lets language rules be expressed directly.

The central algorithm maintains a seeds table (initially empty, $\text{seeds} = \{\}$) that tracks the result for each $(\text{expr}, \text{position})$ pair, and a blocked set to manage recursion according to the associativity specification. Parsing proceeds by incrementally "growing the seed" and returns the result that consumed the most input. While left-recursive expressions are being parsed, memoization is deferred until the process stabilizes, ensuring correctness and termination.

\begin{algorithm}
seeds = \{\}
blocked = []
\Parse{expr:} left-recursive expression \At {position}
{
    \If{seeds[position][expr] exists}{\Return seeds[position][expr]}
    \ElseIf{blocked contains expr}{\Return failure}
    current = failure
    seeds[position][expr] = failure
    \If{expr is left-associative}{blocked.add(expr)}
    \Loop{
        result = parse(expr.operand)
        \If{result consumed more input than current}{
            current = result
            seeds[position][expr] = result
        }
        \Else{
            remove seeds[position][expr]
            \If{expr is left-associative}{blocked.remove(expr)}
            \Return current
        }
    }
}
\end{algorithm}

This mechanism ensures termination and enables explicit, user-controllable associativity.
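
The seed-growing loop above can be sketched in Python. This is a minimal illustration, not Autumn's implementation: the names `LeftRecursive`, `make_expr`, and `end_of` are hypothetical, memoization is omitted, and the sketch assumes that the left-recursive node is the recursive alternative `E '-' E` rather than the whole rule `E`.

```python
FAIL = None  # a failed parse; a success is (end_position, tree)

def end_of(result):
    """Input position reached by a result; failure ranks below any success."""
    return -1 if result is FAIL else result[0]

class LeftRecursive:
    """Seed-growing wrapper around an operand that may re-enter this node."""
    def __init__(self, left_assoc):
        self.left_assoc = left_assoc
        self.operand = None    # assigned once the grammar is wired up
        self.seeds = {}        # position -> best result so far for this node
        self.blocked = False   # stands in for membership in the blocked set

    def __call__(self, text, pos):
        if pos in self.seeds:              # re-entry at the same position
            return self.seeds[pos]
        if self.blocked:                   # recursion that is not left-recursion
            return FAIL
        current = FAIL
        self.seeds[pos] = FAIL
        if self.left_assoc:
            self.blocked = True
        while True:
            result = self.operand(text, pos)
            if end_of(result) > end_of(current):   # the seed grew
                current = result
                self.seeds[pos] = result
            else:                                  # no growth: stop
                del self.seeds[pos]
                if self.left_assoc:
                    self.blocked = False
                return current

def digit(text, pos):
    if pos < len(text) and text[pos].isdigit():
        return pos + 1, text[pos]
    return FAIL

def make_expr(left_assoc):
    """Builds E = E '-' E | digit (hypothetical helper for this sketch)."""
    rec = LeftRecursive(left_assoc)

    def expr(text, pos):                   # ordered choice: recursive case first
        result = rec(text, pos)
        return result if result is not FAIL else digit(text, pos)

    def subtraction(text, pos):            # the sequence E '-' E
        left = expr(text, pos)
        if left is FAIL or not text.startswith("-", left[0]):
            return FAIL
        right = expr(text, left[0] + 1)
        if right is FAIL:
            return FAIL
        return right[0], (left[1], "-", right[1])

    rec.operand = subtraction
    return expr

left_tree = make_expr(left_assoc=True)("1-2-3", 0)
right_tree = make_expr(left_assoc=False)("1-2-3", 0)
```

With the blocked flag set, the right-hand operand can only match a digit, so `left_tree` groups as (1-2)-3; without it, the right-hand operand starts a fresh invocation and `right_tree` groups as 1-(2-3).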

2. Associativity and Precedence Management

Autumn provides native constructs for associativity and precedence within PEG grammars. Precedence is encoded as values associated with grammar expressions and maintained in a global parse state. Parsing is permitted only if an expression’s precedence is at least as high as the current requirement; successful matches advance the active precedence. This strategy yields clear, local error prevention and makes operator precedence explicit.

Associativity is enforced through precedence bumps. For a left-associative group, the minimum required precedence is raised by one while the group's alternatives are parsed, so a recursive operand cannot match another operator of the same level. Groups without this annotation are right-associative by default. All expressions at the same precedence level must share the same associativity, a restriction that supports efficient and predictable operator behavior.

3. Efficient Operator Parsing via Expression Clusters

Infix and postfix operators—especially across multiple precedence and associativity layers—have traditionally presented significant parsing inefficiencies in PEG libraries. Autumn resolves this by introducing the ‘expression cluster’ parsing operator. An expression cluster consolidates all operator alternates (binary, unary, literals) into a single group, optionally annotated with precedence (@+) and associativity (@left_recur).

The parsing algorithm for clusters iterates across precedence groups, tries higher precedence first, and only descends as needed. Left-associative alternates trigger precedence increments that block recursion at current levels, guaranteeing correct associativity and operator layering.

\begin{algorithm}
seeds = \{\}
precedences = \{\}
\Parse{expr:} cluster expression \At {position}
{
    \If{seeds[position][expr] exists}{\Return seeds[position][expr]}
    current = failure
    seeds[position][expr] = failure
    min_precedence = precedences[expr] if defined else 0
    loop:
    \For{group in expr.groups}{
        \If{group.precedence < min_precedence}{break}
        precedences[expr] = group.precedence + (group.left_associative ? 1 : 0)
        \For{op in group.ops}{
            result = parse(op)
            \If{result consumed more input than current}{
                current = result
                seeds[position][expr] = result
                goto loop
            }
        }
    }
    remove seeds[position][expr]
    \If{no other ongoing invocation of expr}{remove precedences[expr]}
    \Return current
}
\end{algorithm}

For operator-rich grammars, the time complexity is $O(P \cdot L)$ for $P$ operators and $L$ precedence levels, matching the efficiency of hand-written recursive descent. The user only annotates precedence and associativity; no manual layering of the grammar into one rule per precedence level is required.

Example (PEG syntax):

E = expr
    -> E '+' E @+ @left_recur
    -> E '-' E
    -> E '*' E @+ @left_recur
    -> E '/' E
    -> [0-9] @+
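
To make the cluster algorithm concrete, here is a minimal Python sketch applied to the example grammar. It is an illustration under stated assumptions rather than Autumn's code: `Cluster`, `binary`, and `end_of` are hypothetical names, groups are listed from highest to lowest precedence, and the precedence floor is saved on entry and restored on exit, which is a simplification of the "remove precedences[expr]" step.

```python
FAIL = None  # a failed parse; a success is (end_position, tree)

def end_of(result):
    """Input position reached by a result; failure ranks below any success."""
    return -1 if result is FAIL else result[0]

class Cluster:
    def __init__(self):
        self.groups = []          # (precedence, left_assoc, ops), highest first
        self.seeds = {}           # position -> best result so far
        self.min_precedence = 0   # plays the role of precedences[expr]

    def __call__(self, text, pos):
        if pos in self.seeds:
            return self.seeds[pos]
        current = FAIL
        self.seeds[pos] = FAIL
        floor = self.min_precedence        # requirement set by the caller
        grew = True
        while grew:                        # restart from the top after growth
            grew = False
            for precedence, left_assoc, ops in self.groups:
                if precedence < floor:
                    break                  # remaining groups bind too loosely
                for op in ops:
                    # Left-associative groups forbid same-level recursion.
                    self.min_precedence = precedence + (1 if left_assoc else 0)
                    result = op(text, pos)
                    if end_of(result) > end_of(current):
                        current = result
                        self.seeds[pos] = result
                        grew = True
                        break
                if grew:
                    break
        del self.seeds[pos]
        self.min_precedence = floor        # assumption: restore instead of remove
        return current

def digit(text, pos):
    if pos < len(text) and text[pos].isdigit():
        return pos + 1, text[pos]
    return FAIL

def binary(cluster, symbol):
    """Builds the alternative  E symbol E  over the given cluster."""
    def op(text, pos):
        left = cluster(text, pos)
        if left is FAIL or not text.startswith(symbol, left[0]):
            return FAIL
        right = cluster(text, left[0] + len(symbol))
        if right is FAIL:
            return FAIL
        return right[0], (left[1], symbol, right[1])
    return op

E = Cluster()
E.groups = [
    (3, False, [digit]),                            # [0-9] @+
    (2, True, [binary(E, "*"), binary(E, "/")]),    # @+ @left_recur
    (1, True, [binary(E, "+"), binary(E, "-")]),    # @+ @left_recur
]

mixed = E("1+2*3", 0)
chained = E("8-3-2", 0)
```

In this sketch `mixed` nests the multiplication under the addition, and `chained` groups to the left, which is the layering the annotations in the example grammar are meant to express.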

4. Extensibility and Modular Customization

Autumn is architected for extensibility, allowing researchers and practitioners to introduce custom parsing operators via the ParsingExpression interface. The essential requirement is single-parse determinism: matching an expression at a given position must yield identical results and state changes, permitting safe memoization and composability.

Memoization is pluggable; the default key is the $(\text{expression}, \text{position})$ pair, but it can be extended (e.g., to include precedence) for more complex use-cases. Error handling is similarly modular; the default strategy tracks and reports the farthest failure position with context, but richer diagnostics may be introduced at the user’s discretion.
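
The two pluggable pieces described here can be sketched in Python as follows. Everything in the block is hypothetical (the `Memo` and `FarthestFailure` classes, the `state` dictionary, and its `"precedence"` entry are assumptions for illustration) and does not reflect Autumn's actual interfaces.

```python
MISSING = object()  # distinguishes "no entry" from a memoized failure (None)

class Memo:
    """Memo table whose key is computed by a pluggable function."""
    def __init__(self, key=lambda expr, pos, state: (expr, pos)):
        self.key = key      # default key: (expression, position)
        self.table = {}

    def lookup(self, expr, pos, state):
        return self.table.get(self.key(expr, pos, state), MISSING)

    def store(self, expr, pos, state, result):
        self.table[self.key(expr, pos, state)] = result

class FarthestFailure:
    """Keeps only the failures recorded at the farthest input position."""
    def __init__(self):
        self.position = -1
        self.expected = []

    def record(self, pos, what):
        if pos > self.position:
            self.position, self.expected = pos, [what]
        elif pos == self.position and what not in self.expected:
            self.expected.append(what)

# Default key: results are shared regardless of the precedence in effect.
default = Memo()
default.store("E", 0, {"precedence": 1}, "result")
shared = default.lookup("E", 0, {"precedence": 2})

# Extended key: the precedence in effect becomes part of the key.
by_precedence = Memo(key=lambda expr, pos, state: (expr, pos, state["precedence"]))
by_precedence.store("E", 0, {"precedence": 1}, "result")
separate = by_precedence.lookup("E", 0, {"precedence": 2})

errors = FarthestFailure()
errors.record(3, "digit")
errors.record(1, "'+'")   # ignored: an earlier position than the farthest one
errors.record(3, "'('")
```

Including precedence in the key matters when the same expression at the same position can legitimately produce different results under different precedence requirements; the default key would otherwise return a result computed under another requirement.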

Grammar instrumentation utilizes visitor-style transformation passes over the expression graph—enabling logging, debug breakpoints, or syntax tree annotations. Syntax tree construction is decoupled from grammar structure, allowing flattened, unified subtrees via ‘captures’—user-annotated expressions marked for inclusion in the output AST.

Whitespace handling is managed via annotation of token expressions and is not hardwired globally, which supports grammars for languages with complex or heterogeneous lexical structures.
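
One way to picture per-token whitespace handling is a wrapper applied only to the expressions marked as tokens. The sketch below is hypothetical: `literal` and `token` are illustrative names, and the assumption that whitespace is skipped after (rather than before) a token is made for the example only.

```python
def literal(s):
    """Matches the exact string s; returns the end position or None."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def token(expr, is_space=str.isspace):
    """Wraps expr so that whitespace following a successful match is skipped.

    The is_space predicate is a parameter, so different tokens (or different
    sub-languages of one grammar) can use different whitespace rules.
    """
    def parse(text, pos):
        end = expr(text, pos)
        if end is None:
            return None
        while end < len(text) and is_space(text[end]):
            end += 1
        return end
    return parse

raw_end = literal("let")("let   x", 0)            # stops right after "let"
token_end = token(literal("let"))("let   x", 0)   # also consumes the spaces
```

Because the skipping logic lives in the wrapper and not in the parser core, expressions left unwrapped (for example, the contents of a string literal) see whitespace as ordinary input.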

5. Performance Profile and Comparative Benchmarks

Autumn demonstrates competitive performance on large-scale real-world parsing tasks. Parsing 34 MB of Java (Spring framework) with parse-tree generation, Autumn achieves:

| Parser | Time (single) | Time (iterated) | Memory |
|---|---|---|---|
| Autumn | 13.17 s | 12.66 s | 6,154 KB |
| Mouse | 101.43 s | 99.93 s | 45,952 KB |
| Parboiled | 12.02 s | 11.45 s | 13,921 KB |
| Rats! | 5.95 s | 2.41 s | 10,632 KB |
| ANTLR v4 | 4.63 s | 2.31 s | 44,432 KB |

In single runs, Autumn is within the same order of magnitude as the established PEG/CFG parsers: roughly on par with Parboiled, about two to three times slower than the highly optimized Rats! and ANTLR v4 (about five times slower in iterated runs), and substantially faster (~8x) than Mouse, a minimal PEG parser generator. The introduction of expression clusters improved Autumn’s own parse performance by a factor of three. Its memory usage is the lowest in the table, and the implementation used in benchmarking involved no heavy manual optimization, suggesting scope for further performance improvements.
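
The relative figures can be checked directly from the table. The short script below only restates the reported measurements and divides them; the helper name `slowdown` is introduced here for illustration.

```python
# Reported measurements: parser -> (single run s, iterated run s, memory KB)
results = {
    "Autumn":    (13.17, 12.66, 6154),
    "Mouse":     (101.43, 99.93, 45952),
    "Parboiled": (12.02, 11.45, 13921),
    "Rats!":     (5.95, 2.41, 10632),
    "ANTLR v4":  (4.63, 2.31, 44432),
}

SINGLE, ITERATED, MEMORY = 0, 1, 2

def slowdown(parser, baseline, column):
    """How many times larger `parser`'s figure is than `baseline`'s."""
    return results[parser][column] / results[baseline][column]

autumn_vs_antlr = slowdown("Autumn", "ANTLR v4", SINGLE)         # about 2.8
autumn_vs_rats = slowdown("Autumn", "Rats!", SINGLE)             # about 2.2
mouse_vs_autumn = slowdown("Mouse", "Autumn", SINGLE)            # about 7.7
autumn_vs_rats_iterated = slowdown("Autumn", "Rats!", ITERATED)  # about 5.3
lowest_memory = min(results, key=lambda name: results[name][MEMORY])
```

The gap to Rats! and ANTLR v4 widens in iterated runs because their times roughly halve on repetition while Autumn's barely change.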

6. Practical Impact and Implementation Considerations

Autumn advances the PEG library field by robustly handling direct, indirect, and hidden left-recursion with explicit user-directed associativity. The expression cluster innovation permits high-performance, readable grammars for languages with rich operator sets. By facilitating extensible parsing operators, custom memoization, error handling, and decoupled syntax tree generation, Autumn accommodates the needs of both general language parsing and specialized DSL construction.

This design results in more maintainable grammars, easier grammar evolution, and practical integration of whitespace rules and AST annotation—features essential for contemporary language engineering and DSL authoring.

Autumn’s reference implementation and documentation are available at https://github.com/norswap/autumn.

This suggests that researchers requiring efficient, maintainable PEG-based parsers with support for complex language features and extensibility may find Autumn suitable for both experimental and production environments, particularly where operator layering, associativity control, and grammar toolchain integration are key.
