Autumn DSL: Robust PEG Parsing
- Autumn DSL is a powerful PEG parsing library that extends standard PEG capabilities through comprehensive left-recursion support and explicit associativity handling.
- It incorporates an innovative expression cluster mechanism to efficiently parse multiple operator types while ensuring predictable precedence and performance.
- The library offers extensibility through modular customization, enabling robust syntax tree generation, error handling, and memoization for complex grammars.
Autumn is a general-purpose Parsing Expression Grammar (PEG) library designed to address core practical limitations of the PEG formalism, particularly in the context of left-recursive grammar rules, associativity, precedence, and efficient operator handling. PEGs specify languages by way of recursive-descent parsers, providing closure under composition, built-in disambiguation, and a close match to programmer intuition. Autumn extends the PEG paradigm by delivering robust, extensible parsing capabilities required for both language engineering and domain-specific languages (DSLs), with efficiency and maintainability as central design goals.
1. Left-Recursion Handling
Autumn implements comprehensive support for direct, indirect, and hidden left-recursion. Most traditional PEG parsers, by contrast, either prohibit left-recursion entirely or support only direct instances, forcing grammar authors to rewrite rules in right-recursive form. Hidden left-recursion, in which the recursive reference is preceded only by expressions that can succeed without consuming input, is supported as well. The parser allows an explicit choice of associativity (left or right) via annotations such as @left_recur, which removes the need for grammar refactoring and lets language rules be expressed directly.
The central algorithm maintains a seeds table that tracks results for each (position, expression) tuple, and a blocked set that manages recursion according to the associativity specification. Parsing proceeds by incrementally "growing the seed" and returns the result that consumed the most input. While a left-recursive expression is being parsed, memoization is deferred until the process stabilizes, which ensures correctness and termination.
\begin{algorithm}
seeds = \{\}
blocked = []
\Parse{expr:} left-recursive expression \At {position}
{
  \If{seeds[position][expr] exists}{\Return seeds[position][expr]}
  \ElseIf{blocked contains expr}{\Return failure}
  current = failure
  seeds[position][expr] = failure
  \If{expr is left-associative}{blocked.add(expr)}
  \Loop{
    result = parse(expr.operand)
    \If{result consumed more input than current}{
      current = result
      seeds[position][expr] = result
    }
    \Else{
      remove seeds[position][expr]
      \If{expr is left-associative}{blocked.remove(expr)}
      \Return current
    }
  }
}
\end{algorithm}
This mechanism ensures termination and enables explicit, user-controllable associativity.
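The seed-growing loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Autumn's implementation: the classes `State`, `Lit`, `Digit`, `Seq`, `Choice`, and `LeftRec` and the `parse` helper are hypothetical names, and the left-recursive marker is assumed to wrap the sequence `E '+' E` inside the rule `E = E '+' E / [0-9]`.

```python
FAIL = None  # a failed parse; a success is (end_position, tree)

class State:
    def __init__(self):
        self.seeds = {}    # (position, expression) -> best result so far
        self.blocked = []  # left-associative expressions currently being parsed

class Lit:
    def __init__(self, s):
        self.s = s
    def parse(self, st, text, pos):
        if text.startswith(self.s, pos):
            return (pos + len(self.s), self.s)
        return FAIL

class Digit:
    def parse(self, st, text, pos):
        if pos < len(text) and text[pos].isdigit():
            return (pos + 1, text[pos])
        return FAIL

class Seq:
    def __init__(self, *items):
        self.items = items
    def parse(self, st, text, pos):
        trees = []
        for item in self.items:
            r = item.parse(st, text, pos)
            if r is FAIL:
                return FAIL
            pos, tree = r
            trees.append(tree)
        return (pos, tuple(trees))

class Choice:
    def __init__(self, *alts):
        self.alts = alts
    def parse(self, st, text, pos):
        for alt in self.alts:
            r = alt.parse(st, text, pos)
            if r is not FAIL:
                return r
        return FAIL

class LeftRec:
    """Hypothetical wrapper marking `operand` as left-recursive."""
    def __init__(self, left_assoc):
        self.operand = None  # assigned later, because the grammar is recursive
        self.left_assoc = left_assoc
    def parse(self, st, text, pos):
        key = (pos, self)
        if key in st.seeds:          # recursive call at the same position: reuse the seed
            return st.seeds[key]
        if self in st.blocked:       # right recursion of a left-associative expression
            return FAIL
        current = FAIL
        st.seeds[key] = FAIL
        if self.left_assoc:
            st.blocked.append(self)
        while True:
            r = self.operand.parse(st, text, pos)
            if r is not FAIL and (current is FAIL or r[0] > current[0]):
                current = r          # the seed grew: record it and try again
                st.seeds[key] = r
            else:
                del st.seeds[key]
                if self.left_assoc:
                    st.blocked.remove(self)
                return current

def parse(text, left_assoc=True):
    # Assumed grammar: E = (E '+' E) @left_recur / [0-9]
    rec = LeftRec(left_assoc)
    expr = Choice(rec, Digit())
    rec.operand = Seq(expr, Lit("+"), expr)
    return expr.parse(State(), text, 0)
```

In this sketch, `parse("1+2+3")` nests the tree to the left, while `parse("1+2+3", left_assoc=False)` nests it to the right, because the right-hand operand is no longer blocked from recursing.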
2. Associativity and Precedence Management
Autumn provides native constructs for associativity and precedence within PEG grammars. Precedence is encoded as values associated with grammar expressions and maintained in a global parse state. Parsing is permitted only if an expression’s precedence is at least as high as the current requirement; successful matches advance the active precedence. This strategy yields clear, local error prevention and makes operator precedence explicit.
Associativity is enforced by precedence bumps within 'left' groups; left-associativity is activated by raising the minimum required precedence, whereas right-associativity is default. All expressions in a group must share associativity when at equal precedence levels, a restriction that supports efficient and predictable operator behavior.
3. Efficient Operator Parsing via Expression Clusters
Infix and postfix operators—especially across multiple precedence and associativity layers—have traditionally presented significant parsing inefficiencies in PEG libraries. Autumn resolves this by introducing the ‘expression cluster’ parsing operator. An expression cluster consolidates all operator alternates (binary, unary, literals) into a single group, optionally annotated with precedence (@+) and associativity (@left_recur).
The parsing algorithm for clusters iterates across precedence groups, tries higher precedence first, and only descends as needed. Left-associative alternates trigger precedence increments that block recursion at current levels, guaranteeing correct associativity and operator layering.
\begin{algorithm}
seeds = \{\}
precedences = \{\}
\Parse{expr:} cluster expression \At {position}
{
  \If{seeds[position][expr] exists}{\Return seeds[position][expr]}
  current = failure
  seeds[position][expr] = failure
  min_precedence = precedences[expr] if defined else 0
  loop:
  \For{group in expr.groups}{
    \If{group.precedence < min_precedence}{break}
    precedences[expr] = group.precedence + (group.left_associative ? 1 : 0)
    \For{op in group.ops}{
      result = parse(op)
      \If{result consumed more input than current}{
        current = result
        seeds[position][expr] = result
        goto loop
      }
    }
  }
  remove seeds[position][expr]
  \If{no other ongoing invocation of expr}{remove precedences[expr]}
  \Return current
}
\end{algorithm}
The running time for operator-rich grammars depends on the number of operators and the number of precedence levels, and is comparable to that of hand-written recursive descent. The user needs to intervene very little to obtain this performance.
Example (PEG syntax):
E = expr
  -> E '+' E @+ @left_recur
  -> E '-' E
  -> E '*' E @+ @left_recur
  -> E '/' E
  -> [0-9] @+
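The cluster algorithm can be tried out on this grammar with a short Python sketch. It is an illustration under assumptions rather than Autumn's code: all class names and the `parse` helper are hypothetical, groups are assumed to be listed from highest to lowest precedence, and the precedence bookkeeping is simplified so that each invocation restores the value it found on entry.

```python
FAIL = None  # a failed parse; a success is (end_position, tree)

class State:
    def __init__(self):
        self.seeds = {}        # (position, cluster) -> best result so far
        self.precedences = {}  # cluster -> minimum precedence currently required

class Lit:
    def __init__(self, s):
        self.s = s
    def parse(self, st, text, pos):
        if text.startswith(self.s, pos):
            return (pos + len(self.s), self.s)
        return FAIL

class Digit:
    def parse(self, st, text, pos):
        if pos < len(text) and text[pos].isdigit():
            return (pos + 1, text[pos])
        return FAIL

class Seq:
    def __init__(self, *items):
        self.items = items
    def parse(self, st, text, pos):
        trees = []
        for item in self.items:
            r = item.parse(st, text, pos)
            if r is FAIL:
                return FAIL
            pos, tree = r
            trees.append(tree)
        return (pos, tuple(trees))

class Cluster:
    def __init__(self):
        self.groups = []  # (precedence, left_associative, alternates)

    def _try_groups(self, st, text, pos, current, min_prec):
        """Return the first result that consumes more input than `current`."""
        for prec, left, ops in self.groups:
            if prec < min_prec:
                break
            st.precedences[self] = prec + (1 if left else 0)
            for op in ops:
                r = op.parse(st, text, pos)
                if r is not FAIL and (current is FAIL or r[0] > current[0]):
                    return r
        return FAIL

    def parse(self, st, text, pos):
        key = (pos, self)
        if key in st.seeds:
            return st.seeds[key]
        saved = st.precedences.get(self)  # None for the outermost invocation
        min_prec = saved if saved is not None else 0
        current = FAIL
        st.seeds[key] = FAIL
        while True:  # plays the role of "goto loop" in the pseudocode
            better = self._try_groups(st, text, pos, current, min_prec)
            if better is FAIL:
                break
            current = st.seeds[key] = better
        del st.seeds[key]
        if saved is None:
            st.precedences.pop(self, None)
        else:
            st.precedences[self] = saved  # simplification: restore the entry value
        return current

def parse(text):
    e = Cluster()
    binary = lambda op: Seq(e, Lit(op), e)
    e.groups = [  # assumed order: highest precedence first
        (3, False, [Digit()]),
        (2, True, [binary("*"), binary("/")]),
        (1, True, [binary("+"), binary("-")]),
    ]
    return e.parse(State(), text, 0)
```

With these definitions, `parse("1+2*3")` groups the multiplication first, and `parse("8-4-2")` nests to the left, as the left-associative groups require.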
4. Extensibility and Modular Customization
Autumn is architected for extensibility, allowing researchers and practitioners to introduce custom parsing operators via the ParsingExpression interface. The essential requirement is single-parse determinism: matching an expression at a given position must yield identical results and state changes, permitting safe memoization and composability.
Memoization is pluggable: the default strategy can be extended (e.g., to include precedence) for more complex use cases. Error handling is similarly modular; the default strategy tracks and reports the farthest failure position with context, but richer diagnostics may be introduced at the user's discretion.
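A small Python sketch may help make both ideas concrete. It assumes a memo table keyed on (expression, position) and a parse state that remembers the farthest failure. The names `State`, `Memo`, `Lit`, `Seq`, and `Choice` are hypothetical and do not reflect Autumn's actual classes. Caching a result this way is only safe when an expression always produces the same result at a given position.

```python
class State:
    def __init__(self, text):
        self.text = text
        self.memo = {}      # (expression, position) -> end position or None
        self.farthest = -1  # farthest position at which a match failed
        self.expected = []  # what was expected at that position

    def fail(self, pos, what):
        # Keep only the failures at the farthest position reached so far.
        if pos > self.farthest:
            self.farthest, self.expected = pos, [what]
        elif pos == self.farthest and what not in self.expected:
            self.expected.append(what)
        return None

class Lit:
    def __init__(self, s):
        self.s = s
    def parse(self, st, pos):
        if st.text.startswith(self.s, pos):
            return pos + len(self.s)
        return st.fail(pos, repr(self.s))

class Seq:
    def __init__(self, *items):
        self.items = items
    def parse(self, st, pos):
        for item in self.items:
            pos = item.parse(st, pos)
            if pos is None:
                return None
        return pos

class Choice:
    def __init__(self, *alts):
        self.alts = alts
    def parse(self, st, pos):
        for alt in self.alts:
            end = alt.parse(st, pos)
            if end is not None:
                return end
        return None

class Memo:
    """Hypothetical wrapper caching the result of `inner` per position."""
    def __init__(self, inner):
        self.inner = inner
        self.misses = 0  # number of times `inner` actually ran
    def parse(self, st, pos):
        key = (self, pos)
        if key not in st.memo:
            self.misses += 1
            st.memo[key] = self.inner.parse(st, pos)
        return st.memo[key]

# Both alternatives share the prefix "ab", so it is parsed only once per position.
prefix = Memo(Seq(Lit("a"), Lit("b")))
grammar = Choice(Seq(prefix, Lit("!")), Seq(prefix, Lit("?")))
```

Parsing `State("ab#")` with `grammar` fails, and the state then records position 2 as the farthest failure with both `'!'` and `'?'` as expected inputs, which is the raw material for an error message.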
Grammar instrumentation utilizes visitor-style transformation passes over the expression graph—enabling logging, debug breakpoints, or syntax tree annotations. Syntax tree construction is decoupled from grammar structure, allowing flattened, unified subtrees via ‘captures’—user-annotated expressions marked for inclusion in the output AST.
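The idea of captures producing a flattened tree can be illustrated with the following Python sketch. Only expressions wrapped in a hypothetical `Capture` node contribute to the tree, so the nesting of sequences and repetitions in the grammar does not show up in the output. All names here are assumptions made for illustration, not Autumn's API.

```python
class Lit:
    def __init__(self, s):
        self.s = s
    def parse(self, text, pos):
        # A success is (end_position, captured_nodes); literals capture nothing.
        if text.startswith(self.s, pos):
            return (pos + len(self.s), [])
        return None

class Digit:
    def parse(self, text, pos):
        if pos < len(text) and text[pos].isdigit():
            return (pos + 1, [])
        return None

class Seq:
    def __init__(self, *items):
        self.items = items
    def parse(self, text, pos):
        nodes = []
        for item in self.items:
            r = item.parse(text, pos)
            if r is None:
                return None
            pos, more = r
            nodes.extend(more)  # children are merged, not nested
        return (pos, nodes)

class Star:
    def __init__(self, inner):
        self.inner = inner
    def parse(self, text, pos):
        nodes = []
        while True:
            r = self.inner.parse(text, pos)
            if r is None or r[0] == pos:  # stop on failure or empty match
                return (pos, nodes)
            pos, more = r
            nodes.extend(more)

class Capture:
    """Hypothetical marker: emit a (name, matched_text, children) node."""
    def __init__(self, name, inner):
        self.name = name
        self.inner = inner
    def parse(self, text, pos):
        r = self.inner.parse(text, pos)
        if r is None:
            return None
        end, children = r
        return (end, [(self.name, text[pos:end], children)])

# list = '[' item (',' item)* ']', with captures on `item` and on the whole list
item = Capture("item", Digit())
items = Seq(item, Star(Seq(Lit(","), item)))
lst = Capture("list", Seq(Lit("["), items, Lit("]")))
```

Parsing `"[1,2,3]"` with `lst` yields a single `list` node whose children are three sibling `item` nodes, even though the grammar reaches the second and third items through a nested repetition.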
Whitespace handling is managed via annotation of token expressions and is not hardwired globally, which supports grammars for languages with complex or heterogeneous lexical structures.
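One way to picture token-level whitespace handling is a wrapper that matches its inner expression and then skips trailing whitespace. The Python sketch below uses that approach. The `Token`, `Regex`, and `Seq` classes are hypothetical and only approximate the behaviour described above.

```python
import re

class Regex:
    def __init__(self, pattern):
        self.re = re.compile(pattern)
    def parse(self, text, pos):
        m = self.re.match(text, pos)
        return m.end() if m else None

class Seq:
    def __init__(self, *items):
        self.items = items
    def parse(self, text, pos):
        for item in self.items:
            pos = item.parse(text, pos)
            if pos is None:
                return None
        return pos

class Token:
    """Hypothetical wrapper: match `inner`, then skip trailing whitespace."""
    WS = re.compile(r"\s*")
    def __init__(self, inner):
        self.inner = inner
    def parse(self, text, pos):
        end = self.inner.parse(text, pos)
        if end is None:
            return None
        return Token.WS.match(text, end).end()

name, equals, number = Regex(r"[a-z]+"), Regex(r"="), Regex(r"[0-9]+")

# Whitespace is skipped only where the grammar marks an expression as a token.
assignment = Seq(Token(name), Token(equals), Token(number))
strict = Seq(name, equals, number)  # no tokens: whitespace is significant
```

Because whitespace skipping is attached to individual expressions, the inside of a token such as a string literal is left untouched, and different parts of a grammar can follow different lexical rules.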
5. Performance Profile and Comparative Benchmarks
Autumn demonstrates competitive performance on large-scale real-world parsing tasks. The table below reports the results of parsing 34 MB of Java source code (the Spring framework) with parse-tree generation:
| Parser | Time (Single) | Time (Iterated) | Memory |
|---|---|---|---|
| Autumn | 13.17 s | 12.66 s | 6,154 KB |
| Mouse | 101.43 s | 99.93 s | 45,952 KB |
| Parboiled | 12.02 s | 11.45 s | 13,921 KB |
| Rats! | 5.95 s | 2.41 s | 10,632 KB |
| ANTLR v4 | 4.63 s | 2.31 s | 44,432 KB |
Autumn is within the same order of magnitude as the established parsers. It is slightly slower than Parboiled, approximately two to three times slower than the highly optimized ANTLR v4 and Rats! in single runs (about five times slower in iterated runs), and substantially faster (about 8x) than Mouse, a minimal PEG parser generator. The introduction of expression clusters improved Autumn's own parse performance by a factor of three. Its memory usage is the lowest of the parsers measured, and the implementation used in benchmarking involved no heavy manual optimization, suggesting scope for further performance improvements.
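The relative figures can be recomputed directly from the table. The short Python snippet below does so. The helper name `ratio` is an illustrative choice, and the numbers are copied from the table above.

```python
# (single-run seconds, iterated-run seconds, memory in KB), copied from the table
results = {
    "Autumn":    (13.17, 12.66, 6154),
    "Mouse":     (101.43, 99.93, 45952),
    "Parboiled": (12.02, 11.45, 13921),
    "Rats!":     (5.95, 2.41, 10632),
    "ANTLR v4":  (4.63, 2.31, 44432),
}

def ratio(a, b, column):
    """How many times larger the value of parser `a` is than that of parser `b`."""
    return round(results[a][column] / results[b][column], 1)

SINGLE, ITERATED, MEMORY = 0, 1, 2

print(ratio("Autumn", "ANTLR v4", SINGLE))    # Autumn vs. ANTLR v4, single run
print(ratio("Autumn", "Rats!", SINGLE))       # Autumn vs. Rats!, single run
print(ratio("Autumn", "ANTLR v4", ITERATED))  # Autumn vs. ANTLR v4, iterated
print(ratio("Mouse", "Autumn", SINGLE))       # Mouse vs. Autumn, single run
print(ratio("Mouse", "Autumn", MEMORY))       # Mouse vs. Autumn, memory
```

The single-run ratios against ANTLR v4 and Rats! fall between two and three, while the iterated-run ratios are noticeably larger, since those two parsers speed up much more across iterations than Autumn does.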
6. Practical Impact and Implementation Considerations
Autumn advances the PEG library field by robustly handling direct, indirect, and hidden left-recursion with explicit user-directed associativity. The expression cluster innovation permits high-performance, readable grammars for languages with rich operator sets. By facilitating extensible parsing operators, custom memoization, error handling, and decoupled syntax tree generation, Autumn accommodates the needs of both general language parsing and specialized DSL construction.
This design results in more maintainable grammars, easier grammar evolution, and practical integration of whitespace rules and AST annotation—features essential for contemporary language engineering and DSL authoring.
Autumn’s reference implementation and documentation are available at https://github.com/norswap/autumn.
This suggests that researchers requiring efficient, maintainable PEG-based parsers with support for complex language features and extensibility may find Autumn suitable for both experimental and production environments, particularly where operator layering, associativity control, and grammar toolchain integration are key.