
Resiliparse Parser: Robust PEG Error Recovery

Updated 27 December 2025
  • Resiliparse Parser is a PEG engine enhanced with labeled failures and recovery expressions, enabling robust syntax error recovery and partial AST construction.
  • It incorporates error logging, token skipping, and placeholder insertion to maintain AST integrity in interactive development environments.
  • Empirical studies show that Resiliparse achieves significant speed improvements and more targeted error reporting compared to traditional PEG and ANTLR parsers.

A Resiliparse parser is a parsing engine for Parsing Expression Grammars (PEGs) that augments traditional deterministic top-down parsing with labeled failures and per-label recovery expressions, enabling robust syntax error recovery and partial Abstract Syntax Tree (AST) construction suitable for integrated development environments (IDEs) and other interactive tooling (Medeiros et al., 2018). The approach extends the PEG formalism to support error reporting and fine-grained resynchronization, overcoming an intrinsic limitation of conventional PEG-based parsers, which typically abort or rely on ad-hoc extensions when they encounter invalid or incomplete input.

1. Formal Definition of Parsing Expression Grammars

Parsing Expression Grammars are formally defined as $G = (V, T, P, p_s)$ where:

  • $V$ is the finite set of non-terminals,
  • $T$ is the finite set of terminals,
  • $P: V \to \text{Expr}$ maps each $A \in V$ to a parsing expression $P(A)$,
  • $p_s \in \text{Expr}$ is the start expression.

The repertoire of parsing expressions consists of $\epsilon$ (empty), $a$ (terminal), $A$ (non-terminal), $p_1 p_2$ (sequence), $p_1 / p_2$ (ordered choice), $p^*$ (zero-or-more), and $!p$ (negative predicate). PEGs define a deterministic, top-down, backtracking parsing process. Match failures conventionally trigger backtracking, not immediate syntax errors. In the absence of explicit recovery, PEG-based parsers halt upon unresolvable failure, rendering them inadequate for scenarios requiring partial ASTs (e.g., during code editing or completion in IDEs) (Medeiros et al., 2018).
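The expression forms listed above can be illustrated with a small interpreter. The following is a minimal sketch, not taken from the cited work: the tuple tags (`"eps"`, `"t"`, `"nt"`, `"seq"`, `"choice"`, `"star"`, `"not"`), the function name `match`, and the example grammar are all assumptions chosen for illustration.

```python
from typing import Optional

# Expressions are tuples: ("eps",), ("t", "a"), ("nt", "A"), ("seq", p1, p2),
# ("choice", p1, p2), ("star", p), ("not", p). The tags are illustrative only.

def match(grammar: dict, expr: tuple, text: str, pos: int) -> Optional[int]:
    """Return the position after a successful match, or None on failure."""
    kind = expr[0]
    if kind == "eps":
        return pos
    if kind == "t":
        return pos + len(expr[1]) if text.startswith(expr[1], pos) else None
    if kind == "nt":
        return match(grammar, grammar[expr[1]], text, pos)
    if kind == "seq":
        mid = match(grammar, expr[1], text, pos)
        return None if mid is None else match(grammar, expr[2], text, mid)
    if kind == "choice":
        # Ordered choice: the second alternative is tried only if the first fails.
        first = match(grammar, expr[1], text, pos)
        return first if first is not None else match(grammar, expr[2], text, pos)
    if kind == "star":
        # Greedy repetition; stop on failure or when no input is consumed.
        while True:
            nxt = match(grammar, expr[1], text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    if kind == "not":
        # Negative predicate: succeeds without consuming input iff p fails.
        return pos if match(grammar, expr[1], text, pos) is None else None
    raise ValueError(f"unknown expression kind: {kind}")

# A <- "a" A / "b"   (matches one or more leading 'a's followed by 'b', or just 'b')
GRAMMAR = {"A": ("choice", ("seq", ("t", "a"), ("nt", "A")), ("t", "b"))}

print(match(GRAMMAR, ("nt", "A"), "aaab", 0))  # 4
print(match(GRAMMAR, ("nt", "A"), "aaa", 0))   # None
```

The second call shows the limitation discussed above: on invalid input the matcher reports only a bare failure, with no error location and no partial result.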

2. Labeled Failures and Recovery Expressions

To equip PEGs with structured error recovery, the Resiliparse methodology introduces a throw operator $\,^{\wedge}l\,$, where $l$ is an error label. The grammar is extended to $G = (V, T, P, L, \mathrm{fail}, p_s)$, with $L$ a finite set of labels that excludes the distinguished label $\mathrm{fail}$ used for “ordinary” failure.

Semantically, $\,^{\wedge}l\,$ immediately signals label $l$. If $l \ne \mathrm{fail}$, the label is propagated: ordered choice ($/$) does not intercept it, so it denotes a true syntax error. A recovery map $R$ may assign to a label $l$ an auxiliary parsing expression $R(l)$ (the recovery expression). When $\,^{\wedge}l\,$ is thrown and $R(l)$ exists, the parser:

  1. Logs $(l, \text{position})$,
  2. Invokes R(l)R(l) to skip tokens and resynchronize,
  3. Resumes parsing the remainder of the grammar.

For example, consider a block-ending label in a Java-like grammar:

    BlockStmt ← LCUR (Stmt)* [RCUR]^{rcblk}
    recovery(rcblk) = SkipToRCUR
    SkipToRCUR ← (!RCUR (LCUR SkipToRCUR / .))* RCUR

The label and its recovery expression ensure robust handling of missing or misplaced block ends (Medeiros et al., 2018).
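To show how the `SkipToRCUR` rule respects nesting, here is a minimal sketch that mirrors it over a pre-tokenized input. The function name `skip_to_rcur` and the use of a list of string tokens are assumptions for illustration; the cited work operates on PEG expressions, not on this helper.

```python
from typing import List, Optional

def skip_to_rcur(tokens: List[str], pos: int) -> Optional[int]:
    """Mirror SkipToRCUR: return the index just past the matching '}', or None."""
    while pos < len(tokens) and tokens[pos] != "}":    # !RCUR
        if tokens[pos] == "{":                          # LCUR SkipToRCUR
            inner = skip_to_rcur(tokens, pos + 1)
            # If the nested skip fails, ordered choice falls back to '.',
            # which consumes just the '{' token.
            pos = inner if inner is not None else pos + 1
        else:                                           # .
            pos += 1
    if pos < len(tokens) and tokens[pos] == "}":        # final RCUR
        return pos + 1
    return None

tokens = "x = { y } ; } z".split()
print(skip_to_rcur(tokens, 0))  # 7: the inner '{ y }' pair is skipped as a unit
```

Because the inner braces are consumed as a balanced pair, the skip stops at the brace that closes the enclosing block and not at the first `}` it sees.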

3. Operational Semantics and Inference

Resiliparse parsers formalize error recovery with operational semantics. Parsing is denoted as

$G[p]\; R\; \text{input} \Longrightarrow (\text{remaining}, \text{farthest-failure}, \text{logged-errors})$

or $\Longrightarrow \mathrm{error}(l)$ if unrecoverable. Key rules:

  • Throw with no recovery:

If $l \notin \mathrm{dom}(R)$,

$G[\,^{\wedge}l\,]\,R\,x \Longrightarrow \mathrm{error}(l)$

  • Throw with recovery:

If $R(l) = r$ and recovery parses to $(y, f, L)$,

$G[\,^{\wedge}l\,]\,R\, x y \Longrightarrow (y, f, (l,x)::L)$

  • Ordered choice:

Only failures with label $\mathrm{fail}$ trigger the alternative; non-$\mathrm{fail}$ labels propagate as errors.

This framework preserves deterministic parsing while tracking error positions and labels, facilitating downstream error reporting and AST placeholder insertion (Medeiros et al., 2018).
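The three rules can be sketched with Python exceptions standing in for the two kinds of failure. This is an illustrative model under stated assumptions, not the formal semantics of the cited work: the names `Fail`, `LabeledError`, `throw`, `choice`, `lit`, and `expect` are hypothetical, farthest-failure tracking is omitted, and discarding errors logged inside an abandoned alternative is an assumption of this sketch.

```python
class Fail(Exception):
    """Ordinary failure (label `fail`): ordered choice may backtrack."""

class LabeledError(Exception):
    """Unrecovered labeled failure, corresponding to error(l)."""
    def __init__(self, label, pos):
        super().__init__(f"{label} at {pos}")
        self.label, self.pos = label, pos

def throw(label, recovery, text, pos, errors):
    """Throw rule: recover via R(l) if it is defined, else raise error(l)."""
    if label not in recovery:
        raise LabeledError(label, pos)
    errors.append((label, pos))         # log (l, position)
    return recovery[label](text, pos)   # resynchronize; returns the new position

def choice(first, second, text, pos, errors):
    """Ordered choice: only Fail triggers the alternative."""
    mark = len(errors)
    try:
        return first(text, pos, errors)
    except Fail:
        del errors[mark:]  # assumption: drop errors logged by the abandoned branch
        return second(text, pos, errors)

def lit(s):
    """Parser for a literal string; fails with the ordinary `fail` label."""
    def parse(text, pos, errors):
        if text.startswith(s, pos):
            return pos + len(s)
        raise Fail()
    return parse

def expect(parser, label, recovery):
    """Try the parser; on ordinary failure, throw the given label instead."""
    def parse(text, pos, errors):
        try:
            return parser(text, pos, errors)
        except Fail:
            return throw(label, recovery, text, pos, errors)
    return parse
```

With an empty recovery function such as `{"semi": lambda text, pos: pos}`, `expect(lit(";"), "semi", ...)` logs the error and resumes at the same position. With no entry for the label, `LabeledError` passes straight through `choice`, because `choice` catches only `Fail`.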

4. Design of Recovery Expressions

Recovery expressions are designed to advance parsing to a synchronization point, typically determined via FOLLOW-set tokens (e.g., semicolon, closing brace). Nesting must be respected to avoid misalignment in structured constructs. Where possible, recovery salvages partial parses of subexpressions, which increases AST fidelity.

Examples include:

  • Skip until next semicolon:

$(!\mathtt{;}\,.)^*\,\mathtt{;}$

  • Skip to matching ‘}’ (handling nested blocks):

$(!\,\mathtt{RCUR}\,(\mathtt{LCUR}\,\mathit{SkipToRCUR}\,/\,.)\,)^*\,\mathtt{RCUR}$

  • For subexpression failures with known FOLLOW:

$(!(\mathtt{RPAR}/\mathtt{SEMI})\,.)^*$

A plausible implication is that precise recovery expressions reduce loss of valid AST subtrees and mitigate error cascades, especially in nested or recursive syntactic constructs (Medeiros et al., 2018).
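The flat (non-nested) recovery patterns can be generated from a FOLLOW set. The sketch below assumes a token-list input and a hypothetical factory `skip_until`; it builds either the "stop before a FOLLOW token" form or the "skip through the synchronization token" form.

```python
from typing import Callable, List, Optional, Set

def skip_until(follow: Set[str], consume_sync: bool = False) -> Callable:
    """Build a recovery function that skips tokens until one in `follow`.

    With consume_sync=False the synchronization token is left in the input,
    as in the FOLLOW-based pattern; with consume_sync=True it is consumed
    too, and the recovery fails (returns None) if no such token is found.
    """
    def recover(tokens: List[str], pos: int) -> Optional[int]:
        # Each iteration advances pos, so the loop terminates on any input.
        while pos < len(tokens) and tokens[pos] not in follow:
            pos += 1
        if consume_sync:
            if pos == len(tokens):
                return None
            pos += 1
        return pos
    return recover

tokens = ["f", "(", "1", "+", ")", ";", "g"]
print(skip_until({")", ";"})(tokens, 3))               # 4: stops at ')'
print(skip_until({";"}, consume_sync=True)(tokens, 3)) # 6: skips through ';'
```

Stopping at the nearest FOLLOW token keeps the skipped region small, which is the property the paragraph above associates with preserving valid AST subtrees.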

5. Implementation in the Lua Parser

A case study by Medeiros et al. (2018) applied these principles to a Lua grammar using LPegLabel, an extension of the LPeg library that supports labeled failures. The grammar, based on the Lua reference manual, used approximately 75 labels, each placed following the heuristic “every symbol whose failure cannot sensibly backtrack.” The parser architecture uses a packrat-style engine that tracks the farthest failure, manages an R-map of recovery expressions, and logs a (label, position) pair for each error.

AST construction continues even after recovery, inserting placeholder nodes (such as “MissingSemicolon”) to ensure downstream static analyses obtain structurally valid trees. Error recovery and reporting for IDE scenarios benefit from this method, yielding immediate and localized user feedback on syntax errors (Medeiros et al., 2018).
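Placeholder insertion can be illustrated on a toy statement list. The sketch below is not the LPegLabel implementation: the function `parse_stmts`, the tuple-shaped nodes, and the toy grammar (a name followed by `;`) are assumptions, while the node name `MissingSemicolon` and the label `semia` are borrowed from this article.

```python
from typing import List, Tuple

def parse_stmts(tokens: List[str]) -> Tuple[list, list]:
    """Parse `name ;` repeatedly; a missing ';' yields a placeholder node."""
    ast, errors, pos = [], [], 0
    while pos < len(tokens):
        if tokens[pos] == ";":          # stray ';' is treated as an empty statement
            pos += 1
            continue
        ast.append(("Stmt", tokens[pos]))
        pos += 1
        if pos < len(tokens) and tokens[pos] == ";":
            pos += 1
        else:
            errors.append(("semia", pos))          # log (label, position)
            ast.append(("MissingSemicolon", pos))  # keep the tree structurally complete
            # Empty recovery: resume parsing at the current token.
    return ast, errors

ast, errors = parse_stmts(["a", ";", "b", "c", ";"])
print(ast)     # [('Stmt', 'a'), ('Stmt', 'b'), ('MissingSemicolon', 3), ('Stmt', 'c')]
print(errors)  # [('semia', 3)]
```

The statement after the error is still parsed, so a downstream analysis sees all three statements plus one explicit marker for the defect.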

6. Empirical Evaluation and Comparative Performance

Recovery quality on 180 invalid Lua programs was rated as follows:

| Rating | Number of Programs | Percentage |
|---|---|---|
| Excellent | 100 | 56% |
| Good | 63 | 35% |
| Poor | 17 | 9% |
| Failed | 0 | 0% |

A direct comparison with an ANTLR-generated Lua parser on the same input corpus:

| Condition | Number of Files | Percentage |
|---|---|---|
| ANTLR reports more errors | 56 | 31% |
| PEG reports more errors | 14 | 8% |
| Same number of errors | 110 | 61% |

The ANTLR parser was observed to emit more spurious (irrelevant) errors, while the PEG-based approach produced more targeted recoveries. Regarding performance (measured in ms, averaged over 20 runs):

| File | LPegLabel (ms) | ANTLR (ms) |
|---|---|---|
| broke.lua | 14 | 89 |
| Lua test suite | 94 | 647 |

The PEG-based parser demonstrated approximately $6\times$ speedup and avoided costly re-parsing on syntax errors (Medeiros et al., 2018).
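The reported percentages and the speedup figure can be rechecked directly from the counts and timings in this section. The short script below uses only those numbers; the variable names are illustrative.

```python
# Recovery-quality counts from the 180-program corpus.
ratings = {"Excellent": 100, "Good": 63, "Poor": 17, "Failed": 0}
total = sum(ratings.values())
percentages = {name: round(100 * count / total) for name, count in ratings.items()}

# Timings in milliseconds as (LPegLabel, ANTLR).
timings_ms = {"broke.lua": (14, 89), "Lua test suite": (94, 647)}
speedups = {name: antlr / lpeg for name, (lpeg, antlr) in timings_ms.items()}

print(total)        # 180
print(percentages)  # {'Excellent': 56, 'Good': 35, 'Poor': 9, 'Failed': 0}
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")  # broke.lua: 6.4x, Lua test suite: 6.9x
```

The two ratios, about 6.4 and 6.9, are consistent with the rounded figure of roughly six times faster.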

7. Guidelines and Best Practices for Resiliparse Parser Construction

Key guidelines for building a Resiliparse parser include:

  • Labeling strategy: Annotate every grammar symbol where failure represents an actionable syntax error, not a point for ordinary backtracking. Consistent naming of labels (e.g., “semia”, “rcblk”, “condw”) supports precise, user-friendly error messages.
  • Recovery expression definition: Base “default” recovery on FIRST/FOLLOW analysis (consume until a FOLLOW token). Employ custom recovery for block boundaries and complex nested constructs, preferring conservative skips close to the error site to maintain AST coverage.
  • AST consistency: On recovery, insert placeholder nodes. This ensures that the AST remains structurally complete for static analysis or code tooling.
  • Tooling: Integrate label/position with farthest-failure information to provide IDE hooks and fallback messaging. Ensure that all recovery expressions are total (terminate on all inputs), avoiding left recursion and infinite skips.
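As an illustration of the tooling guideline, logged (label, offset) pairs can be turned into editor-style diagnostics. This is a hedged sketch: the `MESSAGES` texts and the function `to_diagnostic` are hypothetical, and the fallback branch stands in for the generic message a tool might derive from farthest-failure information when no label-specific text exists.

```python
# Hypothetical message table keyed by error label.
MESSAGES = {
    "semia": "missing ';' after statement",
    "rcblk": "missing '}' to close block",
}

def to_diagnostic(source: str, label: str, offset: int) -> str:
    """Format a logged error as 'line:col: message' with 1-based line and column."""
    line = source.count("\n", 0, offset) + 1
    line_start = source.rfind("\n", 0, offset) + 1   # 0 when no newline precedes offset
    col = offset - line_start + 1
    message = MESSAGES.get(label, f"syntax error ({label})")  # generic fallback
    return f"{line}:{col}: {message}"

source = "a = 1\nb = 2 c = 3\n"
print(to_diagnostic(source, "semia", 12))  # 2:7: missing ';' after statement
```

Keeping the label-to-message table separate from the grammar lets the wording of messages change without touching the recovery logic.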

By adhering to these principles—labeled failures, expressive and context-sensitive recoveries, and careful synchronization—a Resiliparse parser can achieve actionable syntax error reporting, near-complete AST generation for incomplete/invalid code, and high parsing performance, robustly addressing limitations of naïve PEG-based tools (Medeiros et al., 2018).
