
Squirrel Parser: Efficient PEG and Error Recovery

Updated 15 January 2026
  • Squirrel Parser is a PEG-based packrat parser that directly handles direct, indirect, and mutual left recursion using per-position cycle detection and fixed-point search.
  • It employs a two-phase parsing strategy for automatic error recovery, avoiding manual grammar rewrites while ensuring robust syntax error management.
  • Maintaining linear time and space complexity, the parser integrates constrained error-recovery with O(1) recursion management for efficient grammar processing.

The Squirrel Parser is a Parsing Expression Grammar (PEG) packrat parser architecture that handles all forms of left recursion (direct, indirect, and mutual) directly, together with a provably optimal, fully automatic error recovery mechanism. Unlike traditional approaches that require grammar rewrites or explicit annotations, Squirrel applies a mathematically minimal algorithmic extension to Ford’s original packrat parser: per-position cycle detection, $O(1)$ left-recursion management, fixed-point search, and a rigorously constrained two-phase error-recovery strategy. The system maintains linear time and space complexity in the input length and grammar size, remaining robust in the presence of an arbitrary number of syntax errors (Hutchison, 8 Jan 2026).

1. Architectural Foundations

Squirrel operates as a packrat parser, memoizing the result of each $(C, p)$ pair of clause $C$ and input position $p$, augmented by mechanisms to support unbounded left recursion and optimal error recovery within the same asymptotic bounds as traditional packrat parsers. The critical innovations are:

  • A unified MemoEntry structure per $(C, p)$, holding the match result, left-recursion state (inRecPath, foundLeftRec, version), and a recovery phase flag.
  • Per-position cycleDepthForPos[p] counters, eliminating the need for explicit version stacks.
  • Two-phase operation: an initial parse (Phase 1) that sets a completeness flag, followed, if errors are detected, by a second parse (Phase 2) in recovery mode.
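
The memo structure listed above can be pictured with a small data-structure sketch. This is an illustration under stated assumptions, not the author's implementation: the Python names (`Match`, `MemoTable`, `is_fresh`, and the snake_case fields) are hypothetical renderings of the fields named in the text, and the freshness test simply mirrors the cache-hit condition described for MemoEntry.match.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Match:
    length: int
    is_complete: bool = True  # assumed meaning: no syntax error inside this match


@dataclass
class MemoEntry:
    match: Optional[Match] = None           # None until the clause is first evaluated
    in_rec_path: bool = False               # inRecPath: clause is on the call stack
    found_left_rec: bool = False            # foundLeftRec: a descendant hit a cycle
    version: int = 0                        # cycleDepthForPos[p] value when cached
    cached_in_recovery_phase: bool = False  # phase in which the match was stored

    def is_fresh(self, cycle_depth: int, in_recovery_phase: bool) -> bool:
        """Cache hit only if the version is current and the phase is compatible."""
        return (
            self.match is not None
            and self.version == cycle_depth
            and (self.match.is_complete
                 or self.cached_in_recovery_phase == in_recovery_phase)
        )


class MemoTable:
    def __init__(self, input_length: int) -> None:
        # One entry per (clause name, position), created lazily.
        self.entries: Dict[Tuple[str, int], MemoEntry] = {}
        # One counter per input position, including the end-of-input position.
        self.cycle_depth_for_pos = [0] * (input_length + 1)

    def entry(self, clause: str, pos: int) -> MemoEntry:
        return self.entries.setdefault((clause, pos), MemoEntry())
```

Keying the table by (clause, position) and keeping a single integer counter per position is what lets a stale entry be recognised by comparing two integers, with no stack of versions to maintain.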

Traditional packrat parsers recurse infinitely on left-recursive rules and lack principled error recovery. Squirrel addresses both without user intervention, memo-table flushing, or grammar transformations, preserving the $O(n \cdot |\mathcal{G}|)$ time and space guarantees for input length $n$ and grammar size $|\mathcal{G}|$ (Hutchison, 8 Jan 2026).

2. Direct Left-Recursion Handling

2.1 Theoretical Basis

Squirrel’s solution to left recursion is underpinned by three principal theorems:

  • Fixed-Point Existence: Any left-recursive cycle at position $p$ admits a finite least fixed point. Each expansion that consumes at least one symbol ensures termination within input length.
  • Bottom-Up Necessity: The correct parse arises only by seed-and-grow: starting from failure (the mismatch seed), progressively expanding until the match no longer extends.
  • Monotonic Length Increase: Each iteration yields a strictly longer match, enforcing convergence.

Applying Kleene’s Fixed-Point Theorem, Squirrel computes the match sequence $r_0 = \bot$ and $r_{i+1} = F(r_i)$, where $F$ is a single round of expansion, halting at the least fixed point, that is, when a further application of $F$ no longer lengthens the match.
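
The termination argument can be written as a short bound. As an illustrative convention (not stated in the source), assume the mismatch seed has length $-1$, and write $|r_i|$ for the length of the $i$-th match at position $p$ in an input of length $n$, with $k$ the number of successful growth steps:

```latex
-1 = |r_0| < |r_1| < \dots < |r_k| \le n - p
\quad\Longrightarrow\quad
k \le n - p + 1 .
```

Since lengths are integers and each step is strictly longer than the last, $|r_i| \ge i - 1$, so the iteration at position $p$ stops after at most $n - p + 1$ expansions.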

2.2 Algorithmic Mechanisms

Each MemoEntry for $(C, p)$ tracks:

  • inRecPath: True if the clause is on the call stack at position $p$
  • foundLeftRec: True if a descendant detected a left-recursion cycle at this $(C, p)$
  • version: Current cycleDepthForPos[p] token

Upon invoking MemoEntry.match, the parser:

  1. Returns the cached result if it is fresh and its phase matches.
  2. Uses inRecPath for $O(1)$ cycle detection; on re-entry, seeds the fixed-point iteration.
  3. Iteratively expands the match as long as its length grows, incrementing the per-position cycle depth and recording it in version so that stale entries can be detected.
  4. Sets cachedInRecoveryPhase per result for phase isolation.

Pseudocode:

MemoEntry.match(parser, C, p):
  if (entry.match != null and entry.version == parser.cycleDepthForPos[p]
      and (entry.match.isComplete or entry.cachedInRecoveryPhase == parser.inRecoveryPhase)):
    return entry.match
  if (entry.inRecPath):
    entry.foundLeftRec = True
    entry.match = MISMATCH
    return entry.match
  entry.inRecPath = True
  entry.foundLeftRec = False
  entry.match = MISMATCH
  do:
    newMatch = C.match(parser, p)
    if (newMatch.len <= entry.match.len): break
    entry.match = newMatch
    parser.cycleDepthForPos[p] += 1
    entry.version = parser.cycleDepthForPos[p]
  while (entry.foundLeftRec)
  entry.inRecPath = False
  entry.cachedInRecoveryPhase = parser.inRecoveryPhase
  return entry.match

This mechanism ensures $O(1)$ communication of left-recursive state, version-tagged memoization, and strictly monotonic expansion, securing both correctness and efficiency.

3. Error Recovery: Axioms and Constraints

3.1 Core Principles

Optimal recovery satisfies four foundational axioms:

  1. Packrat Invariant: No $(C, p)$ pair is reevaluated within a phase.
  2. PEG Ordered-Choice: No later alternative is tried if a preceding alternative succeeds.
  3. Monotonic Consumption: Parsing never backtracks to a position earlier than what has already been consumed.
  4. Left-Recursion Fixed-Point: Left recursion proceeds bottom-up until the fixed point is reached.

3.2 Twelve Constraints

Error recovery is further restricted by twelve design constraints, categorized as follows:

| Category | Constraints |
|---|---|
| Linearity | Single-pass per phase (C1), memoization validity (C2), bounded recovery (C3) |
| Composability | Clause independence (C4), referential transparency (C5) |
| Correctness | Completeness propagation (C6), phase isolation (C7), boundary preservation (C8), non-cascading errors (C9), LR-recovery separation (C10), visibility (C11), parse-tree-spanning (C12) |

These criteria encode requirements for performance, compositional soundness, and user-facing intuitiveness.

3.3 Constraint-Satisfaction and Unique Minimal Algorithm

The space of recovery strategies was exhaustively searched under these requirements, using constraint encoding, LLM-based reasoning, and a suite of 632 tests. A unique minimal algorithm emerges from this search, in which each flag and mechanism is necessary to satisfy all constraints (Hutchison, 8 Jan 2026).

4. Two-Phase Parsing and Recovery Operations

The adopted strategy is a two-phase process:

  • Phase 1: Pure parsing; the isComplete flag is set.
  • Phase 2: Recovery phase, enabled only if Phase 1 yields an incomplete result.

Each MemoEntry caches the phase of population, validating hits solely under matched phase flag or complete result status, thus enforcing phase isolation.

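Phase transition: the parser first runs with inRecoveryPhase unset; only if that parse is incomplete does it set inRecoveryPhase and run the second parse.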
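
A minimal sketch of the phase transition, under stated assumptions: `two_phase_parse`, `Match`, and the `parse_once` callback are hypothetical names introduced for illustration, and `toy_parse_once` is a stand-in grammar (digits only) rather than anything from Squirrel itself. It shows only the control flow: recovery is enabled if and only if the first pass reports an incomplete result.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Match:
    length: int
    is_complete: bool


def two_phase_parse(parse_once: Callable[[str, bool], Match], text: str) -> Match:
    # Phase 1: plain parse with recovery disabled.
    result = parse_once(text, False)
    if result.is_complete:
        return result
    # Phase 2: parse again with recovery enabled.
    return parse_once(text, True)


calls: List[bool] = []  # records the phase flag of every pass, for demonstration


def toy_parse_once(text: str, in_recovery_phase: bool) -> Match:
    """Toy stand-in: the whole input is expected to be digits."""
    calls.append(in_recovery_phase)
    n = 0
    while n < len(text) and text[n].isdigit():
        n += 1
    if in_recovery_phase:
        # Pretend recovery covers the remaining input with error nodes.
        return Match(len(text), True)
    return Match(n, n == len(text))
```

With this stand-in, `two_phase_parse(toy_parse_once, "123")` makes one pass, while `two_phase_parse(toy_parse_once, "12x4")` makes two, the second with the recovery flag set.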

Recovery itself is performed by linear skipping at error points: upon failure in inRecoveryPhase, the parser attempts to skip 1, 2, ... up to MAX_SKIP characters, wrapping each skipped span in a SyntaxError node and continuing. Skips are constrained by clause boundaries, and LTC-context recovery is forbidden.

The error recovery logic ensures:

  • Compositional locality: Skips are local to the failing grammar node (C9).
  • Boundary adherence: Skips do not traverse into sibling regions (C8).
  • No memo-table pollution: Recovery steps do not break memoization or cause exponential replay.
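
The linear-skip idea can be sketched for a single clause. This is a simplified illustration, not Squirrel's recovery algorithm: it omits memoization, phase flags, and the boundary constraints, the value of `MAX_SKIP` is an assumption (the source names the bound without giving a value), and the tuple-shaped nodes and function names are hypothetical. The grammar is a small list of digit runs separated by commas.

```python
import re

MAX_SKIP = 4  # assumed value, chosen only for this example
DIGITS = re.compile(r"[0-9]+")


def match_item(text, pos):
    """Item <- [0-9]+ ; returns the end offset, or None on mismatch."""
    m = DIGITS.match(text, pos)
    return m.end() if m else None


def match_item_recovering(text, pos):
    """Try Item at pos; on failure skip 1..MAX_SKIP characters and retry.

    Returns (nodes, end) or None if no skip within the bound helps.
    """
    end = match_item(text, pos)
    if end is not None:
        return [("Item", text[pos:end])], end
    for skip in range(1, MAX_SKIP + 1):
        start = pos + skip
        if start >= len(text):
            break
        end = match_item(text, start)
        if end is not None:
            # Wrap the skipped span in an error node, then keep the match.
            return [("SyntaxError", text[pos:start]),
                    ("Item", text[start:end])], end
    return None


def parse_list(text):
    """List <- Item (',' Item)* with skipping enabled for each Item."""
    first = match_item_recovering(text, 0)
    if first is None:
        return None
    nodes, pos = first
    while pos < len(text) and text[pos] == ",":
        nxt = match_item_recovering(text, pos + 1)
        if nxt is None:
            break
        more, pos = nxt
        nodes += more
    return nodes, pos
```

For the hypothetical input `"1, ,2"`, `parse_list` returns `([("Item", "1"), ("SyntaxError", " ,"), ("Item", "2")], 5)`: the two unparseable characters are confined to one error node and the resulting nodes still account for the whole input.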

5. Complexity Analysis

By Theorem 5.1 (Hutchison, 8 Jan 2026), Squirrel ensures both time and space complexity of $O(n \cdot |\mathcal{G}|)$. The proof relies on:

  • At most one memoized match per $(C, p)$ per phase ($O(n \cdot |\mathcal{G}|)$ calls).
  • Left-recursion expansion per position bounded by input length ($O(n)$ total).
  • Error-recovery skips cost at most %%%%22 traversals.
  • Unmemoized recursive structures generate trees of $O(n)$ nodes.

Aggregating these yields the global bound.

6. Illustrative Examples

Left Recursion: $E \leftarrow E\ +\ T\ |\ T$, Input "1+2+3"

At position 0:

  1. Phase 1: E.match(0) calls itself (first alternative); inRecPath detects the cycle and seeds a mismatch.
  2. Fixed-point expansion:

    • Seed $r_0 = \bot$; $T$ matches "1": $r_1.\mathrm{len} = 1$
    • Next, $E + T$ at pos 0 ($E$ spans "1") matches "1+2": $r_2.\mathrm{len} = 3$
    • Repeat, $E + T$ at pos 0 ($E$ spans "1+2") matches "1+2+3": $r_3.\mathrm{len} = 5$
    • No further match increase: fixed point attained, resulting left-associative tree:

    $E(E(E(\text{"1"})\ +\ \text{"2"})\ +\ \text{"3"})$
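
The seed-and-grow iteration for this grammar can be reproduced with a small runnable sketch. It is a deliberate simplification and not the MemoEntry.match algorithm: it handles only direct left recursion in one hard-coded rule, replaces version tagging with a `done` flag, and returns the current seed on re-entry. All names (`Entry`, `parse_expr`, `expr_body`) are hypothetical, T is assumed to be a single digit, and mismatch is encoded as length -1.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    length: int = -1             # -1 encodes a mismatch (the bottom seed)
    tree: object = None
    in_rec_path: bool = False    # rule is currently on the call stack
    found_left_rec: bool = False
    done: bool = False           # final result has been memoized


def parse_expr(text):
    """Parse E <- E '+' T / T at position 0; returns (length, tree)."""
    memo = {}

    def term(p):
        # T <- [0-9]
        if p < len(text) and text[p].isdigit():
            return 1, text[p]
        return -1, None

    def expr_body(p):
        # First alternative: E '+' T
        l, left = expr(p)
        if l >= 0 and p + l < len(text) and text[p + l] == "+":
            tl, right = term(p + l + 1)
            if tl >= 0:
                return l + 1 + tl, (left, "+", right)
        # Second alternative: T
        return term(p)

    def expr(p):
        e = memo.setdefault(p, Entry())
        if e.in_rec_path:
            # Left-recursive re-entry: flag the cycle, hand back the seed.
            e.found_left_rec = True
            return e.length, e.tree
        if e.done:
            return e.length, e.tree
        e.in_rec_path = True
        while True:
            l, t = expr_body(p)
            if l <= e.length:
                break                # no growth: fixed point reached
            e.length, e.tree = l, t
            if not e.found_left_rec:
                break                # not left-recursive: one round suffices
        e.in_rec_path = False
        e.done = True
        return e.length, e.tree

    return expr(0)
```

Calling `parse_expr("1+2+3")` returns `(5, (("1", "+", "2"), "+", "3"))`: the match grows through lengths 1, 3, and 5, and the nested tuples lean left, matching the left-associative tree above.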

Error Recovery: List Grammar

Grammar: $\mathrm{List} \leftarrow \mathrm{Item}\ (','\ \mathrm{Item})^*$, $\mathrm{Item} \leftarrow [0\text{-}9]^+$

Input: "2 ,2"

  • Phase 1: $\mathrm{Item}$ at pos 2 fails (space); propagate $\mathrm{isComplete} = \mathrm{false}$
  • Phase 2: $\mathrm{Item}$ at pos 2 attempts skip 1 (" "); still not a digit. Attempts skip 2 (", "), then matches "2"; a SyntaxError node wraps the skipped region.

Result: a $\mathrm{List}$ node whose children are "2", SyntaxError(",", " "), "2", with $\mathrm{isComplete} = \mathrm{true}$. The parse tree covers the complete input.

7. Significance and Research Context

The Squirrel Parser constitutes a strictly minimal, mathematically justified extension of Ford’s PEG packrat framework. It resolves direct, indirect, and mutual left recursion using per-position cycle-detected fixed-point iteration, and handles error recovery through a uniquely constrained, automatic two-phase mechanism. No manual annotation, grammar rewriting, or loss of $O(n \cdot |\mathcal{G}|)$ performance is required. These results are supported by formal theorems, proof sketches, and comprehensive empirical validation (Hutchison, 8 Jan 2026). The construction represents a robust reference point for future research in grammar-based parsing, error recovery, and the semantics of compositional language processors.
