Squirrel Parser: Efficient PEG and Error Recovery
- Squirrel Parser is a PEG-based packrat parser that directly handles direct, indirect, and mutual left recursion using per-position cycle detection and fixed-point search.
- It employs a two-phase parsing strategy for automatic error recovery, avoiding manual grammar rewrites while ensuring robust syntax error management.
- Maintaining linear time and space complexity, the parser integrates constrained error-recovery with O(1) recursion management for efficient grammar processing.
The Squirrel Parser is a Parsing Expression Grammar (PEG) packrat parser architecture that achieves direct handling of all forms of left recursion—direct, indirect, and mutual—together with a provably optimal, fully automatic error recovery mechanism. Distinct from traditional approaches requiring grammar rewrites or explicit annotations, Squirrel applies a mathematically minimal algorithmic extension to Ford’s original packrat parser: per-position cycle detection, left-recursion management, fixed-point search, and a rigorously constrained two-phase error-recovery strategy. The system maintains linear time and space complexity in the input length and grammar size, ensuring robustness even in the presence of an arbitrary number of syntactic errors (Hutchison, 8 Jan 2026).
1. Architectural Foundations
Squirrel operates as a packrat parser, memoizing the result of each (clause, position) pair, augmented by mechanisms to support unbounded left recursion and optimal error recovery within the same asymptotic bounds as traditional packrat parsers. The critical innovations are:
- A unified MemoEntry structure per (clause, position) pair, holding the match result, left-recursion state (inRecPath, foundLeftRec, version), and a recovery phase flag.
- Per-position cycleDepthForPos[p] counters, eliminating the need for explicit version stacks.
- Two-phase operation: an initial parse (Phase 1) setting a completeness flag, followed, if errors are detected, by a second parse (Phase 2) in recovery mode.
Traditional packrat parsers suffer from infinite recursion on left-recursive rules and lack principled error recovery. Squirrel addresses both without user intervention, memo-table flushing, or grammatical transformations, preserving the time/space guarantees for input length and grammar size (Hutchison, 8 Jan 2026).
2. Direct Left-Recursion Handling
2.1 Theoretical Basis
Squirrel’s solution to left recursion is underpinned by three principal theorems:
- Fixed-Point Existence: Any left-recursive cycle at position $p$ admits a finite least fixed point. Each expansion that consumes at least one symbol ensures termination within input length.
- Bottom-Up Necessity: The correct parse arises only by seed-and-grow: starting from failure (the mismatch seed), progressively expanding until the match no longer extends.
- Monotonic Length Increase: Each iteration yields a strictly longer match, enforcing convergence.
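Applying Kleene’s Fixed-Point Theorem, Squirrel computes the match sequence $m_0 = \text{MISMATCH}$ and $m_{i+1} = F(m_i)$ for $i \ge 0$, with $F$ as a single round of expansion, halting at the least fixed point, where $\mathrm{len}(m_{i+1}) \le \mathrm{len}(m_i)$.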
2.2 Algorithmic Mechanisms
Each MemoEntry for a (clause, position) pair tracks:
- inRecPath: True if the clause is on the call stack at this position
- foundLeftRec: True if a descendant detected a left-recursion cycle at this (clause, position) pair
- version: Current cycleDepthForPos[p] token
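The fields above can be pictured as a small record. The following is a minimal Python sketch, not the paper's implementation: the `Match` class, the snake_case field names, the initial `version` of -1, and the `is_fresh` helper are assumptions introduced for illustration. The freshness test encodes the idea that a cached result is reusable when its version matches the current per-position counter and it is either complete or was produced in the current phase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    """Hypothetical match record: characters consumed plus a completeness flag."""
    length: int
    is_complete: bool = True


@dataclass
class MemoEntry:
    """One memo-table slot for a (clause, position) pair (illustrative layout)."""
    match: Optional[Match] = None           # cached result; None until first evaluated
    in_rec_path: bool = False               # clause is currently on the call stack here
    found_left_rec: bool = False            # a descendant re-entered this entry
    version: int = -1                       # cycleDepthForPos[p] value when cached
    cached_in_recovery_phase: bool = False  # phase in which the result was produced

    def is_fresh(self, cycle_depth: int, in_recovery_phase: bool) -> bool:
        """Return True if the cached match may be reused without re-evaluation."""
        if self.match is None or self.version != cycle_depth:
            return False
        return (self.match.is_complete
                or self.cached_in_recovery_phase == in_recovery_phase)
```

For instance, an incomplete match cached during the first phase with `version=2` is reusable while the counter is still 2 and the parser is still in the first phase, but not once the counter advances or the recovery phase begins.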
Upon invoking MemoEntry.match, the parser:
- Returns the cached result if it is fresh and either complete or cached in the current phase.
- Uses inRecPath for O(1) cycle detection; on re-entry, seeds the fixed-point iteration.
- Iteratively expands the match as long as its length grows, incrementing the per-position version and propagating freshness via version.
- Sets cachedInRecoveryPhase per result for phase isolation.
Pseudocode:
```
MemoEntry.match(parser, C, p):
  if (entry.match ≠ null and entry.version == parser.cycleDepthForPos[p]
      and (entry.match.isComplete or entry.cachedInRecoveryPhase == parser.inRecoveryPhase)):
    return entry.match
  if (entry.inRecPath):
    entry.foundLeftRec = True
    entry.match = MISMATCH
    return entry.match
  entry.inRecPath = True
  entry.foundLeftRec = False
  entry.match = MISMATCH
  do:
    newMatch = C.match(parser, p)
    if (newMatch.len <= entry.match.len): break
    entry.match = newMatch
    parser.cycleDepthForPos[p] += 1
    entry.version = parser.cycleDepthForPos[p]
  while (entry.foundLeftRec)
  entry.inRecPath = False
  entry.cachedInRecoveryPhase = parser.inRecoveryPhase
  return entry.match
```
This mechanism ensures O(1) communication of left-recursive state, version-tagged memoization, and strictly monotonic expansion, securing both correctness and efficiency.
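To make the seed-and-grow loop concrete, here is a minimal runnable Python sketch of the left-recursion part of the pseudocode only; the recovery-phase fields are deliberately left out. The rule names `E` and `N`, the toy grammar E ← E '+' N / N, the `Match` class, and the convention that MISMATCH has length -1 are all assumptions made for illustration, not details confirmed by the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Match:
    length: int          # characters consumed; -1 marks MISMATCH (assumed convention)
    tree: object = None


MISMATCH = Match(-1)


@dataclass
class MemoEntry:
    match: Optional[Match] = None
    in_rec_path: bool = False
    found_left_rec: bool = False
    version: int = -1


class Parser:
    def __init__(self, text: str, rules: Dict[str, Callable[["Parser", int], Match]]):
        self.text = text
        self.rules = rules
        self.memo: Dict[Tuple[str, int], MemoEntry] = {}
        self.cycle_depth_for_pos = [0] * (len(text) + 1)

    def match(self, rule: str, pos: int) -> Match:
        entry = self.memo.setdefault((rule, pos), MemoEntry())
        if entry.match is not None and entry.version == self.cycle_depth_for_pos[pos]:
            return entry.match                  # fresh cache hit
        if entry.in_rec_path:                   # re-entry: left-recursive cycle detected
            entry.found_left_rec = True
            entry.match = MISMATCH              # seed the fixed-point iteration
            return entry.match
        entry.in_rec_path = True
        entry.found_left_rec = False
        entry.match = MISMATCH
        while True:
            new_match = self.rules[rule](self, pos)
            if new_match.length <= entry.match.length:
                break                           # no growth: fixed point reached
            entry.match = new_match
            self.cycle_depth_for_pos[pos] += 1
            entry.version = self.cycle_depth_for_pos[pos]
            if not entry.found_left_rec:
                break                           # not left-recursive: one pass suffices
        entry.in_rec_path = False
        return entry.match


def num(parser: Parser, pos: int) -> Match:
    """N <- [0-9]"""
    if pos < len(parser.text) and parser.text[pos].isdigit():
        return Match(1, parser.text[pos])
    return MISMATCH


def expr(parser: Parser, pos: int) -> Match:
    """E <- E '+' N / N"""
    left = parser.match("E", pos)               # first alternative starts with E itself
    if left.length >= 0:
        plus = pos + left.length
        if parser.text[plus:plus + 1] == "+":
            right = parser.match("N", plus + 1)
            if right.length >= 0:
                return Match(left.length + 1 + right.length,
                             (left.tree, "+", right.tree))
    return parser.match("N", pos)               # second alternative


RULES = {"E": expr, "N": num}
```

Calling `Parser("1+2+3", RULES).match("E", 0)` grows the match from "1" to "1+2" to "1+2+3" and yields the left-nested tree `(("1", "+", "2"), "+", "3")`. A full implementation would also carry the phase flag through the cache check.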
3. Error Recovery: Axioms and Constraints
3.1 Core Principles
Optimal recovery satisfies four foundational axioms:
- Packrat Invariant: No (clause, position) reevaluation within a phase.
- PEG Ordered-Choice: No alternate tried if preceding alternative succeeds.
- Monotonic Consumption: Parsing never backtracks to a position earlier than consumed.
- Left-Recursion Fixed-Point: Left recursion proceeds bottom-up until fixed point.
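The ordered-choice axiom is the one most often misread by readers coming from context-free grammars, so a small illustration may help. The sketch below is generic PEG behavior rather than Squirrel-specific code; the `literal` and `ordered_choice` helpers and the `Matcher` type are hypothetical names introduced here.

```python
from typing import Callable, Optional, Sequence

# A matcher returns the number of characters consumed, or None on failure.
Matcher = Callable[[str, int], Optional[int]]


def literal(expected: str) -> Matcher:
    """Build a matcher for a fixed string."""
    def match(text: str, pos: int) -> Optional[int]:
        return len(expected) if text.startswith(expected, pos) else None
    return match


def ordered_choice(alternatives: Sequence[Matcher]) -> Matcher:
    """PEG ordered choice: commit to the first alternative that succeeds."""
    def match(text: str, pos: int) -> Optional[int]:
        for alternative in alternatives:
            consumed = alternative(text, pos)
            if consumed is not None:
                return consumed      # later alternatives are never tried
        return None
    return match


a_or_ab = ordered_choice([literal("a"), literal("ab")])
```

On the input "ab", `a_or_ab("ab", 0)` consumes one character: the first alternative succeeds, so the longer second alternative is never attempted. A recovery scheme that respects this axiom cannot revisit that decision afterwards.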
3.2 Twelve Constraints
Error recovery is further restricted by twelve design constraints, categorized as follows:
| Category | Constraints |
|---|---|
| Linearity | Single-pass per phase (C1), memoization validity (C2), bounded recovery (C3) |
| Composability | Clause independence (C4), referential transparency (C5) |
| Correctness | Completeness propagation (C6), phase isolation (C7), boundary preservation (C8), non-cascading errors (C9), LR-recovery separation (C10), visibility (C11), parse-tree-spanning (C12) |
These criteria encode requirements for performance, compositional soundness, and user-facing intuitiveness.
3.3 Constraint-Satisfaction and Unique Minimal Algorithm
The space of recovery strategies was exhaustively searched under these requirements, using constraint encoding, LLM-based reasoning, and a suite of 632 tests. A unique minimal algorithm emerges, justifying the necessity of each flag and mechanism for meeting all constraints (Hutchison, 8 Jan 2026).
4. Two-Phase Parsing and Recovery Operations
The adopted strategy is a two-phase process:
- Phase 1: Pure parsing, isComplete flag set.
- Phase 2: Recovery phase enabled only if Phase 1 yields an incomplete result.
Each MemoEntry caches the phase of population, validating hits solely under matched phase flag or complete result status, thus enforcing phase isolation.
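The control flow of the two phases can be sketched as a small driver. This is an illustrative Python sketch under stated assumptions, not the source's phase-transition code: `parse_once` is a hypothetical callback that runs one parse and reports completeness, and `toy_parse` is a stand-in parser that accepts only digits. The memo table that a real implementation would share between the phases is not modeled.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: parse_once(text, in_recovery_phase) -> (tree, is_complete)
ParseOnce = Callable[[str, bool], Tuple[List[object], bool]]


def parse_two_phase(parse_once: ParseOnce, text: str) -> Tuple[List[object], bool]:
    """Run Phase 1; enter Phase 2 (recovery) only if Phase 1 was incomplete.

    Returns the tree and a flag telling whether recovery was needed.
    """
    tree, is_complete = parse_once(text, False)   # Phase 1: pure parsing
    if is_complete:
        return tree, False
    tree, _ = parse_once(text, True)              # Phase 2: recovery enabled
    return tree, True


def toy_parse(text: str, in_recovery_phase: bool) -> Tuple[List[object], bool]:
    """Stand-in parser: accepts digits; wraps anything else only in recovery mode."""
    nodes: List[object] = []
    for ch in text:
        if ch.isdigit():
            nodes.append(ch)
        elif in_recovery_phase:
            nodes.append(("SyntaxError", ch))
        else:
            return nodes, False                   # stop: result is incomplete
    return nodes, True
```

With this driver, error-free input never pays for recovery: `parse_two_phase(toy_parse, "12")` finishes in the first phase, while `parse_two_phase(toy_parse, "1x2")` falls through to the second and returns a tree containing a `SyntaxError` node for "x".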
Recovery itself is performed by linear skipping at error points: upon failure in inRecoveryPhase, the parser attempts to skip 1, 2, ... up to MAX_SKIP characters, wrapping each skipped span in a SyntaxError node and continuing, constrained by clause boundaries and forbidding LTC-context recovery.
The error recovery logic ensures:
- Compositional locality: Skips are local to the failing grammar node (C9).
- Boundary adherence: Skips do not traverse into sibling regions (C8).
- No memo-table pollution: Recovery steps do not break memoization or cause exponential replay.
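The linear-skip step described above can be sketched in a few lines. This is a simplified Python illustration, not the paper's routine: the value of `MAX_SKIP`, the `match_at` callback, the tuple used for the `SyntaxError` node, and the `digit` matcher are assumptions, and the clause-boundary and left-recursion restrictions are not modeled.

```python
from typing import Callable, Optional, Tuple

MAX_SKIP = 3  # assumed bound; the source does not state a value

# Hypothetical matcher signature: returns characters consumed, or None on failure.
Matcher = Callable[[str, int], Optional[int]]


def recover_by_skipping(
    match_at: Matcher, text: str, pos: int, max_skip: int = MAX_SKIP
) -> Optional[Tuple[Tuple[str, str], int, int]]:
    """Skip 1, 2, ... max_skip characters from pos until match_at succeeds.

    Returns (error_node, resume_pos, matched_length), or None if no skip helps.
    """
    for skip in range(1, max_skip + 1):
        if pos + skip > len(text):
            break                                   # never skip past end of input
        length = match_at(text, pos + skip)
        if length is not None:
            error_node = ("SyntaxError", text[pos:pos + skip])
            return error_node, pos + skip, length
    return None


def digit(text: str, pos: int) -> Optional[int]:
    """Toy matcher: one decimal digit."""
    return 1 if pos < len(text) and text[pos].isdigit() else None
```

For example, `recover_by_skipping(digit, "a??7", 1)` fails after skipping one character, succeeds after skipping two, and returns a `SyntaxError` node covering "??" together with the resume position 3. Because the number of attempts is capped by `max_skip`, each recovery point costs a bounded amount of extra work.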
5. Complexity Analysis
By Theorem 5.1 (Hutchison, 8 Jan 2026), Squirrel ensures both time and space complexity of $O(n \cdot |G|)$, where $n$ is the input length and $|G|$ the grammar size. The proof relies on:
- At most one memoized match per (clause, position) pair per phase ($O(n \cdot |G|)$ calls).
- Left-recursion expansion per position bounded by input length.
- Error-recovery skips cost a bounded number of traversals, since each attempt skips at most MAX_SKIP characters.
- Unmemoized recursive structures generate trees of nodes.
Aggregating these yields the global bound.
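As a rough worked instance of the memo-table part of this argument, assume $n$ denotes the input length and $|G|$ the number of grammar clauses (both symbols are introduced here for illustration). A table with one slot per (clause, position) pair then has at most $(n+1)\cdot|G|$ slots, since positions run from 0 to $n$ inclusive:

```latex
% Assumed notation: n = input length, |G| = number of grammar clauses.
\[
  \text{memo slots} \;\le\; (n + 1)\cdot |G|
\]
% Example: the input "1+2+3" has n = 5; for a hypothetical grammar
% counted as |G| = 2 clauses, the table holds at most
\[
  (5 + 1)\cdot 2 = 12 \ \text{slots.}
\]
```

Doubling the input length roughly doubles this bound, which is the sense in which the memoization cost is linear.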
6. Illustrative Examples
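Left Recursion: a rule of the form $E \leftarrow E\ \text{'+'}\ N \,/\, N$, Input "1+2+3"
At position 0:
- Phase 1: $E$ calls itself (first alternative); this triggers cycle detection and seeds a mismatch.
- Fixed-point expansion:
  - Seed: mismatch; the first alternative fails and $N$ matches "1".
  - Next, with $E$ at pos 0 spanning "1", the first alternative matches "+2", so $E$ spans "1+2".
  - Repeat, with $E$ at pos 0 spanning "1+2": $E$ matches "1+2+3".
  - No further match increase: fixed point attained, resulting in the left-associative tree ((1+2)+3).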
Error Recovery: List Grammar
Grammar:
Input: "2 ,2"
- Phase 1: the parse at pos 1 fails (space); the incomplete result propagates upward.
- Phase 2: at pos 1 attempts skip 1 (" "); still not a digit. Attempts skip 2 (", "), then matches "2"; a SyntaxError node wraps the skipped region.
Result: a node whose children include the SyntaxError node wrapping the skipped span, followed by "2". The parse tree covers the complete input.
7. Significance and Research Context
The Squirrel Parser constitutes a strictly minimal, mathematically justified extension of Ford’s PEG packrat framework. It resolves direct, indirect, and mutual left recursion using per-position cycle-detected fixed-point iteration, and error recovery by a uniquely constrained, automatic two-phase mechanism. No manual annotation, grammar rewriting, or loss of performance is required. These results are supported by formal theorems, proof sketches, and comprehensive empirical validation (Hutchison, 8 Jan 2026). The construction represents a robust reference point for future research in grammar-based parsing, error recovery, and the semantics of compositional language processors.