OSTRICH: String Constraint Solving Framework
- OSTRICH is an automata- and pre-image-based string constraint solving framework that decides path feasibility and constraint satisfiability in programs.
- It supports rich string operations—including concatenation, reverse, and ECMAScript regexes—using cost-enriched finite automata and a modular, extensible SMT-based architecture.
- Employing regular constraint propagation with formal completeness guarantees, the framework improves solver performance by up to 74% on challenging benchmarks.
The OSTRICH String Constraint Solving Framework is an automata- and pre-image-based string solver architecture for deciding path feasibility and constraint satisfiability in programs with complex string operations, including those encountered in symbolic execution, security analysis, and formal verification. OSTRICH systematically propagates regular language constraints through assignments and operations on string variables, leveraging the regularity-preserving properties of most practical string functions. The framework is extensible, supporting user-defined operations, and provides formal completeness guarantees for core fragments. OSTRICH and its successors (including OSTRICH+, OSTRICH2, and a sequence-constraint extension) combine automata-theoretic reasoning, constraint propagation, and integer reasoning to address the semantic and algorithmic challenges of modern string analysis problems (Chen et al., 2018, Chen et al., 2020, Hague et al., 17 Jun 2025, Hague et al., 27 Aug 2025, Hu et al., 31 Aug 2025).
1. Core Principles and Decidable Fragments
OSTRICH’s decision procedures are grounded in two semantic conditions that ensure decidability: (1) all assertions (e.g., membership, equality) must admit a regular monadic decomposition, meaning the set of solutions can be expressed as a finite union of Cartesian products of regular languages; (2) for each string assignment $x := f(y_1, \ldots, y_n)$, the pre-image $f^{-1}(L)$ of any regular language $L$ under $f$ must again be a recognisable relation (a finite union of products of regular languages). This setup subsumes many classes of string operations, including concatenation, reverse, functional finite-state transducers, and replaceall.
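Condition (2) can be made concrete for concatenation. The following is a minimal sketch, not OSTRICH's implementation: it assumes a complete DFA encoded as a plain tuple, and the names `accepts`, `concat_preimage`, and `in_preimage` are illustrative. For $x = y \cdot z$ with $x \in L(A)$, the pre-image is the union over states $q$ of the pairs "words that drive $A$ from its initial state to $q$" and "words that drive $A$ from $q$ to an accepting state", which is a finite union of products of regular languages.

```python
from itertools import product

# A complete DFA as (states, alphabet, delta, initial, finals),
# where delta maps (state, char) -> state. This encoding is an assumption.
EVEN_A = (
    {0, 1}, "ab",
    {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    0, {0},
)  # words over {a, b} with an even number of 'a'

def accepts(dfa, word, start=None, finals=None):
    """Run the DFA from `start` (default: initial state) and test membership in `finals`."""
    _, _, delta, init, fin = dfa
    q = init if start is None else start
    for ch in word:
        q = delta[(q, ch)]
    return q in (fin if finals is None else finals)

def concat_preimage(dfa):
    """Pre-image of L(dfa) under x = y . z, one (y-language, z-language) pair per state q.

    Each pair is encoded as (accepting set for y, start state for z) over the same DFA.
    """
    return [({q}, q) for q in sorted(dfa[0])]

def in_preimage(dfa, y, z):
    """True if (y, z) lies in at least one product of the pre-image."""
    return any(
        accepts(dfa, y, finals=y_finals) and accepts(dfa, z, start=z_start)
        for y_finals, z_start in concat_preimage(dfa)
    )

def check(dfa, max_len=3):
    """Brute-force sanity check: (y, z) is in the pre-image iff y + z is accepted."""
    words = ["".join(w) for n in range(max_len + 1) for w in product(dfa[1], repeat=n)]
    return all(in_preimage(dfa, y, z) == accepts(dfa, y + z) for y in words for z in words)
```

The number of products equals the number of DFA states, which is why the decomposition stays finite regardless of how long the strings are.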
OSTRICH delivers completeness for the straight-line fragment—where assignments can be topologically ordered without cycles—and the more expressive chain-free fragment, which allows acyclic dependencies in a more general constraint graph (Hague et al., 17 Jun 2025). These encompass nearly all straight-line symbolic execution traces and many practically relevant constraint encodings (Chen et al., 2018, Abdulla et al., 2023, Hague et al., 27 Aug 2025).
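As a rough illustration of the straight-line condition, the sketch below checks that every variable is assigned at most once and that the assignment dependencies can be topologically ordered. The input encoding (a list of `(target, inputs)` pairs) and the function name `straight_line_order` are assumptions made for this example; it does not cover the more general chain-free check.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

def straight_line_order(assignments):
    """Return assigned variables in dependency order, or None if not straight-line.

    `assignments` is a list of (target, [input variables]) pairs, one per
    assignment target := f(inputs).
    """
    targets = [target for target, _ in assignments]
    if len(targets) != len(set(targets)):
        return None  # some variable is assigned more than once
    deps = {target: set(inputs) for target, inputs in assignments}
    try:
        order = list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None  # cyclic dependencies between assignments
    # Keep only assigned variables; free input variables need no ordering.
    return [v for v in order if v in deps]
```

For example, `straight_line_order([("w", ["x"]), ("x", ["y", "z"])])` orders `x` before `w`, while a pair of mutually dependent assignments yields `None`.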
2. Automata-Theoretic Constraint Propagation
At the heart of OSTRICH is the repeated computation of pre-images and post-images of regular languages along assignments and function applications. Suppose $x = f(y_1, \ldots, y_n)$ and the input variables are constrained to regular languages, $y_i \in L_i$ for $i = 1, \ldots, n$:
- Forward propagation: If $f$ is regularity-preserving, the image $f(L_1, \ldots, L_n)$ is again regular; infer $x \in L'$ for the image language $L' = f(L_1, \ldots, L_n)$.
- Backward propagation: Given $x \in L$ and $x = f(y_1, \ldots, y_n)$, compute $f^{-1}(L)$ as a finite union of products of regular languages and propagate the constraints to each $y_i$ (Hague et al., 27 Aug 2025).
This process is formalized in a sequent calculus with proof rules for forward/backward propagation, equation splitting (for handling word equations), and closure (for detecting contradiction when intersected languages are empty).
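The closure rule amounts to an emptiness test on an intersection of regular languages. A minimal sketch follows, assuming DFAs encoded as `(delta, initial, finals)` tuples with possibly partial transition maps; it explores the product automaton on the fly and is not taken from OSTRICH's code base.

```python
from collections import deque

def intersection_is_empty(d1, d2):
    """True if L(d1) and L(d2) share no word; explores reachable product states only."""
    (delta1, init1, finals1), (delta2, init2, finals2) = d1, d2
    alphabet = {c for _, c in delta1} & {c for _, c in delta2}
    seen, queue = {(init1, init2)}, deque([(init1, init2)])
    while queue:
        p, q = queue.popleft()
        if p in finals1 and q in finals2:
            return False  # a common word exists, so the branch cannot be closed
        for c in alphabet:
            if (p, c) in delta1 and (q, c) in delta2:
                nxt = (delta1[(p, c)], delta2[(q, c)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return True  # contradiction: the two constraints on the variable are incompatible

# Toy languages over {a, b}: parity of the number of 'a', and "ends with b".
PARITY = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
EVEN_A = (PARITY, 0, {0})
ODD_A = (PARITY, 0, {1})
ENDS_B = ({(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})
```

Here `intersection_is_empty(EVEN_A, ODD_A)` holds, so a branch asserting both constraints on one variable would be closed, whereas `EVEN_A` and `ENDS_B` share the word `b`.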
OSTRICH exploits distributivity for deterministic functions: e.g., for a deterministic $f$, $f^{-1}(L_1 \cap L_2) = f^{-1}(L_1) \cap f^{-1}(L_2)$, whereas for a nondeterministic relation only the inclusion $\subseteq$ is guaranteed. This identity is critical for the succinct and efficient handling of straight-line and chain-free fragments, avoiding the combinatorial explosion present for nondeterministic functions (Chen et al., 2018, Hague et al., 27 Aug 2025).
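The distributivity property can be checked on a small finite example. In the sketch below, operations are modeled as finite sets of `(input, output)` pairs over a five-word universe, which is an assumption made purely for illustration; `det` is the reverse function and `nondet` adds one extra output for the input `a`.

```python
def preimage(rel, target):
    """Inputs that have at least one output in `target` under the relation `rel`."""
    return {y for (y, x) in rel if x in target}

universe = ["", "a", "b", "ab", "ba"]
det = {(y, y[::-1]) for y in universe}   # reverse: exactly one output per input
nondet = det | {("a", "b")}              # "a" may now map to "a" or to "b"

A, B = {"a", "ab"}, {"b", "ab"}

# Deterministic case: pre-image distributes over intersection.
det_lhs = preimage(det, A & B)
det_rhs = preimage(det, A) & preimage(det, B)

# Nondeterministic case: only the inclusion lhs <= rhs survives.
nondet_lhs = preimage(nondet, A & B)
nondet_rhs = preimage(nondet, A) & preimage(nondet, B)
```

In the nondeterministic case the input `a` satisfies both constraints separately, through different outputs, but no single output lies in both languages. Intersecting pre-images would therefore over-approximate, and a solver has to track the constraints jointly.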
3. Support for Rich String Operations
The OSTRICH framework natively handles:
- Concatenation and reverse
- Functional transductions (finite-state, deterministic by default; limited support for restricted nondeterminism)
- replaceall with symbolic patterns and replacements (not just constants)
- Regular expression membership and extension to ECMAScript regexes, including capture groups and lookarounds (OSTRICH2)
- Integer-valued string functions (e.g., length, substring) via cost-enriched automata (OSTRICH+)
- Complex sequence operations for arrays/lists of strings via a reduction to string constraints (Hu et al., 31 Aug 2025)
A major technical device is the cost-enriched finite automaton (CEFA)—an FSA with counters/running sums—enabling the unified treatment of string and integer constraints. For each extension (e.g., join, split, sequence write), pre-image computation routines are provided with precise handling of the relevant cost registers (Chen et al., 2020, Hu et al., 31 Aug 2025).
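To give a feel for the idea, here is a toy deterministic automaton whose transitions carry integer update vectors that are summed along a run. The tuple encoding, the register layout, and the name `run_cefa` are assumptions for this sketch; the CEFA of the cited papers is more general, and pre-image computation on it is not shown.

```python
def run_cefa(cefa, word):
    """Return the accumulated register values if `word` is accepted, else None.

    `cefa` is (delta, initial, finals, n_registers), where delta maps
    (state, char) -> (next_state, per-register increments).
    """
    delta, init, finals, n_regs = cefa
    state, regs = init, [0] * n_regs
    for ch in word:
        if (state, ch) not in delta:
            return None  # no transition: the word is rejected
        state, update = delta[(state, ch)]
        regs = [r + u for r, u in zip(regs, update)]
    return tuple(regs) if state in finals else None

# Language a*b*; register 0 tracks the length, register 1 the number of 'a'.
A_STAR_B_STAR = (
    {(0, "a"): (0, (1, 1)), (0, "b"): (1, (1, 0)), (1, "b"): (1, (1, 0))},
    0, {0, 1}, 2,
)

def satisfies(cefa, word, constraint):
    """Combine the regular constraint with an integer constraint on the registers."""
    regs = run_cefa(cefa, word)
    return regs is not None and constraint(*regs)
```

For instance, `run_cefa(A_STAR_B_STAR, "aab")` accumulates `(3, 2)`, and `satisfies(A_STAR_B_STAR, "aabb", lambda n, k: n == 2 * k)` checks membership in a*b* together with "length equals twice the number of a's".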
4. Extended Solving Strategies: Regular Constraint Propagation
Recent developments, including (Hague et al., 27 Aug 2025), establish regular constraint propagation (RCP) as a central and generic proof procedure for string constraints. RCP iteratively applies forward and backward propagation of regular languages:
- Forward: Pushing constraints through functions and assignments.
- Backward: Pulling constraints against functions, splitting as needed over the union of potential pre-images.
This strategy is shown to be both sound and complete for the orderable fragment—strictly subsuming straight-line and chain-free constraints—and significantly improves practical solver performance, especially on hard benchmarks (random Post Correspondence Problem, bioinformatics) where prior approaches based on equation splitting or search-based heuristics were insufficient. Integration of RCP into OSTRICH yields up to 74% improvement on previously unsolved benchmarks (Hague et al., 27 Aug 2025).
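The propagation loop can be illustrated with a deliberately simplified model. In the sketch below, finite sets of bounded-length words stand in for automata, only constraints of the form x = y . z are handled, and backward propagation narrows each argument by projection instead of splitting over the products of the pre-image. All names are hypothetical, and a conflict found here only rules out solutions within the chosen length bound.

```python
import re
from itertools import product

def words(alphabet, max_len, regex):
    """All words up to `max_len` over `alphabet` that fully match `regex`."""
    return {
        "".join(w)
        for n in range(max_len + 1)
        for w in product(alphabet, repeat=n)
        if re.fullmatch(regex, "".join(w))
    }

def propagate(domains, concats):
    """Narrow domains to a fixpoint; return None if some domain becomes empty.

    `concats` lists triples (x, y, z) meaning x = y . z; the three variables
    of each triple are assumed to be distinct.
    """
    dom = {v: set(s) for v, s in domains.items()}
    changed = True
    while changed:
        changed = False
        for x, y, z in concats:
            # Forward: x must lie in the image of the current argument domains.
            new_x = dom[x] & {u + v for u in dom[y] for v in dom[z]}
            # Backward: keep only arguments that can still produce a word of x.
            new_y = {u for u in dom[y] if any(u + v in new_x for v in dom[z])}
            new_z = {v for v in dom[z] if any(u + v in new_x for u in new_y)}
            for var, new in ((x, new_x), (y, new_y), (z, new_z)):
                if new != dom[var]:
                    dom[var], changed = new, True
                if not new:
                    return None  # closure: contradictory constraints
    return dom

domains = {
    "x": words("ab", 4, r"(ab)*"),
    "y": words("ab", 2, r"a+"),
    "z": words("ab", 2, r"b+"),
}
result = propagate(domains, [("x", "y", "z")])
```

With these domains the loop narrows `x` to `ab`, `y` to `a`, and `z` to `b`; replacing the constraint on `x` by `b*` makes the forward step produce an empty set, and the function reports a conflict.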
5. Algorithmic Architecture and Extensibility
OSTRICH is implemented in Scala atop the Princess SMT engine. The architecture is modular and supports user-defined functions through the PreOp interface: to add a function, provide code for concrete evaluation and for pre-image computation on automata or cost-enriched automata.
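The two obligations of an extension (concrete evaluation and pre-image computation) can be sketched as follows. This is a Python analogue written for illustration only: it is not the Scala PreOp trait, the class and method names are hypothetical, and languages are modeled as membership predicates instead of automata.

```python
import re
from abc import ABC, abstractmethod

class PreOpLike(ABC):
    """Hypothetical stand-in for the contract of a user-defined string operation."""

    @abstractmethod
    def eval(self, *args):
        """Concrete evaluation on strings."""

    @abstractmethod
    def preimage(self, result_lang):
        """Return a list of tuples of languages, one tuple per product in the union."""

class ReverseOp(PreOpLike):
    def eval(self, y):
        return y[::-1]

    def preimage(self, result_lang):
        # y is in the pre-image exactly when reverse(y) is in the result language.
        return [(lambda y: result_lang(y[::-1]),)]

def matches(regex):
    """Membership predicate for the language of a Python regular expression."""
    return lambda w: re.fullmatch(regex, w) is not None

op = ReverseOp()
(arg_lang,) = op.preimage(matches(r"ab*"))[0]
```

Here `arg_lang("bba")` holds because reversing `bba` gives `abb`, which matches `ab*`. An automaton-based implementation would return a reversed automaton in place of the lambda.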
OSTRICH2 generalizes the architecture into a portfolio of engines:
- ADT-Str: modeling strings as algebraic data types for axiomatizing complex behaviors such as string-to-int conversions.
- Regular Constraint Propagation (RCP): propagation-based kernel with forward/backward pre-image and intersection rules.
- CE-Str: CEFA-based handling of integer and string interleaving constraints.
This modularity allows OSTRICH2 to be easily extended (e.g., support of ECMAScript regular expressions, user-defined prioritized transducers, and SMT-LIB standard-compliant Unicode) and enables the application of time-sliced or heuristic-guided scheduling among engines for maximum robustness (Hague et al., 17 Jun 2025).
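Time-sliced scheduling among engines can be pictured as a round-robin loop. The sketch below assumes each engine is a Python generator that yields `None` while working and a verdict when finished; this protocol, the function names, and the toy engines are hypothetical and do not describe how OSTRICH2 actually schedules its engines.

```python
def run_portfolio(engines, slice_steps=1):
    """Give each engine `slice_steps` steps per round; return the first verdict found.

    `engines` maps a name to a generator that yields None while working and
    a verdict string (e.g. "sat" or "unsat") when it has an answer.
    """
    active = dict(engines)
    while active:
        for name in list(active):
            for _ in range(slice_steps):
                try:
                    verdict = next(active[name])
                except StopIteration:
                    del active[name]  # engine gave up without an answer
                    break
                if verdict is not None:
                    return name, verdict
    return None, "unknown"

def toy_engine(steps_needed, verdict):
    """Toy engine that works for `steps_needed` steps, then reports `verdict`."""
    for _ in range(steps_needed):
        yield None
    yield verdict

def giving_up_engine(steps):
    """Toy engine that stops after `steps` steps without a verdict."""
    for _ in range(steps):
        yield None

winner = run_portfolio({"slow": toy_engine(10, "sat"), "fast": toy_engine(2, "sat")})
```

The design choice illustrated is that no engine can starve the others: an engine that is slow or incomplete on a given input only costs its own time slices.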
6. Expressiveness, Completeness Guarantees, and Limitations
OSTRICH’s expressiveness encompasses the core theory of strings with concatenation, regular membership predicates, and all regularity-preserving functions, extended by cost-enriched automata for operations mapping strings to integers (length, etc.) (Chen et al., 2020, Chen et al., 2018). The fragment covered by completeness guarantees includes essentially all straight-line constraints found in symbolic execution and program verification, and chain-free/weakly chaining fragments which subsume most practical applications (Abdulla et al., 2023, Hague et al., 17 Jun 2025).
Limitations remain for highly nondeterministic operations, unbounded word equations with overlapping variables, and fragments where the pre-image under required functions is not regular (e.g., arbitrary context-sensitive string transformations, deeply nested regex complements). In such cases, either reduction to restricted fragments is used, or an incomplete/under-approximate analysis is reported (Chen et al., 2018, Chen et al., 2023).
7. Experimental Performance and Comparisons
Empirical evaluations across OSTRICH versions on SMT-LIB, random PCP, bioinformatics, and web application security benchmarks reveal that OSTRICH2—with RCP and CEFA engines, and support for prioritized regex transducers—is competitive with, or superior to, leading solvers such as Z3-Noodler and cvc5 on unsatisfiable instances and randomly generated hard problems (Hague et al., 17 Jun 2025, Hague et al., 27 Aug 2025). Notably:
- OSTRICH consistently solves all classical straight-line and chain-free fragment benchmarks, including full coverage on SLOG, Stranger, and mutation-XSS instances (Chen et al., 2018).
- Portfolio strategies within OSTRICH2, leveraging both RCP and CEFA, solve more than 1,900 out of 2,000 sampled SMT-COMP benchmarks (Hague et al., 17 Jun 2025).
- For extended sequence constraints from real-world JavaScript programs, the sequence extension outperforms array- and string-based competitors on both correctness and runtime (Hu et al., 31 Aug 2025).
A plausible implication is that the modular combination of automata propagation, cost-enriched reasoning, and engine portfolio is a key factor for this performance and flexibility.
In summary, the OSTRICH String Constraint Solving Framework, through iterative propagation of regular and cost-enriched automata constraints, modular architecture, and provable completeness for orderable/straight-line/chain-free fragments, provides a powerful and extensible platform for solving complex string constraints. Its continuous evolution—adding richer function support, broader Unicode and regex parsing, and enhanced propagation algorithms—demonstrates its central role in both the theoretical development and practical deployment of SMT-based string analysis (Chen et al., 2018, Chen et al., 2020, Hague et al., 17 Jun 2025, Hague et al., 27 Aug 2025, Hu et al., 31 Aug 2025).