SATLUTION: Autonomous Evolution of SAT Solvers

Updated 11 September 2025
  • SATLUTION is an autonomous code evolution framework that leverages synergistic LLM agents for planning and coding to evolve repository-scale SAT solvers.
  • It employs a closed-loop generate-compile-test cycle with distributed CPU cluster feedback, ensuring rigorous validation and superior performance against SAT benchmarks.
  • Dynamic rulebase management and agentic planning drive continuous improvement and robust error prevention, setting new standards over traditional human-designed solvers.

SATLUTION is an autonomous code evolution framework that orchestrates LLM agents to evolve repository-scale SAT solvers for the canonical Boolean Satisfiability (SAT) problem. Distinct from kernel-focused predecessors, SATLUTION continuously self-improves solver algorithms and its own evolution rulebase through agentic planning, coding, verification, and distributed feedback, yielding solver variants that have surpassed human-designed SAT Competition winners on both contemporary and prior benchmarks (Yu et al., 9 Sep 2025).

1. Framework Architecture and Agentic Paradigm

SATLUTION introduces repository-scale algorithmic evolution driven by two synergistic LLM agents:

  • Planning Agent: Strategically analyzes complete solver repositories, performance metrics, and historical evolution to propose actionable directions across hundreds of files (e.g., heuristic subsystem refactoring, integration of competing algorithmic innovations from seed solvers).
  • Coding Agent: Executes repository-wide code modifications, handling implementation details in C/C++, build system management, documentation (HYPOTHESIS.md, CHANGELOG.md, RESULTS.md), and compliance with static directory structures.

The agentic framework operates in a closed-loop sequence:

$$\text{Iteration: } \texttt{solver}_{n+1} = \texttt{Evolution}(\texttt{solver}_n, \texttt{Feedback}_n, \texttt{RuleBase}_n)$$

This evolutionary step is governed by static initialization rule templates and dynamically self-evolved rule patches, all validated through modular compliance scripts prior to acceptance into the candidate pool.
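The update rule above can be read as a pure function from the current solver, its feedback, and the rulebase to the next solver. The following is a minimal sketch of that reading, not the framework's actual code: the `Feedback` and `RuleBase` fields and the `propose` and `complies` callbacks are hypothetical stand-ins for the planning/coding agents and the compliance scripts, and the numbers in the toy usage are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Feedback:
    solved: int   # instances solved in the last evaluation round
    par2: float   # PAR-2 score of that round (lower is better)


@dataclass
class RuleBase:
    static_rules: List[str]                                   # fixed at initialization
    dynamic_patches: List[str] = field(default_factory=list)  # appended over time


def evolve(
    solver: str,
    feedback: Feedback,
    rules: RuleBase,
    propose: Callable[[str, Feedback, RuleBase], str],
    complies: Callable[[str, RuleBase], bool],
) -> str:
    """One step: solver_{n+1} = Evolution(solver_n, Feedback_n, RuleBase_n)."""
    candidate = propose(solver, feedback, rules)
    # A non-compliant candidate never enters the pool, so solver_n is kept.
    return candidate if complies(candidate, rules) else solver


# Toy usage: the "solver" is just a version label, and the values are illustrative.
rules = RuleBase(static_rules=["emit DRAT proofs for UNSAT answers"])
feedback = Feedback(solved=300, par2=2500.0)
accepted = evolve("v0", feedback, rules, lambda s, f, r: s + "+patch", lambda c, r: True)
rejected = evolve("v0", feedback, rules, lambda s, f, r: s + "+patch", lambda c, r: False)
print(accepted, rejected)  # v0+patch v0
```

Keeping the compliance check inside the step makes rejection a no-op on the candidate pool, which matches the description that only validated candidates are accepted.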

2. Technical Operation: Evolution, Verification, and Feedback

The SATLUTION methodology integrates these processes:

  • Generate → Compile → Test → Analyze → Evolve Loop: A continuous cycle where agents propose code changes, rebuild the solver, test for low-level correctness (Stage 1: trivial CNFs, catching segmentation faults, runtime errors), then carry out detailed evaluation (Stage 2: benchmark suite, correctness validation, DRAT proof verification for UNSAT instances).
  • Distributed Runtime Feedback: An 800-node CPU cluster executes solver evaluation over the full SAT Competition 2024 benchmark set. Performance metrics, including solved instance counts, PAR-2 scores (average runtime in which each unsolved instance is charged twice the 5000 s time limit), and time-distributed statistics, are provided within each round to inform the next iteration.
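As a worked illustration of the PAR-2 metric mentioned above, the sketch below assumes the usual convention: a solved instance contributes its runtime, and an unsolved instance contributes twice the time limit. The function name `par2`, the use of `None` for an unsolved instance, and the sample runtimes are illustrative assumptions.

```python
from typing import Optional, Sequence


def par2(runtimes: Sequence[Optional[float]], timeout: float = 5000.0) -> float:
    """Average runtime where each unsolved instance costs 2 * timeout.

    An entry of None, or a runtime above the timeout, counts as unsolved.
    """
    if not runtimes:
        raise ValueError("need at least one instance")
    total = 0.0
    for t in runtimes:
        solved = t is not None and t <= timeout
        total += t if solved else 2.0 * timeout
    return total / len(runtimes)


# Two solved instances (100 s and 400 s) and one timeout:
# (100 + 400 + 2 * 5000) / 3 = 3500
print(par2([100.0, 400.0, None]))  # 3500.0
```

Because a single unsolved instance adds 10000 s to the sum, solving one more instance usually moves the score far more than shaving seconds off instances that are already solved.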

The robust verification pipeline ensures that only implementations passing all mandated correctness and compliance checks persist through evolutionary cycles, thereby preventing error propagation and ensuring that runtime feedback reflects meaningful fitness improvements.
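The two-stage check described above can be sketched as a simple gate. This is an assumption-laden illustration rather than the actual pipeline: `RunResult`, `passes_gate`, and the `run` callback are hypothetical names, and proof and model checking are represented only by boolean flags that an external checker would supply.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class RunResult:
    crashed: bool            # segmentation fault or other runtime error
    answer: Optional[str]    # "SAT", "UNSAT", or None on timeout
    model_ok: bool = True    # reported model satisfies the formula (SAT answers)
    proof_ok: bool = True    # proof accepted by a checker (UNSAT answers)


def passes_gate(
    run: Callable[[str], RunResult],
    smoke_cnfs: Iterable[str],
    benchmark_cnfs: Iterable[str],
) -> bool:
    # Stage 1: trivial CNFs must run to completion without crashing.
    for cnf in smoke_cnfs:
        result = run(cnf)
        if result.crashed or result.answer is None:
            return False
    # Stage 2: on the benchmark suite, timeouts are tolerated, wrong answers are not.
    for cnf in benchmark_cnfs:
        result = run(cnf)
        if result.crashed:
            return False
        if result.answer == "SAT" and not result.model_ok:
            return False
        if result.answer == "UNSAT" and not result.proof_ok:
            return False
    return True


# Toy usage with canned results instead of a real solver binary.
results = {
    "tiny.cnf": RunResult(crashed=False, answer="SAT"),
    "hard_unsat.cnf": RunResult(crashed=False, answer="UNSAT", proof_ok=False),
    "timeout.cnf": RunResult(crashed=False, answer=None),
}
run = results.__getitem__
print(passes_gate(run, ["tiny.cnf"], ["timeout.cnf"]))     # True
print(passes_gate(run, ["tiny.cnf"], ["hard_unsat.cnf"]))  # False
```

Running the cheap stage first means a build that crashes on trivial input is rejected before any cluster time is spent on the full benchmark suite.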

3. Static and Dynamic Rulebase Management

SATLUTION's rule management system features:

  • Static Initialization Rules: These encompass domain constraints such as mandatory SAT/UNSAT correctness, DRAT proof output, prescribed repository layout, and obligations for compilation and result documentation.
  • Active Compliance Scripts: Automated scripts parse code changes and documentation to ensure adherence to the rulebase before further verification.
  • Dynamic Self-Evolving Rules: Following each evolution cycle, a post-mortem analyzer inspects failures (e.g., unsafe memory operations, documentation lapses) and appends corrective patches to the rulebase. For example, when a candidate introduces a forbidden C idiom, the rulebase is augmented so that future occurrences are blocked immediately.

This dual evolution of both solver code and rulebase preserves correctness and drives monotonic improvement across successive generations.
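One way to picture the static-plus-dynamic rulebase is as a list of forbidden patterns that grows after each post-mortem. The sketch below is purely illustrative: the class name `PatternRuleBase`, the regex-based check, and the specific patterns (`gets`, `strcpy`) are assumptions chosen for the example, not rules reported for SATLUTION.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class PatternRuleBase:
    static_patterns: List[str]                                  # fixed at initialization
    dynamic_patterns: List[str] = field(default_factory=list)   # learned from failures

    def violations(self, source: str) -> List[str]:
        """Return every forbidden pattern that occurs in the given source text."""
        patterns = self.static_patterns + self.dynamic_patterns
        return [p for p in patterns if re.search(p, source)]

    def learn_from_failure(self, pattern: str) -> None:
        """Append a corrective rule once; repeated failures add nothing new."""
        if pattern not in self.static_patterns and pattern not in self.dynamic_patterns:
            self.dynamic_patterns.append(pattern)


# Toy usage: a patch passes at first, then is blocked after a post-mortem adds a rule.
rulebase = PatternRuleBase(static_patterns=[r"\bgets\s*\("])
patch = "strcpy(buf, name);"
before = rulebase.violations(patch)
rulebase.learn_from_failure(r"\bstrcpy\s*\(")
after = rulebase.violations(patch)
print(before, after)
```

A real compliance script would likely inspect the parsed code and the required documentation files rather than raw text, so the regex match here stands in for a richer check.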

4. Performance Outcomes and Benchmark Evaluation

In competitive testing on SAT Competition 2024/2025 benchmarks:

  • SATLUTION-evolved solvers exhibited superior PAR-2 scores and solved more instances than the best human-designed contest winners.
  • Example figures: SATLUTION variants solved 347, 345, and 344 instances vs. human-designed winners at 334 and 331.
  • Cactus plots and runtime breakdowns indicate that SATLUTION solves both SAT and UNSAT instances with better scaling for medium-to-hard cases (1000–4000 s).
  • On the prior-year (2024) benchmarks, SATLUTION variants also achieved lower PAR-2 scores, which suggests that the gains generalize rather than merely overfit the current contest set.

The evolution trajectory demonstrates rapid improvements within initial iterations, then gradual refinement, with performance peaking near iteration 70.

5. Innovations in Autonomous Code Evolution for SAT

SATLUTION marks a progression from previous agentic frameworks focused on small kernels (e.g., AlphaEvolve). It handles large-scale repositories, hundreds of source files, cross-module dependencies, and stringent correctness requirements (including proof generation and verification) (Yu et al., 9 Sep 2025). The orchestration of planning and coding agents with automatic feedback enables:

  • Discovery and exploitation of nontrivial algorithmic improvements.
  • Flexible adaptation to new contest benchmarks and solver architectures.
  • Prevention of error regression and codebase degeneration via dynamic rules and compliance checks.

A plausible implication is that agent-driven, repository-scale evolution can transcend human expertise in practical SAT solver design and possibly extend to other NP-complete and optimization domains.

6. Comparative Perspective and Framework Interactions

Relative to other frameworks:

  • Modular enhancement techniques from resolution-derived modern SAT solver theory (Dershowitz et al., 2011)—including parent clause maintenance, Boolean Constraint Propagation, Non-Chronological Backtracking, and 1UIP-based Conflict-Directed Backjumping—may be integrated as distinct modules, allowing targeted evolution rather than monolithic overhaul.
  • SATLUTION could further incorporate advancements from automated, LLM-driven code generation systems such as SolSearch (Sheng et al., 20 Feb 2025) by orchestrating curriculum-based iterative code enhancement at the repository level.
  • Advanced reasoning mechanisms (e.g., explicit SAT-based LTL reasoning and SMT-based extensions (Li et al., 2015)) may be modularized into SATLUTION evolution candidates, addressing theory-based and temporal SAT extensions.

7. Implications, Limitations, and Future Research

SATLUTION’s outcomes show tangible progress towards automated algorithmic innovation: the evolved solvers outperform those designed by human experts and remain robust on unseen benchmarks. Prospective extensions include:

  • Automating evolution of correctness verifiers in parallel with solvers.
  • Application to domains such as electronic design automation (EDA).
  • Token cost and runtime feedback optimization to accelerate cycles.
  • Support for multi-threaded and parallel architectures to address scalability.

Challenges include managing repository-scale complexity, avoiding escalation in token and verification costs, and ensuring that agentic policies remain up-to-date and aligned with evolving domain best practices. Future research may focus on refining distributed feedback mechanisms, expanding to broader NP-hard problem domains, and enhancing the agentic orchestration for general-purpose algorithmic evolution.
