A Reinforcement Learning based Reset Policy for CDCL SAT Solvers (2404.03753v2)
Abstract: Restart policies are an important technique in modern Conflict-Driven Clause Learning (CDCL) solvers, wherein some parts of the solver state are erased at certain intervals during the run of the solver. In most solvers, variable activities are preserved across restart boundaries, so the solver continues to search parts of the assignment tree that are not far from the one explored immediately before the restart. To enable the solver to search possibly "distant" parts of the assignment tree, we study the effect of resets, a variant of restarts that not only erases the assignment trail but also randomizes the activity scores of the variables of the input formula, thus potentially enabling better global exploration of the search space. In this paper, we model the decision of whether to trigger a reset as a multi-armed bandit (MAB) problem, and propose two reinforcement learning (RL) based adaptive reset policies using the Upper Confidence Bound (UCB) and Thompson sampling algorithms. These two algorithms balance the exploration-exploitation tradeoff by adaptively choosing arms (reset vs. no reset) based on their estimated rewards during the solver's run. We implement our reset policies in four state-of-the-art baseline CDCL solvers and compare the baselines against the reset versions on Satcoin benchmarks and SAT Competition instances. Our results show that the RL-based reset versions outperform the corresponding baseline solvers on both the Satcoin and the SAT Competition instances, suggesting that our RL policy helps to dynamically and profitably adapt the reset frequency to the given input instance. We also introduce the concept of a partial reset, where at least a constant number of variable activities are retained across reset boundaries. Building on previous results, we show that there is an exponential separation between $O(1)$-length and $\Omega(n)$-length partial resets.
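The two-armed bandit formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the `simulate` helper, the arm ordering (`no_reset`, `reset`), and the synthetic Bernoulli rewards in [0, 1] are all assumptions made for illustration, since the abstract does not specify how the reward is computed inside the solver. The sketch shows textbook UCB1 and Beta-Bernoulli Thompson sampling choosing between the two arms.

```python
import math
import random

ARMS = ("no_reset", "reset")  # hypothetical arm labels, index 0 and 1


class UCB1:
    """Textbook UCB1 over a fixed set of arms with rewards in [0, 1]."""

    def __init__(self, n_arms=2):
        self.counts = [0] * n_arms    # times each arm was played
        self.means = [0.0] * n_arms   # running mean reward per arm

    def select(self):
        # Play every arm once before using the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


class ThompsonSampling:
    """Thompson sampling with a Beta(1, 1) prior on each arm's mean reward."""

    def __init__(self, n_arms=2, rng=None):
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms
        self.rng = rng or random.Random()

    def select(self):
        # Sample a plausible mean for each arm and play the best sample.
        samples = [
            self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Assumption: reward lies in [0, 1]; fractional rewards update both counts.
        self.alpha[arm] += reward
        self.beta[arm] += 1.0 - reward


def simulate(policy, reward_means, steps, seed=0):
    """Drive a policy with synthetic Bernoulli rewards (stand-in for a solver)."""
    rng = random.Random(seed)
    picks = [0] * len(reward_means)
    for _ in range(steps):
        arm = policy.select()
        reward = 1.0 if rng.random() < reward_means[arm] else 0.0
        policy.update(arm, reward)
        picks[arm] += 1
    return picks


if __name__ == "__main__":
    # Synthetic setting in which the "reset" arm pays off more often.
    for policy in (UCB1(), ThompsonSampling(rng=random.Random(1))):
        picks = simulate(policy, reward_means=[0.2, 0.8], steps=2000)
        print(type(policy).__name__, dict(zip(ARMS, picks)))
```

In a solver, `select` would be called at each restart boundary to decide whether to also randomize the activity scores, and `update` would be fed whatever reward signal the policy uses. Both the call site and the reward signal are assumptions here, not details given in the abstract.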