Hot-Swapping Clause Assignments
- Hot-swapping clause assignments is a runtime mechanism that dynamically updates CNF clause partitions during Boolean Constraint Propagation in SAT solvers.
- It leverages FPGA and GPU architectures by streaming clauses into local memories to bypass static assignments and maximize parallel processing.
- Practical implementations show improved throughput and solver efficiency, with FPGA designs achieving up to 6× acceleration over software baselines.
Hot-swapping clause assignments is a runtime mechanism for synchronously or asynchronously updating the set of clauses assigned to processing units during Boolean Constraint Propagation (BCP) in satisfiability (SAT) solvers. It is used chiefly in SAT accelerators built on FPGAs or GPUs, where clauses are streamed in and replaced on the fly to relieve memory bottlenecks, keep parallel units busy, and propagate quickly according to the current solver state. In these frameworks, “hot-swapping” covers two things: the loading and eviction of CNF clause partitions within hardware-accelerated solvers, and the selective, event-triggered import of clauses into solver threads in multithreaded CPU-GPU CDCL hybrids (Godindasamy et al., 2023, Prevot, 2020).
1. Fundamental Architecture and Mechanisms
FPGA-enabled SAT solutions implementing hot-swapping allocate an array of Clause Processors (CPs) on the FPGA, each backed by local BRAM capable of storing a clause and an associated variable-assignment vector. The central Control Unit (CU), realized as a finite-state machine, orchestrates transitions among IDLE, LOAD_PARTITION, BCP_EXECUTE, PROPAGATE, and CLEAR states. The complete CNF formula resides in external DRAM, while at each DPLL decision node, the host ARM processor partitions the formula and streams a subset P into the CPs using a DMA engine. The CU “hot-swaps” the contents of the CPs by loading the new partition in bulk, supplanting the previous set of clauses without requiring BCP-specific heuristics for grouping (Godindasamy et al., 2023).
On GPUs, as in GpuShareSat, hot-swapping targets the selective admission of clauses into per-thread watched-literal databases. Multiple CPU CDCL threads export learned clauses and recent frontier assignments to a GPU, which performs bit-parallel trigger tests to identify, for each thread, which foreign clauses would have produced an implication or a conflict had they already been imported. Triggered clauses are promptly “hot-swapped” into the data structures of the relevant threads and are immediately eligible for propagation (Prevot, 2020).
2. Hot-Swapping Workflow and Protocols
FPGA Clause Processor Hot-Swap Sequence
- Software on the host ARM processor issues a “LOAD_PARTITION j” command over AXI.
- The CU enters the LOAD_PARTITION state, enabling clause loading.
- A DMA engine streams clause descriptors and literal lists from DRAM to the BRAMs of the CPs, one clause per CP.
- Once all clauses of the partition reside on-chip (|P| ≤ N, N = number of CPs), the CU signals load completion and transitions to BCP_EXECUTE.
- A decision literal is broadcast to all CPs; each evaluates the clause status in lockstep.
- Upon any unit or conflict detection, the Implication Selector multiplexes control back to the CPU, the CPs are reset, and the next LOAD_PARTITION may be initiated.
This protocol, enforced by the CU state machine, guarantees that BCP never runs on a partition that is still being loaded, which keeps on-chip memory consistent and prevents stale assignments. Buffer management determines whether the next partition is preloaded during execution (double-buffered) or loaded serially (single-buffered), trading BRAM utilization against load latency (Godindasamy et al., 2023).
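The sequence above can be mimicked in software. The following is a minimal sketch, not the authors' hardware design: the class `ControlUnit`, the helper `clause_status`, the exact transition order, and the conflict-before-unit priority in the implication selector are all assumptions made for illustration. Clauses are lists of signed integers (DIMACS style).

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LOAD_PARTITION = auto()
    BCP_EXECUTE = auto()
    PROPAGATE = auto()
    CLEAR = auto()


def clause_status(clause, assignment):
    """Classify one clause under a partial assignment {var: bool}."""
    unassigned = []
    for lit in clause:
        value = assignment.get(abs(lit))
        if value is None:
            unassigned.append(lit)
        elif value == (lit > 0):
            return ("sat", None)          # some literal is true
    if not unassigned:
        return ("conflict", None)         # every literal is false
    if len(unassigned) == 1:
        return ("unit", unassigned[0])    # exactly one literal left
    return ("unresolved", None)


class ControlUnit:
    """Hypothetical software model of the CU and its clause processors."""

    def __init__(self, num_cps):
        self.num_cps = num_cps
        self.cps = [None] * num_cps       # one clause slot per CP
        self.state = State.IDLE
        self.trace = [State.IDLE]

    def _goto(self, state):
        self.state = state
        self.trace.append(state)

    def load_partition(self, partition):
        if self.state is not State.IDLE:
            raise RuntimeError("partition load requested while busy")
        if len(partition) > self.num_cps:
            raise ValueError("partition exceeds the number of CPs")
        self._goto(State.LOAD_PARTITION)
        # Hot swap: the new partition replaces whatever the CPs held.
        self.cps = list(partition) + [None] * (self.num_cps - len(partition))
        self._goto(State.BCP_EXECUTE)

    def execute(self, assignment):
        if self.state is not State.BCP_EXECUTE:
            raise RuntimeError("BCP requested before a partition was loaded")
        results = [clause_status(c, assignment) for c in self.cps if c is not None]
        self._goto(State.PROPAGATE)
        # Assumed selector policy: report a conflict first, else the first unit.
        conflict = next((r for r in results if r[0] == "conflict"), None)
        unit = next((r for r in results if r[0] == "unit"), None)
        outcome = conflict or unit or ("none", None)
        self._goto(State.CLEAR)
        self.cps = [None] * self.num_cps  # reset CPs before the next load
        self._goto(State.IDLE)
        return outcome
```

Because `execute` refuses to run outside `BCP_EXECUTE` and `load_partition` refuses to run outside `IDLE`, the model cannot evaluate a half-loaded partition, which is the property the protocol is meant to provide.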
GPU Clause Import Hot-Swapping
In a multithreaded GPU-accelerated SAT solver, each CPU thread asynchronously exports learned clauses and trailing assignments (A_i) to the GPU. CUDA kernels concurrently evaluate, for each clause and each assignment-thread pair, whether the clause is “triggered,” i.e., whether it would have fired unit propagation or a conflict under that assignment. This is implemented by folding bit-parallel “isFalse” and “isUndef” masks across the literals of each clause.
Triggered clause-thread pairs are reported back to a CPU manager, which immediately invokes “importClause” on the associated thread—without deferral to decision-level 0. The imported clause can thus drive propagations and conflicts at the current thread state (Prevot, 2020).
3. Mathematical Formulation of Partitioning and Triggering
FPGA Clause Partitioning
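Let $C$ denote the full clause set and $V$ the variables. The clauses are partitioned as $C = \bigcup_{j=1}^{m} P_j$ with
$$|P_j| \le N,$$
where $N$ is the number of CPs. The BRAM budget imposes
$$\sum_{c \in P_j} \ell_c \cdot b \le B_{\mathrm{BRAM}},$$
with $\ell_c$ the length of clause $c$, $b$ the bits per literal, and $B_{\mathrm{BRAM}}$ the available BRAM capacity in bits.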
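As an illustration of the two constraints (at most N clauses per partition, and a BRAM budget on the literals stored), here is a minimal greedy partitioner. It is a sketch under stated assumptions: the function name `partition_clauses`, its parameters, and the interpretation of the budget as a total bit count per partition are hypothetical, and the source states that no BCP-specific grouping heuristic is required, so clauses are simply taken in order.

```python
def partition_clauses(clauses, num_cps, bits_per_literal, bram_bits):
    """Split clauses, in order, into partitions that respect both limits.

    Assumed limits: at most num_cps clauses per partition, and the summed
    literal storage of a partition must not exceed bram_bits.
    """
    partitions, current, used = [], [], 0
    for clause in clauses:
        need = len(clause) * bits_per_literal
        if need > bram_bits:
            raise ValueError("a single clause exceeds the BRAM budget")
        # Close the current partition when either limit would be violated.
        if len(current) == num_cps or used + need > bram_bits:
            partitions.append(current)
            current, used = [], 0
        current.append(clause)
        used += need
    if current:
        partitions.append(current)
    return partitions
```

For example, with two CPs, 16 bits per literal, and an 80-bit budget, five short clauses end up in three partitions; tightening the budget to 48 bits yields four.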
GPU Clause Trigger Bitmask
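Given assignments $A_1, \dots, A_k$, for variable $v$:
- $\mathrm{isSet}_v[i] = 1$ iff $v$ is assigned in $A_i$,
- $\mathrm{isTrue}_v[i] = 1$ iff $A_i(v) = \mathrm{true}$.
Define $\mathrm{isFalse}_\ell[i] = 1$ iff literal $\ell$ is false under $A_i$, $\mathrm{isUndef}_\ell[i] = 1$ iff $\ell$ is unassigned under $A_i$. For clause $c$,
$$\mathrm{trigger}_c = \Big(\bigwedge_{\ell \in c} \mathrm{isFalse}_\ell\Big) \lor \bigvee_{\ell \in c} \Big(\mathrm{isUndef}_\ell \land \bigwedge_{\ell' \in c \setminus \{\ell\}} \mathrm{isFalse}_{\ell'}\Big)$$
$\mathrm{trigger}_c[i] = 1$ indicates $c$ is triggered for assignment $A_i$ and should be hot-swapped in.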
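The bit-parallel trigger test can be sketched on the CPU with Python integers standing in for GPU machine words, one bit per assignment. This is an illustrative sketch, not GpuShareSat's kernel: the per-variable `is_set` / `is_true` encoding, the function names, and the single-pass fold are assumptions chosen to match the description of a clause being triggered when all its literals are false (conflict) or all but one are false and the remaining one is unassigned (unit).

```python
def build_masks(assignments, variables):
    """Pack assignments (list of {var: bool}) into per-variable bitmasks."""
    is_set = {v: 0 for v in variables}
    is_true = {v: 0 for v in variables}
    for i, assignment in enumerate(assignments):
        for var, value in assignment.items():
            is_set[var] |= 1 << i
            if value:
                is_true[var] |= 1 << i
    return is_set, is_true


def literal_masks(lit, is_set, is_true, all_ones):
    """Return (isFalse, isUndef) masks for a signed literal."""
    var = abs(lit)
    undef = ~is_set[var] & all_ones
    if lit > 0:
        false = is_set[var] & ~is_true[var] & all_ones
    else:
        false = is_set[var] & is_true[var]
    return false, undef


def trigger_mask(clause, is_set, is_true, num_assignments):
    """Bit i is 1 iff the clause is unit or conflicting under assignment i."""
    all_ones = (1 << num_assignments) - 1
    all_false = all_ones   # every literal seen so far is false
    one_undef = 0          # exactly one literal so far is unassigned, rest false
    for lit in clause:
        false, undef = literal_masks(lit, is_set, is_true, all_ones)
        one_undef = (all_false & undef) | (one_undef & false)
        all_false &= false
    return all_false | one_undef
```

In the usage below, the clause `[1, -2, 3]` is unit under the first assignment, conflicting under the second, satisfied under the third, and untouched by the empty fourth, so only bits 0 and 1 are set.

```python
assignments = [
    {1: False, 2: True},             # unit on literal 3
    {1: False, 2: True, 3: False},   # conflict
    {1: True, 2: True},              # satisfied by literal 1
    {},                              # nothing assigned
]
is_set, is_true = build_masks(assignments, variables=[1, 2, 3])
mask = trigger_mask([1, -2, 3], is_set, is_true, len(assignments))
```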
4. Hardware and Memory Organization
FPGA Block Design
The main hardware components include the on-chip Control Unit (FSM and Clause Loader), a clause processor array (each CP paired with local BRAM), an implication selector (multiplexer), and DMA interfaces to external DRAM. Key design choices include:
- Single-buffer swapping: Partition loaded into N CP BRAMs, all prior memory cleared on swap.
- Proposed double-buffer: Two BRAM banks enable overlapping DMA load with BCP execution, pointer-flip at swap boundaries (at the cost of doubled BRAM).
Partition size is determined by the number of CPs and available BRAM. Formula size is bounded only by external DRAM (Godindasamy et al., 2023).
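The double-buffer option can be pictured as two banks and an index that flips at each swap. The sketch below is a software analogy only; the class `DoubleBufferedCPs` and its method names are hypothetical, and real hardware would overlap the DMA transfer with BCP rather than copy a list.

```python
class DoubleBufferedCPs:
    """Two clause banks: one active for BCP, one idle for preloading."""

    def __init__(self, num_cps):
        self.num_cps = num_cps
        self.banks = [[], []]   # doubled storage is the cost of this scheme
        self.active = 0         # index of the bank BCP currently reads

    @property
    def active_partition(self):
        return self.banks[self.active]

    def preload(self, partition):
        """Fill the idle bank; the active bank is left untouched."""
        if len(partition) > self.num_cps:
            raise ValueError("partition exceeds the number of CPs")
        self.banks[1 - self.active] = list(partition)

    def swap(self):
        """Pointer flip at a swap boundary; the old bank is cleared."""
        old = self.active
        self.active = 1 - old
        self.banks[old] = []
        return self.active_partition
```

Preloading never disturbs `active_partition`, so execution on the current partition can continue until `swap()` is called.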
GPU Memory Layout
GpuShareSat encodes clauses in a structure-of-arrays layout: a flat array of literal indices for all clauses, plus per-clause head pointers for coalesced access. Assignment masks for each variable are stored in separate arrays allowing high-throughput bitwise operations. Data movement between CPU and GPU is managed by lock-free ring buffers and double-buffered global memory (Prevot, 2020).
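A flat literal array with per-clause head pointers can be sketched as follows. This is an illustrative layout, not GpuShareSat's actual data structure: the function names are hypothetical, and storing one extra trailing offset (so that clause `i` spans `heads[i]` to `heads[i + 1]`) is an assumption borrowed from the usual compressed-row convention.

```python
def pack_clauses(clauses):
    """Flatten clauses into (literals, heads) arrays."""
    literals, heads = [], [0]
    for clause in clauses:
        literals.extend(clause)
        heads.append(len(literals))   # start offset of the next clause
    return literals, heads


def clause_at(literals, heads, index):
    """Recover clause number `index` from the flat arrays."""
    return literals[heads[index]:heads[index + 1]]
```

Keeping all literals contiguous means neighbouring GPU threads that walk neighbouring clauses read neighbouring memory, which is what makes coalesced access possible.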
5. Performance, Metrics, and Comparative Analysis
Key metrics for hot-swapping clause assignments are summarized below.
| Platform | Theoretical Peak | Measured Throughput | Improvement vs. Baseline | Formula Size Limiting Factor |
|---|---|---|---|---|
| FPGA (Zynq) | 175 MBCP/s | 313 KBCP/s | up to 6× vs. software | External memory (not on-chip BRAM) |
| Davis23 | 40 MBCP/s | n/a | n/a | On-chip memory |
| Thong19 | 102 MBCP/s | n/a | n/a | On-chip memory |
| GpuShareSat | n/a | 594 M clause tests/s | 272 vs. 250 instances solved (22 more) | GPU memory (5.05M average resident) |
For the FPGA accelerator, resource utilization on a Xilinx Zynq reached 91.3% of LUTs and 38.4% of FFs for the full design. Empirical evaluation indicated 1.7× and 1.1× speedups over prior state-of-the-art designs and up to 6× acceleration over a software baseline, for example on an instance with 63 variables and 22 400 clauses (Godindasamy et al., 2023).
In GpuShareSat, effective clause-test throughput was 594 million tests/sec, with <1% synchronization overhead from buffer handoff. Only about 2.13 clauses per assignment were imported, illustrating the selectivity of hot-swapping. The solver raised the number of problem instances solved from 250 to 272 against a 32-thread Glucose-Syrup baseline (Prevot, 2020).
6. Design Trade-offs and Scalability
The primary benefit of hot-swapping clause assignments is exposing fine-grained parallelism across up to N simultaneously loaded clauses, unconstrained by static pre-grouping or fixed clause allocation. Arbitrary clause-variable overlap within a partition is permitted, maximizing processor utilization.
However, run-time costs of streaming new partitions (in FPGAs) or repeatedly computing clause × assignment triggers (in GPUs) must be carefully controlled. With many small partitions—and thus frequent swaps—DMA overhead for FPGAs or buffer overflow for GPUs can offset parallel gains. For FPGA scalability, doubling N increases BRAM requirements linearly; future optimizations include intelligent partitioning to minimize variable-dispersion and overlapping load/execute phases via double-buffering (Godindasamy et al., 2023). In GPU-based schemes, minimizing superfluous clause exchange and judicious buffer management remain central.
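The effect of swap frequency on the FPGA side can be explored with a back-of-envelope timing model. The model below is not taken from the cited work; it is a simplifying assumption that each partition has a fixed load time and a fixed BCP time, that single-buffered operation serializes them, and that double-buffered operation hides each load behind the previous partition's execution.

```python
def serial_time(loads, execs):
    """Single-buffered: every load and every execution runs back to back."""
    return sum(loads) + sum(execs)


def double_buffered_time(loads, execs):
    """Double-buffered: load of partition i+1 overlaps execution of i."""
    total = loads[0]                      # the first load cannot be hidden
    for i, exec_time in enumerate(execs):
        next_load = loads[i + 1] if i + 1 < len(loads) else 0
        total += max(exec_time, next_load)
    return total
```

Under this model, three partitions with load time 4 and BCP time 1 take 15 units serially and 13 units double-buffered: when loading dominates, overlap recovers little, which is consistent with the concern that many small partitions let DMA overhead offset the parallel gain.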
A plausible implication is that demand-paged, runtime-driven clause management of the kind hot-swapping enables is best suited to environments with abundant external (CPU or GPU) memory where rapid, targeted propagation is essential.
7. Applications and Extensions
Hot-swapping clause assignments has been primarily exploited in hardware-accelerated SAT solvers for embedded and high-performance settings. In the FPGA paradigm, it lets formula size scale beyond on-chip memory, since capacity is bounded only by external DRAM. GPU-accelerated hot-swapping enables selective, propagation-driven cross-thread clause sharing in large-scale CDCL solvers, improving instance coverage and search focus without over-sharing. Extensions such as staged memory for “hot” variables or aggregated assignment blocks are natural additions that may further improve throughput. The techniques remain compatible with other leading CDCL cores (Godindasamy et al., 2023, Prevot, 2020).