
FastLEC: Adaptive Hybrid CEC Prover

Updated 14 December 2025
  • FastLEC is an adaptive hybrid prover for combinational equivalence checking that combines SAT, BDD, and exact-simulation to deliver sound and complete proofs.
  • It employs regression-guided scheduling with XGBoost and structure-aware partitioning to optimize resource allocation and handle complex datapath circuits.
  • The framework scales effectively via parallel CPU and GPU acceleration, achieving significant speedups over conventional industrial and academic CEC tools.

FastLEC is an adaptive, hybrid automated prover for combinational equivalence checking (CEC) in datapath circuits, designed to address the limitations inherent to stand-alone SAT, BDD, or exact-simulation (ES) methods. By integrating all three formal reasoning engines within a data-driven, parallelized framework, FastLEC delivers sound and complete equivalence proofs at scale and with high computational efficiency, as demonstrated by substantial improvements over leading industrial and academic CEC tools across hundreds of benchmarks (Zhang et al., 7 Dec 2025).

1. Architectural Overview

FastLEC’s architecture comprises three principal phases—Data Preparation, Offline Model Training, and Online Scheduling & Solving:

  • Data Preparation: Given a candidate miter circuit (typically in AIG/XAG form), FastLEC conducts a rapid sweep to extract each potentially equivalent (PE) output pair, yielding per-PE sub-miters that restrict logic to relevant transitive fan-in cones, collapsed by merging previously proven nodes.
  • Offline Model Training: Each sub-miter is featurized and subjected to controlled-timeout probes with each engine (SAT, BDD, ES). The empirical runtime data serves as ground-truth to train XGBoost regression models (for SAT and BDD), with a simple analytic fit for ES.
  • Online Scheduling & Solving: For unseen sub-miters, FastLEC extracts features, predicts $\hat t_{\rm SAT}$, $\hat t_{\rm BDD}$, and $\hat t_{\rm ES}$, and allocates $n$ CPU cores (plus an optional GPU) to the most promising proof engines. SAT, BDD, and ES run in parallel under coordinated master–worker control. Once the first engine proves (non-)equivalence, the result propagates upstream, enabling a sweeping, divide-and-conquer workflow.
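The online phase can be viewed as a first-verdict-wins race among the enabled engines. Below is a minimal Python sketch under that reading; the predictor is a stub and the engine callables are placeholders, and names such as `solve_sub_miter` are illustrative rather than FastLEC's actual API.

```python
import concurrent.futures as cf

def predict_runtimes(features):
    # Stub for the trained XGBoost/analytic predictors (one-core estimates).
    return {"SAT": features.get("t_sat", 1.0),
            "BDD": features.get("t_bdd", 5.0),
            "ES":  features.get("t_es", 0.5)}

def solve_sub_miter(features, engines, cutoff=3600.0):
    """Run every engine predicted to finish within `cutoff`; first verdict wins."""
    preds = predict_runtimes(features)
    runnable = {name: fn for name, fn in engines.items() if preds[name] <= cutoff}
    with cf.ThreadPoolExecutor(max_workers=max(len(runnable), 1)) as pool:
        futures = {pool.submit(fn): name for name, fn in runnable.items()}
        for fut in cf.as_completed(futures):
            # Return the first finished engine's name and its verdict.
            return futures[fut], fut.result()
    return None, None
```

In the real system the losing engines are cancelled and the proven (non-)equivalence is propagated upstream before the next sub-miter is scheduled.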

2. Regression-Guided Engine Scheduling

Rather than statically dispatching proof engines, FastLEC employs a learned regression-driven heuristic to dynamically allocate resources:

  • Sub-miter Featurization: A 32-dimensional feature vector encodes, for each sub-miter, logic structure (e.g., $\#\mathsf{PI}$, $\#\mathsf{gates}$, $\#\mathsf{XOR}$, $\#\mathsf{AND}$), CNF metrics, statistics over XOR chains, distance metrics ($\mathsf{dis}_I$, $\mathsf{dis}_O$), cost estimates for each proof engine (e.g., $\mathrm{cost}_{\rm ES} = 2^{\#\mathsf{PI}}$), and logic simulation parameters.
  • Modeling and Scheduling: XGBoost models, tuned with 10-fold cross-validation and SMAC3-based hyperparameter search, estimate one-core runtimes for SAT and BDD. ES time is estimated analytically as $\hat t_{\rm ES} = \alpha \cdot \#\mathsf{gates} \cdot 2^{\#\mathsf{PI} - \beta}$ (with $\alpha = 0.0003$, $\beta = 23$). Scheduling feasibility considers both predicted runtime and hardware availability, enabling only engines likely to complete within the cutoff.
  • Resource Split: Letting $\rho = \hat t_{\rm SAT} / (n \cdot \hat t_{\rm ES})$, thread assignment follows a rule-based allocation: SAT or ES receives $n-2$ threads when strongly favored; when the predictions are comparable, resources are split; BDD participates only if it is predicted to be faster than SAT by a defined margin. A GPU-enabled ES is treated as the equivalent of 128 threads and is reserved for cases where its predicted time exceeds 0.1 s. This scheme never alters engine soundness, only the orchestration for speed.
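The analytic ES estimate and the rule-based split above can be illustrated as follows. The branching thresholds (4.0, 0.25, and a 2× BDD margin) are assumptions made for this sketch; the paper states only that the allocation is rule-based with a defined margin.

```python
def estimate_es_time(n_gates, n_pi, alpha=0.0003, beta=23):
    # Analytic one-core ES estimate: alpha * #gates * 2^(#PI - beta).
    return alpha * n_gates * 2.0 ** (n_pi - beta)

def split_threads(t_sat, t_es, t_bdd, n, gpu_free=False):
    # Illustrative reconstruction of the rule-based thread split.
    rho = t_sat / (n * t_es)
    alloc = {"SAT": 0, "ES": 0, "BDD": 0, "GPU_ES": False}
    if gpu_free and t_es > 0.1:
        alloc["GPU_ES"] = True          # GPU ES counts as ~128 CPU threads
    if rho >= 4.0:                      # ES strongly favored
        alloc["ES"], alloc["SAT"] = n - 2, 2
    elif rho <= 0.25:                   # SAT strongly favored
        alloc["SAT"], alloc["ES"] = n - 2, 2
    else:                               # comparable: split the cores
        alloc["SAT"], alloc["ES"] = n // 2, n - n // 2
    if t_bdd < t_sat / 2:               # BDD joins only with a clear margin
        alloc["BDD"] = 1
        alloc["SAT"] = max(alloc["SAT"] - 1, 1)
    return alloc
```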

3. Partitioning and Parallel SAT with Structure-Aware Heuristics

For sub-miters where SAT is enabled, FastLEC deploys a dynamic, structure-aware partitioning and parallel SAT engine guided by datapath topology:

  • Divide-and-Conquer Engine: Each SAT subproblem is formulated as CNF plus a constraint cube, representing a partial input assignment determined by the problem’s path in the partitioning tree.
  • Task Management: A master thread selects the longest-pending subproblem, invokes a Partitioner to choose a structural split variable (driven by an XOR-chain-centric heuristic), dispatches child cubes to available workers, and monitors for SAT/UNSAT outcomes. Global State Checker logic prunes subtrees upon UNSAT or halts computation upon SAT.
  • Variable Scoring Heuristic: The variable scoring strategy prioritizes splitting "late" along long XOR chains. For each variable $v$ on chain $c$, $\text{score}(v) = \frac{|c|^2}{\alpha\,\mathsf{dis}_O[v] + (1-\alpha)\,\mathsf{dis}_I[v] + 1}$ with $\alpha = 0.6$. Nodes that are structural cut points receive a bonus. Local averaging over the scoring graph diffuses scores across neighborhoods, favoring splits that collapse large residual SAT subproblems without destroying balance.
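A minimal sketch of the scoring and local-averaging steps, assuming a per-chain maximum and an illustrative cut-point bonus (the paper does not fix these particular details):

```python
def xor_chain_scores(chains, dis_o, dis_i, cut_points=(), alpha=0.6, bonus=2.0):
    # Score each variable on each XOR chain; keep the best score per variable.
    scores = {}
    for chain in chains:
        for v in chain:
            s = len(chain) ** 2 / (alpha * dis_o[v] + (1 - alpha) * dis_i[v] + 1)
            if v in cut_points:
                s *= bonus              # structural cut points get a boost
            scores[v] = max(scores.get(v, 0.0), s)
    return scores

def diffuse(scores, neighbors, weight=0.5):
    # One local-averaging pass: blend each score with its neighborhood mean.
    out = {}
    for v, s in scores.items():
        nbrs = [scores[u] for u in neighbors.get(v, ()) if u in scores]
        out[v] = s if not nbrs else (1 - weight) * s + weight * sum(nbrs) / len(nbrs)
    return out
```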

4. Memory and Acceleration in Exact Simulation

To address the prohibitive memory and performance costs of exact simulation:

  • Instructionalization: FastLEC compiles logic to a compact, branch-free instruction stream using reference-count–guided register reuse. Each gate output register is freed immediately when its reference count reduces to zero, bounding the maximal number of required registers by the circuit’s minimal cut rather than total gate count.
  • Compression Results: For a 32×32 multiplier miter, the approach compacts 29,553 variables into only 309 simulation instructions (≈1.04%), validating the anticipated memory savings.
  • GPU Acceleration: For sufficiently large ES jobs, the instruction stream is dispatched to the GPU’s constant cache, with bit-vector registers held in shared memory. Each GPU thread processes 32 input vectors; on an NVIDIA RTX 4090, the aggregate throughput of thousands of such threads approaches that of ≈128 CPU cores.
  • Complexity and Scaling: The ES runtime scales as $t_{\rm ES}^{\rm CPU} \propto \#\mathsf{gates} \cdot 2^{\#\mathsf{PI}} / m$ for $m$ threads.
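Reference-count-guided register reuse can be sketched as a single pass over a topologically ordered gate list; the tuple encoding and helper name here are illustrative, not the paper's implementation.

```python
def compile_with_reuse(inputs, gates, outputs):
    # `gates` is topologically ordered (name, op, fan_ins). A register is
    # recycled the moment its last reader has executed, so the peak register
    # count tracks the circuit's cut width rather than total gate count.
    refcount = {}
    for _, _, fins in gates:
        for f in fins:
            refcount[f] = refcount.get(f, 0) + 1
    for o in outputs:
        refcount[o] = refcount.get(o, 0) + 1   # outputs stay live to the end
    reg_of = {pi: i for i, pi in enumerate(inputs)}
    free, next_reg, program = [], len(inputs), []
    for name, op, fins in gates:
        srcs = [reg_of[f] for f in fins]
        for f in set(fins):
            refcount[f] -= fins.count(f)
            if refcount[f] == 0:
                free.append(reg_of.pop(f))     # recycle immediately
        dst = free.pop() if free else next_reg
        if dst == next_reg:
            next_reg += 1
        reg_of[name] = dst
        program.append((op, dst, srcs))
    return program, next_reg                   # next_reg = peak register count
```

For the 32×32 multiplier example above, the analogous compaction reduces 29,553 variables to 309 simulation instructions.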

5. Benchmarking and Comparative Performance

Experimental evaluation leverages 368 single-output datapath miters sampled from an industrial challenge suite, spanning “Easy,” “Medium,” and “Hard” classes (127/120/121 circuits). Solvers evaluated include:

  • ABC cec: Sequential SAT sweeping baseline.
  • HybridCEC: CPU-based hybrid SAT+ES.
  • pHCEC: Parallelized HybridCEC.
  • PDP-CEC: Dynamic graph-partitioning plus SAT.

Each miter is allotted a 1-hour timeout per attempt, with unsolved problems incurring a 2-hour penalty as per the PAR-2 metric (penalized average runtime). Summary of results under 32 CPU cores:

| Tool | Miters Solved | PAR-2 (s) | Relative Speed (PAR-2) |
|---|---|---|---|
| FastLEC | 340/368 | 762.36 | 1.0× |
| PDP-CEC | 266/368 | 2036.18 | 2.67× slower |
| HybridCEC | 251/368 | 2541.58 | 3.33× slower |
| ABC cec | 67/368 | 5892.82 | 7.73× slower |
| FastLEC + GPU (4090) | 368/368 | 138.94 | 0.18× (4.07× faster) |

With a 4090 GPU, FastLEC completes all 368 miters at PAR-2 = 138.94 s, a further 4.07× improvement over the already fastest CPU configuration, and is the only contender to achieve full coverage within the allotted timeout.
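The PAR-2 metric used throughout these results is straightforward to compute:

```python
def par2(runtimes, timeout=3600.0):
    # PAR-2 (penalized average runtime): solved instances contribute their
    # runtime; timeouts (represented as None) contribute twice the time limit.
    return sum(t if t is not None else 2 * timeout for t in runtimes) / len(runtimes)
```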

6. Scalability and Resource Utilization

FastLEC demonstrates strong scaling as a function of both CPU thread count and GPU integration:

  • CPU Scaling: PAR2 for the entire benchmark suite decreases from 2383.59 s (1 thread) to 762.36 s (32 threads) and further to 187.15 s (128 threads). For “Hard” circuits, PAR2 drops from 5208.29 s (1 thread) to 2241.89 s (32 threads) and 537.12 s (128 threads), achieving geometric mean speedups of 1.83× (32 threads) and 2.79× (128 threads) overall, and 11.57× at 128 threads for hard instances.
  • GPU Acceleration: Adding a single commodity GPU (4090) to the 32-thread configuration produces a further 4.07× speedup. The GPU-accelerated ES backend is particularly beneficial for large instances otherwise bottlenecked by simulation.
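The geometric-mean speedups quoted above aggregate per-instance runtime ratios; a minimal sketch of that aggregation, assuming per-instance baseline and improved runtimes are available:

```python
import math

def geomean_speedup(baseline, improved):
    # Geometric mean of per-instance speedup ratios baseline[i] / improved[i].
    ratios = [b / i for b, i in zip(baseline, improved)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```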

7. Technical Significance and Context

FastLEC’s unified framework illustrates that high-throughput, sound, and complete CEC for complex datapath circuits can be achieved only via adaptive, hybrid orchestration of multiple engines. By coupling learned, instance-specific scheduling with topology-guided SAT decomposition and memory-efficient, hardware-accelerated simulation, FastLEC systematically overcomes the exhaustion points of its constituent solvers on large, arithmetic-heavy benchmarks. A plausible implication is that future EDA CEC frameworks may be expected to adopt similar regression-driven, resource-aware, and GPU-accelerated hybridization schemes to remain competitive for datapath verification in industry and research (Zhang et al., 7 Dec 2025).
