
HybridCORELS: Certifiably Optimal Hybrid Models

Updated 23 February 2026
  • HybridCORELS is an algorithmic framework that integrates interpretable rule-list classifiers with complex black-box models to deliver certifiably optimal solutions under explicit transparency constraints.
  • It extends the CORELS branch-and-bound approach to optimize a mixed loss function that balances misclassification, rule-list complexity, and controlled transparency in binary classification tasks.
  • Empirical evaluations on benchmarks such as COMPAS and ACS show that HybridCORELS achieves accuracy comparable or superior to that of pure black-box models while guaranteeing a user-specified transparency level.

HybridCORELS is an algorithmic framework for constructing hybrid models that combine interpretable rule-list classifiers with complex black-box models, delivering certifiably optimal solutions under explicit transparency constraints. Through its extension of the CORELS branch-and-bound approach, HybridCORELS enables precise control over the division of input space between interpretable and black-box components while providing strong theoretical and empirical guarantees (Ferry et al., 2023).

1. Model Composition and Gating

A HybridCORELS classifier addresses binary classification tasks on an input space $\mathcal{X}$ with outputs in $\{0,1\}$. It consists of a triplet $(h_s, h_c, \Omega)$, where $h_s:\mathcal{X}\to\{0,1\}$ is an interpretable rule list (the simple model), $h_c:\mathcal{X}\to\{0,1\}$ is a pre-trained black-box classifier, and $\Omega\subseteq\mathcal{X}$ is the region of input space covered by the rule list. The inference-time prediction is governed by a gating function $g(x) = \mathbf{1}[x\in\Omega]$, which routes $x$ to $h_s$ if $x\in\Omega$ and otherwise to $h_c$. The transparency of the hybrid model, $C_\Omega$, is the probability that $x\in\Omega$ under the data distribution. In practice, the empirical transparency $\widehat{C}_\Omega$ is constrained to satisfy $\widehat{C}_\Omega \geq \tau$ for a user-chosen transparency level $\tau\in[0,1]$.
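The gating rule and the empirical transparency can be illustrated with a minimal Python sketch. The sample encoding (a dict of boolean features), the rule representation, and all function names here are assumptions made for illustration; they are not taken from the HybridCORELS implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Sample = Dict[str, bool]                     # hypothetical encoding: feature name -> value
Rule = Tuple[Callable[[Sample], bool], int]  # (antecedent, predicted label)


def rule_list_predict(rules: List[Rule], x: Sample) -> Optional[int]:
    """Label of the first rule whose antecedent fires, or None if x lies outside Omega."""
    for antecedent, label in rules:
        if antecedent(x):
            return label
    return None


def hybrid_predict(rules: List[Rule], black_box: Callable[[Sample], int], x: Sample) -> int:
    """Gate g(x): use the rule list inside Omega, the black box elsewhere."""
    label = rule_list_predict(rules, x)
    return label if label is not None else black_box(x)


def empirical_transparency(rules: List[Rule], samples: List[Sample]) -> float:
    """Fraction of samples captured by at least one rule."""
    covered = sum(rule_list_predict(rules, x) is not None for x in samples)
    return covered / len(samples)


# Toy setup (hypothetical): one rule "if a then 1", and a black box that always predicts 0.
rules = [(lambda x: x["a"], 1)]
black_box = lambda x: 0
samples = [
    {"a": True, "b": False},
    {"a": False, "b": True},
    {"a": False, "b": False},
    {"a": True, "b": True},
]

transparency = empirical_transparency(rules, samples)
predictions = [hybrid_predict(rules, black_box, x) for x in samples]
```

In this toy setup two of the four samples satisfy the single antecedent, so a constraint of $\tau = 0.5$ would be met exactly, while any larger $\tau$ would require additional rules.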

2. Optimization Objective

The HybridCORELS optimization problem seeks a rule list $r$ satisfying the transparency constraint while minimizing overall misclassification:

$\min_{r\,:\,\widehat{C}_{\Omega_r}\geq\tau} \quad R(r; S\cap\Omega_r) + R(h_c; S\setminus\Omega_r) + \lambda\,\ell(r) + \beta\,(1 - \widehat{C}_{\Omega_r})$

where $R(f;S)$ is the empirical 0-1 loss of $f$ on $S$, $\Omega_r$ is the region covered by $r$, $\ell(r)$ is the length of the rule list, $\lambda$ is the weight of the length penalty, and $\beta$ is a small penalty used only to break ties, since $\widehat{C}_{\Omega_r}\geq\tau$ is already enforced as a hard constraint. The objective thus balances empirical error across both model regions, rule-list complexity, and coverage, with the transparency threshold taking strict priority.
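As a worked illustration, the objective can be evaluated for one candidate rule list from per-sample predictions. The sketch below assumes that both error terms are normalized by the total sample count $|S|$ (the text does not spell out the normalization) and returns infinity when the hard constraint is violated; the function and argument names are hypothetical.

```python
from typing import List


def hybrid_objective(covered: List[bool], rule_pred: List[int], bb_pred: List[int],
                     y: List[int], n_rules: int, lam: float, beta: float,
                     tau: float) -> float:
    """Objective value of one candidate rule list (inf if transparency < tau).

    covered[i]   -- True if sample i falls in the rule-list region Omega_r
    rule_pred[i] -- rule-list prediction (ignored where covered[i] is False)
    bb_pred[i]   -- black-box prediction (ignored where covered[i] is True)
    Assumption: errors are divided by the total number of samples.
    """
    n = len(y)
    transparency = sum(covered) / n
    if transparency < tau:
        return float("inf")  # hard transparency constraint violated
    rule_err = sum(1 for c, p, t in zip(covered, rule_pred, y) if c and p != t)
    bb_err = sum(1 for c, p, t in zip(covered, bb_pred, y) if not c and p != t)
    return (rule_err + bb_err) / n + lam * n_rules + beta * (1.0 - transparency)


# Toy data (hypothetical): 8 samples, the first 6 covered by a 2-rule list.
y         = [1, 0, 1, 0, 1, 0, 1, 0]
covered   = [True, True, True, True, True, True, False, False]
rule_pred = [1, 0, 1, 1, 1, 0, 0, 0]   # one mistake on a covered sample (index 3)
bb_pred   = [0, 0, 0, 0, 0, 0, 1, 1]   # one mistake on an uncovered sample (index 7)

feasible = hybrid_objective(covered, rule_pred, bb_pred, y,
                            n_rules=2, lam=0.01, beta=0.001, tau=0.7)
infeasible = hybrid_objective(covered, rule_pred, bb_pred, y,
                              n_rules=2, lam=0.01, beta=0.001, tau=0.8)
```

Here the transparency is 0.75, so the candidate is feasible for $\tau = 0.7$ with value $2/8 + 0.01\cdot 2 + 0.001\cdot 0.25 = 0.27025$, and infeasible for $\tau = 0.8$.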

HybridCORELS has two major training paradigms:

  • Post–black-box: The black box $h_c$ is trained first and then held fixed while the rule list $r$ is optimized.
  • Pre–black-box: The rule list is fit first, and a black box is then trained to handle the regions the rule list does not cover.

3. Algorithmic Structure: CORELS Extension

HybridCORELS extends the CORELS (Certifiably Optimal RulE ListS) algorithm, originally formulated for pure rule lists, by modifying the objective and update steps. CORELS uses a prefix tree representing partial rule lists, a priority queue sorted by valid lower bounds, and a branch-and-bound scheme to search for provably optimal solutions. HybridCORELS integrates the new hybrid objective and the transparency constraint into this search. The lower bound remains valid because extending a rule list can only reduce the black-box error term (fewer samples are left to the black box) and can eliminate the transparency penalty through additional coverage.

If the queue empties, HybridCORELS guarantees that the found rule list is globally optimal under the given transparency constraint.
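The search can be sketched as a small best-first branch-and-bound in Python. This is a deliberately simplified illustration, not the CORELS or HybridCORELS implementation: antecedents are assumed to be single binary features, each feature is used at most once per rule list, and the lower bound is just the error already committed by the prefix plus its length penalty (the real algorithms use additional bounds and data structures). All names are hypothetical.

```python
import heapq
from itertools import count


def evaluate(prefix, X, y, bb_pred, lam, beta):
    """Return (lower_bound, objective, transparency) for a rule-list prefix.

    prefix is a tuple of (feature_index, label) rules; errors are normalized by len(y).
    """
    n = len(y)
    rule_err = bb_err = covered = 0
    for xi, yi, bi in zip(X, y, bb_pred):
        for feat, label in prefix:
            if xi[feat]:                  # first rule that fires decides
                covered += 1
                rule_err += (label != yi)
                break
        else:                             # no rule fired: the black box decides
            bb_err += (bi != yi)
    transparency = covered / n
    # Errors on captured samples and the current length persist in every extension.
    lower = rule_err / n + lam * len(prefix)
    objective = lower + bb_err / n + beta * (1.0 - transparency)
    return lower, objective, transparency


def search(X, y, bb_pred, n_features, lam, beta, tau):
    """Best-first branch-and-bound over rule lists under transparency >= tau."""
    best_obj, best_prefix = float("inf"), None
    root_lower, root_obj, root_transp = evaluate((), X, y, bb_pred, lam, beta)
    if root_transp >= tau:                # the empty list is feasible only if tau <= 0
        best_obj, best_prefix = root_obj, ()
    tie = count()                         # tie-breaker so the heap never compares prefixes
    queue = [(root_lower, next(tie), ())]
    while queue:
        lower, _, prefix = heapq.heappop(queue)
        if lower >= best_obj:
            continue                      # prune: no extension can beat the incumbent
        used = {feat for feat, _ in prefix}
        for feat in range(n_features):
            if feat in used:
                continue
            for label in (0, 1):
                child = prefix + ((feat, label),)
                c_lower, c_obj, c_transp = evaluate(child, X, y, bb_pred, lam, beta)
                if c_transp >= tau and c_obj < best_obj:
                    best_obj, best_prefix = c_obj, child
                if c_lower < best_obj:
                    heapq.heappush(queue, (c_lower, next(tie), child))
    return best_prefix, best_obj


# Toy data (hypothetical): 6 samples, 2 binary features, fixed black-box predictions.
X = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 0)]
y = [1, 1, 1, 0, 0, 1]
bb_pred = [1, 1, 0, 0, 1, 1]

best_prefix, best_obj = search(X, y, bb_pred, n_features=2,
                               lam=0.01, beta=0.001, tau=0.5)
```

On this toy data the search returns the two-rule list "if feature 0 then 1, else if feature 1 then 0", which makes no errors, covers 5 of 6 samples, and has objective $0.02 + 0.001/6$. Once the queue is empty, every other prefix has either been expanded or pruned by its lower bound, which is what makes the result certifiable within this simplified search space.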

| Component | CORELS | HybridCORELS |
|---|---|---|
| Objective | $R(r;S) + \lambda\,\ell(r)$ | Mixed hybrid loss + complexity + coverage penalty |
| Constraint | None | Hard transparency $\widehat{C}_{\Omega_r} \geq \tau$ |
| Search structure | Prefix tree, priority queue | Prefix tree, priority queue with transparency updates |

4. Theoretical Foundation and Generalization

Under mild finiteness assumptions on both the interpretable and complex hypothesis spaces, the class of HybridCORELS models is PAC-learnable. For finite rule-list ($\mathcal{H}_s$) and black-box ($\mathcal{H}_c$) hypothesis spaces, with an “oracle” hybrid model of true risk zero, the PAC bound for excess risk $\epsilon$ satisfies:

$\Pr\bigl[\exists\,(h_s,h_c,\Omega)\;\text{ERM on data}:\;L_{\mathcal{D}}(h_s,h_c,\Omega)>\epsilon\bigr] \leq \sum_{\Omega\in\mathcal{P}} B(\epsilon, C_\Omega, |\mathcal{H}_c|, |\mathcal{H}_s|, M)$

where $L_{\mathcal{D}}$ denotes the population loss of the hybrid triplet, the bound $B(\epsilon, C, |\mathcal{H}_c|, |\mathcal{H}_s|, M)$ captures contributions from both model parts, and $M$ is the sample size. For a fixed oracle region $\Omega^*$, the union over $\Omega\in\mathcal{P}$ disappears. The minimizer $C^{**}$ of the PAC bound as a function of $C$ quantifies an optimal “sweet-spot” transparency: the empirical risk can be better for a hybrid than for either component alone, reflecting a fundamental hybrid regularization benefit.
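For orientation, the classical realizable-case bound for a single finite hypothesis class $\mathcal{H}$ shows the kind of dependence on class size and sample size that a bound of this type aggregates. This is the standard textbook result, not the HybridCORELS bound $B$ itself; the symbols $\widehat{R}$ and $R$ (empirical and true risk of a single hypothesis) and $\delta$ (failure probability) are notation introduced here for illustration only.

```latex
\Pr\bigl[\exists\, h\in\mathcal{H}:\ \widehat{R}(h)=0 \ \text{and}\ R(h)>\epsilon\bigr]
  \;\leq\; |\mathcal{H}|\,(1-\epsilon)^{M}
  \;\leq\; |\mathcal{H}|\,e^{-\epsilon M},
\qquad\text{hence}\qquad
M \;\geq\; \frac{1}{\epsilon}\Bigl(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\Bigr)
\ \text{suffices for failure probability at most } \delta .
```

The first inequality is a union bound over $\mathcal{H}$ applied to the probability $(1-\epsilon)^{M}$ that a fixed hypothesis with true risk above $\epsilon$ is consistent with $M$ i.i.d. samples; the second uses $1-\epsilon\leq e^{-\epsilon}$.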

5. Empirical Performance

HybridCORELS was evaluated on three benchmarks: COMPAS recidivism prediction ($\sim$6k examples), UCI Adult Income ($\sim$49k), and ACS Employment ($\sim$200k). Black-box baselines included Random Forests, AdaBoost, and Gradient-Boosted Trees (scikit-learn, cross-validated). Comparisons among Hybrid Rule Set (HyRS), Companion Rule List (CRL), and both HybridCORELS variants revealed:

  • Transparency under HybridCORELS increases monotonically with $\tau$, displaying negligible run-to-run variance. In contrast, HyRS/CRL exhibit substantial stochasticity at given transparency levels.
  • On Adult and ACS, transparency of approximately 0.7–0.8 can be achieved with no loss in overall accuracy relative to pure black-box models.
  • On COMPAS, HybridCORELSPre surpasses black-box accuracy at intermediate transparency (0.5–0.6) by up to 2 percentage points. This aligns with the “sweet-spot self-regularization” predicted by theory.
  • HybridCORELS consistently matches or exceeds HyRS/CRL performance at all tested transparencies.

As an example, on ACS with AdaBoost, the pure black-box achieved approximately 74% accuracy; HybridCORELSPre matched this at 75% transparency, while HyRS/CRL trailed by 1–2 percentage points at comparable transparency (Ferry et al., 2023).

6. Strengths, Limitations, and Future Directions

HybridCORELS provides:

  • Certifiably optimal hybridization: Retains the CORELS global optimality guarantee under hard transparency.
  • Black-box agnosticism: Compatible with any pre-trained $h_c$ supporting (optionally) instance weighting.
  • User-controllable transparency: Enforcing $\tau$ as a hard constraint yields empirically realizable, user-specified transparency; the stochastic instability common to HyRS/CRL is eliminated.
  • Empirical regularization sweet-spot: Demonstrated capacity to attain accuracy superior to either component alone at intermediate transparency levels.

Limitations and open challenges include:

  • Scalability to large antecedent pools: The prefix-tree search can become computationally expensive as the rule base grows, or when extreme transparency is required.
  • Dependency on quality of pre-mined antecedents: The algorithm's success hinges on effective antecedent mining, an external, potentially non-trivial pipeline step.
  • Lack of end-to-end training: Current methods require a fixed two-stage pipeline (pre–/post–black-box). Joint optimization of $h_s$ and $h_c$ remains unaddressed.
  • Generality beyond rule lists: While extending to other interpretable classes is in principle straightforward, it requires new analysis and PAC bounds.

Future work includes end-to-end hybrid learning with guarantees, adaptive (data-dependent) transparency, multiclass extensions, and closing the gap between the theoretical PAC bounds and the empirically observed sweet spots (Ferry et al., 2023).

