Conditional PED-ANOVA in Hierarchical HPO
- condPED-ANOVA is a statistical framework that estimates hyperparameter importance by isolating within-regime effects in conditional and hierarchical search spaces.
- It employs a closed-form estimator using Pearson divergence and kernel density estimation to achieve scale invariance and computational efficiency.
- The method effectively resolves gating pathologies by distinguishing active hyperparameters from inactive ones in dynamic optimization settings.
Conditional PED-ANOVA (condPED-ANOVA) is a statistical framework for efficiently estimating hyperparameter importance (HPI) within hierarchical and dynamic search spaces, i.e., domains where the presence or configuration of a hyperparameter may depend on the values of other hyperparameters. It generalizes the original PED-ANOVA, which quantifies HPI using the variance of top-performing configurations within a fixed search space, to conditional or regime-dependent structures, and thereby avoids pathologies that affect standard approaches in such settings. condPED-ANOVA computes conditional HPI in closed form and in a scale-invariant way via the Pearson divergence between one-dimensional marginals, estimated with kernel density estimation (KDE), which makes it suitable for hyperparameter optimization (HPO) workflows (Watanabe et al., 2023, Baba et al., 28 Jan 2026).
1. Background: From f-ANOVA to PED-ANOVA and the Need for Conditioning
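Conventional functional ANOVA (f-ANOVA) decomposes the variance of an objective function into main effects and interactions across coordinates, yielding a unique orthogonal decomposition: $f(x) = f_0 + \sum_d f_d(x_d) + \sum_{d < d'} f_{d,d'}(x_d, x_{d'}) + \cdots$, with main effect variances $v_d = \mathrm{Var}[f_d(x_d)]$. The standard measure of variable importance is $v_d / \mathrm{Var}[f(x)]$.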
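While f-ANOVA operates globally, PED-ANOVA (Watanabe et al., 2023) addresses importance within an arbitrary subspace, such as the top-$\gamma$ quantile region $\mathcal{X}_\gamma$. The main-effect local variance is computed efficiently by relating it to the Pearson divergence between the 1D marginals of a top-performing region (top-$\gamma'$, with $\gamma' < \gamma$) and the broader region, using kernel density estimates instead of global integrals: $v_\gamma^{(d)} = (\gamma'/\gamma)^2 \, D_{\mathrm{PE}}\bigl(p_d(\cdot \mid \mathcal{D}_{\gamma'}) \,\|\, p_d(\cdot \mid \mathcal{D}_\gamma)\bigr)$, where $\mathcal{D}_\gamma$ denotes the top-$\gamma$ samples. This estimator is efficient and scale-invariant, but it assumes a fixed, unconditional hyperparameter space.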
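The link between the local main-effect variance and the Pearson divergence can be made explicit in a few lines. The sketch below assumes the objective is the binary indicator of membership in the top-$\gamma'$ region, that this region is nested inside the top-$\gamma$ region, and that expectations are taken under the top-$\gamma$ distribution. The symbols $f_d$, $p_{\gamma}$, and $p_{\gamma'}$ are notation chosen here for illustration, and $D_{\mathrm{PE}}$ is written without a factor of $1/2$, which is a convention assumed here.

```latex
% Marginal mean of the top-gamma' indicator along coordinate d (Bayes' rule)
f_d(x_d) = \Pr(x \in \mathcal{X}_{\gamma'} \mid x_d,\ x \in \mathcal{X}_\gamma)
         = \frac{\gamma'}{\gamma}\,\frac{p_{\gamma'}(x_d)}{p_{\gamma}(x_d)}

% The mean of f_d under p_gamma is gamma'/gamma, so its variance is
\mathrm{Var}_{x_d \sim p_\gamma}\!\left[f_d(x_d)\right]
  = \left(\frac{\gamma'}{\gamma}\right)^{2}
    \int \left(\frac{p_{\gamma'}(x_d)}{p_{\gamma}(x_d)} - 1\right)^{2} p_{\gamma}(x_d)\,\mathrm{d}x_d
  = \left(\frac{\gamma'}{\gamma}\right)^{2}
    D_{\mathrm{PE}}\!\left(p_{\gamma'} \,\|\, p_{\gamma}\right)
```

Because only a density ratio enters, rescaling the objective leaves the quantity unchanged, which is one way to read the scale-invariance claim.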
In HPO settings where the space is hierarchical or certain hyperparameters are present only in specific "regimes" (e.g., tree-structured search or CASH problems), standard or even local methods fail to separate the effect of conditional activation or dynamic domains from genuine parameter importance, motivating the development of condPED-ANOVA (Baba et al., 28 Jan 2026).
2. Representing Conditional Search Spaces and Regime Structure
To formalize conditionality, condPED-ANOVA introduces regime assignments:
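- Each coordinate $x_d$ can be in one of $R_d$ possible regimes (branches), defined by a regime assignment function $r_d(x) \in \{1, \dots, R_d\}$.
- Each regime $r$ for $x_d$ has its own domain $\mathcal{X}_d^{(r)}$ (a singleton holding a dummy value $\bot$ for inactive coordinates).
- The extended domain for $x_d$ is the disjoint union $\tilde{\mathcal{X}}_d = \bigsqcup_{r=1}^{R_d} \mathcal{X}_d^{(r)}$.
- The law over $\tilde{\mathcal{X}}_d$ is induced by the empirical distribution of the top-$\gamma$ samples.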
This approach ensures that, for each hyperparameter, the conditional impact (only when active in its regime) is separated from effects that originate purely from the structure of the search space.
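A regime assignment can be pictured as a function from a configuration to a label, with each coordinate then living in an extended domain of (regime, value) pairs. The following is a minimal sketch under assumed names: the toy search space, `regime_of`, `extended_value`, and the `INACTIVE` placeholder are all hypothetical and not taken from the source.

```python
from typing import Tuple

# Hypothetical toy search space (not from the source):
#   optimizer in {"sgd", "adam"}
#   momentum  is active only when optimizer == "sgd"
#   lr        is always active, but its domain depends on the optimizer
INACTIVE = None  # dummy value standing in for an inactive coordinate


def regime_of(config: dict, name: str) -> str:
    """Regime assignment: map a configuration to the regime of one coordinate."""
    if name == "momentum":
        return "active" if config["optimizer"] == "sgd" else "inactive"
    if name == "lr":
        return f"lr|{config['optimizer']}"  # one regime per optimizer-specific domain
    return "root"  # unconditional coordinates have a single regime


def extended_value(config: dict, name: str) -> Tuple[str, object]:
    """Point in the extended domain of a coordinate: a (regime, value) pair."""
    regime = regime_of(config, name)
    value = INACTIVE if regime == "inactive" else config.get(name, INACTIVE)
    return regime, value


configs = [
    {"optimizer": "sgd", "lr": 0.1, "momentum": 0.9},
    {"optimizer": "adam", "lr": 0.001},
]
pairs = [extended_value(c, "momentum") for c in configs]
```

Keeping the regime label next to the value is what later allows densities to be estimated separately per regime rather than over a single merged domain.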
3. Conditional Local HPI: Within-Regime Variance Principle
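Applying the standard local HPI (the variance of the marginal mean in the top-$\gamma$ set) to the extended domain yields two contributions via the law of total variance: $\mathrm{Var}[f_d(z_d)] = \mathbb{E}_{r}\bigl[\mathrm{Var}(f_d(z_d) \mid r)\bigr] + \mathrm{Var}_{r}\bigl(\mathbb{E}[f_d(z_d) \mid r]\bigr)$, where $f_d$ is the marginal mean, $z_d$ is the value of coordinate $d$ in its extended domain, and $r$ is its regime. The second term ("inter-regime") measures variance arising from regime selection, i.e., whether the hyperparameter is present or not, and so confounds HPI with upstream gating.
To ensure HPI is not spuriously attributed to variables that merely gate the presence of other parameters, condPED-ANOVA defines the conditional local HPI as only the within-regime variance, $\mathbb{E}_{r}\bigl[\mathrm{Var}(f_d(z_d) \mid r)\bigr]$, with $f_d$ the marginal mean of coordinate $d$ and $r$ its regime. This quantity vanishes when the coordinate is inactive (its regime contains only the dummy value), and it zeroes out effects stemming purely from the regime selection logic itself (Baba et al., 28 Jan 2026).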
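The split between within-regime and inter-regime variance can be checked numerically. The sketch below uses made-up marginal-mean values and a hypothetical helper `split_variance`; it only illustrates the law of total variance and is not the estimator itself.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def split_variance(regimes, values):
    """Return (within, between) such that within + between equals the total variance."""
    groups = defaultdict(list)
    for r, v in zip(regimes, values):
        groups[r].append(v)
    n = len(values)
    grand_mean = fmean(values)
    # Within-regime term: regime-frequency-weighted average of per-regime variances.
    within = sum(len(g) / n * pvariance(g) for g in groups.values())
    # Inter-regime term: variance of the per-regime means around the grand mean.
    between = sum(len(g) / n * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    return within, between


# Toy values: the "inactive" regime is constant by construction, yet it shifts
# the overall mean and therefore inflates the total variance.
regimes = ["active"] * 4 + ["inactive"] * 4
values = [0.2, 0.4, 0.6, 0.8, 0.1, 0.1, 0.1, 0.1]
within, between = split_variance(regimes, values)
```

In this toy case most of the total variance sits in the inter-regime term, which is the part the conditional definition discards.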
4. Closed-Form Estimator and Algorithmic Implementation
Analogous to PED-ANOVA, the conditional local HPI admits a closed-form estimator via Pearson divergence, evaluated for each regime. Define for regime $r$ of coordinate $d$:
- $\alpha_{\gamma'}^{(r)}$ = fraction of top-$\gamma'$ samples in regime $r$
- $\alpha_{\gamma}^{(r)}$ = fraction of top-$\gamma$ samples in regime $r$
- $p_{\gamma'}^{(r)}$, $p_{\gamma}^{(r)}$ = 1D PDFs of the coordinate in regime $r$ among the top-$\gamma'$ and top-$\gamma$ samples (estimated by KDE)
The estimator combines these quantities regime by regime: only the within-regime Pearson divergences are summed, and for inactive regimes the contribution is zero, since both PDFs are degenerate at the dummy value. The final normalized conditional local HPI is obtained by dividing by the total over all coordinates.
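One way to see which weights multiply each divergence is to repeat the PED-ANOVA argument on the extended domain. The derivation below assumes the same binary-indicator construction as PED-ANOVA, writes $\alpha_{\gamma}^{(r)}$ and $\alpha_{\gamma'}^{(r)}$ for the regime fractions and $p_{\gamma}^{(r)}$ and $p_{\gamma'}^{(r)}$ for the within-regime densities, and takes $D_{\mathrm{PE}}$ without a factor of $1/2$. It is a derivation under these assumptions rather than a transcription of the published estimator, whose notation and normalization may differ.

```latex
% Density on the extended domain: regime fraction times within-regime density
p_{\gamma}(r, x) = \alpha_{\gamma}^{(r)}\, p_{\gamma}^{(r)}(x), \qquad
p_{\gamma'}(r, x) = \alpha_{\gamma'}^{(r)}\, p_{\gamma'}^{(r)}(x)

% Marginal mean of the top-gamma' indicator at the point (r, x)
f_d(r, x) = \frac{\gamma'}{\gamma}\,
            \frac{\alpha_{\gamma'}^{(r)}\, p_{\gamma'}^{(r)}(x)}
                 {\alpha_{\gamma}^{(r)}\, p_{\gamma}^{(r)}(x)}

% Within-regime variance, averaged over regimes with weights alpha_gamma^{(r)}
\sum_{r} \alpha_{\gamma}^{(r)}\,
  \mathrm{Var}_{x \sim p_{\gamma}^{(r)}}\!\left[f_d(r, x)\right]
  = \left(\frac{\gamma'}{\gamma}\right)^{2}
    \sum_{r} \frac{\bigl(\alpha_{\gamma'}^{(r)}\bigr)^{2}}{\alpha_{\gamma}^{(r)}}\,
    D_{\mathrm{PE}}\!\left(p_{\gamma'}^{(r)} \,\|\, p_{\gamma}^{(r)}\right)
```

For an inactive regime both densities are point masses at the dummy value, the ratio is constant, and the corresponding term is zero.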
Algorithmic steps are:
- Extract regime–coordinate pairs per sample.
- For each pair, compute one-dimensional KDEs over the active values in both the top-$\gamma'$ and top-$\gamma$ sets.
- Evaluate Pearson divergence via analytic formulae or sample averaging.
- Aggregate per regime, sum weighted by regime prevalence.
Computational complexity scales with the total number of regimes, the cost of one 1D KDE and divergence evaluation, and the number of samples. condPED-ANOVA thus retains the efficiency of vanilla PED-ANOVA while handling conditional structure (Baba et al., 28 Jan 2026).
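The algorithmic steps can be sketched end to end with the standard library alone. This is a minimal illustration under stated assumptions, not the reference implementation: the function names, the toy objective, the grid-based integration, the default quantiles, and the regime weights (squared top-$\gamma'$ fraction divided by top-$\gamma$ fraction, which follows from assuming the PED-ANOVA indicator construction) are all choices made here.

```python
import math
import random
from collections import defaultdict
from statistics import pstdev


def kde(samples):
    """1D Gaussian KDE with a Scott-style bandwidth (sigma * n^(-1/5))."""
    n = len(samples)
    h = max(pstdev(samples), 1e-3) * n ** (-0.2)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def pearson_divergence(p, q, lo, hi, steps=200):
    """Integral of (p/q - 1)^2 * q over [lo, hi] by the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        qx = max(q(x), 1e-12)  # guard against division by zero
        total += (p(x) / qx - 1.0) ** 2 * qx * dx
    return total


def cond_local_hpi(trials, name, regime_of, gamma=0.5, gamma_prime=0.1):
    """Unnormalized within-regime score; trials is a list of (config, loss), lower is better."""
    ranked = sorted(trials, key=lambda t: t[1])
    top = [c for c, _ in ranked[: max(2, int(gamma * len(ranked)))]]
    top_prime = [c for c, _ in ranked[: max(2, int(gamma_prime * len(ranked)))]]

    def by_regime(configs):
        groups = defaultdict(list)
        for c in configs:
            if name in c:  # inactive coordinates carry no value and contribute nothing
                groups[regime_of(c, name)].append(c[name])
        return groups

    g, gp = by_regime(top), by_regime(top_prime)
    score = 0.0
    for r, xs in g.items():
        xs_p = gp.get(r, [])
        if len(xs) < 2 or len(xs_p) < 2:  # too sparse for a KDE: skip this regime
            continue
        a, a_p = len(xs) / len(top), len(xs_p) / len(top_prime)
        lo, hi = min(xs), max(xs)
        pad = 0.1 * (hi - lo) + 1e-3
        d = pearson_divergence(kde(xs_p), kde(xs), lo - pad, hi + pad)
        score += (a_p ** 2 / a) * d  # assumed regime weight
    return (gamma_prime / gamma) ** 2 * score


# Toy data: "x" exists only under sgd and drives the loss; "z" is always active but irrelevant.
rng = random.Random(0)
trials = []
for i in range(400):
    z = rng.random()
    if i % 2 == 0:
        x = rng.random()
        trials.append(({"optimizer": "sgd", "x": x, "z": z}, (x - 0.3) ** 2))
    else:
        trials.append(({"optimizer": "adam", "z": z}, rng.uniform(0.02, 0.03)))


def regime_of(config, name):
    return config["optimizer"] if name == "x" else "root"


raw = {name: cond_local_hpi(trials, name, regime_of) for name in ("x", "z")}
total = sum(raw.values())
normalized = {name: value / total for name, value in raw.items()}
```

In this toy setup the conditional coordinate `x` should receive a clearly larger share than the irrelevant coordinate `z`, and a coordinate that is never active scores exactly zero because no regime contributes a divergence term.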
5. Pathologies of Naive Baselines and Empirical Validation
Naive adaptations of PED-ANOVA, f-ANOVA, tree-based impurity (MDI), or SHAP, extended by sample filtering, default imputation, or domain expansion, yield spurious or vanishing HPIs for inactive or gated parameters, or inappropriately distribute importance among children inactivated by gating.
Synthesized cases demonstrate:
- In a branching function where one hyperparameter only gates the activity of others, condPED-ANOVA assigns all importance to the hyperparameter that matters at the branch and zero to inactive children, while baselines diffuse importance across hyperparameters or misassign it to inactive ones.
- In regime-dependent domains (domain shifts but no gating), condPED-ANOVA isolates which domain edge matters; naive methods conflate the effect or collapse it.
- Inactive parameters always exhibit zero (or vanishing) conditional local HPI with condPED-ANOVA; naive baselines commonly do not.
Empirical results on synthetic and real HPO workflows show that condPED-ANOVA's importance curves vary smoothly with the quantile parameter and align with known ground truth, while baseline outputs are erratic or fail to reflect the intended attribution (Baba et al., 28 Jan 2026).
6. Practical Applications, Guidance, and Limitations
condPED-ANOVA is integrated into tools such as Optuna's importance API. Users select the quantile parameters to probe the desired subspace depth. For each regime–coordinate pair, a KDE (e.g., with a bandwidth defaulting to Scott's rule) is computed and the Pearson divergence is evaluated per dimension.
Typical applications include HPO scenarios with CASH or hierarchical conditional spaces, tree-structured search, and any context where parameter activity is regime-dependent or domains are dynamically assigned.
Limitations include:
- Instability when some regimes are sparsely sampled, which can be mitigated by merging rare regimes or discretizing parents.
- Necessity to know the regime assignment function in advance (e.g., from the space’s tree structure); detection of latent gating is outside the method’s scope.
- Sensitivity to the base sampling distribution; for uneven coverage, reweighting or importance-weighted KDE may be necessary.
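The source mentions importance-weighted KDE without specifying a scheme. One possible form, assuming the sampling density is known up to a constant, is to weight each sample by its inverse. The helper `weighted_kde`, the fixed bandwidth, and the example sampling density below are hypothetical.

```python
import math


def weighted_kde(samples, weights, bandwidth):
    """1D Gaussian KDE in which sample i contributes weights[i] / sum(weights)."""
    total = sum(weights)
    norm = 1.0 / (total * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            w * math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
            for s, w in zip(samples, weights)
        )

    return density


# Hypothetical example: the sampler drew x with density proportional to 2x on [0, 1];
# weighting by 1 / (2x) down-weights the over-sampled region near 1.
samples = [0.2, 0.5, 0.8, 0.9, 0.95]
weights = [1.0 / (2.0 * s) for s in samples]
density = weighted_kde(samples, weights, bandwidth=0.1)
```

Since the weights are normalized inside the estimator, the result still integrates to one, so it can replace an unweighted KDE in the divergence computation without other changes.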
A plausible implication is that condPED-ANOVA provides a well-defined and interpretable solution for HPI in spaces long considered pathological for global or naive local methods, especially in neural architecture search or conditional hyperparameter grids.
7. Summary and Comparative Table
condPED-ANOVA combines the computational efficiency and scale-invariance of PED-ANOVA with a mathematically principled treatment of conditionality, ensuring that HPI is meaningful even under complex hierarchical and dynamic search spaces. By restricting attribution to within-regime (active) variance, it avoids spurious importance assignments and delivers interpretable results even as the HPO space becomes hierarchical or branches due to upstream decisions.
| Method | Handles Conditionality | Closed-form/Speed | Avoids Gating Pathologies |
|---|---|---|---|
| f-ANOVA, SHAP, MDI | No (naive extensions) | No (costly) | No |
| PED-ANOVA | No | Yes | No |
| condPED-ANOVA | Yes | Yes | Yes |
condPED-ANOVA thus represents a significant advance in the assessment of hyperparameter importance in modern, structured HPO settings, as established in (Watanabe et al., 2023) and (Baba et al., 28 Jan 2026).