Conditional PED-ANOVA in Hierarchical HPO

Updated 4 February 2026
  • condPED-ANOVA is a statistical framework that estimates hyperparameter importance by isolating within-regime effects in conditional and hierarchical search spaces.
  • It employs a closed-form estimator using Pearson divergence and kernel density estimation to achieve scale invariance and computational efficiency.
  • The method effectively resolves gating pathologies by distinguishing active hyperparameters from inactive ones in dynamic optimization settings.

Conditional PED-ANOVA (condPED-ANOVA) is a statistical framework for efficiently estimating hyperparameter importance (HPI) within hierarchical and dynamic search spaces, i.e., domains where the presence or configuration of a hyperparameter may depend on the values of other hyperparameters. The method generalizes the original PED-ANOVA, which quantifies HPI using the variance of top-performing configurations within a fixed search space, to conditional or regime-dependent structures, and so avoids the pathologies that affect standard approaches in such settings. condPED-ANOVA achieves scale-invariant, closed-form computation of conditional HPI via Pearson divergence on one-dimensional marginals, implemented efficiently through kernel density estimation (KDE), and is suitable for high-performance hyperparameter optimization (HPO) workflows (Watanabe et al., 2023; Baba et al., 28 Jan 2026).

1. Background: From f-ANOVA to PED-ANOVA and the Need for Conditioning

Conventional functional ANOVA (f-ANOVA) decomposes the variance of an objective function $f: X = \prod_{d=1}^D X^{(d)} \to \mathbb{R}$ into main effects and interactions across coordinates, yielding a unique orthogonal decomposition:

$$f(x) = f_\emptyset + \sum_{d=1}^D f_{\{d\}}(x_d) + \cdots$$

with main effect variances $v_d = \mathbb{E}_{x_d}[f_{\{d\}}(x_d)^2]$. The standard measure of variable importance is $v_d/v_0$, where $v_0$ denotes the total variance of $f$.
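As a rough illustration of the main-effect variances, the following sketch evaluates a toy additive function on a uniform product grid and computes each $v_d$ as the variance of the conditional mean. The function name `main_effect_variances`, the grid, and the toy objective are all assumptions made for this example, not part of the cited work.

```python
from itertools import product

def main_effect_variances(f, grids):
    """Return (total variance, [v_1, ..., v_D]) of f under the uniform
    distribution on the product of the given 1D grids."""
    points = list(product(*grids))
    values = [f(*p) for p in points]
    mean = sum(values) / len(values)  # plays the role of f_emptyset
    total = sum((v - mean) ** 2 for v in values) / len(values)
    main = []
    for d, grid in enumerate(grids):
        acc = 0.0
        for x in grid:
            # f_{d}(x_d) = E[f | x_d] - f_emptyset
            cond = [val for p, val in zip(points, values) if p[d] == x]
            acc += (sum(cond) / len(cond) - mean) ** 2
        main.append(acc / len(grid))
    return total, main

grid = [i / 10 for i in range(11)]
total, v = main_effect_variances(lambda x, y: 3 * x + y, [grid, grid])
importance = [vd / total for vd in v]  # share of total variance per coordinate
```

Because the toy objective is additive, the two main effects account for all of the variance, and the first coordinate (coefficient 3) receives nine times the importance of the second.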

While f-ANOVA operates globally, PED-ANOVA (Watanabe et al., 2023) addresses importance within an arbitrary subspace $S \subseteq X$, such as the top $\gamma$-quantile region $S_\gamma = \{x : f(x) \leq f^\gamma\}$. The main-effect local variance is computed efficiently by relating it to the Pearson divergence between 1D marginals of a tighter top-performing region $S_{\gamma'}$ and the broader region $S_\gamma$, using kernel density estimates instead of global integrals:

$$v_d^\gamma = \left( \frac{\gamma'}{\gamma} \right)^2 D_{\mathrm{PE}}\big( p_d(\cdot \mid S_{\gamma'}) \,\big\|\, p_d(\cdot \mid S_\gamma) \big)$$

This structure is highly efficient and scale-invariant, but assumes fixed, unconditional hyperparameter spaces.
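A minimal sketch of this computation is given below, assuming the common definition $D_{\mathrm{PE}}(p \| q) = \int (p/q - 1)^2 q \, dx$, a Gaussian KDE with a fixed hand-picked bandwidth, and midpoint-rule integration over the coordinate's domain. The helper names (`gaussian_kde`, `pearson_divergence`, `local_hpi`) and the synthetic data are illustrative assumptions, not the reference implementation.

```python
import math
import random

def gaussian_kde(samples, bandwidth):
    """Return a 1D Gaussian kernel density estimate as a callable."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return pdf

def pearson_divergence(p, q, lo, hi, n_grid=200):
    """Midpoint-rule estimate of the integral of (p/q - 1)^2 q over [lo, hi]."""
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid):
        x = lo + (k + 0.5) * h
        qx = max(q(x), 1e-12)  # guard against division by zero
        total += (p(x) / qx - 1.0) ** 2 * qx * h
    return total

def local_hpi(xs, losses, gamma, gamma_prime, lo, hi, bandwidth=0.1):
    """(gamma'/gamma)^2 * D_PE between the top-gamma' and top-gamma marginals."""
    order = sorted(range(len(xs)), key=lambda j: losses[j])
    def top(fraction):
        return [xs[j] for j in order[: max(1, int(fraction * len(xs)))]]
    p = gaussian_kde(top(gamma_prime), bandwidth)
    q = gaussian_kde(top(gamma), bandwidth)
    return (gamma_prime / gamma) ** 2 * pearson_divergence(p, q, lo, hi)

# Synthetic data: the loss depends on x1 only, x2 is irrelevant.
rng = random.Random(0)
n = 400
x1 = [rng.random() for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
loss = [(a - 0.5) ** 2 for a in x1]
hpi_x1 = local_hpi(x1, loss, 1.0, 0.1, 0.0, 1.0)
hpi_x2 = local_hpi(x2, loss, 1.0, 0.1, 0.0, 1.0)
```

Only the ordering of the losses enters the computation, which is one way to see the scale invariance mentioned above: any monotone rescaling of the loss leaves both values unchanged.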

In HPO settings where the space is hierarchical or certain hyperparameters are present only in specific "regimes" (e.g., tree-structured search or CASH problems), standard or even local methods fail to separate the effect of conditional activation or dynamic domains from genuine parameter importance, motivating the development of condPED-ANOVA (Baba et al., 28 Jan 2026).

2. Representing Conditional Search Spaces and Regime Structure

To formalize conditionality, condPED-ANOVA introduces regime assignments:

  • Each coordinate $x^{(d)}$ can be in $K^{(d)}$ possible regimes (branches), defined by a function $r^{(d)}: X \to \{1, \ldots, K^{(d)}\}$.
  • Each regime $i$ for $d$ has domain $\mathcal{Z}_i^{(d)}$ (with dummy value $\{\perp\}$ for inactive coordinates).
  • The extended domain for $d$ is the disjoint union $\mathsf{S}^{(d)} = \bigsqcup_i \{i\} \times \mathcal{Z}_i^{(d)}$.
  • The law $\mu_\gamma^{(d)}$ over $\mathsf{S}^{(d)}$ is induced by the empirical distribution of the top-$\gamma$ samples.

This approach ensures that, for each hyperparameter, the conditional impact (only when active in its regime) is separated from effects that originate purely from the structure of the search space.
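The regime construction can be made concrete with a small, hypothetical search space: an `optimizer` choice gates a `momentum` parameter (active only for SGD) and shifts the domain of `lr`. The sketch below maps a configuration to its extended coordinate $(i, z)$; every name in it (`REGIME_FUNCTIONS`, `DOMAINS`, `extend`, the parameter names and ranges) is an assumption made for illustration.

```python
INACTIVE = None  # placeholder for the dummy value "⊥"

def momentum_regime(config):
    """Regime 1: momentum is active (SGD). Regime 2: momentum is inactive."""
    return 1 if config["optimizer"] == "sgd" else 2

def lr_regime(config):
    """lr is always active, but its domain depends on the optimizer."""
    return 1 if config["optimizer"] == "sgd" else 2

# Regime assignment functions r^(d), one per coordinate.
REGIME_FUNCTIONS = {"momentum": momentum_regime, "lr": lr_regime}

# Per-regime domains Z_i^(d); INACTIVE marks a regime whose domain is {⊥}.
DOMAINS = {
    "momentum": {1: (0.0, 0.99), 2: INACTIVE},
    "lr": {1: (1e-3, 1.0), 2: (1e-5, 1e-2)},
}

def extend(config, name):
    """Map a configuration to its extended coordinate (i, z) for `name`."""
    i = REGIME_FUNCTIONS[name](config)
    z = config[name] if DOMAINS[name][i] is not INACTIVE else INACTIVE
    return i, z

sgd_trial = {"optimizer": "sgd", "lr": 0.1, "momentum": 0.9}
adam_trial = {"optimizer": "adam", "lr": 1e-3}
extended = [extend(t, "momentum") for t in (sgd_trial, adam_trial)]
```

Here `momentum` illustrates gating (one regime is inactive), while `lr` illustrates a regime-dependent domain with no inactive branch.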

3. Conditional Local HPI: Within-Regime Variance Principle

Applying the standard local HPI (variance of the marginal mean in the top-$\gamma$ set) to extended domains yields two contributions via the law of total variance:

$$v_\gamma^{(d)} = \mathbb{E}_i \left[ \mathrm{Var}_{z|i}\left( g_\gamma(i, z) \right) \right] + \mathrm{Var}_i \left[ \mathbb{E}_{z|i}\left( g_\gamma(i, z) \right) \right]$$

where $g_\gamma(i, z) = \mathbb{E}[b_{\gamma'} \mid I^{(d)} = i, Z^{(d)} = z]$ and $b_{\gamma'}$ indicates membership in the top-$\gamma'$ set. The second term ("inter-regime") measures variance arising from regime selection, i.e., whether the hyperparameter is present or not, which confounds HPI through upstream gating.

To ensure HPI is not spuriously attributed to variables that merely gate parametric presence, condPED-ANOVA defines the conditional local HPI as only the within-regime variance:

$$v_{\gamma,\mathrm{within}}^{(d)} := \mathbb{E}_{I^{(d)}} \left[ \mathrm{Var}_{Z^{(d)}} \left( g_\gamma(I^{(d)}, Z^{(d)}) \mid I^{(d)} \right) \right]$$

This quantity vanishes when $x^{(d)}$ is inactive (regime-defined), and zeroes out effects stemming purely from the regime selection logic itself (Baba et al., 28 Jan 2026).
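The split into within-regime and inter-regime variance can be checked numerically. The sketch below uses a handful of made-up, equally weighted values of $g_\gamma(i, z)$; regime 2 plays the role of an inactive branch in which $g_\gamma$ is constant. The function name `variance_split` and the numbers are assumptions for illustration only.

```python
def variance_split(pairs):
    """pairs: equally weighted (regime i, value g(i, z)) tuples.
    Returns (within, between); their sum is the total variance of g."""
    n = len(pairs)
    mean = sum(g for _, g in pairs) / n
    groups = {}
    for i, g in pairs:
        groups.setdefault(i, []).append(g)
    within = 0.0
    between = 0.0
    for gs in groups.values():
        weight = len(gs) / n                    # probability of the regime
        m = sum(gs) / len(gs)                   # E[g | regime]
        within += weight * sum((g - m) ** 2 for g in gs) / len(gs)
        between += weight * (m - mean) ** 2
    return within, between

# Regime 1 is active (g varies with z); regime 2 is inactive (g is constant).
pairs = [(1, 0.2), (1, 0.4), (1, 0.6), (2, 0.9), (2, 0.9), (2, 0.9)]
within, between = variance_split(pairs)
overall_mean = sum(g for _, g in pairs) / len(pairs)
total = sum((g - overall_mean) ** 2 for _, g in pairs) / len(pairs)
```

In this toy case most of the total variance sits in the inter-regime term, which is exactly the part that the within-regime definition discards.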

4. Closed-Form Estimator and Algorithmic Implementation

Analogous to PED-ANOVA, the conditional local HPI admits a closed-form estimator via Pearson divergence, evaluated for each regime. Define for regime ii:

  • $\alpha_i^{(d)}$ = fraction of top-$\gamma'$ samples in regime $i$
  • $\beta_i^{(d)}$ = fraction of top-$\gamma$ samples in regime $i$
  • $p_{\gamma', i}^{(d)}$, $p_{\gamma, i}^{(d)}$ = 1D PDFs of $Z^{(d)}$ in regime $i$ among the top-$\gamma'$ and top-$\gamma$ samples, respectively (estimated by KDE)

The estimator is:

$$v_{\gamma,\mathrm{within}}^{(d)} = \left(\frac{\gamma'}{\gamma}\right)^2 \sum_{i=1}^{K^{(d)}} \frac{(\alpha_i^{(d)})^2}{\beta_i^{(d)}} D_{\mathrm{PE}}\big( p_{\gamma', i}^{(d)} \,\big\|\, p_{\gamma, i}^{(d)} \big)$$

Only the within-regime Pearson divergences are summed, and for inactive regimes the contribution is zero, since the PDFs are degenerate at $\perp$. The final normalized conditional local HPI is obtained by dividing $v_{\gamma,\mathrm{within}}^{(d)}$ by the total over all $d$.

Algorithmic steps are:

  1. Extract regime–coordinate pairs per sample.
  2. For each, compute one-dimensional KDEs over $Z^{(d)}$ in both the $\gamma$ and $\gamma'$ sets.
  3. Evaluate the Pearson divergence $D_{\mathrm{PE}}$ via analytic formulae or sample averaging.
  4. Aggregate per regime, sum weighted by regime prevalence.
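The four steps above can be sketched end to end as follows. This is a simplified illustration under several assumptions: a fixed-bandwidth Gaussian KDE, midpoint-rule integration for $D_{\mathrm{PE}}(p \| q) = \int (p/q - 1)^2 q \, dx$, inactive values encoded as `None`, and a synthetic two-branch problem. The names `kde`, `pearson`, and `cond_hpi` are hypothetical.

```python
import math
import random

def kde(samples, bw):
    """1D Gaussian kernel density estimate with fixed bandwidth bw."""
    c = 1.0 / (len(samples) * bw * math.sqrt(2.0 * math.pi))
    return lambda x: c * sum(math.exp(-0.5 * ((x - s) / bw) ** 2) for s in samples)

def pearson(p, q, lo, hi, n_grid=200):
    """Midpoint-rule estimate of the integral of (p/q - 1)^2 q over [lo, hi]."""
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid):
        x = lo + (k + 0.5) * h
        qx = max(q(x), 1e-12)
        total += (p(x) / qx - 1.0) ** 2 * qx * h
    return total

def cond_hpi(ext, losses, gamma, gamma_prime, domains, bw=0.1):
    """ext: per-trial extended coordinates (regime, z), z is None if inactive.
    domains: {regime: (lo, hi)} listing the ACTIVE regimes only, so inactive
    regimes contribute exactly zero."""
    n = len(ext)
    order = sorted(range(n), key=lambda j: losses[j])
    top_g = [ext[j] for j in order[: max(1, int(gamma * n))]]
    top_gp = [ext[j] for j in order[: max(1, int(gamma_prime * n))]]
    total = 0.0
    for i, (lo, hi) in domains.items():
        z_g = [z for r, z in top_g if r == i]
        z_gp = [z for r, z in top_gp if r == i]
        if not z_g or not z_gp:
            continue  # regime absent from one of the sets: no contribution
        alpha = len(z_gp) / len(top_gp)  # share of top-gamma' in regime i
        beta = len(z_g) / len(top_g)     # share of top-gamma in regime i
        total += alpha ** 2 / beta * pearson(kde(z_gp, bw), kde(z_g, bw), lo, hi)
    return (gamma_prime / gamma) ** 2 * total

# Synthetic branching problem: c = 0 activates x (which drives the loss),
# c = 1 activates y (which has no effect on the loss).
rng = random.Random(0)
ext_x, ext_y, losses = [], [], []
for _ in range(600):
    c, x, y = rng.randrange(2), rng.random(), rng.random()
    if c == 0:
        ext_x.append((1, x)); ext_y.append((2, None))
        losses.append((x - 0.5) ** 2)
    else:
        ext_x.append((2, None)); ext_y.append((1, y))
        losses.append(0.25 * rng.random())

raw = {
    "x": cond_hpi(ext_x, losses, 1.0, 0.1, {1: (0.0, 1.0)}),
    "y": cond_hpi(ext_y, losses, 1.0, 0.1, {1: (0.0, 1.0)}),
}
norm = sum(raw.values())
normalized = {name: value / norm for name, value in raw.items()}
```

In this toy setup `x` should receive most of the normalized importance, since `y` never influences the loss when it is active; a coordinate with no active regime at all (an empty `domains` mapping) scores exactly zero.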

Computational complexity is $O(RM + DN)$ for $R = \sum_d K^{(d)}$ total regimes, $M$ the 1D KDE/divergence cost, and $N$ samples. condPED-ANOVA thus retains the efficiency of vanilla PED-ANOVA while handling conditional structure (Baba et al., 28 Jan 2026).

5. Pathologies of Naive Baselines and Empirical Validation

Naive adaptations of PED-ANOVA, f-ANOVA, tree-based impurity (MDI), or SHAP, extended by sample filtering, default imputation, or domain expansion, yield spurious or vanishing HPIs for inactive or gated parameters, or inappropriately distribute importance among children inactivated by gating.

Synthesized cases demonstrate:

  • In a branching function $f(c, x, y)$ where only $c$ gates $x$ or $y$ activity, condPED-ANOVA assigns all importance to $c$ at the branch, and zero to inactive children, while baselines diffuse importance between $x$ and $y$ or misassign to inactives.
  • In regime-dependent domains (domain shifts but no gating), condPED-ANOVA isolates which domain edge matters; naive methods conflate the effect or collapse it.
  • Inactive parameters always exhibit zero (or vanishing) conditional local HPI with condPED-ANOVA; naive baselines commonly do not.

Empirical results on synthetic and real HPO workflows show that condPED-ANOVA's importance curves vary smoothly with the quantile parameter $\gamma'$, and align with known ground truth, while baseline outputs are erratic or fail to reflect the intended attribution (Baba et al., 28 Jan 2026).

6. Practical Applications, Guidance, and Limitations

condPED-ANOVA is integrated into tools such as Optuna’s importance API. Users select $(\gamma, \gamma')$ to probe desired subspace depths; typical ranges are $\gamma' \in \{0.01, \ldots, 0.49\}$ for $\gamma = 1$. For each regime–coordinate pair, KDEs (e.g., defaulting to Scott's rule) are computed and Pearson divergence is evaluated per dimension.
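To give a feel for how the choice of $\gamma'$ interacts with the KDE bandwidth, the sketch below selects top-$\gamma'$ subsets for several quantiles and computes a Scott-style bandwidth for each. It assumes the 1D convention of sample standard deviation times $n^{-1/5}$ (the convention used by SciPy's `gaussian_kde`); whether a given tool uses exactly this rule is not confirmed by the source. The helper names and data are hypothetical and do not reflect Optuna's API.

```python
import math

def scott_bandwidth(samples):
    """Scott-style bandwidth for 1D data: sample std times n^(-1/5)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return std * n ** (-1.0 / 5.0)

def top_fraction(values, losses, fraction):
    """Values of the best `fraction` of trials (lower loss is better)."""
    order = sorted(range(len(values)), key=lambda j: losses[j])
    k = max(2, int(round(fraction * len(values))))  # at least 2 for a std
    return [values[j] for j in order[:k]]

# Toy trials: one hyperparameter on [0, 1], loss minimized at 0.3.
values = [j / 199 for j in range(200)]
losses = [(v - 0.3) ** 2 for v in values]

sweep = []
for gamma_prime in (0.01, 0.05, 0.1, 0.25, 0.49):
    subset = top_fraction(values, losses, gamma_prime)
    sweep.append((gamma_prime, len(subset), scott_bandwidth(subset)))
```

Smaller values of $\gamma'$ leave fewer samples and a tighter spread, so the data-driven bandwidth shrinks; with very few samples the density estimate becomes noisy, which is one reason to inspect importance over a range of $\gamma'$ rather than a single value.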

Typical applications include HPO scenarios with CASH or hierarchical conditional spaces, tree-structured search, and any context where parameter activity is regime-dependent or domains are dynamically assigned.

Limitations include:

  • Instability when some regimes are sparsely sampled (fewer than 30 samples), which can be mitigated by merging rare regimes or discretizing parents.
  • Necessity to know the regime assignment function $r^{(d)}$ in advance (e.g., from the space’s tree structure); detection of latent gating is outside the method’s scope.
  • Sensitivity to the base sampling distribution $p_0$; for uneven coverage, reweighting or importance-weighted KDE may be necessary.
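One possible way to apply the first mitigation is to pool regimes whose sample count falls below a threshold before estimating densities. The sketch below is an assumption about how such a guard might look, not a documented procedure; `merge_rare_regimes`, the pooled label, and the default threshold of 30 (taken from the limit quoted above) are illustrative.

```python
from collections import Counter

def merge_rare_regimes(regimes, min_count=30, pooled_label="rare"):
    """Relabel every regime with fewer than `min_count` samples as one
    pooled regime. Pooling is only meaningful when the merged regimes
    share a compatible domain for the coordinate in question."""
    counts = Counter(regimes)
    return [r if counts[r] >= min_count else pooled_label for r in regimes]

# Per-trial regime labels for one coordinate: regimes 2 and 3 are sparse.
regimes = [1] * 50 + [2] * 10 + [3] * 5
merged = merge_rare_regimes(regimes)
merged_counts = Counter(merged)
```

Note that the pooled regime can itself remain below the threshold, as in this example (15 samples), in which case its contribution is still best treated with caution.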

A plausible implication is that condPED-ANOVA provides a well-defined and interpretable solution for HPI in spaces long considered pathological for global or naive local methods, especially in neural architecture search or conditional hyperparameter grids.

7. Summary and Comparative Table

condPED-ANOVA combines the computational efficiency and scale-invariance of PED-ANOVA with a mathematically principled treatment of conditionality, ensuring that HPI is meaningful even under complex hierarchical and dynamic search spaces. By restricting attribution to within-regime (active) variance, it avoids spurious importance assignments and delivers interpretable results even as the HPO space becomes hierarchical or branches due to upstream decisions.

| Method | Handles Conditionality | Closed-form/Speed | Avoids Gating Pathologies |
| --- | --- | --- | --- |
| f-ANOVA, SHAP, MDI | No (naive extensions) | No (costly) | No |
| PED-ANOVA | No | Yes | No |
| condPED-ANOVA | Yes | Yes | Yes |

condPED-ANOVA thus represents a significant advance in the assessment of hyperparameter importance in modern, structured HPO settings, as established in (Watanabe et al., 2023) and (Baba et al., 28 Jan 2026).
