Dynamic Hyperparameter Importance

Updated 23 June 2026
  • Dynamic hyperparameter importance is a framework that quantifies how the impact of hyperparameters varies across training phases, conditional regimes, and multi-objective settings.
  • It adapts evaluation metrics to account for hierarchical and dynamic search spaces, thereby informing more efficient hyperparameter optimization and AutoML applications.
  • Key algorithms like condPED-ANOVA and HyperSHAP leverage regime-specific variance estimation and Shapley values to improve optimization convergence and interpretability.

Dynamic hyperparameter importance (HPI) denotes the quantification and analysis of how the impact of hyperparameters on learning outcomes varies across training regimes, phases, regions of the search space, or multi-objective trade-offs. Recent advances address not only changes in HPI over time or objective-weightings, but also its dependency on dynamic (hierarchical, conditional) search spaces where hyperparameter activity, domain, and relevance themselves evolve according to the configuration context. These developments underpin modern approaches for scalable AutoML, architecture search, and multi-objective Bayesian optimization, where understanding and actively exploiting dynamic HPI directly improves efficiency, interpretability, and convergence (Theodorakopoulos et al., 6 Jan 2026, Baba et al., 28 Jan 2026, Wever et al., 3 Feb 2025, Mohan et al., 2023, Watanabe et al., 2023, Zhang et al., 2021).

1. Conceptual Foundations and Motivating Scenarios

Static HPI assumes that the relevance of each hyperparameter is constant across all contexts. This assumption breaks down in several scenarios:

  • Conditional/hierarchical search spaces: Some hyperparameters are only active or meaningful under specific parent choices (e.g., optimizer-specific learning rates, architecture branch selectors) (Baba et al., 28 Jan 2026).
  • Dynamic learning processes: The impact of specific hyperparameters evolves as training progresses—e.g., RL exploration rate matters early, learning rate schedule is critical mid-training, while discount factor dominates late performance (Mohan et al., 2023, Zhang et al., 2021).
  • Multi-objective trade-offs: Optimal regions and the dimensions that matter most vary as one moves along (or as a function of) the Pareto front in multi-objective HPO (Theodorakopoulos et al., 6 Jan 2026).
  • Active region targeting: Practical HPO often focuses on “top-performing” subspaces (e.g., top-γ quantile), requiring HPI tailored to these regions (Watanabe et al., 2023).

Dynamic HPI provides tools to address these realities, quantifying “what matters when and where,” and enabling adaptive focus during optimization.

2. Formal Definitions: Dynamic and Conditional HPI

Unconditional HPI in Static Domains

Classic f-ANOVA and related approaches evaluate HPI globally or in fixed subspaces, decomposing the objective $f: X \to \mathbb{R}$ across hyperparameters $(x_1, \ldots, x_D)$: $v^{(d)} = \operatorname{Var}_{x \sim p_0}\left[ \mathbb{E}[f(x) \mid x^{(d)}] \right]$, with normalized importance $v^{(d)} / \sum_{d'} v^{(d')}$ (Watanabe et al., 2023).
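To make the static definition concrete, the following is a minimal sketch that evaluates $v^{(d)}$ exactly on a small full grid, assuming a uniform $p_0$ over the grid points. The function name `marginal_variance_importance` and the toy objective are illustrative assumptions, not part of any cited method, and the normalization assumes that at least one hyperparameter has nonzero variance.

```python
from itertools import product
from statistics import fmean, pvariance


def marginal_variance_importance(f, grids):
    """Normalized v^(d) = Var[E[f(x) | x^(d)]] under a uniform p_0 on a full grid."""
    points = list(product(*grids))
    values = [f(*p) for p in points]
    raw = []
    for d, levels in enumerate(grids):
        # E[f | x^(d) = level]: average f over all grid points sharing that level.
        cond_means = [
            fmean(v for p, v in zip(points, values) if p[d] == level)
            for level in levels
        ]
        # Variance of the conditional means over the (uniform) levels of x^(d).
        raw.append(pvariance(cond_means))
    total = sum(raw)
    return [v / total for v in raw]


# Hypothetical toy objective: the first hyperparameter has three times the slope.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
importance = marginal_variance_importance(lambda x1, x2: 3.0 * x1 + x2, [grid, grid])
```

For this additive toy objective the variance of the conditional mean scales with the squared slope, so the first hyperparameter receives a share of 0.9 and the second 0.1.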

Dynamic and Conditional HPI

In hierarchical/dynamic settings, each hyperparameter $x^{(d)}$ can be partitioned into $K^{(d)}$ regimes (indexed by $i$, each with domain $Z_i^{(d)}$ and activation indicator $I^{(d)}$). The conditional local HPI isolates within-regime variance: $v_{\gamma,\mathrm{within}}^{(d)} = \mathbb{E}_{I^{(d)}}\left[ \operatorname{Var}_{Z^{(d)}}\left( \mathbb{E}\left[1_{f \leq f_{\gamma'}} \mid I^{(d)}, Z^{(d)} \right] \middle| I^{(d)} \right) \right]$. Normalizing across hyperparameters yields the conditional local HPI (Baba et al., 28 Jan 2026).
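The within-regime variance can be illustrated with a plug-in estimate on a handful of samples. This is a sketch under simplifying assumptions that are not taken from the cited paper: the reference distribution is the full sample (no top-γ restriction), the objective is minimized, and the hyperparameter takes discrete values inside each regime so that conditional means are plain group averages. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def within_regime_variance(samples, gamma_prime):
    """Plug-in estimate of E_I[ Var_Z( E[1{f <= f_gamma'} | I, Z] | I ) ].

    samples: list of (regime, z, f) tuples; z is assumed discrete within a regime.
    """
    fs = sorted(f for _, _, f in samples)
    k = max(1, int(gamma_prime * len(fs)))
    threshold = fs[k - 1]  # empirical gamma'-quantile of f (minimization assumed)

    # Collect the indicator 1{f <= threshold} per regime and per value of z.
    groups = defaultdict(lambda: defaultdict(list))
    for regime, z, f in samples:
        groups[regime][z].append(1.0 if f <= threshold else 0.0)

    n = len(samples)
    total = 0.0
    for by_z in groups.values():
        n_reg = sum(len(b) for b in by_z.values())
        mean_reg = sum(sum(b) for b in by_z.values()) / n_reg
        # Variance across z of the conditional mean, inside this regime only.
        var_reg = sum(
            len(b) / n_reg * (sum(b) / len(b) - mean_reg) ** 2 for b in by_z.values()
        )
        total += n_reg / n * var_reg  # outer expectation over regimes
    return total


# Hypothetical data: the learning rate matters under "sgd" but not under "adam".
samples = [
    ("sgd", 0.01, 0.1), ("sgd", 0.01, 0.1), ("sgd", 0.1, 0.9), ("sgd", 0.1, 0.9),
    ("adam", 0.001, 0.5), ("adam", 0.001, 0.5), ("adam", 0.01, 0.5), ("adam", 0.01, 0.5),
]
v_within = within_regime_variance(samples, gamma_prime=0.25)
```

Because the variance is taken inside each regime before averaging, a regime in which the hyperparameter has no effect (here "adam") contributes zero, and the difference between regimes (the parent effect) does not leak into the child's score.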

For multi-objective settings, the objective is vector-valued, with one component per objective. Given a scalarization (e.g., ParEGO's augmented Tchebycheff function), Shapley value-based HPI scores the normalized surrogate-imputed marginal gain of each hyperparameter for the current trade-off (Theodorakopoulos et al., 6 Jan 2026, Wever et al., 3 Feb 2025).
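As an illustration of how a trade-off is fixed before importance is computed, the sketch below implements the augmented Tchebycheff scalarization used by ParEGO. It assumes objectives that are minimized and already normalized to [0, 1]; drawing the weights uniformly from the simplex and the default `rho=0.05` are assumptions made for this example.

```python
import random


def sample_weights(m, rng):
    """Draw a weight vector uniformly from the simplex via normalized exponentials."""
    raw = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(raw)
    return [r / total for r in raw]


def augmented_tchebycheff(objectives, weights, rho=0.05):
    """Scalarize normalized objectives: max_j(w_j * f_j) + rho * sum_j(w_j * f_j)."""
    weighted = [w * f for w, f in zip(weights, objectives)]
    return max(weighted) + rho * sum(weighted)


rng = random.Random(0)
weights = sample_weights(2, rng)
scalar_equal = augmented_tchebycheff([0.2, 0.8], [0.5, 0.5])
```

Each new weight vector defines a different single-objective target, which is why the hyperparameters that matter can change from one scalarization to the next.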

3. Key Algorithms and Estimation Methods

Conditional PED-ANOVA

This method adapts PED-ANOVA by decomposing the marginal variance into regime-specific within-regime effects. For each regime $i$, two 1-D KDEs are fitted to the hyperparameter's values in that regime: one over the top-γ′ samples and one over the top-γ samples. The main result expresses $v_{\gamma,\mathrm{within}}^{(d)}$ through the Pearson divergence between these two densities in each regime, combined across regimes with coefficients built from the empirical regime probabilities (Baba et al., 28 Jan 2026).

Algorithmic Steps (condPED-ANOVA)

  1. Compute the empirical quantiles of the objective at levels γ′ and γ.
  2. For each hyperparameter $d$ and regime $i$, construct the two 1-D KDEs (over the top-γ′ and the top-γ samples in that regime).
  3. Compute the Pearson divergence term and the weighting coefficients.
  4. Aggregate over regimes and normalize across hyperparameters.

The complexity, stated up to regime multiplicities, is a sharp improvement over surrogate-based approaches (Baba et al., 28 Jan 2026, Watanabe et al., 2023).
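The first three steps can be sketched for one hyperparameter in one regime as follows. This is a minimal illustration, not the authors' implementation: it assumes minimization, a hand-picked Gaussian KDE bandwidth, midpoint-rule integration, and one common definition of the Pearson divergence, $D_{\mathrm{PE}}(p \,\|\, q) = \int (p/q - 1)^2 q \, dx$ (some authors include a factor of 1/2). The regime weighting coefficients and the final aggregation are left out because they depend on the exact result of Baba et al.; all function names are hypothetical.

```python
import math


def empirical_quantile(scores, frac):
    """Step 1: value below which (roughly) a fraction `frac` of the scores fall."""
    ordered = sorted(scores)
    return ordered[max(1, int(frac * len(ordered))) - 1]


def gaussian_kde(samples, bandwidth):
    """Step 2: 1-D Gaussian kernel density estimate, returned as a callable."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf


def pearson_divergence(p, q, lo, hi, steps=2000):
    """Step 3: midpoint-rule approximation of the integral of (p/q - 1)^2 * q."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        qx = q(x)
        if qx > 1e-12:  # skip regions where q has essentially no mass
            total += (p(x) / qx - 1.0) ** 2 * qx * h
    return total


def regime_divergence(values, scores, f_gp, f_g, bandwidth, lo, hi):
    """Divergence between the top-gamma' and top-gamma densities within one regime."""
    top_gp = [v for v, s in zip(values, scores) if s <= f_gp]
    top_g = [v for v, s in zip(values, scores) if s <= f_g]
    if not top_gp or not top_g:
        return 0.0  # a regime without samples contributes nothing
    return pearson_divergence(
        gaussian_kde(top_gp, bandwidth), gaussian_kde(top_g, bandwidth), lo, hi
    )


# Hypothetical regime: good scores cluster around a value of 0.3.
values = [0.05 * k for k in range(1, 20)]
scores = [(v - 0.3) ** 2 for v in values]
f_g = empirical_quantile(scores, 1.0)
f_gp = empirical_quantile(scores, 0.25)
divergence = regime_divergence(values, scores, f_gp, f_g, 0.1, -0.5, 1.5)
```

A large divergence means the best configurations in this regime concentrate in a narrow part of the domain, which is the signal that the hyperparameter matters there. No surrogate model is fitted at any point, which is where the cost advantage comes from.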

Dynamic HPI in Multi-Objective Optimization

ParEGO-based dynamic HPI incorporates HyperSHAP:

  • At each iteration (or at a fixed interval of steps), draw a new scalarization weight vector.
  • Fit a GP surrogate to the scalarized target.
  • Compute per-hyperparameter Shapley values via HyperSHAP, then restrict subsequent search to the hyperparameters whose cumulative Shapley values reach a set threshold.
  • This focus is dynamically updated, alternating between phases of full, restricted, and then again unrestricted optimization (Theodorakopoulos et al., 6 Jan 2026, Wever et al., 3 Feb 2025).
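The restriction step in the list above can be sketched as a simple cumulative-share rule. The sketch assumes that Shapley values are already available as a dictionary, ranks hyperparameters by absolute value (an assumption made here), and fixes the unselected ones to the incumbent configuration. Function names and example values are hypothetical.

```python
def select_important(shapley, threshold):
    """Pick hyperparameters by decreasing |phi| until their cumulative share reaches threshold."""
    magnitudes = {name: abs(phi) for name, phi in shapley.items()}
    total = sum(magnitudes.values())
    if total == 0.0:
        return list(shapley)  # nothing stands out: keep the full space
    selected, cumulative = [], 0.0
    for name in sorted(magnitudes, key=magnitudes.get, reverse=True):
        selected.append(name)
        cumulative += magnitudes[name] / total
        if cumulative >= threshold:
            break
    return selected


def restricted_candidate(incumbent, proposal, selected):
    """Take selected hyperparameters from the proposal and fix the rest to the incumbent."""
    return {
        name: (proposal[name] if name in selected else incumbent[name])
        for name in incumbent
    }


shapley = {"lr": 0.6, "batch": 0.1, "wd": -0.3}
selected = select_important(shapley, threshold=0.8)
candidate = restricted_candidate(
    {"lr": 0.01, "batch": 32, "wd": 0.0},
    {"lr": 0.1, "batch": 64, "wd": 0.001},
    selected,
)
```

In a full loop this selection would be recomputed whenever the scalarization weights change, so the active subspace follows the current trade-off.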

Shapley Value-Based Surrogate Decomposition

HyperSHAP uses Monte Carlo approximations to estimate the marginal contribution of each hyperparameter across random permutations or coalitions, both for local (single configuration) and global (across the search space) HPI. It enables extraction of main and interaction effects, as well as temporal/dynamic evolution under the current surrogate (Wever et al., 3 Feb 2025).
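The permutation-sampling idea can be shown with a generic estimator. This sketch does not use the HyperSHAP library or its API; it assumes a hypothetical surrogate given as a plain function, and a value function that takes the hyperparameters in a coalition from the target configuration and imputes the rest from a baseline.

```python
import random


def make_value(surrogate, target, baseline):
    """Value of a coalition: surrogate score with coalition members set to their target values."""
    def value(coalition):
        config = {k: (target[k] if k in coalition else baseline[k]) for k in baseline}
        return surrogate(config)
    return value


def shapley_permutation(value, players, n_permutations, rng):
    """Monte Carlo Shapley estimate: average marginal gains over random orderings."""
    phi = {p: 0.0 for p in players}
    for _ in range(n_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = frozenset()
        previous = value(coalition)
        for p in order:
            coalition = coalition | {p}
            current = value(coalition)
            phi[p] += current - previous  # marginal gain of adding p
            previous = current
    return {p: total / n_permutations for p, total in phi.items()}


# Hypothetical surrogate with an lr-wd interaction and an irrelevant batch size.
def surrogate(c):
    return 2.0 * c["lr"] + c["wd"] + c["lr"] * c["wd"]


target = {"lr": 1.0, "wd": 1.0, "batch": 1.0}
baseline = {"lr": 0.0, "wd": 0.0, "batch": 0.0}
value = make_value(surrogate, target, baseline)
phi = shapley_permutation(value, list(target), 200, random.Random(0))
```

The estimates always sum to the gap between the full and the empty coalition, because the marginal gains telescope within every sampled ordering. The interaction term is shared between `lr` and `wd`, and the irrelevant `batch` receives exactly zero.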

4. Computational and Practical Considerations

| Method | Complexity | Conditional/Hierarchical Support | Regime Handling |
|---|---|---|---|
| condPED-ANOVA | … | Yes | Supports any regime split |
| PED-ANOVA | … | No | N/A |
| HyperSHAP | … (MC) | Yes (if full context present) | By context |
| Surrogate-based f-ANOVA | … | No | N/A |

  • condPED-ANOVA automatically excludes inactive regimes (i.e., those with an empty domain), requires sufficient samples per regime, and tolerates domain overlap without ambiguity (Baba et al., 28 Jan 2026).
  • HyperSHAP/ParEGO can become costly when the configuration space is large, due to exponential subsets for Shapley value estimation and repeated surrogate retraining (Theodorakopoulos et al., 6 Jan 2026, Wever et al., 3 Feb 2025).
  • Estimator robustness depends on coverage within regimes and can suffer if rare regimes lack samples. Adaptive binning or hierarchical smoothing is a potential remedy (Baba et al., 28 Jan 2026).

5. Empirical Findings and Benchmark Results

Hierarchical/Conditional Search Spaces

condPED-ANOVA consistently delivers interpretable importance profiles:

  • Gating (branch) parameters dominate at medium quantiles (γ′ ≈ 0.5), while only the active child parameter remains important as γ′ → 0.
  • Existing HPI methods (PED-ANOVA, f-ANOVA, MDI, SHAP) either assign spurious importance to inactive parameters or conflate parent and child effects (Baba et al., 28 Jan 2026).

Multi-Objective Optimization

Dynamic HPI-ParEGO shows:

  • 30–50% reduction in required evaluations to reach the same hypervolume on synthetic (PyMOO: ZDT1–ZDT4) and real (YAHPO-Gym: LCBench, rbv2_ranger) benchmarks.
  • Highest convergence speed and Pareto front quality among Bayesian solvers; matches or exceeds evolutionary baselines in later trials.
  • Fixing unimportant hyperparameters to the incumbent (the reference point from which Shapley values are computed) yields the largest gain (Theodorakopoulos et al., 6 Jan 2026).

Dynamic RL and Deep Learning

Time-resolved landscape studies show that the dominant hyperparameter varies over training phases and environment/algorithm pairs. For instance, the discount factor dominates late in training for both DQN and SAC, while learning rate becomes less important (Mohan et al., 2023). Population-based training in MBRL realizes >10× gains over static HPO by enabling automatic horizon and learning rate tuning across phases (Zhang et al., 2021).

6. Limitations, Open Problems, and Future Directions

  • Current methods (condPED-ANOVA, dynamic HPI-ParEGO) capture only main effects; generalization to higher-order interactions in dynamic/conditional spaces is nontrivial (Baba et al., 28 Jan 2026).
  • Kernel density estimation and Shapley estimation may yield instability/variance when regime or coalition sample counts are low; adaptive strategies or hierarchical Bayesian smoothing may improve robustness (Baba et al., 28 Jan 2026, Theodorakopoulos et al., 6 Jan 2026).
  • Dynamic HPI for many-objective (m≫3) optimization, non-hard regime transitions, and integration into real-time active HPO remain open avenues.
  • Meta-learning and uncertainty-aware HPI estimators may allow for improved data efficiency and robustness in larger-scale or streaming settings (Theodorakopoulos et al., 6 Jan 2026).
  • Application in AutoRL and online scheduling scenarios could further tune hybrid schedules not just by main effect importance but by directly using HPI change-detection triggers (Mohan et al., 2023).

7. Significance and Impact

Dynamic HPI frameworks enable:

  • Principled, closed-form attribution and reduction in high-dimensional, nonstationary, or structured HPO scenarios.
  • Improved interpretability by disentangling parent/child effects in hierarchical spaces and by exposing context-dependent relevance.
  • Effective dimensionality reduction during search, accelerating HPO and improving final solution quality across supervised, RL, and multi-objective tasks (Theodorakopoulos et al., 6 Jan 2026, Baba et al., 28 Jan 2026).
  • A shift from static, post-hoc analysis of hyperparameter effect toward adaptive, optimization-aware exploitation, thereby opening new directions in AutoML controller and active HPO design.
