Hyperparameter Importance: Methods and Applications

Updated 13 January 2026
  • Hyperparameter Importance (HPI) is a quantitative framework that identifies key hyperparameters affecting model performance using techniques like functional ANOVA and Shapley-value attribution.
  • HPI methodologies integrate surrogate modeling, variance decomposition, and Monte Carlo methods to estimate both main and interaction effects across hyperparameters.
  • Its integration with optimization strategies such as Bayesian optimization and evolutionary algorithms accelerates convergence while reducing computational costs.

Hyperparameter Importance (HPI) is a quantitative framework for identifying which hyperparameters most significantly affect the performance of a machine learning or deep learning model. HPI enables targeted hyperparameter optimization, reducing computation and accelerating convergence by focusing search on the most impactful parameters. Central methodologies include variance-based decompositions such as functional ANOVA, game-theoretic Shapley attribution, and surrogate-based importance estimates, with recent advances extending HPI to multi-objective, interaction-dependent, and subspace-restricted ML settings.

1. Formal Definitions and Mathematical Foundations

At the core of HPI is the decomposition of an algorithm's configuration space $\Theta \subset \mathbb{R}^D$, where $\theta \in \Theta$ specifies a $D$-dimensional hyperparameter vector. Let $\hat f(\theta)$ denote a surrogate or true performance function (e.g., validation error, accuracy, risk). Functional ANOVA is foundational, expressing $\hat f$ as an additive sum over effects of subsets $U \subseteq \{1,\dots,D\}$:

$$\hat f(\theta) = \sum_{U \subseteq \{1,\dots,D\}} \hat f_U(\theta_U)$$

For each singleton $U = \{d\}$, the main effect is:

$$\hat f_d(\theta_d) = a_d(\theta_d) - \hat f_\emptyset$$

where $a_d(\theta_d) = \frac{1}{|\Theta_{-d}|} \int \hat f(\theta_d, \theta_{-d})\, d\theta_{-d}$ is the marginal mean, with all other dimensions marginalized out. Importance is then quantified as the normalized variance explained:

$$V_d = \mathrm{Var}_{\theta_d}\!\left[\hat f_d(\theta_d)\right], \qquad I_d = V_d / V$$

where $V = \sum_{i=1}^D V_i$ is the total surrogate-predicted variance. This formalism aligns with the definitions used in HOUSES (Zhang et al., 2019), meta-learning studies (Rijn et al., 2017), and large-scale empirical benchmarks (Bahmani et al., 2021).
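To make the decomposition concrete, the following minimal Python sketch evaluates the formulas above on a tiny two-hyperparameter grid. The grid and the surrogate values in it are illustrative assumptions, not numbers taken from any cited benchmark.

```python
import numpy as np

# Illustrative surrogate values f_hat(theta_1, theta_2) on a 3x3 grid
# (rows index theta_1, columns index theta_2); purely made-up numbers.
f_hat = np.array([
    [0.30, 0.32, 0.35],
    [0.20, 0.22, 0.26],
    [0.15, 0.18, 0.21],
])

f_empty = f_hat.mean()       # grand mean \hat f_\emptyset

# Marginal means a_d(theta_d): average out the other dimension.
a_1 = f_hat.mean(axis=1)     # a_1(theta_1)
a_2 = f_hat.mean(axis=0)     # a_2(theta_2)

# Main-effect variances V_d = Var[a_d(theta_d) - f_empty].
V_1 = np.var(a_1 - f_empty)
V_2 = np.var(a_2 - f_empty)

# Normalized importances I_d = V_d / sum_i V_i (main effects only).
V_total = V_1 + V_2
print("I_1 =", V_1 / V_total, " I_2 =", V_2 / V_total)
```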

Shapley-value based HPI generalizes this to cooperative-game frameworks. The Shapley value $\phi_j$ for hyperparameter $j$ measures its expected marginal contribution across all possible contexts:

$$\phi_j(\nu) = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(n-|S|-1)!}{n!} \left[\nu(S \cup \{j\}) - \nu(S)\right]$$

where $\nu(S)$ is an "explanation game" (e.g., the performance obtainable by tuning hyperparameters in $S$) (Wever et al., 3 Feb 2025, Garouani et al., 22 Dec 2025).
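As an illustration of this formula, the sketch below computes exact Shapley values for a three-hyperparameter toy game. The value function `nu`, which maps each tunable subset to an achievable accuracy, is a made-up assumption for demonstration only, not an output of HyperSHAP or MetaSHAP.

```python
from itertools import combinations
from math import factorial

players = ("learning_rate", "depth", "batch_size")
n = len(players)

# Toy "explanation game": accuracy achievable when tuning only the
# hyperparameters in S (all others at defaults). Made-up values.
nu = {
    frozenset(): 0.70,
    frozenset({"learning_rate"}): 0.80,
    frozenset({"depth"}): 0.74,
    frozenset({"batch_size"}): 0.71,
    frozenset({"learning_rate", "depth"}): 0.85,
    frozenset({"learning_rate", "batch_size"}): 0.81,
    frozenset({"depth", "batch_size"}): 0.75,
    frozenset(players): 0.86,
}

def shapley(j):
    """Exact Shapley value of hyperparameter j under game nu."""
    others = [p for p in players if p != j]
    phi = 0.0
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            S = frozenset(S)
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += weight * (nu[S | {j}] - nu[S])
    return phi

for p in players:
    print(p, round(shapley(p), 4))
```

By efficiency, the three values sum to $\nu(N) - \nu(\emptyset)$, so they apportion the total tuning gain across hyperparameters.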

2. Methodologies for Quantifying and Computing HPI

HPI estimation in practice utilizes surrogate modeling and numerical integration:

  • Surrogate Modeling: Gaussian Processes, Random Forests, Extremely Randomized Trees, and Gradient Boosted Trees are commonly used surrogates to fit $\hat f(\theta)$ from sampled evaluations. Surrogates can be posterior means (GP-BO) or ensemble predictors (Random Forests) (Zhang et al., 2019, Rijn et al., 2017, Bahmani et al., 2021).
  • Variance Decomposition: Once a surrogate is available, fANOVA decomposes variance into main effects $V_d$ and interaction effects $V_{d,d'}$ (Rijn et al., 2017, Jin, 2022).
  • Monte Carlo & Grid Evaluation: Marginal effects $a_d(\theta_d)$ are typically estimated over discrete grids or random subsets to approximate the integrals; an end-to-end sketch combining these steps follows this list.
  • Shapley-based Attribution: HyperSHAP leverages permutation-sampling or Faithful k-Shapley schemes to compute high-dimensional attribution (Wever et al., 3 Feb 2025), with MetaSHAP using meta-learning to adapt SHAP values to new datasets (Garouani et al., 22 Dec 2025).
  • Subspace and Local Importance: PED-ANOVA enables efficient local HPI estimation in arbitrary subspaces (e.g., top performance quantile) using closed-form Pearson divergence between marginal distributions (Watanabe et al., 2023).
  • Subsampling Estimation: For large datasets, consistent estimation of HPI via repeated subsampling achieves stable rankings at much lower cost (Jin, 2022).
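The sketch below strings the first three bullets together: fit a random-forest surrogate to sampled evaluations, estimate marginal effects by Monte Carlo, and normalize the resulting main-effect variances into importances. The synthetic objective, search ranges, and sample sizes are assumptions chosen only to keep the example self-contained; scikit-learn's `RandomForestRegressor` stands in for whichever surrogate a given study uses.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic "observed" evaluations: theta = (learning_rate, depth, dropout),
# each scaled to [0, 1], with a made-up validation-error response.
theta = rng.uniform(0.0, 1.0, size=(300, 3))
error = (theta[:, 0] - 0.3) ** 2 + 0.2 * theta[:, 1] + 0.01 * rng.normal(size=300)

# Surrogate model fitted to the sampled configurations.
surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(theta, error)

def marginal(d, value, n_mc=500):
    """Monte Carlo estimate of a_d(value): average the surrogate over the other dims."""
    sample = rng.uniform(0.0, 1.0, size=(n_mc, 3))
    sample[:, d] = value
    return surrogate.predict(sample).mean()

# Main-effect variances over a grid per dimension, normalized to importances.
grid = np.linspace(0.0, 1.0, 20)
V = np.array([np.var([marginal(d, v) for v in grid]) for d in range(3)])
importance = V / V.sum()
print({f"theta_{d}": round(float(importance[d]), 3) for d in range(3)})
```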

3. Integration with Optimization and Automated ML

Multiple frameworks actively exploit HPI for efficient hyperparameter search:

  • Evolutionary Algorithms: Mutation probabilities are weighted by HPI scores, concentrating search on "important" dimensions (Zhang et al., 2019); a toy sketch of this weighting follows the list.
  • Bayesian Optimization: HPI-informed acquisition functions and dimensionality reduction accelerate convergence. HOUSES (Zhang et al., 2019) outperforms random search and stationary GP, converging in 20–30% fewer expensive evaluations.
  • Sequential Grouping: Deep learning (CNN) experiments assign the most budget to the most important hyperparameter groups, yielding up to 31.9% reduction in optimization time with negligible accuracy drop (Wang et al., 7 Mar 2025).
  • Multi-objective Optimization (MOO): Dynamic HPI tracks Pareto trade-offs, identifying context-sensitive hyperparameters under scalarizations from algorithms such as ParEGO (Theodorakopoulos et al., 6 Jan 2026, Theodorakopoulos et al., 2024).
  • Defensive Tuning: Non-inferiority tests for tuning risk validate that some hyperparameters (e.g., RF bootstrap/criterion) can be safely fixed at defaults under typical budgets (Weerts et al., 2020).
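As a rough illustration of the first bullet (not the actual HOUSES procedure), the sketch below biases per-dimension mutation probabilities by previously estimated importance scores, so that a simple evolutionary search spends most of its perturbations on high-HPI dimensions. The importance vector, toy objective, and population settings are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed HPI scores for 4 hyperparameters (e.g., from fANOVA); made up here.
importance = np.array([0.55, 0.25, 0.15, 0.05])
mutation_prob = importance / importance.sum()   # mutate important dims more often

def objective(theta):
    # Toy validation error dominated by the first dimension.
    return (theta[0] - 0.7) ** 2 + 0.1 * (theta[1] - 0.4) ** 2

pop = rng.uniform(0.0, 1.0, size=(20, 4))
for generation in range(50):
    scores = np.array([objective(ind) for ind in pop])
    parents = pop[np.argsort(scores)[:10]]                # keep the best half
    children = parents[rng.integers(0, 10, size=10)].copy()
    mutate = rng.random(children.shape) < mutation_prob   # HPI-weighted mutation mask
    children[mutate] += rng.normal(0.0, 0.1, size=mutate.sum())
    children = np.clip(children, 0.0, 1.0)
    pop = np.vstack([parents, children])

best = pop[np.argmin([objective(ind) for ind in pop])]
print("best configuration:", np.round(best, 3))
```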

4. Empirical Findings and Benchmarks

Comprehensive meta-analyses and ablation studies provide robust evidence for HPI's practical value:

  • Canonical Algorithms: For RF, min_samples_leaf and max_features dominate; for AdaBoost, max_depth and learning_rate; for SVM, γ and C (Rijn et al., 2017, Bahmani et al., 2021).
  • Deep Neural Networks: Final convolutional layer size in CNNs and learning rate are top drivers of performance variance (Zhang et al., 2019, Wang et al., 2024). In QNNs, learning rate and circuit depth are most influential, while entangler types have negligible impact (Moussa et al., 2022).
  • Multi-objective Context: Network size hyperparameters affect speed/energy, while optimizer and augmentation flags appear critical for fairness or energy objectives (Theodorakopoulos et al., 2024).
  • Interaction Effects: Strong pairwise interactions are observed, e.g., between learning rate and gradient clipping in DP-SGD (explaining >12% of variance) and between the number of estimators and learning_rate in boosting (Morsbach et al., 2024, Bhattacharyya et al., 2022).
  • Subspace Effects: PED-ANOVA reveals that some hyperparameters become more important only in the highest-performance regime, reversing global orderings (Watanabe et al., 2023); a toy illustration of such a reversal follows this list.
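The following sketch illustrates the subspace effect with a naive quantile restriction (not PED-ANOVA's closed-form Pearson-divergence estimator): binned main-effect importances are computed globally and then again on only the top-performing decile of configurations. The error surface is a synthetic assumption constructed so that the ranking flips.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic evaluations: theta_0 dominates the error globally, but once
# theta_0 is in its good region (<= 0.2) only theta_1 matters.
theta = rng.uniform(0.0, 1.0, size=(2000, 2))
error = np.where(theta[:, 0] > 0.2, theta[:, 0], 0.1 * theta[:, 1])

def main_effect_importance(thetas, errors, bins=10):
    """Crude binned main-effect variances, normalized across dimensions."""
    V = []
    for d in range(thetas.shape[1]):
        edges = np.quantile(thetas[:, d], np.linspace(0.0, 1.0, bins + 1))
        idx = np.clip(np.digitize(thetas[:, d], edges) - 1, 0, bins - 1)
        marginal_means = np.array([errors[idx == b].mean() for b in range(bins)])
        V.append(np.var(marginal_means))
    V = np.asarray(V)
    return V / V.sum()

print("global importance:    ", np.round(main_effect_importance(theta, error), 3))

top = error <= np.quantile(error, 0.10)   # restrict to the best 10% of configurations
print("top-decile importance:", np.round(main_effect_importance(theta[top], error[top]), 3))
```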

5. Practical Guidelines and Implications

Empirical studies distill HPI into actionable tuning strategies:

6. Extensions, Limitations, and Future Directions

Recent work charts multiple promising research directions:

7. Reference Table: Typical Importance Ranks for Canonical Algorithms

| Algorithm | Hyperparameter 1 | Importance | Hyperparameter 2 | Importance |
|---|---|---|---|---|
| SVM (RBF) | γ | 0.55 | C | 0.30 |
| Random Forest | min_samples_leaf | 0.45 | max_features | 0.30 |
| AdaBoost | max_depth | 0.48 | learning_rate | 0.28 |
| CNN (DL) | num_conv_layers | 0.39 | learning_rate | 0.23 |
| DP-SGD | clip threshold | ≈24% | learning rate | ≈23% |

These values, derived from functional ANOVA and meta-learning studies (Rijn et al., 2017, Wang et al., 2024, Morsbach et al., 2024), illustrate that a small subset of hyperparameters explains the majority of performance variance across models and datasets.


Hyperparameter Importance represents a rigorous, data-driven foundation for understanding, diagnosing, and accelerating hyperparameter optimization in modern machine learning practice and research. Its integration with optimization, multi-objective trade-offs, and explainable AI continues to drive rapid methodological and empirical advances.
