Search-Based Autotuning
- Search-based autotuning is a methodology that formulates optimization as a black-box search in high-dimensional, constraint-rich configuration spaces, enabling tuning of compiler flags, hyperparameters, and system parameters.
- It leverages algorithmic strategies such as Bayesian optimization, genetic algorithms, and CSP-based pruning to efficiently navigate combinatorial spaces and identify high-performing configurations.
- Applications in HPC, ML, and compiler optimization demonstrate significant performance gains and energy savings, highlighting its practical impact on modern computing systems.
Search-based autotuning is a methodology that formulates program or system optimization as a black-box search over a discrete (often combinatorial or mixed) configuration space. Each configuration—whether a set of compiler flags, hyperparameters, workflow rewrites, or system parameters—is evaluated purely empirically by compiling, running, and measuring relevant objectives (performance, energy, cost, accuracy). The search leverages algorithmic or statistical strategies to efficiently identify high-performing configurations, typically without requiring analytic derivative or model information of the underlying objective. This paradigm underpins a wide array of modern HPC, ML, and compiler optimization systems, achieving substantial performance/efficiency gains while maintaining portability across diverse platforms.
1. Formal Problem Statement and Search Space Construction
Search-based autotuning is characterized by its explicit treatment of optimization as search in a high-dimensional, constrained space defined by tunable parameters or “knobs.” Let $x = (x_1, \ldots, x_d) \in \mathcal{X}$ denote a configuration vector, where $\mathcal{X} = D_1 \times \cdots \times D_d$ is the Cartesian product of valid parameter domains $D_i$ (continuous, integer, ordinal, categorical, or permutation spaces), subject to constraints $c_j(x) \le 0$, $j = 1, \ldots, m$. The empirical objective $f(x)$ may represent runtime, energy, throughput, or composite utility for $x \in \mathcal{X}$.
The canonical goal is:
$$x^{*} \in \arg\min_{x \in \mathcal{X}} f(x) \quad \text{subject to} \quad c_j(x) \le 0, \; j = 1, \ldots, m.$$
Search space construction itself has emerged as a bottleneck when millions to billions of code variants are possible but most violate constraints. Reformulating construction as a constraint satisfaction problem (CSP) enables solver-optimal enumeration of valid configurations. Runtime parsers can translate user-defined constraints (e.g., lambdas or string expressions) into CSP primitives (MinProduct, MaxSum, ModConstraint), supporting pruning, early domain reduction, and efficient enumeration—even under billions of possible variants with extreme sparsity. Modern autotuning frameworks now integrate C-extension–backed CSP solvers, sub-second enumeration, and direct integration with sampling and neighborhood operators for search algorithms (Willemsen et al., 30 Sep 2025).
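To make the construction step concrete, the following is a minimal generate-and-test sketch in Python: it enumerates only those configurations of a small, purely illustrative tile/unroll/scheduling space that satisfy product and divisibility constraints. A CSP-backed framework of the kind cited above would instead compile the constraints and prune parameter domains before enumeration; every name, domain, and bound here is an assumption for illustration.

```python
from itertools import product

# Illustrative mixed search space; names, domains, and bounds are assumptions.
space = {
    "tile_x":   [1, 2, 4, 8, 16, 32],
    "tile_y":   [1, 2, 4, 8, 16, 32],
    "unroll":   [1, 2, 4, 8],
    "schedule": ["static", "dynamic"],
}

# Constraints in the spirit of the product/modulo primitives mentioned above:
# bound the combined tile footprint and require unroll to divide tile_x.
constraints = [
    lambda c: c["tile_x"] * c["tile_y"] <= 256,   # max-product constraint
    lambda c: c["tile_x"] % c["unroll"] == 0,     # divisibility constraint
]

def enumerate_valid(space, constraints):
    """Yield only configurations that satisfy every constraint."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        if all(check(cfg) for check in constraints):
            yield cfg

valid = list(enumerate_valid(space, constraints))
print(f"{len(valid)} valid configurations out of {6 * 6 * 4 * 2} candidates")
```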
The table below summarizes typical search space elements (from CPU/GPU code tuning, ML HPO, and workflow composition):
| Type | Examples | Domains / Constraints |
|---|---|---|
| Integer | tile sizes, unroll factors | bounded integer ranges (often powers of two); divisibility/alignment constraints |
| Categorical | scheduling flag, model choice | "static", "dynamic" |
| Permutation | loop nest ordering | all orderings |
| Continuous | learning rate | bounded positive real interval, typically log-scaled |
| Composite | hybrid MPI/OpenMP params, block-tile pairs | feasible tuples, co-dependencies |
2. Algorithmic Strategies for Search-Based Autotuning
A wide spectrum of search algorithms is deployed for autotuning, with the choice determined by budget, search-space structure, and objective characteristics:
- Random and Latin-hypercube sampling. Provides initial broad coverage; enables model initialization; effective when evaluation costs permit batch exploration (Koch et al., 2018).
- Model-based optimization. Surrogate models (Gaussian processes, random forests, gradient-boosted trees) predict the objective $f$ and quantify uncertainty, enabling acquisition-driven proposals (expected improvement, lower confidence bound) (Wu et al., 2020, Wu et al., 2023, Willemsen et al., 2021, Hellsten et al., 2022).
- Genetic/evolutionary algorithms. Population-based operators (tournament selection, crossover, mutation) navigate discrete, categorical, and permutation-rich spaces, especially when surrogates fail to capture complex interactions (Koch et al., 2018, Tørring et al., 2022).
- Direct and pattern search. Coordinate-wise or neighborhood local refinement (e.g. generating set search, tree-based MCTS, iterated local search) is effective in structured transformation spaces (e.g., loop transformation trees); a minimal local-search sketch appears at the end of this section (Kruse et al., 2020).
- Bandit models and racing. Multi-armed bandits (UCB1), successive halving, and racing methods dynamically allocate budget to promising arms or configurations, with strong utility for resource-constrained or online settings (Hossain et al., 2 Jan 2025).
Distinct adaptation strategies, such as hybridization (e.g. combining GA, pattern search, and LHS), parallel asynchronous evaluation, portfolio selection over multiple acquisition functions, and contextual/transfer learning, further accelerate convergence and robustness under heterogeneity (Koch et al., 2018, Haga et al., 2022, Li et al., 2023).
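The sketch below illustrates the local-refinement family referenced above: an iterated local search with random restarts over a small discrete space. The objective is a synthetic stand-in for an empirical compile-and-run measurement, and all parameter names and domains are hypothetical.

```python
import random

# Hypothetical discrete space; in practice f(cfg) would compile, run,
# and time the candidate configuration.
space = {"tile": [1, 2, 4, 8, 16, 32], "unroll": [1, 2, 4, 8], "vec": [0, 1]}

def f(cfg):
    # Synthetic runtime model used only to make the sketch runnable.
    return abs(cfg["tile"] - 16) + 2 * abs(cfg["unroll"] - 4) + (0 if cfg["vec"] else 3)

def random_config():
    return {k: random.choice(v) for k, v in space.items()}

def neighbors(cfg):
    # Single-parameter moves to adjacent domain values.
    for k, domain in space.items():
        i = domain.index(cfg[k])
        for j in (i - 1, i + 1):
            if 0 <= j < len(domain):
                yield {**cfg, k: domain[j]}

def iterated_local_search(restarts=10, budget=100):
    best, best_cost, evals = None, float("inf"), 0
    for _ in range(restarts):
        cur, cur_cost = random_config(), None
        cur_cost = f(cur); evals += 1
        improved = True
        while improved and evals < budget:
            improved = False
            for nb in neighbors(cur):           # first-improvement hill climbing
                cost = f(nb); evals += 1
                if cost < cur_cost:
                    cur, cur_cost, improved = nb, cost, True
                    break
        if cur_cost < best_cost:
            best, best_cost = cur, cur_cost
    return best, best_cost

print(iterated_local_search())
```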
3. Handling Constraints, Feasibility, and Irregular Spaces
Search-based autotuning must natively support both explicit parameter constraints and latent (hidden) feasibility regions due to hardware/resource limitations, compilation errors, or code correctness. Key developments include:
- CSP-based search space pruning: Direct enumeration of only those configurations $x \in \mathcal{X}$ that satisfy all constraints $c_j$, via compiled, AST-optimized constraint evaluation; supports arbitrary logic, joint constraints, and common workload patterns (tile alignment, divisibility, product/range) (Willemsen et al., 30 Sep 2025).
- Hidden constraints: Empirically observed failures (e.g. compile/run errors, OOM) are incorporated via feasibility surrogates (e.g. random forest classifiers) to bias search away from regions with high failure probability; a small sketch follows this list (Hellsten et al., 2022).
- Permutation modeling: Distance kernels (e.g. based on Spearman's $\rho$) on permutations capture sensitivity to reordering, critical for code generation and loop optimization (Hellsten et al., 2022).
- Hierarchical layering: For complex workflows, autotuning can layer search over architectural, step, and prompt subspaces, with adaptive budget allocation proportionally favoring higher-impact layers (He et al., 12 Feb 2025).
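The following sketch shows the hidden-constraint idea in its simplest form: a random-forest classifier trained on previously observed failures (numerically encoded configurations) screens out candidates with high predicted failure probability. The encoding, data, and threshold are assumptions for illustration, not the implementation of the cited systems.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical history of evaluated configurations (numerically encoded)
# and whether each one failed (compile error, OOM, crash, ...).
X_hist = np.array([[16, 4, 0], [32, 8, 1], [64, 8, 1],
                   [ 8, 2, 0], [64, 4, 1], [16, 8, 0]])
failed = np.array([0, 1, 1, 0, 1, 0])   # 1 = hidden-constraint violation observed

# Feasibility surrogate: predicts the probability that a candidate will fail.
feas_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_hist, failed)

def filter_candidates(candidates, max_fail_prob=0.5):
    """Keep only candidates the surrogate deems likely to run successfully."""
    p_fail = feas_model.predict_proba(candidates)[:, 1]
    return candidates[p_fail < max_fail_prob]

candidates = np.array([[16, 2, 0], [64, 8, 0], [32, 4, 1]])
print(filter_candidates(candidates))
```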
4. Surrogate and Acquisition Model Innovations
Bayesian optimization and its extensions are prominent throughout search-based autotuning frameworks:
- Gaussian processes (GP) with mixed kernels: Capture uncertainty in mixed continuous/discrete spaces, enabling efficient Expected Improvement, Probability of Improvement, or Lower Confidence Bound–driven exploration; an Expected Improvement sketch follows this list (Wu et al., 2020, Willemsen et al., 2021, Wu et al., 2023).
- Random forest and gradient-boosted surrogates: Outperform GPs in highly categorical, interaction-rich, or resource-constrained tasks because they natively handle nonlinear effects and avoid degenerate lengthscales (Wu et al., 2023, Wu et al., 2020).
- Acquisition adaptation: Use of contextual variance to scale exploration factors, multi-acquisition portfolios automatically promoting/demoting based on discounted-observation scores, and safe-region construction for live online tuning (Willemsen et al., 2021, Li et al., 2023).
- Transfer learning and ensemble meta-models: History-driven subspace design, VAE-guided initialization, and meta-surrogate ensembles (weighted by task similarity) provide rapid convergence when tuning across tasks, platforms, or input regimes (Li et al., 2022, Dorier et al., 2022, Li et al., 2023).
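As a concrete example of surrogate-plus-acquisition search, the sketch below fits a Gaussian-process surrogate to a handful of observed (configuration, runtime) pairs and ranks candidates by Expected Improvement for minimization. The 1-D numeric encoding, observations, and candidate set are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical observed (configuration, runtime) pairs; lower is better.
X_obs = np.array([[1.0], [4.0], [8.0], [16.0]])
y_obs = np.array([9.0, 4.5, 3.0, 6.0])

# GP surrogate over a 1-D numeric encoding of the configuration.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X_obs, y_obs)

def expected_improvement(X_cand, y_best, xi=0.01):
    """EI for minimization: expected amount by which a candidate beats the incumbent."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)          # guard against zero predictive variance
    imp = y_best - mu - xi
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

X_cand = np.array([[2.0], [6.0], [10.0], [12.0], [14.0]])
ei = expected_improvement(X_cand, y_obs.min())
print("next configuration to evaluate:", X_cand[np.argmax(ei)])
```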
5. Parallelism, Scalability, and Practical System Integration
Realistic autotuning workflows often require scalable parallel evaluation, distributed surrogate/model retraining, and efficient search space management:
- Asynchronous Bayesian optimization: Manager–worker scheduling with random-forest surrogates supports near-zero overhead and 98–100% utilization at scale, enabling up to 128 parallel evaluations on platforms such as Argonne's Theta supercomputer; a simplified asynchronous-evaluation sketch follows this list (Dorier et al., 2022).
- Multi-level parallelism: Simultaneous model training (across batches or workers) and concurrent model exploration, efficiently decoupled to maximize cluster resources (Koch et al., 2018).
- Benchmark-driven evaluation frameworks: Adoption of robust, normalized performance-over-time metrics, simulation modes with trace replay for HPO of the autotuner itself, and FAIR datasets to standardize algorithm comparisons (Willemsen et al., 30 Sep 2025).
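A simplified version of asynchronous manager–worker evaluation is sketched below: whenever any worker finishes, its result is recorded and a fresh candidate is immediately submitted, so workers never idle waiting for a synchronous batch. The proposal step is random here for brevity (a surrogate-driven proposer would slot in at the same point), and the evaluation function is a stand-in for compiling and running a configuration.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def evaluate(cfg):
    """Stand-in for compiling, running, and timing a configuration."""
    time.sleep(0.1)                           # pretend this is an expensive run
    return cfg, abs(cfg["tile"] - 16) + cfg["unroll"]

def propose():
    # Stand-in for a surrogate-driven proposal (random here for brevity).
    return {"tile": random.choice([1, 2, 4, 8, 16, 32]),
            "unroll": random.choice([1, 2, 4, 8])}

def asynchronous_search(n_workers=4, budget=20):
    best = None
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = {pool.submit(evaluate, propose()) for _ in range(n_workers)}
        submitted = n_workers
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                cfg, cost = fut.result()
                if best is None or cost < best[1]:
                    best = (cfg, cost)
                # Keep workers busy: immediately propose and submit a new candidate.
                if submitted < budget:
                    pending.add(pool.submit(evaluate, propose()))
                    submitted += 1
    return best

print(asynchronous_search())
```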
6. Domains of Application and Representative Achievements
Search-based autotuning has been validated across a diverse spectrum of application classes:
- Compiler optimization: Bayesian autotuning over loop pragmas and transformation trees delivers 2-3x speedup in computational kernels relative to baseline high-level heuristics using ≤200 empirical evaluations (Wu et al., 2020, Kruse et al., 2020).
- Scientific application and system parameter tuning: Random forest–driven BO frameworks for multi-level hybrid MPI/OpenMP codes achieve up to 91.6% runtime improvement and 37.8% EDP reduction at scale on up to 4,096 nodes (Wu et al., 2023).
- GPU kernel tuning: BO methods (with acquisition adaptation and mixed-type kernels) consistently outperform random, GA, and local search, achieving up to 49.7% mean deviation reduction versus GA, with robust handling of invalid/failed configurations (Willemsen et al., 2021, Schoonhoven et al., 2022).
- ML hyperparameter optimization: Hybrid frameworks combining LHS, GA, and pattern search reduce wall-clock model validation error with high efficiency on neural net and gradient-boosted tree tasks; parallel execution over clusters amortizes expensive training costs (Koch et al., 2018).
- Autotuning on edge devices: Bandit learning via UCB1 manages rapid online adaptation under tight computational budgets, converging to near-optimal configurations in 500–1,000 pulls even in high-dimensional spaces; a UCB1 sketch follows this list (Hossain et al., 2 Jan 2025).
- Transfer learning and workflow autotuning: VAE-augmented and meta-learning approaches achieve >40x speedup over cold random search and 66–95% reduction in the first three trials versus default configurations (Dorier et al., 2022, Li et al., 2023, Li et al., 2022, He et al., 12 Feb 2025).
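For the edge-device setting above, the following is a minimal UCB1 sketch: each arm is a candidate configuration, each pull is one noisy online measurement of reward, and the index balances the running mean against an exploration bonus. The arms, reward model, and pull budget are illustrative assumptions.

```python
import math
import random

# Hypothetical arms: a small set of candidate configurations on an edge device.
arms = [{"freq": f, "threads": t} for f in (600, 900, 1200) for t in (1, 2, 4)]

def measure(cfg):
    """Stand-in for one noisy online measurement of reward (e.g. throughput)."""
    base = cfg["freq"] / 1200 + cfg["threads"] / 4
    return base + random.gauss(0, 0.1)

def ucb1(n_pulls=1000):
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    for t in range(1, n_pulls + 1):
        if t <= len(arms):                     # pull each arm once to initialize
            i = t - 1
        else:                                  # UCB1 index: mean + exploration bonus
            i = max(range(len(arms)),
                    key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = measure(arms[i])
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]  # incremental mean update
    return arms[max(range(len(arms)), key=lambda a: means[a])]

print(ucb1())
```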
7. Challenges, Limitations, and Future Directions
While search-based autotuning is a potent and widely applicable optimization paradigm, ongoing technical challenges include:
- Scalability of model-based methods: GP surrogates scale cubically in sample size; exhaustive acquisition ranking in large spaces is expensive, motivating use of tree-based surrogates, sparse GP approximations, or hybrid approaches (Willemsen et al., 2021, Wu et al., 2023).
- Constraint expressivity and solver overhead: Efficiently capturing complex or dynamic constraints remains essential as codes and hardware become more intricate; CSP-based construction is now the dominant approach for flexibility and scalability (Willemsen et al., 30 Sep 2025).
- Hyperparameter optimization of the tuners themselves: Autotuner algorithm selection and meta-strategy tuning can yield >200% improvement in campaign efficiency; integrating robust evaluation and simulation facilities is becoming standard (Willemsen et al., 30 Sep 2025).
- Automation and LLM involvement: Recent advances use prompt-driven LLMs to automatically generate and evolve optimization strategies, with best-in-class LLM-generated optimizers outperforming hand-designed methods by >70% in performance metrics (Willemsen et al., 19 Oct 2025).
- Multi-objective and hybrid goals: Simultaneous optimization for runtime, energy, cost, and latency (with user-defined or Pareto-front tradeoffs) is an active area, supported by scalarizing objectives or layered search with multi-objective acquisition functions; a small scalarization/Pareto sketch follows this list (Wu et al., 2023, He et al., 12 Feb 2025).
- Generalization, transfer, and adaptation: Ensuring robust cross-task adaptation, safe transfer without negative performance impact, and resilience to nonstationary environments are focal points as autotuning scales (Li et al., 2022, Dorier et al., 2022).
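The multi-objective point above can be made concrete with a small sketch: a weighted-sum scalarization collapses runtime and energy into a single utility, while a Pareto filter keeps every configuration that is not dominated in both objectives. The measurements, weights, and normalization are illustrative assumptions.

```python
# Hypothetical measured (runtime_seconds, energy_joules) per configuration.
results = {
    "cfg_a": (1.0, 120.0),
    "cfg_b": (1.4,  80.0),
    "cfg_c": (0.9, 150.0),
    "cfg_d": (1.5, 160.0),   # dominated by cfg_a in both objectives
}

def scalarize(runtime, energy, w_time=0.5, w_energy=0.5):
    """Weighted-sum scalarization; the weights encode a user-defined trade-off."""
    return w_time * runtime + w_energy * energy / 100.0   # crude normalization

def pareto_front(points):
    """Keep configurations not dominated in both runtime and energy."""
    front = {}
    for name, (rt, en) in points.items():
        dominated = any(o_rt <= rt and o_en <= en and (o_rt, o_en) != (rt, en)
                        for o_rt, o_en in points.values())
        if not dominated:
            front[name] = (rt, en)
    return front

best = min(results, key=lambda k: scalarize(*results[k]))
print("scalarized best:", best)
print("Pareto front:", pareto_front(results))
```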
In summary, search-based autotuning synthesizes empirical algorithmics, statistical modeling, and scalable system integration to deliver practical and theoretically justified optimization across an array of advanced computing systems, codes, and workflows, with continual innovation necessary to match the evolving complexity and scale of modern applications.