Spiky Optimization Functions
- Spiky optimization functions are characterized by abrupt, localized peaks or dips with steep gradients, resulting in a highly multimodal and non-smooth landscape.
- They challenge popular surrogate models, like Gaussian Processes, by causing poor local fidelity and over-smoothed predictions that can mask true optima.
- Alternative approaches such as decision-tree surrogates have shown lower optimization regret and near-linear scalability on these complex landscapes.
A spiky optimization function is characterized by a landscape exhibiting abrupt, localized changes—sharp peaks or dips—coupled with high multimodality and regions of steep gradient confined to narrow areas. Such functions fundamentally violate the smoothness assumptions underpinning most popular surrogate models and are widely used as benchmarks to stress-test and discriminate between global optimization methods, acquisition strategies, and surrogate modeling paradigms (Leenders et al., 16 Dec 2025, Jekel et al., 2019). Spikiness is not an intrinsic property of a specific analytic formula but rather a qualitative descriptor informed by the presence of sudden local deviations from an otherwise smooth or moderately wavy surface, often manifesting as numerous local optima or distinct regions of non-smoothness. These pathologies present challenges for both classical and probabilistic optimizers that rely on continuity or differentiability.
1. Mathematical Definition and Benchmark Examples
A spiky function exhibits sharp, localized changes and high multimodality, frequently breaking differentiability or stationary smoothness. Eight canonical test functions from Surjanović & Bingham (2013), as utilized in "Explainable Preference Learning: a Decision Tree–based Surrogate Model for Preferential Bayesian Optimization," each provide explicit parametric or structural control over spikiness (Leenders et al., 16 Dec 2025). Key formulations include:
- Rosenbrock (5D): $f(\mathbf{x}) = \sum_{i=1}^{d-1}\left[100\,(x_{i+1} - x_i^2)^2 + (x_i - 1)^2\right]$. The narrow curved valley, controlled by the coefficient 100, induces local search challenges.
- Hartmann (6D): $f(\mathbf{x}) = -\sum_{i=1}^{4} \alpha_i \exp\!\left(-\sum_{j=1}^{6} A_{ij}\,(x_j - P_{ij})^2\right)$. The matrices $A$ and $P$ modulate the width and placement of the spikes.
- Branin (2D): $f(x_1, x_2) = a\,(x_2 - b x_1^2 + c x_1 - r)^2 + s\,(1 - t)\cos(x_1) + s$, with $a = 1$, $b = 5.1/(4\pi^2)$, $c = 5/\pi$, $r = 6$, $s = 10$, $t = 1/(8\pi)$. The cosine term introduces periodic multimodality.
- Michalewicz (5D): $f(\mathbf{x}) = -\sum_{i=1}^{d} \sin(x_i)\,\sin^{2m}\!\left(i x_i^2/\pi\right)$. Large $m$ (e.g., $m = 10$) produces extremely sharp valleys.
Other functions (Lévy, Schwefel, Holder Table, De Jong’s #5) increase local oscillation, the number of sharp wells, or the order of denominator terms, producing piecewise or highly oscillatory surfaces. Spiky variants may also be constructed by artificially introducing narrow Gaussian "bumps" at select points, as in the fortified Branin–Hoo function—yielding a new, localized minimum that dramatically increases optimization difficulty without changing global smoothness elsewhere (Jekel et al., 2019).
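To make the parametric control over spikiness concrete, the sketch below implements two of the functions above in their standard Surjanović & Bingham forms; the evaluation point and the values of the steepness exponent $m$ are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def michalewicz(x, m=10):
    """Michalewicz function; larger m yields narrower, steeper valleys."""
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    return -np.sum(np.sin(x) * np.sin(i * x**2 / np.pi) ** (2 * m))

def rosenbrock(x):
    """Rosenbrock function; the coefficient 100 creates the narrow curved valley."""
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (x[:-1] - 1.0)**2)

# Illustrative evaluation: the exponent m controls how sharply the valleys are carved.
x = np.array([2.20, 1.57, 1.28, 1.92, 1.72])   # a point near a 5D Michalewicz valley
for m in (1, 10, 20):
    print(f"m = {m:2d}: f(x) = {michalewicz(x, m):.4f}")
print(f"Rosenbrock at ones: {rosenbrock(np.ones(5)):.1f}")   # global minimum value is 0
```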
2. Impact on Surrogate Modeling: Gaussian Processes and Beyond
Spiky functions systematically violate the assumptions of stationarity and smoothness required for effective Gaussian Process (GP) modeling, particularly when standard RBF or Matérn kernels are used (Leenders et al., 16 Dec 2025). Key failure modes include:
- Poor Local Fidelity: Smooth kernels are unable to capture rapid transitions, leading to signal averaging over sharp features and underestimation of local variability.
- Over-smoothed Posteriors: Inability to resolve narrow spikes produces misleading posterior uncertainty estimates and hinders identification of true optima.
- Slow Convergence: Optimization regret remains high because the surrogate model's predictive mean fails to reflect true optima amid sharp discontinuities.
Alternative models, such as skew-GPs or GPs augmented with more flexible non-stationary kernels or hallucinated points, can partially mitigate these issues, but they still incur cubic computational cost in the number of observations and struggle with categorical inputs or high spike density.
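The over-smoothing failure mode can be reproduced in one dimension. The sketch below is an illustration constructed for this article, not an experiment from the cited papers: a stationary RBF Gaussian Process (scikit-learn) is fit to a sinusoid carrying one narrow Gaussian dip, with a design grid that straddles the dip without sampling it, so the posterior mean glides straight across the spike.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def spiky(x):
    """Smooth sinusoid plus one narrow Gaussian dip (the 'spike') at x = 0.5."""
    return np.sin(3 * x) - 2.0 * np.exp(-((x - 0.5) ** 2) / (2 * 0.01 ** 2))

X = np.linspace(0.0, 1.0, 16).reshape(-1, 1)   # grid straddles x = 0.5 without hitting it
y = spiky(X.ravel())

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), normalize_y=True)
gp.fit(X, y)

mu, sd = gp.predict(np.array([[0.5]]), return_std=True)
print(f"true f(0.5) = {spiky(0.5):.3f}, GP posterior mean = {mu[0]:.3f} (std {sd[0]:.3f})")
# The stationary RBF surrogate smooths straight across the narrow dip:
# the predicted mean stays near sin(1.5) ~ 1.0 while the true value is ~ -1.0.
```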
3. Decision-Tree Surrogates for Spiky Functions
A decision-tree–based surrogate, as proposed in (Leenders et al., 16 Dec 2025), partitions the input space using a consistency-score heuristic designed for preference pairs: each candidate split on a feature at a given threshold is scored by how many winner–loser pairs it separates consistently. Ambiguous pairs are discarded, ensuring each tree leaf is homogeneous with respect to winner/loser status.
Each leaf receives a probabilistic latent utility with an independent Gaussian prior, and Bayesian inference is performed via a Laplace approximation to the posterior under the standard Bradley–Terry–Luce likelihood for preference data. A sum-to-zero constraint on the leaf utilities ensures identifiability.
Crucially, the tree enforces piecewise-constant modeling, admitting sharp jumps at split boundaries and requiring no global smoothness prior, thus exactly representing discontinuities that typical GPs smooth away.
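As a contrast to the GP example above, the following minimal sketch shows how an axis-aligned, piecewise-constant surrogate represents a jump and a narrow well exactly at its split thresholds. It uses a plain regression tree on scalar observations, not the preferential tree with Bradley–Terry–Luce likelihood and Laplace posterior described above; the target function and tree depth are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def step_objective(x):
    """Piecewise target with a sharp jump at x = 0.5 and a narrow deep well near x = 0.8."""
    return np.where(x < 0.5, 0.0, 1.0) - 3.0 * (np.abs(x - 0.8) < 0.02)

X = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y = step_objective(X.ravel())

# Piecewise-constant surrogate: each leaf holds a single value, so jumps at
# split thresholds are represented exactly, with no global smoothness prior.
tree = DecisionTreeRegressor(max_depth=5).fit(X, y)

for q in (0.49, 0.51, 0.80):
    print(f"x = {q:.2f}: tree prediction = {tree.predict([[q]])[0]:.2f}, "
          f"true = {step_objective(q):.2f}")
```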
4. Empirical Assessment and Algorithmic Performance
Comprehensive experiments on eight functions of increasing spikiness demonstrate that the decision-tree surrogate (DT–qEUBO) achieves significantly lower mean regret on spiky landscapes than GP-based methods. Representative regret values (mean ± std) after a fixed budget of preference queries include:
| Function | DT–qEUBO | GP–qEUBO | SkewGP–HB-EI |
|---|---|---|---|
| Michalewicz (5D) | 0.05±0.02 | 0.06±0.03 | 0.05±0.03 |
| Schwefel (5D) | 0.12±0.04 | 0.25±0.07 | 0.30±0.10 |
| Holder Table (2D) | 0.08±0.03 | 0.20±0.06 | 0.28±0.09 |
| De Jong #5 (2D) | 0.20±0.05 | 0.35±0.10 | 0.40±0.12 |
Across these functions, the tree surrogate yields markedly lower regret on the spikier landscapes and achieves near-linear scaling in CPU time (on the order of 30 s for DT–qEUBO versus substantially longer for the GP models), making it tractable for large datasets and mixed-type inputs (Leenders et al., 16 Dec 2025). For functions with low spikiness, GP and skewGP remain slightly superior, but the difference in regret is marginal.
5. Construction and Properties of Fortified (Spiky) Test Functions
Spikiness can be algorithmically induced in classical benchmarks by superposing narrow, high-amplitude Gaussian bumps at or near global optima. For instance, given the Branin–Hoo function $f(\mathbf{x})$, a fortified variant is
$$f_{\text{fort}}(\mathbf{x}) = f(\mathbf{x}) - a \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}^{*} \rVert^{2}}{2\sigma^{2}}\right),$$
where $\mathbf{x}^{*}$ is a minimizer and $a, \sigma > 0$ control the depth and width of the superposed bump (Jekel et al., 2019). This transforms the optimization landscape from one with three symmetric minima to one dominated by a deeper, localized well, leading to substantially lower success rates for standard optimizers (e.g., differential evolution). Empirically, achieving near-certain recovery of the spiked global optimum can require an order of magnitude more function evaluations than the original. Multiple independent short runs can restore much of the lost efficiency: if a single run succeeds with probability $p$, the aggregated success probability over $n$ independent runs is $1 - (1 - p)^{n}$, suggesting that for highly multimodal, spiky surfaces, restart strategies are preferable to exhaustive single-run search.
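A minimal sketch of both ideas follows: the fortified objective subtracts a narrow Gaussian well from Branin–Hoo at one of its minimizers, and the restart arithmetic shows how independent short runs compound a modest per-run success probability. The bump depth and width and the per-run probability $p$ are illustrative values, not those reported by Jekel et al.

```python
import numpy as np

def branin(x1, x2, a=1.0, b=5.1 / (4 * np.pi**2), c=5 / np.pi,
           r=6.0, s=10.0, t=1 / (8 * np.pi)):
    """Standard Branin-Hoo function with three symmetric global minima."""
    return a * (x2 - b * x1**2 + c * x1 - r)**2 + s * (1 - t) * np.cos(x1) + s

def fortified_branin(x1, x2, x_star=(-np.pi, 12.275), depth=5.0, width=0.2):
    """Branin-Hoo with a narrow Gaussian well subtracted at one minimizer,
    producing a single, much deeper and highly localized global optimum."""
    d2 = (x1 - x_star[0])**2 + (x2 - x_star[1])**2
    return branin(x1, x2) - depth * np.exp(-d2 / (2 * width**2))

print(branin(-np.pi, 12.275), fortified_branin(-np.pi, 12.275))  # ~0.398 vs ~-4.602

# Restart arithmetic: if a single short run recovers the spiked optimum with
# probability p, then n independent restarts succeed with probability 1 - (1 - p)**n.
p = 0.3   # illustrative per-run success probability
for n in (1, 5, 10):
    print(f"{n:2d} restarts -> success probability {1 - (1 - p)**n:.3f}")
```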
6. Practical Guidelines and Methodological Implications
- Tree-based surrogates are strongly recommended when sharp thresholds, discontinuities, or very narrow peaks are anticipated in the latent utility or objective function (Leenders et al., 16 Dec 2025).
- For smooth, moderately multimodal functions or when a stationary structure is present, GP models with Matérn or RBF kernels remain preferred.
- Categorical and mixed-type variable domains are naturally handled by tree partitioning, avoiding the need for custom composite GP kernels.
- Initial space coverage with random sampling (approximately $10$–$20$ pairs) is essential prior to sequential acquisition that maximizes expected utility (qEUBO, EI, or UCB); a schematic loop is sketched after this list.
- In applications with heterogeneous user cohorts or contexts, hierarchical trees (user–item decomposition) may further improve both performance and interpretability.
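The sketch below illustrates the initialization-then-acquisition pattern from the list above in simplified form. It is not the preferential DT–qEUBO pipeline of Leenders et al.: it queries scalar objective values rather than preference pairs, uses a random-forest surrogate whose across-tree spread serves as an uncertainty proxy, and applies a simple UCB rule in place of qEUBO; the objective, budget, and exploration weight are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def objective(x):
    """Illustrative spiky 1-D objective to maximize (narrow tall peak near x = 0.62)."""
    return np.sin(5 * x) + 4.0 * np.exp(-((x - 0.62)**2) / (2 * 0.01**2))

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 15).reshape(-1, 1)       # ~10-20 random initial evaluations
y = objective(X.ravel())

candidates = np.linspace(0, 1, 1001).reshape(-1, 1)
for _ in range(20):                            # sequential acquisition rounds
    forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    per_tree = np.stack([t.predict(candidates) for t in forest.estimators_])
    mean, std = per_tree.mean(axis=0), per_tree.std(axis=0)
    ucb = mean + 2.0 * std                     # UCB-style acquisition (maximization)
    x_next = candidates[np.argmax(ucb)]
    X = np.vstack([X, [x_next]])
    y = np.append(y, objective(x_next[0]))

print("best x found:", X[np.argmax(y)].item(), "best value:", y.max())
```

Whether the narrow peak is located within the budget depends on the initial design and the exploration weight; the point of the sketch is the structure of the loop, not its tuning.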
7. Broader Context and Theoretical Considerations
The failure of GP and other smooth surrogate models on spiky functions signifies a fundamental limitation of stationary Bayesian nonparametrics when modeling non-smooth or multimodal targets. Piecewise-constant, interpretable surrogates (e.g., decision trees with probabilistic leaves) provide robust alternatives by sidestepping hand-crafted kernel design, offering both theoretical guarantees and empirical dominance in the spiky regime (Leenders et al., 16 Dec 2025). Stochastic global optimization on fortified test functions reveals the necessity of replication and ensemble strategies—a general insight for optimizing multi-modal or piecewise non-smooth objectives (Jekel et al., 2019). This suggests that, for spiky objective landscapes encountered in engineering, design-of-experiments, or preference elicitation, specialized algorithmic and modeling approaches are not optional, but required for sample-efficient and reliable global optimization.