- The paper introduces a graph-based approach leveraging message-passing GNNs to capture high-order relationships between algorithm modules and performance.
- It demonstrates significant MSE reductions—up to 36.6% for CMA-ES and 24% for modDE—over traditional ELA-based models on benchmark functions.
- The framework’s heterogeneous graph design enables inductive prediction and scalable, interpretable automation in algorithm selection and configuration.
Geometric Learning for Black-Box Optimization: A Graph Neural Network Approach to Algorithm Performance Prediction
This work introduces a graph-based geometric learning framework for predicting the performance of modular black-box optimization algorithms. It presents a shift from traditional tabular methods—most notably those leveraging Exploratory Landscape Analysis (ELA) features—by modeling the interdependencies between problems, algorithms, algorithmic modules, parameters, and performance as a heterogeneous graph. By employing graph neural networks (GNNs) on this structure, the framework aims to account for the high-order relationships inherently present in algorithm configuration and performance modeling.
Problem Formulation and Motivation
Contemporary approaches to performance modeling in numerical black-box optimization predominantly rely on representing each problem as a vector of features (e.g., ELA descriptors) and training a regression or classification model per algorithm or configuration. These approaches, while effective for capturing algorithm-performance landscapes, fundamentally disregard the structural relations between the various entities: algorithmic modules, parameterizations, and the problems themselves.
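As a point of contrast, here is a minimal sketch of the kind of tabular, per-configuration model this paragraph describes: a Random Forest regressor fitted on per-problem ELA feature vectors, one model per algorithm configuration. The array sizes and values are random placeholders, not the paper's data.

```python
# Hypothetical sketch of the tabular ELA baseline: one regressor per configuration,
# relations between algorithms, modules, and problems are ignored entirely.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_instances, n_ela_features = 120, 46                     # 46 ELA features, as used later in the text
X_ela = rng.normal(size=(n_instances, n_ela_features))    # placeholder ELA descriptors per problem instance
y_perf = rng.normal(size=n_instances)                     # placeholder performance targets

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_ela, y_perf)
predicted = rf.predict(X_ela[:5])
```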
The modularization of key evolutionary algorithms (CMA-ES and Differential Evolution, via the modCMA-ES and modDE frameworks) provides a basis for studying not just whole algorithms but the contributions of their constituent parts. Encoding this modular algorithm space, along with detailed problem characteristics and measured performance outcomes, in a relational graph enables models to learn richer, context-dependent representations and potentially to generalize better across unseen configurations or problem instances.
Heterogeneous Graph Representation
The framework constructs a heterogeneous graph comprising:
- Six node types: parameter, parameter class, algorithm execution part, algorithm, performance, and black-box optimization problem.
- Five relation types: has-parameter, has-parameter-class, controls-algorithm-execution-part, has-algorithm, has-problem.
Each node carries a feature vector (e.g., ELA features for problem nodes), and each edge encodes a semantic relation between node types (e.g., which parameters belong to which modules). Reverse edges are added for every relation so that messages can propagate in both directions, effectively turning the originally directed graph into a bidirectional one for learning.
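The following is a minimal sketch, assuming DGL, of how such a heterogeneous graph might be assembled; the node indices, edge directions, and feature widths are illustrative toy choices, not the paper's actual data pipeline.

```python
# Illustrative construction of the heterogeneous graph (toy sizes, hypothetical indices).
import dgl
import torch

# Forward relations; each (src_ids, dst_ids) pair defines the edges of one relation type.
forward = {
    ("algorithm", "has-parameter", "parameter"):
        (torch.tensor([0, 0]), torch.tensor([0, 1])),
    ("parameter", "has-parameter-class", "parameter_class"):
        (torch.tensor([0, 1]), torch.tensor([0, 0])),
    ("parameter", "controls-algorithm-execution-part", "algorithm_execution_part"):
        (torch.tensor([0, 1]), torch.tensor([0, 1])),
    ("performance", "has-algorithm", "algorithm"):
        (torch.tensor([0]), torch.tensor([0])),
    ("performance", "has-problem", "problem"):
        (torch.tensor([0]), torch.tensor([0])),
}

# Add a reverse relation for every forward one so messages can flow in both directions.
data_dict = dict(forward)
for (src_t, rel, dst_t), (u, v) in forward.items():
    data_dict[(dst_t, f"rev-{rel}", src_t)] = (v, u)

g = dgl.heterograph(data_dict)

# Problem nodes carry the 46 ELA features; other node types start from random embeddings
# (a common feature width of 46 is assumed here purely for simplicity).
g.nodes["problem"].data["feat"] = torch.randn(g.num_nodes("problem"), 46)
for ntype in g.ntypes:
    if ntype != "problem":
        g.nodes[ntype].data["feat"] = torch.randn(g.num_nodes(ntype), 46)
```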
Each unique combination of problem dimensionality, evaluation budget, and algorithm type yields a distinct graph instance, capturing the specificity required for robust performance prediction across a wide range of operating conditions.
GNN Architecture and Training
A message-passing GNN is constructed using the Deep Graph Library (DGL), based on GraphSAGE for inductive node representation learning. The architectural workflow is as follows:
- Layer Stacking: Multiple GNN layers permit information propagation over increasing neighborhood hops, integrating local and global dependency patterns.
- Relation-specific Aggregation: Within each layer, messages are aggregated for each relation type separately, respecting the heterogeneous nature of the graph.
- Inter-relation Aggregation: Aggregated relation-specific representations are fused (summed) per node to update embeddings.
- Prediction Head: Final performance node embeddings are passed through a linear regression head to predict numerical performance outcomes (e.g., function-optimization precision).
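A minimal sketch of such a relation-aware GraphSAGE stack with summed inter-relation aggregation and a linear regression head, written with DGL's `HeteroGraphConv` and `SAGEConv`. The class name, two-layer depth, hidden width, dropout placement, and the assumption that all node features share one input dimension are illustrative choices, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import dgl.nn as dglnn

class HeteroSAGERegressor(nn.Module):
    """Sketch: relation-aware GraphSAGE layers plus a linear regression head."""

    def __init__(self, rel_names, in_dim, hid_dim, dropout=0.2):
        super().__init__()
        # One SAGEConv per relation type; HeteroGraphConv sums the per-relation
        # messages arriving at each node (inter-relation aggregation by summation).
        self.conv1 = dglnn.HeteroGraphConv(
            {rel: dglnn.SAGEConv(in_dim, hid_dim, "mean") for rel in rel_names},
            aggregate="sum")
        self.conv2 = dglnn.HeteroGraphConv(
            {rel: dglnn.SAGEConv(hid_dim, hid_dim, "mean") for rel in rel_names},
            aggregate="sum")
        self.dropout = nn.Dropout(dropout)
        self.head = nn.Linear(hid_dim, 1)  # regression head for performance nodes

    def forward(self, g, feats):
        # Two stacked layers propagate information over two neighborhood hops.
        h = self.conv1(g, feats)
        h = {k: self.dropout(torch.relu(v)) for k, v in h.items()}
        h = self.conv2(g, h)
        # Only the performance-node embeddings are scored by the linear head.
        return self.head(h["performance"]).squeeze(-1)
```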
Model training uses an L1 loss, Adam optimizer, and nested cross-validation for hyperparameter tuning (dropout rates, embedding sizes). Problem nodes use 46 ELA features, while other nodes are initialized randomly. The protocol is strictly leave-instance-out, ensuring assessment of the model’s ability to generalize to unseen problem instances.
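Continuing the sketches above, a bare-bones training loop with L1 loss and Adam. The feature and target tensors are random placeholders; the actual protocol wraps such a loop in nested cross-validation with leave-instance-out splits, which is omitted here.

```python
# Full-graph training sketch on the toy heterograph `g` built earlier.
feats = {ntype: g.nodes[ntype].data["feat"] for ntype in g.ntypes}
targets = torch.randn(g.num_nodes("performance"))   # placeholder performance labels

model = HeteroSAGERegressor(rel_names=g.etypes, in_dim=46, hid_dim=64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

for epoch in range(200):
    pred = model(g, feats)            # one prediction per performance node
    loss = loss_fn(pred, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```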
Experimental Results
Experiments utilize 324 modCMA-ES and 576 modDE algorithm variants, evaluated on the 24 BBOB benchmark functions at two problem dimensions (D=5 and D=30) and across six evaluation budgets. The model is compared against Random Forest (RF) regressors trained on ELA features as the tabular baseline.
Key numerical findings:
- The heterogeneous GNN outperforms the tabular RF baseline in nearly all settings, with up to a 36.6% reduction in MSE for CMA-ES at $30D$ with budget $1000D$.
- For modDE, the GNN achieves up to 24% lower MSE in the $30D$, budget-$100D$ setting.
- Gains are most prominent for higher-dimensional problems and mid-to-large budgets.
- The approach demonstrates clear utility in modeling the impact of algorithmic configuration, with results supporting the claim that explicit relational structure provides measurable predictive benefits over feature-only models.
Table: Representative MSE comparisons (each cell reports GNN / RF; lower is better)

| Budget  | CMA-ES 5D   | CMA-ES 30D  | DE 5D       | DE 30D      |
| ------- | ----------- | ----------- | ----------- | ----------- |
| $50D$   | 0.75 / 0.78 | 0.15 / 0.15 | 0.36 / 0.37 | 0.21 / 0.26 |
| $100D$  | 1.16 / 1.22 | 0.19 / 0.27 | 0.39 / 0.43 | 0.19 / 0.25 |
| $1000D$ | 4.38 / 5.22 | 1.09 / 1.72 | 2.08 / 1.95 | 0.49 / 0.54 |
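The headline reductions can be read directly off the table: for CMA-ES at $30D$ with budget $1000D$, the relative MSE reduction is $(1.72 - 1.09)/1.72 \approx 36.6\%$; for modDE at $30D$ with budget $100D$, it is $(0.25 - 0.19)/0.25 = 24\%$.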
Theoretical and Practical Implications
Theoretical implications:
- Message-passing GNNs, when constructed over a semantically meaningful heterogeneous graph of algorithms, modules, parameters, and problems, capture relational inductive biases that conventional feature-vector approaches neglect.
- The approach supports not only transductive but also inductive performance prediction, allowing the model to extrapolate to previously unseen algorithm configurations and problem instances.
- The modular graph representation provides a framework for future advances in explainability, as it is amenable to attribution and interpretability methods (e.g., GNNExplainer).
Practical implications:
- Directly applicable to meta-learning tasks such as automated algorithm selection, algorithm configuration, and hyperparameter transfer via performance surrogates.
- Offers a scalable approach: new modules, parameterizations, or problems can be added with only local graph augmentation and model retraining.
- The general framework is extensible to any domain where algorithmic configurations and their interdependencies play a critical performance role (e.g., neural architecture search).
Future Directions
The authors propose several directions for enhancement:
- Architectural Innovation: Testing more expressive GNN variants (graph attention networks, graph transformers) to further improve predictive capabilities.
- Explainability: Deploying techniques such as GNNExplainer to identify which nodes, relations, and features most influence predictions, potentially guiding algorithm design.
- Knowledge Transfer: Exploring pretraining and fine-tuning opportunities to improve generalization.
- Extending Beyond Modular Frameworks: Adapting the approach to less-structured algorithms will require community standards for algorithmic module vocabularies and semantic relations.
The work represents a formal integration of geometric machine learning into optimization algorithm meta-learning, offering clear evidence that heterogeneous graph-based modeling is beneficial for performance surrogate modeling. This paradigm is adaptable and extensible, and is likely to influence the design of future algorithm selection and configuration systems in black-box optimization and related settings.