Comparative study of regression vs pairwise models for surrogate-based heuristic optimisation (2410.03409v1)

Published 4 Oct 2024 in cs.NE and cs.AI

Abstract: Heuristic optimisation algorithms explore the search space by sampling solutions, evaluating their fitness, and biasing the search in the direction of promising solutions. However, in many cases, this fitness function involves executing expensive computational calculations, drastically reducing the reasonable number of evaluations. In this context, surrogate models have emerged as an excellent alternative to alleviate these computational problems. This paper addresses the formulation of surrogate problems as both regression models that approximate fitness (surface surrogate models) and a novel way to connect classification models (pairwise surrogate models). The pairwise approach can be directly exploited by some algorithms, such as Differential Evolution, in which the fitness value is not actually needed to drive the search, and it is sufficient to know whether a solution is better than another one or not. Based on these modelling approaches, we have conducted a multidimensional analysis of surrogate models under different configurations: different machine learning algorithms (regularised regression, neural networks, decision trees, boosting methods, and random forests), different surrogate strategies (encouraging diversity or relaxing prediction thresholds), and compare them for both surface and pairwise surrogate models. The experimental part of the article includes the benchmark problems already proposed for the SOCO2011 competition in continuous optimisation and a simulation problem included in the recent GECCO2021 Industrial Challenge. This paper shows that the performance of the overall search, when using online machine learning-based surrogate models, depends not only on the accuracy of the predictive model but also on both the kind of bias towards positive or negative cases and how the optimisation uses those predictions to decide whether to execute the actual fitness function.

Citations (10)

Summary

  • The paper compares regression (surface) vs pairwise (classifier) models for surrogate-based heuristic optimisation, analyzing various ML algorithms and strategies.
  • Decision Tree Classifiers (DT/C) as pairwise surrogates performed well, often outperforming traditional methods like Kriging and significantly reducing computational cost.
  • Key findings show that encouraging search space diversity is crucial and that standard ML accuracy metrics don't always correlate directly with optimisation performance.

The paper "Comparative study of regression vs pairwise models for surrogate-based heuristic optimisation" (2410.03409) presents a comprehensive analysis of using regression models (surface surrogates) versus pairwise models (classifier surrogates) within the context of surrogate-based heuristic optimisation. The study explores various machine-learning algorithms, surrogate strategies, and their impact on optimisation performance, using benchmark problems from the SOCO2011 competition and the GECCO2021 Industrial Challenge.

Regression Models (Surface Surrogates) in Detail

Regression models directly approximate the fitness function, predicting a continuous value representing the estimated fitness of a given solution. Algorithm 1 (from the paper) describes a baseline algorithm employing a regression-based surrogate to model the quality function's surface.
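The baseline loop can be sketched roughly as follows. This is a minimal illustration, not a reproduction of the paper's Algorithm 1: the surrogate here is a toy nearest-neighbour predictor, and the candidate generator, acceptance rule (evaluate only when the prediction beats the archive median), and budget are placeholder assumptions.

```python
import random
import statistics

def expensive_fitness(x):
    """Stand-in for the costly computation; here a simple sphere function."""
    return sum(v * v for v in x)

def nn_surrogate(x, archive):
    """Toy regression surrogate: predict the fitness of the nearest evaluated point."""
    _, fit = min(archive, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return fit

def surrogate_assisted_search(dim=3, candidates=200, seed=0):
    rng = random.Random(seed)
    sample = lambda: [rng.uniform(-5, 5) for _ in range(dim)]
    # Seed the archive with a few truly evaluated solutions.
    archive = [(x, expensive_fitness(x)) for x in (sample() for _ in range(5))]
    true_evals = len(archive)
    best = min(f for _, f in archive)
    for _ in range(candidates):
        x = sample()
        # Only pay for the real evaluation if the surrogate predicts a
        # reasonably good fitness (below the median of evaluated points).
        threshold = statistics.median(f for _, f in archive)
        if nn_surrogate(x, archive) < threshold:
            f = expensive_fitness(x)
            true_evals += 1
            archive.append((x, f))
            best = min(best, f)
    return best, true_evals
```

In this sketch the surrogate filters out many candidates, so the number of true evaluations stays below the number of candidates generated; this is the entire point of the surrogate-assisted setup.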

Strengths

Regression models offer a direct estimate of fitness, valuable for optimisation algorithms relying on absolute fitness values. They are well-established in surrogate-based optimisation.

Weaknesses

Accurately modeling complex or high-dimensional fitness landscapes may require substantial data. The accuracy of fitness prediction is crucial; overestimation or underestimation can lead to stagnation or inefficient exploration. Errors in approximating the fitness function can disproportionately affect overall algorithm performance.

Impact of ML Algorithm Choice

  • Regularised Regression (Ridge): Computationally efficient but may struggle with highly non-linear fitness functions. The paper implements L2 regularisation.
  • Neural Networks (MLP): Can model complex non-linear relationships but require careful tuning and are prone to overfitting, especially with limited data. The paper uses rectified linear unit (ReLU) activations and trains the network with a stochastic gradient-based solver.
  • Decision Trees: Non-parametric, capturing non-linearities, but prone to overfitting and instability. Feature selection uses variance reduction and L2 loss minimisation for splitting.
  • Boosting Methods (XGBoost): Strong predictive performance, robust to overfitting, and can handle high-dimensional data but are computationally expensive to train. The paper uses a parallel tree gradient booster ensemble.
  • Random Forests: Ensemble of decision trees, reduces overfitting and improves stability, offering a balance between accuracy and computational cost.

Surrogate Strategies

The paper investigates strategies like encouraging diversity and relaxing prediction thresholds.

  • Encouraging Diversity (Candidate Diversity Strategy): Promotes exploration by selecting solutions distant from previously evaluated solutions, preventing premature convergence but potentially introducing less promising solutions.
  • Relaxing Prediction Thresholds (Probability-based Strategy, Quality Distance Strategy): Increases the probability of evaluating solutions predicted by the surrogate as not improving the current best, aiding escape from local optima and refining the surrogate model.
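The two strategies can be combined into a single evaluate-or-discard decision rule, sketched below. This is a hypothetical formulation, not the paper's exact one: the diversity radius and relaxation probability are illustrative parameters.

```python
import math
import random

def min_distance(x, evaluated):
    """Distance from x to the closest already-evaluated solution."""
    return min(math.dist(x, y) for y in evaluated)

def should_evaluate(pred_improves, x, evaluated,
                    diversity_radius=2.0, relax_prob=0.1, rng=random):
    """Decide whether to run the expensive fitness function on x.

    - pred_improves: the surrogate's verdict (True if x is predicted to improve).
    - Diversity rule: always evaluate solutions far from everything seen so far.
    - Relaxation rule: occasionally evaluate even 'rejected' candidates, which
      helps escape local optima and refreshes the surrogate's training data.
    """
    if pred_improves:
        return True
    if min_distance(x, evaluated) > diversity_radius:   # candidate diversity
        return True
    return rng.random() < relax_prob                    # threshold relaxation
```

Note that the relaxation rule makes the filter stochastic: a small fraction of "bad" predictions still get a real evaluation, which both corrects an overly pessimistic surrogate and keeps its training set growing.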

When to Prefer Regression Models

Regression models are preferred when the optimisation algorithm relies on absolute fitness values or when a direct approximation of the fitness landscape is desired, and when the available computational budget allows for more complex machine learning models.

Pairwise Models (Classifier Surrogates) in Detail

Pairwise models reformulate the surrogate problem as a binary classification task, predicting whether one solution is better than another. Algorithm 3 (from the paper) provides the pairwise surrogate estimation for deciding whether to evaluate or discard a solution.

Strengths

These models can be directly exploited by optimisation algorithms like Differential Evolution (DE), where only the relative ranking of solutions is needed. They are potentially more efficient than regression models when the primary goal is to guide the search direction, and they can be trained on up to N × N data points by forming all pairwise combinations of N samples. They may therefore extract more value from a small number of expensive evaluations.
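The pairwise reformulation and its use in DE-style selection can be sketched as follows. The feature encoding (concatenating the two solutions) and the strict-improvement label are illustrative assumptions, not the paper's exact design.

```python
from itertools import permutations

def pairwise_dataset(samples):
    """Turn N evaluated solutions into up to N*(N-1) classification examples.

    samples: list of (solution_vector, fitness) pairs.
    Each example concatenates two solutions; the label says whether the
    first one is strictly better (lower fitness, for minimisation).
    """
    X, y = [], []
    for (xa, fa), (xb, fb) in permutations(samples, 2):
        X.append(list(xa) + list(xb))
        y.append(fa < fb)
    return X, y

def de_selection(target, trial, better):
    """DE survivor selection needs only a comparison, never a fitness value."""
    return trial if better(trial, target) else target
```

The `de_selection` helper makes the point from the text concrete: any predicate `better(a, b)`, including a trained pairwise classifier, is enough to drive DE's selection step.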

Weaknesses

Pairwise models are not suitable for every optimisation algorithm; in particular, they cannot serve algorithms that require absolute fitness values. Performance depends on the classifier's ability to discriminate accurately between better and worse solutions. Moreover, the training dataset can grow quadratically when all pairwise combinations are formed, increasing the computational cost.

Impact of ML Algorithm Choice

  • Regularised Regression (Ridge): Can be used for classification with an acceptance threshold. Simple and efficient.
  • Neural Networks (MLP): Can learn complex decision boundaries for classification.
  • Decision Trees: Effective for classification, particularly when the relationship between solutions can be expressed through a set of rules.
  • Boosting Methods (XGBoost): Strong classification performance, robust to overfitting.
  • Random Forests: Ensemble of decision trees, improves classification accuracy and stability.

Surrogate Strategies

The strategies are the same as for regression models (encouraging diversity, relaxing thresholds), but their impact might differ due to the different nature of the surrogate model.

When to Prefer Pairwise Models

Pairwise models are preferred when the optimisation algorithm primarily relies on comparing solutions (e.g., DE, local search) and when computational resources are limited.

Key Differences and Considerations

Regression models address a regression problem, while pairwise models address a classification problem. Pairwise models can potentially leverage data more efficiently through pairwise comparisons but are more suitable for algorithms relying on solution comparisons. The computational cost depends on the chosen machine learning algorithm and the dataset size; pairwise models can suffer from a quickly scaling number of combinations.
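The contrast above amounts to two different surrogate interfaces, sketched below; the type names are illustrative, not from the paper.

```python
from typing import Callable, Sequence

Solution = Sequence[float]

# Surface surrogate: approximates fitness directly.
SurfaceSurrogate = Callable[[Solution], float]

# Pairwise surrogate: only answers "is a better than b?".
PairwiseSurrogate = Callable[[Solution, Solution], bool]

def as_pairwise(f_hat: SurfaceSurrogate) -> PairwiseSurrogate:
    """Any regression surrogate induces a comparator, but not vice versa:
    a pairwise model never commits to absolute fitness values."""
    return lambda a, b: f_hat(a) < f_hat(b)
```

The asymmetry captured by `as_pairwise` is why regression surrogates serve any algorithm while pairwise surrogates are restricted to comparison-driven ones such as DE.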

Experimental Results and Conclusions

The Decision Tree Classifier (DT/C) stands out as a binary classifier in the pairwise surrogate setting, often outperforming other models thanks to its conservative filtering behaviour, which lets the optimisation algorithm run for more generations within the same budget. The "Candidate Diversity Strategy" generally improves performance, suggesting that exploring diverse regions of the search space is crucial. Standard machine learning accuracy metrics do not always translate into better optimisation performance: a surrogate's ability to guide the search effectively matters more than raw predictive accuracy. Notably, the DT/C pairwise surrogate even outperforms traditional Kriging models while reducing computational resource usage by more than 50% and improving results.

In conclusion, the choice between regression and pairwise models depends on the specific optimisation algorithm, the characteristics of the fitness landscape, and the available computational resources. Pairwise models, particularly when combined with appropriate machine learning algorithms (like Decision Trees) and surrogate strategies (like encouraging diversity), can be a powerful and efficient approach for surrogate-based heuristic optimisation, although their scalability limitations should be kept in mind.
