Characterising harmful data sources when constructing multi-fidelity surrogate models (2403.08118v1)
Abstract: Surrogate modelling techniques have seen growing attention in recent years when applied to both the modelling and the optimisation of industrial design problems. These techniques are highly relevant when assessing the performance of a particular design carries a high cost, as the overall cost can be mitigated by constructing a model to be queried in lieu of the available high-cost source. The construction of these models can sometimes employ other sources of information which are both cheaper and less accurate. The existence of these sources, however, raises the question of which sources should be used when constructing a model. Recent studies have attempted to characterise harmful data sources to guide practitioners in choosing when to ignore a certain source. These studies have done so in a synthetic setting, characterising sources using a large amount of data that is not available in practice. Some of these studies have also been shown to potentially suffer from bias in the benchmarks used in the analysis. In this study, we present a characterisation of harmful low-fidelity sources using only the limited data available to train a surrogate model. We employ recently developed benchmark filtering techniques to conduct a bias-free assessment, providing objectively varied benchmark suites of different sizes for future research. Analysing one of these benchmark suites with the technique known as Instance Space Analysis, we provide an intuitive visualisation of when a low-fidelity source should be used, and we use this analysis to provide guidelines that can be applied in an industrial setting.