Recent Trends in the Use of Statistical Tests for Comparing Swarm and Evolutionary Computing Algorithms: Practical Guidelines and a Critical Review (2002.09227v1)

Published 21 Feb 2020 in cs.NE and stat.ME

Abstract: A key aspect of the design of evolutionary and swarm intelligence algorithms is studying their performance. Statistical comparisons are also a crucial part which allows for reliable conclusions to be drawn. In the present paper we gather and examine the approaches taken from different perspectives to summarise the assumptions made by these statistical tests, the conclusions reached and the steps followed to perform them correctly. In this paper, we conduct a survey on the current trends of the proposals of statistical analyses for the comparison of algorithms of computational intelligence and include a description of the statistical background of these tests. We illustrate the use of the most common tests in the context of the Competition on single-objective real parameter optimisation of the IEEE Congress on Evolutionary Computation (CEC) 2017 and describe the main advantages and drawbacks of the use of each kind of test and put forward some recommendations concerning their use.

Citations (374)

Summary

  • The paper reviews and critically evaluates recent trends in statistical tests used for comparing swarm and evolutionary computing algorithms, covering parametric, non-parametric, and Bayesian methods.
  • It illustrates the application and limitations of these statistical tests using a case study with data from the CEC 2017 competition.
  • The research provides practical guidelines for researchers and advocates for a conscientious selection and integration of statistical methods to improve algorithm benchmarking and reproducibility.

A Critical Examination of Statistical Tests in Swarm and Evolutionary Algorithm Comparisons

The paper "Recent Trends in the Use of Statistical Tests for Comparing Swarm and Evolutionary Computing Algorithms: Practical Guidelines and a Critical Review" systematically evaluates the statistical methodologies used for benchmarking swarm and evolutionary algorithms. These algorithms have been crucial in solving complex optimization problems. However, their efficacy is determined not just by the innovative algorithms themselves but by robust statistical evaluation methods that validate performance differences in a reliable manner.

Statistical Testing Paradigms

Over the years, the statistical analysis of swarm and evolutionary algorithms has shifted from classical parametric methods towards non-parametric and, more recently, Bayesian techniques. The paper scrutinizes:

  1. Parametric Methods: Classical frequentist tests, such as the t-test and ANOVA, are foundational but assume normality and homoscedasticity, assumptions that may not hold across diverse optimization problem scenarios. Despite their simplicity and ease of interpretation, these rigid assumptions often limit their applicability in evolutionary computation experiments (a minimal sketch contrasting them with their non-parametric counterparts appears after this list).
  2. Non-Parametric Tests: As a response to the limitations of parametric tests, non-parametric approaches like the Wilcoxon signed-rank and Friedman tests gained prominence. These tests do not assume normality, making them suitable for the varied nature of algorithm performance data. However, they mainly address the ranking of performance rather than quantifying effect sizes.
  3. Bayesian Approaches: The Bayesian framework provides flexibility, allowing for the incorporation of prior knowledge and a probabilistic interpretation of results, which mitigates some limitations of frequentist methods, such as the binary hypothesis decision framework and sensitivity to sample size. Bayesian methods, including the Bayesian Wilcoxon signed-rank test and Bayesian Friedman test, offer richer insights into performance differences.
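
To make the contrast between these families of tests concrete, the sketch below applies a paired t-test, a Wilcoxon signed-rank test, and a Friedman test to synthetic per-function error data with SciPy. The algorithms, error values, and sample sizes are illustrative placeholders, not results from the paper.

```python
# Minimal sketch: classical tests for paired algorithm results using SciPy.
# The result vectors below are synthetic placeholders, not data from the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_functions = 30  # e.g. one entry per benchmark function

# Mean errors of three hypothetical algorithms on each benchmark function.
errors_a = rng.lognormal(mean=0.0, sigma=1.0, size=n_functions)
errors_b = errors_a * rng.lognormal(mean=0.1, sigma=0.2, size=n_functions)
errors_c = errors_a * rng.lognormal(mean=0.3, sigma=0.2, size=n_functions)

# 1. Parametric: paired t-test (assumes normally distributed differences).
t_stat, t_p = stats.ttest_rel(errors_a, errors_b)

# 2. Non-parametric: Wilcoxon signed-rank test (no normality assumption).
w_stat, w_p = stats.wilcoxon(errors_a, errors_b)

# 3. Non-parametric, multiple algorithms: Friedman test on the paired results.
f_stat, f_p = stats.friedmanchisquare(errors_a, errors_b, errors_c)

print(f"paired t-test:        p = {t_p:.4f}")
print(f"Wilcoxon signed-rank: p = {w_p:.4f}")
print(f"Friedman:             p = {f_p:.4f}")
```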

Case Study: CEC 2017 Competition

The paper illustrates these statistical methods using data from the competition on single-objective real-parameter optimisation of the 2017 IEEE Congress on Evolutionary Computation. This context highlights the intricacies involved in selecting appropriate statistical tools for benchmarking algorithms. The study's statistical evaluations showed that while EBOwithCMAR, jSO, and the LSHADE variants consistently performed well, examining the same results through different statistical lenses revealed varying degrees of certainty about those conclusions.
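
As a flavour of the Bayesian style of analysis the paper surveys, the following sketch implements a simple Bayesian sign test with a region of practical equivalence (rope): given paired per-function errors, it estimates the posterior probability that one algorithm is better, that the other is better, or that the two are practically equivalent. The data, rope width, and prior pseudo-counts are illustrative assumptions, not the exact procedure used in the paper's case study.

```python
# Minimal sketch of a Bayesian sign test with a region of practical
# equivalence (rope). All data and parameters are illustrative assumptions.
import numpy as np

def bayesian_sign_test(errors_a, errors_b, rope=0.01, samples=50_000, seed=0):
    """Return P(A better), P(practically equivalent), P(B better)."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(errors_b) - np.asarray(errors_a)  # > 0 means A has lower error
    counts = np.array([
        np.sum(diff > rope),           # A better than B beyond the rope
        np.sum(np.abs(diff) <= rope),  # practically equivalent
        np.sum(diff < -rope),          # B better than A beyond the rope
    ])
    # Dirichlet posterior over the three outcome probabilities
    # (weak prior of one pseudo-count per region).
    posterior = rng.dirichlet(counts + 1, size=samples)
    # Fraction of posterior samples in which each region has the largest weight.
    return np.bincount(posterior.argmax(axis=1), minlength=3) / samples

rng = np.random.default_rng(1)
errors_a = rng.lognormal(0.0, 1.0, size=30)          # hypothetical per-function errors
errors_b = errors_a * rng.lognormal(0.05, 0.2, 30)   # a slightly worse competitor

p_a, p_rope, p_b = bayesian_sign_test(errors_a, errors_b)
print(f"P(A better) = {p_a:.3f}, P(equivalent) = {p_rope:.3f}, P(B better) = {p_b:.3f}")
```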

Implications and Future Directions

The research underlines the need for an integrated statistical toolbox that combines the simplicity and familiarity of frequentist tests with the richer probabilistic statements of Bayesian analysis, ensuring reproducible and appropriate analyses across varying experimental setups. While Bayesian approaches provide more comprehensive insights, their complexity and computational demands pose challenges for widespread adoption.

Going forward, the evolution of statistical methodologies for algorithm benchmarking will likely entail more sophisticated models that account for multi-objective problems and dynamic benchmarks. Reporting effect sizes, interpreting results carefully, and presenting statistical outcomes transparently are aspects researchers must emphasize to avoid misinterpreting algorithm efficacy.
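
As one concrete way to report effect size alongside p-values, the sketch below computes the Vargha-Delaney A12 statistic, a non-parametric effect-size measure commonly used when comparing stochastic algorithms; the run data are illustrative placeholders rather than results from the paper.

```python
# Minimal sketch of the Vargha-Delaney A12 effect size.
# The run data below are illustrative placeholders, not results from the paper.
import numpy as np

def vargha_delaney_a12(x, y):
    """Estimate P(X > Y) + 0.5 * P(X == Y) from two independent samples."""
    x, y = np.asarray(x), np.asarray(y)
    greater = (x[:, None] > y[None, :]).sum()
    ties = (x[:, None] == y[None, :]).sum()
    return (greater + 0.5 * ties) / (x.size * y.size)

rng = np.random.default_rng(2)
fitness_a = rng.normal(10.0, 1.0, size=25)  # e.g. 25 independent runs per algorithm
fitness_b = rng.normal(9.0, 1.0, size=25)

a12 = vargha_delaney_a12(fitness_a, fitness_b)
print(f"A12 = {a12:.3f}  (0.5 = no effect; values near 0 or 1 = large effect)")
```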

In summary, the review and guidelines proposed in this paper serve as an essential reference for computational intelligence researchers, advocating for a conscientious selection of statistical methods that align with the challenges presented by swarm and evolutionary algorithms. This alignment is vital for advancing algorithm development and ensuring that the technologies are evaluated in a manner that accurately reflects their potential and limitations in solving real-world problems.
