Runtime phylogenetic analysis enables extreme subsampling for test-based problems (2402.01610v1)
Abstract: A phylogeny describes the evolutionary history of an evolving population. Evolutionary search algorithms can perfectly track the ancestry of candidate solutions, illuminating a population's trajectory through the search space. However, phylogenetic analyses are typically limited to post-hoc studies of search performance. We introduce phylogeny-informed subsampling, a new class of subsampling methods that exploit runtime phylogenetic analyses for solving test-based problems. Specifically, we assess two phylogeny-informed subsampling methods (individualized random subsampling and ancestor-based subsampling) on three diagnostic problems and ten genetic programming (GP) problems from program synthesis benchmark suites. Overall, we found that phylogeny-informed subsampling methods enable problem-solving success at extreme subsampling levels where other subsampling methods fail. For example, phylogeny-informed subsampling methods more reliably solved program synthesis problems when evaluating just one training case per individual, per generation. However, at moderate subsampling levels, phylogeny-informed subsampling generally performed no better than random subsampling on GP problems. Our diagnostic experiments show that phylogeny-informed subsampling improves diversity maintenance relative to random subsampling, but its effect on the capacity to rapidly exploit fitness gradients varied by selection scheme. Continued refinements of phylogeny-informed subsampling techniques offer a promising new direction for scaling up evolutionary systems to handle problems with many expensive-to-evaluate fitness criteria.
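To make the idea of individualized random subsampling more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how scores on unevaluated training cases are filled in, so the sketch assumes they are borrowed from the nearest ancestor that was evaluated on that case. The names Individual, evaluate_subsample, estimate, and max_depth are hypothetical, and the phylogeny is simplified to a single parent pointer per individual.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Individual:
    """Hypothetical candidate solution that remembers its parent (asexual lineage)."""
    genome: list[int]
    parent: Optional["Individual"] = None
    # Maps training-case index -> score actually measured on this individual.
    scores: dict[int, float] = field(default_factory=dict)


def evaluate_subsample(
    ind: Individual,
    cases: list[Callable[[list[int]], float]],
    sample_size: int,
    rng: random.Random,
) -> None:
    """Evaluate `ind` on its own random subset of training cases."""
    for i in rng.sample(range(len(cases)), sample_size):
        ind.scores[i] = cases[i](ind.genome)


def estimate(ind: Individual, case_index: int, max_depth: int = 5) -> Optional[float]:
    """Assumed estimation rule: walk up the lineage (at most `max_depth` steps)
    and return the first measured score for the case, or None if none is found."""
    node, depth = ind, 0
    while node is not None and depth <= max_depth:
        if case_index in node.scores:
            return node.scores[case_index]
        node, depth = node.parent, depth + 1
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy training cases (assumption): case i simply reads position i of the genome.
    cases = [lambda g, i=i: float(g[i]) for i in range(4)]
    lineage = Individual(genome=[1, 2, 3, 4])
    for _ in range(3):
        evaluate_subsample(lineage, cases, sample_size=1, rng=rng)
        lineage = Individual(genome=list(lineage.genome), parent=lineage)
    evaluate_subsample(lineage, cases, sample_size=1, rng=rng)
    print([estimate(lineage, i) for i in range(len(cases))])
```

With sample_size=1, each individual pays for a single evaluation per generation, and the remaining scores come from its lineage where available. Cases that no recent ancestor was evaluated on stay unknown (None here), and how a selection scheme should treat them is a separate design choice that this sketch leaves open.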