Informed Down-Sampled Lexicase Selection: Identifying productive training cases for efficient problem solving (2301.01488v2)
Abstract: Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection. Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases, allowing more individuals to be explored with the same number of program executions. However, creating a down-sample randomly might exclude important cases from the current down-sample for a number of generations, while cases that measure the same behavior (synonymous cases) may be overused despite their redundancy. In this work, we introduce Informed Down-Sampled Lexicase Selection. This method leverages population statistics to build down-samples that contain more distinct and therefore more informative training cases. Through an empirical investigation across two different GP systems (PushGP and Grammar-Guided GP), we find that informed down-sampling significantly outperforms random down-sampling on a set of contemporary program synthesis benchmark problems. Through an analysis of the created down-samples, we find that important training cases are included in the down-sample consistently across independent evolutionary runs and systems. We hypothesize that this improvement can be attributed to the ability of Informed Down-Sampled Lexicase Selection to maintain more specialist individuals over the course of evolution, while also benefiting from reduced per-evaluation costs.
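The contrast between random and informed down-sampling can be made concrete with a small sketch. The Python below is illustrative and rests on several assumptions. All function names are hypothetical. The "population statistics" are modeled as binary pass/fail ("solve") vectors for a sample of individuals. Distinct cases are chosen greedily by farthest-first traversal over the Hamming distance between those vectors. The abstract does not state these choices, so this should not be read as the authors' exact procedure. Two cases are treated as synonymous when every sampled individual either solves both or neither.

```python
import random
from typing import List, Sequence


def lexicase_select(errors: Sequence[Sequence[float]], cases: Sequence[int],
                    rng: random.Random) -> int:
    """Pick one parent index by lexicase selection over the given cases.

    errors[i][c] is the error of individual i on case c (lower is better).
    For simplicity the full error matrix is passed in; a real down-sampled
    run would only evaluate individuals on the cases in `cases`.
    """
    candidates = list(range(len(errors)))
    order = list(cases)
    rng.shuffle(order)  # fresh random case ordering for each selection event
    for c in order:
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)


def random_downsample(n_cases: int, rate: float, rng: random.Random) -> List[int]:
    """Random down-sampling: a uniform random subset of the case indices."""
    k = min(n_cases, max(1, round(rate * n_cases)))
    return rng.sample(range(n_cases), k)


def informed_downsample(solved: Sequence[Sequence[int]], rate: float,
                        rng: random.Random) -> List[int]:
    """Hypothetical informed down-sample built by farthest-first traversal.

    ASSUMPTION: solved[i][c] is 1 if sampled individual i solves case c,
    else 0, and case similarity is the Hamming distance between the solve
    columns of two cases (distance 0 means the cases are synonymous).
    """
    n_cases = len(solved[0])
    k = min(n_cases, max(1, round(rate * n_cases)))

    def dist(a: int, b: int) -> int:
        # number of sampled individuals that solve exactly one of cases a, b
        return sum(row[a] != row[b] for row in solved)

    chosen = [rng.randrange(n_cases)]
    # min_dist[c]: distance from case c to its nearest already-chosen case
    min_dist = [dist(c, chosen[0]) for c in range(n_cases)]
    while len(chosen) < k:
        # add the case least similar to everything chosen so far
        nxt = max((c for c in range(n_cases) if c not in chosen),
                  key=lambda c: min_dist[c])
        chosen.append(nxt)
        min_dist = [min(min_dist[c], dist(c, nxt)) for c in range(n_cases)]
    return chosen
```

In this sketch, the solve matrix `[[1,1,0,0,0], [1,1,0,0,1], [0,0,1,1,1], [0,0,1,1,0]]` makes cases 0 and 1 synonymous, and cases 2 and 3 synonymous. `informed_downsample(solved, 0.6, rng)` then returns one case from each synonymous pair plus case 4, whatever case the traversal starts from. `random_downsample` can instead return a redundant pair such as cases 0 and 1. Either list can be passed as `cases` to `lexicase_select`.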