Optimizing Neural Networks with Gradient Lexicase Selection
Abstract: One potential drawback of using aggregated performance measurement in machine learning is that models may learn to accept higher errors on some training cases as compromises for lower errors on others, with the lower errors actually being instances of overfitting. This can lead to both stagnation at local optima and poor generalization. Lexicase selection is an uncompromising method developed in evolutionary computation, which selects models on the basis of sequences of individual training case errors instead of using aggregated metrics such as loss and accuracy. In this paper, we investigate how lexicase selection, in its general form, can be integrated into the context of deep learning to enhance generalization. We propose Gradient Lexicase Selection, an optimization framework that combines gradient descent and lexicase selection in an evolutionary fashion. Our experimental results demonstrate that the proposed method improves the generalization performance of various widely-used deep neural network architectures across three image classification benchmarks. Additionally, qualitative analysis suggests that our method assists networks in learning more diverse representations. Our source code is available on GitHub: https://github.com/ld-ing/gradient-lexicase.
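To make the selection mechanism described in the abstract concrete, here is a minimal sketch of the generic lexicase selection operator: rather than ranking candidates by an aggregated score, it shuffles the training cases and repeatedly filters the candidate pool down to those with the lowest error on each case in turn. This is the standard operator from the evolutionary computation literature, not the paper's full Gradient Lexicase Selection pipeline (which additionally interleaves SGD training of the candidate networks); the function name and data layout are illustrative assumptions.

```python
import random

def lexicase_select(error_vectors, rng=random):
    """Pick one candidate index via lexicase selection.

    error_vectors[i][j] is candidate i's error on training case j.
    """
    n_cases = len(error_vectors[0])
    case_order = list(range(n_cases))
    rng.shuffle(case_order)                      # fresh random case ordering per selection
    survivors = list(range(len(error_vectors)))  # start with every candidate
    for case in case_order:
        best = min(error_vectors[i][case] for i in survivors)
        # keep only the candidates that are exactly best on this case
        survivors = [i for i in survivors if error_vectors[i][case] == best]
        if len(survivors) == 1:                  # a unique winner ends the filtering early
            break
    return rng.choice(survivors)                 # break remaining ties at random
```

In the Gradient Lexicase Selection setting, each "candidate" would be a network variant produced by a step of gradient descent, and the per-case errors are its individual training-example losses; because the case order is reshuffled on every call, candidates that excel on rare or difficult cases can win selection even when their average error is worse.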