Neural Loss Function Evolution for Large-Scale Image Classifier Convolutional Neural Networks (2403.08793v1)

Published 30 Jan 2024 in cs.CV and cs.LG

Abstract: For classification, neural networks typically learn by minimizing cross-entropy, yet they are evaluated and compared using accuracy. This disparity motivates neural loss function search (NLFS): the search for a drop-in replacement for cross-entropy as a neural network's loss function. We apply NLFS to image classifier convolutional neural networks. We propose a new search space for NLFS that encourages more diverse loss functions to be explored, and a surrogate function that transfers accurately to large-scale convolutional neural networks. We search the space using regularized evolution, a mutation-only aging genetic algorithm. After evolution and a proposed loss function elimination protocol, we transferred the final loss functions across multiple architectures, datasets, and image augmentation techniques to assess generalization. We discovered three new loss functions, called NeuroLoss1, NeuroLoss2, and NeuroLoss3, which, as simple drop-in replacements for cross-entropy, achieved a higher mean test accuracy in the majority of experiments.
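
The search algorithm named in the abstract, regularized evolution, is the mutation-only aging genetic algorithm of Real et al. (2019). The sketch below is a minimal illustration of that loop applied to loss-function candidates; the helper callables random_loss_tree, mutate, and proxy_fitness (e.g. validation accuracy of a small surrogate network trained with the candidate loss), as well as the population, sample, and cycle sizes, are hypothetical placeholders and are not taken from the paper.

    # Minimal sketch of regularized evolution (aging evolution) for
    # loss-function search. Helpers and constants are illustrative only.
    import random
    from collections import deque

    POPULATION_SIZE = 100   # candidates kept alive at any time
    SAMPLE_SIZE = 25        # tournament size when picking a parent
    CYCLES = 1000           # number of mutations to evaluate

    def regularized_evolution(random_loss_tree, mutate, proxy_fitness):
        # Seed the population with random loss-function candidates.
        population = deque()
        history = []
        while len(population) < POPULATION_SIZE:
            candidate = random_loss_tree()
            fitness = proxy_fitness(candidate)
            population.append((candidate, fitness))
            history.append((candidate, fitness))

        # Main loop: mutate the best of a random sample, age out the oldest.
        for _ in range(CYCLES):
            sample = random.sample(list(population), SAMPLE_SIZE)
            parent, _ = max(sample, key=lambda item: item[1])
            child = mutate(parent)
            child_fitness = proxy_fitness(child)
            population.append((child, child_fitness))
            history.append((child, child_fitness))
            population.popleft()  # remove the oldest member, not the worst

        # Return the best candidate ever evaluated.
        return max(history, key=lambda item: item[1])

The defining choice in this scheme is that the oldest candidate is removed rather than the worst, so a loss function survives only by being repeatedly re-selected as a parent; this biases the search toward candidates that are consistently strong rather than one-off lucky evaluations.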
