
Fast and Efficient Local Search for Genetic Programming Based Loss Function Learning (2403.00865v1)

Published 1 Mar 2024 in cs.NE, cs.AI, cs.CV, and cs.LG

Abstract: In this paper, we develop upon the topic of loss function learning, an emergent meta-learning paradigm that aims to learn loss functions that significantly improve the performance of the models trained under them. Specifically, we propose a new meta-learning framework for task and model-agnostic loss function learning via a hybrid search approach. The framework first uses genetic programming to find a set of symbolic loss functions. Second, the set of learned loss functions is subsequently parameterized and optimized via unrolled differentiation. The versatility and performance of the proposed framework are empirically validated on a diverse set of supervised learning tasks. Results show that the learned loss functions bring improved convergence, sample efficiency, and inference performance on tabulated, computer vision, and natural language processing problems, using a variety of task-specific neural network architectures.
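
The framework is two-stage: genetic programming first discovers symbolic loss expressions, and a gradient-based local search then fine-tunes the numeric parameters of those expressions by differentiating through an unrolled inner training loop. The sketch below illustrates only the second stage; the toy regression task, the linear base model, and the particular parameterized loss (phi0*err^2 + phi1*|err|) are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the unrolled-differentiation stage described in the abstract:
# a symbolic loss (here assumed to be phi0*err^2 + phi1*|err|) is given free
# parameters phi, which are tuned by backpropagating a validation objective
# through a few differentiable SGD steps of a small base model.
# Everything below is an illustrative assumption, not the authors' code.
import torch

torch.manual_seed(0)

# Synthetic regression data standing in for a real task.
X_train, y_train = torch.randn(64, 3), torch.randn(64, 1)
X_val, y_val = torch.randn(64, 3), torch.randn(64, 1)

# Parameters of the (assumed) GP-discovered loss expression.
phi = torch.tensor([1.0, 0.0], requires_grad=True)

def learned_loss(y_hat, y, phi):
    err = y_hat - y
    return (phi[0] * err.pow(2) + phi[1] * err.abs()).mean()

def unrolled_meta_loss(phi, inner_steps=5, inner_lr=0.1):
    # Fresh base-model weights (a linear model) for each meta-evaluation.
    w = torch.zeros(3, 1, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    for _ in range(inner_steps):
        inner = learned_loss(X_train @ w + b, y_train, phi)
        # create_graph=True keeps the SGD update differentiable w.r.t. phi.
        gw, gb = torch.autograd.grad(inner, (w, b), create_graph=True)
        w, b = w - inner_lr * gw, b - inner_lr * gb
    # Meta-objective: squared error on held-out data after the unroll.
    return ((X_val @ w + b) - y_val).pow(2).mean()

meta_opt = torch.optim.Adam([phi], lr=0.01)
for _ in range(100):
    meta_opt.zero_grad()
    unrolled_meta_loss(phi).backward()  # gradient flows through the unroll
    meta_opt.step()

print("tuned loss parameters:", phi.detach().tolist())
```

For deeper unrolls or larger base models, the manual functional updates above would typically be replaced by a helper such as the `higher` library or PyTorch's `torch.func` utilities, which handle differentiable inner-loop optimization more conveniently.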
