Empirical Loss Landscape Analysis of Neural Network Activation Functions (2306.16090v1)

Published 28 Jun 2023 in cs.LG, cs.AI, and cs.NE

Abstract: Activation functions play a significant role in neural network design by enabling non-linearity. The choice of activation function has previously been shown to influence the properties of the resulting loss landscape. Understanding the relationship between activation functions and loss landscape properties is important for neural architecture and training algorithm design. This study empirically investigates neural network loss landscapes associated with hyperbolic tangent, rectified linear unit, and exponential linear unit activation functions. The rectified linear unit is shown to yield the most convex loss landscape, while the exponential linear unit is shown to yield the least flat loss landscape and to exhibit superior generalisation performance. The presence of wide and narrow valleys in the loss landscape is established for all activation functions, and the narrow valleys are shown to correlate with saturated neurons and implicitly regularised network configurations.
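The comparison above hinges on the shapes of the three activation functions and on how the loss is sampled over weight space. As a minimal, illustrative sketch (not the paper's experimental protocol), the snippet below defines tanh, ReLU, and ELU in NumPy and probes the mean-squared-error loss of a tiny one-hidden-layer network along a random line through weight space; the XOR data, network size, and sampling range are assumptions chosen only to keep the example self-contained.

```python
# Illustrative sketch only: probe the loss of a tiny one-hidden-layer network
# along a random line in weight space for each activation function studied.
# The XOR data, network size, and sampling scheme are assumptions for brevity,
# not the paper's experimental protocol.
import numpy as np

rng = np.random.default_rng(0)

# Toy XOR data (assumed here purely to keep the sketch self-contained).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

ACTIVATIONS = {
    "tanh": np.tanh,
    "relu": lambda z: np.maximum(z, 0.0),
    # ELU with alpha = 1; the inner minimum guards against overflow in exp.
    "elu":  lambda z: np.where(z > 0.0, z, np.exp(np.minimum(z, 0.0)) - 1.0),
}

HIDDEN = 4
N_PARAMS = 2 * HIDDEN + HIDDEN + HIDDEN + 1  # W1, b1, W2, b2

def mse_loss(theta, act):
    """MSE of a 2-HIDDEN-1 network (sigmoid output) at parameter vector theta."""
    W1 = theta[:2 * HIDDEN].reshape(2, HIDDEN)
    b1 = theta[2 * HIDDEN:3 * HIDDEN]
    W2 = theta[3 * HIDDEN:4 * HIDDEN].reshape(HIDDEN, 1)
    b2 = theta[4 * HIDDEN:]
    h = act(X @ W1 + b1)                          # hidden layer
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # sigmoid output unit
    return float(np.mean((out - y) ** 2))

# One-dimensional slice of the landscape: theta0 + t * d for a random unit d.
theta0 = rng.normal(size=N_PARAMS)
d = rng.normal(size=N_PARAMS)
d /= np.linalg.norm(d)
ts = np.linspace(-3.0, 3.0, 61)

for name, act in ACTIVATIONS.items():
    losses = [mse_loss(theta0 + t * d, act) for t in ts]
    print(f"{name:>4}: loss range along slice = "
          f"[{min(losses):.4f}, {max(losses):.4f}]")
```

Swapping the entries in ACTIVATIONS and inspecting the sampled losses gives a rough feel for how curvature and flatness can differ across the three functions, though the paper's actual analysis relies on more systematic sampling of the landscape.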
