Training Artificial Neural Networks by Coordinate Search Algorithm (2402.12646v1)

Published 20 Feb 2024 in cs.LG and cs.AI

Abstract: Training Artificial Neural Networks poses a challenging and critical problem in machine learning. Despite the effectiveness of gradient-based learning methods, such as Stochastic Gradient Descent (SGD), in training neural networks, they have several limitations. For instance, they require differentiable activation functions and cannot optimize a model with respect to several independent non-differentiable loss functions simultaneously; for example, the F1-score, which is normally used only during testing, can also serve as a training objective when a gradient-free optimization algorithm is employed. Furthermore, training a DNN remains possible even with a small training dataset. To address these concerns, we propose an efficient version of the gradient-free Coordinate Search (CS) algorithm, an instance of General Pattern Search methods, for training neural networks. The proposed algorithm works with non-differentiable activation functions and can be tailored to multi-objective/multi-loss problems. Finding the optimal values for the weights of an ANN is a large-scale optimization problem; therefore, instead of searching for the optimal value of each variable, as in classical CS, we accelerate optimization and convergence by bundling the weights. This strategy is, in effect, a form of dimension reduction for the optimization problem. Based on the experimental results, the proposed method in some cases outperforms the gradient-based approach, particularly in situations with insufficient labeled training data. The performance plots demonstrate a high convergence rate, highlighting the capability of the suggested method to find a reasonable solution with fewer function calls. At present, the only practical and efficient way of training ANNs with hundreds of thousands of weights is gradient-based algorithms such as SGD or Adam; in this paper, we introduce an alternative method for training ANNs.
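
The abstract outlines the bundled coordinate-search idea without algorithmic detail. The sketch below is a minimal illustration of how a gradient-free coordinate search over bundles of weights could be applied to a flat weight vector; it is an assumption-laden sketch, not the authors' implementation. The greedy accept/shrink scheme, the `bundle_size` and `step` parameters, the toy 2-4-1 network, and the 0/1 misclassification loss are all illustrative choices.

```python
# Minimal sketch of bundled coordinate search (assumed details; see lead-in above).
import numpy as np

def coordinate_search(loss_fn, w, bundle_size=4, step=0.5,
                      shrink=0.5, n_iters=50, rng=None):
    """Greedy coordinate search over bundles of weights.

    loss_fn : maps a flat weight vector to a scalar loss; it may be
              non-differentiable (e.g. an error rate or 1 - F1-score).
    w       : initial flat weight vector (1-D np.ndarray).
    """
    rng = np.random.default_rng() if rng is None else rng
    w = w.copy()
    best = loss_fn(w)
    n = w.size
    for _ in range(n_iters):
        # Randomly partition the weights into bundles; each bundle is
        # perturbed as a single "coordinate" (a form of dimension reduction).
        idx = rng.permutation(n)
        improved = False
        for start in range(0, n, bundle_size):
            bundle = idx[start:start + bundle_size]
            for direction in (+1.0, -1.0):
                trial = w.copy()
                trial[bundle] += direction * step
                val = loss_fn(trial)
                if val < best:           # keep the move only if it helps
                    w, best, improved = trial, val, True
                    break
        if not improved:
            step *= shrink               # pattern-search style step shrinking
    return w, best

# Toy usage: a 2-4-1 ReLU network evaluated with a non-differentiable
# 0/1 misclassification loss (purely illustrative data and sizes).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

def unpack(w):
    W1, b1 = w[:8].reshape(2, 4), w[8:12]
    W2, b2 = w[12:16].reshape(4, 1), w[16]
    return W1, b1, W2, b2

def loss_fn(w):
    W1, b1, W2, b2 = unpack(w)
    h = np.maximum(X @ W1 + b1, 0.0)      # ReLU hidden layer
    p = (h @ W2).ravel() + b2
    return np.mean((p > 0) != y)          # error rate: not differentiable

w0 = 0.1 * rng.standard_normal(17)
w_opt, err = coordinate_search(loss_fn, w0, bundle_size=4)
print(f"final training error rate: {err:.3f}")
```

Note that the error-rate objective above could not be minimized by SGD directly, which is the kind of situation the paper targets; the paper's actual method is likely to differ in how bundles are formed and how step sizes are controlled.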
