
Neural Acceleration of Incomplete Cholesky Preconditioners (2403.00743v1)

Published 1 Mar 2024 in cs.DC, cs.NA, and math.NA

Abstract: The solution of a sparse system of linear equations is ubiquitous in scientific applications. Iterative methods, such as the Preconditioned Conjugate Gradient method (PCG), are normally chosen over direct methods due to memory and computational complexity constraints. However, the efficiency of these methods depends on the preconditioner used. Developing a preconditioner normally requires some insight into the sparse linear system and into the desired trade-off between the cost of generating the preconditioner and the reduction in the number of iterations. Incomplete factorization methods tend to serve as black-box generators for these preconditioners, but they may fail for a number of reasons, including numerical issues that require searching for adequate scaling, shifting, and fill-in while relying on an algorithm that is difficult to parallelize. With the move towards heterogeneous computing, many sparse applications leave GPUs, which are optimized for dense tensor workloads such as training neural networks, underutilized. In this work, we demonstrate that a simple artificial neural network, trained either at compile time or in parallel to the running application on a GPU, can provide an incomplete sparse Cholesky factorization that can be used as a preconditioner. The generated preconditioner is as good as or better, in terms of reducing the iteration count, than one found by searching over multiple preconditioning techniques such as scaling and shifting. Moreover, the generated method never fails to produce a preconditioner that reduces the iteration count.
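
The abstract contrasts the paper's neural approach with the usual incomplete-factorization pipeline, in which breakdowns are handled by retrying with scaling or a diagonal shift. As a rough illustration of that baseline (a sketch written for this summary, not code from the paper), the following Python/SciPy snippet builds a zero-fill incomplete Cholesky factor with a simple shift schedule and uses it to precondition CG. The test matrix, the shift values, and helper names such as `ic0` and `shifted_ic0` are assumptions made for the example.

```python
# A minimal, illustrative sketch (not the paper's implementation): the
# conventional baseline the abstract describes, i.e. a zero-fill incomplete
# Cholesky factorization IC(0) combined with a trial-and-error diagonal
# shift, used as the preconditioner inside PCG.
import numpy as np
import scipy.sparse as sp
from scipy.linalg import solve_triangular
from scipy.sparse.linalg import LinearOperator, cg


def ic0(A_dense, pattern):
    """IC(0): Cholesky restricted to the lower-triangular sparsity pattern.

    Raises ValueError on a non-positive pivot (factorization breakdown).
    """
    n = A_dense.shape[0]
    L = np.zeros_like(A_dense)
    for i in range(n):
        for j in range(i):
            if not pattern[i, j]:
                continue  # zero fill: skip entries outside the pattern of A
            L[i, j] = (A_dense[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
        d = A_dense[i, i] - np.dot(L[i, :i], L[i, :i])
        if d <= 0.0:
            raise ValueError("IC(0) breakdown: non-positive pivot")
        L[i, i] = np.sqrt(d)
    return L


def shifted_ic0(A, shifts=(0.0, 1e-3, 1e-2, 1e-1, 1.0)):
    """Retry IC(0) on A + alpha*mean(diag(A))*I with increasing shifts."""
    Ad = A.toarray()                    # dense copy only for this small sketch
    pattern = Ad != 0.0
    scale = np.mean(np.diag(Ad))
    for alpha in shifts:
        try:
            return ic0(Ad + alpha * scale * np.eye(Ad.shape[0]), pattern)
        except ValueError:
            continue
    raise RuntimeError("no shift in the schedule produced a valid factor")


# Small SPD test problem: 2D 5-point Laplacian (illustrative only).
n = 20
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = sp.csr_matrix(sp.kronsum(T, T))
b = np.ones(A.shape[0])

L = shifted_ic0(A)


def apply_prec(r):
    """M^{-1} r = (L L^T)^{-1} r via forward and backward triangular solves."""
    y = solve_triangular(L, r, lower=True)
    return solve_triangular(L.T, y, lower=False)


M = LinearOperator(A.shape, matvec=apply_prec, dtype=A.dtype)
x, info = cg(A, b, M=M)                 # PCG with the IC(0) preconditioner
print("cg info:", info, "residual:", np.linalg.norm(b - A @ x))
```

In the paper's approach, the `shifted_ic0` step would conceptually be replaced by a small neural network, trained on the GPU at compile time or alongside the running application, that produces the values of the triangular factor directly; the PCG loop and the triangular solves stay the same. The network architecture and training procedure are not described in the abstract, so they are not sketched here.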

