An O(N) distributed-memory parallel direct solver for planar integral equations (2310.15458v2)
Abstract: Boundary value problems involving elliptic PDEs such as the Laplace and the Helmholtz equations are ubiquitous in mathematical physics and engineering. Many such problems can be alternatively formulated as integral equations that are mathematically more tractable. However, an integral-equation formulation poses a significant computational challenge: solving large dense linear systems that arise upon discretization. In cases where iterative methods converge rapidly, existing methods that draw on fast summation schemes such as the Fast Multipole Method are highly efficient and well-established. More recently, linear complexity direct solvers that sidestep convergence issues by directly computing an invertible factorization have been developed. However, storage and computation costs are high, which limits their ability to solve large-scale problems in practice. In this work, we introduce a distributed-memory parallel algorithm based on an existing direct solver named "strong recursive skeletonization factorization." Specifically, we apply low-rank compression to certain off-diagonal matrix blocks in a way that minimizes computation and data movement. Compared to iterative algorithms, our method is particularly suitable for problems involving ill-conditioned matrices or multiple right-hand sides. Large-scale numerical experiments are presented to show the performance of our Julia implementation.
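The low-rank compression of off-diagonal blocks mentioned in the abstract can be illustrated with a small, self-contained sketch. It is not the paper's algorithm or its Julia implementation. It assumes the 2D Laplace kernel, two randomly sampled and well-separated point clusters, and a simple cross approximation with full pivoting as a stand-in compression scheme. All function names, cluster positions, and the tolerance are hypothetical choices made for illustration only.

```python
import math
import random


def laplace_kernel_block(targets, sources):
    """Dense block K[i][j] = -log|x_i - y_j| / (2*pi) of the 2D Laplace kernel."""
    return [[-math.log(math.dist(x, y)) / (2.0 * math.pi) for y in sources]
            for x in targets]


def cross_approximation(block, rel_tol):
    """Greedy rank-1 peeling with full pivoting (illustrative stand-in only).

    Returns (cols, rows) with block[i][j] ~= sum_k cols[k][i] * rows[k][j].
    """
    residual = [row[:] for row in block]
    m, n = len(residual), len(residual[0])
    first_pivot = None
    cols, rows = [], []
    for _ in range(min(m, n)):
        # Locate the largest-magnitude entry of the current residual.
        pi, pj = max(((i, j) for i in range(m) for j in range(n)),
                     key=lambda ij: abs(residual[ij[0]][ij[1]]))
        pivot = residual[pi][pj]
        if first_pivot is None:
            first_pivot = abs(pivot)
        # Stop once the residual is small relative to the largest block entry.
        if abs(pivot) <= rel_tol * first_pivot:
            break
        col = [residual[i][pj] / pivot for i in range(m)]
        row = residual[pi][:]
        # Subtract the rank-1 term built from the pivot column and pivot row.
        for i in range(m):
            for j in range(n):
                residual[i][j] -= col[i] * row[j]
        cols.append(col)
        rows.append(row)
    return cols, rows


def max_abs_error(block, cols, rows):
    """Largest entrywise difference between the block and its low-rank form."""
    err = 0.0
    for i, block_row in enumerate(block):
        for j, value in enumerate(block_row):
            approx = sum(c[i] * r[j] for c, r in zip(cols, rows))
            err = max(err, abs(value - approx))
    return err


# Hypothetical setup: 60 targets in [0,1]^2 and 60 sources in [5,6]x[0,1].
rng = random.Random(0)
targets = [(rng.random(), rng.random()) for _ in range(60)]
sources = [(5.0 + rng.random(), rng.random()) for _ in range(60)]

block = laplace_kernel_block(targets, sources)
cols, rows = cross_approximation(block, rel_tol=1e-6)

rank = len(cols)
largest_entry = max(abs(v) for block_row in block for v in block_row)
rel_error = max_abs_error(block, cols, rows) / largest_entry

print("block size:", len(block), "x", len(block[0]))
print("numerical rank:", rank)
print("relative max error:", rel_error)
```

Because the two clusters are well separated, the kernel is smooth across the block and the greedy procedure stops long before reaching full rank. Storing the factors then costs roughly rank × (m + n) numbers instead of m × n. Production solvers typically use interpolative decompositions or other rank-revealing factorizations rather than this naive full-pivoting search, which costs O(mn) per step.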