Orthogonal layers of parallelism in large-scale eigenvalue computations (2209.01974v2)
Abstract: We address the communication overhead of distributed sparse matrix-(multiple)-vector multiplication in the context of large-scale eigensolvers, using filter diagonalization as an example. The basis of our study is a performance model that includes a communication metric computed directly from the matrix sparsity pattern, without running any code. The performance model quantifies the extent to which scalability and parallel efficiency are lost due to communication overhead. To restore scalability, we identify two orthogonal layers of parallelism in the filter diagonalization technique. In the horizontal layer, the rows of the sparse matrix are distributed across individual processes. In the vertical layer, bundles of multiple vectors are distributed across separate process groups. An analysis in terms of the communication metric predicts that scalability can be restored if, and only if, the two orthogonal layers of parallelism are implemented via different distributed vector layouts. Our theoretical analysis is corroborated by benchmarks for application matrices from quantum and solid state physics, road networks, and nonlinear programming. Finally, we demonstrate the benefits of using orthogonal layers of parallelism with two exemplary application cases, an exciton and a strongly correlated electron system, which incur either small or large communication overhead.
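To make the idea of a communication metric "computed directly from the matrix sparsity pattern" more concrete, here is a minimal Python sketch. It is not the metric defined in the paper, which the abstract does not spell out. It assumes a contiguous block-row distribution and counts, for each process, how many vector entries owned by other processes its local rows reference. The function names (`block_owner`, `remote_entries_per_process`, `comm_to_compute_ratio`, `periodic_stencil_pattern`) and the toy stencil matrix are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Set, Tuple

Pattern = Iterable[Tuple[int, int]]  # (row, col) positions of nonzeros


def block_owner(index: int, n: int, num_procs: int) -> int:
    """Owner of a row (or vector entry) under a contiguous block distribution."""
    block = -(-n // num_procs)  # ceil(n / num_procs)
    return index // block


def remote_entries_per_process(pattern: Pattern, n: int, num_procs: int) -> List[int]:
    """Number of distinct remote vector entries each process must receive."""
    needed: List[Set[int]] = [set() for _ in range(num_procs)]
    for row, col in pattern:
        p = block_owner(row, n, num_procs)
        if block_owner(col, n, num_procs) != p:
            needed[p].add(col)
    return [len(s) for s in needed]


def comm_to_compute_ratio(pattern: Pattern, n: int, num_procs: int) -> float:
    """Assumed metric: worst-case remote entries per locally owned row."""
    block = -(-n // num_procs)
    return max(remote_entries_per_process(pattern, n, num_procs)) / block


def periodic_stencil_pattern(n: int, reach: int) -> List[Tuple[int, int]]:
    """Toy sparsity pattern: row i couples to columns i-reach .. i+reach (mod n)."""
    return [(i, (i + d) % n) for i in range(n) for d in range(-reach, reach + 1)]


if __name__ == "__main__":
    n = 64
    pattern = periodic_stencil_pattern(n, reach=4)
    # Horizontal layer only: all 8 processes share the rows.
    print(comm_to_compute_ratio(pattern, n, 8))  # 1.0
    # 2 row-processes per group, 4 groups each handling its own vector bundle.
    print(comm_to_compute_ratio(pattern, n, 2))  # 0.25
```

In this toy setting, spreading the rows over all 8 processes gives a ratio of 1.0, whereas arranging the same 8 processes as 4 groups of 2 (each group working on its own bundle of vectors) reduces the per-vector ratio to 0.25. This is only meant to show how such a number can be read off the sparsity pattern without running a solver; the actual performance model in the paper may weight communication differently.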