
Orthogonal layers of parallelism in large-scale eigenvalue computations (2209.01974v2)

Published 5 Sep 2022 in cs.DC

Abstract: We address the communication overhead of distributed sparse matrix-(multiple)-vector multiplication in the context of large-scale eigensolvers, using filter diagonalization as an example. The basis of our study is a performance model which includes a communication metric that is computed directly from the matrix sparsity pattern without running any code. The performance model quantifies to what extent scalability and parallel efficiency are lost due to communication overhead. To restore scalability, we identify two orthogonal layers of parallelism in the filter diagonalization technique. In the horizontal layer the rows of the sparse matrix are distributed across individual processes. In the vertical layer bundles of multiple vectors are distributed across separate process groups. An analysis in terms of the communication metric predicts that scalability can be restored if, and only if, one implements the two orthogonal layers of parallelism via different distributed vector layouts. Our theoretical analysis is corroborated by benchmarks for application matrices from quantum and solid state physics, road networks, and nonlinear programming. We finally demonstrate the benefits of using orthogonal layers of parallelism with two exemplary application cases -- an exciton and a strongly correlated electron system -- which incur either small or large communication overhead.
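To make the two layers concrete, here is a minimal sketch using mpi4py and scipy.sparse. This is not the authors' implementation (the paper builds on optimized distributed kernels, and its key finding is that the two layers need different distributed vector layouts, which this toy code glosses over). The group count, matrix size, bundle size, density, and the simple Allgather exchange are illustrative assumptions, and remote_entries is a hypothetical proxy for the paper's sparsity-pattern-based communication metric.

```python
# Minimal sketch (not the paper's code): two orthogonal layers of parallelism
# for sparse matrix-(multiple)-vector multiplication.
# Run e.g. with: mpiexec -n 4 python orthogonal_layers_sketch.py
import numpy as np
from scipy.sparse import random as sparse_random
from mpi4py import MPI

world = MPI.COMM_WORLD
N_GROUPS = 2                      # vertical layer: number of process groups (illustrative)
assert world.size % N_GROUPS == 0

# Vertical layer: split the world communicator into process groups; each
# group applies the matrix to its own bundle of vectors.
group_id = world.rank % N_GROUPS
group = world.Split(color=group_id, key=world.rank)

# Horizontal layer: within a group, distribute contiguous row blocks of A.
n = 4096                          # global matrix dimension (illustrative)
assert n % group.size == 0
rows_per_rank = n // group.size
r0 = group.rank * rows_per_rank   # first global row owned by this rank
# Stand-in for this rank's row block of a global sparse matrix; seeding by
# the in-group rank keeps the matrix identical across groups.
A_local = sparse_random(rows_per_rank, n, density=1e-3, format="csr",
                        random_state=group.rank)

# Each group works on a different bundle of vectors (e.g., a different slice
# of a filter-polynomial subspace); each rank holds the rows it owns.
BUNDLE = 4
X_local = np.random.default_rng([group_id, group.rank]).standard_normal(
    (rows_per_rank, BUNDLE))

def remote_entries(A_local, r0, r1):
    """Hypothetical proxy for the paper's communication metric: the number of
    distinct input-vector entries this rank needs from other ranks, read off
    the sparsity pattern alone (columns outside the owned block [r0, r1))."""
    cols = np.unique(A_local.indices)
    return int(np.count_nonzero((cols < r0) | (cols >= r1)))

# Simplified exchange: replicate the group's full bundle before the local
# product. A real code would communicate only the remote_entries() halo
# values dictated by the sparsity pattern.
X_full = np.empty((n, BUNDLE))
group.Allgather(X_local, X_full)
Y_local = A_local @ X_full        # local sparse matrix-multiple-vector product

print(f"world rank {world.rank}: group {group_id}, "
      f"rows [{r0}, {r0 + rows_per_rank}), "
      f"remote entries needed: {remote_entries(A_local, r0, r0 + rows_per_rank)}")
```

The point of the orthogonal split is visible even in this toy: enlarging a group shrinks each rank's row block but raises the relative halo traffic counted by remote_entries, whereas adding groups scales over vector bundles without increasing per-group matrix communication.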
