CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra (2309.03060v2)

Published 6 Sep 2023 in cs.LG, cs.NA, math.NA, and stat.ML

Abstract: Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, block diagonal, sum, or product structure. In this paper, we propose a simple but general framework for large-scale linear algebra problems in machine learning, named CoLA (Compositional Linear Algebra). By combining a linear operator abstraction with compositional dispatch rules, CoLA automatically constructs memory and runtime efficient numerical algorithms. Moreover, CoLA provides memory efficient automatic differentiation, low precision computation, and GPU acceleration in both JAX and PyTorch, while also accommodating new objects, operations, and rules in downstream packages via multiple dispatch. CoLA can accelerate many algebraic operations, while making it easy to prototype matrix structures and algorithms, providing an appealing drop-in tool for virtually any computational effort that requires linear algebra. We showcase its efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.
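
To make the abstract's central idea concrete, below is a minimal, hypothetical sketch of a linear-operator abstraction whose matrix-vector product exploits Kronecker structure, together with a structure-agnostic conjugate-gradient solve that only touches the operator through matvecs. The class names (`LinearOperator`, `Dense`, `Kronecker`) and `solve_cg` are illustrative inventions, not the actual CoLA API; the point is only to show how representing structure symbolically lets downstream algorithms avoid densifying large matrices.

```python
# Toy sketch (not the actual CoLA interface): an operator abstraction whose
# matvec exploits Kronecker structure, plus a CG solve that uses only matvecs.
import torch


class LinearOperator:
    """Abstract operator defined only by its shape and a matvec rule."""
    def __init__(self, shape):
        self.shape = shape

    def matvec(self, v):
        raise NotImplementedError


class Dense(LinearOperator):
    def __init__(self, A):
        super().__init__(A.shape)
        self.A = A

    def matvec(self, v):
        return self.A @ v


class Kronecker(LinearOperator):
    """Represents A ⊗ B without ever materializing the (mp x nq) dense matrix."""
    def __init__(self, A, B):
        m, n = A.shape
        p, q = B.shape
        super().__init__((m * p, n * q))
        self.A, self.B = A, B

    def matvec(self, v):
        # (A ⊗ B) vec_r(X) = vec_r(A X B^T) with row-major reshapes.
        n, q = self.A.shape[1], self.B.shape[1]
        X = v.reshape(n, q)
        return (self.A @ X @ self.B.T).reshape(-1)


def solve_cg(op, b, tol=1e-8, max_iters=1000):
    """Conjugate gradients for a symmetric positive-definite operator.
    Only matvecs are required, so any structure in `op` is exploited for free."""
    x = torch.zeros_like(b)
    r = b - op.matvec(x)
    p = r.clone()
    rs = r @ r
    for _ in range(max_iters):
        Ap = op.matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new.sqrt() < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


# Usage: solve (A ⊗ B) x = b for SPD factors without forming the 600x600 matrix.
torch.manual_seed(0)
A = torch.randn(30, 30); A = A @ A.T + 30 * torch.eye(30)
B = torch.randn(20, 20); B = B @ B.T + 20 * torch.eye(20)
K = Kronecker(A, B)               # stored as a 30x30 and a 20x20 factor
b = torch.randn(600)
x = solve_cg(K, b)
print(torch.norm(K.matvec(x) - b))   # residual should be small
```

In a dispatch-based framework like the one the abstract describes, the choice of algorithm (dense factorization, Kronecker identities, iterative methods) would be selected automatically from the operator's compositional type rather than hard-coded as above.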
