Sparse additive function decompositions facing basis transforms (2403.15563v2)
Abstract: High-dimensional real-world systems can often be well characterized by a small number of simultaneous low-complexity interactions. The analysis of variance (ANOVA) decomposition and the anchored decomposition are typical techniques for finding sparse additive decompositions of functions. In this paper, we are interested in a setting where these decompositions are not directly sparse, but become so after an appropriate basis transform. Noting that the sparsity of such an additive function decomposition is equivalent to the vanishing of most of the function's mixed partial derivatives, we exploit a connection to the underlying function graphs to determine an orthogonal transform that realizes the appropriate basis change. This is done in three steps: we apply singular value decomposition to minimize the number of vertices of the function graph, then joint block diagonalization techniques for families of matrices, followed by sparse minimization based on relaxations of the zero "norm" to minimize the number of edges. For the last step, we propose and analyze minimization techniques over the manifold of special orthogonal matrices. Various numerical examples illustrate the reliability of our approach for functions that, after a basis transform, have a sparse additive decomposition into summands with at most two variables.
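The link between sparsity and vanishing mixed partial derivatives can be illustrated with a small NumPy sketch. It is not the method of the paper: the test function `g`, the rotation `U`, the finite-difference helper `hessian_fd`, and the threshold `tol` are all assumptions made for illustration, and the orthogonal transform is known in advance here rather than determined by an algorithm. The sketch only shows that a function whose summands depend on at most two variables has a sparse Hessian in the right basis and a dense one after a generic rotation.

```python
import numpy as np

def g(y):
    # Hypothetical test function, sparse in y-coordinates:
    # two univariate terms and one bivariate term coupling y[1] and y[2].
    return np.sin(y[0]) + y[1] * y[2] + y[3] ** 2

def hessian_fd(func, x, h=1e-4):
    # Central finite-difference approximation of the Hessian of func at x.
    d = x.size
    eye = np.eye(d)
    hess = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            hess[i, j] = (
                func(x + h * eye[i] + h * eye[j])
                - func(x + h * eye[i] - h * eye[j])
                - func(x - h * eye[i] + h * eye[j])
                + func(x - h * eye[i] - h * eye[j])
            ) / (4.0 * h * h)
    return hess

def count_edges(func, points, tol=1e-3):
    # Count index pairs i < j whose mixed second derivative is, on average
    # over the sample points, larger than tol in absolute value.
    mean_abs = np.mean([np.abs(hessian_fd(func, p)) for p in points], axis=0)
    upper = np.triu(mean_abs, k=1)
    return int(np.sum(upper > tol))

rng = np.random.default_rng(0)
d = 4

# Random orthogonal matrix from a QR factorization (assumed basis change).
U, _ = np.linalg.qr(rng.standard_normal((d, d)))

def f(x):
    # f is g expressed in rotated coordinates, so f itself is not sparse.
    return g(U.T @ x)

def f_transformed(y):
    # Undoing the rotation recovers g: f(U y) = g(U^T U y) = g(y).
    return f(U @ y)

points = rng.standard_normal((5, d))
edges_x = count_edges(f, points)              # original coordinates
edges_y = count_edges(f_transformed, points)  # after the basis transform

print("non-vanishing mixed partials, original basis:", edges_x)
print("non-vanishing mixed partials, transformed basis:", edges_y)
```

In the transformed basis only the pair coupling `y[1]` and `y[2]` survives, which corresponds to a function graph with a single edge. In the original basis the Hessian is `U H U^T` for the sparse Hessian `H` of `g`, which is generically dense.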