Theory and applications of the Sum-Of-Squares technique (2306.16255v3)
Published 28 Jun 2023 in math.OC, cs.IT, math.IT, math.ST, and stat.TH
Abstract: The Sum-of-Squares (SOS) approximation method is a technique used in optimization problems to derive lower bounds on the optimal value of an objective function. By representing the objective function as a sum of squares in a feature space, the SOS method transforms non-convex global optimization problems into solvable semidefinite programs. This note presents an overview of the SOS method. We start with its application in finite-dimensional feature spaces and, subsequently, we extend it to infinite-dimensional feature spaces using reproducing kernels (k-SOS). Additionally, we highlight the utilization of SOS for estimating some relevant quantities in information theory, including the log-partition function.
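To make the central idea of the abstract concrete, here is a minimal sketch of what an SOS certificate looks like in a finite-dimensional feature space. It does not solve a semidefinite program; it only checks, numerically, a certificate written by hand. The toy objective f(x) = x^4 - 2x^2 + 2, the monomial feature map (1, x, x^2), and all names in the code are assumptions chosen for illustration and are not taken from the paper.

```python
import random

# Toy objective (an assumption for illustration): f(x) = x^4 - 2x^2 + 2,
# whose global minimum value is 1, attained at x = +1 and x = -1.
def f(x):
    return x**4 - 2 * x**2 + 2

# Hypothetical feature map: monomials up to degree 2.
def features(x):
    return [1.0, x, x**2]

# Hand-written certificate: f(x) - c = phi(x)^T A phi(x) with A = q q^T,
# which is positive semidefinite by construction. Here f(x) - 1 = (x^2 - 1)^2.
c = 1.0
q = [-1.0, 0.0, 1.0]
A = [[qi * qj for qj in q] for qi in q]

def quad_form(M, v):
    """Return v^T M v for a square matrix M given as a list of rows."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

random.seed(0)

# Check the identity f(x) - c = phi(x)^T A phi(x) at random sample points.
xs = [random.uniform(-3.0, 3.0) for _ in range(1000)]
max_gap = max(abs(f(x) - c - quad_form(A, features(x))) for x in xs)

# Spot-check positive semidefiniteness: v^T A v should never be negative.
min_form = min(
    quad_form(A, [random.gauss(0.0, 1.0) for _ in range(3)]) for _ in range(1000)
)
```

Because A is positive semidefinite, the identity implies f(x) >= c for every x, so c is a valid lower bound on the minimum; in this toy case it is tight, since f(1) = 1. In the SOS method, the search for the largest such c and a matching A is what becomes a semidefinite program.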
- J. B. Lasserre, Global optimization with polynomials and the problem of moments, SIAM Journal on Optimization 11, 796 (2001).
- P. A. Parrilo, Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization (California Institute of Technology, 2000).
- The orthogonality is with respect to the following dot product in matrix space: $\langle A,B\rangle=\mathop{\rm tr}[A^{\ast}B]$.
- L. Fejér, Über trigonometrische Polynome, Journal für die reine und angewandte Mathematik 146, 53 (1916).
- U. Grenander and G. Szegö, Toeplitz Forms and Their Applications (University of California Press).
- We indicate with $\mathfrak{P}(\cdot)$ the power set.
- M. Putinar, Positive polynomials on compact semi-algebraic sets, Indiana University Mathematics Journal 42, 969 (1993).
- M. X. Goemans and D. P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, Journal of the ACM (JACM) 42, 1115 (1995).
- G. S. Kimeldorf and G. Wahba, A correspondence between Bayesian estimation on stochastic processes and smoothing by splines, The Annals of Mathematical Statistics 41, 495 (1970).
- N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society 68, 337 (1950).
- F. Bach, Learning theory from first principles, Draft of a book, version of Sept 6, 2021 (2021).
- U. Marteau-Ferey, F. Bach, and A. Rudi, Non-parametric models for non-negative functions, Advances in Neural Information Processing Systems 33, 12816 (2020).
- U. Marteau-Ferey, F. Bach, and A. Rudi, Second order conditions to decompose smooth functions as sums of squares, arXiv preprint arXiv:2202.13729 (2022).
- A. Rudi, U. Marteau-Ferey, and F. Bach, Finding global minima via kernel approximations, arXiv preprint arXiv:2012.11978 (2020).
- P. Del Moral and A. Niclas, A Taylor expansion of the square root matrix function, Journal of Mathematical Analysis and Applications 465, 259 (2018).
- D. Liberzon, Calculus of variations and optimal control theory: a concise introduction (Princeton University Press, 2011).
- R. Vinter, Convex duality and nonlinear optimal control, SIAM Journal on Control and Optimization 31, 518 (1993).
- F. Bach, Sum-of-squares relaxations for information theory and variational inference (2022a).
- F. Bach, Information theory with kernel methods (2022b).
- F. R. Bach and M. I. Jordan, Kernel independent component analysis, Journal of Machine Learning Research 3, 1 (2002).
- K. Matsumoto, A new quantum version of f-divergence, in Nagoya Winter Workshop: Reality and Measurement in Algebraic Quantum Theory (Springer, 2015) pp. 229–273.
- B. Simon, Orthogonal polynomials on the unit circle. part 1: Classical theory (2005) pp. xxvi+466.
- T. M. Cover, Elements of information theory (John Wiley & Sons, 1999).