Term Sparsity-Based Hierarchy
- Term sparsity-based hierarchy is a framework that exploits the inherent sparsity in polynomial terms to decompose and reduce the dimensionality of convex relaxations.
- It constructs term-adjacency graphs and employs chordal completions to block-diagonalize semidefinite programs, significantly enhancing computational efficiency.
- Extensions include symmetry adaptation, noncommutative formulations, and applications to complex polynomial systems, dynamical systems, and regression models.
The term sparsity-based hierarchy is a class of optimization methodologies and algorithmic frameworks that exploit the sparsity of polynomial terms to achieve dramatic decomposition and dimensionality reduction in polynomial optimization and related convex relaxations. Term sparsity refers to the property that, in high-dimensional polynomial systems, only a small subset of all possible monomials is present in each polynomial. By tracking the interactions between these nonzero monomials, rather than between variables, one builds graph-theoretic representations that may be iteratively refined and chordally completed; semidefinite programming (SDP) relaxations are then decomposed over the maximal cliques of these graphs, yielding block-diagonalized constraints whose sizes are dictated by the local term structure and symmetry rather than by the overall variable count or relaxation order. Recent advances extend term sparsity to symmetry-adapted bases, complex and noncommutative polynomial settings, and hierarchical representations, producing convergent hierarchies of relaxations, block-decomposable moment and localizing matrices, and strong guarantees of tightness under mild regularity conditions (Klep et al., 22 Nov 2025, Wang et al., 2020, Wang et al., 2019, Magron et al., 2021, Wang et al., 2021, Wang et al., 2020, Wang et al., 2018, Balagansky et al., 30 May 2025).
1. Fundamental Framework: Polynomial Optimization and Moment-SOS Hierarchies
Polynomial optimization problems (POPs) are typically formulated as
$$f^{*} = \min_{x \in K} f(x), \qquad K = \{x \in \mathbb{R}^{n} : g_{1}(x) \ge 0, \dots, g_{m}(x) \ge 0\},$$
where $f, g_{1}, \dots, g_{m} \in \mathbb{R}[x]$ and the feasible set $K$ is assumed compact (Archimedean). The moment-SOS (sum-of-squares) relaxation hierarchy constructs moment matrices $M_{r}(y)$ of order $r$, indexed by multi-indices $\alpha, \beta \in \mathbb{N}^{n}_{r}$, which encode the candidate moments $y_{\alpha} = \int_{K} x^{\alpha}\, d\mu$ associated to a measure $\mu$ on $K$; localizing matrices $M_{r - \lceil \deg g_{i}/2 \rceil}(g_{i}\, y)$ are similarly built for the constraints.
Dense SOS relaxations enforce PSD constraints on $M_{r}(y)$ and on each localizing matrix $M_{r - \lceil \deg g_{i}/2 \rceil}(g_{i}\, y)$, with matrix sizes scaling as $\binom{n+r}{r}$ for $n$ variables and relaxation order $r$. This rapidly becomes intractable for many moderate-to-large scale applications.
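To see why the dense hierarchy becomes intractable, the following short Python snippet simply tabulates the moment-matrix side length $\binom{n+r}{r}$ for a few values of $n$ and $r$ (illustrative only):

```python
from math import comb

# Side length of the order-r moment matrix in n variables: C(n + r, r).
for n in (10, 20, 50):
    for r in (2, 3, 4):
        print(f"n = {n:3d}, r = {r}: moment matrix size {comb(n + r, r):>9,d}")
```

Already at $n = 50$, $r = 3$ the dense matrix has side length $23{,}426$, which is far beyond what interior-point SDP solvers handle comfortably.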
2. Term Sparsity: Graph-Based Decomposition and Block-Diagonalization
Term sparsity is detected by constructing the joint term-support set
$$\mathscr{S} = \operatorname{supp}(f) \cup \bigcup_{i=1}^{m} \operatorname{supp}(g_{i}),$$
where $\operatorname{supp}(p)$ denotes the set of exponent vectors $\alpha \in \mathbb{N}^{n}$ appearing in the polynomial $p = \sum_{\alpha} p_{\alpha} x^{\alpha}$. We then build a term-adjacency graph $G$ whose vertices are the basis monomials $x^{\beta}$, and an undirected edge $\{\beta, \gamma\}$ is present if the product $x^{\beta} \cdot x^{\gamma} = x^{\beta + \gamma}$ appears in $f$, in some $g_{i}$, or is an even monomial:
$$\{\beta, \gamma\} \in E(G) \iff \beta + \gamma \in \mathscr{S} \cup 2\mathbb{N}^{n}.$$
The sparsity pattern of $G$ encodes interactions between term pairs. Chordal completion and clique decomposition of $G$ allow the moment and localizing matrices to be split into much smaller blocks indexed by maximal cliques. Each block corresponds to a set of monomials which interact according to the true algebraic structure of the POP data.
For example, in (Wang et al., 2019), the TSSOS (Term Sparsity SOS) hierarchy uses a support-extension process followed by block-closure to partition the monomial basis into connected components, each giving rise to a PSD constraint for that component; as the process iterates, the block structure refines and eventually matches that of the dense SOS.
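A minimal sketch of the graph step on a toy polynomial given by its exponent support; it uses networkx for the chordal completion and maximal cliques (the names `support` and `basis` are illustrative and not part of any package's API):

```python
from itertools import combinations
import networkx as nx

# Toy data: support of f(x1, x2) = x1^4 + x1^2*x2^2 + x2^4 + x1*x2
support = {(4, 0), (2, 2), (0, 4), (1, 1)}
# Monomial basis of degree <= 2 in two variables
basis = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]

def even(a):
    return all(e % 2 == 0 for e in a)

# Term-adjacency graph: edge {b, c} iff x^(b+c) is in the support or even.
G = nx.Graph()
G.add_nodes_from(basis)
for b, c in combinations(basis, 2):
    s = tuple(x + y for x, y in zip(b, c))
    if s in support or even(s):
        G.add_edge(b, c)

# Chordal completion, then maximal cliques -> PSD block index sets.
H, _ = nx.complete_to_chordal_graph(G)
blocks = list(nx.chordal_graph_cliques(H))
print("block sizes:", sorted(len(B) for B in blocks))  # [2, 2, 3] vs. dense 6
```

Here the single $6 \times 6$ PSD constraint splits into blocks of sizes $3$, $2$, and $2$, mirroring the block-diagonalization described above.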
3. Hierarchical Iterative Refinement and Block Closure
The hierarchy is constructed via an iterative process (sketched in code below):
- Support extension: at each sparse order $s$, update the support set by adding all sums $\beta + \gamma$ of exponents appearing together in the current blocks; an edge between $\beta$ and $\gamma$ is then declared whenever $\beta + \gamma$ lies in the current (extended) support.
- Block (clique) closure: Refine each term-sparsity graph by completing connected components to cliques (or by chordal extension).
- Sparse block SDP construction: For each block, assemble the restricted moment matrix and impose PSD conditions only on these blocks.
The resulting relaxation replaces the single PSD constraint $M_{r}(y) \succeq 0$ by the block structure
$$M_{C_{j}}(y) \succeq 0, \qquad j = 1, \dots, t,$$
where the blocks $M_{C_{j}}(y)$, indexed by the maximal cliques $C_{1}, \dots, C_{t}$ of the term-sparsity graph, are much smaller than the full matrix (Klep et al., 22 Nov 2025, Wang et al., 2019, Wang et al., 2020).
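A schematic of one support-extension / block-closure loop, in the block-closure variant where connected components are completed to cliques; variable names are illustrative, and real implementations such as TSSOS differ in detail:

```python
import networkx as nx

def ts_iteration(basis, support, rounds=5):
    """Iterate support extension + block closure until the blocks stabilize."""
    supp = set(support)
    for _ in range(rounds):
        # (1) Build the term-sparsity graph for the current support.
        G = nx.Graph()
        G.add_nodes_from(basis)
        for i, b in enumerate(basis):
            for c in basis[i + 1:]:
                s = tuple(x + y for x, y in zip(b, c))
                if s in supp or all(e % 2 == 0 for e in s):
                    G.add_edge(b, c)
        # (2) Block closure: each connected component becomes one clique/block.
        blocks = [sorted(comp) for comp in nx.connected_components(G)]
        # (3) Support extension: add all pairwise sums within each block.
        new_supp = set(supp)
        for B in blocks:
            for b in B:
                for c in B:
                    new_supp.add(tuple(x + y for x, y in zip(b, c)))
        if new_supp == supp:      # fixed point: block structure has stabilized
            break
        supp = new_supp
    return blocks
```

At the fixed point the blocks no longer refine, which is when the sparse bound coincides with (or is certified against) the dense one.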
This process can be extended to a two-level hierarchy: first exploit variable-based correlative sparsity to partition $n$-dimensional problems into variable cliques, and second, apply term sparsity within each clique to further reduce block sizes, yielding the CS-TSSOS methodology (Wang et al., 2020).
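For the first (correlative) level, the variable-interaction graph is built analogously to the term graph; a minimal sketch with toy data (names illustrative):

```python
import networkx as nx

# Variables occurring together in each polynomial of the POP (toy data),
# e.g., f = f1(x0, x1) + f2(x1, x2) + f3(x2, x3):
polys_vars = [{0, 1}, {1, 2}, {2, 3}]

# Correlative sparsity graph: variables adjacent if they co-occur.
C = nx.Graph()
for vs in polys_vars:
    C.add_nodes_from(vs)
    C.add_edges_from((u, v) for u in vs for v in vs if u < v)

H, _ = nx.complete_to_chordal_graph(C)
var_cliques = list(nx.chordal_graph_cliques(H))
print(sorted(map(sorted, var_cliques)))  # [[0, 1], [1, 2], [2, 3]]
```

Term sparsity is then applied separately inside each variable clique, which is precisely the CS-TSSOS composition.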
4. Symmetry-Adapted Hierarchy and Extensions
When the POP exhibits invariance under a finite group (e.g., a permutation, cyclic, dihedral, or symmetric group), further decomposition can be achieved by passing to a symmetry-adapted basis. The isotypic decomposition of $\mathbb{R}[x]$ splits the polynomial space into invariant subspaces $W_{1}, \dots, W_{k}$. The associated basis organizes the moment matrix into blocks indexed by irreducible representations and their multiplicities.
Term sparsity can be exploited directly within each symmetry-adapted block: one builds block-specific support sets $\mathscr{S}_{i}$, constructs block-specific term-sparsity graphs $G_{i}$ and their cliques, and imposes PSD conditions only on the principal submatrices defined by these cliques. The resulting relaxation reads
$$M_{r}(y) \;\cong\; \bigoplus_{i} M_{r}^{(i)}(y), \qquad M_{r}^{(i)}(y)\big|_{C} \succeq 0 \ \text{ for each maximal clique } C \text{ of } G_{i},$$
where each block $M_{r}^{(i)}(y)$ is restricted to its reduced support. This yields dramatic, often problem-size-independent, block-size reduction for highly symmetric instances (e.g., a maximal block size that stays constant for the 1D Ising quartic with dihedral symmetry even as the number of sites grows) (Klep et al., 22 Nov 2025).
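As a minimal worked illustration (constructed here, not taken from the cited papers): under the sign symmetry $x \mapsto -x$ in one variable, odd moments vanish and the moment matrix splits along even/odd monomials:

```latex
% Z_2 acts by x -> -x; R[x]_{\le 2} = W_+ \oplus W_- with
% W_+ = span{1, x^2} (trivial irrep) and W_- = span{x} (sign irrep).
% In the adapted basis (1, x^2 | x), the order-2 moment matrix is
M_2(y) =
\begin{pmatrix}
  y_0 & y_2 &     \\
  y_2 & y_4 &     \\
      &     & y_2
\end{pmatrix},
\qquad
M_2(y) \succeq 0 \;\Longleftrightarrow\;
\begin{pmatrix} y_0 & y_2 \\ y_2 & y_4 \end{pmatrix} \succeq 0
\ \text{ and }\ y_2 \ge 0 .
```

Term sparsity then acts inside each of the two blocks separately, so the clique analysis never mixes monomials from different isotypic components.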
5. Convergence, Theoretical Guarantees, and Computational Trade-Offs
For the hierarchy in which the maximal chordal extension is used at each block closure:
- Finite-step recovery: for fixed relaxation order $r$, the hierarchy stabilizes after finitely many sparse-order steps $s$, i.e., the sparse bound matches the full symmetric SOS bound at some finite $s$.
- Asymptotic convergence: as $r \to \infty$, with the maximal sparse order taken at each $r$, the optimal values of the sparse symmetric hierarchy converge to the global optimum.
- Monotonicity: for each $r$, the optimal values increase with the sparse order $s$ and are bounded above by the dense SOS bound; increasing the relaxation order $r$ yields a monotone increase towards the global optimum (Klep et al., 22 Nov 2025, Wang et al., 2020, Wang et al., 2019).
Computationally, block sizes decrease from the dense $\binom{n+r}{r}$ to the sizes of the maximal cliques of the term-sparsity graphs, often yielding orders-of-magnitude speedups. Applications to symmetric quartic POPs, torus-grid quartics, and large contact-rich motion planning demonstrate the scalability and efficacy of the approach (Klep et al., 22 Nov 2025, Kang et al., 5 Feb 2025).
6. Applications and Extensions: Complex, Noncommutative, Dynamical, and Regression Models
Term sparsity-based hierarchies extend beyond real polynomial optimization:
- Complex POPs: The complex moment-HSOS hierarchy adapts term sparsity via graphs on Hermitian monomials and supports, with convergence and monotonicity guarantees in the sparse order (see the schematic block after this list). Implementations achieve much smaller block sizes (e.g., $8$ in the complex formulation vs. $21$ in the real one for random quartics) and substantial run-time reductions (Wang et al., 2021).
- Noncommutative POPs: NCTSSOS exploits term sparsity patterns among noncommutative words and their products, yielding small blocks (typically of size $6$–$15$) and finite-step convergence to NPA-style relaxations (Wang et al., 2020).
- Dynamical Systems: Term sparsity yields block-decomposed relaxations for SDP-based region of attraction or invariant set computation; sign symmetries and causality can be layered on top for further savings and convergence (Wang et al., 2021).
- Hierarchical Regression Priors: Bayesian regression with term-level hierarchical sparsity priors (latent scale parameters) encodes heredity (strong/weak) and enables adaptivity, borrowing of strength, and model-consistent shrinkage (Griffin et al., 2013).
- Neural Interpretable Sparse Models: Recent work (HierarchicalTopK) trains a single sparse autoencoder across a spectrum of sparsity budgets, leveraging a prefix-averaged reconstruction loss to maintain disentangled monosemantic features at all sparsity regimes; a practically Pareto-optimal trade-off between reconstruction error and the number of active terms is empirically observed (Balagansky et al., 30 May 2025).
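Schematically, the Hermitian moment matrix referenced in the complex-POP bullet above is the standard object of the complex moment-HSOS setup:

```latex
% Complex POP: minimize f(z, \bar z) over z in C^n, with f real-valued.
% Given moments y_{\alpha\beta} = \int z^\alpha \bar z^\beta \, d\mu,
% the order-r Hermitian moment matrix is
\big[ M_r(y) \big]_{\alpha, \beta} = y_{\alpha\beta},
\qquad \alpha, \beta \in \mathbb{N}^n_r,
\qquad M_r(y) = M_r(y)^{\ast} \succeq 0,
% and term sparsity connects \alpha and \beta whenever z^\alpha \bar z^\beta
% (or its conjugate) appears in the problem data.
```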
7. Algorithmic Implementation, Software Ecosystem, and Practical Considerations
Efficient implementation involves:
- Basis construction (standard monomial or Newton-polytope basis up to degree $r$).
- Extraction of initial support, iterative sparse-extension and block closure (or chordal extension), extraction of maximal cliques.
- Block-assembly of moment and localizing matrices for each clique.
- Solution of the block-wise SDPs with off-the-shelf solvers (e.g., MOSEK); see the sketch after this list.
- Empirical observation: only a few sparse-order iterations typically suffice for equality with the dense SOS bound in most cases.
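Once the cliques are known, the block SDP can be assembled with any conic modeling layer; a minimal sketch using cvxpy, with toy problem data standing in for the actual moment/localizing structure that packages such as TSSOS build:

```python
import cvxpy as cp

# Suppose the graph step produced cliques over a 6-monomial basis:
blocks = [[0, 3, 5], [0, 4], [1, 2]]   # index sets into the monomial basis

# One small PSD variable per block instead of a single 6x6 PSD matrix.
Gs = [cp.Variable((len(B), len(B)), PSD=True) for B in blocks]

# Toy linear data standing in for the moment-SDP constraints:
constraints = [Gs[0][0, 1] == 1.0, Gs[2][0, 1] == -0.5]
objective = cp.Minimize(sum(cp.trace(G) for G in Gs))

prob = cp.Problem(objective, constraints)
prob.solve()  # dispatches to the installed SDP solver, e.g. MOSEK or SCS
print(prob.value, [G.value.shape for G in Gs])
```

The point of the design is visible in the variable declarations: the solver only ever sees PSD cones of the clique sizes, never the full moment matrix.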
Major libraries include the Julia package TSSOS for real, complex, and noncommutative polynomial optimization, with options for minimal/maximal chordal completion, block closure, and solution extraction via flat extension (Magron et al., 2021, Wang et al., 2019, Wang et al., 2020).
A practical implication is the ability to scale global polynomial optimization (and dynamical system verification) to instances with thousands of variables and constraints previously inaccessible to dense methods, at minimal loss of relaxation tightness.
References
- "Exploiting Term Sparsity in Symmetry-Adapted Basis for Polynomial Optimization" (Klep et al., 22 Nov 2025)
- "CS-TSSOS: Correlative and term sparsity for large-scale polynomial optimization" (Wang et al., 2020)
- "TSSOS: A Moment-SOS hierarchy that exploits term sparsity" (Wang et al., 2019)
- "TSSOS: a Julia library to exploit sparsity for large-scale polynomial optimization" (Magron et al., 2021)
- "Exploiting Sparsity in Complex Polynomial Optimization" (Wang et al., 2021)
- "Exploiting term sparsity in Noncommutative Polynomial Optimization" (Wang et al., 2020)
- "A New Sparse SOS Decomposition Algorithm Based on Term Sparsity" (Wang et al., 2018)
- "Hierarchical sparsity priors for regression models" (Griffin et al., 2013)
- "Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy" (Balagansky et al., 30 May 2025)