Sparse Orthogonal Matrix Factorization

Updated 27 November 2025
  • Sparse, orthogonal matrix factorization decomposes a data matrix into an orthogonal factor and a row-sparse factor, balancing geometric structure (orthogonality) against interpretability (sparsity).
  • Coupon collector analysis establishes sample complexity thresholds for recovery under varying sparsity regimes.
  • Algorithmic tools such as Householder reflections, Givens rotations, and hierarchical compression enable efficient, scalable recovery in high-dimensional settings.

Sparse, orthogonal matrix factorization (SOMF) refers to the decomposition of a matrix $Y$ into a product $VX$, where $V$ is an orthogonal matrix and $X$ is sparse. This paradigm, blending the geometric constraints of orthogonality with the interpretable structure of sparsity, underpins crucial advances in statistics, signal processing, unsupervised learning, and large-scale numerical linear algebra. The existence and tractability of such factorizations, the sample complexity required for recovery, and the design of scalable, provably correct algorithms are determined by a precise interplay between randomness, combinatorial coverage properties, and manifold optimization.

1. Mathematical Formulation and Central Problem

Given $Y \in \mathbb{R}^{n\times p}$, sparse, orthogonal matrix factorization seeks $V \in \mathbb{R}^{n\times n}$ and $X \in \mathbb{R}^{n\times p}$ such that

$$Y = V X, \quad V^\top V = I_n, \quad X \text{ (row-wise) sparse}.$$

This can equivalently be posed as the nonconvex optimization

$$\min_{V\in O(n),\, X\in\mathbb{R}^{n\times p}} \|Y - V X\|_F^2,$$

where $O(n)$ is the orthogonal group and $\|\cdot\|_F$ denotes the Frobenius norm. In the worst case, this problem is NP-hard; tractability emerges only under structured generative models and favorable regimes of matrix sparsity and size (Dash, 21 May 2024).

A canonical sparsity model draws $X_{ij} = B_{ij} Z_{ij}$, where $B_{ij} \sim \text{Bernoulli}(\theta)$ and $Z_{ij}$ is arbitrary when $B_{ij} = 1$; thus $\theta$ governs the density of nonzeros row-wise and column-wise.
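
As a concrete illustration, the following sketch (not taken from the cited papers; all parameter values are arbitrary) samples this generative model with NumPy, drawing a Haar-random orthogonal $V$ and a Bernoulli($\theta$)-sparse $X$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, theta = 50, 400, 0.1

Q, R = np.linalg.qr(rng.standard_normal((n, n)))
V = Q * np.sign(np.diag(R))                  # sign fix makes V Haar-distributed

B = rng.random((n, p)) < theta               # Bernoulli(theta) support mask
Z = rng.standard_normal((n, p))              # arbitrary values on the support
X = B * Z                                    # sparse factor: X_ij = B_ij * Z_ij
Y = V @ X                                    # observed data matrix

print(f"nonzero fraction of X: {B.mean():.3f}")   # concentrates near theta
```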

2. Fundamental Limits: Coupon Collector Analysis and Sample Complexity

A distinguishing question is: what minimal number of columns $p$ is required for successful recovery of both $V$ and $X$, with high probability, under a random sparsity model? This is formalized by the coverage of all rows ("row-coupons"): recovery necessitates that every row sees at least one nonzero across the $p$ columns. The analysis yields (Dash, 21 May 2024): $$p = \Omega\left(\max\left\{\frac{n}{1-(1-\theta)^n}, \; \frac{1}{\theta}\log n\right\}\right).$$ Here,

  • $\frac{n}{1-(1-\theta)^n}$ corresponds to the expected time to collect all row-coupons when each column covers a row with probability $1-(1-\theta)^n$,
  • $\frac{1}{\theta}\log n$ arises in the regime of very sparse $X$ ($\theta \ll 1$), corresponding to the classic coupon collector's lower tail bound.

This result delineates three regimes:

  • Dense ($\theta$ constant): $p = \Omega(n)$,
  • Sparse ($\theta = O(1/n)$): $p = \Omega(n \log n)$,
  • Intermediate: both terms are comparable when $\theta \sim (\log n)/n$.

These thresholds establish information-theoretic sample complexity barriers that any algorithm, regardless of computational power, cannot cross (Dash, 21 May 2024).
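
For intuition, the following sketch evaluates the bound above at representative values of $\theta$ (the constants hidden by $\Omega(\cdot)$ are set to 1, an arbitrary choice), reproducing the three regimes:

```python
import numpy as np

def sample_complexity_bound(n: int, theta: float) -> float:
    """Order-level lower bound on the number of columns p (constants set to 1)."""
    coverage_term = n / (1.0 - (1.0 - theta) ** n)
    coupon_tail_term = np.log(n) / theta
    return max(coverage_term, coupon_tail_term)

n = 1000
for theta in (0.5, np.log(n) / n, 1.0 / n):    # dense / intermediate / sparse regimes
    print(f"theta = {theta:.4g}:  p >= ~{sample_complexity_bound(n, theta):.0f}")
# Dense theta gives p ~ n; theta ~ 1/n gives p ~ n log n; the intermediate
# regime theta ~ (log n)/n is where the two terms are comparable.
```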

3. Algorithmic Approaches and Structural Exploitation

Householder-structured Orthogonal Factorization

Under the constraint that $V$ is a Householder reflection (or a product of such reflections), remarkable sample complexity reductions are achieved (Dash et al., 13 May 2024). For $V = H = I - 2uu^\top$ (with unit vector $u$) and binary $X$, the following hold:

  • Exact (combinatorial) recovery: $p = \Omega(1)$ columns suffice (in fact, $p = 2$) by brute-force search over $2^n$ binary vectors.
  • Approximate (polynomial-time) recovery: $p = \Omega\big(\frac{\log n}{t^2 \theta^2 c^2}\big)$ suffices for $\ell_\infty$-closeness, where $c = \sum_i u_i \neq 0$. The procedure is entirely non-iterative: one estimates $\theta$ and $u$ using empirical second moments and signs, then reconstructs $X$ by inversion (Dash et al., 13 May 2024).

This bypasses the $\Omega(n\log n)$ lower bound required for fully general $V$ by restricting structure, providing an explicit, closed-form, initialization-free route to SOMF in specialized cases.
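
The key algebraic fact behind the non-iterative inversion step is that a Householder reflection is a symmetric orthogonal involution, $H^{-1} = H^\top = H$. The sketch below (illustrative only; it assumes $u$ is already known or estimated, and omits the moment-based estimator itself) recovers $X$ from $Y = HX$ with a single $O(np)$ multiplication:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, theta = 30, 200, 0.2

u = rng.standard_normal(n)
u /= np.linalg.norm(u)                         # unit vector defining the reflection
H = np.eye(n) - 2.0 * np.outer(u, u)           # Householder reflection: H = H^T = H^{-1}

X = (rng.random((n, p)) < theta).astype(float) # binary sparse factor
Y = H @ X                                      # observations

X_hat = Y - 2.0 * np.outer(u, u @ Y)           # apply H again: H(HX) = X, O(np) work
print(np.allclose(X_hat, X))                   # True
```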

Optimization on Manifolds and Sparse PCA Approaches

For unconstrained $V \in O(n)$ and $\ell_1$- or $\ell_0$-sparsity, several algorithmic frameworks emerge:

  • Coordinate Descent via Givens Rotations: Sparse PCA and related SOMF tasks can be addressed via coordinate descent directly on the Stiefel manifold using Givens rotation updates; each update preserves orthogonality and affects only two coordinates, with theoretical convergence guarantees for smooth objectives (Shalit et al., 2013, Frerix et al., 2019). A minimal sketch of this update appears at the end of this subsection.
  • Majorization-Minimization on Stiefel Manifolds: Sparse PCA with exact orthogonality constraints is formulated using smoothed $\ell_0$-type penalties, and solved by alternating construction of tight surrogate objectives and closed-form Procrustes updates (Benidis et al., 2016).
  • Stagewise Divide-and-Conquer Regression: High-dimensional sparse factor regression is decomposed into a sequence of co-sparse, unit-rank estimation problems (CURE) with theoretical error bounds. Sequential or parallel pursuit ensures orthogonality of estimated factors, and a contended stagewise path algorithm ensures computational efficiency (Chen et al., 2020).

These methods maintain orthogonality exactly (by design) rather than via approximate post-processing, thus overcoming the degeneracy typically induced by direct thresholding or greedy truncation.
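
As a minimal sketch of the Givens-rotation coordinate descent referenced above (not the exact procedure of the cited papers: a grid scan over the rotation angle stands in for their closed-form or 1-D line search, and the objective is the simplified case of fitting $V$ with $X$ known):

```python
import numpy as np

def givens_step(V, f, i, j, angles=np.linspace(-np.pi, np.pi, 181)):
    """One coordinate-descent update: rotate rows i and j of V by the best
    angle on a coarse grid, accepting only strict improvement."""
    best_V, best_val = V, f(V)
    for phi in angles:
        c, s = np.cos(phi), np.sin(phi)
        W = V.copy()
        W[[i, j]] = np.array([[c, -s], [s, c]]) @ V[[i, j]]   # Givens rotation of two rows
        val = f(W)
        if val < best_val:
            best_V, best_val = W, val
    return best_V, best_val

rng = np.random.default_rng(2)
n, p = 8, 40
Q, R = np.linalg.qr(rng.standard_normal((n, n)))
V_star = Q * np.sign(np.diag(R))               # random orthogonal ground truth
if np.linalg.det(V_star) < 0:
    V_star[:, 0] *= -1                         # keep V_star in SO(n), reachable from I
X = rng.standard_normal((n, p)) * (rng.random((n, p)) < 0.3)   # sparse factor (known here)
Y = V_star @ X

f = lambda V: np.linalg.norm(Y - V @ X) ** 2   # smooth objective over O(n)
V, val = np.eye(n), f(np.eye(n))
for sweep in range(10):                        # cyclic sweeps over coordinate pairs
    for i in range(n):
        for j in range(i + 1, n):
            V, val = givens_step(V, f, i, j)
print(f"objective after sweeps: {val:.3e}")    # decreases monotonically
```

Because each accepted rotation is exactly orthogonal and strictly decreases the objective, the iterates never leave the manifold and no re-orthogonalization or post-processing is needed.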

4. Extensions: Nonnegativity, Facility Location, and Structured Constraints

When combined with nonnegativity (as in ONMF/ONMF+), the SOMF problem admits reformulation as a capacity-constrained facility-location problem. In this setting, orthogonality and sparsity correspond to strict assignment and row support constraints, respectively, and are enforced via control-barrier functions (for feasibility) and maximum-entropy principles (for soft-to-hard assignment transitions) (Basiri et al., 2022). Rank selection is accomplished via phase transitions in assignment "hardening".
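
The soft-to-hard transition can be illustrated with a generic deterministic-annealing computation (a sketch of the maximum-entropy principle only, not the cited control-barrier-function algorithm; all sizes and the temperature schedule are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, k = 10, 60, 3
Y = np.abs(rng.standard_normal((n, p)))        # nonnegative data columns
C = np.abs(rng.standard_normal((n, k)))        # k facility locations (centers)

# Squared distances between every data column and every facility: shape (p, k).
d = ((Y[:, :, None] - C[:, None, :]) ** 2).sum(axis=0)

for T in (10.0, 1.0, 0.1, 0.01):               # annealing: high to low temperature
    logits = -(d - d.min(axis=1, keepdims=True)) / T   # shift avoids underflow
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)          # max-entropy (Gibbs) assignment
    print(f"T={T:5.2f}  mean max-assignment probability: {P.max(axis=1).mean():.3f}")
# As T -> 0, each row of P concentrates on a single facility: the soft
# assignment hardens into the strict 0/1 assignment that encodes orthogonality.
```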

Additional regularization (e.g., group sparsity, block structures) and multi-matrix/tensor formulations (as in solrCMF) leverage block ADMM with manifold projections, enabling efficient SOMF in data integration and collective factorizations (Held et al., 16 May 2024).

5. Hierarchical and Large-Scale Sparse-Orthogonal Factorizations

Numerical linear algebra applications require scalable sparse-orthogonal factorizations for very large and structured systems. The spaQR family of algorithms utilizes nested dissection ordering and hierarchical blockwise compression:

  • At each separator or interface, a combination of block Householder QR and low-rank approximations yields sparse orthogonal factors and sparse (block) upper-triangular factors (Gnanasekaran et al., 2021, Gnanasekaran et al., 2020).
  • The cumulative work scales as $O(M \log N)$ (for $M \times N$ matrices arising from, e.g., discretized PDEs), with memory proportional to system size and error controlled via the compression tolerance.

This approach provides nearly optimal preconditioners for Krylov solvers and highlights the additional computational advantages of designing algorithms that directly leverage and preserve sparse-orthogonal structure at all scales.
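
The compression primitive at the heart of such hierarchical schemes can be sketched in isolation (greatly simplified relative to spaQR; the test matrix and tolerance values are arbitrary): an interface block is replaced by a truncated SVD whose rank adapts to a tolerance $\epsilon$, so the approximation error is controlled directly by $\epsilon$:

```python
import numpy as np

def compress(A, eps):
    """Rank-adaptive truncated SVD: keep singular values above eps * s_max."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = max(1, int(np.sum(s > eps * s[0])))
    return U[:, :r] * s[:r], Vt[:r, :]         # A ~= L @ R with adaptive rank r

rng = np.random.default_rng(4)
m = 200
Qa, _ = np.linalg.qr(rng.standard_normal((m, m)))
Qb, _ = np.linalg.qr(rng.standard_normal((m, m)))
A = (Qa * 0.5 ** np.arange(m)) @ Qb.T          # test block with decaying spectrum

for eps in (1e-2, 1e-4, 1e-6):
    L, R = compress(A, eps)
    err = np.linalg.norm(A - L @ R) / np.linalg.norm(A)
    print(f"eps={eps:.0e}  rank={L.shape[1]:3d}  rel. error={err:.1e}")
# Tighter tolerance -> higher retained rank -> smaller factorization error.
```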

6. Practical Implications and Theoretical Insights

| Regime/Algorithm | Sample Complexity | Structural/Algorithmic Property |
|---|---|---|
| General OMF, random $X$ | $\Omega(\max\{n/(1-(1-\theta)^n),\ \log n/\theta\})$ | Coupon collector limit (Dash, 21 May 2024) |
| Householder OMF | Exact: $O(1)$; Approx.: $O(\log n)$ | Fast, non-iterative (Dash et al., 13 May 2024) |
| Givens/Procrustes/CD | $p \sim n\log n$ (typical) | Manifold structure, global optima not guaranteed (Shalit et al., 2013, Benidis et al., 2016, Frerix et al., 2019) |
| Hierarchical spaQR | $O(N\log N)$ flops | Nested dissection, low-rank compression (Gnanasekaran et al., 2021, Gnanasekaran et al., 2020) |

The central insight is that the interplay between sparsity and orthogonality carries an inherent tension: increased sparsity reduces identifiability, requiring more samples, but careful exploitation of algebraic and geometric structure (e.g., Householder, Givens, blockwise decompositions) reduces the algorithmic burden and can attain theoretical minima in specialized cases. The theoretical results and algorithmic innovations delineate clear information-theoretic and structural thresholds, guide algorithm selection, and directly inform the design of scalable solvers for both statistical and numerical applications.
