Tangent Subspace Descent: Optimization Unveiled
- Tangent Subspace Descent is an optimization method that extends steepest and block coordinate descent by iteratively descending along low-dimensional tangent subspaces on manifolds.
- It constructs tangent subspaces capturing directions of maximal energy decrease and applies Galerkin or retraction updates to efficiently solve tensor and manifold-based problems.
- TSD offers strong convergence guarantees with reduced computational complexity, making it effective for solving SPD systems and manifold-constrained learning tasks.
Tangent Subspace Descent (TSD) is a class of optimization and numerical methods for high-dimensional and constrained problems, characterized by iterative descent along subspaces of the ambient or manifold tangent space. TSD generalizes steepest and block coordinate descent from Euclidean to manifold and tensorial settings, enabling efficient numerical linear algebra and nonlinear optimization in contexts such as tensor network representations, subspace tracking, and constrained matrix manifolds. The central mechanism of TSD is the construction of low-dimensional tangent subspaces capturing directions of maximal energy decrease, followed by Galerkin-type or retraction-based updates within these subspaces.
1. Mathematical Formulation and Principle
Tangent Subspace Descent was originally developed as an extension of steepest descent and block coordinate descent to non-Euclidean domains, specifically to the solution of large-scale symmetric positive-definite (SPD) systems in tensor formats and more generally to optimization over Riemannian manifolds.
Consider the minimization of a smooth objective $f$ over a Riemannian manifold $\mathcal{M}$. At each iteration, TSD constructs one or more tangent subspaces of $T_{x_k}\mathcal{M}$ at the current point $x_k$, typically determined by projections of the gradient and/or the structure of the current iterate. A step is taken by optimizing within the span of these subspaces, either in sequence (as in block descent) or jointly.
In the Tensor-Train (TT) context for linear systems $Ax = b$ with SPD $A$, the solution minimizing the energy functional $J(x) = \tfrac{1}{2}\langle Ax, x\rangle - \langle b, x\rangle$ is sought within the smooth nonlinear manifold of TT-tensors of fixed rank. The tangent space at a TT-tensor $x$ splits as a direct sum of “one-core-at-a-time” variations; concretely, any variation $\delta x$ decomposes as $\delta x = \sum_{k=1}^{d} U_k\,\delta G_k$, where the linear maps $U_k$ take variations of the $k$-th TT core $G_k$ to the ambient tensor space (Dolgov et al., 2013).
Similarly, on matrix manifolds such as the Stiefel manifold, the Grassmann manifold, and the orthogonal group, TSD uses the projection of the ambient gradient onto the tangent space, with the step computed via retractions or exponential maps (Birtea et al., 2017, Gutman et al., 2019).
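The basic loop — project the gradient onto the tangent space, step, retract — can be sketched on the simplest matrix manifold, the unit sphere. This is an illustrative minimal example, not code from the cited papers; the objective, step size, and iteration count are chosen for demonstration:

```python
import numpy as np

def sphere_descent(A, x0, step=0.1, iters=500):
    """Riemannian steepest descent for f(x) = 0.5 * x^T A x on the unit
    sphere: project the Euclidean gradient onto the tangent space at x,
    step along the negative projection, and retract by renormalizing."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = A @ x                    # Euclidean gradient of f
        rgrad = egrad - (x @ egrad) * x  # projection onto the tangent space
        x = x - step * rgrad             # descend within the tangent subspace
        x = x / np.linalg.norm(x)        # retraction back to the sphere
    return x

rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.standard_normal((5, 5)))[0]
A = Q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) @ Q.T  # SPD, smallest eigenvalue 1
x = sphere_descent(A, rng.standard_normal(5))
# x converges to the eigenvector of the smallest eigenvalue, so x^T A x -> 1
```

Restricting the update to a lower-dimensional tangent subspace, as TSD does, replaces `rgrad` by its projection onto a chosen block of tangent directions.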
2. Core Algorithmic Structures
At the heart of TSD is an alternation between (i) tangent subspace identification and (ii) energy decrease via restricted updates.
Tangent Subspace Construction
- Tensor-Train TSD: Given an iterate $x$ in TT format, the residual $r = b - Ax$ is compressed to a manageable TT-rank. For each tensor mode $k$, the $k$-th core of the TT-residual is used to form a block spanning variations in the $k$-th core of $x$. The union of these blocks yields a matrix $V$ defining an enriched descent subspace (Dolgov et al., 2013).
- Manifold TSD: For a matrix manifold, at the current iterate $X$ the Riemannian gradient is derived by orthogonally projecting the Euclidean gradient onto the tangent space, e.g., $\operatorname{grad} f(X) = \nabla f(X) - X \operatorname{sym}(X^\top \nabla f(X))$ for the Stiefel manifold (Birtea et al., 2017); more general TSD selects coordinate or randomized projections of the gradient, respecting particular convergence-enabling conditions (Gutman et al., 2019).
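The Stiefel tangent projection above admits a direct check: the projected matrix satisfies the tangency condition $X^\top T + T^\top X = 0$. A minimal sketch (the formula is the standard embedded-metric projection; variable names are illustrative):

```python
import numpy as np

def stiefel_project(X, Z):
    """Orthogonal projection of an ambient matrix Z onto the tangent space
    of the Stiefel manifold {X : X^T X = I} at X:
        P_X(Z) = Z - X * sym(X^T Z),  sym(M) = (M + M^T) / 2."""
    XtZ = X.T @ Z
    return Z - X @ (XtZ + XtZ.T) / 2

rng = np.random.default_rng(1)
X = np.linalg.qr(rng.standard_normal((6, 3)))[0]  # a point on St(6, 3)
G = rng.standard_normal((6, 3))                   # e.g. a Euclidean gradient
T = stiefel_project(X, G)
# Tangency at X: X^T T + T^T X = 0 (X^T T is skew-symmetric)
```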
Galerkin or Retraction Step
- Galerkin Correction: In the TT case, the new iterate is computed as $x_{+} = x + Vc$, where $c$ solves the small Galerkin system $(V^\top A V)\,c = V^\top r$ (Dolgov et al., 2013).
- Retraction or Exponential Update: On manifolds, the update is $X_{+} = R_X(-\alpha \operatorname{grad} f(X))$, where $R_X$ is a retraction (e.g., QR- or Cayley-based) ensuring the result stays on the manifold (Birtea et al., 2017, Gutman et al., 2019).
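The Galerkin correction can be demonstrated on a dense SPD system. The sketch below uses an illustrative subspace — the current residual plus a few random directions — in place of the enriched TT tangent blocks of the cited scheme; all names and parameters are for demonstration only:

```python
import numpy as np

def galerkin_step(A, b, x, V):
    """One TSD-style Galerkin correction: orthonormalize the subspace basis
    V, solve the small restricted system (V^T A V) c = V^T r for the
    residual r = b - A x, and update x within span(V)."""
    V, _ = np.linalg.qr(V)
    r = b - A @ x
    c = np.linalg.solve(V.T @ A @ V, V.T @ r)
    return x + V @ c

rng = np.random.default_rng(2)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)   # a well-conditioned SPD system matrix
b = rng.standard_normal(n)
x = np.zeros(n)
for _ in range(40):
    # Residual direction plus 3 random directions as a stand-in for the
    # enriched tangent blocks; including r guarantees at least
    # steepest-descent progress per step.
    V = np.column_stack([b - A @ x, rng.standard_normal((n, 3))])
    x = galerkin_step(A, b, x, V)
# the residual norm contracts geometrically toward zero
```

Because the subspace contains the residual, each Galerkin step is at least as good (in energy norm) as an exact-line-search steepest descent step.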
3. Convergence Guarantees
The convergence of TSD is underpinned by geometric contraction bounds and surrogate norm equivalence principles:
- SPD/TT Manifold: For SPD problems, the error in the $A$-norm contracts by at least the steepest-descent factor $(\kappa - 1)/(\kappa + 1)$ per iteration, with $\kappa$ the condition number of $A$ (Dolgov et al., 2013).
- General Manifolds: When the sequence of tangent subspaces satisfies a gap-ensuring or randomized-norm condition, one attains global convergence to stationarity (under $L$-smoothness of the objective) and $O(1/k)$ rates for geodesically convex objectives (Gutman et al., 2019). For block MM methods on Stiefel/Grassmann, monotonic decrease and convergence to stationary points are guaranteed (Blocker et al., 2023).
Small perturbations due to rounding or inexact projections remain contractive provided local truncation tolerances are controlled (a rounding tolerance threshold in TT TSD).
4. Computational Complexity
TSD achieves favorable computational scaling by restricting major steps to low-dimensional subspaces.
- Tensor-Train TSD: Each iteration costs an amount polynomial in the TT-rank $r$ of the iterate, the TT-rank of $A$, and the residual compression rank, and linear in the tensor order $d$ and mode size $n$. The key advantage is linear scaling in both $d$ and $n$ for constant ranks (Dolgov et al., 2013).
- Matrix/Manifold TSD: For Stiefel and orthogonal-manifold cases, projections, gradient computation, and retractions cost a small polynomial in the matrix dimensions per step or cycle (e.g., on the order of $O(np^2 + p^3)$ for an $n \times p$ Stiefel iterate), with randomized block variants costing proportionally less per inner step (Gutman et al., 2019, Birtea et al., 2017).
In all cases, the low-dimensional restriction of the Galerkin or retraction step ensures scalability to high dimension.
5. Subspace Selection Strategies and Theoretical Foundations
Ensuring sufficient coverage of the descent directions at each step is critical for TSD convergence.
- Gap-Ensuring Deterministic Rules: On orthogonal group manifolds, partitioning the tangent basis into blocks and updating each block per cycle yields strong norm-equivalence properties, formalized by gap parameters (Gutman et al., 2019).
- Randomized Subspace Selection: On Stiefel manifolds, sampling among tangent directions with positive probabilities guarantees that, in expectation, important directions are not missed; associated constants control the convergence rate (Gutman et al., 2019).
- Explicit Stiefel TSD: The dependence of the local update on the choice of a full-rank submatrix (local coordinate system) connects the scheme to classical coordinate descent and Givens-rotation-based methods (Birtea et al., 2017).
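The randomized selection idea can be sketched on a Stiefel manifold: at each step, sample a random tangent direction, keep only the gradient's component along it, and retract. The objective, sampling rule, and step size below are illustrative choices, not the refined schemes of the cited papers:

```python
import numpy as np

def proj_tangent(X, Z):
    """Orthogonal projection of Z onto the Stiefel tangent space at X."""
    XtZ = X.T @ Z
    return Z - X @ (XtZ + XtZ.T) / 2

def qr_retract(Y):
    """QR retraction onto the Stiefel manifold (sign fix for uniqueness)."""
    Q, R = np.linalg.qr(Y)
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(4)
n, p = 8, 3
C = rng.standard_normal((n, p))
f = lambda X: -np.trace(C.T @ X)     # a linear objective on St(n, p)

X = qr_retract(rng.standard_normal((n, p)))
f0 = f(X)
for _ in range(500):
    G = proj_tangent(X, -C)                            # Riemannian gradient
    D = proj_tangent(X, rng.standard_normal((n, p)))   # random tangent direction
    coef = np.sum(G * D) / np.sum(D * D)               # component of G along D
    X = qr_retract(X - 0.1 * coef * D)                 # restricted descent step
# f(X) decreases from f0 while X stays exactly on the manifold
```

Because a random direction has positive expected overlap with the gradient, important directions are not missed in expectation, which is the intuition behind the positive-probability sampling condition.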
6. Comparison with Related Methods
TSD generalizes and, in various settings, improves upon standard alternating and coordinate-type algorithms:
- Versus ALS/DMRG in TT Tensor Algorithms: Alternating Least Squares (ALS) and Density Matrix Renormalization Group (DMRG) are classic high-dimensional solvers. ALS fixes all but one core and solves a large local system; DMRG merges cores followed by adaptive SVD splitting. TSD instead constructs a global enriched tangent subspace and solves one reduced problem per iteration, providing geometric convergence and automatic rank adaptivity via TT-rounding (Dolgov et al., 2013).
- Versus Riemannian Gradient Descent: In orthogonal Procrustes problems, Riemannian gradient descent incurs high per-cycle costs due to global updates, while TSD's block or coordinate-wise steps achieve rapid progress per unit computation, as demonstrated by the number of cycles/iterations to reach a given suboptimality (Gutman et al., 2019).
- Block MM on Grassmannian: Methods like Geodesic Subspace Estimation (GSE) use tangent-space updates (Riemannian gradient steps) interleaved with problem-structure-specific block updates (e.g., MM steps for geodesic parameters) (Blocker et al., 2023).
7. Applications and Empirical Results
TSD has a wide range of applications in high-dimensional numerical linear algebra, statistical signal processing, and manifold-constrained learning.
- Tensor Linear System Solvers: For $Ax = b$ with TT-structured $A$ and $b$, TSD delivers scalable, rank-adaptive solutions with geometric convergence and practical iteration counts matching or outperforming DMRG (Dolgov et al., 2013).
- Orthogonal and Stiefel Manifold Optimization: Problems such as the orthogonal Procrustes and matrix regression benefit from TSD's efficient subspace steps and theoretical global convergence (Gutman et al., 2019, Birtea et al., 2017).
- Dynamic Subspace Tracking: In time-varying principal component and subspace tracking, geodesic models with tangent-space-based block MM updates exceed traditional SVD and online trackers in noise-robustness and sample efficiency, especially for geodesically structured data (Blocker et al., 2023).
A practical implication is that TSD enables numerically stable and scalable optimization whenever the computational bottleneck is the dimension of the search space and the ambient structure admits efficient tangent projections and subspace solves.
Cited works:
- "Alternating minimal energy methods for linear systems in higher dimensions. Part I: SPD systems" (Dolgov et al., 2013)
- "Coordinate Descent Without Coordinates: Tangent Subspace Descent on Riemannian Manifolds" (Gutman et al., 2019)
- "Steepest descent algorithm on orthogonal Stiefel manifolds" (Birtea et al., 2017)
- "Dynamic Subspace Estimation with Grassmannian Geodesics" (Blocker et al., 2023)