Low-Rank Tensor Decompositions

Updated 22 August 2025
  • Low-rank tensor decompositions approximate high-dimensional data arrays by exploiting multilinear structure, drastically reducing the number of parameters needed to represent them.
  • Alternating maximization and Newton-type methods offer practical trade-offs between per-iteration cost and convergence speed when computing quality approximations.
  • CUR-based and sampling strategies provide scalable solutions for massive tensors, balancing computational efficiency with approximation accuracy.

Low-rank tensor decompositions are foundational tools for representing high-dimensional data arrays in condensed formats by exploiting linear or multilinear structure. They are central in data compression, dimension reduction, signal and image processing, genomics, and many areas of scientific computing and machine learning. Unlike the matrix (order-2 tensor) case, where the singular value decomposition provides an optimal low-rank approximation, higher-order (order $d > 2$) tensors require more nuanced methods with distinct computational, theoretical, and practical challenges.

1. Fundamentals of Low-Rank Tensor Approximation

The prototypical goal is to approximate a given $d$-mode tensor $T \in \mathbb{R}^{n_1 \times n_2 \times \dots \times n_d}$ by a tensor with far fewer parameters. In the matrix case ($d = 2$), the best rank-$k$ approximation is efficiently solvable via the singular value decomposition (SVD): $A = \sum_{i=1}^{k} \sigma_i u_i v_i^\top$, where the $\sigma_i$ are singular values and $u_i$, $v_i$ are the corresponding singular vectors.
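For the matrix case, the truncated SVD is available directly in standard linear algebra libraries. The following minimal NumPy sketch (the helper name best_rank_k is illustrative, not taken from the paper) assembles the best rank-$k$ approximation from the leading singular triplets:

```python
import numpy as np

def best_rank_k(A: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of A (optimal in Frobenius and spectral norm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k leading singular triplets.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Illustrative use: a noisy matrix whose signal part has exact rank 5.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))
A_noisy = A + 1e-3 * rng.standard_normal(A.shape)
print(np.linalg.norm(A_noisy - best_rank_k(A_noisy, 5)))  # small residual, at the noise level
```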

However, for $d > 2$ there is no canonical higher-order SVD. Instead, a common approach is the best \emph{multilinear rank} $(r_1, \dots, r_d)$ approximation: find subspaces $U_i \subset \mathbb{R}^{n_i}$ with $\dim U_i = r_i$ that maximize the $\ell_2$ norm of the projection of $T$ onto the tensor product subspace $\otimes_{i=1}^d U_i$. The problem can be formulated as

$$\min_{U_i} \| T - P_{U_1 \otimes \dots \otimes U_d}(T)\|^2 = \|T\|^2 - \max_{U_i} \| P_{U_1 \otimes \dots \otimes U_d}(T)\|^2,$$

where $P_{U_1 \otimes \dots \otimes U_d}$ denotes the orthogonal projection onto $\otimes_{i=1}^d U_i$; minimizing the approximation error is therefore equivalent to maximizing the norm of the projection (formula (1) in (Friedland et al., 2014)).
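To make the objective concrete, the sketch below evaluates $\|P_{U_1 \otimes \dots \otimes U_d}(T)\|^2$ as the squared norm of the core tensor $T \times_1 U_1^\top \cdots \times_d U_d^\top$, assuming each $U_i$ is stored as a matrix with $r_i$ orthonormal columns (the helper names mode_product and projected_norm_sq are illustrative):

```python
import numpy as np

def mode_product(T, M, mode):
    """Multiply tensor T along the given mode by the matrix M."""
    T = np.moveaxis(T, mode, 0)
    out = M @ T.reshape(T.shape[0], -1)
    return np.moveaxis(out.reshape((M.shape[0],) + T.shape[1:]), 0, mode)

def projected_norm_sq(T, subspaces):
    """||P_{U_1 x ... x U_d}(T)||^2 for orthonormal bases U_i (stored column-wise)."""
    core = T
    for mode, U in enumerate(subspaces):
        core = mode_product(core, U.T, mode)  # coordinates of the projection
    return float(np.sum(core ** 2))
```

By the identity above, maximizing this quantity over the subspaces is equivalent to minimizing the approximation error.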

2. Key Algorithms: Alternating Maximization and the Newton Method

Alternating Maximization Method (AMM)

The dominant practical method is Alternating Maximization (AMM), in which all subspaces except one, say $U_i$, are held fixed, $U_i$ is updated optimally, and the roles are rotated cyclically. Formally, the $U_i$ update step involves solving an eigenvalue problem for $A_i = \sum_{j_\ell = 1,\ \ell \ne i}^{r_\ell} \bigl[T \times (\otimes_{\ell \ne i} u_{j_\ell, \ell})\bigr]\bigl[T \times (\otimes_{\ell \ne i} u_{j_\ell, \ell})\bigr]^\top$, with the leading $r_i$ eigenvectors forming the updated $U_i$ (see formula (2) in (Friedland et al., 2014)).

AMM iterates the map $F: \Psi \to \Psi$ on the product of Grassmannians, $F(U_1, \dots, U_d) = (F_1(U), \dots, F_d(U))$, where $F_i$ is the maximizer of the objective with all other subspaces held fixed. This process converges, usually to a stationary point that may be only locally optimal due to nonconvexity (see (Friedland et al., 2014)).
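A minimal sketch of one alternating sweep, written in the equivalent form where the leading eigenvectors of $A_i$ are obtained as leading left singular vectors of the mode-$i$ unfolding of a partially contracted tensor (function names are illustrative, and this is a generic HOOI-style formulation rather than the paper's reference implementation):

```python
import numpy as np

def mode_product(T, M, mode):
    """Multiply tensor T along the given mode by the matrix M (as in the sketch above)."""
    T = np.moveaxis(T, mode, 0)
    out = M @ T.reshape(T.shape[0], -1)
    return np.moveaxis(out.reshape((M.shape[0],) + T.shape[1:]), 0, mode)

def amm_sweep(T, subspaces):
    """One cyclic sweep: update each orthonormal basis U_i with the others held fixed."""
    d = T.ndim
    for i in range(d):
        # Contract T with all fixed subspaces except mode i.
        G = T
        for j in range(d):
            if j != i:
                G = mode_product(G, subspaces[j].T, j)
        # The leading r_i eigenvectors of A_i = G_(i) G_(i)^T are the leading
        # left singular vectors of the mode-i unfolding of G.
        G_i = np.moveaxis(G, i, 0).reshape(G.shape[i], -1)
        r_i = subspaces[i].shape[1]
        U, _, _ = np.linalg.svd(G_i, full_matrices=False)
        subspaces[i] = U[:, :r_i]
    return subspaces

# Illustrative use: random 20 x 30 x 40 tensor, target multilinear rank (3, 3, 3).
rng = np.random.default_rng(0)
T = rng.standard_normal((20, 30, 40))
subspaces = [np.linalg.qr(rng.standard_normal((n, 3)))[0] for n in T.shape]
for _ in range(20):
    subspaces = amm_sweep(T, subspaces)
```

In practice the sweeps are repeated until the projected norm (or the subspaces themselves) stops changing up to a tolerance.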

Newton Method for Fixed Points

The Newton method is proposed to accelerate local convergence once the iterates are near a fixed point of $F$. In a local Euclidean parameterization (coordinate chart), the iteration takes the form $x^{(\ell)} = x^{(\ell-1)} - \bigl[I - DF(x^{(\ell-1)})\bigr]^{-1} \bigl(x^{(\ell-1)} - F(x^{(\ell-1)})\bigr)$ (formula (5) in (Friedland et al., 2014)), where $DF$ is the derivative of $F$. For rank-1 cases or highly structured subsets (such as products of spheres), explicit fixed-point equations and Jacobians can be written down.

The Newton method requires assembling the Jacobian and solving a linear system of dimension $L = \sum_{i=1}^d (n_i - r_i) r_i$, resulting in a higher per-iteration cost than AMM. However, the number of iterations is often significantly reduced.
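As a generic illustration of the fixed-point acceleration (not the paper's implementation), the sketch below performs one coordinate-chart Newton step for $x = F(x)$, approximating the Jacobian $DF$ by forward differences; the paper instead works with explicit Jacobians:

```python
import numpy as np

def newton_fixed_point_step(F, x, eps=1e-6):
    """One Newton step for the fixed-point equation x = F(x):
        x_new = x - (I - DF(x))^{-1} (x - F(x)).
    DF is approximated by forward differences, purely for illustration."""
    n = x.size
    Fx = F(x)
    DF = np.empty((n, n))
    for k in range(n):
        e = np.zeros(n)
        e[k] = eps
        DF[:, k] = (F(x + e) - Fx) / eps
    return x - np.linalg.solve(np.eye(n) - DF, x - Fx)

# Illustrative use on the scalar fixed-point problem x = cos(x).
x = np.array([0.5])
for _ in range(6):
    x = newton_fixed_point_step(np.cos, x)
print(x)  # approximately 0.739085
```

Each step costs one Jacobian evaluation plus a dense linear solve in dimension $L$, which is the per-iteration overhead mentioned above.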

3. CUR Approximations and Sampling-Based Strategies

For large matrices and higher-order tensors, direct SVD or AMM may be computationally prohibitive. The CUR decomposition, originally from the matrix literature, approximates a matrix by selecting a small number of rows and columns and reconstructs using the intersection submatrix. This concept generalizes to tensors by matricizing (unfolding) the tensor in one or more modes, selecting important slices or fibers, and computing the pseudo-inverse on these small submatrices (see formula (11) in (Friedland et al., 2014)).

Variants for tensors involve selecting index sets for each mode, forming the subtensor (core), and reconstructing via projections. CUR-approximations become particularly valuable when only partial access to the data is possible or for extremely large datasets.
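A minimal matrix CUR sketch in the intersection-submatrix form described above; the uniform random choice of index sets and the function name cur_approximation are purely illustrative:

```python
import numpy as np

def cur_approximation(A, row_idx, col_idx):
    """CUR approximation A ~ C @ U @ R with U the pseudo-inverse of the
    intersection submatrix W = A[rows, cols]."""
    C = A[:, col_idx]                   # selected columns
    R = A[row_idx, :]                   # selected rows
    W = A[np.ix_(row_idx, col_idx)]     # intersection submatrix
    U = np.linalg.pinv(W, rcond=1e-10)  # truncate tiny singular values of W
    return C @ U @ R

# Illustrative use: sample 12 rows and 12 columns of an exactly rank-6 matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 6)) @ rng.standard_normal((6, 400))
rows = rng.choice(500, size=12, replace=False)
cols = rng.choice(400, size=12, replace=False)
err = np.linalg.norm(A - cur_approximation(A, rows, cols)) / np.linalg.norm(A)
print(err)  # near machine precision when the sampled rows/columns span A's row/column spaces
```

For tensors, the same construction is applied to one or more unfoldings, with fibers or slices playing the role of rows and columns.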

4. Numerical Comparison and Performance

Empirical results demonstrate:

  • AMM and its variants (such as MAMM and 2AMM/2AMMV) have relatively low computational cost per iteration but may converge slowly, with a risk of stalling at suboptimal stationary points.
  • Newton methods require more resources per step due to Jacobian evaluation and inversion but often converge in fewer iterations, especially in the vicinity of stationary points.
  • For small target multilinear ranks (e.g., rank-1 or $(2,2,2)$), Newton-type methods are highly competitive.
  • For large tensors and moderate ranks, alternating methods—specifically, the 2AMMV modification—can outperform Newton methods, particularly when exploiting parallelism.
  • Compared to Grassmann-manifold-based Newton methods (see work by Savas–Lim and [ES09]), the coordinate Newton approach is easier to implement for arbitrary tensor order and remains highly parallelizable, albeit sometimes marginally slower per Newton step.

5. Implementation Aspects, Complexity, and Limitations

The main computational bottlenecks arise from:

  • The eigenvalue problem in updating subspaces ($O(\prod_i r_i)$ per update).
  • Jacobian assembly and inversion for the Newton method (with cost scaling as $L^3$).
  • Tensor contractions required to compute projections and matrix multiplications in each iteration.

Alternating methods are suitable for large-scale problems where per-iteration cost must be minimized. Newton-based methods should be reserved for accelerating convergence when close to a fixed point or when high accuracy is required in moderate-sized problems.

CUR decomposition and sampling-based methods are advantageous when working with tensors that cannot be stored or accessed in their entirety, since they require only a subset of tensor entries and computations. Their effectiveness depends on the quality of the selected samples, with error controlled via the choice of submatrices/fibers.

6. Summary Table: Method and Properties

| Method | Convergence Speed | Per-Iteration Cost | Parallelizability | Suitability |
|---|---|---|---|---|
| AMM/MAMM | Slow to moderate | Low | High | Very large tensors, quick first pass |
| Newton-1/2 | Fast (locally) | Moderate to high | High | Moderate/small tensors, high-accuracy regimes |
| CUR-based | Very fast | Low | High | Massive tensors, limited access/data compression |

7. Conclusions and Research Directions

Low-rank tensor decompositions generalize ideas from matrix theory, but introduce unique algorithmic and theoretical questions. This synthesis, as in (Friedland et al., 2014), illustrates several key points:

  • The best multilinear rank approximation is the central principle for higher-order tensor compression.
  • AMM is a simple, flexible, and general-purpose tool, suitable for large-scale problems where moderate local optimality suffices.
  • Newton-based fixed-point methods offer significantly improved local convergence for higher accuracy requirements, but with nontrivial computational overhead.
  • CUR-type and sampling-based decompositions are vital for large-scale applications, offering practical tradeoffs, albeit with slightly worse approximation guarantees compared to SVD-based methods.
  • Further advances involve robust stopping criteria, improved initialization, and hybrids combining alternating and Newton steps, as well as methods tailored for sparse, incomplete, or streaming tensor data.

The landscape described is shaped by intrinsic geometric properties (product Grassmannians), practical computational tradeoffs, and the increasing demand for efficient decompositions in scientific computing, data analytics, and beyond (Friedland et al., 2014).
